Migrating DNS Zones Without Downtime Using Zone Transfers

A zero-downtime authoritative server migration hinges on one principle: the new infrastructure must serve byte-identical answers before any client is told to look there. You achieve that by replicating the zone over a secure AXFR/IXFR channel, proving record parity with a strict diff, and only then flipping NS delegation at the registrar. This guide walks through the full procedure — TTL pre-warming, TSIG-authenticated transfers, serial synchronization, cutover, and rollback — using real BIND9 and PowerDNS configuration. Done correctly, recursive resolvers never observe a gap, and edge routing stays continuous throughout the window. The same discipline underpins broader DNS Zone Management and the wider DNS Fundamentals & Advanced Record Configuration practice.

Key objectives:

  • Pre-warm caches by lowering SOA minimum and all record TTLs to 300s at least 48 hours ahead of cutover.
  • Stand up an authenticated AXFR/IXFR replication channel using TSIG HMAC-SHA256, scoped to the target server IPs only.
  • Prove full zone parity with dig AXFR plus a strict diff, and confirm matching SOA serials before touching delegation.
  • Execute the NS cutover with a tested rollback path so a bad answer is reversible in minutes, not hours.

Figure: Zero-downtime zone migration sequence. Four phases: (1) lower TTLs to 300s; (2) replicate from the legacy primary to the new primary via TSIG-secured (hmac-sha256) AXFR/IXFR with NOTIFY, so both hold SOA serial N; (3) verify parity and serials with a strict diff gate before any delegation change; (4) update NS at the registrar, with a rollback path of reverting NS while the legacy server is still authoritative.

Prerequisites and environment setup

Before you begin, confirm administrative access to both authoritative platforms and to the registrar where the parent zone delegates your domain. You will need:

  • Shell access to the legacy primary (the current source of truth) and the new primary, with rndc or the provider API available to force transfers.
  • BIND 9.16+ or PowerDNS Authoritative 4.5+ on the target, both of which support IXFR (RFC 1995) and TSIG (RFC 8945).
  • dig (from bind-utils/dnsutils) on a workstation outside both networks so you query as a real resolver would.
  • Synchronized clocks via chrony or ntpd on every authoritative node — TSIG signatures carry a timestamp and reject on skew beyond 300 seconds.
  • A maintenance window with a documented rollback, plus the registrar’s NS-edit interface open and tested.

A short note on TTLs: the speed of your cutover is bounded by the largest cached TTL on any record that points clients at infrastructure. Plan the reduction deliberately, the same way you would when picking TTL values for high-traffic SaaS platforms: low enough for a fast switch, but not so low that you multiply query load on your authoritative servers in the days leading up to it.
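
To make that bound concrete, the sketch below finds the largest TTL in a saved zone listing and applies the waiting rule this guide uses (48 hours or twice the original TTL, whichever is longer). The function names max_ttl and min_wait_seconds are hypothetical helpers, and the input is assumed to be dig-style answer lines (name, TTL, class, type, data).

```shell
#!/bin/sh
# Hypothetical helpers, not part of BIND or dig.

# Print the largest TTL found in dig-style answer lines read from stdin.
# Comment lines and lines whose second field is not a number are skipped.
max_ttl() {
    awk 'NF >= 4 && $1 !~ /^;/ && $2 ~ /^[0-9]+$/ && $2 + 0 > max { max = $2 + 0 }
         END { print max + 0 }'
}

# Apply the guide's rule of thumb: wait 48 hours or 2x the old TTL,
# whichever is longer. Argument: the old TTL in seconds.
min_wait_seconds() {
    old_ttl=$1
    floor=172800
    doubled=$((old_ttl * 2))
    if [ "$doubled" -gt "$floor" ]; then
        echo "$doubled"
    else
        echo "$floor"
    fi
}
```

For example, `min_wait_seconds "$(max_ttl < legacy_zone.txt)"` would print the number of seconds to wait after lowering TTLs, assuming legacy_zone.txt was captured before the reduction.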

Step 1: Reduce TTLs and capture a baseline

Lower the SOA minimum (the negative-cache TTL) and every record-level TTL to 300 seconds on the legacy primary, then bump the serial and let it propagate. Capturing the zone first gives you a reference snapshot to diff against later.

# Snapshot the live zone from the current authoritative server.
# Default (non-multiline) output keeps each record on a single line.
dig @legacy-ns.example.com example.com AXFR +noall +answer > legacy_zone.txt
wc -l legacy_zone.txt

Expected output: a non-empty file containing one line per record. If you instead see ; Transfer failed., the legacy server is not yet permitting transfers from your client IP — add it to allow-transfer temporarily or capture record-by-record.

Verify the reduced TTL has actually reached public resolvers before you rely on it:

dig @8.8.8.8 example.com SOA +noall +answer

Expected output: the leading number in the answer is the TTL. A recursive resolver reports the remaining cache lifetime, so it should read 300 or lower:

example.com.   300   IN   SOA   ns1.example.com. hostmaster.example.com. 2026062001 7200 3600 1209600 300

If the TTL is above 300, that resolver is still holding an answer cached under the previous TTL. Wait for the full original TTL to elapse before continuing; do not proceed on a partially propagated reduction.
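
If you want this check to gate a script rather than rely on reading the output by eye, a small filter along these lines may help. It is a sketch: ttl_at_most is a hypothetical helper name, and it assumes dig-style answer lines on stdin.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if at least one record was seen and
# every record's TTL (second field) is at or below the given limit.
ttl_at_most() {
    awk -v limit="$1" '
        NF >= 4 && $1 !~ /^;/ && $2 ~ /^[0-9]+$/ {
            seen = 1
            if ($2 + 0 > limit + 0) bad = 1
        }
        END { exit ((seen && !bad) ? 0 : 1) }'
}
```

A possible use: `dig @8.8.8.8 example.com SOA +noall +answer | ttl_at_most 300 && echo "TTL reduction visible"`. An empty answer counts as a failure, so a resolver timeout does not pass the gate by accident.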

Step 2: Configure secure AXFR/IXFR with TSIG

Authenticated transfers stop anyone from exfiltrating your full zone and stop a spoofed primary from feeding the secondary forged zone data. Generate an HMAC-SHA256 key and share the identical secret with both ends.

tsig-keygen -a hmac-sha256 migration-key

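Expected output is a ready-to-paste key block. The secret shown in this guide is an illustrative placeholder rather than valid key material, so use the value your own run prints: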

key "migration-key" {
	algorithm hmac-sha256;
	secret "y8Qm2Zr9k3Jt0Lf5wXc1aN6bV4dHs7e==";
};
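
Because the same secret must be pasted into two configurations, it can be worth confirming that the copied string still decodes cleanly before reloading either server. The sketch below is one way to do that; tsig_secret_bytes is a hypothetical helper, and it assumes a base64 utility that accepts -d (as GNU coreutils does).

```shell
#!/bin/sh
# Hypothetical helper: print how many bytes a base64 TSIG secret decodes to,
# or fail if the string is not valid base64 (for example after a bad paste).
tsig_secret_bytes() (
    tmp=$(mktemp) || exit 2
    if printf '%s' "$1" | base64 -d > "$tmp" 2>/dev/null; then
        wc -c < "$tmp" | tr -d ' '
        status=0
    else
        status=1
    fi
    rm -f "$tmp"
    exit "$status"
)
```

Running it against the secret on each server and comparing the results is a quick sanity check; a decode failure on one side usually points at truncation or stray characters.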

On the BIND9 legacy primary, define the key, scope transfers to the new servers, and enable immediate NOTIFY so the secondary pulls an IXFR the moment a serial changes rather than waiting out the refresh interval:

key "migration-key" {
    algorithm hmac-sha256;
    secret "y8Qm2Zr9k3Jt0Lf5wXc1aN6bV4dHs7e==";
};

acl "new-dns-servers" {
    198.51.100.10;
    203.0.113.20;
};

zone "example.com" {
    type master;
    file "/etc/bind/zones/example.com.zone";
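    // Require BOTH a listed source address AND the TSIG key. A plain
    // { key ...; new-dns-servers; } list would accept either one alone.
    allow-transfer { !{ !new-dns-servers; any; }; key "migration-key"; };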
    notify yes;
    also-notify { 198.51.100.10; 203.0.113.20; };
};

Reload and confirm the config parses cleanly:

named-checkconf && rndc reconfig && echo "config OK"

Expected output: config OK. A non-zero exit from named-checkconf prints the offending line and file — fix it before reloading.
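
The configuration above covers only the legacy primary. The new BIND server also needs the key and a zone definition that pulls from the legacy source. The sketch below prints such a fragment; render_secondary_conf is a hypothetical helper, and the include path and zone file path are assumptions to adapt to your own layout. 192.0.2.1 is used as the legacy primary address, matching the transfer log example in Step 3.

```shell
#!/bin/sh
# Hypothetical helper: print a named.conf fragment for the new server.
# Arguments: zone name, legacy primary IP, TSIG key name.
render_secondary_conf() {
    zone=$1
    primary_ip=$2
    key_name=$3
    cat <<EOF
// Assumed layout: the key block printed by tsig-keygen is saved in this file.
include "/etc/bind/${key_name}.conf";

zone "${zone}" {
    type slave;
    // Assumed path; use whatever directory named can write to.
    file "/var/cache/bind/${zone}.zone";
    // Pull from the legacy primary and sign each request with the shared key.
    masters { ${primary_ip} key "${key_name}"; };
};
EOF
}

render_secondary_conf example.com 192.0.2.1 migration-key
```

Redirect the output into a file included from named.conf on the new server, then run named-checkconf there before reloading.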

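If the server that serves the zone runs PowerDNS instead, the equivalent primary-side settings live in pdns.conf, with the TSIG key stored in the database backend rather than a flat file. The metadata that lets a primary answer TSIG-signed transfers is TSIG-ALLOW-AXFR; AXFR-MASTER-TSIG is its secondary-side counterpart and belongs on the server that pulls the zone.

# pdns.conf on the PowerDNS primary (the setting was named master before 4.5)
primary=yes
allow-axfr-ips=198.51.100.10,203.0.113.20
also-notify=198.51.100.10,203.0.113.20

# Shell on the primary: import the key and permit TSIG-signed AXFR
pdnsutil import-tsig-key migration-key hmac-sha256 y8Qm2Zr9k3Jt0Lf5wXc1aN6bV4dHs7e==
pdnsutil set-meta example.com TSIG-ALLOW-AXFR migration-key

Expected output: a confirmation that the TSIG key was imported, followed by a confirmation that the metadata was set for example.com. The exact wording varies between PowerDNS releases.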

Step 3: Synchronize serials and prove parity

Force an immediate transfer to the new primary so it holds the current serial, then compare both zones byte-for-byte. This parity gate is the single most important checkpoint in the migration.

# On the new BIND primary, pull the latest from the legacy source
rndc retransfer example.com

Expected output: no output on success. Tail the log to confirm the transfer completed:

grep "transfer of 'example.com" /var/log/syslog | tail -2

Expected output includes transfer of 'example.com/IN' from 192.0.2.1#53: Transfer status: success.

Now run the strict diff across both authoritative servers:

# Keep one record per line (no +multiline) so sort cannot split a record apart
dig @legacy-ns.example.com example.com AXFR +noall +answer | sort > legacy.txt
dig @new-ns.example.com   example.com AXFR +noall +answer | sort > new.txt
diff -u legacy.txt new.txt && echo "ZONES MATCH: Proceed with cutover"

Expected output: ZONES MATCH: Proceed with cutover. Any diff hunk means record drift — resolve it before going further. Sorting normalizes record ordering so only genuine content differences surface. Note that the SOA line itself may legitimately differ if serials are mid-update; confirm both report the same serial explicitly:

for ns in legacy-ns new-ns; do dig @$ns.example.com example.com SOA +short | awk '{print $3}'; done

Expected output: the same serial printed twice (for example 2026062001). Matching serials on both sides confirm the secondary is fully caught up.
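
Since the SOA line is checked separately, it can be convenient to leave it out of the content diff so that a serial in flight does not mask or mimic real drift. The filter below is a sketch of that idea: normalize_zone is a hypothetical helper, and it assumes one record per line in dig's default (non-multiline) layout.

```shell
#!/bin/sh
# Hypothetical helper: read dig AXFR answer lines on stdin, drop blank lines,
# comment lines, and SOA records, then sort the rest in a locale-independent way.
normalize_zone() {
    awk 'NF && $1 !~ /^;/ && $4 != "SOA"' | LC_ALL=C sort
}
```

One way to use it: pipe each server's AXFR through normalize_zone into legacy.txt and new.txt, run diff -u on the two files, and keep the explicit serial comparison above as the separate SOA check.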

Step 4: Cut over NS delegation and verify

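With parity proven, update the NS records at the registrar so the parent zone delegates to the new authoritative servers, and make sure the NS set published at the zone apex lists the same names so parent and child agree. The 300s TTLs you set apply to records inside the zone; the delegation NS records in the parent carry a TTL chosen by the TLD registry (two days for .com, for example), so a resolver that already cached the delegation may keep querying the legacy servers until that entry expires. This is harmless as long as both platforms serve identical data. Verify delegation from the parent down and confirm the application still answers: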

dig example.com NS +trace | grep -A4 "example.com."
dig @1.1.1.1 example.com NS +short && curl -I https://example.com

Expected output: the NS set returned by the parent now lists ns1.new.example.com. and ns2.new.example.com., and curl returns HTTP/2 200. Finally, confirm the new primary is authoritative and current:

dig @new-ns-ip example.com SOA +short

Expected output: the SOA line with the serial you validated in Step 3. If anything looks wrong, revert the NS records at the registrar immediately — the legacy servers remain fully authoritative and continue serving until delegation propagates back. Keep the legacy zone live for at least 72 hours after cutover so that any long-cached delegations drain naturally.

Verification checklist

Run these after cutover and again the next morning to catch slow resolvers:

# Delegation points only at new servers, from several vantage points
for r in 8.8.8.8 1.1.1.1 9.9.9.9; do echo "== $r =="; dig @$r example.com NS +short; done

# Application reachable and not serving stale upstream cache
curl -I -H 'Cache-Control: no-cache' https://example.com

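Expected output: every resolver returns the new NS set, and curl returns a 200/301 with no 502/504. Mismatched NS sets across resolvers mean propagation is still in flight; recheck after the NS TTL has elapsed, keeping in mind that a delegation cached from the parent can outlive the 300s you set inside the zone.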
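
Comparing NS sets by eye across several resolvers is error-prone because the order and letter case of the names can vary. The sketch below compares a resolver's answer against the expected set regardless of order; ns_set_matches is a hypothetical helper that reads dig +short output on stdin and takes the expected names as arguments.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the NS names on stdin are exactly the
# expected names given as arguments, ignoring order and letter case.
ns_set_matches() (
    expected=$(printf '%s\n' "$@" | tr 'A-Z' 'a-z' | LC_ALL=C sort)
    actual=$(tr 'A-Z' 'a-z' | grep -v '^$' | LC_ALL=C sort)
    [ "$expected" = "$actual" ]
)
```

For instance, `dig @9.9.9.9 example.com NS +short | ns_set_matches ns1.new.example.com. ns2.new.example.com.` returns success only once that resolver hands out the new set.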

Troubleshooting

SOA serial out of sequence

  • Symptom: IXFR fails and falls back to a full AXFR, or the secondary rejects updates entirely.
  • Diagnosis: Serial numbers decreased, or crossed the RFC 1982 comparison window (wrap-around near 2³²−1). Compare dig SOA +short serials on both ends.
  • Fix: Enforce YYYYMMDDNN formatting. If a wrap-around already happened, disable IXFR temporarily and force a clean AXFR with rndc retransfer example.com or the provider API, then resume incremental transfers.
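
To see how a secondary judges two serials, the sketch below applies RFC 1982 serial arithmetic for 32-bit serials. serial_cmp is a hypothetical helper, and it assumes the shell performs arithmetic on 64-bit integers, which is the common case on current systems.

```shell
#!/bin/sh
# Hypothetical helper: compare two SOA serials using RFC 1982 arithmetic.
# Prints "lt" if the first is older than the second, "gt" if newer,
# "eq" if equal, and "undefined" when they are exactly 2^31 apart.
# Assumes 64-bit shell arithmetic.
serial_cmp() {
    a=$1
    b=$2
    # Forward distance from a to b, modulo 2^32.
    d=$(( (b - a + 4294967296) % 4294967296 ))
    if [ "$d" -eq 0 ]; then
        echo eq
    elif [ "$d" -lt 2147483648 ]; then
        echo lt
    elif [ "$d" -gt 2147483648 ]; then
        echo gt
    else
        echo undefined
    fi
}
```

For example, `serial_cmp 2026062001 2026062002` prints lt, and `serial_cmp 4294967295 0` also prints lt because 0 is one step ahead after wrap-around. A result of gt when you expected lt means the secondary will treat the primary's zone as older and will not transfer it.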

TSIG key mismatch or clock skew

  • Symptom: Transfers fail, and logs show TSIG errors such as BADSIG, BADKEY, or BADTIME.
  • Diagnosis: BADTIME means NTP drift exceeds the 300-second TSIG timestamp window. BADKEY means the key name or algorithm is not recognized by the other side, and BADSIG means the secret differs between primary and secondary. Check chronyc tracking and grep logs for TSIG error.
  • Fix: Re-sync clocks across all nodes, then cross-check the key name, secret, and algorithm values byte-for-byte on both ends. A trailing whitespace difference in the secret is a common culprit.
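
The timestamp window can also be checked numerically. The sketch below compares two Unix timestamps against the 300-second window; within_fudge is a hypothetical helper, and how you obtain the remote timestamp (for example by running date +%s on the other node) is left as an assumption.

```shell
#!/bin/sh
# Hypothetical helper: succeed if two epoch timestamps differ by no more
# than the allowed window (default 300 seconds, the figure used in this guide).
within_fudge() {
    local_epoch=$1
    remote_epoch=$2
    fudge=${3:-300}
    diff=$((local_epoch - remote_epoch))
    if [ "$diff" -lt 0 ]; then
        diff=$((0 - diff))
    fi
    [ "$diff" -le "$fudge" ]
}
```

A possible use is `within_fudge "$(date +%s)" "$remote_epoch" || echo "clock skew too large for TSIG"`, where remote_epoch holds the value collected from the other server.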

CDN/edge cache retention after cutover

  • Symptom: Resolvers and CDN edge nodes serve stale origin IPs, producing intermittent 502/504 errors.
  • Diagnosis: The CDN cached the legacy origin IP before the TTL expired and is still connecting there.
  • Fix: Purge edge nodes via the CDN API after cutover and validate with curl -I -H 'Cache-Control: no-cache' https://example.com. Confirm origin DNS inside the CDN reflects the new address.

Transfer refused despite correct key

  • Symptom: dig AXFR returns ; Transfer failed. even with the TSIG key supplied.
  • Diagnosis: The client IP is not in allow-transfer, or the zone is type slave on the host you queried.
  • Fix: Add the querying IP to the ACL, confirm you are hitting the primary, and retry with dig -y hmac-sha256:migration-key:SECRET @primary example.com AXFR.

DNSSEC validation breaks at cutover

  • Symptom: Validating resolvers return SERVFAIL after delegation moves.
  • Diagnosis: The new primary is not signing with keys that match the DS record published at the registrar.
  • Fix: Keep both endpoints signing through the window and only rotate keys once the new DS is live. Treat key continuity the way you would when automating DNSSEC key rollover: never let signatures and published DS records diverge.
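
A quick pre-cutover check is to confirm that a key tag in the parent's DS set also appears among the DNSKEYs the new primary serves. The sketch below does that from two saved files; ds_matches_dnskey is a hypothetical helper. It assumes the DS file holds dig DS +short output (key tag in the first field) and the DNSKEY file holds dig DNSKEY +multiline output, which annotates each key with a "key id" comment. A matching tag is a useful hint, not cryptographic proof.

```shell
#!/bin/sh
# Hypothetical helper: succeed if at least one DS key tag also appears as a
# "key id" in the saved DNSKEY listing.
# Arguments: file with `dig DS +short` output, file with
# `dig DNSKEY +multiline` output.
ds_matches_dnskey() (
    ds_file=$1
    dnskey_file=$2
    key_ids=$(sed -n 's/.*key id = \([0-9][0-9]*\).*/\1/p' "$dnskey_file")
    for tag in $(awk 'NF { print $1 }' "$ds_file"); do
        if printf '%s\n' "$key_ids" | grep -qx "$tag"; then
            exit 0
        fi
    done
    exit 1
)
```

The inputs could be gathered with `dig example.com DS +short > ds.txt` and `dig @new-ns.example.com example.com DNSKEY +noall +answer +multiline > dnskey.txt`, then checked with `ds_matches_dnskey ds.txt dnskey.txt`.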

Frequently Asked Questions

How long should I wait after reducing TTL before initiating the zone transfer?

Wait at least 48 hours or 2× the original TTL, whichever is longer. That gives recursive resolvers time to expire records cached under the old TTL and pick up the new 300s value, so changes to records inside the zone take effect within minutes rather than hours. The delegation itself still follows the NS TTL set by the parent zone.

Can I use IXFR instead of AXFR for large enterprise zones?

Yes. IXFR transfers only the changed records, which drastically cuts bandwidth and sync time on large zones. Both primary and secondary must support RFC 1995 and maintain strictly increasing SOA serials; otherwise the server silently falls back to a full AXFR.

What diagnostic command confirms the new authoritative server is live?

Run dig @new-ns-ip example.com SOA +short and confirm the serial matches the primary, then dig example.com NS +trace to confirm the TLD delegation now points at the new servers from the parent zone down.

How do I handle DNSSEC during a zone transfer migration?

Export the KSK/ZSK so the new primary can sign the zone, and keep both endpoints signing with keys that match the published DS through the window. Only decommission the old keys after the corresponding DS records are published at the registrar, mirroring the safety margin used in routine key rollover work.

What is the fastest safe rollback if the cutover misbehaves?

Revert the NS records at the registrar to the legacy servers. Because the legacy zone never stopped serving authoritatively, there is no data loss. Records inside the zone refresh within the 300s TTL, while resolvers that cached the new delegation return to the legacy servers once the parent's NS TTL expires.
