Automating DNSSEC Key Rollover

DNSSEC keys must be rotated periodically, but a rollover done at the wrong moment breaks validation for every resolver that cached your old keys or signatures. This guide shows you how to automate Zone Signing Key (ZSK) and Key Signing Key (KSK) rollovers so they happen on a schedule, respect every relevant TTL, and never leave a resolver unable to build a chain of trust. By the end you will have a BIND dnssec-policy that drives rollovers automatically, equivalent managed setups on Route 53 and Cloudflare, and a verification job that catches mismatches before they reach users.

Key objectives:

  • Understand pre-publish ZSK rollover and double-DS KSK rollover per RFC 6781, and the TTL math that governs each phase.
  • Configure BIND dnssec-policy to stage, publish, swap, and retire keys with zero manual zone re-signing.
  • Automate the rollover lifecycle with a scheduled job and continuous DNSKEY/DS monitoring.
  • Verify success with dig, delv, and a reusable script, and recover from the three classic failure modes.

Figure: ZSK pre-publish rollover timeline. At T0 the new ZSK is published (the DNSKEY set holds old and new ZSK; the old key still signs). After a wait of at least one DNSKEY TTL, signing swaps to the new key at T1 (both keys stay published). After a wait of at least the maximum zone TTL, the old ZSK is removed at T2. Never remove a key before resolvers can no longer hold a signature made by it. The KSK uses double-DS instead: publish the new DS, wait one DS TTL, swap the KSK, then retire.

Why rollover timing is the whole problem

DNSSEC works because every authoritative RRset in a signed zone carries an RRSIG produced by the ZSK, and the DNSKEY RRset itself carries an RRSIG from the KSK. The KSK’s public key is committed to the parent zone through a DS record, which you submit via your registrar. A validating resolver walks DS -> DNSKEY (KSK) -> DNSKEY (ZSK) -> RRSIG over the answer. Break any link and the resolver returns SERVFAIL.

Resolvers cache aggressively. If you delete a key while a resolver still holds a cached RRSIG made by that key, the resolver has a signature it cannot verify against any published DNSKEY. That is a self-inflicted outage that lasts up to a full TTL. Every safe rollover therefore overlaps old and new material for at least one cache lifetime. The two patterns from RFC 6781 encode exactly that overlap.

Pre-publish ZSK rollover

ZSK rollover does not touch the parent zone, so it is the rollover you run most often. The pre-publish method introduces the new ZSK into the DNSKEY RRset before it ever signs anything:

  1. Publish the new ZSK alongside the old one. Both appear in DNSKEY; the old key still produces all RRSIGs.
  2. Wait at least one DNSKEY TTL so any cached copy of the DNSKEY RRset that lacks the new key has expired.
  3. Swap signing: re-sign the zone with the new ZSK. Both keys remain published.
  4. Wait at least one maximum zone TTL (the largest TTL of any signed record) so old RRSIGs expire from caches.
  5. Retire the old ZSK and remove it from DNSKEY.

The wait in step 2 guarantees that by the time new signatures appear, resolvers already trust the new key. The wait in step 4 guarantees that by the time the old key disappears, no resolver still holds a signature that needs it.
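
The two waits reduce to simple addition, which can be sketched in a few lines of shell. This is a minimal illustration, not part of BIND: the function name zsk_schedule and the optional safety margin argument are assumptions of this sketch, and all values are in seconds.

```bash
#!/usr/bin/env bash
# Sketch only: earliest safe times for the pre-publish ZSK phases.
# Inputs are seconds. The optional safety margin is an assumption of this
# sketch; it plays the role of a buffer added on top of the raw TTLs.

# zsk_schedule T0 DNSKEY_TTL MAX_ZONE_TTL [SAFETY]
zsk_schedule() {
  local t0=$1 dnskey_ttl=$2 max_zone_ttl=$3 safety=${4:-0}
  local swap=$((t0 + dnskey_ttl + safety))        # end of the step 2 wait
  local retire=$((swap + max_zone_ttl + safety))  # end of the step 4 wait
  echo "swap=$swap retire=$retire"
}

# Publish at t=0 with a 1 h DNSKEY TTL, a 24 h max zone TTL and a 1 h margin.
zsk_schedule 0 3600 86400 3600   # prints: swap=7200 retire=97200
```

With these example numbers the old key has to stay published for 27 hours after publication of the new one, most of which is the max zone TTL wait.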

Double-DS KSK rollover

KSK rollover is harder because the DS record that anchors trust lives in the parent zone, and DS propagation is outside your nameserver’s control. The double-DS method (distinct from the double-signature method that RFC 6781 also describes) publishes the new DS first:

  1. Generate the new KSK and add its DS record at the registrar while the old DS stays in place. The parent now lists two DS records. See Submitting DS Records to Your Registrar for how to compute and upload the digest.
  2. Wait at least one parent DS TTL so any cached DS RRset that contains only the old DS has expired.
  3. Publish the new KSK in DNSKEY and have it sign the DNSKEY RRset (alongside the old KSK’s signature).
  4. Wait one DNSKEY TTL, then remove the old KSK and its signature.
  5. Remove the old DS from the registrar.

Because both DS records are valid throughout the swap, a resolver that cached either one can still build the chain. This is the inverse ordering of the ZSK rollover: for the KSK you lead with the parent-side change.
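
Any script that drives these steps by hand needs a gate that refuses to move on until a wait has fully elapsed. The following is a minimal sketch of such a gate; phase_ready is a hypothetical helper name, and the timestamps in the example are fixed values chosen for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: refuse to advance a rollover phase before its wait is over.

# phase_ready STARTED_AT WAIT_SECONDS [NOW]
# Exit status 0 once at least WAIT_SECONDS have passed since STARTED_AT.
# NOW defaults to the current time and is injectable for testing.
phase_ready() {
  local started=$1 wait_s=$2 now=${3:-$(date +%s)}
  [ $((now - started)) -ge "$wait_s" ]
}

# Example: new DS added at t=1000, parent DS TTL of 86400 s, checked at t=50000.
if phase_ready 1000 86400 50000; then
  echo "safe to publish the new KSK"
else
  echo "keep waiting: a DS RRset without the new DS may still be cached"
fi
```

Passing the current time as an argument keeps the check deterministic in tests; in a real job you would omit it and let the function read the clock.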

Prerequisites and environment

This guide uses BIND 9.16+ for the self-managed path, where dnssec-policy replaced the older dnssec-keymgr and manual dnssec-signzone workflow. Confirm your versions:

named -v          # expect BIND 9.16.x or newer; 9.18+ recommended
dig -v            # dig and delv both ship with the BIND client utilities
delv -v
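
If you want a provisioning script to enforce the minimum version rather than rely on reading it by eye, a comparison along these lines may help. It is a sketch under assumptions: version_ge and bind_version_from are hypothetical helper names, the sample string only mimics the usual "BIND <version> ..." shape, and packagers' suffixes such as "-Ubuntu" are stripped before comparing.

```bash
#!/usr/bin/env bash
# Sketch only: compare dotted version numbers field by field.

# version_ge A B: exit status 0 when version A >= version B.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, "."); m = split(b, y, ".")
    for (i = 1; i <= (n > m ? n : m); i++) {
      if ((x[i] + 0) > (y[i] + 0)) exit 0
      if ((x[i] + 0) < (y[i] + 0)) exit 1
    }
    exit 0
  }'
}

# bind_version_from TEXT: pull the version out of "BIND <version> ..." text.
bind_version_from() {
  printf '%s\n' "$1" | awk '{ sub(/-.*/, "", $2); print $2 }'
}

# Illustrative input; in practice use: reported=$(named -v)
reported="BIND 9.18.24 (sample text)"
v=$(bind_version_from "$reported")
if version_ge "$v" 9.16.0; then
  echo "BIND $v supports dnssec-policy"
else
  echo "BIND $v is too old for dnssec-policy"
fi
```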

You need write access to the zone directory and key directory, a running named with inline signing capability, and registrar API or console access to update DS records. If your zone can tolerate short TTLs, lower its maximum record TTL well before a rollover so the wait phases are short and predictable; the change only helps once the old, longer TTLs have themselves expired from caches. Lowering record TTLs ahead of a sensitive change is the same discipline you apply when you tune TTLs before a cutover.

Step 1: Define a dnssec-policy in BIND

dnssec-policy lets named perform both rollover types automatically. You declare key lifetimes and the relevant TTLs once, and BIND schedules publish, swap, and retire events itself.

// named.conf
dnssec-policy "auto-rollover" {
    // Rotate the ZSK every 30 days, KSK every year.
    keys {
        ksk key-directory lifetime P1Y algorithm ecdsap256sha256;
        zsk key-directory lifetime P30D algorithm ecdsap256sha256;
    };

    // TTL applied to DNSKEY, CDS and CDNSKEY records.
    dnskey-ttl PT1H;

    // Largest TTL of any record in the zone. BIND uses this to size
    // the post-swap wait so cached RRSIGs expire before key retirement.
    max-zone-ttl PT24H;

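    // parent-ds-ttl: TTL of the DS RRset as the parent serves it; set it
    // to the value your parent really uses (check with dig DS).
    // parent-propagation-delay: how long a DS change takes to reach all
    // of the parent's servers. Both feed the KSK rollover timing.
    parent-ds-ttl PT1H;
    parent-propagation-delay PT1H;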

    // Resigning and signature validity windows.
    signatures-validity P14D;
    signatures-refresh P5D;
    publish-safety PT1H;
    retire-safety PT1H;
};

zone "example.com" {
    type primary;
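    // With inline-signing, "file" is the unsigned source zone; named keeps
    // the signed copy in a separate file next to it.
    file "/var/named/zones/example.com.db";
    inline-signing yes;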
    dnssec-policy "auto-rollover";
};

Apply it and let BIND generate the initial keys and sign the zone:

sudo rndc reconfig
sudo rndc loadkeys example.com
sudo rndc dnssec -status example.com

Expected output from dnssec -status shows each key with its role and the next scheduled transition, for example key: 12345 (ECDSAP256SHA256), ZSK ... next event in 30 days. From here BIND publishes the next ZSK, waits dnskey-ttl, swaps signing, waits max-zone-ttl, and retires the old key with no further commands. The publish-safety and retire-safety margins add slack on top of the raw TTLs.
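
Because every wait in the policy is written as an ISO 8601 duration, it can help to convert the values to seconds before comparing them with real TTLs. The converter below is a rough sketch, not BIND's own parser: iso_to_seconds is a hypothetical name, a year is assumed to be 365 days, and calendar months are deliberately not supported. It also rejects malformed values such as PT14D, where a day count has been placed in the time part.

```bash
#!/usr/bin/env bash
# Sketch only: convert a subset of ISO 8601 durations to seconds.
# Assumptions: 1 year = 365 days; month designators are not supported.

iso_to_seconds() {
  awk -v d="$1" 'BEGIN {
    ok = (d ~ /^P([0-9]+Y)?([0-9]+W)?([0-9]+D)?(T([0-9]+H)?([0-9]+M)?([0-9]+S)?)?$/)
    if (!ok || d == "P" || d ~ /T$/) {
      print "invalid duration: " d > "/dev/stderr"
      exit 1
    }
    split(d, part, "T")   # part[1] is the date half, part[2] the time half
    total  = unit(part[1], "Y") * 31536000
    total += unit(part[1], "W") * 604800
    total += unit(part[1], "D") * 86400
    total += unit(part[2], "H") * 3600
    total += unit(part[2], "M") * 60
    total += unit(part[2], "S")
    print total
  }
  # unit(s, u): the integer directly in front of designator u, or 0.
  function unit(s, u) {
    if (match(s, "[0-9]+" u)) return substr(s, RSTART, RLENGTH - 1) + 0
    return 0
  }'
}

for d in P30D PT1H PT24H P14D PT14D; do
  if s=$(iso_to_seconds "$d" 2>/dev/null); then
    echo "$d = ${s}s"
  else
    echo "$d is not a valid duration"
  fi
done
```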

Step 2: Let BIND drive the ZSK rollover

You do not script the ZSK rollover phases by hand. When the ZSK lifetime expires, BIND moves the key through states automatically. You can watch the transition:

sudo rndc dnssec -status example.com
dnssec-policy: auto-rollover
current time: Sat Jun 20 12:00:00 2026

key: 12345 (ECDSAP256SHA256), ZSK
  published:     yes - since ...
  zone signing:  no  - since ...   <- retiring
  next event:    remove in 23h

key: 67890 (ECDSAP256SHA256), ZSK
  published:     yes - since ...
  zone signing:  yes - since ...   <- now active
  next event:    none

The retiring key stays published until BIND’s countdown (max-zone-ttl plus retire-safety) elapses, exactly the overlap RFC 6781 requires.

Step 3: Automate the KSK / DS hand-off

The one part BIND cannot finish alone is pushing the new DS to your registrar. Use CDS/CDNSKEY automation: BIND publishes CDS and CDNSKEY records signaling the desired parent state, and a scheduled job reads them and calls your registrar API. Below is a cron-driven job that extracts the published CDS and submits it.

#!/usr/bin/env bash
# /usr/local/bin/sync-cds.sh  — push BIND-published CDS to the registrar
set -euo pipefail
ZONE="example.com"
RESOLVER="@127.0.0.1"

# Read the CDS BIND wants the parent to publish.
cds=$(dig +short "$RESOLVER" CDS "$ZONE" | sort)
if [[ -z "$cds" ]]; then
  echo "no CDS published; nothing to do"; exit 0
fi

# Compare against the DS currently at the parent.
parent_ds=$(dig +short DS "$ZONE" @a.gtld-servers.net | sort || true)

if [[ "$cds" == "$parent_ds" ]]; then
  echo "parent DS already matches CDS"; exit 0
fi

echo "submitting new DS set to registrar..."
# Replace with your registrar's API call. Example shape:
#   registrar-cli ds-update --zone "$ZONE" --ds "$cds"
registrar-cli ds-update --zone "$ZONE" --ds "$cds"
echo "DS update submitted; BIND will retire the old KSK after parent-ds-ttl"

Schedule it hourly so the parent converges shortly after BIND signals a change:

# crontab -e
17 * * * * /usr/local/bin/sync-cds.sh >> /var/log/dnssec-cds.log 2>&1

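BIND cannot tell on its own that the new DS is live at the parent: parent-ds-ttl and parent-propagation-delay only size the waits that follow. Either confirm it yourself with rndc dnssec -checkds -key <keytag> published example.com, or, on BIND 9.18 and later, list the parent's servers under parental-agents in the zone so that named polls them for the DS. Once the DS is confirmed, named finishes the KSK swap and retires the old KSK. After the rollover BIND withdraws the old key's CDS, the next run of the job removes the stale DS from the registrar, and you confirm that removal the same way with -checkds withdrawn.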

Managed alternatives: Route 53 and Cloudflare

If you do not run your own authoritative servers, the provider performs the overlap for you.

Concern      | BIND dnssec-policy          | Route 53                                | Cloudflare
-------------+-----------------------------+-----------------------------------------+---------------------------------
ZSK rollover | automatic, you set lifetime | automatic, AWS-managed ZSK              | automatic, no key exposed
KSK rollover | automatic with CDS sync     | manual KSK rotate via API/console       | automatic; one-click DNSSEC
DS to parent | CDS + your job              | you submit DS once, AWS signals updates | provided digest, you submit once
TTL control  | full                        | full                                    | limited on free plans

On Route 53 you create a KSK and then call EnableHostedZoneDNSSEC; AWS rotates the ZSK transparently, while a KSK rollover is something you trigger yourself by creating a new KSK:

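# --caller-reference is required and must be unique per request.
# KSK names may contain only letters, digits and underscores.
aws route53 create-key-signing-key \
  --caller-reference "ksk-$(date +%s)" \
  --hosted-zone-id Z123EXAMPLE \
  --key-management-service-arn arn:aws:kms:us-east-1:111122223333:key/abcd \
  --name new_ksk_2026 --status ACTIVE
aws route53 enable-hosted-zone-dnssec --hosted-zone-id Z123EXAMPLE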

Cloudflare’s DNSSEC is enabled in the dashboard or via API; it manages keys entirely and gives you a DS digest to submit once at your registrar. In both managed cases you still own the registrar DS step, so the DS submission workflow remains your responsibility.

Verification

After any rollover phase, confirm the DNSKEY set, the active signatures, and the chain of trust. A scriptable check belongs in the same cron schedule as the rollover job.

#!/usr/bin/env bash
# verify-rollover.sh — confirm DNSKEY/RRSIG/DS consistency
set -euo pipefail
ZONE="example.com"
NS="@ns1.example.com"

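echo "== Published DNSKEYs (role, algorithm, key tag) =="
# +short omits key tags; +multi prints them in a trailing "key id" comment.
dig +noall +answer +multi "$NS" DNSKEY "$ZONE" | grep -i 'key id' || true

echo "== Key tags signing the SOA RRSIG =="
# In full answer format the key tag is field 11 of each RRSIG line.
dig +dnssec +noall +answer "$NS" SOA "$ZONE" \
  | awk '$4 == "RRSIG" { print "  keytag=" $11 }' || true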

echo "== Full chain validation via delv =="
delv +rtrace "$ZONE" SOA

echo "== Parent DS vs zone CDS =="
diff <(dig +short DS "$ZONE" | sort) \
     <(dig +short "$NS" CDS "$ZONE" | sort) \
  && echo "  DS and CDS agree" || echo "  MISMATCH: DS != CDS"

A healthy delv run prints ; fully validated ahead of the answer:

delv example.com SOA
; fully validated
example.com.  86400  IN  SOA  ns1.example.com. hostmaster.example.com. 2026062001 ...
example.com.  86400  IN  RRSIG  SOA 13 2 86400 ...

If delv prints ; unsigned answer or resolution failed: SERVFAIL, the chain is broken — diagnose it with the techniques in debugging DNSSEC validation failures. To confirm a specific key is present during the overlap window, match key tags between the DNSKEY set and the RRSIG keytag field:

dig +multi DNSKEY example.com | grep -i 'key id'
dig +dnssec example.com SOA | grep -i RRSIG

The keytag in each RRSIG must correspond to a key still present in DNSKEY. That single invariant is what every safe rollover preserves.
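
That invariant can be checked mechanically from captured dig output. The sketch below assumes the two text shapes used elsewhere in this guide: full-format answer lines for RRSIGs, and dig +multi output for DNSKEYs with its trailing "key id" comment. The helper names are hypothetical, and the sample key tags and signature data are made up.

```bash
#!/usr/bin/env bash
# Sketch only: find RRSIG key tags that no published DNSKEY accounts for.

# stdin: dig answer lines (e.g. dig +dnssec +noall +answer SOA example.com)
rrsig_keytags() { awk '$4 == "RRSIG" { print $11 }' | sort -u; }

# stdin: dig +multi DNSKEY output, which ends each key with "key id = N"
dnskey_keytags() { sed -n 's/.*key id = \([0-9][0-9]*\).*/\1/p' | sort -u; }

# orphaned_tags SIGNING_TAGS PUBLISHED_TAGS (newline-separated lists)
# Prints every signing tag that is absent from the published list.
orphaned_tags() {
  printf '%s\n' "$1" | while read -r tag; do
    [ -n "$tag" ] || continue
    printf '%s\n' "$2" | grep -qx "$tag" || echo "$tag"
  done
}

# Illustrative captured text, not real output.
sigs='example.com. 86400 IN RRSIG SOA 13 2 86400 20260704120000 20260620110000 67890 example.com. AAAA'
keys='        ) ; KSK; alg = ECDSAP256SHA256 ; key id = 2371
        ) ; ZSK; alg = ECDSAP256SHA256 ; key id = 67890'

missing=$(orphaned_tags "$(printf '%s\n' "$sigs" | rrsig_keytags)" \
                        "$(printf '%s\n' "$keys" | dnskey_keytags)")
if [ -z "$missing" ]; then
  echo "every signing key is published"
else
  echo "signatures reference unpublished key tags: $missing"
fi
```

An empty result means the invariant holds for the records you sampled; it says nothing about signatures still sitting in resolver caches, which is what the TTL waits protect.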

Troubleshooting

Premature key removal

Symptom: SERVFAIL from validating resolvers shortly after a rollover, while dig +cd (checking disabled) still returns the answer. Cause: the old key was removed before its RRSIGs expired from caches. Diagnosis:

dig +dnssec example.com SOA @1.1.1.1     # SERVFAIL
dig +cd +dnssec example.com SOA @1.1.1.1 # returns data -> validation-only failure

Fix: re-publish the removed key in DNSKEY immediately so cached signatures validate again, then wait a full max-zone-ttl before retiring it. With dnssec-policy, increase retire-safety so BIND holds keys longer. The root cause is almost always a max-zone-ttl set lower than the real maximum record TTL in the zone, so list every record in the zone and compare the largest TTL you find against the policy value.
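
One way to run that audit is to feed a one-record-per-line listing of the zone (for example the answer section of a zone transfer, if your server permits one) through a small filter. This is a sketch: max_zone_ttl is a hypothetical helper, it assumes the TTL is the second field on every line, and the records below are invented.

```bash
#!/usr/bin/env bash
# Sketch only: report the largest TTL in a one-record-per-line zone listing.
# Expected line shape: "name ttl class type rdata..." (comments start with ;).

max_zone_ttl() {
  awk '!/^;/ && $2 ~ /^[0-9]+$/ {
         if (!seen || $2 + 0 > max) { max = $2 + 0; who = $1 " " $4 }
         seen = 1
       }
       END { if (!seen) exit 1; print max, who }'
}

# Illustrative records; in practice pipe a real zone listing in instead.
max_zone_ttl <<'EOF'
; sample zone listing
example.com.      3600   IN SOA ns1.example.com. hostmaster.example.com. 2026062001 7200 900 1209600 300
example.com.      86400  IN NS  ns1.example.com.
www.example.com.  300    IN A   192.0.2.10
old.example.com.  172800 IN TXT "legacy"
EOF
# prints: 172800 old.example.com. TXT
```

In this example a forgotten two-day TXT record exceeds a 24-hour max-zone-ttl, which is exactly the kind of mismatch that leads to premature key removal.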

DS / DNSKEY mismatch

Symptom: validation fails at the delegation; delv reports no valid DS or broken trust chain. Cause: the DS at the parent points to a KSK that is not (or no longer) in the DNSKEY set, typically because the registrar update lagged or removed the wrong digest. Diagnosis:

dig +short DS example.com @a.gtld-servers.net
dig +short DNSKEY example.com | grep '^257 '   # KSK has flags 257

Compute the digest of the published KSK and compare it to the parent DS. Fix: submit the correct DS for the currently-active KSK and keep both old and new DS published until the swap completes. Never delete the old DS until the new KSK is live in DNSKEY and has had one DNSKEY TTL to propagate.
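
However you derive the expected DS (BIND's dnssec-dsfromkey is one option), comparing it with the parent's copy as raw text is fragile, because tools differ in letter case and may split a long digest into several space-separated chunks. The following sketch normalizes both sides before comparing; normalize_ds and ds_sets_match are hypothetical helpers, and the digests shown are shortened placeholders rather than real values.

```bash
#!/usr/bin/env bash
# Sketch only: compare two DS sets regardless of digest case or chunking.

# stdin: DS rdata lines "keytag alg digesttype digest [more digest chunks]"
# stdout: one canonical line per record, sorted and de-duplicated.
normalize_ds() {
  awk 'NF >= 4 {
    digest = ""
    for (i = 4; i <= NF; i++) digest = digest $i   # re-join split digests
    print $1, $2, $3, toupper(digest)
  }' | sort -u
}

# ds_sets_match SET_A SET_B: exit status 0 when both sets are equivalent.
ds_sets_match() {
  [ "$(printf '%s\n' "$1" | normalize_ds)" = "$(printf '%s\n' "$2" | normalize_ds)" ]
}

# Placeholder data: same record, different case and chunking.
parent='2371 13 2 0123456789ABCDEF 89ABCDEF'
computed='2371 13 2 0123456789abcdef89abcdef'

if ds_sets_match "$parent" "$computed"; then
  echo "parent DS matches the published KSK"
else
  echo "MISMATCH: parent DS does not match the published KSK"
fi
```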

TTL not respected

Symptom: a rollover that “looked done” still produces intermittent SERVFAIL for hours. Cause: a resolver or middlebox is serving stale answers past their TTL, or your dnskey-ttl was lowered only recently, so resolvers still hold the DNSKEY RRset at the old, longer TTL while BIND times its waits from the new, shorter one. Diagnosis: query several public resolvers and compare the DNSKEY TTLs they report:

for r in 1.1.1.1 8.8.8.8 9.9.9.9; do
  echo "resolver $r"; dig +noall +answer DNSKEY example.com @"$r"
done

Fix: wait out the longest observed TTL before proceeding to the next phase, and lower dnskey-ttl well in advance of planned rollovers so future overlap windows are short. If a single resolver is the outlier, it is usually caching independently of TTL and will recover on its own; the rest of the internet is already consistent. For a deeper resolver-by-resolver propagation method, the same comparison approach used for ordinary records applies here too.

Frequently Asked Questions

How often should I roll my ZSK and KSK? A common policy is a ZSK every 1-3 months and a KSK every 1-2 years, which is what the lifetime P30D and lifetime P1Y values in the policy encode. With ECDSA keys and automated dnssec-policy rollovers there is little cost to rolling the ZSK monthly, and frequent practice keeps the automation exercised so it never rusts.

Can I roll the KSK and ZSK at the same time? Avoid it. Each rollover has its own overlap timing and the KSK additionally depends on registrar propagation you do not control. Running both at once makes a SERVFAIL far harder to attribute. Stagger them so only one chain link is ever in motion, and let BIND’s scheduler space the lifetimes apart.

Do I have to touch the registrar for a ZSK rollover? No. The DS record commits only the KSK, so ZSK rollovers are entirely internal to your zone and nameservers. Only KSK rollovers require a parent-side DS change, which is why automating the CDS hand-off matters most for the KSK lifecycle.

What happens if my automation job fails mid-rollover? Because every phase overlaps old and new material, a stalled job leaves you in a safe, validating state — the old key keeps working. The danger is only if a job removes old material early. Build your monitoring to alert when CDS and parent DS disagree for longer than parent-ds-ttl, and never script unconditional key deletion.
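
The "disagree for longer than parent-ds-ttl" rule needs a little state, because a single check cannot tell a fresh divergence from a stuck one. Below is a minimal sketch of that bookkeeping. The helper names, the state file and the injected timestamps are all assumptions for illustration; the comparison of CDS against parent DS is expected to happen elsewhere and is passed in as 1 (agree) or 0 (disagree).

```bash
#!/usr/bin/env bash
# Sketch only: track how long CDS and parent DS have disagreed.

# mismatch_age STATE_FILE MATCHES NOW
#   MATCHES is 1 when CDS and parent DS agree, 0 when they differ.
# Prints the age of the current mismatch in seconds (0 when they agree).
mismatch_age() {
  local state=$1 matches=$2 now=$3 since
  if [ "$matches" -eq 1 ]; then
    rm -f "$state"            # agreement clears any recorded divergence
    echo 0
    return
  fi
  [ -s "$state" ] || echo "$now" > "$state"   # first sighting: remember when
  read -r since < "$state"
  echo $((now - since))
}

# should_alert AGE THRESHOLD: exit status 0 when AGE exceeds THRESHOLD.
should_alert() { [ "$1" -gt "$2" ]; }

# Example with fixed timestamps and a 3600 s threshold.
state=$(mktemp)
mismatch_age "$state" 0 1000 > /dev/null      # divergence first seen at t=1000
age=$(mismatch_age "$state" 0 5000)           # still diverged at t=5000
if should_alert "$age" 3600; then
  echo "ALERT: CDS and parent DS have disagreed for ${age}s"
fi
rm -f "$state"
```

In a cron job you would pass the current epoch time as NOW and keep the state file somewhere persistent, so the age survives between runs.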

How do I test a rollover without risking production? Run the full dnssec-policy against a staging zone delegated under a test domain, point delv at your staging nameserver, and watch rndc dnssec -status advance through the phases with short TTLs (for example dnskey-ttl PT5M). Once the timeline behaves, raise the TTLs to production values and apply the same policy to the live zone.
