Debugging DNSSEC Validation Failures

A DNSSEC validation failure does not return a polite error message: it returns a blunt SERVFAIL, the same code a resolver emits for a dozen unrelated reasons. This guide teaches you to confirm that a SERVFAIL is genuinely a DNSSEC fault, then walk the signed chain of trust link by link with dig +dnssec, delv, and dnsviz until you find the broken signature, mismatched key, or missing delegation. By the end you can isolate the exact failing link and remediate it without switching off validation on your resolvers. This is the diagnostic companion to DNSSEC Operational Management.

Key objectives:

  • Distinguish a DNSSEC-induced SERVFAIL from an ordinary resolution failure using the AD flag and CD bit.
  • Read RRSIG, DNSKEY, and DS records to locate which link in the chain of trust is bogus.
  • Use delv and dnsviz to get an authoritative, human-readable verdict on the failing zone.
  • Map each common cause (expired signature, DS/DNSKEY mismatch, missing DS, algorithm gap, bogus NSEC3) to a precise fix.

Figure: DNSSEC validation failure decision tree. Starting from a SERVFAIL on dig +dnssec, retry with +cd. If no answer returns, the fault is not DNSSEC (network or origin). If an answer returns, the chain is bogus: run delv +rtrace, then check whether the RRSIG has expired (re-sign the zone and fix the signer schedule), whether the DS matches the DNSKEY (fix the DS at the parent), and whether there is an NSEC3 or algorithm gap (align signer and DS).

Prerequisites & environment setup

You need a validating-capable toolchain on the workstation you debug from, not just whatever ships by default. Confirm versions before you trust their output, because an older delv build may carry an outdated root trust anchor.

dig -v          # expect DiG 9.16+ (BIND utilities)
delv -v         # expect delv 9.16+ ; ships with bind-utils / bind9-dnsutils
dnsviz --version # expect 0.9+ ; pip install dnsviz

Install the missing pieces on Debian/Ubuntu and pull a current root trust anchor so delv validates from the real root key:

sudo apt-get install -y bind9-dnsutils python3-dnsviz
# delv reads /etc/bind/bind.keys or the built-in root anchor by default;
# refresh an explicit anchor if you maintain one:
dig . DNSKEY +short | head

Work against a resolver you control or a known validating public resolver (1.1.1.1, 8.8.8.8, 9.9.9.9 all validate). Avoid your corporate forwarder during diagnosis: a middlebox that strips DNSSEC records produces misleading evidence. If you suspect resolver caching is masking a fix, the techniques in Debugging DNS Propagation Delays Across Global Resolvers apply here too.

Step 1: Confirm the failure is actually DNSSEC

A bare query just shows SERVFAIL with no detail. The decisive test is the CD (Checking Disabled) bit: it tells the resolver to skip validation and hand you the raw answer. If the name resolves with +cd but fails without it, the chain of trust is bogus.

# Fails with validation enabled (default on a validating resolver):
dig @1.1.1.1 www.dnssec-failed.org A +dnssec

# Succeeds with validation disabled — proof it is a DNSSEC fault:
dig @1.1.1.1 www.dnssec-failed.org A +dnssec +cd

Expected contrast in the status field of the header line:

;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 41122     # without +cd
;; ->>HEADER<<- opcode: QUERY, status: NOERROR,  id: 41123     # with +cd

The +cd answer returning NOERROR while the plain query returns SERVFAIL is your signal to proceed down the DNSSEC path. If both fail identically, the problem is not DNSSEC — investigate the origin or delegation instead and stop here. On a healthy signed name the plain query returns NOERROR and the flags line includes ad (Authenticated Data), which is the resolver vouching that the chain validated end to end.
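The comparison above can be scripted. What follows is a minimal sketch rather than a finished tool: dig_status, classify_dnssec, and check_name are hypothetical helper names introduced here, and the sketch assumes dig prints its usual ->>HEADER<<- line.

```shell
# Hypothetical helpers (not part of dig): classify a SERVFAIL by comparing
# the status of a plain query with the status of the same query sent with +cd.

# Read dig output on stdin and print the status from the ->>HEADER<<- line.
dig_status() {
  sed -n 's/.*status: \([A-Z]*\),.*/\1/p' | head -n 1
}

# classify_dnssec PLAIN_STATUS CD_STATUS
classify_dnssec() {
  plain="$1"
  with_cd="$2"
  if [ "$plain" = "SERVFAIL" ] && [ -n "$with_cd" ] && [ "$with_cd" != "SERVFAIL" ]; then
    echo "dnssec-bogus"   # an answer exists, but validation rejects it
  elif [ "$plain" = "SERVFAIL" ]; then
    echo "not-dnssec"     # fails either way: look at origin or delegation
  else
    echo "no-failure"     # the plain query did not SERVFAIL
  fi
}

# check_name RESOLVER NAME: run both queries and print the verdict.
check_name() {
  plain=$(dig "@$1" "$2" A +dnssec | dig_status)
  with_cd=$(dig "@$1" "$2" A +dnssec +cd | dig_status)
  classify_dnssec "$plain" "$with_cd"
}
```

For example, check_name 1.1.1.1 www.dnssec-failed.org runs both queries from Step 1 and prints one of the three labels. Keeping the classification separate from the network calls lets you feed it status codes captured elsewhere.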

Step 2: Read the signed records with dig +dnssec

Now inspect what the authoritative servers actually serve. Query an authoritative nameserver directly (use +cd so a validating resolver does not refuse to return the bogus data) and pull the signatures:

dig @ns1.example.com example.com DNSKEY +dnssec +multiline +cd
dig @ns1.example.com www.example.com A +dnssec +multiline +cd

Each signed RRset is accompanied by an RRSIG. The fields that matter most are the inception and expiration timestamps and the key tag:

example.com. 3600 IN RRSIG A 13 2 3600 (
                20260628000000 20260614000000 12345 example.com.
                m9F...signature... )

Read those two timestamps as YYYYMMDDHHMMSS in UTC. The first is the expiration (20260628000000) and the second is the inception (20260614000000). If the current time falls outside that window, the signature is expired or not yet valid: jump to the re-sign fix. The trailing 12345 is the key tag; it must match the tag of a DNSKEY in the zone and, for the KSK, the tag inside the parent DS record. Capture all three tags now; you will reconcile them in Step 4.
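Because both timestamps are fixed-width UTC digit strings, the window check reduces to integer comparison. The sketch below assumes single-line dig output (drop +multiline so each RRSIG sits on one line); rrsig_fields and rrsig_window are hypothetical helper names, not dig features.

```shell
# Hypothetical helpers for reading RRSIG validity windows.

# Print "TYPE EXPIRATION INCEPTION KEYTAG" for each RRSIG found in
# single-line dig output on stdin.
rrsig_fields() {
  awk '$4 == "RRSIG" { print $5, $9, $10, $11 }'
}

# rrsig_window EXPIRATION INCEPTION [NOW]
# All arguments are YYYYMMDDHHMMSS in UTC; NOW defaults to the current time.
rrsig_window() {
  expiration="$1"
  inception="$2"
  now="${3:-$(date -u +%Y%m%d%H%M%S)}"
  if [ "$now" -lt "$inception" ]; then
    echo "not-yet-valid"
  elif [ "$now" -gt "$expiration" ]; then
    echo "expired"
  else
    echo "valid"
  fi
}
```

A typical use would be to pipe dig @ns1.example.com www.example.com A +dnssec +cd +noall +answer into rrsig_fields, then hand the second and third columns to rrsig_window.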

Step 3: Get an authoritative verdict with delv

dig shows you records; delv performs the validation itself and tells you why it fails. It uses the same library as BIND, so its judgment matches a real resolver. The +rtrace flag logs each query delv sends while it builds the chain of trust; add +vtrace if you also want the validator's own reasoning logged.

delv @1.1.1.1 www.example.com A +rtrace

A healthy zone ends with a clear statement:

; fully validated
www.example.com. 300 IN A 203.0.113.10
www.example.com. 300 IN RRSIG A 13 3 300 ...

A broken zone names the failing operation. Common verdicts and what they mean:

;; validating example.com/DNSKEY: no valid signature found   -> RRSIG expired or wrong key
;; no valid RRSIG resolving 'example.com/DS/IN'              -> DS broken at parent
;; insecurity proof failed                                   -> DS exists but DNSKEY missing/mismatch
;; got insecure response; parent indicates it should be secure -> DS present, child unsigned

delv cutting off mid-chain tells you the last zone it validated successfully — the failure is in the next delegation down. That alone usually localizes the problem to one zone.
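If you triage many zones, the verdict list above can be turned into a small filter. This is a sketch only: explain_delv is a hypothetical helper, and it matches nothing beyond the message fragments quoted in this step, so any other delv wording passes through silently.

```shell
# Hypothetical filter: read delv output on stdin and print the meaning of
# each verdict line listed in this step. Unrecognized lines are ignored.
explain_delv() {
  while IFS= read -r line; do
    case "$line" in
      *"fully validated"*)
        echo "ok: chain validated" ;;
      *"no valid signature found"*)
        echo "bogus: RRSIG expired or made with the wrong key" ;;
      *"no valid RRSIG resolving"*)
        echo "bogus: DS broken at parent" ;;
      *"insecurity proof failed"*)
        echo "bogus: DS exists but DNSKEY missing or mismatched" ;;
      *"got insecure response"*)
        echo "bogus: DS present, child unsigned" ;;
    esac
  done
}
```

You would run it as delv @1.1.1.1 www.example.com A +rtrace 2>&1 | explain_delv; the 2>&1 is there so that diagnostics written to stderr are also inspected.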

Step 4: Visualize the full chain with dnsviz

When the failure is subtle — an algorithm rollover half-finished, or one of several nameservers serving stale signatures — a text trace is hard to read. dnsviz probes every authoritative server and renders the entire chain of trust, flagging broken links in red.

dnsviz probe example.com > example.json
dnsviz grok < example.json            # text summary of errors/warnings (reads the probe JSON on stdin)
dnsviz graph -Thtml < example.json > example.html   # visual chain

dnsviz grok emits structured findings you can scan quickly:

example.com:
  errors:
    - RRSIG for DNSKEY expired (2026-06-12)
  warnings:
    - DNSKEY 12345 (KSK) is not referenced by any DS record

The “not referenced by any DS record” warning is the fingerprint of a botched key rollover where the new KSK was activated before the parent DS was updated. The graph view makes per-server divergence obvious: if three of four nameservers are green and one is red, you have an unsynchronized signer, not a key problem. Coordinating those signer and DS updates safely is the subject of Automating DNSSEC Key Rollover.
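Per-server divergence can also be spotted without the graph by comparing the RRSIG expiration each nameserver serves. The following is a rough sketch under the assumption that you have already collected one "server expiration" pair per line; find_laggards is a hypothetical helper name.

```shell
# Hypothetical helper: read "SERVER EXPIRATION" pairs on stdin (expiration
# as YYYYMMDDHHMMSS) and print every server whose signature expires earlier
# than the newest one seen, i.e. the servers still holding older signatures.
find_laggards() {
  awk 'NF >= 2 {
         raw[$1] = $2                      # keep the original text for printing
         num[$1] = $2 + 0                  # numeric copy for comparison
         if ($2 + 0 > newest) newest = $2 + 0
       }
       END {
         for (server in num)
           if (num[server] < newest) print server, raw[server]
       }' | sort
}
```

One way to feed it is to loop over the names returned by dig +short NS example.com, query each with dig @"$ns" example.com SOA +dnssec +cd +noall +answer, and print the server name alongside field 9 of the RRSIG line. An empty result means every server serves the same expiration.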

With the failing zone identified, match the symptom to one of the five recurring causes.

Expired RRSIG

The most common cause: a signer that stopped re-signing. The RRSIG expiry timestamp is in the past. Re-sign the zone and fix whatever stopped the schedule (a dead cron, a paused managed-DNS signer, a signer that lost its private key).

# BIND inline-signing: force a re-sign
rndc sign example.com
# or with dnssec-signzone:
dnssec-signzone -A -3 $(head -c 16 /dev/urandom | xxd -p) \
  -N INCREMENT -o example.com -t db.example.com

After re-signing, confirm the new expiry is in the future with the Step 2 dig command.
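To catch a stalled signer before signatures lapse, you can alert on remaining validity instead of waiting for SERVFAIL. The sketch below assumes GNU date (for the -d option); rrsig_epoch and needs_resign are hypothetical names, and the threshold you choose is your own policy decision.

```shell
# Hypothetical monitoring helpers. Assumes GNU date, which accepts -d.

# Convert a YYYYMMDDHHMMSS UTC timestamp to seconds since the epoch.
rrsig_epoch() {
  stamp=$(printf '%s\n' "$1" |
    sed 's/^\(....\)\(..\)\(..\)\(..\)\(..\)\(..\)$/\1-\2-\3 \4:\5:\6/')
  date -u -d "$stamp" +%s
}

# needs_resign EXPIRATION THRESHOLD_DAYS [NOW_EPOCH]
# Prints "resign-now" when less than THRESHOLD_DAYS of validity remain.
needs_resign() {
  remaining=$(( $(rrsig_epoch "$1") - ${3:-$(date -u +%s)} ))
  if [ "$remaining" -lt $(( $2 * 86400 )) ]; then
    echo "resign-now"
  else
    echo "ok"
  fi
}
```

Run from cron with the expiration read from the Step 2 dig output, for example needs_resign 20260628000000 3, so that a dead signer is noticed while the current signatures are still valid.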

DS / DNSKEY mismatch after a rollover

The DS at the parent references a key tag or digest that no longer matches the live DNSKEY. Regenerate the DS from the current KSK and resubmit it. The full submission procedure lives in Submitting DS Records to Your Registrar.

# Generate the correct DS from the live KSK. With -f - the DNSKEY RRset is read
# from stdin, and only keys carrying the KSK flag (257) are converted:
dig @ns1.example.com example.com DNSKEY +noall +answer | \
  dnssec-dsfromkey -2 -f - example.com
# Compare digest+tag to what the parent serves:
dig example.com DS +short
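Comparing the two DS outputs by eye is error-prone because dig may split a long digest with a space and the tools differ in whether they print the owner name. A small normalizer, sketched here under those assumptions, makes the comparison mechanical; normalize_ds is a hypothetical helper.

```shell
# Hypothetical helper: reduce DS records on stdin to "TAG ALG DIGESTTYPE DIGEST"
# with the digest joined into one uppercase string, sorted for comparison.
# Accepts both full records ("example.com. IN DS 12345 13 2 ...") and the
# bare rdata printed by dig +short ("12345 13 2 ...").
normalize_ds() {
  awk 'NF >= 4 {
         start = 1
         for (i = 1; i <= NF; i++) if ($i == "DS") start = i + 1
         digest = ""
         for (i = start + 3; i <= NF; i++) digest = digest $i
         print $start, $(start + 1), $(start + 2), toupper(digest)
       }' | sort
}
```

Pipe the dnssec-dsfromkey output and the dig example.com DS +short output through normalize_ds separately and compare the results; a DS served by the parent that is absent from the generated set points at a key the zone no longer publishes.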

Missing DS at the parent

The zone is signed but the parent delegation has no DS, so validators treat the zone as insecure — unless a stale DS lingers, which yields SERVFAIL. Either submit the correct DS (to go secure) or remove the stale one at the registrar (to fall back to insecure cleanly). Never leave a DS pointing at a key the zone no longer publishes.

Algorithm rollover gap

During an algorithm change (e.g. RSASHA256/8 to ECDSAP256SHA256/13) every RRset must be signed by both algorithms while both DS records are present, and the parent must list a DS for the algorithm the zone serves. A half-applied rollover where the zone signs with algorithm 13 but the parent DS covers only algorithm 8 is bogus. Align them: publish both, then retire the old one only after the parent reflects the new.
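The alignment rule above can be checked from the two +short outputs, since the algorithm number is the second field of a DS and the third field of a DNSKEY. This is a sketch with a hypothetical helper name, alg_gap; it compares algorithm numbers only and does not verify signatures.

```shell
# Hypothetical helper: alg_gap "DS_LINES" "DNSKEY_LINES"
# DS_LINES is the output of `dig ZONE DS +short`, DNSKEY_LINES the output of
# `dig ZONE DNSKEY +short`. Reports algorithms present on one side only.
alg_gap() {
  ds_algs=$(printf '%s\n' "$1" | awk 'NF { print $2 }' | sort -u)
  key_algs=$(printf '%s\n' "$2" | awk 'NF { print $3 }' | sort -u)
  for alg in $ds_algs; do
    printf '%s\n' "$key_algs" | grep -qx "$alg" ||
      echo "DS algorithm $alg has no matching DNSKEY"
  done
  for alg in $key_algs; do
    printf '%s\n' "$ds_algs" | grep -qx "$alg" ||
      echo "note: DNSKEY algorithm $alg has no DS at the parent"
  done
}
```

A DS algorithm with no DNSKEY is the bogus case described above. The second message is informational: it is expected midway through a rollover, before the parent lists the new DS.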

Bogus NSEC3

A malformed or expired NSEC3 denial-of-existence record makes negative answers (NXDOMAIN) fail validation even when positive answers succeed. Check by querying a deliberately nonexistent name:

delv @1.1.1.1 nonexistent-$(date +%s).example.com A +rtrace

If positive lookups validate but this one returns bogus, re-sign with consistent NSEC3 parameters (matching salt and iteration count across the whole zone) and verify the NSEC3PARAM record is present and singular.
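The "present and singular" check can be automated by counting the records returned. A minimal sketch, assuming the input is the output of dig example.com NSEC3PARAM +short; nsec3param_state is a hypothetical helper name.

```shell
# Hypothetical helper: read `dig ZONE NSEC3PARAM +short` output on stdin and
# report whether exactly one NSEC3PARAM record is published.
nsec3param_state() {
  count=$(grep -c '[^[:space:]]')   # count non-blank lines
  case "$count" in
    0) echo "missing" ;;
    1) echo "single" ;;
    *) echo "multiple ($count)" ;;
  esac
}
```

Anything other than "single" on an NSEC3-signed zone is a reason to re-sign with one consistent parameter set.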

Verification

After remediating, prove the fix from a clean validating resolver. The plain query must now return NOERROR with the ad flag set:

dig @1.1.1.1 www.example.com A +dnssec
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 5510
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0

The presence of ad in the flags line is the definitive pass. Confirm delv agrees and that dnsviz shows no errors:

delv @1.1.1.1 www.example.com A          # expect: ; fully validated
dnsviz probe example.com | dnsviz grok   # expect: no errors

Because resolvers cache both the validation failure and the stale DNSKEY, DS, and RRSIG records for their TTLs, allow that window to elapse (or flush the cache) before declaring victory; a lingering SERVFAIL after a correct fix is usually just cache.
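The ad-flag check described above is easy to script for a post-fix smoke test. This sketch assumes dig's usual ";; flags:" line; has_ad_flag is a hypothetical helper name.

```shell
# Hypothetical helper: read dig output on stdin and report whether the
# resolver set the ad flag in the response header.
has_ad_flag() {
  flags=$(sed -n 's/^;; flags: \([^;]*\);.*/\1/p' | head -n 1)
  case " $flags " in
    *" ad "*) echo "validated" ;;
    *)        echo "not-validated" ;;
  esac
}
```

For example, dig @1.1.1.1 www.example.com A +dnssec | has_ad_flag should print validated once the chain is repaired. A not-validated result is not proof of failure on its own, since unsigned zones and non-validating resolvers also omit ad.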

Troubleshooting

Symptom | Likely cause | Diagnosis | Fix
SERVFAIL plain, NOERROR with +cd | DNSSEC bogus (some link) | delv +rtrace to name the zone | Follow the named cause below
delv: “no valid signature found” | Expired or future-dated RRSIG | Read inception/expiry in dig +dnssec | Re-sign zone; fix signer schedule
dnsviz: KSK “not referenced by any DS” | DS/DNSKEY mismatch after rollover | dnssec-dsfromkey vs dig DS | Submit corrected DS at registrar
“insecurity proof failed” | Stale DS, zone now unsigned/changed | dig DS + dig DNSKEY compare | Remove or correct parent DS
Positive OK, NXDOMAIN bogus | Broken/expired NSEC3 | delv a random nonexistent name | Re-sign with consistent NSEC3 params
Bogus on one NS only | Unsynchronized signer | dnsviz graph per-server colors | Force zone transfer / re-sign laggard

Why does the same name resolve on my laptop but SERVFAIL elsewhere? Your laptop’s resolver is probably non-validating or has the answer cached from before the breakage, while the failing resolver validates and rejects the bogus chain. Test explicitly against 1.1.1.1 and compare the ad flag.

Can I just disable DNSSEC validation to make the SERVFAIL go away? You can set the CD bit on your own queries, or turn validation off on a resolver you run, but that only hides the fault from you: everyone behind a validating resolver you do not control still gets SERVFAIL. Fix the chain instead; suppressing validation is a diagnostic step, not a remediation.

How long after I re-sign will the SERVFAIL clear? As soon as the new signatures reach all authoritative servers and the stale records and cached failures age out of resolver caches. That wait is bounded by the TTLs of the affected DNSKEY, DS, and RRSIG records plus the resolver's own failure-cache lifetime, which is typically short.

delv says “fully validated” but users still get SERVFAIL — why? You are likely validating against a different (already-fixed) server or a cached good answer. Query the specific resolver your users use, and probe every authoritative nameserver with dnsviz to catch one stale server.

Is a missing AD flag always a DNSSEC failure? No. The ad flag is only set when the resolver itself validated. An unsigned zone, a non-validating resolver, or a forwarder that strips DNSSEC all yield no ad without any failure. Pair the flag check with delv for certainty.
