DNS Fundamentals & Advanced Record Configuration

An authoritative, production-grade reference for the DNS layer that fronts every modern application: resolution architecture, record taxonomy, TTL economics, apex aliasing, zone administration, DNSSEC, and the DNS-to-edge handoff that drives geo-routing and failover. This guide is the map; each section links to a focused guide where the procedures, commands, and failure modes are worked end to end.

  • How recursive and authoritative resolution actually traverse the root, TLD, and zone delegation chain, and where caching and query minimization change observed behavior.
  • The full record taxonomy (A, AAAA, CNAME, ALIAS, NS, SOA, MX, SRV, TXT, CAA, DS, DNSKEY) with RFC citations and the syntax constraints that make a zone valid.
  • TTL strategy as an engineering trade-off between failover speed, authoritative query volume, and resolver cache thrashing.
  • Apex aliasing via CNAME flattening / ALIAS / ANAME, and why a raw CNAME at @ breaks a zone.
  • Zone synchronization (AXFR/IXFR + TSIG), GitOps zone management, and DNSSEC operational hardening (key rollover, DS submission, validation debugging).
  • The DNS-to-edge contract: how records feed geo-routing, weighted load balancing, health-checked failover, and CDN origin shielding.

How DNS resolution and record management fit together

The diagram below shows the end-to-end path of a single query and where each topic in this guide intervenes — from stub resolver through recursive caching, authoritative answer, DNSSEC validation, and finally the edge/CDN layer that the returned record points at.

[Figure: DNS resolution and record management data flow. A client query passes through a recursive resolver that walks root, TLD, and authoritative servers; the authoritative answer is DNSSEC-validated and TTL-cached, then the returned record steers traffic to the edge or CDN origin. Boxes, in order: stub/client; recursive resolver (TTL cache + ECS); root/TLD delegation; authoritative zone data; DNSSEC (RRSIG/DS); validated answer + TTL; GeoDNS/weighted RRset; health-check failover pool; CDN edge origin shield; origin backend. Caption: the returned A/AAAA/ALIAS steers the connection; lower TTL → faster failover, higher query volume.]

Read the diagram left-to-right, top row first: a query leaves the client, the recursive resolver either answers from its TTL cache or walks the delegation chain to the authoritative nameservers, the answer is DNSSEC-validated, and the cached A/AAAA/ALIAS record then determines which edge or origin the client actually connects to. Every box maps to a guide below. The deeper you push routing logic toward the Edge Routing & Serverless Function Architecture layer, the more your DNS records become coarse pointers to anycast PoPs rather than fine-grained traffic steering — a shift that changes how you reason about TTLs and propagation.

DNS architecture & the resolution lifecycle

Recursive resolvers accept client queries and traverse the root, TLD, and authoritative nameserver chains on behalf of the stub resolver, then cache each answer for the duration of its TTL. Authoritative servers respond strictly with zone data or delegation referrals — they never recurse. EDNS0 (RFC 6891) extends the classic 512-byte UDP limit, advertising larger payload sizes (commonly 1232 bytes today to avoid IP fragmentation, historically up to 4096) and carrying option codes such as EDNS Client Subnet (ECS, RFC 7871), which forwards a truncated client prefix so geo-aware authoritative servers can tailor the answer to the user’s network rather than the resolver’s.
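To make the "truncated client prefix" concrete, here is a minimal Python sketch of the truncation step only. The function name `ecs_prefix` is hypothetical, and the /24 and /56 lengths are the privacy maximums RFC 7871 recommends; a given resolver may be configured with shorter prefixes or may not send ECS at all.

```python
import ipaddress

# Maximum source prefix lengths recommended by RFC 7871 for privacy.
# These are assumptions for the sketch; operators can choose shorter ones.
ECS_V4_PREFIX = 24
ECS_V6_PREFIX = 56


def ecs_prefix(client_ip: str) -> str:
    """Return the truncated network a resolver could place in an ECS option."""
    addr = ipaddress.ip_address(client_ip)
    prefix_len = ECS_V4_PREFIX if addr.version == 4 else ECS_V6_PREFIX
    # strict=False zeroes the host bits instead of raising on them.
    network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
    return str(network)


if __name__ == "__main__":
    print(ecs_prefix("203.0.113.77"))
    print(ecs_prefix("2001:db8:abcd:1234::1"))
```

The authoritative server sees only the network, never the full client address, which is why ECS-based geo answers are accurate to a subnet rather than a host.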

Two behaviors materially change what you observe in the field. First, query name minimization (RFC 9156): modern resolvers send only the label needed at each delegation step instead of the full QNAME, which improves privacy but can surface bugs in authoritative servers that mishandle NS/empty-non-terminal queries. Second, negative caching (RFC 2308): an NXDOMAIN or NODATA answer is cached for the lesser of the SOA Minimum field and the SOA record’s own TTL, so a typo that returns NXDOMAIN can persist long after you fix the record. Encrypted transports — DNS-over-HTTPS (DoH, RFC 8484) and DNS-over-TLS (DoT, RFC 7858) — change the metadata exposure but not the resolution logic; they sit between stub and recursive resolver.
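Both behaviors reduce to small rules that are easy to sketch. The Python below is illustrative only: `minimized_qnames` and `negative_cache_ttl` are hypothetical helper names, and a real resolver skips steps where it already knows the zone cuts and caps how many labels it will walk.

```python
def minimized_qnames(qname: str) -> list[str]:
    """List the names a minimizing resolver could ask for, shortest first."""
    labels = [label for label in qname.rstrip(".").split(".") if label]
    # Reveal one more label per delegation step, starting from the TLD.
    return [".".join(labels[i:]) + "." for i in range(len(labels) - 1, -1, -1)]


def negative_cache_ttl(soa_ttl: int, soa_minimum: int) -> int:
    """RFC 2308: cache NXDOMAIN/NODATA for the lesser of the two values."""
    return min(soa_ttl, soa_minimum)


if __name__ == "__main__":
    print(minimized_qnames("www.example.com"))
    print(negative_cache_ttl(soa_ttl=3600, soa_minimum=300))
```

With an SOA TTL of 3600 and a Minimum of 300, a mistyped name stays NXDOMAIN in caches for up to 300 seconds after the record is created.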

Caching mechanics dictate response latency and directly influence edge deployment behavior. For the cache-expiry hierarchy, ISP forwarder clamping, and how to measure real propagation, see Propagation & Caching Basics and its companion walkthrough on debugging propagation delays across global resolvers.

Platform implementation notes:

  • Cloudflare: Operates an anycast recursive network (1.1.1.1) and a separate anycast authoritative network; the recursive side performs QNAME minimization and, for privacy, does not forward ECS to authoritative servers.
  • AWS Route 53 Resolver: Integrates with VPC routing, supports inbound/outbound endpoints for hybrid forwarding to on-prem resolvers, and applies a separate resolver cache distinct from Route 53 authoritative.
  • Self-hosted (BIND/Unbound/PowerDNS): You control max-cache-ttl, qname-minimisation, ECS forwarding, and prefetch behavior explicitly — and you own the consequences of getting them wrong.

Annotated diagnostic block:

# Walk the full delegation chain from the root, showing each referral.
dig +trace example.com @8.8.8.8

# Confirm which nameservers are authoritative and compare serials between them.
dig +nssearch example.com

# Inspect EDNS negotiation and the advertised UDP payload size.
dig +edns=0 example.com @1.1.1.1

# Query the authoritative server directly to bypass all resolver caching
# (gives you the canonical answer + TTL the zone is publishing right now).
dig @ns1.example.com example.com A +norecurse

# On a Linux host using systemd-resolved, dump per-link resolver state.
resolvectl status

The +norecurse query against the authoritative server is the single most useful diagnostic: it separates “the zone is wrong” from “a resolver cached the old answer.”

Record taxonomy & syntax validation

Production DNS lives inside hard constraints from RFC 1035 and RFC 2181: each label is at most 63 octets, a full domain name at most 255 octets on the wire (≈253 in presentation form), and an RRset’s records must share owner, class, type, and TTL (RFC 2181 §5.2 — mismatched TTLs in a set are an error resolvers will silently normalize). A CNAME cannot coexist with any other data at the same owner name (RFC 1034 §3.6.2). These rules are not pedantry; violating them produces SERVFAIL, truncation, or silently dropped records.
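The length and RRset rules above are mechanical enough to lint before a zone ever reaches a server. The following is a rough Python sketch, not a replacement for named-checkzone: the helper names are hypothetical, and it assumes ASCII (already punycode-encoded) names with no escaped dots.

```python
MAX_LABEL_OCTETS = 63
MAX_NAME_OCTETS = 255  # wire format: length octets plus the root label


def name_errors(name: str) -> list[str]:
    """Return length-rule violations; an empty list means the name passes."""
    if name == ".":
        return []
    labels = name[:-1].split(".") if name.endswith(".") else name.split(".")
    errors = []
    for label in labels:
        size = len(label.encode("ascii"))
        if size == 0:
            errors.append("empty label")
        elif size > MAX_LABEL_OCTETS:
            errors.append(f"label exceeds {MAX_LABEL_OCTETS} octets: {label[:16]}...")
    # Each label costs one length octet plus its bytes; the root adds one more.
    wire = sum(1 + len(label.encode("ascii")) for label in labels) + 1
    if wire > MAX_NAME_OCTETS:
        errors.append(f"name is {wire} octets on the wire (max {MAX_NAME_OCTETS})")
    return errors


def mixed_ttl_rrsets(records):
    """records: iterable of (owner, rtype, ttl). Return RRsets whose TTLs differ."""
    ttls = {}
    for owner, rtype, ttl in records:
        ttls.setdefault((owner.lower(), rtype.upper()), set()).add(ttl)
    return {key: sorted(values) for key, values in ttls.items() if len(values) > 1}
```

Three 63-octet labels plus one 61-octet label come to exactly 255 octets on the wire and 253 characters in presentation form, which is where the "≈253" figure comes from.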

The table below is the working taxonomy. For the full per-type treatment — including CAA, TXT formatting rules, and the IPv6 dual-stack migration path — see Understanding DNS Record Types and the focused guide on configuring A vs AAAA records for IPv6 migration.

| Type | RFC | Purpose | Apex-safe? | Key constraint |
|---|---|---|---|---|
| A | 1035 | IPv4 address | Yes | 32-bit address; multiple = round-robin |
| AAAA | 3596 | IPv6 address | Yes | 128-bit; deploy alongside A for dual-stack |
| CNAME | 1034/2181 | Canonical alias | No | Cannot coexist with other data at the label |
| ALIAS/ANAME | (vendor) | Apex alias | Yes | Resolved server-side to A/AAAA |
| NS | 1035 | Delegation | Required at apex | Must match parent-side glue |
| SOA | 1035 | Zone authority | Required at apex | Serial drives AXFR/IXFR; Minimum = negative TTL |
| MX | 1035/7505 | Mail routing | Yes | Priority field; null MX . opts out |
| SRV | 2782 | Service location | n/a | _service._proto; priority/weight/port/target |
| TXT | 1035/1464 | Verification/policy | Yes | SPF, DKIM, DMARC, domain proofs |
| CAA | 8659 | CA authorization | Yes | Restricts which CAs may issue certs |
| DS / DNSKEY / RRSIG | 4034 | DNSSEC chain | Apex | DS at parent, DNSKEY/RRSIG in child |

Multi-value record sets give you basic round-robin distribution but no health awareness — a dead origin in an A set keeps receiving roughly its share of traffic until you remove it. Real traffic steering belongs to health-checked routing policies and the edge layer, covered later in this guide.

Validation commands:

# Reject malformed zones before reload; catches dangling CNAMEs and bad SOA.
named-checkzone example.com /var/named/example.com.zone

# Lint a record set declaratively before pushing with a GitOps tool.
octodns-validate --config-file=config/octodns.yaml

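# Verify a specific RRset's owner/type/TTL coherence as published.
# Ask for the type directly; per RFC 8482 many servers give only a minimal answer to ANY.
dig @ns1.example.com example.com A +noall +answer +norecurse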

TTL strategy & cache control

Time-to-Live is the throttle between failover responsiveness and authoritative query volume, and it is the one DNS knob with the most operational consequence. A record’s explicit TTL governs how long a positive answer is cached; the SOA Minimum field (the last of the five numeric SOA fields) governs negative-answer caching per RFC 2308, capped by the SOA record’s own TTL. Set TTLs too high and a cutover takes hours to clear globally; set them too low and you amplify query load, increase cost on per-query billing, and make every authoritative outage immediately user-visible because nothing is cached to ride through it.

The disciplined workflow is to lower TTLs 24–48 hours ahead of any planned change so caches worldwide have aged out by cutover, execute the change, verify, then restore the baseline. This is the backbone of zero-downtime IP migrations and registrar moves. For the full decision framework — including how SaaS platforms balance deploy velocity against cache stability — see Mastering TTL Strategies and the concrete recommendations in best TTL values for high-traffic SaaS platforms.
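The timing rule behind that workflow, and the query-volume cost of a low TTL, can be put into numbers. This Python sketch uses hypothetical helper names and a deliberately crude model: it assumes every resolver honors the published TTL and re-fetches exactly once per TTL, which real resolvers only approximate.

```python
from datetime import datetime, timedelta, timezone


def latest_lowering_time(cutover: datetime, old_ttl: int) -> datetime:
    """Latest moment to publish the low TTL so every cache has dropped the old one."""
    # A resolver that cached the record just before the change may hold it
    # for up to old_ttl seconds, so the low TTL must be live at least that long.
    return cutover - timedelta(seconds=old_ttl)


def max_refresh_qps(resolver_caches: int, ttl: int) -> float:
    """Rough upper bound on steady-state queries per second for one hot RRset."""
    if ttl <= 0:
        raise ValueError("model needs a positive TTL")
    # Assumes each cache re-fetches once per TTL; real traffic is burstier.
    return resolver_caches / ttl


if __name__ == "__main__":
    cutover = datetime(2026, 6, 20, 12, 0, tzinfo=timezone.utc)
    print(latest_lowering_time(cutover, old_ttl=86400))
    print(max_refresh_qps(60_000, 300), max_refresh_qps(60_000, 60))
```

Under these assumptions, dropping a TTL from 300 to 60 seconds multiplies refresh traffic fivefold, which is the reason to restore the baseline once the change is verified. The 24–48 hour lead time adds margin on top of the strict minimum for resolvers that hold records longer than the TTL.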

Platform implementation notes:

  • Cloudflare: ttl: 1 means “Auto” — for proxied records the wire TTL is fixed low (300s) because clients hit the anycast edge, not the origin IP, so origin TTL is largely irrelevant to failover.
  • AWS Route 53: Minimum TTL of 0 is accepted but most resolvers clamp it; Alias records carry no TTL of their own and inherit the target’s, so they always return current AWS endpoints.
  • Self-hosted: $TTL sets the zone default; per-record TTLs override it. Mind the SOA Minimum for NXDOMAIN caching independently.

Configuration example — SOA timer hierarchy:

$TTL 3600                       ; default positive TTL for records without an explicit one
@ IN SOA ns1.example.com. hostmaster.example.com. (
    2026062001 ; serial (YYYYMMDDnn) — must increment for AXFR/IXFR to propagate
    7200       ; refresh — how often secondaries poll the primary
    900        ; retry — wait after a failed refresh
    1209600    ; expire — secondaries stop answering if primary unreachable this long
    300        ) ; minimum — negative-cache TTL for NXDOMAIN/NODATA (RFC 2308)

Bump the serial on every change; a forgotten serial increment is the most common reason a secondary keeps serving stale data after a zone edit, because the secondary only transfers when the primary’s serial is newer than its own.
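Serial bumps are easy to automate. A minimal Python sketch follows, assuming the YYYYMMDDnn convention used in the SOA example and assuming the first revision of a day is numbered 01; `next_serial` is a hypothetical helper, not part of any DNS tool.

```python
from datetime import date


def next_serial(current: int, today: date) -> int:
    """Return the next YYYYMMDDnn serial, always greater than `current`."""
    base = int(today.strftime("%Y%m%d")) * 100  # e.g. 2026062000
    candidate = base + 1                         # first revision of the day (assumed 01)
    if current < candidate:
        return candidate
    # Already edited today, or the serial runs ahead of the calendar.
    if current % 100 == 99:
        raise ValueError("revision counter nn exhausted for this date")
    return current + 1


if __name__ == "__main__":
    print(next_serial(2026062001, date(2026, 6, 20)))
    print(next_serial(2026061907, date(2026, 6, 20)))
```

Running this in CI and writing the result back to the zone file removes the human step that is most often forgotten.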

Apex aliasing & CNAME flattening

RFC 1034 forbids a CNAME at the zone apex because the apex must already carry SOA and NS records, and a CNAME tolerates no siblings. Point a raw CNAME at @ and you break mail, delegation, and the zone’s own authority records. Providers solve this with three mechanisms that all converge on the same wire result: ALIAS/ANAME pseudo-records resolved server-side at query time, native cloud Alias targets (Route 53), and CNAME flattening where the authoritative server follows the target CNAME and returns the resulting A/AAAA to the client. In every case the client receives address records, so RFC compliance is preserved at the wire even though the zone file expresses an alias.
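What the three mechanisms have in common is a server-side chase of the alias target. The Python sketch below simulates that chase against an in-memory table; the `UPSTREAM` data, the `edge.provider.net.` name, and the `flatten` helper are all hypothetical stand-ins, and a real provider would also handle TTLs, AAAA, caching, and upstream failures.

```python
# Hypothetical in-memory data standing in for upstream lookups.
UPSTREAM = {
    ("origin-cdn.provider.net.", "CNAME"): ["edge.provider.net."],
    ("edge.provider.net.", "A"): ["198.51.100.7", "198.51.100.8"],
}


def flatten(target: str, rtype: str = "A", data=UPSTREAM, max_hops: int = 8) -> list[str]:
    """Follow the CNAME chain from `target` and return the final addresses."""
    name = target
    for _ in range(max_hops):
        addresses = data.get((name, rtype))
        if addresses:
            return list(addresses)
        cname = data.get((name, "CNAME"))
        if not cname:
            return []        # NODATA: the chain ends without address records
        name = cname[0]      # a name holds at most one CNAME
    raise RuntimeError("CNAME chain too long or looping")


if __name__ == "__main__":
    # The apex answer the client would see: addresses only, no CNAME.
    print(flatten("origin-cdn.provider.net."))
```

The client never sees the intermediate names, so the apex can still carry SOA, NS, and MX alongside the synthesized address records.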

The trade-offs — TTL inheritance, health-check propagation, query cost, and how flattening interacts with a proxied CDN — are subtle enough to warrant their own treatment. See CNAME Flattening Explained and the head-to-head comparison in CNAME flattening vs ALIAS records at the apex domain. This pattern is foundational for SaaS custom-domain onboarding, where thousands of customer apex domains must point at your platform without each customer running an ALIAS-capable provider.

Infrastructure-as-Code example — flattened apex on Cloudflare:

resource "cloudflare_record" "apex_alias" {
  zone_id = "023e105f4ecef8ad9ca31a8372d0c353"
  name    = "@"
  type    = "CNAME"            # Cloudflare flattens this to A/AAAA at the edge
  content = "origin-cdn.provider.net"
  proxied = true
  ttl     = 1                  # required when proxied; ignored on the wire
  comment = "Apex flattening for CDN-fronted origin"
}

Service discovery & mail routing

SRV records (RFC 2782) locate a service by name using four fields — priority, weight, port, target — under a _service._proto.name owner, enabling client-side load distribution and failover for SIP, XMPP, LDAP, and microservice meshes. MX records set mail-exchange preference and only work when aligned with the email-authentication TXT triad: SPF (RFC 7208) authorizes sending hosts, DKIM (RFC 6376) cryptographically signs messages, and DMARC (RFC 7489) ties the two to a published policy and reporting address.
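How a client is expected to combine priority and weight is spelled out in RFC 2782; the Python sketch below is one reading of that selection procedure. `Srv` and `order_srv` are hypothetical names, and production SRV clients typically add connection attempts, timeouts, and caching around this ordering.

```python
import random
from collections import namedtuple

Srv = namedtuple("Srv", "priority weight port target")


def order_srv(records, rng=None):
    """Return records in the order a client should try them (RFC 2782)."""
    rng = rng or random.Random()
    ordered = []
    for priority in sorted({r.priority for r in records}):
        # Weight-0 records go first so they are only picked with low probability.
        group = sorted((r for r in records if r.priority == priority),
                       key=lambda r: r.weight != 0)
        while group:
            total = sum(r.weight for r in group)
            pick = rng.randint(0, total)
            running = 0
            for index, record in enumerate(group):
                running += record.weight
                if running >= pick:
                    ordered.append(group.pop(index))
                    break
    return ordered


if __name__ == "__main__":
    records = [
        Srv(10, 60, 5060, "sip1.example.com."),
        Srv(20, 0, 5060, "sip2.example.com."),
    ]
    for record in order_srv(records):
        print(record)
```

Lower priority values are always tried first; weight only matters among records that share a priority, so a priority-20 record acts as a pure backup.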

The deep procedures — SRV weighting math, SPF flattening to stay under the 10-lookup limit, DKIM selector rotation, and staged DMARC enforcement — live in Advanced SRV & MX Routing. Start with configuring SRV records for SIP and XMPP services, then secure mail with SPF anti-spoofing, DKIM signing, and DMARC with monitoring and enforcement.

Diagnostic commands:

# Resolve an SRV target set with priority/weight/port/target visible.
dig SRV _sip._tcp.example.com +short

# Confirm mail routing preference order.
dig MX example.com +short

# Inspect the SPF and DMARC policy records.
dig TXT example.com +short
dig TXT _dmarc.example.com +short

Zone administration & GitOps

Primary/secondary synchronization relies on AXFR (full transfer, RFC 5936) and IXFR (incremental, RFC 1995). A secondary starts a transfer when it sees a newer SOA serial on the primary, either because a NOTIFY message (RFC 1996) prompted it to check or because its refresh timer expired. TSIG (RFC 8945) authenticates transfer endpoints with a shared symmetric key so an attacker cannot pull your entire zone or masquerade as the primary. Treating zones as code (declarative definitions in Git, validated in CI, applied with octodns or Terraform) turns risky manual edits into reviewable, revertible changes and gives you a diff before anything reaches a resolver.
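Whether a serial counts as "changed" is decided with the wrap-around comparison from RFC 1982, not plain integer ordering. A small Python sketch of that comparison follows; `serial_newer` is a hypothetical helper, and it treats the undefined case (a difference of exactly 2^31) as "not newer".

```python
SERIAL_BITS = 32
HALF = 1 << (SERIAL_BITS - 1)
MASK = (1 << SERIAL_BITS) - 1


def serial_newer(candidate: int, current: int) -> bool:
    """True if `candidate` is newer than `current` under RFC 1982 arithmetic."""
    candidate &= MASK
    current &= MASK
    if candidate == current:
        return False
    # Newer means "ahead by less than half the number space", wrapping at 2**32.
    return ((candidate - current) & MASK) < HALF


if __name__ == "__main__":
    print(serial_newer(2026062002, 2026062001))
    print(serial_newer(1, 4294967295))
```

This is why accidentally publishing a far-too-large serial is painful to undo: a smaller corrected value looks older to every secondary until the serial is walked forward around the number space.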

The operational disciplines for splitting authority across providers, performing controlled migrations, and keeping serials and TSIG keys honest are covered in DNS Zone Management, with a full runbook in migrating DNS zones without downtime using zone transfers.

Annotated security/transfer block:

# Generate a TSIG key for authenticated zone transfers.
tsig-keygen -a HMAC-SHA256 xfr-key > /etc/named/xfr.key

# Perform an authenticated AXFR against the primary (secondary's view).
dig @ns1.example.com example.com AXFR -y hmac-sha256:xfr-key:BASE64SECRET==

# Confirm the secondary has caught up by comparing SOA serials.
dig @ns2.example.com example.com SOA +short

# Plan a declarative zone change before applying it.
octodns-sync --config-file=config/octodns.yaml          # dry-run by default
octodns-sync --config-file=config/octodns.yaml --doit   # apply

DNSSEC operational management

DNSSEC closes the spoofing gap by signing every authoritative RRset (RRSIG) under keys published as DNSKEY records, with the parent zone vouching for the child via a DS record that you submit through the registrar and the parent publishes. The chain of trust runs root → TLD → your zone, and it is unforgiving: a missing DS update, an expired RRSIG, or a botched key rollover produces SERVFAIL for every validating resolver, which today includes the major public resolvers and a large share of ISP forwarders. That blast radius is why DNSSEC is an operational discipline, not a one-time checkbox.

The three failure-prone procedures each have a dedicated runbook in DNSSEC Operational Management: automating DNSSEC key rollover using the pre-publish (ZSK) and double-signature (KSK) schemes from RFC 6781; submitting DS records to your registrar so the parent delegation actually trusts your new key; and debugging DNSSEC validation failures when resolvers return SERVFAIL after a rollover or signature expiry.

Platform implementation notes:

  • Cloudflare: One-click signing with managed key rollover; you copy a single DS record to the registrar and Cloudflare handles RRSIG lifecycle and resigning.
  • AWS Route 53: Managed signing with KMS-backed KSKs; you enable signing, then add the generated DS at the registrar — Route 53 automates ZSK rollover but KSK rollover and DS sync remain your responsibility.
  • Self-hosted: dnssec-policy in modern BIND automates key timing, but you still own DS submission and must monitor RRSIG expiry windows.
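Monitoring RRSIG expiry, which every option above leaves at least partly to you, mostly comes down to parsing one timestamp. The Python sketch below assumes RRSIG RDATA in the 14-digit presentation form that dig prints; the helper names and the sample record (key tag 12345 and a placeholder signature) are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def rrsig_expiration(rdata: str) -> datetime:
    """Parse the expiration field from RRSIG RDATA in presentation format."""
    fields = rdata.split()
    # Order: type covered, algorithm, labels, original TTL, expiration, inception, ...
    return datetime.strptime(fields[4], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def expires_within(rdata: str, window: timedelta, now: datetime) -> bool:
    """True if the signature expires inside the alerting window."""
    return rrsig_expiration(rdata) - now <= window


if __name__ == "__main__":
    # Placeholder record for illustration; the signature text is not real.
    sample = "A 13 2 3600 20260720000000 20260620000000 12345 example.com. c2lnbmF0dXJl"
    now = datetime(2026, 7, 15, tzinfo=timezone.utc)
    print(rrsig_expiration(sample), expires_within(sample, timedelta(days=7), now))
```

Feeding this the RDATA for each signed RRset and alerting when `expires_within` returns true gives warning before validating resolvers start returning SERVFAIL.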

Annotated DNSSEC validation block:

# Show the DNSKEY set and the RRSIG covering it.
dig DNSKEY example.com +dnssec +multiline

# Verify the parent-side DS matches your zone's KSK (chain of trust to the TLD).
dig DS example.com @a.gtld-servers.net +short

# Ask a validating resolver explicitly; "ad" flag = authenticated data.
dig example.com A +dnssec @1.1.1.1 | grep -E 'flags:|RRSIG'

# A SERVFAIL here that disappears with +cd (checking disabled) means
# validation is failing — usually an expired RRSIG or stale DS.
dig example.com @1.1.1.1
dig example.com @1.1.1.1 +cd

The DNS-to-edge contract: geo-routing, load balancing & CDN integration

This is where DNS stops being a static phone book and becomes part of the traffic plane. GeoDNS answers differently based on the resolver’s (or ECS client’s) location; BGP anycast publishes one address from many PoPs so the network picks the nearest; weighted and latency-based routing policies split or steer traffic across regions; and health-checked failover pools withdraw a degraded origin within seconds — which only works if your TTLs are low enough for the change to take effect. Origin-pull CDNs rely on DNS delegation to route every request through an edge cache before it reaches your backend, so the record you publish at the apex effectively chooses your shielding strategy.

These mechanisms are detailed across the Edge Routing & Serverless Function Architecture guide, and the caching behavior that determines hit ratio and staleness is covered in CDN Caching & Performance Optimization. The key DNS-layer insight: when you front an origin with an anycast edge, DNS failover becomes coarse (PoP-level) and the edge platform’s own health checks and routing logic take over the fine-grained steering.

Infrastructure-as-Code example — health check + failover record (Route 53):

resource "aws_route53_health_check" "api_primary" {
  fqdn              = "api-primary.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3      # consecutive failures before unhealthy
  request_interval  = 10     # seconds between checks
}

resource "aws_route53_record" "api_failover" {
  zone_id        = aws_route53_zone.main.zone_id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "primary"
  failover_routing_policy { type = "PRIMARY" }
  health_check_id = aws_route53_health_check.api_primary.id
  alias {
    name                   = aws_lb.primary.dns_name
    zone_id                = aws_lb.primary.zone_id
    evaluate_target_health = true   # honor the ALB's own health state
  }
}

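A PRIMARY failover record needs a health signal. For a non-alias record that is health_check_id; for an alias record, evaluate_target_health = true can supply it, and the example above sets both. With neither, Route 53 treats the record as always healthy and never fails over. Failover also needs a matching SECONDARY record with the same name and type.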
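The health-check settings and the record TTL together set a rough failover budget. The Python sketch below is back-of-envelope arithmetic only: `worst_case_failover_seconds` is a hypothetical helper, the 60-second TTL is an assumed example value, and the model ignores checker aggregation delays and resolvers that hold answers longer than the TTL.

```python
def worst_case_failover_seconds(failure_threshold: int,
                                request_interval: int,
                                record_ttl: int) -> int:
    """Approximate upper bound from origin failure to clients moving away."""
    detection = failure_threshold * request_interval  # consecutive failed checks
    cache_drain = record_ttl                           # resolvers still holding the old answer
    return detection + cache_drain


if __name__ == "__main__":
    # failure_threshold and request_interval match the health check above;
    # the TTL of 60 seconds is an assumption for illustration.
    print(worst_case_failover_seconds(3, 10, 60))
```

With three failed checks ten seconds apart and an assumed 60-second TTL, the estimate is 90 seconds, and the TTL is the larger share of it.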

More production configuration examples

Cloudflare API payload — proxied subdomain with edge routing

{
  "type": "CNAME",
  "name": "app",
  "content": "origin-cdn.provider.net",
  "proxied": true,
  "ttl": 1,
  "comment": "Edge-routed app subdomain with CDN origin shielding"
}

For the apex, send "name": "@" with the same body — Cloudflare flattens it to A/AAAA automatically. ttl must be 1 whenever proxied is true.

BIND zone fragment — RFC-compliant apex with delegation and SRV

$TTL 3600
@   IN SOA ns1.example.com. hostmaster.example.com. (
        2026062001 7200 900 1209600 300 )
@           IN NS    ns1.example.com.
@           IN NS    ns2.example.com.
@           IN A     203.0.113.10
@           IN AAAA  2001:db8::10
@           IN MX 10 mail.example.com.
@           IN CAA 0 issue "letsencrypt.org"
www         IN CNAME example.com.
_sip._tcp   IN SRV 10 60 5060 sip1.example.com.
_sip._tcp   IN SRV 20 0  5060 sip2.example.com.

Note the apex carries SOA, NS, A, AAAA, MX, and CAA together — exactly the coexistence a CNAME at @ would forbid, which is why apex aliasing needs flattening or ALIAS.
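That coexistence rule is simple to check in CI before a zone is published. The Python sketch below works on (owner, type) pairs you have already parsed from the zone; `cname_conflicts` is a hypothetical helper, and it exempts RRSIG and NSEC because DNSSEC places them beside a CNAME in signed zones.

```python
# Types allowed beside a CNAME because DNSSEC requires them there.
DNSSEC_SIBLINGS = {"RRSIG", "NSEC"}


def cname_conflicts(records):
    """records: iterable of (owner, rtype). Return owners where a CNAME has siblings."""
    types_by_owner = {}
    for owner, rtype in records:
        types_by_owner.setdefault(owner.lower(), set()).add(rtype.upper())
    return sorted(
        owner for owner, types in types_by_owner.items()
        if "CNAME" in types and types - {"CNAME"} - DNSSEC_SIBLINGS
    )


if __name__ == "__main__":
    zone = [("@", "SOA"), ("@", "NS"), ("@", "A"), ("@", "MX"), ("www", "CNAME")]
    print(cname_conflicts(zone))                       # the fragment above is clean
    print(cname_conflicts(zone + [("@", "CNAME")]))    # a raw apex CNAME is flagged
```

Failing the pipeline on a non-empty result catches the raw apex CNAME before a nameserver rejects the zone or silently drops records.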

Edge cases & warnings

| Scenario | Impact | Mitigation |
|---|---|---|
| Raw CNAME at zone apex (@) | Violates RFC 1034; breaks SOA, NS, and MX at the apex | Use ALIAS/ANAME or provider CNAME flattening; never publish a bare CNAME at root |
| DNSSEC key rolled without updating the registrar DS | Chain of trust breaks → SERVFAIL for all validating resolvers | Pre-publish per RFC 6781 and automate DS submission; verify with dig DS at the parent before retiring the old key |
| Forgot to increment SOA serial after a zone edit | Secondaries keep serving stale data; IXFR/AXFR never triggers | Use a YYYYMMDDnn serial in CI and fail the pipeline if the serial did not change |
| TTL set to 0 on a high-traffic record | Resolver cache bypass amplifies authoritative query volume; an authoritative blip becomes immediately user-visible | Keep a 60s practical floor in production; lower temporarily only during a planned cutover window |
| Split-horizon views misconfigured across VPC and public resolvers | Internal names resolve to public IPs (or vice versa); routing loops, failed health checks, data exposure | Separate views/zones explicitly, use forwarding zones, and validate resolution from both network contexts |
| Apex flattening across providers with mismatched TTLs | Flattened A/AAAA inherit the target’s TTL, surprising your failover timing | Confirm the effective wire TTL with dig @authoritative rather than trusting the zone-file value |

Frequently Asked Questions

How does CNAME flattening differ from an ALIAS record at the DNS level?
Both return A/AAAA records on the wire, so clients can’t tell them apart. CNAME flattening means you author a CNAME in the zone and the authoritative server resolves the target server-side at query time; an ALIAS/ANAME is a distinct pseudo-record type that the provider resolves the same way. The practical differences are TTL inheritance and how each interacts with health checks and proxying, both covered in the apex aliasing guide.

What TTL should production SaaS domains use between deployments?
Hold a baseline of 300–3600 seconds for stability and lower query cost, then drop to 60–120 seconds 24–48 hours before a planned migration or failover test so global caches age out, and restore the baseline afterward to avoid cache thrashing. If the record is proxied through an anycast CDN, origin TTL barely matters because clients connect to the edge, not the origin IP.

Can DNSSEC coexist with a CDN, WAF, and proxied edge routing?
Yes. DNSSEC signs the DNS answer chain and is independent of HTTP/TLS, which operate at higher layers. You can sign your zone and still proxy traffic through a CDN: the resolver validates the (possibly flattened) A/AAAA answer, then the client connects to the edge over TLS. Ensure your provider signs the synthesized records it returns for proxied/flattened names.

Why do some resolvers ignore an updated record right after I change it?
Recursive resolvers cache the prior answer for its full TTL, and some ISP forwarders enforce their own minimum cache times on top. Query the authoritative server directly with dig @ns1.example.com ... +norecurse to confirm the zone is correct, then test public resolvers (1.1.1.1, 8.8.8.8) to watch caches expire. This is exactly the workflow in the propagation-debugging guide.

How do I validate DNS changes in a CI/CD pipeline before they go live?
Run named-checkzone or octodns-validate to catch syntax and dangling-reference errors, then a terraform plan or octodns-sync dry-run to diff intended changes against live state, gating the apply on review. Post-apply, assert live resolution with dig +short against multiple resolvers and hit your /health endpoint before declaring the change complete.
