Load Balancing at the Edge

Edge load balancing distributes incoming requests across multiple origin pools from the CDN’s network edge, so routing decisions, health evaluation, and failover happen at a Point of Presence (PoP) close to the user instead of at a single distant data center. As part of Edge Routing & Serverless Function Architecture, it combines Anycast IPs, programmable edge runtimes, and active origin health checks to deliver sub-100ms steering decisions with sub-second failover — something pure DNS round-robin can never achieve because it is bottlenecked by resolver TTL caches.

This guide covers the wire-level mechanism, provider-specific configuration for Cloudflare, AWS, and Fastly/Azure/GCP, traffic-steering algorithms, the operational rollout procedure, TTL and caching implications, and a troubleshooting and rollback playbook.

Key implementation priorities:

Move steering and failover decisions from the DNS layer to the CDN edge to collapse failover from minutes to seconds.
Run active edge health checks and automatic failover so dead origins are drained before users see 5xx errors.
Use weighted load balancing across multi-region origins for canary releases, capacity-proportional splits, and controlled regional drains.
Keep DNS TTLs short (30-60s) at the load balancer hostname so DNS-layer changes can still propagate quickly when edge steering itself must change.

Core architecture: where edge load balancing sits

An edge load balancer is a control plane that lives at every PoP in the CDN’s Anycast network. When a client resolves your load balancer hostname, the answer is a small set of Anycast IPs shared by hundreds of PoPs. BGP routing delivers the packets to the topologically nearest PoP; that PoP terminates TLS, then consults the load balancer’s steering policy and the live health state of each origin pool to pick a backend. Because both the policy and the health table are replicated to every PoP, the steering decision is a local lookup measured in microseconds — no round trip to a central controller.

This is fundamentally different from DNS-based load balancing, where the routing decision is baked into the A/AAAA record the resolver hands back and is then cached for the full TTL. Edge routing decouples the name resolution layer (slow to change, governed by TTL) from the request distribution layer (changes instantly at the edge). The hostname that fronts the load balancer should still be managed carefully — see DNS Zone Management for how to keep that record, its delegation, and its TTL under version control so a DNS-level rollback is always available as a backstop.

Routing layer comparison:

Component	Function	Failover speed	Scope
Authoritative DNS	Initial IP resolution	30-300s (TTL dependent)	Global
CDN edge router	Request distribution & caching	<1s	Regional/Global
Origin LB	Backend server distribution	1-5s	Datacenter

TTL management dictates how fast a DNS-level change reaches users, while the edge layer’s health table dictates how fast an origin-level failure is routed around. TLS termination at the edge also shortens the client handshake round trip, offloads handshake CPU cost from the origin, and lets the edge inspect the request (geo, headers, JWT) before choosing a pool. That inspection point is closely related to Geo-Targeted Traffic Routing, which uses it to enforce regional and compliance boundaries.

Verify the resolution and routing path with a trace:

dig +trace api.example.com

Expected output:

;; Received 256 bytes from 198.41.0.4#53(a.root-servers.net) in 12 ms
;; Received 144 bytes from 192.33.4.12#53(c.root-servers.net) in 15 ms
;; ANSWER SECTION:
api.example.com.        30  IN  CNAME   edge-cdn-provider.net.
edge-cdn-provider.net.  30  IN  A       203.0.113.45

The 30-second TTL on the CNAME is the deliberate ceiling on DNS-level failover; the A record points at an Anycast prefix, so the actual PoP selection is handled by BGP, not by this record.

It is worth being precise about the three independent decisions hidden in a single request:

BGP/Anycast decides which PoP the packet reaches. You do not control this directly; it follows network topology and is effectively instant.
Steering policy decides which pool that PoP forwards to. This is fully under your control and changeable in seconds.
Origin LB (if present) decides which server inside the pool handles the request.

Conflating these is the root of most “my failover didn’t work” tickets: a request can reach a perfectly healthy PoP, be steered correctly to a healthy pool, and still return a 500 because a single server behind the pool’s own load balancer is broken. Edge load balancing only owns the middle decision; the pool’s internal health is the origin LB’s job, which is why your /healthz endpoint must reflect the whole application stack, not just that the web server is accepting connections.
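One way to make /healthz reflect the whole stack is to aggregate per-dependency checks and fail the probe only when a critical dependency is down. The following is a minimal sketch under stated assumptions: the function name `evaluateHealth`, the dependency names (`db`, `cache`, `queue`), and the `{ ok, critical }` result shape are all hypothetical and not part of any provider API.

```javascript
// Hypothetical aggregate health evaluation for a /healthz handler.
// `checks` maps a dependency name to { ok: boolean, critical: boolean }.
function evaluateHealth(checks) {
  // Collect the names of critical dependencies that are currently failing.
  const failedCritical = Object.entries(checks)
    .filter(([, result]) => result.critical && !result.ok)
    .map(([name]) => name);

  const healthy = failedCritical.length === 0;
  return {
    // A 503 tells the edge probe to count this origin as failing.
    status: healthy ? 200 : 503,
    body: { status: healthy ? "ok" : "degraded", failed: failedCritical },
  };
}

const result = evaluateHealth({
  db: { ok: true, critical: true },
  cache: { ok: false, critical: false }, // non-critical: does not fail the probe
  queue: { ok: false, critical: true },
});
console.log(result.status, result.body.failed); // status 503, failed: ["queue"]
```

Marking dependencies as critical or non-critical is a design choice: it keeps a degraded cache from draining an otherwise usable pool, while a broken database still takes the pool out of rotation.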

Steering algorithms and weighted routing

The steering policy is the heart of the load balancer. It decides, per request, which healthy pool receives traffic. The four policies you will use most:

Dynamic latency steering: each PoP measures RTT to every pool from active probes and sends traffic to the lowest-latency healthy pool. Best default for global APIs.
Geo steering: map countries or continents to specific pools for data sovereignty or regional caching. Pairs directly with Geo-Targeted Traffic Routing.
Weighted (proportional) steering: split traffic by explicit weights. This is the mechanism behind canary releases and capacity-aware splits, covered in depth in Weighted Load Balancing Across Multi-Region Origins.
Failover (off): a strict ordered list; pool B only receives traffic when pool A is unhealthy. Ideal for active-passive disaster recovery and maintenance drains.

Steering policy specifications:

Algorithm	Use case	Configuration parameter	Persistence
Dynamic latency	Global API acceleration	`steering_policy: "dynamic_latency"`	Cookie/JWT
Geo-location	Compliance & regional caching	`steering_policy: "geo"`	None required
Weighted	Canary releases, capacity splits	pool `weight: 0.1` (canary)	Cookie/JWT
Off (failover)	Active-passive, maintenance	`steering_policy: "off"`	N/A

Session affinity layers on top of steering: once a user lands on a pool, an edge-set cookie or signed JWT pins subsequent requests to the same pool so stateful sessions are not torn across regions. Prefer cookie or JWT affinity over source-IP affinity, which breaks under carrier-grade NAT and mobile IP churn.

Two refinements matter in production. First, steering is evaluated only against the healthy subset of pools. If dynamic latency steering would normally pick us-east but that pool is unhealthy, the policy re-evaluates over the remaining healthy pools and picks the next-lowest latency rather than blindly failing. This is why the steering policy and the health table must be reasoned about together, not in isolation. Second, weighted and latency steering compose: many platforms let you assign weights within a latency or geo tier, so you can say “send EU traffic to the two EU pools, split 80/20.” That composition is the foundation of safe regional canaries — you ramp the new pool’s weight inside its own region without ever exposing it to the wrong geography.
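Both refinements can be sketched in a few lines. This is an illustrative model only, not any provider's implementation: the pool shape (`id`, `region`, `healthy`, `rttMs`, `weight`) and the function names are assumptions made for the example, and the random number is injected so the weighted choice is reproducible.

```javascript
// Hypothetical pool shape: { id, region, healthy, rttMs, weight }.

// Latency steering: lowest RTT among the healthy pools only.
function pickByLatency(pools) {
  const healthy = pools.filter((p) => p.healthy);
  if (healthy.length === 0) return null; // caller would use the fallback pool
  return healthy.reduce((best, p) => (p.rttMs < best.rttMs ? p : best));
}

// Weighted steering inside one geo tier. `rand` is a number in [0, 1),
// passed in so the selection is deterministic in tests.
function pickWeightedInRegion(pools, region, rand) {
  const tier = pools.filter((p) => p.healthy && p.region === region && p.weight > 0);
  const total = tier.reduce((sum, p) => sum + p.weight, 0);
  if (total === 0) return null;
  let remaining = rand * total;
  for (const p of tier) {
    remaining -= p.weight;
    if (remaining < 0) return p;
  }
  return tier[tier.length - 1]; // guard against floating-point drift
}

const pools = [
  { id: "us-east", region: "NA", healthy: false, rttMs: 12, weight: 1 },
  { id: "eu-west", region: "EU", healthy: true, rttMs: 85, weight: 0.8 },
  { id: "eu-central", region: "EU", healthy: true, rttMs: 92, weight: 0.2 },
];
console.log(pickByLatency(pools).id); // eu-west (us-east is skipped as unhealthy)
console.log(pickWeightedInRegion(pools, "EU", 0.9).id); // eu-central
```

Note that us-east has the lowest RTT but is never considered, because filtering by health happens before the latency comparison.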

A common mistake is reaching for source-IP affinity because it requires no client-side state. In practice it produces uneven distribution (large NAT pools land entirely on one origin), it fails completely when a user roams between Wi-Fi and cellular, and it offers no cryptographic integrity. A signed JWT carrying the chosen pool id solves all three: it survives IP changes, distributes evenly because it is set per session, and cannot be forged because every PoP validates the signature with the shared key. The trade-off is key rotation: every PoP must be able to validate a new signing key before any PoP starts issuing tokens with it (in practice, an overlap window in which both the old and the new key are accepted), or in-flight sessions will fail validation.
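The graceful-fallback half of affinity can be sketched separately from the cryptography. The example below assumes the affinity token's signature has already been verified and only its claims are passed in; `resolvePool`, the `poolId` claim, and the `steer` callback are hypothetical names introduced for illustration.

```javascript
// `claims` is the payload of an affinity token whose signature has ALREADY
// been verified (or null when no token was sent). Verification is out of
// scope for this sketch.
// `steer` is any steering function that takes healthy pools and returns one
// pool or null.
function resolvePool(claims, pools, steer) {
  const pinned = claims && pools.find((p) => p.id === claims.poolId);
  if (pinned && pinned.healthy) {
    return { pool: pinned, repin: false }; // honor the existing pin
  }
  // Pinned pool is missing or unhealthy: fall back to normal steering and
  // signal that a fresh affinity token should be issued for the new pool.
  const pool = steer(pools.filter((p) => p.healthy));
  return { pool, repin: pool !== null };
}

const firstHealthy = (healthy) => healthy[0] ?? null;
const pools = [
  { id: "us-east", healthy: false },
  { id: "eu-west", healthy: true },
];
const decision = resolvePool({ poolId: "us-east" }, pools, firstHealthy);
console.log(decision.pool.id, decision.repin); // eu-west true
```

Returning a `repin` flag keeps the decision and the cookie or JWT update together, so the user is pinned to the new pool instead of being re-steered on every request during the outage.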

Provider implementations

The mechanism is universal; the syntax and execution model differ. Below, each major platform with a working snippet.

Cloudflare (Load Balancer + Workers)

Cloudflare exposes load balancers declaratively over the API and Terraform. A load balancer references one or more pools, each pool references origins, and each pool carries a monitor (health check). The steering policy and session affinity live on the load balancer object.

{
  "name": "prod-api-lb",
  "fallback_pool_id": "pool-origin-backup",
  "default_pools": ["pool-us-east", "pool-eu-west"],
  "steering_policy": "dynamic_latency",
  "proxied": true,
  "ttl": 30,
  "session_affinity": "cookie",
  "session_affinity_ttl": 3600
}

For request-level logic beyond what the declarative object supports, attach a Worker. This is where Cloudflare Workers Routing and edge load balancing meet — the Worker can override pool selection per request based on headers, A/B cohort, or auth claims:

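export default {
  async fetch(request, env) {
    // Requests that carry the canary header go to the canary origin.
    const isCanary = request.headers.get("x-canary") === "1";
    const origin = isCanary ? "https://canary.origin.internal"
                            : "https://stable.origin.internal";
    // Keep the original path and query string; swap only the hostname.
    const url = new URL(request.url);
    url.hostname = new URL(origin).hostname;
    // Re-issue the request with the original method, headers, and body.
    return fetch(new Request(url, request));
  }
};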

AWS (Route 53 + CloudFront origin groups)

AWS splits the job: Route 53 does latency/geo record selection, and CloudFront origin groups provide the fast in-region failover between a primary and secondary origin without waiting for DNS.

resource "aws_route53_record" "api_latency" {
  zone_id        = var.hosted_zone_id
  name           = "api.example.com"
  type           = "A"
  set_identifier = "us-east-1"

  latency_routing_policy {
    region = "us-east-1"
  }

  alias {
    name                   = aws_cloudfront_distribution.main.domain_name
    zone_id                = aws_cloudfront_distribution.main.hosted_zone_id
    # Route 53 cannot evaluate the health of a CloudFront alias target.
    evaluate_target_health = false
  }
}

resource "aws_cloudfront_distribution" "main" {
  origin_group {
    origin_id = "primary-secondary-group"
    failover_criteria {
      status_codes = [500, 502, 503, 504]
    }
    member { origin_id = "origin-us-east-1" }
    member { origin_id = "origin-eu-west-1" }
  }
  # ... origins, default_cache_behavior, viewer_certificate ...
}

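The origin group fails over on the listed status codes within a single CloudFront request, so users do not wait for the Route 53 TTL to expire. CloudFront only attempts this failover for GET, HEAD, and OPTIONS requests, so writes (POST, PUT, DELETE) still need a retry path of their own.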

Fastly, Azure, and GCP

Fastly performs steering and failover in VCL with backend .probe health checks and director objects:

backend origin_us_east {
  .host = "us-east.origin.internal";
  .port = "443";
  .ssl = true;
  .probe = {
    .request = "GET /healthz HTTP/1.1" "Host: api.example.com" "Connection: close";
    .interval = 30s;
    .timeout = 5s;
    .window = 5;
    .threshold = 3;
  }
}

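sub vcl_recv {
  # Default to the primary; origin_eu_west is assumed to be declared
  # the same way as origin_us_east above.
  set req.backend = origin_us_east;
  if (!backend.origin_us_east.healthy) {
    set req.backend = origin_eu_west;
  }
}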

Azure Front Door uses origin groups with a SampleSize/SuccessfulSamples/AdditionalLatencyMilliseconds probe model and priority + weight per origin. GCP uses a global external HTTPS load balancer with backend services, where health checks and outlier detection drain unhealthy backends and weighted backend services in the URL map enable splits. All three express the same primitives (pools, weights, probes, failover order), just under different names.

Platform configuration matrix:

Provider	Mechanism	Wire behavior	Failover / Notes
Cloudflare	Load Balancer API + Workers	Anycast PoP picks pool from replicated health table	`fallback_pool_id`; Worker can override per request
AWS	Route 53 + CloudFront origin groups	DNS picks region; origin group fails over in-request	Failover on 5xx status codes within one request
Fastly	VCL directors + `.probe`	Edge picks backend per `vcl_recv` logic	Service version activation rollback
Azure Front Door	Origin group + priority/weight	Edge probes + latency-weighted selection	Priority then weight; `AdditionalLatencyMilliseconds` band
GCP	Global HTTPS LB + backend services	Backend selection + outlier detection	Weighted backend services (URL map); outlier ejection

Operational rollout procedure

Treat any steering change as a production deploy with staged validation.

Define pools and origins as code. Commit the load balancer, pool, and monitor objects to your IaC repo. Never click-configure production.
Attach health monitors before steering. Point each pool at a lightweight /healthz endpoint and confirm every origin reports healthy. See Configuring Edge Health Checks and Automatic Failover for probe interval, timeout, and threshold tuning.
Validate in staging. Apply the policy against a staging hostname and run synthetic checks from multiple regions to confirm the expected pool is selected.
Ship weighted, not all-at-once. Introduce a new pool at a low weight (e.g. 5-10%) using weighted multi-region routing, watch error rates and latency, then ramp.
Set the DNS TTL to 30-60s on the load balancer hostname so a DNS-level rollback path stays fast.
Activate and monitor. Promote the configuration version, then watch trace headers and 5xx rates for at least one full health-check window.

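# Validate the health endpoint each origin will be probed on.
# --resolve pins the hostname to the origin IP so SNI and certificate checks still match.
curl -v --resolve api.example.com:443:203.0.113.45 https://api.example.com/healthz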

Expected output:

< HTTP/2 200
< content-type: application/json
< x-health-status: healthy
< cache-control: no-store
{"status":"ok","region":"us-east-1","uptime":"14d"}

TTL, caching, and propagation implications

Edge load balancing intersects with three different expiry clocks, and most production incidents come from these clocks being out of sync.

DNS TTL on the load balancer hostname controls how long resolvers cache the Anycast answer. Keep it at 30-60s. A 3600s TTL means a DNS-level failover (e.g. abandoning a whole CDN) takes up to an hour to clear caches.
Health-check window controls how fast the edge notices an origin is down: interval × unhealthy_threshold. A 30s interval with a threshold of 3 means up to ~90s of detection latency. Tighten the threshold for faster detection at the cost of more flapping sensitivity.
Cache TTL at the edge. If the edge serves cached objects, a failed origin may be invisible to users for cached paths but cause errors on uncached ones. Decide explicitly whether to serve stale on origin failure (stale-if-error) rather than failing over.

The rule: the DNS TTL should be shorter than or equal to the time you are willing to wait for a DNS-level rollback, and the health-check window should be shorter than your error budget for a single-origin outage. When these are mismatched — long DNS TTL plus aggressive edge failover — you get 5xx spikes during partial outages because DNS keeps pinning users to a region the edge is trying to drain.
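The rule above is plain arithmetic and can be checked before a rollout. The sketch below is only an illustration: `checkClocks` and its parameter names (`rollbackBudget`, `errorBudget`, and so on) are hypothetical inputs you would choose yourself, with all values in seconds.

```javascript
// Worst-case time for the edge to mark an origin unhealthy:
// probe interval multiplied by the unhealthy threshold.
function detectionWindowSeconds(intervalSeconds, unhealthyThreshold) {
  return intervalSeconds * unhealthyThreshold;
}

// Checks the two inequalities from the rule. None of these names come
// from a provider API; they are inputs chosen for this example.
function checkClocks({ dnsTtl, rollbackBudget, interval, threshold, errorBudget }) {
  const detection = detectionWindowSeconds(interval, threshold);
  return {
    detectionWindow: detection,
    dnsTtlOk: dnsTtl <= rollbackBudget,    // DNS rollback is fast enough
    healthWindowOk: detection < errorBudget, // outage is detected within budget
  };
}

const report = checkClocks({
  dnsTtl: 30,
  rollbackBudget: 60,
  interval: 30,
  threshold: 3,
  errorBudget: 60,
});
console.log(report);
// detectionWindow is 90, dnsTtlOk is true, healthWindowOk is false
```

With the 30s interval and threshold of 3 used earlier, the 90s detection window exceeds a 60s error budget, so either the interval or the threshold would need to come down.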

There is also a propagation subtlety unique to the edge model. Because health state is replicated across PoPs rather than computed centrally, there is a brief window — typically a few seconds — where one PoP has marked a pool unhealthy and another has not yet received the update. During this window two users in different regions can get different routing for the same hostname. This is normal and self-healing, but it means you should never assert that “all traffic moved at time T”; instead, monitor the rate of requests still hitting the draining pool and confirm it trends to zero within one or two replication intervals. If it plateaus above zero, a subset of PoPs are not receiving health updates — usually because their probes to the origin are being blocked by a firewall rule that allows some CDN ranges but not others.
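The "trends to zero or plateaus" check can be expressed as a small classifier over request-rate samples. This is a sketch under assumptions: `drainStatus` is a hypothetical helper, `samples[0]` is taken to be the rate when the pool was first marked unhealthy, and each later sample is taken one replication interval after the previous one.

```javascript
// `samples`: requests per second still reaching the draining pool,
// oldest first, one sample per replication interval.
// `maxIntervals`: how many intervals the drain is allowed to take.
function drainStatus(samples, maxIntervals = 2) {
  const zeroAt = samples.findIndex((rps) => rps === 0);
  if (zeroAt !== -1) {
    return zeroAt <= maxIntervals ? "drained" : "drained-late";
  }
  // No zero yet: still within the allowance, or stuck above zero.
  return samples.length - 1 >= maxIntervals ? "stuck" : "draining";
}

console.log(drainStatus([120, 40, 0]));      // drained
console.log(drainStatus([120, 40]));         // draining
console.log(drainStatus([120, 40, 35, 36])); // stuck
```

A "stuck" result is the plateau case described above and is the cue to check whether some PoPs' probes are being blocked at the origin firewall.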

Finally, remember that caching can mask the entire failover. If a path is served from edge cache, a dead origin produces zero user-visible errors for that path until the cached object expires, while uncached paths fail immediately. Decide deliberately, per route, whether origin failure should trigger failover or fall back to stale content (stale-if-error). The two strategies are not mutually exclusive: serve stale for cacheable GETs and fail over for everything else.
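The per-route decision can be written down as a small policy function. The following is a sketch, not a provider feature: `onOriginFailure`, the list of cacheable path prefixes, and the two result strings are hypothetical names chosen for the example.

```javascript
// Decide what the edge should do when the origin for this request is failing.
// `cacheablePrefixes` is a hypothetical per-route allowlist of paths that
// may be served stale.
function onOriginFailure(method, path, hasCachedCopy, cacheablePrefixes) {
  const cacheable =
    method === "GET" && cacheablePrefixes.some((prefix) => path.startsWith(prefix));
  // Serve stale only when the route allows it AND a cached copy exists.
  if (cacheable && hasCachedCopy) return "serve-stale";
  return "failover";
}

const stalePrefixes = ["/static/", "/catalog/"];
console.log(onOriginFailure("GET", "/catalog/items", true, stalePrefixes));  // serve-stale
console.log(onOriginFailure("GET", "/catalog/items", false, stalePrefixes)); // failover
console.log(onOriginFailure("POST", "/catalog/items", true, stalePrefixes)); // failover
```

Keeping the allowlist explicit makes the choice deliberate per route, which is the point of the paragraph above: stale content for cacheable GETs, failover for everything else.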

Troubleshooting and rollback

Every edge provider injects a trace identifier. Capture it on every error and correlate against your logs.

Trace header reference:

Header	Provider	Purpose
`cf-ray`	Cloudflare	Unique request id + PoP code
`x-vercel-id`	Vercel	Edge function execution trace
`x-amz-cf-id`	CloudFront	CloudFront request tracking
`fastly-debug`	Fastly	Request header (`Fastly-Debug: 1`) that makes responses include detailed routing & cache state

# Extract edge routing headers from a live response (case-insensitive match)
curl -sI https://api.example.com | grep -iE "^(cf-ray|x-vercel-id|x-amz-cf-id|server|x-edge-origin|x-cache-status):"

Expected output:

cf-ray: 8a1b2c3d4e5f6789-IAD
server: cloudflare
x-edge-origin: us-east-1
x-cache-status: MISS

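The -IAD suffix on cf-ray tells you which PoP served the request; x-edge-origin tells you which pool it chose (x-edge-origin and x-cache-status are assumed here to be custom headers added by your Worker or origin, not provider defaults). If the PoP or the pool is not what your steering policy predicts, the health table is the first place to look.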
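When correlating errors in bulk, it helps to split the trace header programmatically. The helper below is a minimal sketch that assumes the `<id>-<PoP code>` shape shown in the sample output; `parseCfRay` is a hypothetical name, and the format is inferred from the example rather than from a specification.

```javascript
// Split a cf-ray value into its request id and PoP code.
// Returns null when the value does not contain both parts.
function parseCfRay(value) {
  const idx = value.lastIndexOf("-");
  if (idx <= 0 || idx === value.length - 1) return null;
  return { rayId: value.slice(0, idx), pop: value.slice(idx + 1) };
}

const parsed = parseCfRay("8a1b2c3d4e5f6789-IAD");
console.log(parsed.rayId, parsed.pop); // 8a1b2c3d4e5f6789 IAD
```

Grouping error logs by the `pop` field is a quick way to see whether failures are concentrated in a few PoPs or spread across the network.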

Rollback playbook, fastest to slowest:

Steering toggle (instant) — switch the load balancer steering_policy to off and point at a known-good pool, or flip the Worker override flag. No DNS involved.
Pool weight to zero (instant) — drain a bad pool by setting its weight to 0; healthy pools absorb traffic immediately.
Configuration version revert (seconds) — re-activate the previous load balancer / service version. Keep versioned configs so this is a one-command operation.
DNS-level failover (up to TTL) — last resort: repoint the hostname at a different CDN or origin via DNS Zone Management. Bounded by the TTL you set earlier, which is why 30-60s matters.

Edge cases and gotchas

DNS TTL vs health-check interval mismatch. Long DNS TTL with fast edge failover causes 5xx spikes during partial outages. Align TTL to 30-60s and prefer edge-level failover for sub-second recovery.
Sticky-session loss on pool migration. When a user’s pinned pool fails, affinity must fall back gracefully. Use edge-issued JWTs signed with a key shared across all PoPs and a globally replicated session store; never rely on source-IP affinity.
Health-check flapping. Aggressive thresholds cause traffic to oscillate and trigger origin cold starts (thundering herd). Require multiple consecutive passes/fails and use backoff on recovery probes.
TLS/SNI mismatch edge-to-origin. A certificate or SNI mismatch on a secondary pool only surfaces during failover, when it is hardest to debug. Validate origin certificates and SNI on every pool, not just the primary, in staging.
Cached errors. Make sure error responses (5xx) are never cached at the edge; a single cached 503 can poison a path for the whole cache TTL. Set Cache-Control: no-store on health and error responses.
Probe source IPs. Origin firewalls must allow the CDN’s probe IP ranges, or every pool will read as unhealthy and the load balancer will hard-fail to its fallback.
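The flapping guard from the list above (require multiple consecutive passes or fails before changing state) can be sketched as a small state machine. This is an illustration only: `createHealthTracker` and its threshold names are hypothetical, and recovery-probe backoff is deliberately left out.

```javascript
// Tracks one origin's health with hysteresis: the state flips only after
// `failThreshold` consecutive failures (when healthy) or `passThreshold`
// consecutive passes (when unhealthy).
function createHealthTracker({ failThreshold = 3, passThreshold = 2 } = {}) {
  let healthy = true;
  let streak = 0; // consecutive probe results that contradict the current state
  return {
    record(probeOk) {
      if (probeOk === healthy) {
        streak = 0; // result agrees with the current state; reset the streak
      } else {
        streak += 1;
        const needed = healthy ? failThreshold : passThreshold;
        if (streak >= needed) {
          healthy = !healthy;
          streak = 0;
        }
      }
      return healthy;
    },
  };
}

const tracker = createHealthTracker();
const states = [true, false, false, true, false, false, false].map((ok) =>
  tracker.record(ok)
);
console.log(states);
// [true, true, true, true, true, true, false]: two failures followed by a
// pass do not flip the state; only the third consecutive failure does.
```

Using a lower pass threshold than fail threshold is a design choice for this example; a stricter recovery threshold trades slower recovery for less oscillation.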

Frequently Asked Questions

How does edge load balancing differ from a traditional Layer 4/7 load balancer? A traditional load balancer sits in one data center and distributes across local backends. An edge load balancer runs at every CDN PoP, makes the steering decision close to the user from a replicated health table, and fails over in under a second without waiting for DNS caches to expire.

What is the optimal DNS TTL for edge failover? Use 30-60 seconds on the load balancer hostname. That keeps a DNS-level rollback fast without flooding resolvers with queries. Real failover between origins should happen at the edge via health checks, not by waiting for DNS TTLs to lapse.

Can edge load balancers keep session affinity across regions? Yes, with an edge-set cookie or a JWT signed by a key shared across all PoPs, backed by a globally replicated session store. Avoid source-IP affinity because carrier-grade NAT and mobile IP churn break it. Always implement a graceful fallback when the pinned pool fails.

Why do I see 5xx errors during a partial origin outage even though failover is configured? The usual cause is a DNS TTL longer than the edge health-check window: DNS keeps pinning users to a region the edge is trying to drain. Shorten the TTL to 30-60s, make sure error responses are not cached, and confirm the failed pool’s health checks are actually transitioning to unhealthy.
