Edge Routing & Serverless Function Architecture
Edge routing pushes request steering, authentication, transformation and caching logic out of centralized origins and onto a global mesh of points of presence (PoPs) that sit milliseconds from the user. This guide is the reference map for the entire discipline: how DNS hands a client to the nearest PoP, how serverless isolates execute routing code in single-digit milliseconds, and how you compose those primitives into a resilient, observable production system.
Key architectural considerations covered here:
- Anycast + DNS handoff: how a single address resolves to the closest PoP, and the TTL and DNSSEC choices that make failover fast and trustworthy.
- Execution models: V8 isolates versus container-per-invocation, CPU/memory budgets, and the cold-start arithmetic that decides where logic belongs.
- Request lifecycle: the strict ordering of routing, auth, transformation, cache lookup and origin fetch inside a single PoP.
- Traffic steering: weighted, latency-based and geo-aware policies, plus health checks and automatic failover across multi-region origins.
- Security at the perimeter: WAF, bot mitigation and rate limiting applied before a request ever touches your application.
- Resilience: stale-serving, multi-cloud fallback and the observability needed to operate the whole thing.
The diagram below shows the end-to-end path a request takes from a client through anycast DNS into a PoP’s processing pipeline and out to a steered origin — the mental model the rest of this guide fills in.
Concept overview: from resolver query to steered origin
Every edge request begins as a DNS lookup. The client’s recursive resolver asks for your hostname and gets back an anycast address — a single IP advertised from hundreds of PoPs simultaneously. BGP delivers the packet to the topologically nearest PoP, so geographic proximity and routing health, not a central decision, determine where the request lands. This is why the DNS Fundamentals & Advanced Record Configuration layer is inseparable from edge routing: the records you publish, and the TTLs you attach to them, govern how quickly traffic can be re-steered when a region degrades.
Once inside a PoP, the request enters a deterministic pipeline. Security runs first — WAF & Rate Limiting at the Edge inspects and may reject the request before any application code executes, which is the cheapest possible place to shed abusive load. Surviving requests hit route matching, where path and host patterns decide which function or behavior applies. Then comes authentication — typically JWT validation at the edge so unauthenticated traffic never reaches origin. Surviving requests pass through Request/Response Transformation to rewrite headers and paths, then a cache lookup that can short-circuit the whole thing with a stored response. Only a genuine miss triggers a steered fetch to one of several origins.
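The ordering described above can be sketched as a chain of stages in which any stage may short-circuit with a response. This is a minimal illustration rather than any provider's API: the `Stage` type, the stage names, the route list and the in-memory `cache` are hypothetical, the WAF and auth checks are placeholders, and the transformation stage is left out for brevity.

```typescript
// A stage either returns a Response (short-circuit) or null (continue).
type Stage = (req: Request) => Response | null;

// Hypothetical in-memory cache standing in for the PoP cache.
const cache = new Map<string, string>([["/cached", "stored body"]]);
// Hypothetical route prefixes known to this PoP.
const knownPrefixes = ["/api/", "/cached"];

const waf: Stage = (req) =>
  new URL(req.url).searchParams.has("attack") // placeholder rule, not a real WAF check
    ? new Response("blocked", { status: 403 })
    : null;

const routeMatch: Stage = (req) => {
  const path = new URL(req.url).pathname;
  return knownPrefixes.some((p) => path.startsWith(p))
    ? null
    : new Response("no route", { status: 404 });
};

const auth: Stage = (req) =>
  req.headers.has("authorization") // placeholder: a real stage would verify the token
    ? null
    : new Response("unauthorized", { status: 401 });

const cacheLookup: Stage = (req) => {
  const hit = cache.get(new URL(req.url).pathname);
  return hit === undefined
    ? null
    : new Response(hit, { status: 200, headers: { "x-cache": "HIT" } });
};

// Order matters: security, routing, auth, then cache.
const pipeline: Stage[] = [waf, routeMatch, auth, cacheLookup];

function handle(req: Request): Response {
  for (const stage of pipeline) {
    const res = stage(req);
    if (res) return res;
  }
  // Only a genuine miss reaches this point; a real handler would fetch a steered origin here.
  return new Response("origin response", { status: 200, headers: { "x-cache": "MISS" } });
}
```

Because each stage returns early, a request rejected by the placeholder WAF never reaches the auth or cache stages, which mirrors the cost argument made in the paragraph.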
The compute that runs this pipeline comes in two main flavors, covered in depth in Cloudflare Workers Routing and Vercel Edge Middleware. Both run on V8 isolates and share the same fundamental advantage over container-based functions: near-zero cold starts. The remaining sections break down each stage, the platform differences, and the operational patterns that keep the system reliable.
A useful way to reason about the architecture is to separate the control plane from the data plane. The control plane is everything you configure ahead of time: DNS records, routing rules, WAF expressions, load-balancer pools, KV bindings and secrets. It changes on a deployment cadence — minutes to hours — and lives in version control. The data plane is the request-time execution: the isolate that runs on every hit, the cache lookups, the steering decisions, the rate-limit counters. It changes on a per-request cadence — microseconds. Almost every production incident in edge systems traces back to a mismatch between the two: a control-plane change (a new TTL, a tightened WAF rule, a re-weighted pool) propagating into the data plane faster or slower than the operator expected. Keeping that distinction explicit is the single most valuable mental discipline when operating the stack, and it is why nearly every section below pairs a configuration artifact with a runtime verification command.
DNS configuration & anycast routing fundamentals
Edge routing inherits whatever DNS hygiene you bring to it. Authoritative records must resolve to the proxied edge address, and apex domains need CNAME Flattening Explained-style handling because a bare domain cannot legally hold a CNAME alongside the SOA and NS records that RFC 1034 and RFC 1035 require at every zone apex.
- CNAME flattening vs ALIAS records: Root domains cannot use standard CNAMEs per RFC 1034 §3.6.2. Cloudflare resolves a flattened CNAME at query time and returns A/AAAA answers; AWS Route 53 uses ALIAS records that map to underlying targets dynamically. Both preserve apex routing without violating the spec.
- DNSSEC validation: Maintain an unbroken cryptographic chain by rotating Key Signing Keys (KSK) and Zone Signing Keys (ZSK) systematically per RFC 6781. A stale or mismatched DS record at the registrar produces SERVFAIL across validating resolvers — an outage that no edge logic can recover from because clients never get an address.
- TTL tuning: Lower TTLs (60–300s) accelerate re-steering during incidents; higher TTLs (3600s+) reduce authoritative query volume during steady state. The right value is a function of how fast you need to move traffic, not a fixed best practice.
- Geolocation & ASN mapping: Edge providers map client source IPs to proprietary GeoIP and ASN databases. That mapping feeds Geo-Targeted Traffic Routing and BGP path selection so regional users reach regional capacity.
Validate the delegation chain and authoritative responses before trusting any edge behavior:
```bash
# Walk the delegation from root to authoritative and inspect each referral
dig +trace example.com

# Confirm the DS record at the parent matches the published DNSKEY
dig DS example.com @8.8.8.8 +dnssec +short
dig DNSKEY example.com +dnssec +short

# Verify the proxied edge address actually answers and check TTL
dig +noall +answer A app.example.com
# app.example.com. 300 IN A 104.18.0.0   <- TTL 300, anycast edge IP
```
A 300 in the TTL column above means resolvers will hold this answer for five minutes; during a regional failover, that is your worst-case window before stale clients re-query. If you are migrating origins, drop the TTL well in advance — see the edge-cases table below.
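The worst-case window mentioned above is simple arithmetic once failure detection is included. The sketch below assumes DNS-based steering, resolvers that honor the TTL exactly, and a health check that ejects a pool after a fixed number of consecutive failed probes; the function and parameter names are hypothetical.

```typescript
// Worst-case seconds before every client has moved off a failed region.
// Assumes resolvers honor the TTL exactly and that detection needs
// `failuresToEject` consecutive failed probes (both are simplifications).
function worstCaseResteerSeconds(
  ttlSeconds: number,
  probeIntervalSeconds: number,
  failuresToEject: number,
): number {
  const detectionSeconds = probeIntervalSeconds * failuresToEject;
  return detectionSeconds + ttlSeconds;
}

console.log(worstCaseResteerSeconds(300, 60, 2)); // 420
console.log(worstCaseResteerSeconds(60, 15, 2)); // 90
```

With these illustrative inputs, dropping the TTL from 300s to 60s and probing every 15s instead of every 60s shrinks the window from seven minutes to a minute and a half.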
Edge middleware & serverless execution models
The defining property of edge compute is where and how the runtime initializes. This is the single biggest architectural fork, and it dictates which logic you can safely place at the edge.
- Execution contexts: V8 isolates (Cloudflare Workers, Deno Deploy, Vercel Edge) share one OS process and switch between tenant contexts in microseconds with memory-safe boundaries, effectively eliminating cold starts. Container-per-invocation models (AWS Lambda@Edge classic) boot a full runtime on a cold request — tens to hundreds of milliseconds — though they offer richer language and library support.
- Resource budgets: Isolate platforms enforce strict CPU-time limits (typically 10–50ms of CPU time per request on baseline tiers, rising to seconds or tens of seconds on paid tiers) and memory caps around 128MB. CPU time counts active execution only, not time spent waiting on I/O, so it is not the same as wall-clock duration. These budgets are deliberate: they keep one tenant from starving a shared PoP and force you to keep routing logic lean.
- Secret management: Inject credentials via provider secret stores or CI/CD pipelines, never in the deployed bundle. A leaked secret in an edge bundle is replicated to every PoP on earth within seconds.
- Route matching priority: Order matters. Specific exact-path routes must precede wildcard routes (/api/*) so a broad pattern never shadows a narrow one. Most providers evaluate routes in declaration or specificity order; verify which, because the wrong assumption silently misroutes traffic.
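The shadowing problem in the last bullet can be made concrete with a small matcher that sorts by specificity before matching. This is a sketch under assumptions: the `Route` shape, the trailing-`*` wildcard syntax and the handler names are hypothetical, and real providers may use declaration order instead.

```typescript
interface Route {
  pattern: string; // exact path, or a prefix ending in "*"
  handler: string;
}

// Exact patterns rank above wildcards; among wildcards, longer prefixes rank first.
function bySpecificity(a: Route, b: Route): number {
  const aWild = a.pattern.endsWith("*");
  const bWild = b.pattern.endsWith("*");
  if (aWild !== bWild) return aWild ? 1 : -1;
  return b.pattern.length - a.pattern.length;
}

function matches(pattern: string, path: string): boolean {
  return pattern.endsWith("*")
    ? path.startsWith(pattern.slice(0, -1))
    : path === pattern;
}

// Sort a copy so the caller's declaration order is left untouched.
function resolve(routes: Route[], path: string): string | undefined {
  return [...routes].sort(bySpecificity).find((r) => matches(r.pattern, path))?.handler;
}

const declared: Route[] = [
  { pattern: "/api/*", handler: "api-catchall" }, // declared first, would shadow /api/login
  { pattern: "/api/login", handler: "login" },
  { pattern: "/*", handler: "site" },
];

console.log(resolve(declared, "/api/login")); // login
console.log(resolve(declared, "/api/users")); // api-catchall
console.log(resolve(declared, "/pricing")); // site
```

If the platform evaluates in declaration order instead, the same `declared` list would send `/api/login` to the catch-all, which is the silent misroute the bullet warns about.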
Deploy and exercise the two dominant platforms with their native CLIs:
```bash
# Cloudflare Workers: publish a routing isolate and tail live logs
wrangler deploy
wrangler tail --format pretty

# Vercel Edge Middleware: run locally against the edge runtime, then build and ship
vercel dev
vercel build              # produces the local build output that --prebuilt uploads
vercel deploy --prebuilt
```
For a head-to-head on latency, CPU limits and developer ergonomics, see the Vercel Edge vs Cloudflare Workers performance comparison and the broader Cloudflare Workers vs AWS Lambda@Edge for Request Routing breakdown. The general rule: anything that must run on every request — auth checks, header rewrites, geo decisions — belongs on isolates; anything heavy or stateful belongs behind an origin or a queue.
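The cold-start arithmetic behind that rule can be written down directly. The numbers below are illustrative assumptions rather than measurements, and the function name is hypothetical; the point is only that a rare but expensive cold start dominates the per-request average.

```typescript
// Expected added latency per request, in milliseconds.
// coldRate is the fraction of requests that hit a cold start (0..1).
function expectedOverheadMs(coldRate: number, coldStartMs: number, execMs: number): number {
  return coldRate * coldStartMs + execMs;
}

// Illustrative inputs only, not measurements.
const isolateMs = expectedOverheadMs(0.01, 5, 2);
const containerMs = expectedOverheadMs(0.05, 400, 2);

console.log(isolateMs.toFixed(2), containerMs.toFixed(2)); // 2.05 22.00
```

Under these assumed inputs the same 2ms of routing logic costs roughly ten times more on average when it sits behind a container cold start, which is why per-request work is placed on isolates.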
Spec and syntax reference
Edge platforms converge on the WHATWG Fetch standard rather than provider-specific request objects, which is what makes routing code broadly portable. A Worker, a Vercel middleware and a Deno Deploy handler all receive a standard Request and return a standard Response; the differences are in the surrounding runtime APIs (KV, geo metadata, waitUntil) rather than the core types. The taxonomy below maps the primitives you will reference across the rest of this guide and the specs that define them.
| Primitive | Standard / RFC | Role in the pipeline |
|---|---|---|
| Request / Response | WHATWG Fetch | The objects every edge handler receives and returns |
| Cache-Control directives | RFC 9111 | Govern freshness, revalidation and stale-serving at the edge |
| stale-while-revalidate | RFC 5861 | Serve expired content while revalidating in the background |
| Vary | RFC 9110 §12.5.5 | Declares which request headers fork the cache key |
| Apex CNAME flattening / ALIAS | RFC 1034 §3.6.2 | Resolve a bare domain to A/AAAA without a literal apex CNAME |
| DNSSEC chain (DS / DNSKEY) | RFC 4035, RFC 6781 | Cryptographic trust from registrar to authoritative answer |
| JWT bearer validation | RFC 7519, RFC 9068 | Verify caller identity at the edge before origin fetch |
| Anycast addressing | BGP (RFC 4271) | Advertise one address from many PoPs for proximity routing |
The practical consequence of standardizing on Fetch is that most routing logic can be unit-tested in plain Node or Deno against synthetic Request objects, with platform-specific bindings stubbed — a property you should exploit in the CI section below.
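As a sketch of that testability, the handler below is written only against Fetch types and is exercised with synthetic `Request` objects. It assumes a runtime with global Fetch classes (Node 18 or later, Deno, or an edge runtime); the `route` function and the injected `GeoLookup` stub are hypothetical stand-ins for a real handler and a platform geo binding.

```typescript
// A platform binding reduced to a plain function so it can be stubbed in tests.
type GeoLookup = (req: Request) => string;

// Routing logic that depends only on standard Request/Response.
function route(req: Request, geo: GeoLookup): Response {
  const url = new URL(req.url);
  if (url.pathname.startsWith("/api/") && !req.headers.has("authorization")) {
    return new Response("unauthorized", { status: 401 });
  }
  return new Response("ok", { status: 200, headers: { "x-edge-country": geo(req) } });
}

// Synthetic requests: no network and no platform runtime involved.
const stubGeo: GeoLookup = () => "DE";
const anon = route(new Request("https://app.example.com/api/orders"), stubGeo);
const authed = route(
  new Request("https://app.example.com/api/orders", {
    headers: { authorization: "Bearer test" },
  }),
  stubGeo,
);

console.log(anon.status, authed.status, authed.headers.get("x-edge-country")); // 401 200 DE
```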
Request and response transformation & caching
The transformation stage is where edge functions earn their keep, reshaping requests for the origin and responses for the client without round-tripping logic to a central server. The full treatment lives in Request/Response Transformation, but the load-bearing patterns are:
- Header injection: Append routing and identity metadata (x-forwarded-for, cf-ipcountry, x-edge-region) so the origin can segment traffic or assign A/B cohorts. Always set these headers at the edge; never trust client-supplied copies.
- Cache key normalization: Strip volatile parameters (utm_*, fbclid, gclid) before computing the cache key, and scope Vary narrowly. An over-broad Vary: User-Agent fragments your cache into thousands of near-duplicate entries and collapses hit ratio.
- Partial vs full-page caching: Edge-Side Includes assemble pages from independently cached fragments but cost PoP CPU per assembly; full-page caching scales better for static-heavy SaaS dashboards. Choose based on how personalized the page actually is.
- Compression and assets: Apply Brotli or Zstandard at the edge and offload image resizing to dedicated transform workers so you do not spend your function CPU budget on media.
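Cache key normalization from the list above can be sketched in a few lines. This is one possible approach, not a provider's built-in behavior: the `VOLATILE` pattern list and the `cacheKey` function are hypothetical, and sorting the surviving parameters is an extra assumption so that reordered query strings share one key.

```typescript
// Parameters that never change the response and should not fork the cache.
const VOLATILE = [/^utm_/, /^fbclid$/, /^gclid$/];

function cacheKey(rawUrl: string): string {
  const url = new URL(rawUrl);
  const kept: [string, string][] = [];
  url.searchParams.forEach((value, name) => {
    if (!VOLATILE.some((re) => re.test(name))) kept.push([name, value]);
  });
  // Sort by parameter name so ?a=1&b=2 and ?b=2&a=1 map to the same key.
  kept.sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  const query = new URLSearchParams(kept).toString();
  return url.origin + url.pathname + (query ? "?" + query : "");
}

console.log(cacheKey("https://app.example.com/p?utm_source=x&b=2&fbclid=abc&a=1"));
// https://app.example.com/p?a=1&b=2
```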
Cache-control semantics are deep enough to warrant their own track — see CDN Caching & Performance Optimization for cache keys, Vary, purging and stale-serving. Validate directives and confirm what the edge is actually storing:
```bash
# Inspect what the edge returns, including cache status and applied directives
curl -sI https://app.example.com/dashboard \
  | grep -iE '^(cf-cache-status|cache-control|vary|age):'
# cf-cache-status: HIT
# cache-control: public, max-age=60, stale-while-revalidate=300
# vary: Accept-Encoding
# age: 42   <- served 42s into a 60s fresh window

# Ask for revalidation to confirm the origin path still works.
# Many CDNs ignore a client-sent no-cache by default, so a HIT here means the
# header was not honored, not that the origin path is broken.
curl -sI -H 'Cache-Control: no-cache' https://app.example.com/dashboard \
  | grep -i cf-cache-status
# cf-cache-status: MISS   <- cache bypassed, response fetched from origin
```
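The annotation on the `age` header above follows from a small calculation over the Cache-Control value. The sketch below is a simplification: it reads only `max-age` and `stale-while-revalidate`, ignores `s-maxage` and other directives a shared cache would also consider, and the function names are hypothetical.

```typescript
type Freshness = "fresh" | "stale-while-revalidate" | "expired";

// Extract a numeric directive such as max-age=60; returns 0 when it is absent.
function directiveSeconds(cacheControl: string, name: string): number {
  const match = new RegExp("(?:^|,)\\s*" + name + "=(\\d+)", "i").exec(cacheControl);
  return match ? Number(match[1]) : 0;
}

function classify(cacheControl: string, ageSeconds: number): Freshness {
  const maxAge = directiveSeconds(cacheControl, "max-age");
  const swr = directiveSeconds(cacheControl, "stale-while-revalidate");
  if (ageSeconds < maxAge) return "fresh";
  if (ageSeconds < maxAge + swr) return "stale-while-revalidate";
  return "expired";
}

const observed = "public, max-age=60, stale-while-revalidate=300";
console.log(classify(observed, 42)); // fresh
console.log(classify(observed, 120)); // stale-while-revalidate
console.log(classify(observed, 400)); // expired
```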
Global load balancing & traffic distribution
Once a request misses cache, the PoP must choose an origin. Load Balancing at the Edge covers the steering policies and the health-check machinery that makes failover automatic rather than a 3am page.
- Health checks: Active probes (HTTP/HTTPS/TCP) test origin liveness every 15–120s and pull failing pools out of rotation; passive monitoring reacts to real-time 5xx spikes without synthetic traffic. Combine both for fast, accurate detection. See Configuring Edge Health Checks and Automatic Failover.
- Steering algorithms: Weighted distribution sends a controlled share to each pool — the basis of canaries and Weighted Load Balancing Across Multi-Region Origins. Least-connections adapts to varying request durations; latency-based steering routes to the fastest-responding origin per PoP.
- Session persistence: Sticky cookies bind a user to one origin. Use this only when state genuinely cannot be externalized to Redis or DynamoDB — stickiness undermines even load distribution and complicates failover.
- Origin shields: A designated intermediate cache tier absorbs misses from every PoP into a single upstream, cutting origin connection count and bandwidth dramatically during traffic spikes.
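Weighted steering combined with health-based ejection, as described in the bullets above, can be sketched as follows. The `Pool` shape, the pool names and the injected `roll` value are hypothetical; a production system would take health state from its probes and pass a fresh random number per request.

```typescript
interface Pool {
  name: string;
  weight: number;
  healthy: boolean;
}

// Pick a pool by weight among healthy pools. `roll` is a number in [0, 1):
// pass Math.random() in production, or a fixed value in tests.
function pickPool(pools: Pool[], fallback: Pool, roll: number): Pool {
  const live = pools.filter((p) => p.healthy && p.weight > 0);
  const total = live.reduce((sum, p) => sum + p.weight, 0);
  if (total === 0) return fallback; // every pool ejected by health checks
  let cursor = roll * total;
  for (const pool of live) {
    cursor -= pool.weight;
    if (cursor < 0) return pool;
  }
  return live[live.length - 1]; // guards against floating-point edge cases
}

const steerPools: Pool[] = [
  { name: "origin-eu", weight: 9, healthy: true },
  { name: "origin-us-canary", weight: 1, healthy: true },
];
const backupPool: Pool = { name: "origin-backup", weight: 1, healthy: true };

console.log(pickPool(steerPools, backupPool, 0.5).name); // origin-eu
console.log(pickPool(steerPools, backupPool, 0.95).name); // origin-us-canary
```

With weights of 9 and 1 the canary receives about one request in ten, and marking both pools unhealthy sends everything to the fallback without any change to the calling code.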
Provision the topology as code so steering policy is reviewable and reversible:
```bash
# Apply only the load-balancer module, then verify pool health from the API
terraform apply -target=module.edge_lb
curl -s -H "Authorization: Bearer $CF_API_TOKEN" \
  "https://api.cloudflare.com/client/v4/zones/$ZONE/load_balancers" \
  | jq '.result[].default_pools'
```
Security at the perimeter: WAF & rate limiting
The cheapest request to serve is the one you reject before it costs you anything. Running WAF & Rate Limiting at the Edge means injection attempts, credential stuffing and volumetric floods are filtered at the PoP nearest the attacker, not after they have traversed your network and consumed origin capacity.
- Managed and custom rules: Provider-managed rulesets cover the OWASP categories; layer custom expressions on top for app-specific paths. Blocking Common Attacks with Cloudflare WAF Rules walks through concrete rule construction.
- Rate limiting: Enforce per-IP, per-token or per-path budgets at the edge so a single abusive client cannot exhaust shared capacity (see Rate Limiting API Requests at the Edge). Counters live in the edge data plane, so a limit is global only when the provider aggregates state across PoPs; where counters are kept per PoP, a client spread across several PoPs can exceed the nominal budget.
- Bot mitigation and geo controls: Combine reputation scoring with country-level blocking or redirection for jurisdictions you do not serve, shedding load and meeting compliance requirements in one rule.
Because security runs first in the pipeline, a misconfigured WAF rule can black-hole legitimate traffic just as effectively as it blocks attacks. Always deploy new rules in log/observe mode, confirm the match rate against real traffic, then promote to block.
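To make the rate-limiting bullet concrete, here is a minimal fixed-window counter with a mitigation timeout. It is a sketch of the general technique, not how any provider implements its counters: the class, its field names and the single in-memory map are assumptions, and a real edge deployment would hold this state per PoP or in an aggregated store.

```typescript
interface Limit {
  period: number; // window length in seconds
  requestsPerPeriod: number; // allowed requests per window
  mitigationTimeout: number; // seconds to keep blocking after a breach
}

interface Counter {
  windowStart: number;
  count: number;
  blockedUntil: number;
}

class FixedWindowLimiter {
  private counters = new Map<string, Counter>();
  private limit: Limit;

  constructor(limit: Limit) {
    this.limit = limit;
  }

  // Returns true when the request is allowed. `now` is a timestamp in seconds.
  allow(key: string, now: number): boolean {
    const c = this.counters.get(key) ?? { windowStart: now, count: 0, blockedUntil: 0 };
    if (now < c.blockedUntil) return false; // still inside the mitigation window
    if (now - c.windowStart >= this.limit.period) {
      c.windowStart = now; // start a new window
      c.count = 0;
    }
    c.count += 1;
    this.counters.set(key, c);
    if (c.count > this.limit.requestsPerPeriod) {
      c.blockedUntil = now + this.limit.mitigationTimeout;
      return false;
    }
    return true;
  }
}

const limiter = new FixedWindowLimiter({ period: 60, requestsPerPeriod: 10, mitigationTimeout: 600 });
const outcomes = Array.from({ length: 11 }, (_, i) => limiter.allow("203.0.113.7", i));
console.log(outcomes.filter(Boolean).length); // 10
console.log(limiter.allow("203.0.113.7", 100)); // false
```

The eleventh request inside the window trips the limit, and the client stays blocked until the mitigation timeout elapses even though a new counting window has started.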
Platform notes per provider
The same patterns map onto different vocabularies depending on where you run them, and the operational ceilings differ enough to influence design.
- Cloudflare: Workers run on V8 isolates with the Fetch API, durable coordination via Durable Objects, and global state via Workers KV (eventually consistent) or D1 (SQL). WAF, rate limiting and load balancing are first-class proxy features configured through the Rulesets API, so much of the pipeline is declarative rather than coded. CPU limits start around 10ms on the free tier and extend to 30s on paid tiers.
- AWS: Lambda@Edge attaches functions to CloudFront at four trigger points (viewer-request, origin-request, origin-response, viewer-response), each with its own size and timeout limits; CloudFront Functions offer a lighter, isolate-like option for header and URL work. Steering and health checks come from Route 53 and CloudFront origin groups rather than a single load-balancer object.
- Fastly: Compute runs on a WebAssembly runtime, and VCL remains available for declarative caching and request manipulation. Fastly’s instant purge and surrogate-key model is among the most granular for cache invalidation.
- Self-hosted: An NGINX or Envoy tier with Lua/WASM filters reproduces routing, transformation and rate limiting on your own PoPs, trading managed convenience for control and cost predictability. You own the anycast announcement and health-check tooling end to end.
Whatever the provider, the design questions are identical: where does each stage of the pipeline run, what is its CPU and memory budget, and how fast does a control-plane change reach the data plane.
CI/CD integration & preview environments
Edge configuration is code, and it deserves the same pipeline discipline as the application it fronts.
- Branch-based routing: Map Git branches to ephemeral hostnames (feature-x.preview.example.com) and route them with edge rules so previews appear instantly without waiting on DNS propagation.
- Infrastructure as Code: Version routing rules, KV bindings, WAF expressions and secret references (never the secret values themselves) in Terraform or Pulumi. Treat every change as an immutable, reviewable artifact rather than a dashboard click.
- Rollback strategy: Keep versioned deployment artifacts and wire automated rollback to health-check degradation or error-rate thresholds, so a bad deploy reverts before it becomes an incident.
- Test isolation: Run lint, unit and integration tests against an isolated edge sandbox in CI before promotion, including the auth and rate-limit paths that are easy to break invisibly.
```bash
# Push the branch, then upload a version tagged with the commit for smoke tests
git push origin feature/edge-routing
wrangler versions upload --tag "$(git rev-parse --short HEAD)"

# Promote only after the preview passes health and contract checks
wrangler versions deploy
```
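The rollback bullet above leaves the trigger condition implicit. One way to express it, sketched here with hypothetical names and thresholds, is an error-rate gate that only fires once enough traffic has been observed to make the rate meaningful.

```typescript
interface Sample {
  requests: number;
  errors: number;
}

// Roll back when the error rate over the observation window exceeds the threshold,
// but only after at least `minRequests` requests have been seen.
function shouldRollback(samples: Sample[], maxErrorRate: number, minRequests: number): boolean {
  const requests = samples.reduce((n, s) => n + s.requests, 0);
  const errors = samples.reduce((n, s) => n + s.errors, 0);
  if (requests < minRequests) return false; // too little traffic to judge
  return errors / requests > maxErrorRate;
}

// Illustrative windows: 85 errors in 2000 requests is 4.25%, above a 2% threshold.
console.log(shouldRollback([{ requests: 1000, errors: 5 }, { requests: 1000, errors: 80 }], 0.02, 500)); // true
console.log(shouldRollback([{ requests: 50, errors: 10 }], 0.02, 500)); // false
```

The minimum-traffic guard is a design choice: without it, a handful of failed requests right after deploy could revert a healthy release.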
Resilience & disaster recovery patterns
The edge is a buffer between your users and the failure modes of origins, DNS and individual PoPs. Design it to degrade gracefully rather than fail hard.
- Multi-cloud fallback: Steer to a secondary provider (AWS → GCP, primary CDN → backup CDN) when primary health checks fail, keeping DNS records synchronized across providers so failover does not wait on propagation.
- Stale serving: Apply stale-while-revalidate and stale-if-error so the edge serves a recently-expired response during origin downtime instead of an error page. This single pattern converts many origin outages into invisible non-events for users.
- Edge state replication: Synchronize KV stores and Durable Objects across regions, accepting eventual consistency for non-critical routing metadata while keeping strongly-consistent state behind a single coordinator.
- Observability: Log every edge function error and routing decision to a centralized stack (Datadog, Grafana, the provider’s analytics), and alert on cache-hit-ratio drops and origin-error spikes, which lead failures by minutes.
```bash
# Stream live edge errors during an incident for rapid triage
wrangler tail --status error --format json \
  | jq '{ts:.eventTimestamp, path:.event.request.url, msg:.exceptions[0].message}'
```
The leading indicators worth alerting on are specific: a sustained drop in cache-hit ratio almost always precedes an origin-load spike, because misses are what reach origin; a rising p99 on the edge-to-origin fetch points at a degrading pool before health checks formally eject it; and a climbing rate-limit block count can signal either an attack or a legitimate client looping. Build dashboards around these three signals and most incidents announce themselves minutes before they page. Pair them with synthetic checks that exercise the full pipeline — DNS resolution, WAF pass-through, auth, cache, origin — from several regions, so you detect a regional PoP problem that aggregate metrics would average away.
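The first of those signals, a sustained drop in cache-hit ratio, can be detected with a very small check. The sketch below assumes hit ratios are sampled at a fixed interval and compared against a known baseline; the function name, the baseline and the tolerance are hypothetical values to tune against real traffic.

```typescript
// True when each of the last `sustain` samples sits more than `drop` below the baseline.
function sustainedDrop(ratios: number[], baseline: number, drop: number, sustain: number): boolean {
  if (ratios.length < sustain) return false; // not enough history yet
  return ratios.slice(-sustain).every((r) => r < baseline - drop);
}

// Three consecutive samples below 0.8 count as sustained; a single dip does not.
console.log(sustainedDrop([0.92, 0.91, 0.78, 0.75, 0.74], 0.9, 0.1, 3)); // true
console.log(sustainedDrop([0.92, 0.75, 0.91, 0.9, 0.76], 0.9, 0.1, 3)); // false
```

Requiring several consecutive low samples keeps a single purge or deploy from paging anyone while still firing before the origin-load spike arrives.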
Production configurations
Terraform: edge load balancer with health-checked pools
```hcl
# Assumes cloudflare_load_balancer_monitor.https and
# cloudflare_load_balancer_pool.secondary are declared elsewhere in the module.
resource "cloudflare_load_balancer_pool" "primary" {
  account_id = var.account_id
  name       = "origin-eu"

  origins {
    name    = "eu-1"
    address = "10.10.0.10" # placeholder; use an address the edge can reach
    enabled = true
    weight  = 1.0
  }

  monitor = cloudflare_load_balancer_monitor.https.id
}

resource "cloudflare_load_balancer" "edge_lb" {
  zone_id          = var.zone_id
  name             = "app.example.com"
  default_pool_ids = [cloudflare_load_balancer_pool.primary.id]
  fallback_pool_id = cloudflare_load_balancer_pool.secondary.id
  proxied          = true
  steering_policy  = "dynamic_latency" # route each PoP to its fastest pool
  session_affinity = "none"
}
```
Explanation: Declares a health-monitored primary pool and points the load balancer at a fallback pool for failover (the secondary pool and the HTTPS monitor are only referenced here and must be declared elsewhere), with latency-based dynamic steering and proxy mode so the load balancer runs at the edge. Encoding this in Terraform makes steering changes a reviewed pull request rather than an untracked dashboard edit.
TypeScript: Vercel Edge Middleware with geo routing and header injection
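```typescript
import { NextRequest, NextResponse } from 'next/server';

// Run on every path except Next.js internals and the favicon
export const config = { matcher: ['/((?!_next|favicon.ico).*)'] };

export function middleware(req: NextRequest) {
  // req.geo is only populated when the host supplies it (Vercel); newer Next.js
  // releases moved geo lookup into the @vercel/functions helpers.
  // 'XX' is both the unknown-geo fallback and a placeholder for a blocked country code.
  const country = req.geo?.country ?? 'XX';

  // Redirect a non-served jurisdiction at the edge, before origin.
  // Skip the target page itself, otherwise the redirect loops forever.
  if (country === 'XX' && req.nextUrl.pathname !== '/unavailable') {
    return NextResponse.redirect(new URL('/unavailable', req.url));
  }

  const res = NextResponse.next();
  res.headers.set('x-edge-region', req.geo?.region ?? 'unknown');
  res.headers.set('x-edge-country', country);
  return res;
}
```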
Explanation: Intercepts every request except Next.js internals and the favicon, makes a geo decision at the edge, redirects unserved regions before any origin work, and sets region/country headers on the response. To expose the same values to the origin for segmentation, pass them as request headers via NextResponse.next({ request: { headers } }). The matcher keeps the function off asset paths so you do not burn CPU budget on files the CDN already serves.
JSON: Cloudflare rate-limiting rule via the Rulesets API
```json
{
  "description": "Limit login attempts per IP",
  "expression": "http.request.uri.path eq \"/api/login\"",
  "action": "block",
  "ratelimit": {
    "characteristics": ["cf.colo.id", "ip.src"],
    "period": 60,
    "requests_per_period": 10,
    "mitigation_timeout": 600
  }
}
```
Explanation: Caps /api/login at ten requests per IP per minute and blocks offenders for ten minutes, throttling credential stuffing at the perimeter without touching application code. The cf.colo.id characteristic, which Cloudflare expects in rules created through the API, means the count is kept per data center rather than globally. Deploy it with the log action first to confirm the threshold matches legitimate traffic before switching to block.
Edge cases & warnings
| Scenario | Impact | Mitigation |
|---|---|---|
| DNS TTL set too high during origin migration | Resolvers keep returning the decommissioned IP for the full TTL, sending users to dead origins and producing 5xx errors long after cutover. | Lower TTL to 60s at least 48 hours before migration; cut over in a low-traffic window; confirm with dig +noall +answer against several resolvers. |
| Serverless function exceeds CPU time limit | The isolate is terminated mid-request, returning 5xx and breaking critical routing or auth paths. | Profile warm-path CPU; move heavy work to origin or a queue; add circuit breakers; keep per-request logic under the tier’s budget. |
| Cache-key flooding (cache busting) via unvalidated query params | Attacker-varied query strings each mint a unique cache key, exhausting edge cache and stampeding the origin. | Normalize URLs and strip non-essential params before key generation; scope Vary narrowly; cap cache key cardinality. |
| WAF or rate-limit rule deployed straight to block | A too-broad expression black-holes legitimate traffic globally within seconds of deploy. | Ship every new rule in log/observe mode, validate match rate against real traffic, then promote to block. |
| Route ordering puts a wildcard above an exact path | The wildcard shadows the specific route, silently misrouting or intercepting requests meant for another handler. | Order exact paths before wildcards; verify the provider’s evaluation order; add a contract test per critical route. |
Frequently Asked Questions
How does edge routing differ from traditional CDN caching? Edge routing executes serverless logic at the PoP to dynamically route, authenticate and transform requests before they reach origin, whereas a traditional CDN mainly serves static cached assets. The two compose: routing decides what to do, caching decides whether to do it at all. See CDN Caching & Performance Optimization for the caching half.
What is the optimal TTL for DNS records in an edge architecture? A TTL of 60–300 seconds balances fast re-steering during incidents against authoritative query volume in steady state. Drop it to 60s ahead of any planned origin migration, then raise it once traffic stabilizes; there is no single correct value, only the right value for your failover requirements.
Can serverless edge functions maintain state across requests? No — edge functions are stateless by design and may run in any PoP. Externalize state to a distributed KV store, a coordination primitive like Durable Objects, or an origin API, and accept eventual consistency for anything that does not strictly need a single writer.
How do I prevent cold starts in serverless edge deployments? Run on V8-isolate platforms (Cloudflare Workers, Vercel Edge), which switch contexts in microseconds and effectively have no cold start. Keep bundles small and dependency trees shallow; reserve container-based functions for work that genuinely needs a full runtime, and place that work behind the origin rather than on the request hot path.
Where should authentication happen — edge or origin? Validate tokens at the edge so unauthenticated and abusive traffic is rejected before consuming origin capacity, as detailed in API Gateway at the Edge. Keep the signing authority and session source of truth at or behind the origin; the edge verifies, it does not mint.
Related
- Cloudflare Workers Routing
- Vercel Edge Middleware
- API Gateway at the Edge
- Request/Response Transformation
- Load Balancing at the Edge
- Geo-Targeted Traffic Routing
- WAF & Rate Limiting at the Edge