Vercel Edge Middleware
Vercel Edge Middleware runs lightweight JavaScript in a V8 isolate at the network edge, intercepting every matched request before it reaches your application code, so you can rewrite, redirect, authenticate and decorate traffic with single-digit-millisecond overhead.
Key points:
- Middleware executes in a constrained edge runtime (Web APIs only, no Node.js built-ins) before rendering, API routes, and static delivery — design for sub-50ms CPU budgets.
- Deterministic routing is built from NextResponse.rewrite(), redirect(), next(), and a precise matcher that keeps the function off static assets.
- Geo, headers and cookies are the primary signals for steering, A/B testing, and tenant isolation in multi-tenant SaaS.
- Cache-Control discipline at the edge prevents stale auth state, redirect-loop cache poisoning, and CDN-wide incidents.
This page is part of Edge Routing & Serverless Function Architecture and focuses on the Vercel-specific implementation of edge logic. Where the mechanism is portable, it links across to sibling platforms so you can compare execution models and pick the right edge for each workload.
Architecture and execution context
Middleware runs in a V8 isolate, the same primitive that powers Cloudflare Workers Routing, distributed across Vercel’s global edge network. There is no container, no cold Node.js process, and no warm-up of a full server — the isolate boots in well under a millisecond and shares the process with thousands of other tenants. That model is what makes per-request interception cheap, but it also dictates the constraints you must design around.
Execution happens strictly before Next.js rendering, before API route resolution, and before static asset delivery. The middleware sees the raw request, can short-circuit it entirely, and otherwise hands a (possibly modified) request down the chain. Because it sits in front of everything, a bug here fails the whole route, so the resource ceilings below are hard limits, not advisory.
| Constraint | Limit | Operational impact |
|---|---|---|
| Bundle size | ~1 MB (gzipped) | Heavy dependencies are rejected at build/deploy time. |
| CPU time | ~50 ms CPU per invocation | Overrun terminates the request with a 500. |
| Memory | ~128 MB shared isolate budget | Large in-memory state risks eviction under pressure. |
| Runtime | Web standard APIs only | fs, net, crypto (Node build), and native addons are unavailable. |
Place middleware.ts (or .js) at the project root, or inside src/ if that is your source directory. A minimal file declares the runtime implicitly — Next.js compiles middleware to the Edge Runtime automatically — but you should pin Node tooling versions in package.json to keep CI builds reproducible:
{
"engines": {
"node": ">=18.0.0"
}
}
Every I/O path must be asynchronous and bounded. A synchronous loop, an unbounded fetch to a slow origin, or a large JSON parse can blow the CPU budget and surface as intermittent 500s that are painful to reproduce because they depend on input size and PoP load.
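One way to keep an upstream call bounded is to race it against a deadline. The sketch below is illustrative only: fetchWithDeadline, its parameters, and the 504 fallback are hypothetical names, not part of any Vercel or Next.js API. It relies solely on the standard AbortController, and the fetch implementation is injectable so the helper can be exercised without a network.

```typescript
// Minimal shape of the fetch function this sketch needs (hypothetical type).
type FetchLike = (
  input: string,
  init?: { signal?: AbortSignal },
) => Promise<Response>;

// Hypothetical helper: abort the upstream call after timeoutMs and
// return the supplied fallback instead of letting the request hang.
async function fetchWithDeadline(
  url: string,
  timeoutMs: number,
  fallback: Response,
  fetchImpl: FetchLike = fetch,
): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    return await fetchImpl(url, { signal: controller.signal });
  } catch {
    // Any failure, including the abort triggered above, yields the fallback.
    return fallback;
  } finally {
    // Always clear the timer so it cannot fire after the call settles.
    clearTimeout(timer);
  }
}
```

A caller might pass a short deadline and a cheap static response, for example fetchWithDeadline(originUrl, 200, new Response('upstream timeout', { status: 504 })). Note that the catch clause treats every error as a timeout; a production version would probably want to distinguish the two.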
Core routing and request transformation
Deterministic steering is built from three response primitives and one matcher. NextResponse.next() passes the request through, optionally with added or rewritten headers. NextResponse.rewrite(url) serves a different path internally while the browser URL stays the same — ideal for proxying, localization, and tenant fan-out. NextResponse.redirect(url) returns a 3xx to the client and changes the visible URL. The matcher decides which requests ever reach the function at all, and getting it right is the single biggest lever on both correctness and cost.
import { NextRequest, NextResponse } from 'next/server';
export const config = {
// Run on everything except static assets and image optimizer output.
matcher: ['/((?!api|_next/static|_next/image|favicon.ico).*)'],
};
export function middleware(req: NextRequest) {
const url = req.nextUrl.clone();
if (url.pathname.startsWith('/docs')) {
url.pathname = url.pathname.replace(/^\/docs/, '/documentation');
return NextResponse.rewrite(url);
}
const res = NextResponse.next();
res.headers.set('x-edge', '1');
return res;
}
This keeps the function off _next/static and favicon.ico so static delivery stays on the fast path, transparently maps /docs/* onto /documentation/* without a client-visible redirect, and stamps a marker header on everything else. The matcher also excludes /api, so API routes bypass this middleware entirely. Reshaping the inbound request (adding x-tenant-id, normalizing Accept-Language, or stripping client-supplied trust headers) is the same class of work covered in depth under Request/Response Transformation. Headers set on the NextResponse you return travel to the client; to change what your application sees, pass a modified copy of the request headers via NextResponse.next({ request: { headers } }).
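A quick way to sanity-check a matcher before deploying is to test the equivalent regular expression against representative paths. This is only an approximation, since Next.js compiles matchers with its own path-matching logic, and wouldRunMiddleware is a hypothetical helper name. It is still enough to reveal surprises in the negative lookahead.

```typescript
// The matcher string from the example above, anchored as a plain RegExp.
// Assumption: this approximates, but does not replicate, Next.js matching.
const matcherSource = '/((?!api|_next/static|_next/image|favicon.ico).*)';
const matcherRegex = new RegExp(`^${matcherSource}$`);

// Hypothetical helper: would the middleware be invoked for this pathname?
function wouldRunMiddleware(pathname: string): boolean {
  return matcherRegex.test(pathname);
}

const samples = ['/docs/intro', '/api/users', '/_next/static/chunk.js', '/apiary'];
for (const path of samples) {
  console.log(path, wouldRunMiddleware(path));
}
```

The last sample is the interesting one: because the lookahead is a prefix test, /apiary is excluded along with /api/users. If that is not intended, the alternatives in the lookahead need a trailing slash or an end-of-path check.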
Cookies follow the same pattern. Read with req.cookies.get('session') and write with res.cookies.set(name, value, { httpOnly: true, secure: true, sameSite: 'lax', maxAge: 3600 }). Never echo a client-supplied cookie back as trusted state without validation, because the edge is the first place an attacker’s header reaches your stack.
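As a first line of defense, a cookie value can be checked against an allowlisted shape before anything downstream touches it. The sketch below assumes a hypothetical session ID format of 32 to 64 lowercase hex characters; parseSessionId and the pattern are illustrative, and a shape check is not authentication, so the ID still has to be verified against a signature or a session store.

```typescript
// Assumed format for illustration: 32 to 64 lowercase hexadecimal characters.
const SESSION_ID_PATTERN = /^[a-f0-9]{32,64}$/;

// Hypothetical helper: return the cookie value only if it has the expected
// shape, otherwise null so the caller treats the request as anonymous.
function parseSessionId(raw: string | undefined): string | null {
  if (raw === undefined) return null;
  return SESSION_ID_PATTERN.test(raw) ? raw : null;
}
```

In middleware this would typically be called as parseSessionId(req.cookies.get('session')?.value), with a null result routed down the unauthenticated branch.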
Provider-specific implementation
Vercel (native middleware)
The canonical Vercel pattern is a single middleware.ts returning a NextResponse. Geo data is attached to the request automatically and is the basis for region-aware routing:
import { NextRequest, NextResponse } from 'next/server';
export const config = { matcher: ['/((?!_next|favicon.ico).*)'] };
export function middleware(req: NextRequest) {
  // Fall back to 'US' when geo data is missing (local dev, some proxies).
  const country = req.geo?.country ?? 'US';
  let res = NextResponse.next();
  if (country === 'DE' || country === 'FR') {
    const eu = req.nextUrl.clone();
    eu.pathname = `/eu${req.nextUrl.pathname}`;
    res = NextResponse.rewrite(eu);
  }
  // Set the header last so both the pass-through and the rewrite carry it.
  res.headers.set('x-request-region', country);
  return res;
}
Vercel Edge Functions (Web handler)
For non-Next.js projects, the same runtime is exposed as an Edge Function with a Web Request/Response signature. This is closer to the raw isolate and useful when you want middleware-style logic without the Next.js routing layer:
export const config = { runtime: 'edge' };
export default function handler(req: Request): Response {
const url = new URL(req.url);
if (url.pathname === '/healthz') {
return new Response('ok', { headers: { 'cache-control': 'no-store' } });
}
return new Response(null, {
status: 307,
headers: { location: '/app' + url.pathname },
});
}
Cloudflare Workers (portable equivalent)
The same logic ports almost verbatim to a Worker because both run V8 isolates over Web APIs — the difference is the entrypoint and the rewrite mechanism (a re-issued fetch rather than a framework helper):
export default {
async fetch(request) {
const url = new URL(request.url);
if (url.pathname.startsWith('/docs')) {
url.pathname = url.pathname.replace(/^\/docs/, '/documentation');
return fetch(new Request(url, request));
}
return fetch(request);
},
};
AWS Lambda@Edge / CloudFront (event-shaped)
On AWS the model is event-driven rather than fetch-driven: a viewer-request handler receives a CloudFront event, mutates request.uri or request.headers, and returns it. The latency and cold-start profile differs materially, which is the subject of the dedicated comparison guides below.
export const handler = async (event) => {
const request = event.Records[0].cf.request;
if (request.uri.startsWith('/docs')) {
request.uri = request.uri.replace(/^\/docs/, '/documentation');
}
request.headers['x-edge'] = [{ key: 'x-edge', value: '1' }];
return request;
};
Platform comparison
| Provider | Mechanism | Wire behavior | Failover / notes |
|---|---|---|---|
| Vercel Edge Middleware | NextResponse from middleware.ts | Internal rewrite keeps URL; redirect sends 3xx | Auto-deployed to all PoPs; rollback via vercel rollback. |
| Vercel Edge Function | Web Response handler | Returns Response directly | Same runtime, no Next.js router; good for non-Next apps. |
| Cloudflare Workers | fetch() handler, V8 isolate | Rewrite via re-issued fetch(new Request(url)) | Closest portable analog; routes bound by patterns. See Cloudflare Workers Routing. |
| AWS Lambda@Edge | CloudFront event mutation | Mutate request.uri / headers in place | Higher cold-start, regional replication lag on deploy. |
For head-to-head latency and throughput numbers, the Vercel Edge vs Cloudflare Workers performance comparison breaks down p50/p99 under load so you can choose per workload rather than by reputation.
Geo-targeting and conditional routing
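Location-aware routing reads req.geo, which Vercel populates at the edge PoP, and branches on it. Always supply a fallback, because geo is undefined in local development and behind some corporate proxies, and a missing default silently routes everyone through one branch. On Next.js 15 and later the geo property was removed from NextRequest; the geolocation() helper from @vercel/functions exposes the same data.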
import { NextRequest, NextResponse } from 'next/server';
const EU = new Set(['DE', 'FR', 'NL', 'IE', 'ES', 'IT']);
export function middleware(req: NextRequest) {
  // Fall back to 'US' when geo data is missing (local dev, some proxies).
  const country = req.geo?.country ?? 'US';
  let res = NextResponse.next();
  if (EU.has(country)) {
    const eu = req.nextUrl.clone();
    eu.pathname = `/eu${req.nextUrl.pathname}`;
    res = NextResponse.rewrite(eu);
  }
  // Set the header last so both the pass-through and the rewrite carry it.
  res.headers.set('x-request-region', country);
  return res;
}
This injects a region header for downstream services and rewrites EU traffic onto a compliant subtree. DNS-level and CDN-level geo steering live one layer below the application; when you need both — for example pinning a region at the resolver and refining at the edge — coordinate this with Geo-Targeted Traffic Routing. Mock the signal during development with a header your middleware reads first, e.g. const country = req.headers.get('x-debug-geo') ?? req.geo?.country ?? 'US';.
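The debug override is easier to reason about when it lives in one small function that validates the header and can be switched off outside development. The following is a sketch under assumptions: resolveCountry and the allowDebugOverride flag are hypothetical, and the two-letter check only confirms the value looks like an ISO 3166-1 alpha-2 code, not that it is a real country.

```typescript
// Shape check only: two uppercase ASCII letters.
const ISO_ALPHA2 = /^[A-Z]{2}$/;

// Hypothetical helper: pick the country to route on, preferring a validated
// x-debug-geo header when overrides are allowed, then platform geo data,
// then a fixed fallback.
function resolveCountry(
  headers: Headers,
  geoCountry: string | undefined,
  allowDebugOverride: boolean,
  fallback = 'US',
): string {
  if (allowDebugOverride) {
    const override = headers.get('x-debug-geo')?.trim().toUpperCase();
    if (override && ISO_ALPHA2.test(override)) return override;
  }
  return geoCountry ?? fallback;
}
```

Passing allowDebugOverride as false in production keeps a client from choosing its own region by sending the header, which matters when the rewrite target is a compliance boundary.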
Cohort assignment is structurally the same conditional, just keyed on a sticky cookie or a hash of the visitor rather than geography. That pattern — assign once, persist via cookie, rewrite to a variant — is detailed in A/B testing with Vercel Edge middleware, and it is the cleanest way to run experiments without a client-side flicker, because the bucketing decision is made before any HTML is served.
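The "hash of the visitor" option can be sketched with a small deterministic hash. The example uses 32-bit FNV-1a over the string's UTF-16 code units, chosen here for brevity rather than taken from the source; fnv1a, assignVariant, and the variant names are all hypothetical.

```typescript
// 32-bit FNV-1a over UTF-16 code units (an illustrative choice of hash).
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Hypothetical helper: map a stable visitor ID onto one of the variants.
// The same ID always lands in the same bucket, so no lookup is needed.
function assignVariant(visitorId: string, variants: readonly string[]): string {
  if (variants.length === 0) throw new Error('at least one variant is required');
  return variants[fnv1a(visitorId) % variants.length];
}

const variant = assignVariant('visitor-123', ['control', 'treatment']);
console.log(variant);
```

Hash-based bucketing gives an even split only in aggregate, and changing the number of variants reshuffles existing visitors, which is one reason to persist the assignment in a cookie once it has been made.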
Configuration and operational procedure
Bring middleware to production in a controlled sequence rather than shipping it straight to main:
- Author and scope. Write middleware.ts and define the tightest matcher that still covers your routes. A loose matcher invokes the function on assets you never meant to touch and inflates both latency and invocation count.
- Validate locally. Run vercel dev and exercise every branch (geo fallback, rewrite, redirect, and pass-through) using x-debug-geo and cookie overrides to force each path.
- Preview deploy. Push to a branch; Vercel builds a preview deployment with the middleware live on a unique URL. Test the real edge behavior, not just the local emulator, because the emulator does not reproduce PoP geo or true CPU limits.
- Inspect headers. Confirm x-vercel-cache (HIT/MISS/STALE) and x-vercel-id (which PoP served it) match expectations on representative routes.
- Promote. Merge to production; the deployment is atomic across all PoPs. There is no partial rollout window, so your preview testing is the safety net.
- Watch and roll back. Tail logs (below). If error rate or edge latency breaches your SLO, run vercel rollback to instantly repoint the alias at the previous deployment.
# Stream real-time edge logs for the current production deployment
vercel logs --follow
# Run the local development server on port 3000
vercel dev --listen 3000
# Instantly revert the production alias to the previous deployment
vercel rollback
Caching, TTL, and propagation implications
Middleware itself is not cached — it runs on every matched request — but the responses it shapes are, so its header decisions directly govern CDN behavior. The rule of thumb: set caching by route sensitivity, never globally. Public assets want long, immutable TTLs; SSR responses benefit from short shared TTLs with background revalidation; authenticated routes must never enter a shared cache.
| Route type | Recommended Cache-Control | Purpose |
|---|---|---|
| Static / public | public, max-age=31536000, immutable | Maximize CDN hit ratio for fingerprinted assets. |
| Dynamic / SSR | public, s-maxage=3600, stale-while-revalidate=86400 | Shared cache with background refresh; no user-blocking miss. |
| Authenticated / API | private, no-store, max-age=0 | Prevent cross-user leakage and stale sessions. |
import { NextRequest, NextResponse } from 'next/server';
export function middleware(req: NextRequest) {
const res = NextResponse.next();
const path = req.nextUrl.pathname;
if (path.startsWith('/api') || path.startsWith('/account')) {
res.headers.set('Cache-Control', 'private, no-store, max-age=0');
res.headers.set('X-Content-Type-Options', 'nosniff');
res.headers.set('Strict-Transport-Security', 'max-age=31536000; includeSubDomains');
} else {
res.headers.set('Cache-Control', 'public, s-maxage=3600, stale-while-revalidate=86400');
}
return res;
}
You can also pin behavior declaratively in vercel.json, which is useful when the same rule must apply regardless of whether middleware runs:
{
"headers": [
{
"source": "/api/(.*)",
"headers": [{ "key": "Cache-Control", "value": "private, no-store" }]
}
]
}
A critical propagation gotcha: if middleware ever issues a redirect on a cacheable path, the CDN can cache the 3xx. A subtle bug that occasionally redirects then becomes sticky across every visitor hitting that PoP until the cache expires. Keep redirects on no-store paths, or gate them so the redirecting branch is never reachable on a cacheable route. The deeper mechanics of shared-cache TTLs and revalidation behavior are covered under the stale-while-revalidate guide.
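One defensive option is to make every middleware-issued redirect explicitly uncacheable, so that even a redirect emitted on a cacheable path cannot stick. The sketch below uses only the Web Response API; uncachedRedirect is a hypothetical helper name, and the default of 307 is an arbitrary choice for illustration.

```typescript
// Hypothetical helper: build a temporary redirect that shared caches
// are told not to store.
function uncachedRedirect(location: string, status: 302 | 307 = 307): Response {
  return new Response(null, {
    status,
    headers: {
      location,
      // Same directive set the table above recommends for sensitive routes.
      'cache-control': 'private, no-store, max-age=0',
    },
  });
}

const res = uncachedRedirect('/login');
console.log(res.status, res.headers.get('cache-control'));
```

In a Next.js project the same effect should be achievable by setting the Cache-Control header on the object returned from NextResponse.redirect(), since it is an ordinary Response with mutable headers.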
Debugging and production observability
Edge debugging is header-driven. console.log() and console.error() stream to vercel logs --follow and the dashboard, but verbose logging on a hot path costs CPU, so log at boundaries (decision taken, branch chosen) rather than per-line. The two headers that matter most are x-vercel-cache — HIT, MISS, STALE, or BYPASS — and x-vercel-id, which encodes the serving PoP and helps you correlate a slow or misrouted request to a specific region.
# Trace cache status and serving PoP for a route
curl -sI https://example.com/docs/intro | grep -i 'x-vercel'
# Watch only error-level edge output
vercel logs --follow | grep -i error
When a request misbehaves, work top-down: confirm the matcher actually included the path (a request that skips middleware shows no marker header), then confirm which branch ran, then inspect the resulting Location/Cache-Control. If you suspect a redirect loop, count hops with a temporary x-redirect-count header and bail when it exceeds a small threshold. For multi-platform incidents — say you front Vercel with another CDN — compare the execution and failover model against Cloudflare Workers Routing so you know which layer owns the decision. If edge latency or error rate breaches SLO, vercel rollback is the fastest mitigation and should be the first move, with root-cause investigation following on the now-stable previous deployment.
Edge cases and gotchas
- Infinite redirect loops poison the CDN and can trip account-level protections. Always compare the target path against the current path before redirecting, and pass through with next() when they match.
- Bundle over the size limit fails the deploy outright. Audit package.json for Node-only modules, prefer Web-API libraries, and tree-shake; a single crypto/fs-dependent import can sink the whole function.
- Undefined req.geo in local dev and behind proxies routes everyone through the fallback branch. Use the nullish-coalescing default and a x-debug-geo override so you can exercise every region.
- Cookies over ~4 KB truncate headers and silently drop sessions. Store only a session ID or compact JWT at the edge and offload heavy state to Edge KV or an external store.
- CPU overrun is input-dependent. A handler that passes in dev can 500 in production on a larger payload or a busier PoP. Bound every parse and loop; never trust that “it worked locally.”
- Caching a redirect on a public path makes a transient bug permanent for that PoP. Keep middleware redirects on no-store routes.
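The redirect-loop guard from the first item can be sketched as a pure function that returns null when the request is already at the target. The name redirectTargetOrNull is hypothetical, and the comparison deliberately looks only at origin and path with trailing slashes ignored, which is an assumption about what counts as "the same place".

```typescript
// Strip trailing slashes so /login and /login/ compare equal; keep "/" as is.
function normalizePath(pathname: string): string {
  return pathname.replace(/\/+$/, '') || '/';
}

// Hypothetical helper: resolve the redirect target against the current URL
// and return null when redirecting would send the client where it already is.
function redirectTargetOrNull(currentUrl: string, targetPath: string): URL | null {
  const current = new URL(currentUrl);
  const target = new URL(targetPath, current);
  const sameOrigin = current.origin === target.origin;
  const samePath = normalizePath(current.pathname) === normalizePath(target.pathname);
  return sameOrigin && samePath ? null : target;
}
```

A caller would redirect only on a non-null result and fall through to next() otherwise. This catches self-redirects but not loops that span two or more distinct paths, which still need the hop-counting approach described under debugging.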
Frequently Asked Questions
Can Vercel Edge Middleware modify DNS records or TTL values?
No. Middleware runs at the application layer after DNS resolution and CDN routing have already happened. DNS records and their TTLs are managed at your registrar or in the Vercel DNS dashboard, not from middleware code.
How do I stop middleware from running on static asset requests?
Scope it with the config.matcher array, excluding paths like /_next/static, /_next/image, /favicon.ico, and any asset prefixes. A tight matcher keeps static delivery on the fast path and cuts unnecessary invocations and cost.
What happens if middleware exceeds the ~50 ms CPU budget?
The request is terminated with a 500. Because CPU cost depends on input size and PoP load, optimize by removing synchronous work, bounding I/O, and moving heavy computation to a regular serverless or origin function. The Vercel Edge vs Cloudflare Workers performance comparison shows where each runtime sits under sustained load.
Is middleware compatible with multi-tenant SaaS and custom domains?
Yes. It executes per request and can read req.headers.get('host'), cookies, or a tenant token to route to tenant-specific origins, inject x-tenant-id, and enforce isolation — all before any rendering happens, which is also why it is the natural place to run A/B testing without a visible flicker.
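For the subdomain-per-tenant case, the host header can be reduced to a tenant slug with a small pure function. This is a sketch under assumptions: tenantFromHost and the example root domain are hypothetical, only a single-label subdomain is accepted, and fully custom domains would need a lookup table that is not shown here.

```typescript
// A single DNS label: letters, digits, and inner hyphens, at most 63 chars.
const DNS_LABEL = /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/;

// Hypothetical helper: extract "acme" from "acme.example.com", or null when
// the host is missing, is the bare root domain, or is not a direct subdomain.
function tenantFromHost(host: string | null, rootDomain: string): string | null {
  if (!host) return null;
  const hostname = host.toLowerCase().split(':')[0]; // drop any port
  const suffix = `.${rootDomain.toLowerCase()}`;
  if (!hostname.endsWith(suffix)) return null;
  const label = hostname.slice(0, -suffix.length);
  return DNS_LABEL.test(label) ? label : null;
}
```

In middleware the input would come from req.headers.get('host'), and a non-null result could be forwarded as x-tenant-id. Any x-tenant-id already present on the inbound request should be discarded first so a client cannot assert its own tenant.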