
Next.js ISR with multiple pods: why revalidatePath only invalidates 1 of 2 pods — and the fan-out pattern via Headless Service that fixes it

Published on May 4, 2026

The problem: ISR cache is per process

The architecture was headless WordPress + Next.js running on Kubernetes. WordPress updated a post, fired a webhook to the external Load Balancer, the LB forwarded to one of the Next.js pods, and the pod called revalidatePath. From WordPress's perspective, invalidation had worked — HTTP 200, success log.

The problem appeared in production intermittently: sometimes users saw new content immediately, sometimes it took hours. The pattern became clear with 2 pods running: half of visits served new content, half served the old cache.

By default, the Next.js ISR cache is local to each server instance: rendered pages are kept in the Node.js process memory and on the pod's local filesystem. revalidatePath invalidates the cache of the instance that received the call. The other pod doesn't know the invalidation happened, so it keeps serving the cached version until its revalidate interval elapses and the page is regenerated.

This is not a Next.js bug. It is the correct behavior of a per-process cache. The problem is that many architectures implicitly assume a single process. With multiple pods, each has its own cache, and out of the box there is no synchronization between them (Next.js does support a custom cacheHandler backed by shared storage, but that is not the default).
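To make the failure mode concrete, here is a toy model in plain JavaScript. Each `PodCache` (a hypothetical class, not Next.js's real cache handler) owns its own `Map`, standing in for one server process, and `revalidate` plays the role of revalidatePath by touching only its own cache.

```javascript
// Toy model of a per-process ISR cache. Each PodCache stands in for one
// Next.js server process; this is NOT how Next.js implements its cache.
class PodCache {
  constructor() {
    this.pages = new Map(); // path -> rendered HTML
  }

  // Serve from the local cache, rendering only on a miss.
  get(path, render) {
    if (!this.pages.has(path)) {
      this.pages.set(path, render());
    }
    return this.pages.get(path);
  }

  // Analogue of revalidatePath: only clears THIS process's entry.
  revalidate(path) {
    this.pages.delete(path);
  }
}

let postBody = "v1";
const render = () => `<h1>${postBody}</h1>`;

const podA = new PodCache();
const podB = new PodCache();

// Both pods warm their caches with v1.
podA.get("/my-post/", render);
podB.get("/my-post/", render);

// WordPress updates the post; the LB delivers the webhook to pod A only.
postBody = "v2";
podA.revalidate("/my-post/");

console.log(podA.get("/my-post/", render)); // <h1>v2</h1>
console.log(podB.get("/my-post/", render)); // <h1>v1</h1>  (stale)
```

Whichever pod the LB picks for a visitor decides which version they see, which is exactly the "half new, half old" pattern observed with 2 pods.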

Why the obvious solution doesn't work

The first idea would be to have WordPress call all pods directly. But there is a network problem: pods use Flannel overlay and have IPs in the 10.244.x.x range — these IPs are not routable outside the cluster.

# Network topology (Flannel overlay)
# WordPress VM (OCI):  10.1.0.x (VCN network, routable)
# Pod A (Next.js):     10.244.1.5 (Flannel overlay, NOT routable outside cluster)
# Pod B (Next.js):     10.244.2.8 (Flannel overlay, NOT routable outside cluster)
# OCI external LB:     <OCI_LB_PUBLIC> (routable, distributes to pods)

# WordPress → 10.244.1.5:3000  ← FAILS: IP not routable outside cluster
# WordPress → <OCI_LB_PUBLIC>   ← OK: accesses 1 random pod via LB

Other approaches were also discarded. A NodePort with dnsmasq on the WordPress VM was blocked by the worker nodes' NSG rules. Valkey pub/sub would solve pod discovery, but each subscribing pod would still need an HTTP call into its own revalidation endpoint to execute revalidatePath: the same complexity, with more components.

The solution must use the cluster network. Pods can call each other via Flannel IPs. The pod that receives the call from WordPress is the entry point — and it has access to the cluster's internal network to propagate the invalidation to other pods.

The solution: Headless Service for pod discovery

A regular Kubernetes Service of type ClusterIP exposes a single virtual IP, and kube-proxy load-balances connections across the pods behind it. To discover the real IPs of all pods, we need a Headless Service: with clusterIP: None, the cluster DNS returns one A record per ready pod instead of a virtual IP.

# k8s/service-headless.yaml
apiVersion: v1
kind: Service
metadata:
  name: nextjs-headless
  namespace: myapp
spec:
  clusterIP: None
  selector:
    app: nextjs
  ports:
    - name: http
      port: 3000
      targetPort: 3000

With this Service created, a DNS query for nextjs-headless.myapp.svc.cluster.local returns one A record for each pod labeled app=nextjs that is Running and Ready (pods failing their readiness probe are left out). With 2 pods, it returns 2 IPs. If HPA scales to 3 pods, it returns 3 IPs automatically, with no additional configuration.

# Verification: how many pods are registered?
kubectl get endpoints nextjs-headless -n myapp
# NAME               ENDPOINTS                         AGE
# nextjs-headless   10.244.1.5:3000,10.244.2.8:3000   2d
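The discovery step can be isolated into a small helper so it is testable outside the cluster. This is a sketch: `discoverPeers` and the fake resolver are hypothetical, and in the route handler you would pass the real `dns.promises.resolve4` from Node.js. Note that when no pod is ready, the real resolver would typically reject (ENOTFOUND or ENODATA) rather than return an empty array, so the caller needs a try/catch.

```javascript
// Hypothetical helper: resolve the Headless Service and drop this pod's own IP.
// `resolve4` is injected so the logic can run outside the cluster; in the
// route handler you would pass the real dns.promises.resolve4.
async function discoverPeers(resolve4, serviceHost, selfIp) {
  const addresses = await resolve4(serviceHost); // one A record per ready pod
  return addresses
    .filter((ip) => ip !== selfIp)
    .sort(); // lexicographic order, only to keep log lines stable
}

// Fake resolver standing in for cluster DNS (assumed data, 2 pods):
const fakeResolve4 = async (host) => {
  if (host !== "nextjs-headless.myapp.svc.cluster.local") {
    throw new Error(`ENOTFOUND ${host}`);
  }
  return ["10.244.2.8", "10.244.1.5"];
};

discoverPeers(fakeResolve4, "nextjs-headless.myapp.svc.cluster.local", "10.244.1.5")
  .then((peers) => console.log(peers)); // [ '10.244.2.8' ]
```

If `selfIp` is undefined (POD_IP not injected), nothing is filtered and the pod would list itself as a peer.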

The fan-out pattern: the primary pod propagates to others

The pod that receives the call from WordPress acts as the primary pod for this request. It does three things, in order:

1. Executes revalidatePath/revalidateTag locally (invalidates its own cache).

2. Resolves nextjs-headless.myapp.svc.cluster.local via DNS → gets list of all pod IPs.

3. Filters its own IP (injected via Downward API) and calls each other pod with ?fanout=1 in parallel (fire-and-forget).

Pods that receive the call with fanout=1 execute only local revalidation and do not propagate — this prevents infinite loops.
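The loop-termination rule can be checked with a small in-memory model of the protocol. This sketch uses no network and hypothetical names: each simulated pod invalidates locally and forwards to its peers only when the `fanout` flag is absent.

```javascript
// In-memory simulation of the fan-out protocol (no network). Each pod
// invalidates locally and, only when it is the primary (no fanout flag),
// forwards the call to every other pod with fanout=1.
function simulateFanOut(podIps, entryIp) {
  const invalidations = new Map(podIps.map((ip) => [ip, 0]));
  let calls = 0;

  function handle(ip, query) {
    calls += 1;
    invalidations.set(ip, invalidations.get(ip) + 1); // revalidatePath analogue
    if (query.fanout === "1") return; // fan-out calls never propagate
    for (const peer of podIps) {
      if (peer !== ip) handle(peer, { ...query, fanout: "1" });
    }
  }

  handle(entryIp, { slug: "my-post" });
  return { calls, invalidations };
}

const result = simulateFanOut(["10.244.1.5", "10.244.2.8", "10.244.3.4"], "10.244.1.5");
console.log(result.calls); // 3
console.log([...result.invalidations.values()]); // [ 1, 1, 1 ]
```

With N pods the webhook produces exactly N invalidations and N - 1 internal calls. Removing the `fanout` check makes the second hop call the entry pod again, and the recursion never terminates.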

// app/src/app/siteapi/revalidate/route.js (simplified)
import dns from "dns/promises";
import { revalidatePath } from "next/cache";

// Always execute at request time, never statically during next build
export const dynamic = "force-dynamic";

const HEADLESS_HOST = "nextjs-headless.myapp.svc.cluster.local";

export async function GET(request) {
  // Read env vars per request, not at module scope
  const secret = process.env.REVALIDATE_SECRET;
  const podIp = process.env.POD_IP;  // injected via Downward API
  const { searchParams } = new URL(request.url);

  if (!secret || searchParams.get("secret") !== secret) {
    return Response.json({ error: "unauthorized" }, { status: 401 });
  }

  const slug = searchParams.get("slug");
  if (!slug) {
    return Response.json({ error: "missing slug" }, { status: 400 });
  }
  const isFanout = searchParams.get("fanout") === "1";

  // Invalidate local cache
  revalidatePath(`/${slug}/`);

  // Fan-out call: local revalidation only, never propagate (prevents loops)
  if (isFanout) {
    console.log(`[revalidate] fanout pod=${podIp} slug=${slug}`);
    return Response.json({ ok: true, pod: podIp, fanout: true });
  }

  console.log(`[revalidate] primary pod=${podIp} slug=${slug}`);

  // Primary pod: propagate to the other pods
  try {
    const addresses = await dns.resolve4(HEADLESS_HOST);
    const otherPods = addresses.filter((ip) => ip !== podIp);
    console.log(`[revalidate] fan-out from ${podIp} to [${otherPods.join(", ")}]`);

    for (const ip of otherPods) {
      // Set URL fields explicitly instead of string replace(): request.url may
      // carry the public https host with an empty port.
      const target = new URL(request.url);
      target.protocol = "http:";
      target.hostname = ip;
      target.port = "3000";
      target.searchParams.set("fanout", "1");

      // Fire-and-forget: the response to WordPress does not wait for this
      fetch(target, { cache: "no-store", signal: AbortSignal.timeout(5000) })
        .then((res) => console.log(`[revalidate] fan-out to ${ip} → ${res.status}`))
        .catch((err) => console.error(`[revalidate] fan-out to ${ip} failed: ${err.message}`));
    }
  } catch (err) {
    console.error("[revalidate] fan-out error:", err.message);
  }

  return Response.json({ ok: true, pod: podIp });
}
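Two details of the fan-out call are worth isolating. First, building the pod URL by string replacement is fragile: when the incoming URL is the public `https://www.mysite.com.br/...`, its `port` is the empty string, and `"...".replace("", "3000")` prepends "3000" to the whole URL. Second, fire-and-forget means WordPress gets a 200 even if a peer is unreachable. The sketch below (hypothetical helper names, `fetch` injectable for testing) sets the URL fields explicitly and shows an awaited alternative based on `Promise.allSettled`.

```javascript
// Build the per-pod URL by setting URL fields explicitly, instead of
// string replace() on hostname/port.
function buildFanoutUrl(requestUrl, podIp, podPort = "3000") {
  const target = new URL(requestUrl);
  target.protocol = "http:"; // pods speak plain HTTP inside the cluster
  target.hostname = podIp;
  target.port = podPort;
  target.searchParams.set("fanout", "1");
  return target.toString();
}

// Alternative to fire-and-forget: wait for every peer (bounded by a timeout)
// and report per-pod status, e.g. to include it in the webhook response.
async function fanOutAndWait(peerIps, requestUrl, fetchImpl = fetch) {
  const results = await Promise.allSettled(
    peerIps.map((ip) =>
      fetchImpl(buildFanoutUrl(requestUrl, ip), {
        cache: "no-store",
        signal: AbortSignal.timeout(5000),
      }),
    ),
  );
  return results.map((r, i) => ({
    ip: peerIps[i],
    ok: r.status === "fulfilled" && r.value.ok,
    status: r.status === "fulfilled" ? r.value.status : String(r.reason),
  }));
}

console.log(
  buildFanoutUrl("https://www.mysite.com.br/siteapi/revalidate?secret=TOKEN&slug=my-post", "10.244.2.8"),
); // http://10.244.2.8:3000/siteapi/revalidate?secret=TOKEN&slug=my-post&fanout=1
```

The trade-off: awaiting makes the webhook response as slow as the slowest pod (up to the 5-second timeout), but it turns a silent partial invalidation into a visible one.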

Injecting POD_IP via Downward API

The pod IP is not available as an environment variable by default. The Kubernetes Downward API allows injecting pod metadata (IP, name, namespace) as environment variables without any API server call.

# k8s/deployment.yaml (excerpt: selector, labels and env vars)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nextjs
  namespace: myapp
spec:
  selector:
    matchLabels:
      app: nextjs
  template:
    metadata:
      labels:
        app: nextjs   # must match the Headless Service selector
    spec:
      containers:
        - name: nextjs
          image: registry.example.com/myapp/nextjs:latest
          ports:
            - containerPort: 3000
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: REVALIDATE_SECRET
              valueFrom:
                secretKeyRef:
                  name: nextjs-secrets
                  key: revalidate-secret

With this configuration, process.env.POD_IP inside the Next.js container returns the Flannel IP of the current pod — exactly what we need to filter the IP list returned by the Headless Service.
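If POD_IP is ever missing (for example, the env entry was dropped from the manifest), the self-filter compares every resolved address against undefined, so the primary pod also calls itself with fanout=1. That is harmless, since it only revalidates again, but it is noisy. A defensive sketch, assuming a hypothetical helper `ownIPv4s`, collects the process's own addresses from the object returned by Node's `os.networkInterfaces()` and uses them as an additional filter.

```javascript
// Hypothetical helper: the set of this process's own non-internal IPv4
// addresses, taken from the object returned by os.networkInterfaces(),
// plus POD_IP when it is available.
function ownIPv4s(interfaces, podIpEnv) {
  const ips = new Set();
  if (podIpEnv) ips.add(podIpEnv);
  for (const addrs of Object.values(interfaces)) {
    for (const a of addrs) {
      // family is the string "IPv4" on current Node versions; some Node 18
      // releases reported the number 4, so accept both.
      const isV4 = a.family === "IPv4" || a.family === 4;
      if (isV4 && !a.internal) ips.add(a.address);
    }
  }
  return ips;
}

// Shape of os.networkInterfaces() output inside a pod (assumed values):
const sampleInterfaces = {
  lo: [{ address: "127.0.0.1", family: "IPv4", internal: true }],
  eth0: [
    { address: "10.244.1.5", family: "IPv4", internal: false },
    { address: "fe80::1", family: "IPv6", internal: false },
  ],
};

console.log([...ownIPv4s(sampleInterfaces, undefined)]); // [ '10.244.1.5' ]
```

In the route handler this would be used as `const self = ownIPv4s(os.networkInterfaces(), process.env.POD_IP)` followed by `addresses.filter((ip) => !self.has(ip))`, with `os` imported from `node:os`.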

The complete flow after implementation

With fan-out implemented, the cache invalidation flow looked like this:

# Event sequence after publish in WordPress:
#
# 1. WordPress saves post → fires GET to external LB
#    GET https://www.mysite.com.br/siteapi/revalidate?secret=TOKEN&slug=my-post
#
# 2. LB routes to a random pod (e.g.: Pod A, 10.244.1.5)
#
# 3. Pod A (primary pod):
#    - revalidatePath("/my-post/")  ← local cache invalidated
#    - dns.resolve4("nextjs-headless.myapp.svc.cluster.local")
#      → [10.244.1.5, 10.244.2.8]
#    - filters 10.244.1.5 (own IP)
#    - fetch("http://10.244.2.8:3000/siteapi/revalidate?...&fanout=1")  ← fire-and-forget
#    - returns 200 to WordPress
#
# 4. Pod B (fanout):
#    - revalidatePath("/my-post/")  ← local cache invalidated
#    - returns 200 (does not propagate)
#
# Result: both pods invalidated in <100ms after save_post

Expected logs in the pods after a save_post:

# kubectl logs -f -n myapp -l app=nextjs --prefix=true | grep revalidate
[pod/nextjs-abc/nextjs] [revalidate] primary pod=10.244.1.5 slug=my-post
[pod/nextjs-abc/nextjs] [revalidate] fan-out from 10.244.1.5 to [10.244.2.8]
[pod/nextjs-abc/nextjs] [revalidate] fan-out to 10.244.2.8 → 200
[pod/nextjs-xyz/nextjs] [revalidate] fanout pod=10.244.2.8 slug=my-post

The critical detail: SECRET compiled into the bundle

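There is an important pitfall in this architecture. The validation SECRET for the revalidation endpoint ended up compiled into the Next.js build instead of being read as a runtime environment variable. Next.js only inlines NEXT_PUBLIC_ variables into the client bundle by design, but a server-side variable can still be frozen at build time: for example, when it is declared under the env key of next.config.js (Next.js substitutes those values during next build), or when it is read in a Route Handler that Next.js evaluates statically at build time.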

Practical consequence: if the revalidation token needs to be rotated, it is not enough to update the Kubernetes Secret and restart the pods. A full rebuild of the Next.js image with the new token compiled in is required.

The fix is to keep the SECRET in a Kubernetes Secret (as the Deployment above already does with secretKeyRef) and make sure the Route Handler reads it at runtime: read process.env inside the handler function rather than at static module scope, and do not declare it under the env key of next.config.js. The safest approach is to also configure the Route Handler with export const dynamic = 'force-dynamic' to guarantee runtime execution.

Generalizing the pattern

The per-process cache problem appears in contexts beyond Next.js ISR. Any in-memory state that needs synchronization between multiple pods faces the same challenge:

In-memory rate limiters: per-IP/user counters are isolated per pod.

Local session stores: a session created on Pod A doesn't exist on Pod B.

Configuration caches: feature flags or configs read from the database and held in memory drift out of sync between pods, because each pod refreshes its copy on its own schedule.

The fan-out pattern via Headless Service works for any of these cases as long as the operation can be triggered via HTTP to each pod individually. For cases where state needs to be truly shared (distributed counters, sessions), the correct solution is to move the state to external storage (Redis/Valkey, database) — not synchronize between pods.

Two pods serving different content to the same user is one of the hardest problems to debug in production — because it is intermittent by nature (depends on which pod the LB routes each request to). The diagnosis starts by checking whether there is any in-memory state that should be shared but isn't.

Diagnostic reference

To investigate stale cache in production with this pattern:

# 1. Check if Headless Service has endpoints
kubectl get endpoints nextjs-headless -n myapp
# If empty: pods are not Running or label selector doesn't match

# 2. Check fan-out logs after saving a post
kubectl logs -f -n myapp -l app=nextjs --prefix=true | grep revalidate

# 3. Test the endpoint directly from inside the pod
kubectl exec -n myapp deployment/nextjs -- sh -c \
  "wget -qO- 'http://localhost:3000/siteapi/revalidate?secret=TOKEN&slug=SLUG'"

# 4. Check if POD_IP is being injected
kubectl exec -n myapp deployment/nextjs -- env | grep POD_IP
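A further check is to ask every pod directly for the same page and compare the answers. The sketch below is a hypothetical diagnostic (`findDivergentPods` is not part of the original setup); it must run from inside the cluster, where the Flannel IPs are reachable, and takes the pod IPs from the Headless Service or `kubectl get endpoints`. It groups pods by the body they return, so more than one group means they are serving different versions. If the page embeds per-request values (nonces, timestamps), compare a narrower marker such as the post's modified date instead of the full body.

```javascript
// Hypothetical diagnostic: fetch the same path from every pod directly and
// group pod IPs by the response body. More than one group = divergent caches.
async function findDivergentPods(podIps, path, fetchImpl = fetch) {
  const groups = new Map(); // body -> [pod IPs]
  await Promise.all(
    podIps.map(async (ip) => {
      const res = await fetchImpl(`http://${ip}:3000${path}`);
      const body = await res.text();
      if (!groups.has(body)) groups.set(body, []);
      groups.get(body).push(ip);
    }),
  );
  return [...groups.values()];
}

// Usage inside the cluster (pod IPs as listed by the endpoints check above):
// findDivergentPods(["10.244.1.5", "10.244.2.8"], "/my-post/").then(console.log);
```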

