
168,551 requests/day saturating PHP-FPM: solved with nginx srcache + Valkey via stunnel

Published on March 31, 2026


The problem: ISR cascade at scale

The portal used Next.js with ISR (Incremental Static Regeneration) to serve a glossary with tens of thousands of terms. With `revalidate: 60`, each page became stale 60 seconds after it was generated, and the next request to a stale page triggers a background regeneration. Across the 36,508 glossary pages, this created a continuous, uninterrupted stream of calls to the internal WordPress API.

The access pattern to the internal endpoint looked like this:

GET /wp-json/api/v1/glossary?per_page=100&page=1
GET /wp-json/api/v1/glossary?per_page=100&page=2
...
GET /wp-json/api/v1/glossary?per_page=100&page=36508

With `revalidate: 60`, each of the 36,508 pages became eligible for regeneration again 60 seconds after its last regeneration, so the cycle of API calls restarted continuously. On peak days, the volume reached 168,551 requests per day for this endpoint alone. With 150–201 PHP-FPM workers available, the socket returned `Resource temporarily unavailable` less than 60 seconds into each cycle. Bursts lasted 1–2 minutes and recurred every 60–90 minutes.
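To see why the request volume depends on traffic as much as on the `revalidate` value, here is a simplified model of ISR's time-based revalidation for a single page. It is a sketch, not Next.js's implementation: it assumes regeneration completes instantly and that the first request arriving after the window expires triggers exactly one WordPress API call. The function name and the request timestamps are hypothetical.

```python
from typing import Iterable


def count_regenerations(request_times: Iterable[float], revalidate: float,
                        generated_at: float = 0.0) -> int:
    """Count regenerations of ONE page under a simplified ISR model.

    Assumptions (not Next.js internals): regeneration completes instantly, and
    the first request arriving `revalidate` seconds or more after the last
    generation serves the stale page and triggers one regeneration, i.e. one
    call to the WordPress API.
    """
    regenerations = 0
    last_generated = generated_at
    for t in sorted(request_times):
        if t - last_generated >= revalidate:
            regenerations += 1      # stale copy served, background regeneration fired
            last_generated = t      # the fresh copy dates from this request
    return regenerations


if __name__ == "__main__":
    # Hypothetical traffic: one page requested every 30 s for 24 h (e.g. by a crawler).
    hits = [i * 30.0 for i in range(2880)]
    print(count_regenerations(hits, revalidate=60))     # 1439
    print(count_regenerations(hits, revalidate=3600))   # 23
```

In this model, a page with sparse traffic regenerates on nearly every request whatever `revalidate` is, while steady traffic is capped at roughly one API call per page per window.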

p-queue with concurrency 10 was tried first and didn't solve it: with 36k pages and `revalidate: 60`, the queue never empties before the next cycle starts. The total request volume doesn't drop; only the concurrency peak is smoothed.
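The reason a concurrency cap alone cannot help is throughput arithmetic (Little's law): with `c` concurrent slots and an average upstream latency of `L` seconds, the queue drains at most `c / L` requests per second, so a batch of `N` requests needs at least `N * L / c` seconds. The sketch below assumes the full set of pages falls due in each cycle; the 200 ms latency is a hypothetical placeholder, not a measurement from this incident.

```python
def min_drain_seconds(n_requests: int, concurrency: int, avg_latency_s: float) -> float:
    """Lower bound on the time a concurrency-limited queue needs to drain a batch.

    Throughput is capped at concurrency / latency requests per second
    (Little's law), so draining n requests takes at least n * latency / concurrency.
    """
    if concurrency <= 0 or avg_latency_s <= 0:
        raise ValueError("concurrency and latency must be positive")
    return n_requests * avg_latency_s / concurrency


def queue_keeps_up(n_requests: int, concurrency: int, avg_latency_s: float,
                   cycle_s: float) -> bool:
    """True if one batch can drain before the next cycle enqueues another one."""
    return min_drain_seconds(n_requests, concurrency, avg_latency_s) <= cycle_s


if __name__ == "__main__":
    # 36,508 pages, concurrency 10 (as in the p-queue attempt), hypothetical 200 ms per call.
    drain = min_drain_seconds(36_508, 10, 0.2)
    print(f"{drain:.0f} s to drain vs a 60 s cycle")   # 730 s to drain vs a 60 s cycle
    print(queue_keeps_up(36_508, 10, 0.2, 60))         # False
```

Raising the concurrency only moves load back onto PHP-FPM; the lever that actually reduces work is serving repeated requests without calling PHP at all.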

The solution: nginx srcache + Valkey via stunnel

The cache architecture has three layers. nginx and stunnel run on the same host (OCI VM); Valkey runs in the managed OCI Cache cluster:

Next.js ISR → nginx (OpenResty) with srcache → stunnel (127.0.0.1:6379) → OCI Cache Valkey (TLS)

Nginx with the srcache module intercepts requests to /wp-json/ before they reach PHP-FPM. If the response is in Valkey, nginx returns it directly and PHP is never called. On a cache miss, the request goes to PHP, the response is stored in Valkey, and subsequent identical requests are served from cache.
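The fetch/store flow srcache implements is the cache-aside pattern. A minimal sketch in Python, with a dict standing in for Valkey and a hypothetical `php_backend` function standing in for PHP-FPM; the key has the same shape as the `$scheme$host$request_uri` key in the nginx config, so every `page=N` gets its own entry:

```python
from typing import Callable, Dict, Tuple


def make_cached_handler(store: Dict[str, bytes],
                        backend: Callable[[str], bytes]) -> Callable[[str, str, str], Tuple[bytes, str]]:
    """Wrap a backend with cache-aside logic similar in spirit to srcache fetch/store."""
    def handle(scheme: str, host: str, request_uri: str) -> Tuple[bytes, str]:
        key = f"{scheme}{host}{request_uri}"   # same shape as $scheme$host$request_uri
        cached = store.get(key)
        if cached is not None:
            return cached, "HIT"               # served from cache, backend not called
        body = backend(request_uri)            # cache miss: call PHP
        store[key] = body                      # stored for subsequent identical requests
        return body, "MISS"
    return handle


if __name__ == "__main__":
    calls = []

    def php_backend(uri: str) -> bytes:        # hypothetical stand-in for PHP-FPM
        calls.append(uri)
        return f"json for {uri}".encode()

    valkey: Dict[str, bytes] = {}
    handle = make_cached_handler(valkey, php_backend)
    for _ in range(3):
        for page in (1, 2):
            handle("https", "portal.example.com",
                   f"/wp-json/api/v1/glossary?per_page=100&page={page}")
    print(len(calls), len(valkey))             # 2 2: one backend call per distinct page
```

Six requests produce only two backend calls: the first request for each page misses, and every repeat is a hit.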

Why stunnel?

OCI Cache requires a TLS connection. The redis_pass and redis2_pass directives that OpenResty uses to talk to Redis don't support TLS natively; they connect in plaintext. The solution is stunnel running as a local proxy: nginx connects to 127.0.0.1:6379 in plaintext, and stunnel encrypts the traffic and forwards it over TLS to the OCI Cache FQDN on port 6379.

stunnel configuration

/etc/stunnel/redis-oci.conf

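[redis-oci]
; client mode: accept plaintext locally, speak TLS to the remote side
client  = yes
accept  = 127.0.0.1:6379
connect = <oci-cache-fqdn>.redis.sa-saopaulo-1.oci.oraclecloud.com:6379
; no certificate chain verification: the link is encrypted, but the server is not authenticated
verifyChain = no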
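For reference, this is roughly what the client side of stunnel does, expressed with Python's standard `ssl` module: open a TCP connection to the FQDN and wrap it in TLS. The `verify` flag is a hypothetical parameter that contrasts the configuration above (`verifyChain = no`) with full chain and hostname verification. Verification assumes the OCI Cache certificate chains to a CA in the system trust store, which is not confirmed here; the FQDN stays the same placeholder as in the config.

```python
import socket
import ssl


def make_client_context(verify: bool) -> ssl.SSLContext:
    """Build a TLS client context.

    verify=False mirrors `verifyChain = no`: traffic is encrypted but any
    certificate is accepted. verify=True checks the chain against the system
    CA store and the hostname, which only works when connecting via the FQDN.
    """
    ctx = ssl.create_default_context()      # CERT_REQUIRED + check_hostname by default
    if not verify:
        ctx.check_hostname = False          # must be disabled before setting CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def open_tls(host: str, port: int, verify: bool = True) -> ssl.SSLSocket:
    """Connect to host:port and complete a TLS handshake (the caller closes the socket)."""
    raw = socket.create_connection((host, port), timeout=5)
    try:
        return make_client_context(verify).wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()                         # don't leak the TCP socket on handshake failure
        raise


if __name__ == "__main__":
    ctx = make_client_context(verify=True)
    print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)   # True True
```

`open_tls` is not called here because it needs network access to the cache endpoint.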

Enabling as a persistent systemd service:

systemctl enable stunnel@redis-oci --now

Connectivity test before configuring nginx:

redis-cli -h 127.0.0.1 -p 6379 PING  # should return PONG
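If `redis-cli` is not installed on the VM, the same check can be made with a raw socket. This sketch encodes the command in RESP (the Redis serialization protocol, which Valkey also speaks) and reads the first reply line; like nginx, it talks plaintext to the local stunnel port. The function names are hypothetical.

```python
import socket


def encode_command(*parts: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(parts)}\r\n".encode()]
    for part in parts:
        data = part.encode()
        out.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(out)


def ping(host: str = "127.0.0.1", port: int = 6379, timeout: float = 3.0) -> str:
    """Send PING through the local stunnel endpoint and return the first reply line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_command("PING"))
        reply = b""
        while not reply.endswith(b"\r\n"):
            chunk = sock.recv(64)
            if not chunk:                   # connection closed before a full line arrived
                break
            reply += chunk
    return reply.decode().strip()


if __name__ == "__main__":
    print(encode_command("PING"))           # b'*1\r\n$4\r\nPING\r\n'
```

On the VM, `ping()` should return `+PONG`, the raw form of the `PONG` that redis-cli prints.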

nginx srcache configuration (OpenResty)

The critical point is including redis.conf instead of wpfc.conf. The fundamental difference:

wpfc.conf (fastcgi_cache) skips the cache whenever $query_string is non-empty, so requests with ?per_page=100&page=N are NOT cached.

redis.conf (srcache) has no query-string skip, so each ?per_page=100&page=N is cached individually. This is exactly the behavior ISR needs.
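The difference boils down to one predicate. A minimal sketch of the two policies, modeling only the query-string rule described above; any other skip rules the include files may contain (cookies, request method) are not modeled, and the function names are hypothetical:

```python
from urllib.parse import urlsplit


def cacheable_wpfc(uri: str) -> bool:
    """wpfc.conf-style policy: any non-empty query string skips the cache."""
    return urlsplit(uri).query == ""


def cacheable_srcache(uri: str) -> bool:
    """redis.conf-style policy: the query string is part of the key, not a reason to skip."""
    return True


if __name__ == "__main__":
    uris = [f"/wp-json/api/v1/glossary?per_page=100&page={n}" for n in range(1, 36_509)]
    print(sum(map(cacheable_wpfc, uris)), sum(map(cacheable_srcache, uris)))   # 0 36508
```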

The srcache configuration block for /wp-json/:

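location ^~ /wp-json/ {
    # One cache entry per scheme + host + full URI, query string included
    set $key          "$scheme$host$request_uri";
    # URI-escape the key: it contains ? and &, and travels as the key= argument of /redis-store
    set_escape_uri $escaped_key $key;

    # $skip_cache is set outside this block (e.g. requests with an auth cookie)
    srcache_fetch_skip             $skip_cache;
    srcache_store_skip             $skip_cache;
    srcache_response_cache_control off;
    srcache_fetch GET  /redis-fetch $key;
    srcache_store PUT  /redis-store key=$escaped_key;

    # Debug headers: fetch status HIT/MISS/BYPASS, store status STORE/BYPASS
    more_set_headers 'X-SRCache-Fetch-Status: $srcache_fetch_status';
    more_set_headers 'X-SRCache-Store-Status: $srcache_store_status';

    # No try_files: hand the request straight to PHP-FPM.
    # WordPress routes /wp-json/ through index.php (assumes WordPress is at the document root).
    fastcgi_pass php;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root/index.php;
}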
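The `key=$escaped_key` argument has to carry a URI-escaped key, because `$request_uri` contains `?` and `&`. Assuming the store location reads the key back from `$arg_key` (as in the srcache documentation's examples, via `set_unescape_uri $key $arg_key`), an unescaped key is cut at the first `&`, so stores and fetches would no longer agree on the key. Python's `urllib.parse` reproduces the argument parsing; this illustrates the effect and is not nginx code.

```python
from urllib.parse import parse_qs, quote

# Same shape as "$scheme$host$request_uri" in the nginx block
key = "httpsportal.example.com/wp-json/api/v1/glossary?per_page=100&page=1"

unescaped_args = "key=" + key                   # key passed as-is
escaped_args = "key=" + quote(key, safe="")     # URI-escaped, comparable to set_escape_uri

print(parse_qs(unescaped_args)["key"][0])       # ...glossary?per_page=100 (page=1 is lost)
print(parse_qs(escaped_args)["key"][0] == key)  # True
```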

The Redis upstream points to the local stunnel:

upstream redis {
    server 127.0.0.1:6379;
    keepalive 512;
}

Critical trap: try_files breaks everything

The first version of the /wp-json/ block used `try_files $uri $uri/ /index.php?$args` — a common pattern in WordPress configs. This caused a 301 redirect to / on all GET requests to /wp-json/: the chain was try_files → internal redirect to /index.php → `location = /index.php { return 301 /; }` defined in redis.conf.

Never use try_files in locations that don't serve static files. For /wp-json/, use fastcgi_pass directly.

Trap: testing cache with curl -sI

During validation, testing with `curl -sI` (a HEAD request) always shows `X-SRCache-Store-Status: BYPASS`: a HEAD response has no body, so srcache has nothing to store. The correct test uses GET:

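# Quote the URL: an unquoted & would send curl to the background and cut the query string
# %header{...} in -w requires curl 7.84.0 or newer

# First request: MISS (PHP called, response stored)
curl -s 'https://portal.example.com/wp-json/api/v1/glossary?per_page=100&page=1' \
  -o /dev/null -w '%{http_code} %header{X-SRCache-Fetch-Status}\n'
# 200 MISS

# Second request: HIT (Valkey, PHP not called)
curl -s 'https://portal.example.com/wp-json/api/v1/glossary?per_page=100&page=1' \
  -o /dev/null -w '%{http_code} %header{X-SRCache-Fetch-Status}\n'
# 200 HIT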

WP Redis Object Cache on the same database

WordPress was also configured with Redis Object Cache (plugin) pointing to the same Valkey instance via stunnel, using database 0 — the same one as nginx srcache.

// wp-config.php
define( 'WP_REDIS_HOST',     '127.0.0.1' );
define( 'WP_REDIS_PORT',     6379 );
define( 'WP_REDIS_DATABASE', 0 );

The decision to share database 0 was intentional: in emergency situations, a FLUSHDB via the WP Redis plugin clears both the object cache and the nginx cache — desirable behavior when immediate complete invalidation is needed.

OCI network configuration

The WordPress VM and the OCI Cache cluster are in different subnets. It was necessary to add ingress rules to the cache cluster's Security List allowing TCP 6379 from the VM's subnets:

# VM subnet (public):   10.1.0.0/24
# OCI Cache subnet (private): 10.1.1.0/24
# Rule added: Ingress TCP 6379 from 10.1.0.0/24
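Before adding a Security List rule, it can help to confirm that the source address really falls inside the CIDR being allowed. A sketch with Python's standard `ipaddress` module; the CIDRs are the ones above, while the VM address and the helper name are hypothetical:

```python
import ipaddress

VM_SUBNET = ipaddress.ip_network("10.1.0.0/24")      # ingress source allowed on TCP 6379
CACHE_SUBNET = ipaddress.ip_network("10.1.1.0/24")   # OCI Cache private subnet


def allowed_by_rule(source_ip: str, allowed_cidr: ipaddress.IPv4Network) -> bool:
    """True if the source IP lies inside the CIDR of the ingress rule."""
    return ipaddress.ip_address(source_ip) in allowed_cidr


if __name__ == "__main__":
    vm_ip = "10.1.0.15"                               # hypothetical VM private IP
    print(allowed_by_rule(vm_ip, VM_SUBNET))          # True
    print(VM_SUBNET.overlaps(CACHE_SUBNET))           # False: the two subnets are disjoint
```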

Important: always use the OCI Cache FQDN, never the private IP. Private IPs of OCI managed services can change. The FQDN is automatically updated by the service.

The result

After the configuration, the cache hit rate reached 93% on the first day and PHP-FPM stopped saturating. The 168,551 requests/day still reach nginx, but 93% of them are answered directly from Valkey without touching PHP.
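The residual load on PHP follows directly from the hit rate. A quick calculation with the figures above (168,551 requests/day at a 93% hit rate), assuming requests are spread evenly over the day, which bursty ISR traffic is not, so the per-minute figure is only an average:

```python
REQUESTS_PER_DAY = 168_551
HIT_RATE = 0.93

misses_per_day = REQUESTS_PER_DAY * (1 - HIT_RATE)   # requests that still reach PHP-FPM
misses_per_minute = misses_per_day / (24 * 60)       # evenly spread, as assumed above

print(f"{misses_per_day:,.0f} PHP calls/day, ~{misses_per_minute:.1f}/min on average")
# 11,799 PHP calls/day, ~8.2/min on average
```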

X-SRCache-Fetch-Status: HIT → Valkey responded, PHP was not called

X-SRCache-Fetch-Status: MISS → key not in Valkey (for example, the first request), PHP called, response cached

X-SRCache-Store-Status: BYPASS → request with auth cookie, not cached (correct behavior)

The definitive fix belongs in the application code: increasing `revalidate` in Next.js from 60s to 3600s would reduce the load from ~168k to ~36k requests/day. Until the development team ships that change, the infrastructure absorbs the load without degradation.

Stack summary

OpenResty (nginx + srcache + redis2 modules) — intercepts /wp-json/ requests before PHP

stunnel — local TLS proxy (127.0.0.1:6379 → OCI Cache FQDN:6379)

OCI Cache (Valkey 7.2) — managed Redis-compatible cluster, $19/month, sa-saopaulo-1 region

WP Redis Object Cache — database 0, same cluster, unified invalidation

Total cache infra cost: $19/month. Savings in PHP-FPM and CPU are not directly measurable, but the cache avoided the horizontal scaling of the OKE cluster that was under consideration before this solution.

