
131 PHP-FPM crashes in 10 minutes: how a crawler locked the server via systemd-coredump (not via PHP)

Published on April 27, 2026

The high CPU alert that wasn't a CPU attack

Grafana fired a high CPU alert on a WordPress publisher. I connected to the server and ran the initial diagnosis.

ps aux --sort=-%cpu | head -12

The output showed 7 `systemd-coredump` processes at the top, collectively consuming 47% CPU. No PHP-FPM processes with high CPU. No abnormal nginx processes. The 'CPU attack' was the crash handler, not PHP.

When you see high CPU on systemd-coredump, the incident has already happened. systemd is compressing the remains of what crashed. The real question is: what generated 131 core dumps in 10 minutes?

The 131 core dumps and 4 GB of disk

Checking the core dump directory confirmed the scale of the problem:

ls -lh /var/lib/systemd/coredump/ | grep 'php-fpm' | wc -l
# 131

du -sh /var/lib/systemd/coredump/
# 4.0G

ls -lh /var/lib/systemd/coredump/ | grep 'php-fpm' | head -5
# core.php-fpm.33.abc1234.1234567890.xz  31M
# core.php-fpm.33.abc1235.1234567891.xz  31M
# core.php-fpm.33.abc1236.1234567892.xz  31M

131 core dump files, each approximately 31 MB after compression by systemd-coredump, totaling 4 GB. All with UID 33 — which is www-data, the PHP-FPM worker user. Disk usage had jumped from 39% to 50% in under 10 minutes.
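The per-user breakdown can be read straight from the file names. The following is a minimal sketch, assuming the names follow the core.<command>.<uid>.<...> shape shown in the listing above. summarize_coredumps is a hypothetical helper, not part of systemd, and a command name that itself contains a dot would need a different split.

```shell
# Summarise core dump file names read from stdin as "<count> <command> uid=<uid>".
# Assumption: names look like core.<command>.<uid>.<...>, as in the listing above.
summarize_coredumps() {
    awk -F. '$1 == "core" { n[$2 " uid=" $3]++ }
             END { for (k in n) print n[k], k }' | sort -rn
}

# Usage on the live directory:
#   ls /var/lib/systemd/coredump/ | summarize_coredumps
```

A single dominant line for php-fpm with uid=33 would match what the listing showed by hand.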

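systemd-coredump compresses each core dump in the background (lz4, xz, or zstd depending on the build; the .xz suffix here shows xz). The 131 crashes were spread over 10 minutes rather than simultaneous, but they arrived faster than the dumps could be processed, so 7 compression processes were running in parallel and consuming nearly half the server's CPU capacity while working through the dump queue.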

The root cause: crawler from range <crawler-range>/24

Investigation in the nginx error.log revealed the pattern:

grep '<crawler-prefix>' /var/log/nginx/error.log | head -10
# 2026/03/03 14:20:15 [error] upstream timed out (110) GET /post-about-dogs client: <crawler-IP-1>
# 2026/03/03 14:20:17 [error] upstream timed out (110) GET /another-post client: <crawler-IP-1>
# 2026/03/03 14:20:19 [error] upstream timed out (110) GET /tag/animals client: <crawler-IP-1>
# ...95 timeout entries from the same range
# 2026/03/03 14:33:04 [error] ModSecurity: Access denied 403 GET /.env client: <crawler-IP-2>

Starting at 14:20, a crawler from the `<crawler-range>/24` range launched a parallel sweep against WordPress posts, tag pages (/tag/), and category pages (/categoria/). The site used GoCache CDN, but this category of URLs — tags and categories — had a cold cache. Each request reached PHP-FPM uncached.
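To see which range is behind the timeouts without knowing the prefix in advance, the error log can be grouped by /24. This is a rough sketch, assuming each line contains "upstream timed out" and a "client: <IPv4>" field as in the excerpt above. timeouts_by_subnet is a hypothetical helper and it ignores IPv6 clients.

```shell
# Count "upstream timed out" entries per client /24, read from stdin.
# Assumption: each line carries "client: a.b.c.d" (IPv4 only).
timeouts_by_subnet() {
    grep 'upstream timed out' |
        sed -n 's|.*client: \([0-9]*\.[0-9]*\.[0-9]*\)\.[0-9]*.*|\1.0/24|p' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}

# Usage:
#   timeouts_by_subnet < /var/log/nginx/error.log | head -5
```

The range at the top of the output is the first candidate to check against the access pattern.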

Why PHP-FPM crashed

The WordPress publisher had an 8-vCPU server with PHP-FPM configured with multiple workers. Under a bombardment of parallel requests to uncached pages, workers began colliding: several workers tried to generate the same heavy tag page at the same time and exhausted memory before finishing. Reaching memory_limit on its own normally ends the request with a PHP fatal error rather than a core dump. The dumps here mean the worker processes themselves were terminated by a signal (SIGSEGV), which is what hands the process image to systemd-coredump.

The /.env access attempt at 14:33 confirmed the crawler's profile: it wasn't a legitimate indexing bot. It was reconnaissance scanning with credential exposure attempts.

The degradation loop

The sequence that led to load 15.18 on an 8-vCPU server:

14:20 — Crawler starts parallel requests to /tag/ and /category/

14:20-14:30 — PHP-FPM workers exhaust memory and crash (131 occurrences)

14:20+ — systemd-coredump generates 131 files of ~31 MB each = 4 GB on disk

14:30 — Load peaks at 15.18 (about 30x the normal ~0.5 for this server, and nearly 2x its 8 vCPUs)

14:31 — Crawler backs off / workers stop crashing

14:38 — Load drops to 0.82

14:44 — Load normalized: 0.02
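Load average is easier to judge relative to the number of CPUs. The sketch below uses the figures from the timeline above; load_per_cpu is a hypothetical helper.

```shell
# Print load average divided by CPU count, to two decimals.
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

load_per_cpu 15.18 8   # peak at 14:30, prints 1.90

# Live values on Linux (assumes /proc/loadavg and nproc are available):
#   load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```

A value above 1.00 means more runnable work than the CPUs can absorb, which is where the server sat at the peak.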

The response actions

1. Clear the core dumps (priority: free CPU and disk)

With 7 compression processes still running, the first action was to break the cycle and free resources:

# Remove all php-fpm core dumps
sudo rm -f /var/lib/systemd/coredump/core.php-fpm.*

# Result:
# Disk: 50% → 39%
# systemd-coredump CPU: 47% → 0%
# Load began dropping immediately
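A more cautious variant lists what would be removed before deleting anything. This is a minimal sketch; purge_coredumps is a hypothetical helper, and it assumes a find that supports -maxdepth (GNU and BSD both do).

```shell
# purge_coredumps <dir> <name-pattern> [delete]
# Without "delete" it only prints the matching files (dry run).
purge_coredumps() {
    if [ "${3:-dry-run}" = "delete" ]; then
        find "$1" -maxdepth 1 -type f -name "$2" -exec rm -f {} +
    else
        find "$1" -maxdepth 1 -type f -name "$2" -print
    fi
}

# Usage:
#   sudo sh -c '. ./purge.sh; purge_coredumps /var/lib/systemd/coredump "core.php-fpm.*"'
#   then repeat with a third argument "delete" once the list looks right
```

The file name purge.sh is only a placeholder for wherever the function is saved. Keeping one or two dumps aside before deleting leaves something to analyze later.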

2. Block the crawler range

# Block the attacker range via iptables
sudo iptables -I INPUT -s <CRAWLER_CIDR> -m comment --comment 'crawler-ban: php-fpm crash 2026-03-03' -j DROP

# Verify the rule was applied
sudo iptables -L INPUT -n | grep '<crawler-prefix>'
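Before inserting a rule, it can help to validate the range and print the exact command for review. This is a minimal sketch: is_ipv4_cidr and ban_command are hypothetical helpers, 203.0.113.0/24 is a documentation range standing in for the real one, and nothing here touches the firewall.

```shell
# Succeed only for well-formed IPv4 CIDR notation such as 203.0.113.0/24.
is_ipv4_cidr() {
    printf '%s\n' "$1" | awk -F'[./]' '
        NF != 5 { exit 1 }
        {
            for (i = 1; i <= 4; i++)
                if ($i !~ /^[0-9]+$/ || $i > 255) exit 1
            if ($5 !~ /^[0-9]+$/ || $5 > 32) exit 1
        }'
}

# Print (not run) the iptables command for a validated range.
ban_command() {
    is_ipv4_cidr "$1" || { echo "invalid CIDR: $1" >&2; return 1; }
    printf "iptables -I INPUT -s %s -m comment --comment '%s' -j DROP\n" "$1" "$2"
}

ban_command 203.0.113.0/24 'crawler-ban'
```

Printing instead of executing leaves room for a second pair of eyes before a DROP rule lands on a production host.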

The server normalized completely within 6 minutes of clearing the dumps and banning the range: load fell from 15.18 to 0.02, PHP-FPM was running 10 healthy workers, and the site was responding in 0.32s.

The correct diagnosis vs the initial diagnosis

The alert said 'high CPU'. The natural initial diagnosis would be: volumetric attack, stuck process, PHP consuming CPU. None of those were right.

# Process state during the incident (reconstructed from logs)
# Top 5 by CPU at peak (14:30):
#
# PID    USER     %CPU  COMMAND
# 12341  systemd  9.2   systemd-coredump  ← compressing dump 1
# 12342  systemd  8.8   systemd-coredump  ← compressing dump 2
# 12343  systemd  8.7   systemd-coredump  ← compressing dump 3
# 12344  systemd  7.9   systemd-coredump  ← compressing dump 4
# 12345  systemd  7.4   systemd-coredump  ← compressing dump 5
# ...
# No php-fpm processes with high CPU — because they had already crashed

The high CPU was in the crash handler, not in PHP. This is a common misdiagnosis pattern: systemd-coredump processes dumps in the background and appears at the top of ps/top as if it were the attacker, when in reality it's the cleanup service. The attacker has already finished its work.
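This pattern can be checked in one step by summing the CPU share of the crash handler. The sketch below assumes input in the shape of ps -eo pcpu,args; coredump_cpu is a hypothetical helper. It matches on "systemd-coredum" because the short process name is truncated to 15 characters on Linux.

```shell
# Sum %CPU of systemd-coredump processes from "pcpu args" lines on stdin.
coredump_cpu() {
    awk '$2 ~ /systemd-coredum/ { n++; cpu += $1 }
         END { printf "%d processes, %.1f%% CPU\n", n, cpu }'
}

# Usage:
#   ps -eo pcpu,args | coredump_cpu
```

A non-zero count alongside idle PHP-FPM workers points at a crash cascade rather than a CPU-bound attack.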

Why cold cache on /tag/ and /category/ was decisive

GoCache CDN was configured in forward mode for these URLs. Under normal conditions, tag and category pages are visited organically by few users, and the handful of pages they request stay warm in the cache. A crawler requesting hundreds of unique tag URLs in parallel finds no cache: each URL is new to the CDN.

With cold cache, each request reached PHP-FPM. A WordPress tag page can be heavy — multiple database queries (posts in the tag, sidebar, related content), PHP rendering a full template. Under 50+ parallel requests to different tags, memory pressure on workers is proportional to the number of simultaneous requests.

The combination of aggressive crawler + cold cache + enabled systemd-coredump creates a silent failure that looks like a CPU attack but is actually a crash cascade. The server isn't overloaded by traffic — it's overloaded cleaning up the crash aftermath.
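Since memory pressure grows with the number of simultaneous requests, a common rule of thumb is to size the PHP-FPM worker count so that every worker can reach its typical peak at once. This is a sketch only: the numbers below are hypothetical (the article does not state the server's RAM or per-worker footprint), and max_children is an illustrative helper.

```shell
# max_children <total_mb> <reserved_mb> <per_worker_mb>
# Workers that fit in RAM after reserving memory for the OS, database, and nginx.
max_children() {
    echo $(( ($1 - $2) / $3 ))
}

# Hypothetical host: 16 GB RAM, 4 GB reserved, 256 MB per worker at peak.
max_children 16384 4096 256   # prints 48
```

The result is a starting point for pm.max_children, to be adjusted after measuring real worker memory under load.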

Missing protections and what to implement

This server had ModSecurity active (the /.env access was blocked with 403), but the following protections were absent or disabled:

Bot mitigation on GoCache: status false. With bot mitigation active, the crawler would have been blocked at the edge before reaching the origin.

Rate limiting on GoCache: status false. Per-IP request throttling at the edge is the first line of defense against aggressive crawlers.

CrowdSec bouncer: Docker container running, but host bouncer inactive. CrowdSec detects the sweep pattern and bans automatically — without an active bouncer, detection doesn't produce blocking.

PHP-FPM pm.max_children: no limit configured to prevent a crash loop from exhausting resources. Setting MaxUse in coredump.conf also limits the disk impact.

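# Limit total core dump size in systemd
# /etc/systemd/coredump.conf
# Comments go on their own lines: systemd does not parse trailing comments after a value.
[Coredump]
Storage=external
Compress=yes
ProcessSizeMax=2G
ExternalSizeMax=2G
# max 1 GB total in /var/lib/systemd/coredump/
MaxUse=1G
# keep at least 1 GB free on the filesystem
KeepFree=1G

# No reload or reboot is required: systemd-coredump reads this file each time it
# handles a crash, so the limits apply from the next dump onward.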

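With MaxUse=1G, systemd-coredump removes the oldest dumps once the directory exceeds the limit, so a 131-crash burst cannot fill the disk. The cap applies to storage only: each new dump is still written and compressed, so cutting the compression CPU as well would take a lower ProcessSizeMax or Storage=none.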
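As a rough check of what the cap means for this incident, the arithmetic can be written out. The sketch takes the ~31 MB per dump seen earlier and treats 1G as 1024 MB; dumps_retained is a hypothetical helper.

```shell
# dumps_retained <max_use_mb> <dump_mb>: whole dumps that fit under the cap.
dumps_retained() {
    echo $(( $1 / $2 ))
}

dumps_retained 1024 31   # prints 33
```

Under these assumptions, roughly 33 of the 131 dumps would have remained on disk at any time, and the rest would have been rotated out.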

Final state and lesson

Six minutes after identifying the real cause, the server was normalized. The sequence:

rm -f /var/lib/systemd/coredump/core.php-fpm.* — 4 GB freed, CPU normalized

iptables -I INPUT -s <crawler-range>/24 -j DROP — crawler blocked

The server returned to its pre-incident state with no restart required. The healthy PHP-FPM workers that hadn't crashed continued serving traffic normally throughout the entire response process.

When ps/top shows systemd-coredump at the top with high CPU, don't try to kill systemd-coredump. Identify the process that crashed (the dump's UID points to the user), find out what caused the crash, and only then clean up the dumps. Killing the handler without understanding the cause leaves the disk full and the incident undiagnosed.

Related articles

141 OWASP rules active, zero false positives: configuring OCI WAF for WordPress

ModSecurity blocked its own CDN: when the WAF doesn't know it's behind Akamai and bans the edge nodes

Malicious WordPress redirect with zero infected files: how to diagnose DNS hijack in 5 minutes before wiping the server

wp-login taking 1 minute: how auth_basic behind a CDN creates an invisible 401 loop in nginx