Buzeli

Cloud Armor active, 139 attacks blocked in 30 days — but zero real traffic passing through: how direct VM DNS bypasses GCP's WAF

Published on May 3, 2026

The infrastructure that looked protected

The client's GCP infrastructure had everything in the right place: 6 Compute Engine instances in us-west1-c, a Cloud SQL MySQL 8.0 with 200 GB SSD, and a Cloud Load Balancer at IP 198.51.100.19. The Cloud Armor policy was attached to the LB backend, with active OWASP rules — and the logs showed activity: 139 blocks in 30 days, all real attacks (LFI, PHP RCE attempts, scanners).

The first read was reassuring. The WAF was working, blocking, with no false positives. Until the DNS analysis.

# DNS verified via Cloudflare API
# www.client-media.com.br → 198.51.100.200 (direct VM IP, proxied=True)
# client-media.com.br     → 198.51.100.200 (direct VM IP, proxied=True)
# GCP LB at: 198.51.100.19 — NOT in the current traffic path

The GCP Load Balancer, with Cloud Armor applied, was at IP 198.51.100.19. DNS pointed to 198.51.100.200 — the direct VM IP. Real user traffic never passed through the LB. The 139 blocks in the logs came from scanners that had somehow discovered the LB IP — they represented zero protection for production traffic.

WAF correctly configured, applied to the right backend, with working rules — completely irrelevant to real traffic. The only requirement for Cloud Armor to protect the application is that traffic passes through the LB. If DNS doesn't point to the LB, there is no protection.
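Since the records above are flagged proxied=True, a public DNS lookup would normally return Cloudflare edge addresses, so the value to compare against the LB IP is the record content, meaning the origin Cloudflare forwards to. The following is a minimal sketch of that comparison. The function name `check_origin_records` and the "name content" input format are assumptions made for illustration, as are the `CF_API_TOKEN` and `ZONE_ID` variables in the commented producer.

```shell
# Sketch: flag DNS records whose origin (record content) is not the LB IP.
# Input on stdin: one "name content" pair per line (assumed format).
check_origin_records() {
    local lb_ip="$1" name content status=0
    while read -r name content; do
        [ -z "$name" ] && continue
        if [ "$content" = "$lb_ip" ]; then
            echo "OK $name -> $content"
        else
            echo "BYPASS $name -> $content (LB is $lb_ip)"
            status=1
        fi
    done
    return "$status"
}

# Hypothetical producer (needs curl, jq, and your own CF_API_TOKEN / ZONE_ID):
# curl -s -H "Authorization: Bearer $CF_API_TOKEN" \
#   "https://api.cloudflare.com/client/v4/zones/$ZONE_ID/dns_records?type=A" \
#   | jq -r '.result[] | "\(.name) \(.content)"' \
#   | check_origin_records 198.51.100.19
```

A non-zero exit status means at least one record sends traffic somewhere other than the load balancer, which makes the function usable as a scheduled check.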

Why the previous fix attempt failed

This was not the first diagnosis of this problem. A previous attempt to point DNS to the LB had been reverted after generating 404 errors. The root cause of those 404s was in the nginx configuration:

# server_name in nginx — problematic configuration
server {
    listen 80;
    server_name client-media.com.br;   # root domain only
    # www.client-media.com.br MISSING
    ...
}

When DNS was pointed to the LB, requests with Host: www.client-media.com.br arrived at nginx without a matching server_name. Nginx fell back to the default server and returned 404. The apparent fix was 'revert the DNS' — when the real fix was to add www to server_name before the cutover.

The failure pattern here is the same in any cloud: the security fix (WAF) and the routing fix (nginx) need to be coordinated. Doing one without the other produces a revert, and the revert leaves the WAF useless for months more.
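One way to catch the missing hostname before a cutover is to compare the names DNS will send against the names nginx declares. The sketch below assumes each server_name directive sits on a single line and only matches exact names (wildcard and regex names are not interpreted). The function name `missing_server_names` is hypothetical.

```shell
# Sketch: report which hostnames are absent from server_name in an nginx file.
# Usage: missing_server_names CONF_FILE HOST [HOST...]
missing_server_names() {
    local conf="$1" names host missing=0
    shift
    # Capture everything between "server_name" and the closing semicolon.
    names=$(sed -n 's/^[[:space:]]*server_name[[:space:]]\{1,\}\([^;]*\);.*/\1/p' "$conf")
    for host in "$@"; do
        # One name per line, then require an exact, literal match.
        if ! printf '%s\n' "$names" | tr -s ' \t' '\n' | grep -qxF -- "$host"; then
            echo "MISSING $host"
            missing=1
        fi
    done
    return "$missing"
}

# Example (path taken from the article's setup):
# missing_server_names /etc/nginx/sites-available/client-media.com.br \
#     client-media.com.br www.client-media.com.br
```

Run against the problematic configuration shown earlier, this would print a MISSING line for www.client-media.com.br and exit non-zero, which is the signal to hold the DNS change.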

The anti-pattern: WAF on the LB, DNS direct to VM

This pattern is more common than it seems and exists in all clouds, not just GCP:

GCP: Cloud Armor applied to HTTP(S) Load Balancer backend, but DNS pointing to the external IP of the Compute Engine instance.

AWS: WAF associated with ALB, but domain pointing directly to EC2 IP or to an old ELB.

Cloudflare WAF: Active protection on the Cloudflare proxy, but server with port 80/443 exposed directly to the internet, so anyone who finds the origin IP can send requests straight to it and skip the proxy.

In all cases, the bypass vector is the same: DNS or network configuration creates a path that avoids the security control. The WAF only processes traffic that flows through it.

# Quick check: is traffic passing through the WAF?

# GCP: check if the LB is in the path
gcloud compute forwarding-rules list --format="table(name,IPAddress,target)"
dig +short www.client-media.com.br

# If dig IP != LB IP, the WAF is not in the path

# AWS: check if domain resolves to the ALB
nslookup www.mysite.com.br
# Should show a CNAME to the ALB (*.elb.amazonaws.com) or, with a Route 53
# alias record, the ALB's own IPs, not an EC2 instance IP

# Cloudflare: check if proxied (orange cloud in dashboard)
# IP should fall in a published Cloudflare range (see cloudflare.com/ips),
# not be the server IP
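The Cloudflare check above comes down to testing whether a resolved address falls inside a known range. A minimal sketch of that test in plain shell arithmetic follows. The helper names are hypothetical, the code assumes well-formed IPv4 input without leading zeros, and the ranges in the usage comment are only a sample of what Cloudflare publishes, so the current list should be fetched from cloudflare.com/ips.

```shell
# Convert a dotted IPv4 address to a 32-bit integer.
ip_to_int() {
    local IFS=.
    set -- $1   # split the address on dots into $1..$4
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Return 0 if IPv4 address $1 lies inside CIDR block $2 (e.g. 104.16.0.0/13).
ip_in_cidr() {
    local ip net bits mask
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits="${2#*/}"
    if [ "$bits" -eq 0 ]; then
        mask=0
    else
        mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
    fi
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Return 0 if $1 is inside any of the CIDR blocks given as further arguments.
ip_in_any() {
    local ip="$1" cidr
    shift
    for cidr in "$@"; do
        ip_in_cidr "$ip" "$cidr" && return 0
    done
    return 1
}

# Example with two sample ranges:
# ip_in_any "$(dig +short A www.mysite.com.br | tail -n1)" 104.16.0.0/13 172.64.0.0/13
```

The same helpers work for any provider that publishes its edge ranges as CIDR blocks.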

The second critical risk: Cloud SQL without PITR

During the infrastructure audit, a second critical risk was identified — unrelated to the WAF, but equally severe:

# Cloud SQL — availability and backup configuration
# Availability: ZONAL (no replica in another zone)
# Binary log: DISABLED
# Automatic backup: daily at 03:00 UTC, 7-day retention — SUCCESSFUL

# Consequence of disabled binary log:
# - Point-in-Time Recovery (PITR) unavailable
# - In case of data corruption or accidental deletion,
#   recovery only possible to the last daily backup
# - Potential data loss: up to 24 hours

With binary logging disabled, the data loss window in a corruption or accidental deletion incident is up to 24 hours, the interval between daily backups. For a system with transactional data, this means potentially losing every transaction since the last backup. ZONAL availability is a separate exposure: there is no standby in another zone to fail over to if the instance's zone has an outage.
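The size of that window for a given incident is simple clock arithmetic. The sketch below computes the minutes of writes lost when the only restore point is the last daily backup, assuming (as a simplification) that the backup captures state exactly at its scheduled time. The function name `minutes_lost` is hypothetical.

```shell
# Minutes of writes lost when restoring from the last daily backup.
# Arguments are UTC times as HH:MM: backup time, then incident time.
minutes_lost() {
    local bh bm ih im
    bh=${1%%:*}; bm=${1##*:}
    ih=${2%%:*}; im=${2##*:}
    # Strip one leading zero so "03" or "08" is not parsed as octal.
    bh=${bh#0}; bm=${bm#0}; ih=${ih#0}; im=${im#0}
    # Wrap around midnight when the incident precedes today's backup.
    echo $(( ((${ih:-0} * 60 + ${im:-0}) - (${bh:-0} * 60 + ${bm:-0}) + 1440) % 1440 ))
}

minutes_lost 03:00 18:30   # prints 930 (15.5 hours)
minutes_lost 03:00 02:59   # prints 1439 (one minute short of 24 hours)
```

With PITR enabled, the equivalent figure would be bounded by the binary log position rather than by the backup schedule.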

Two critical risks in the same client, discovered in a single audit: the WAF was not protecting real traffic, and the database had no PITR. Neither generated any alert. Both were invisible without an active audit.

The coordinated remediation plan

Fixing the WAF bypass required three coordinated changes, in the correct order:

1. Nginx: add www.client-media.com.br to server_name before any DNS change.

2. Certificate: verify that the SSL certificate on the LB covers the www domain (the previous cert on the LB had expired — a new one would be needed before cutover).

3. DNS: only after nginx and certificate are validated, point www and root to the LB IP (198.51.100.19).

# Step 1: fix nginx (before DNS)
# Edit /etc/nginx/sites-available/client-media.com.br
server {
    listen 80;
    listen 443 ssl;
    server_name client-media.com.br www.client-media.com.br;
    ...
}
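# Validate the configuration and reload (run in a shell, not part of the config)
nginx -t && nginx -s reload

# Step 2: validate that nginx responds for www via LB
curl -s -o /dev/null -w "%{http_code}\n" -H "Host: www.client-media.com.br" http://198.51.100.19/
# should print 200, not 404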

# Step 3: only then perform DNS cutover
# www.client-media.com.br → 198.51.100.19
# client-media.com.br     → 198.51.100.19
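The certificate item in the plan can be checked the same way before touching DNS. The sketch below tests a PEM file for expiry and hostname coverage using `openssl x509 -checkend` and `-checkhost`. The function name `cert_covers` and the `/tmp/lb-cert.pem` path are hypothetical, and the match is decided by grepping the "does match" text that openssl prints, which is an assumption about its output wording.

```shell
# Sketch: check that a PEM certificate is unexpired and matches each hostname.
# Usage: cert_covers CERT.pem HOST [HOST...]
cert_covers() {
    local pem="$1" host rc=0
    shift
    # -checkend 0 exits non-zero if the certificate has already expired.
    if ! openssl x509 -in "$pem" -noout -checkend 0 >/dev/null; then
        echo "EXPIRED $pem"
        return 1
    fi
    for host in "$@"; do
        # -checkhost reports "does match" or "does NOT match" for the name.
        if openssl x509 -in "$pem" -noout -checkhost "$host" | grep -q 'does match'; then
            echo "OK $host"
        else
            echo "NOT COVERED $host"
            rc=1
        fi
    done
    return "$rc"
}

# Hypothetical way to fetch the certificate the LB serves for www:
# openssl s_client -connect 198.51.100.19:443 \
#   -servername www.client-media.com.br </dev/null 2>/dev/null \
#   | openssl x509 > /tmp/lb-cert.pem
# cert_covers /tmp/lb-cert.pem client-media.com.br www.client-media.com.br
```

Checking both the root and www names in one call mirrors the nginx fix: a certificate that covers only one of them would recreate the failed cutover in a different layer.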

For Cloud SQL: enabling binary logging and switching to Regional availability (a standby instance in another zone) both require a maintenance window, because they restart the instance. Both were documented in the client's Risk Map as critical risks pending execution.
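For reference, the two Cloud SQL changes map onto `gcloud sql instances patch` flags. The sketch below only prints the commands unless DRY_RUN=0 is set, so it can be reviewed before the maintenance window. The function name `cloudsql_hardening` and the instance name in the example are placeholders, and running binary logging first is an assumption based on Regional availability for MySQL depending on backups and binary logging.

```shell
# Sketch: print (or run) the gcloud commands for the two Cloud SQL changes.
# Set DRY_RUN=0 only inside a maintenance window; the default just prints.
cloudsql_hardening() {
    local instance="$1" run=echo
    [ "${DRY_RUN:-1}" = "0" ] && run=""
    # Current state: binary log flag and availability type.
    $run gcloud sql instances describe "$instance" \
        --format="value(settings.backupConfiguration.binaryLogEnabled,settings.availabilityType)"
    # Enable binary logging, which PITR on MySQL depends on.
    $run gcloud sql instances patch "$instance" --enable-bin-log
    # Switch from ZONAL to REGIONAL availability.
    $run gcloud sql instances patch "$instance" --availability-type=REGIONAL
}

# Example (placeholder instance name):
# cloudsql_hardening my-instance
```

Printing first keeps the change reviewable and makes it easy to paste the exact commands into the Risk Map entry.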

How to audit whether your WAF is actually in the path

The simplest check: the IP that DNS resolves to should be the IP of your WAF/proxy/LB — not the IP of the origin server.

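# Test 1: compare DNS IP with LB IP
# Note: behind a proxying CDN, compare the CDN's origin setting instead
DOMAIN="www.mysite.com.br"
LB_IP="YOUR_LB_IP"
# Keep only A-record addresses: dig +short prints CNAME targets first
DNS_IPS=$(dig +short A "$DOMAIN" | grep -E '^[0-9]+(\.[0-9]+){3}$')

if [ "$DNS_IPS" = "$LB_IP" ]; then
    echo "OK: traffic passing through LB/WAF"
else
    echo "ALERT: DNS resolves to ${DNS_IPS:-nothing}, LB at $LB_IP: WAF may be bypassed"
fi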

# Test 2 (GCP): check real access logs in Cloud Armor
# If there is legitimate traffic (real user-agents, varied BR IPs),
# Cloud Armor is in the path.
# If logs show only bots/scanners, real traffic is going directly.
gcloud logging read \
  'resource.type="http_load_balancer" AND jsonPayload.enforcedSecurityPolicy.name="policy-www-client-media-com-br"' \
  --limit=20 \
  --format="table(timestamp,httpRequest.remoteIp,httpRequest.requestUrl,httpRequest.userAgent,jsonPayload.enforcedSecurityPolicy.outcome)"

A WAF whose logs show only scanner blocks, and never legitimate browsing (normal paths, browser user-agents, real user IPs), is a sign that real traffic is taking another path.
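A rough first pass over those logs is to count policy outcomes. The sketch below tallies ACCEPT and DENY values read from stdin and raises an alert when nothing was accepted. It is only a heuristic: accepted requests can still come from scanners, so it complements the manual review of paths and user-agents rather than replacing it. The function name `summarize_outcomes` is hypothetical.

```shell
# Sketch: count Cloud Armor outcomes read from stdin, one value per line.
summarize_outcomes() {
    local outcome accept=0 deny=0
    while read -r outcome; do
        case "$outcome" in
            ACCEPT) accept=$((accept + 1)) ;;
            DENY)   deny=$((deny + 1)) ;;
        esac
    done
    echo "accept=$accept deny=$deny"
    # Heuristic: denials with no accepted requests suggest only scanners arrive.
    if [ "$accept" -eq 0 ] && [ "$deny" -gt 0 ]; then
        echo "ALERT: only blocked requests seen; real traffic may bypass the LB"
        return 1
    fi
}

# Example feed (same filter as the query above, outcome column only):
# gcloud logging read 'resource.type="http_load_balancer"' --limit=1000 \
#   --format="value(jsonPayload.enforcedSecurityPolicy.outcome)" | summarize_outcomes
```

A production site with real visitors would be expected to show accepted requests far outnumbering denials, so a log made up almost entirely of denials deserves the DNS check described above.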

The lesson: path position is the zero requirement

Cloud Armor, AWS WAF, and Cloudflare are all inline controls. They only protect traffic that flows through them. No matter how well-configured the policy, how refined the rules, or how few the false positives: if DNS doesn't point to the WAF, the policy doesn't exist for real traffic.

Infrastructure security audits need to verify not just the control configuration, but the traffic path. In projects with a history of DNS changes, partial migrations, or configurations made by different teams at different times, DNS bypass is one of the easiest risks to create and the hardest to notice.

139 blocks in 30 days looked like evidence of protection. They were evidence that scanners had discovered the LB IP. Real traffic — users, buyers, Google crawlers — never passed through the WAF.