Buzeli
buzeliSoluções Digitais
Kubernetes

10 traps I hit when deploying Kubernetes on Oracle Cloud (OKE)

Published on March 10, 2026

10 armadilhas ao subir Kubernetes no Oracle OKE — Buzeli Digital

Context

I was tasked with migrating the entire infrastructure of a high-volume e-commerce portal from AWS to Oracle Cloud Infrastructure (OCI). The motivation was purely economic — a projected 78% reduction in monthly cloud costs.

The AWS stack ran ECS Fargate for Next.js, ARM EC2 for WordPress, managed database, and ALB. The OCI target was OKE (Oracle Kubernetes Engine) with ARM64 nodes on VM.Standard.A1.Flex — which falls under Oracle's Always Free Tier.

I'm CKA certified, so Kubernetes itself wasn't the concern. The OCI-specific quirks were.

Here are the 10 traps I hit, documented in real time during execution.

1. Port 12250 — poorly documented

Workers stayed NotReady for 20 minutes with no clear error. Port 12250 must be open in the NSG for the OKE endpoint. Without it, nodes cannot register with the control plane. After adding the ingress rule, nodes registered in under 2 minutes.

2. The CCM creates its own Load Balancer

The oci-load-balancer-id annotation is ignored by OCI CCM v1.34. It always creates a new LB. The CCM-managed LB IP is stable in practice while the Kubernetes Service exists.

3. OCI WAF Access Control does NOT bypass Protection Rules

Access Control and Protection Rules are independent pipelines. An ALLOW rule in Access Control has no effect on Protection Rules execution. Bypass must be implemented as a condition on the Protection Rule itself:

!contains(to_string(keys(http.request.cookies)), 'wordpress_logged_in_')

4. http.request.cookies vs http.request.headers.cookie

Only the parsed object (http.request.cookies) works reliably in OCI WAF runtime conditions. The raw header string (http.request.headers.cookie) does not behave as expected.

5. OCI Cache requires TLS — OpenResty doesn't support it natively

OCI Cache (Valkey-based) mandates TLS. OpenResty's srcache Redis module doesn't support TLS. Solution: stunnel as a local TLS proxy between OpenResty (127.0.0.1:6379) and the OCI Cache endpoint.

6. try_files in a REST API location causes 301 loops

Using try_files in a /wp-json/ location sends REST API requests through internal redirects, causing 301 loops. Fix: use fastcgi_pass directly without file existence checks.

7. File.exists? was removed in Ruby 3.2

fluent-plugin-oci-logging uses File.exists?, removed in Ruby 3.2. The Fluentd pod crashes silently. Fix: patch the gem files in the Dockerfile with sed to replace File.exists? with File.exist?.

8. OKE Dynamic Groups require compartment OCID

Dynamic group rules for OKE nodes must use instance.compartment.id, not the node pool OCID. Using the node pool OCID results in nodes not being recognized as group members.

9. CRI-O on Oracle Linux 8 enforces fully qualified image names

Short image names like prom/prometheus fail. All Kubernetes manifests must use fully qualified registry names: docker.io/prom/prometheus, registry.k8s.io/metrics-server/metrics-server.

10. required-server-files.json contains all baked build variables

To recover NEXT_PUBLIC_* environment variables from a production Next.js Docker image without CI/CD access: docker run --rm --entrypoint cat <image>:<tag> /app/.next/required-server-files.json. This file contains all env vars resolved at build time.

Conclusion

OKE is a solid managed Kubernetes option, especially with A1.Flex nodes in the Always Free Tier. But OCI's learning curve is real. Most of the issues above lack clear error messages pointing to root cause. If you're evaluating OKE for production, I hope this list saves you a few hours.