10 traps I hit when deploying Kubernetes on Oracle Cloud (OKE)
Published on March 10, 2026

Context
I was tasked with migrating the entire infrastructure of a high-volume e-commerce portal from AWS to Oracle Cloud Infrastructure (OCI). The motivation was purely economic — a projected 78% reduction in monthly cloud costs.
The AWS stack ran ECS Fargate for Next.js, ARM EC2 for WordPress, managed database, and ALB. The OCI target was OKE (Oracle Kubernetes Engine) with ARM64 nodes on VM.Standard.A1.Flex — which falls under Oracle's Always Free Tier.
I'm CKA certified, so Kubernetes itself wasn't the concern. The OCI-specific quirks were.
Here are the 10 traps I hit, documented in real time during execution.
1. Port 12250 — poorly documented
Workers stayed NotReady for 20 minutes with no clear error. Port 12250 must be open in the NSG for the OKE endpoint. Without it, nodes cannot register with the control plane. After adding the ingress rule, nodes registered in under 2 minutes.
2. The CCM creates its own Load Balancer
The oci-load-balancer-id annotation is ignored by OCI CCM v1.34. It always creates a new LB. The CCM-managed LB IP is stable in practice while the Kubernetes Service exists.
3. OCI WAF Access Control does NOT bypass Protection Rules
Access Control and Protection Rules are independent pipelines. An ALLOW rule in Access Control has no effect on Protection Rules execution. Bypass must be implemented as a condition on the Protection Rule itself:
!contains(to_string(keys(http.request.cookies)), 'wordpress_logged_in_')
4. http.request.cookies vs http.request.headers.cookie
Only the parsed object (http.request.cookies) works reliably in OCI WAF runtime conditions. The raw header string (http.request.headers.cookie) does not behave as expected.
5. OCI Cache requires TLS — OpenResty doesn't support it natively
OCI Cache (Valkey-based) mandates TLS. OpenResty's srcache Redis module doesn't support TLS. Solution: stunnel as a local TLS proxy between OpenResty (127.0.0.1:6379) and the OCI Cache endpoint.
6. try_files in a REST API location causes 301 loops
Using try_files in a /wp-json/ location sends REST API requests through internal redirects, causing 301 loops. Fix: use fastcgi_pass directly without file existence checks.
7. File.exists? was removed in Ruby 3.2
fluent-plugin-oci-logging uses File.exists?, removed in Ruby 3.2. The Fluentd pod crashes silently. Fix: patch the gem files in the Dockerfile with sed to replace File.exists? with File.exist?.
8. OKE Dynamic Groups require compartment OCID
Dynamic group rules for OKE nodes must use instance.compartment.id, not the node pool OCID. Using the node pool OCID results in nodes not being recognized as group members.
9. CRI-O on Oracle Linux 8 enforces fully qualified image names
Short image names like prom/prometheus fail. All Kubernetes manifests must use fully qualified registry names: docker.io/prom/prometheus, registry.k8s.io/metrics-server/metrics-server.
10. required-server-files.json contains all baked build variables
To recover NEXT_PUBLIC_* environment variables from a production Next.js Docker image without CI/CD access: docker run --rm --entrypoint cat <image>:<tag> /app/.next/required-server-files.json. This file contains all env vars resolved at build time.
Conclusion
OKE is a solid managed Kubernetes option, especially with A1.Flex nodes in the Always Free Tier. But OCI's learning curve is real. Most of the issues above lack clear error messages pointing to root cause. If you're evaluating OKE for production, I hope this list saves you a few hours.


