OCI CCM v1.34: the annotations that don't exist and the Token Collision that freezes the LB

Published on April 21, 2026

Context

During the migration of a Next.js application to OKE (Oracle Kubernetes Engine), we used a LoadBalancer-type Service so that the Cloud Controller Manager (CCM) would automatically provision and manage the OCI Load Balancer. The CCM version was v1.34, paired with the OKE control plane.

What should have been simple — declaring annotations on the Service and having CCM obey them — turned into a sequence of silent failures, 409 errors, and orphaned LBs accumulating in the account. This post documents what actually works, what is silently ignored, and the most dangerous trap: Token Collision.

Annotations that don't exist in OCI CCM v1.34

The official OCI CCM documentation lists a set of supported annotations. But when running real workloads, some annotations found in forums, community examples, or older versions simply don't work — and CCM gives no warning. It silently ignores them and continues.

1. Reusing an existing LB

Attempt: point CCM to a Load Balancer that already existed in the account, created manually before the cluster.

# FAILED — annotation does not exist in OCI CCM v1.34
service.beta.kubernetes.io/oci-load-balancer-id: "<ocid-of-manual-lb>"

Result: CCM ignored the annotation entirely and created a new LB with an auto-generated UUID name. The manual LB stayed active and kept accruing charges while receiving no traffic.

2. Reserved IP (fixed IP for the LB)

Attempt: pin the Load Balancer's public IP using an OCI Reserved IP to avoid IP change if the Service were recreated.

# FAILED — annotation ignored, CCM creates its own IP
service.beta.kubernetes.io/oci-load-balancer-reserved-ip-id: "<ocid-of-publicip>"

# Also FAILED
oci.oraclecloud.com/load-balancer-ip: "192.0.2.4"

Result: CCM ignored both annotations and provisioned the LB with an ephemeral IP, which nonetheless stays stable for as long as the Service exists. The Reserved IP stayed in the AVAILABLE state, unused but still billed (~$1.80/month).

In practice, the IP of the CCM-managed LB does not change as long as the Kubernetes Service is not deleted. For production DNS, this IP is stable enough — Reserved IP is unnecessary.

3. Security Rule Management via NSG

Attempt: use NSG mode so that CCM manages traffic rules via Network Security Groups instead of Security Lists.

# FAILED — requires IAM policies that basic OKE does not provision
service.beta.kubernetes.io/oci-load-balancer-security-rule-management-mode: "NSG"

# Also FAILED — 404 without VirtualNetwork Manage permission
service.beta.kubernetes.io/oci-network-security-groups: "<ocid-of-nsg>"

Result: 404 errors from the VirtualNetwork API. In basic OKE, the CCM service account lacks the IAM policies required to create or modify NSGs. The working solution is to set `service.beta.kubernetes.io/oci-load-balancer-security-list-management-mode: "None"` and manage the network rules manually.

Annotations confirmed working in OCI CCM v1.34

After discarding the problematic annotations, this is the set tested in production that works consistently:

annotations:
  # Subnet where the LB will be created
  service.beta.kubernetes.io/oci-load-balancer-subnet1: "<ocid-of-public-subnet>"

  # Flexible shape (recommended over fixed shape)
  service.beta.kubernetes.io/oci-load-balancer-shape: "flexible"
  service.beta.kubernetes.io/oci-load-balancer-shape-flex-min: "10"
  service.beta.kubernetes.io/oci-load-balancer-shape-flex-max: "100"

  # CCM does not touch Security Lists (manual management)
  service.beta.kubernetes.io/oci-load-balancer-security-list-management-mode: "None"

  # Health check via kube-proxy healthz (port 10256)
  service.beta.kubernetes.io/oci-load-balancer-health-check-protocol: "HTTP"
  service.beta.kubernetes.io/oci-load-balancer-health-check-path: "/healthz"
  service.beta.kubernetes.io/oci-load-balancer-health-check-retries: "3"
  service.beta.kubernetes.io/oci-load-balancer-health-check-interval: "10000"
  service.beta.kubernetes.io/oci-load-balancer-health-check-timeout: "5000"

  # Backend protocol
  service.beta.kubernetes.io/oci-load-balancer-backend-protocol: "HTTP"

  # HTTPS (when TLS secret is created in the cluster)
  service.beta.kubernetes.io/oci-load-balancer-ssl-ports: "443"
  service.beta.kubernetes.io/oci-load-balancer-tls-secret: "myapp/myapp-tls"
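
A small check like the following can flag the problematic annotations before a manifest is applied. This is a minimal sketch: the `review_annotations` helper and its three buckets are hypothetical, and the two sets only encode what this post observed on OCI CCM v1.34, not an official list of supported annotations.

```python
PREFIX = "service.beta.kubernetes.io/oci-load-balancer-"

# Annotations this post confirmed working in production.
CONFIRMED = {PREFIX + suffix for suffix in (
    "subnet1", "shape", "shape-flex-min", "shape-flex-max",
    "security-list-management-mode",
    "health-check-protocol", "health-check-path", "health-check-retries",
    "health-check-interval", "health-check-timeout",
    "backend-protocol", "ssl-ports", "tls-secret",
)}

# Annotations this post found ignored or failing.
REJECTED = {
    PREFIX + "id",
    PREFIX + "reserved-ip-id",
    PREFIX + "security-rule-management-mode",
    "service.beta.kubernetes.io/oci-network-security-groups",
    "oci.oraclecloud.com/load-balancer-ip",
}


def review_annotations(annotations):
    """Sort annotation keys into confirmed, rejected, and untested groups."""
    report = {"confirmed": [], "rejected": [], "untested": []}
    for key in sorted(annotations):
        if key in CONFIRMED:
            report["confirmed"].append(key)
        elif key in REJECTED:
            report["rejected"].append(key)
        else:
            report["untested"].append(key)
    return report


report = review_annotations({
    PREFIX + "shape": "flexible",
    PREFIX + "id": "<ocid-of-manual-lb>",
    "example.com/owner": "platform-team",  # hypothetical unrelated annotation
})
print(report["rejected"])
```

Anything that lands in `untested` is simply outside what this post verified, so it deserves a test in a non-production cluster rather than an automatic rejection.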

Token Collision: the most dangerous problem

Token Collision is the hardest trap to detect in OCI CCM. It manifests as an HTTP 409 from the OCI API, and when it occurs CCM stops reconciling the LB without emitting any clear alert in the cluster.

How the idempotency token works

OCI CCM generates an idempotency token for each Load Balancer creation operation. The token is derived from:

token = hash("cluster~createLoadBalancer~{serviceUID}")

The Service UID is fixed as long as the Service exists. The problem: any change to the Service spec (adding or removing an annotation, changing a port, altering the shape) does not change the UID — but it changes the content of the request sent to the OCI API. OCI returns 409 because the token was already used with a different payload.
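
The mechanism can be illustrated with a short simulation. It is not CCM's or OCI's actual code: the SHA-256 hash, the `FakeIdempotencyStore` class, the payload fields, and the UIDs are assumptions made for illustration. Only the token input string comes from the formula above.

```python
import hashlib
import json


def idempotency_token(service_uid):
    """Derive a token from the Service UID only (SHA-256 is an assumed hash)."""
    raw = f"cluster~createLoadBalancer~{service_uid}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class FakeIdempotencyStore:
    """Hypothetical stand-in for an API that rejects a reused token with a new payload."""

    def __init__(self):
        self._seen = {}  # token -> serialized payload of the first request

    def create(self, token, payload):
        body = json.dumps(payload, sort_keys=True)
        previous = self._seen.get(token)
        if previous is None:
            self._seen[token] = body
            return 200  # first use of this token
        if previous == body:
            return 200  # identical retry, treated as an idempotent replay
        return 409  # same token, different payload: collision


uid = "11111111-2222-3333-4444-555555555555"      # hypothetical Service UID
new_uid = "99999999-8888-7777-6666-555555555555"  # UID after delete + recreate
api = FakeIdempotencyStore()

first = api.create(idempotency_token(uid), {"shape": "flexible", "max": "100"})
retry = api.create(idempotency_token(uid), {"shape": "flexible", "max": "100"})
edited = api.create(idempotency_token(uid), {"shape": "flexible", "max": "200"})
recreated = api.create(idempotency_token(new_uid), {"shape": "flexible", "max": "200"})

print(first, retry, edited, recreated)  # prints "200 200 409 200"
```

In this model the edited Service keeps its UID, so the token is unchanged and the changed payload is rejected. Only a new UID yields a fresh token.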

Symptom

CCM enters a reconciliation loop unable to update the LB. Service events show repeated errors, but no high-level cluster log indicates what happened. The LB in OCI stays in the previous state, without reflecting the Service changes.

The only way out: delete and recreate the Service

There is no way to resolve Token Collision without recreating the Service. A new Service generates a new UID → new token → clean OCI operation. The correct procedure is:

# 1. Remove the finalizer to unblock deletion
kubectl patch service myapp-svc -n myapp --type=merge -p '{"metadata":{"finalizers":[]}}'

# 2. Delete the Service (CCM deletes the LB automatically)
kubectl delete service myapp-svc -n myapp

# 3. Recreate the Service with the correct annotations
kubectl apply -f service.yaml

# 4. Reactivate WAF on the new LB (if a WAF policy is associated)
# The LB OCID changes, so the WAF instance must be recreated

CRITICAL: never delete the Service in production without planning the WAF re-registration. CCM deletes the old LB when the Service is deleted, and the associated WAF instance loses its reference.

Golden rule to avoid Token Collision

Define the final Service spec before the first apply. Once created with a UID, any structural change (ports, shape annotations, subnet) requires the full delete + recreate cycle. Do not try to edit the Service in production with `kubectl edit` or `kubectl patch` to change functional annotations — this triggers the 409.

Mandatory ports the documentation doesn't highlight

Two network requirements are critical and appear only as secondary notes in the official documentation, but cause silent failures when absent.

Port 12250 — node registration

Worker nodes communicate with the OKE management endpoint over TCP port 12250. Without this ingress rule in the endpoint's NSG, nodes never complete registration.

# OKE endpoint NSG
Direction: INGRESS
Protocol: TCP
Source: Worker node NSG (or node subnet CIDR)
Destination port: 12250
Description: Workers -> OKE management endpoint

Symptom without port 12250: all nodes remain in `UnknownNodeError` state with the message "has not been seen for more than 20 minutes" — even when freshly provisioned by the node pool.

Port 10256 — Load Balancer health check

CCM configures the LB health check pointing to port 10256 on the nodes, which is the kube-proxy `/healthz` endpoint. Without the rule allowing this port from the LB subnet to the nodes, all backends are marked unhealthy and no traffic is routed.

# Worker node NSG (or Security List)
Direction: INGRESS
Protocol: TCP
Source: LB subnet CIDR (e.g. 10.1.0.0/24)
Destination port: 10256
Description: OCI LB health check -> kube-proxy healthz

Note: CCM uses 10256 as the default health check port regardless of any health-check-path annotations configured — the port is hardcoded in the controller behavior.
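
A rule set can be sanity-checked offline before it is applied. The sketch below models ingress rules as plain dictionaries, which is a hypothetical representation rather than the OCI API schema. The CIDR and LB address are example values, and `rule_allows` and `health_check_reachable` are illustrative helpers.

```python
import ipaddress


def rule_allows(rule, source_ip, port):
    """Return True if one ingress rule admits TCP traffic from source_ip to port."""
    network = ipaddress.ip_network(rule["source"])
    in_cidr = ipaddress.ip_address(source_ip) in network
    in_ports = rule["port_min"] <= port <= rule["port_max"]
    return rule["protocol"] == "TCP" and in_cidr and in_ports


def health_check_reachable(rules, lb_ip, port=10256):
    """Return True if any rule lets the LB reach the kube-proxy healthz port."""
    return any(rule_allows(rule, lb_ip, port) for rule in rules)


# Hypothetical worker-node rules: only the NodePort range is open at first.
node_rules = [
    {"protocol": "TCP", "source": "10.1.0.0/24", "port_min": 30000, "port_max": 32767},
]
before = health_check_reachable(node_rules, "10.1.0.15")

# Add the rule described above: LB subnet -> nodes on TCP 10256.
node_rules.append(
    {"protocol": "TCP", "source": "10.1.0.0/24", "port_min": 10256, "port_max": 10256}
)
after = health_check_reachable(node_rules, "10.1.0.15")

print(before, after)  # prints "False True"
```

Opening only the NodePort range is an easy mistake to make. In this model the backends would still be reported unhealthy until the 10256 rule is added.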

HTTPS managed by CCM: declare in the Service, not in the LB

A common trap is adding HTTPS listeners directly to the OCI Load Balancer via the CLI or console. CCM reconciles the LB on every node or pod change and deletes anything not declared in the Kubernetes Service.

The correct way to have HTTPS managed by CCM:

# 1. Create the TLS secret in the cluster
kubectl create secret tls myapp-tls -n myapp \
  --cert=myapp.crt \
  --key=myapp.key

# 2. Declare port 443 in the Service spec
spec:
  ports:
    - name: http
      port: 80
      targetPort: 3000
      protocol: TCP
    - name: https
      port: 443
      targetPort: 3000
      protocol: TCP

# 3. Add the TLS annotations
annotations:
  service.beta.kubernetes.io/oci-load-balancer-ssl-ports: "443"
  service.beta.kubernetes.io/oci-load-balancer-tls-secret: "myapp/myapp-tls"

With this configuration, CCM creates two listeners and two backend sets. When nodes change, CCM reconciles and recreates the listeners automatically — without manual intervention.
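
The fragments above can be assembled into one complete manifest. The following sketch builds it as a Python dictionary and prints JSON, which `kubectl apply -f` also accepts. The `build_service` helper and the `app: myapp` selector are assumptions, since the post does not show the Service's selector. Any other annotations in use would need to be added as well.

```python
import json


def build_service(name, namespace, app_label, tls_secret):
    """Assemble a LoadBalancer Service manifest with HTTPS declared for CCM."""
    prefix = "service.beta.kubernetes.io/oci-load-balancer-"
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                prefix + "ssl-ports": "443",
                prefix + "tls-secret": tls_secret,
            },
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": app_label},  # assumed pod label
            "ports": [
                {"name": "http", "port": 80, "targetPort": 3000, "protocol": "TCP"},
                {"name": "https", "port": 443, "targetPort": 3000, "protocol": "TCP"},
            ],
        },
    }


manifest = build_service("myapp-svc", "myapp", "myapp", "myapp/myapp-tls")
print(json.dumps(manifest, indent=2))
```

Saving the output to a file such as `service.json` keeps the ports and the TLS annotations in a single reviewed artifact, which fits the rule of defining the final Service before the first apply.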

CCM backend auto-update: confirmed in ~2 minutes

After resolving all annotation and port problems, we validated CCM's auto-update behavior: we manually removed a node via the OCI console. The node pool provisioned a replacement, and CCM detected the new node and updated the LB backends in approximately 2 minutes, without intervention.

# Service events during node replacement
Warning  UnAvailableLoadBalancer  11m   service-controller  There are no available provisioned nodes for LoadBalancer
Normal   UpdatedLoadBalancer      9m21s (x4 over 13m)  service-controller  Updated load balancer with new hosts

The real downtime was not caused by CCM — it was dominated by container image pull time (2.1 GB taking 1m34s per new node). CCM does its part; the bottleneck is image size.

To reduce downtime during node replacements: use Next.js standalone output, multi-stage builds, and proper .dockerignore. A 2.1 GB image can drop to ~500 MB — and pull time follows.
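
As a rough estimate of what the smaller image buys, the observed pull can be scaled linearly. This assumes constant registry throughput and takes 1 GB as 1000 MB. Both are simplifications, and `estimate_pull_seconds` is an illustrative helper.

```python
def estimate_pull_seconds(image_mb, observed_mb, observed_seconds):
    """Scale an observed pull time linearly to another image size."""
    throughput_mb_per_s = observed_mb / observed_seconds
    return image_mb / throughput_mb_per_s


observed_mb = 2100     # 2.1 GB image, taking 1 GB = 1000 MB
observed_seconds = 94  # 1m34s per new node

throughput = observed_mb / observed_seconds
estimate = estimate_pull_seconds(500, observed_mb, observed_seconds)
print(f"{throughput:.1f} MB/s, ~{estimate:.0f} s for 500 MB")
# prints "22.3 MB/s, ~22 s for 500 MB"
```

Under these assumptions, a ~500 MB image would pull in roughly 22 seconds instead of 94. Layer caching and decompression time will shift the real number.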

Summary of traps

For anyone setting up OKE + CCM v1.34 for the first time, the main traps in order of impact:

1. Token Collision 409: never edit the Service spec after creation. Any structural change requires delete + recreate with finalizer removal.

2. Missing port 12250: nodes stay in UnknownNodeError indefinitely. Without this port in the endpoint NSG, the cluster does not work.

3. Missing port 10256: all LB backends are unhealthy. Without this port, traffic never reaches applications.

4. Unsupported annotations: oci-load-balancer-id and reserved-ip-id are silently ignored, and security-rule-management-mode: NSG fails with 404 errors because basic OKE lacks the required IAM policies. Don't use them.

5. HTTPS outside the Service: listeners added directly to the OCI LB are deleted by CCM on the next reconciliation.