After migrating my Kubernetes cluster from Calico to Cilium, enabling kube-proxy replacement and migrating from ingress-nginx to Cilium Ingress, the final step was moving to the Kubernetes Gateway API. What looked like a straightforward migration turned into several hours of debugging. Hopefully this post saves someone else the same time.
Environment
- Kubernetes 1.33.13
- Cilium 1.19.5
- Host-network Gateway deployment
- Gateway nodes selected via node labels
- Route53 health checks instead of a cloud LoadBalancer
My goal was simple:
- keep ingress-nginx running
- add Cilium Ingress and Cilium Gateway in parallel
- migrate applications one by one
- switch DNS when ready
1. Gateway API installation
Installing the CRDs is straightforward:
wget https://github.com/kubernetes-sigs/gateway-api/releases/download/v1.6.0/standard-install.yaml
kubectl apply --server-side -f standard-install.yaml
However, this immediately leads to the first compatibility issue.
2. Gateway API v1.5+ is NOT compatible with Cilium before 1.20
The Cilium operator immediately entered CrashLoopBackOff with:
failed to create gateway controller:
failed to setup field indexer
no matches for kind "TLSRoute"
in version "gateway.networking.k8s.io/v1alpha2"
Initially this was very confusing because the Cilium documentation mentions Gateway API v1.5.1 support. The problem is the TLSRoute API version. Gateway API 1.5 promoted TLSRoute into the Standard channel and serves it as v1. Cilium 1.19 still looks up TLSRoute under gateway.networking.k8s.io/v1alpha2, which the newer Standard CRDs do not provide, so the operator fails while setting up its field indexer.
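To confirm this kind of mismatch on a cluster, it can help to list which TLSRoute versions the installed CRD actually serves. The following is a minimal sketch that assumes kubectl access to the cluster; served_tlsroute_versions and has_version are hypothetical helper names, not part of kubectl or Cilium.

```shell
# List the TLSRoute CRD versions that the API server currently serves.
served_tlsroute_versions() {
  kubectl get crd tlsroutes.gateway.networking.k8s.io \
    -o jsonpath='{.spec.versions[?(@.served==true)].name}'
}

# Succeed if the wanted version appears in a space-separated version list.
# Usage: has_version v1alpha2 "v1alpha2 v1alpha3"
has_version() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, has_version v1alpha2 "$(served_tlsroute_versions)" || echo "TLSRoute v1alpha2 is not served" would flag the situation described above.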
The workaround is to pin the CRDs to the previous release:
- Gateway API Standard v1.4.1
- TLSRoute CRD from the v1.4.1 Experimental channel
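The two items above could be installed roughly as follows. This is a sketch under assumptions: the raw.githubusercontent.com path for the experimental TLSRoute CRD follows the upstream repository layout and should be verified for v1.4.1 before use, and the helper function names are hypothetical.

```shell
# Build the URL of the Standard channel bundle for a given release tag.
standard_crds_url() {
  echo "https://github.com/kubernetes-sigs/gateway-api/releases/download/$1/standard-install.yaml"
}

# Build the URL of the experimental TLSRoute CRD for a given release tag.
# Assumption: this path matches the upstream repository layout for that tag.
tlsroute_crd_url() {
  echo "https://raw.githubusercontent.com/kubernetes-sigs/gateway-api/$1/config/crd/experimental/gateway.networking.k8s.io_tlsroutes.yaml"
}

# Apply both sets of CRDs for the given release tag.
install_gateway_crds() {
  kubectl apply --server-side -f "$(standard_crds_url "$1")"
  kubectl apply --server-side -f "$(tlsroute_crd_url "$1")"
}
```

Calling install_gateway_crds v1.4.1 applies both. The Cilium operator may need a restart afterwards so that it picks up the new CRDs.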
After doing this the GatewayClass was finally accepted:
NAME     CONTROLLER                     ACCEPTED
cilium   io.cilium/gateway-controller   True
3. Host Network listeners require NET_BIND_SERVICE
I wanted dedicated Gateway nodes, so my configuration looked like this:
gatewayAPI:
  enabled: true
  hostNetwork:
    enabled: true
    nodes:
      matchLabels:
        ingress: gateway
The Gateway was accepted but remained:
PROGRAMMED=False
The Envoy logs contained:
cannot bind '0.0.0.0:80': Permission denied
The fix is not obvious. Enabling host networking is not sufficient on its own: Envoy also needs permission to bind privileged ports (those below 1024). This configuration solved it:
envoy:
  securityContext:
    capabilities:
      keepCapNetBindService: true
      envoy:
        - NET_ADMIN
        - SYS_ADMIN
        - NET_BIND_SERVICE
The important part is that both settings are required: keepCapNetBindService: true and NET_BIND_SERVICE in the envoy capability list. Simply enabling keepCapNetBindService is not enough.
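One way to check that the capability list actually reached the running workload is to read it back from the Envoy DaemonSet. The sketch below assumes the DaemonSet is named cilium-envoy, lives in kube-system, and has the Envoy container first in its pod spec. These are common defaults but are assumptions here, and both function names are hypothetical.

```shell
# Print the capabilities added to the first container of the Envoy DaemonSet.
# Assumption: DaemonSet "cilium-envoy" in namespace "kube-system".
envoy_added_caps() {
  kubectl -n kube-system get daemonset cilium-envoy \
    -o jsonpath='{.spec.template.spec.containers[0].securityContext.capabilities.add}'
}

# Succeed if a capability name appears in the JSON array printed above.
# Usage: caps_include NET_BIND_SERVICE '["NET_ADMIN","NET_BIND_SERVICE"]'
caps_include() {
  printf '%s' "$2" | grep -q "\"$1\""
}
```

For example, caps_include NET_BIND_SERVICE "$(envoy_added_caps)" && echo "capability present". This only checks the capability list, not the keepCapNetBindService setting.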
4. Verifying host-network listeners
After the capability change:
ss -lntp | grep -E ':(80|443)[[:space:]]'
returned:
LISTEN 0.0.0.0:80 users:(("cilium-envoy"))
LISTEN 0.0.0.0:443 users:(("cilium-envoy"))
which confirmed that Envoy was now listening directly on the host.
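When this check has to be repeated on several gateway nodes, a small filter that matches the local port exactly avoids false positives from ports such as 8080. This is a sketch; port_is_listening is a hypothetical helper that reads ss -lnt style output on stdin and assumes the local address is the fourth column.

```shell
# Succeed if the ss output on stdin contains a LISTEN socket on the given port.
# Only the local address column ($4) is inspected, and the port must match exactly.
port_is_listening() {
  awk -v port="$1" '
    $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit found ? 0 : 1 }
  '
}
```

For example, ss -lntp | port_is_listening 443 && echo "443 is bound".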
5. Gateway still reports Programmed=False
Interestingly, even though Envoy was listening correctly, the Gateway status still showed:
PROGRAMMED=False
ADDRESS=
Reason: AddressNotAssigned
However, this turned out to be only a status issue.
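The full set of status conditions is easier to read when printed one per line. The following sketch assumes kubectl access; the Gateway name and namespace are passed in by the caller, and both function names are hypothetical.

```shell
# Print each Gateway condition as "Type=Status (Reason)", one per line.
# Usage: gateway_conditions <gateway-name> <namespace>
gateway_conditions() {
  kubectl -n "$2" get gateway "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.reason}){"\n"}{end}'
}

# Extract the status of one condition type from the lines printed above.
# Usage: gateway_conditions gw default | condition_status Programmed
condition_status() {
  awk -F'[= ]' -v type="$1" '$1 == type { print $2 }'
}
```

In the situation described here, the Programmed line would read False with reason AddressNotAssigned.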
Creating a simple test deployment:
- nginx Deployment
- ClusterIP Service
- HTTPRoute
produced:
curl -H "Host: test.example.com" http://gateway-node/
HTTP/1.1 200 OK
server: envoy
Traffic successfully reached the backend through the Gateway. Functionally everything worked despite the Gateway status still reporting Programmed=False.
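A test setup along these lines could look like the sketch below. The resource names test-gateway and test-nginx are hypothetical, the GatewayClass name cilium is taken from the output shown earlier, and the manifests are a minimal illustration rather than the exact ones used in this migration.

```shell
# Print a minimal Gateway and HTTPRoute for the test backend.
# Assumption: a Service named "test-nginx" exposes port 80 in the same namespace.
render_test_manifests() {
  cat <<'EOF'
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: test-gateway
spec:
  gatewayClassName: cilium
  listeners:
    - name: http
      protocol: HTTP
      port: 80
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: test-nginx
spec:
  parentRefs:
    - name: test-gateway
  hostnames:
    - test.example.com
  rules:
    - backendRefs:
        - name: test-nginx
          port: 80
EOF
}
```

The backend can be created with kubectl create deployment test-nginx --image=nginx followed by kubectl expose deployment test-nginx --port=80, and the routing objects with render_test_manifests | kubectl apply -f -.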
I suspect this is a limitation or bug in the current Cilium 1.19 implementation when using host networking without a LoadBalancer address.
Final architecture
The migration strategy allowed running three ingress technologies simultaneously:
ingress-nginx: Existing applications
Cilium Ingress: Fast migration away from ingress-nginx
Cilium Gateway API: Future production traffic
Each technology used dedicated nodes selected via labels:
- ingress=nginx
- ingress=cilium
- ingress=gateway
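Assigning a node to one of these roles is a matter of setting the label. The sketch below uses a hypothetical helper, label_ingress_node, which rejects anything other than the three values above.

```shell
# Label a node for one of the three ingress roles used in this post.
# Usage: label_ingress_node <node-name> <nginx|cilium|gateway>
label_ingress_node() {
  case "$2" in
    nginx|cilium|gateway)
      kubectl label node "$1" "ingress=$2" --overwrite
      ;;
    *)
      echo "unknown ingress role: $2" >&2
      return 1
      ;;
  esac
}
```

Because --overwrite is set, calling it again with a different role moves the node from one ingress tier to another.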
Route53 health checks handled failover between the selected nodes.
This made it possible to migrate applications one by one without risking the existing production traffic.
Conclusions
The migration itself is actually quite smooth once the hidden compatibility issues are understood. The two biggest pitfalls are:
- Gateway API 1.5+ vs Cilium 1.19 TLSRoute incompatibility
- Missing NET_BIND_SERVICE capability for host-network listeners
Once those are addressed, Gateway API works very well, and migrating existing Ingress resources to HTTPRoute becomes the easy part.
I expect both workarounds to disappear after upgrading to Cilium 1.20 together with Gateway API 1.6, but until then these are worth keeping in mind.
Complete Working Configuration
For Cilium 1.19.5 I ended up using the Gateway API v1.4.1 Standard CRDs together with the v1.4.1 Experimental TLSRoute CRD because the Gateway controller still expected the v1alpha2 TLSRoute API. Future Cilium releases should remove the need for this workaround.
The Cilium values.yaml:
envoy:
  securityContext:
    capabilities:
      keepCapNetBindService: true
      envoy:
        - NET_ADMIN
        - SYS_ADMIN
        - NET_BIND_SERVICE
gatewayAPI:
  enabled: true
  hostNetwork:
    enabled: true
    nodes:
      matchLabels:
        ingress: gateway
ingressController:
  enabled: true
  default: false
  loadbalancerMode: shared
  hostNetwork:
    enabled: true
    sharedListenerPort: 80
    nodes:
      matchLabels:
        ingress: cilium
kubeProxyReplacement: true
k8sServiceHost: kubeapi-ip
k8sServicePort: 6443
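Applying these values with Helm could look like the sketch below. The release name cilium, the chart reference cilium/cilium, and the kube-system namespace are assumptions about a typical installation, and upgrade_cilium is a hypothetical wrapper. Setting DRY_RUN prints the command instead of running it.

```shell
# Upgrade the Cilium release with the given values file (default: values.yaml).
# Assumptions: release "cilium", chart "cilium/cilium", namespace "kube-system".
upgrade_cilium() {
  set -- helm upgrade cilium cilium/cilium \
    --version "${CILIUM_VERSION:-1.19.5}" \
    --namespace kube-system \
    -f "${1:-values.yaml}"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"   # show the command only
  else
    "$@"        # run helm for real
  fi
}
```

Running DRY_RUN=1 upgrade_cilium values.yaml first is a cheap way to review the exact command before touching the cluster.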