When the ingress gateway service uses externalTrafficPolicy: Local, pods on some nodes can reach the Classic Load Balancer (CLB) IP address while pods on other nodes cannot.
Symptoms
After you add a Kubernetes cluster to your Service Mesh (ASM) instance and configure a CLB instance for the ingress gateway with externalTrafficPolicy: Local:
Pods on some nodes can access the CLB IP address of the ingress gateway.
Pods on other nodes cannot access the same CLB IP address.
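To reproduce the symptom, you can compare the same request from two pods that run on different nodes (the pod names, namespace, and `<CLB-IP>` below are placeholders):

```shell
# Look up the CLB IP assigned to the ingress gateway (EXTERNAL-IP column).
kubectl get svc istio-ingressgateway -n istio-system

# Send the same request from pods on different nodes. With
# externalTrafficPolicy: Local, only pods whose node also hosts an
# ingress gateway pod receive a response; the other requests time out.
kubectl exec <pod-on-node-a> -n <namespace> -- curl -s -m 5 http://<CLB-IP>
kubectl exec <pod-on-node-b> -n <namespace> -- curl -s -m 5 http://<CLB-IP>
```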
Diagnose the issue
Before you apply a fix, confirm that externalTrafficPolicy: Local is causing the problem.
1. Check the `externalTrafficPolicy` setting of the ingress gateway service:

   ```shell
   kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.spec.externalTrafficPolicy}'
   ```

   If the output is `Local`, this issue applies.

2. List the nodes that run ingress gateway pods, and note the `NODE` column:

   ```shell
   kubectl get pods -n istio-system -l app=istio-ingressgateway -o wide
   ```

3. Identify the node of the failing pod:

   ```shell
   kubectl get pod <failing-pod-name> -n <namespace> -o wide
   ```

   If the failing pod runs on a node that does not appear in the output from step 2, the root cause is confirmed.
Root cause
When externalTrafficPolicy is set to Local, kube-proxy adds iptables (or IP Virtual Server (IPVS)) rules that only route traffic to endpoints on the local node. The following sequence shows how kube-proxy processes an in-cluster request to the CLB IP address:
A pod sends a request to the CLB external IP address.
kube-proxy recognizes the CLB IP as the external IP of the ingress gateway service and intercepts the request instead of forwarding it to the CLB.
kube-proxy looks for ingress gateway endpoints on the same node as the requesting pod.
If an ingress gateway pod exists on that node, kube-proxy forwards the request to it. The request succeeds.
If no ingress gateway pod exists on that node, kube-proxy finds no local endpoint. The request fails.
This is the expected Kubernetes behavior for externalTrafficPolicy: Local. For details, see Why kube-proxy add external-lb's address to node local iptables rule?.
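The endpoint-selection logic described above can be sketched as a small simulation (an illustration only, not kube-proxy's actual implementation; the pod and node names are hypothetical):

```shell
#!/usr/bin/env bash
# Endpoints of the ingress gateway service, as "pod=node" pairs
# (hypothetical values: two gateway pods, on node-a and node-b).
endpoints="gw-1=node-a gw-2=node-b"

# route POLICY CLIENT_NODE
# Prints the endpoint kube-proxy would pick, or DROP if none qualifies.
route() {
  local policy=$1 client_node=$2 ep pod node
  for ep in $endpoints; do
    pod=${ep%%=*}
    node=${ep##*=}
    # Under Cluster, any endpoint qualifies; under Local, only
    # endpoints on the same node as the requesting pod do.
    if [ "$policy" = Cluster ] || [ "$node" = "$client_node" ]; then
      echo "$pod"
      return
    fi
  done
  echo DROP  # Local policy with no endpoint on the client's node: request fails
}

route Local node-a    # prints gw-1: a gateway pod runs on the same node
route Local node-c    # prints DROP: no local endpoint, the request fails
route Cluster node-c  # prints gw-1: any endpoint qualifies under Cluster
```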
Solutions
Choose a solution based on whether you need to preserve source IP addresses and your cluster's CNI plugin.
| Solution | Preserves source IP | Prerequisites | Configuration change |
|---|---|---|---|
| Use the service DNS name | Yes (for external traffic) | None | None |
| Set externalTrafficPolicy to Cluster | No | None | Modify IstioGateway CRD |
| Use ENI direct connection | Yes | Terway CNI with ENI mode | Modify IstioGateway CRD and add annotation |
Use the service DNS name (recommended)
Instead of using the CLB external IP address from inside the cluster, access the ingress gateway through its ClusterIP or DNS name. ClusterIP traffic uses a separate kube-proxy routing path that forwards to all endpoints cluster-wide, bypassing the externalTrafficPolicy: Local restriction.
Use the following service DNS name for in-cluster access:
```
istio-ingressgateway.istio-system
```

For example, to send a request from a pod:

```shell
curl http://istio-ingressgateway.istio-system
```

Note: This approach requires no configuration changes, works with any CNI plugin, and preserves source IP addresses for external traffic entering through the CLB.
Set externalTrafficPolicy to Cluster
If you do not need to preserve source IP addresses, set externalTrafficPolicy to Cluster. This tells kube-proxy to forward traffic to ingress gateway pods on any node, not just the local one.
Update the IstioGateway Custom Resource Definition (CRD):
```yaml
apiVersion: istio.alibabacloud.com/v1beta1
kind: IstioGateway
metadata:
  name: ingressgateway
  namespace: istio-system
  ....
spec:
  externalTrafficPolicy: Cluster
  ....
```

For details about the CRD fields, see CRD fields for a gateway.
Note: With externalTrafficPolicy: Cluster, incoming requests undergo Source Network Address Translation (SNAT), which replaces the client source IP with the node's internal IP. If your application relies on client source IP addresses, use one of the other solutions.
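After the change is applied (assuming the gateway controller has reconciled the CRD into the Kubernetes Service, which may take a moment), you can confirm that the Service now uses the Cluster policy:

```shell
kubectl get svc istio-ingressgateway -n istio-system -o jsonpath='{.spec.externalTrafficPolicy}'
# Expected output: Cluster
```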
Use ENI direct connection
If your cluster uses the Terway CNI plugin in ENI mode, in which pods are assigned elastic network interfaces (ENIs) directly, you can set externalTrafficPolicy to Cluster and still preserve source IP addresses: with a direct ENI connection, the CLB forwards traffic to pod ENIs instead of node ports, so the traffic does not undergo SNAT on the node.
Update the IstioGateway CRD to set externalTrafficPolicy: Cluster and add the service.beta.kubernetes.io/backend-type: eni annotation:
```yaml
apiVersion: istio.alibabacloud.com/v1beta1
kind: IstioGateway
metadata:
  name: ingressgateway
  namespace: istio-system
  ....
spec:
  externalTrafficPolicy: Cluster
  maxReplicas: 5
  minReplicas: 2
  ports:
    - name: status-port
      port: 15020
      targetPort: 15020
    - name: http2
      port: 80
      targetPort: 80
    - name: https
      port: 443
      targetPort: 443
    - name: tls
      port: 15443
      targetPort: 15443
  replicaCount: 2
  resources:
    limits:
      cpu: '2'
      memory: 2G
    requests:
      cpu: 200m
      memory: 256Mi
  runAsRoot: false
  serviceAnnotations:
    service.beta.kubernetes.io/backend-type: eni
  serviceType: LoadBalancer
```

For details about the CRD fields, see CRD fields for a gateway.
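To confirm that the annotation was propagated to the Kubernetes Service (assuming the gateway controller has reconciled the CRD change; dots in the annotation key must be escaped with backslashes in the jsonpath expression):

```shell
kubectl get svc istio-ingressgateway -n istio-system \
  -o jsonpath='{.metadata.annotations.service\.beta\.kubernetes\.io/backend-type}'
# Expected output: eni
```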
Verify the fix
After you apply your chosen solution, verify that the previously failing pod can reach the ingress gateway:
```shell
kubectl exec <failing-pod-name> -n <namespace> -- curl -s -o /dev/null -w "%{http_code}" http://istio-ingressgateway.istio-system
```

A 200 response code (or another valid HTTP response) confirms that the pod can reach the ingress gateway.