This topic describes how to diagnose and troubleshoot LoadBalancer service issues.
Background information
When you set the service type to Type=LoadBalancer, the ACK Cloud Controller Manager (CCM) component automatically creates or configures an Alibaba Cloud Classic Load Balancer (CLB) instance for the service. This includes resources such as the CLB instance, listeners, and backend server groups. For more information about the automatic update policy for CLB, see Notes on Server Load Balancer configurations for services.
Diagnostic process
Ensure that your CCM component version is V1.9.3.276-g372aa98-aliyun or later. For more information about how to upgrade the CCM component, see Upgrade the CCM component. For the release notes of CCM, see Cloud Controller Manager.
Run the following command to identify the service associated with the CLB instance.
kubectl get svc -A | grep -i LoadBalancer | grep {XXX.XXX.XXX.XXX} # XXX.XXX.XXX.XXX is the load balancer IP address.
Run the following command to check whether the service has error events.
kubectl -n {your-namespace} describe svc {your-svc-name}
If error events exist, see Service error events and solutions.
If no error events exist, see Troubleshooting methods.
Service error events and solutions
The following table describes the solutions for different error messages.
Error message | Description and solution |
| The number of backend servers for the CLB instance has reached the quota limit. Solution: Optimize your quota usage in one of the following ways:
|
| Shared CLB instances do not support ENIs. Solution: If the CLB backend uses ENIs, you must select a high-performance CLB instance. You can add the Important Make sure that the annotation is compatible with your CCM version. For more information about the mapping between annotations and CCM versions, see Use annotations to configure a Classic Load Balancer (CLB) instance. |
| The CLB instance has no backend servers. Check whether the service is associated with a pod and whether the pod is running as expected. Solution:
|
| The CLB instance cannot be associated based on the service. Solution: Log on to the Server Load Balancer console. In the region where the service resides, search for the CLB instance based on the
|
| Your account has an overdue payment. |
| Your account balance is insufficient. |
| The CLB OpenAPI is being throttled. Solution:
|
| The listener associated with the vServer group cannot be deleted. Solution:
|
| This error occurs if you reuse an internal-facing CLB instance that is not in the same VPC as the cluster. Solution: Make sure that your CLB instance and cluster are in the same VPC. |
| The number of available IP addresses in the vSwitch is insufficient. Solution: Use |
| The ENI mode does not support a string value for Solution: Change the value of the |
| Earlier versions of CCM create shared CLB instances by default. However, shared CLB instances are discontinued. Solution: Upgrade the CCM component. |
| The resource group of a CLB instance cannot be changed after the instance is created. Solution: Remove the |
| The specified ENI IP address cannot be found in the VPC. Solution: Check whether the |
| The billing method of the CLB instance cannot be changed from pay-as-you-go to pay-by-specification. Solution:
|
| This error occurs when a CLB instance created by CCM is reused. Solution:
|
| The type of a CLB instance cannot be changed after the instance is created. This error occurs if you change the CLB type after you create the service. Solution: Delete and recreate the service. |
| The service is already attached to a CLB instance and cannot be attached to another one. Solution: You cannot reuse an existing CLB instance by changing the CLB ID in the |
Troubleshooting methods
For issues that are not service errors, troubleshoot them as described in the following table.
Issue category | Symptom | Solution |
CLB access issues | Uneven load distribution across CLB backends | |
CLB access issues | A 503 error occurs when you access the CLB instance during an application update | A 503 error occurs when you access the CLB instance during an application update |
CLB access issues | The CLB instance cannot be accessed from within the cluster | |
CLB access issues | The CLB instance cannot be accessed from outside the cluster | The CLB instance cannot be accessed from outside the cluster |
CLB access issues | An error " | |
CLB configuration issues | Service annotations do not take effect | |
CLB configuration issues | The CLB configuration is modified | |
CLB configuration issues | Reusing an existing CLB instance does not take effect | |
CLB configuration issues | No listener is configured when an existing CLB instance is reused | Why is no listener configured when I reuse an existing CLB instance? |
CLB configuration issues | Inconsistent CLB backends | |
CLB deletion issues | The CLB instance is deleted | |
CLB deletion issues | The CLB instance is not deleted after the service is deleted | |
Uneven load distribution across CLB backends
Cause
The scheduling algorithm of the CLB instance is not properly configured.
Symptom
The load is unevenly distributed across the backend servers of the CLB instance.
Solution
For a service in Local mode (externalTrafficPolicy: Local), set the CLB scheduling algorithm to weighted round-robin. You can do this by adding the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-scheduler: "wrr" annotation to the service.
If your application uses persistent connections, the load may be unbalanced because multiple requests are sent over each connection. In this case, set the CLB scheduling algorithm to weighted least connections by adding the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-scheduler: "wlc" annotation to the service.
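As a minimal sketch of the first case, the following manifest sets the weighted round-robin scheduler on a service in Local mode. The service name, namespace, selector, and ports are placeholder assumptions. Only the annotation key and its "wrr" and "wlc" values come from the solution above.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx              # hypothetical service name
  namespace: default
  annotations:
    # Weighted round-robin. Use "wlc" instead if the application
    # relies on persistent connections.
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-scheduler: "wrr"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    run: nginx             # hypothetical pod label
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
```

After you apply a change like this, you can confirm the scheduling algorithm on the listener in the Server Load Balancer console.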
A 503 error occurs when you access the CLB instance during an application update
Cause
Connection draining is not configured for the CLB listener, or graceful termination is not configured for the pod.
Symptom
When you access the CLB instance during an application update, a 503 error is returned.
Solution
Use annotations such as service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain to configure connection draining for the CLB listener. For more information about the annotations, see Common operations to manage listeners.
Configure preStop and readinessProbe for the pod based on your container network mode.
readinessProbe is a readiness probe. A pod is added to an Endpoint only after the pod passes the readiness probe. After ACK detects a change in the Endpoint, it attaches the node to the backend of the CLB instance. You must properly configure the probe frequency, delay, and unhealthy threshold for the readinessProbe. Some applications take a long time to start. If the timeout period is too short, the pod may restart repeatedly.
Set the preStop period to the time required for the application to process all remaining requests. Set terminationGracePeriodSeconds to a value that is at least 30 seconds longer than the preStop period.
Pod configuration example:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  namespace: default
spec:
  containers:
  - name: nginx
    image: nginx
    # Liveness probe
    livenessProbe:
      failureThreshold: 3
      initialDelaySeconds: 30
      periodSeconds: 30
      successThreshold: 1
      tcpSocket:
        port: 5084
      timeoutSeconds: 1
    # Readiness probe
    readinessProbe:
      failureThreshold: 3
      initialDelaySeconds: 30
      periodSeconds: 30
      successThreshold: 1
      tcpSocket:
        port: 5084
      timeoutSeconds: 1
    # Graceful exit
    lifecycle:
      preStop:
        exec:
          command:
          - sleep
          - 30
  terminationGracePeriodSeconds: 60
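The listener side of the solution might look like the following sketch. The connection-drain annotation key is the one named in the solution. The "on" value, the companion connection-drain-timeout annotation, the 30-second timeout, and the service name, selector, and ports are assumptions for illustration. Check them against Common operations to manage listeners and your CCM version before you use them.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx              # hypothetical service name
  namespace: default
  annotations:
    # Enable connection draining on the listener (value assumed)
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain: "on"
    # Drain timeout in seconds (annotation and value assumed);
    # chosen here to match the 30-second preStop sleep in the pod example
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain-timeout: "30"
spec:
  type: LoadBalancer
  selector:
    run: nginx             # hypothetical pod label
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
```

Aligning the drain timeout with the preStop period gives in-flight requests roughly the same window on the listener and in the pod.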
The CLB instance cannot be accessed from within the cluster
Cause
The externalTrafficPolicy: Local setting is configured for the Service. When this setting is used, kube-proxy forwards traffic only to local endpoints. If a request originates from a node that does not have a backend pod for the Service, the request fails. Although the CLB address is intended for access from outside the cluster, requests from within the cluster are still routed by kube-proxy based on this policy.
If no backend service pod for the Service exists on the node that receives the request, a network connection failure occurs. If a backend service pod exists on the node, the access is successful. For more information about this issue, see kube-proxy adds external-lb address to node-local iptables rule.
Symptom
The CLB instance cannot be accessed from within the cluster.
Solution
From within the Kubernetes cluster, access the service using its ClusterIP or service name. The service name of the Ingress is nginx-ingress-lb.kube-system.
Change the value of externalTrafficPolicy in the LoadBalancer service to Cluster. However, this causes the source IP address to be lost in the application. The following command shows how to modify the Ingress service:
kubectl edit svc nginx-ingress-lb -n kube-system
Note: If you use an Ingress CLB instance, a pod can access a service that is exposed through the Ingress or CLB instance only on the node where the Ingress pod resides.
If your cluster uses Terway with ENIs or multiple IP addresses per ENI, you can change the value of externalTrafficPolicy in the LoadBalancer service to Cluster and add an annotation for ENI pass-through, such as service.beta.kubernetes.io/backend-type: "eni". This method preserves the source IP address and lets you access the service from within the cluster without issues. For more information, see Use annotations to configure a Classic Load Balancer (CLB) instance. The following code provides an example.
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/backend-type: eni
  labels:
    app: nginx-ingress-lb
  name: nginx-ingress-lb
  namespace: kube-system
spec:
  externalTrafficPolicy: Cluster
The CLB instance cannot be accessed from outside the cluster
Cause
An access control list (ACL) is configured for the CLB instance, or the CLB instance is not running as expected.
Symptom
The CLB instance cannot be accessed from outside the cluster.
Solution
Run the following command to view the service event information and resolve any error events. For more information, see Service error events and solutions.
kubectl -n {your-namespace} describe svc {your-svc-name}
Check whether an ACL is configured for the CLB instance.
If an ACL is configured, check whether the ACL allows access from the client IP address. For more information about CLB ACL configurations, see Resource Access Management.
Check whether the CLB vServer group is empty.
If the vServer group is empty, check whether the application pod is associated with the service and whether the application pod is running as expected. If the associated pod is abnormal, locate and resolve the pod issue. For more information, see Troubleshoot pod issues.
Check whether the health check for the CLB listener is normal.
If the CLB health check is abnormal, check whether the application pod is running as expected. For more information about CLB health check issues, see CLB health check FAQ.
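For the ACL check, an ACL that was configured through the service usually appears as annotations on the service object. The following sketch shows what such a configuration might look like, so that you can recognize it when you inspect the service. The three annotation keys and the "on" and "white" values are assumptions that are not taken from this topic. Verify them in Use annotations to configure a Classic Load Balancer (CLB) instance. The ACL ID, service name, selector, and ports are placeholders.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx              # hypothetical service name
  namespace: default
  annotations:
    # Assumed annotation keys; verify against the CLB annotation reference
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-acl-status: "on"
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-acl-id: "${YOUR_ACL_ID}"
    # "white" is assumed to mean only the listed addresses are allowed
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-acl-type: "white"
spec:
  type: LoadBalancer
  selector:
    run: nginx             # hypothetical pod label
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
```

If annotations like these are present and the ACL acts as an allowlist, a client whose IP address is not in the ACL cannot reach the CLB instance, even when the backends are healthy.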
Cannot connect to the backend HTTPS service
Cause
After a certificate is configured on the CLB instance, decryption is performed on the CLB instance. As a result, the requests that are sent to the backend pods are HTTP requests.
Symptom
Cannot connect to the backend HTTPS service.
Solution
Set the targetPort that corresponds to the HTTPS port in the service to an HTTP port. For example, if the service exposes port 443 for HTTPS and Nginx serves HTTP on port 80, set the targetPort for port 443 to 80.
Configuration example:
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-protocol-port: "https:443"
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-cert-id: "${YOUR_CERT_ID}"
  name: nginx
  namespace: default
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  - port: 443
    protocol: TCP
    targetPort: 80
  selector:
    run: nginx
  type: LoadBalancer