This topic describes the procedure for diagnosing LoadBalancer Services and how to troubleshoot errors.

Background information

If you specify Type=LoadBalancer for a Service, the Cloud Controller Manager (CCM) of Container Service for Kubernetes (ACK) automatically creates a Server Load Balancer (SLB) instance for the Service and configures listeners and vServer groups. For more information about the policies that are used to automatically update SLB resources, see Considerations for configuring a LoadBalancer type Service.

Diagnostic procedure

Make sure that the CCM version is 1.9.3.276-g372aa98-aliyun or later before troubleshooting. For more information about how to update the CCM, see Manually update the CCM. For more information about the release notes for the CCM, see Cloud Controller Manager.

Service troubleshooting process
  1. Run the following command to query the Service that is associated with the SLB instance:
    kubectl get svc -A | grep -i LoadBalancer | grep {your-loadbalancer-id}
  2. Run the following command to check whether error events are generated for the Service:
    kubectl -n {your-namespace} describe svc {your-svc-name}
    Notice If no error events are generated for the Service, check whether the CCM version is 1.9.3.276-g372aa98-aliyun or later. For more information about how to view and update the CCM version, see Manually update the CCM.
  3. If the issue persists, submit a ticket.

Service errors and solutions

The following table describes how to fix the errors that occur in Services. Each entry lists the error message, followed by its description and solution.
The backend server number has reached to the quota limit of this load balancers

The quota on the backend servers of the SLB instance is insufficient.

Solution: Increase the quota or reduce the number of backend servers that are required:
  • By default, an SLB instance supports 200 backend servers. For more information about how to query and increase the quota, see the Quota Management page in the SLB console.
  • We recommend that you set externalTrafficPolicy of the Service to Local (externalTrafficPolicy: Local). In Cluster mode, the system may create a large number of backend servers. If you want to use Cluster mode, we recommend that you use the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-backend-label annotation to specify the backend servers that you want to use. This reduces the number of backend servers that are required. For more information about how to associate backend servers with an SLB instance, see Use annotations to configure load balancing.
  • If multiple Services share an SLB instance, all backend servers that are used by the Services are counted toward the quota. We recommend that you create an SLB instance for each Service.
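
The second recommendation above can be sketched as a Service manifest. This is a minimal illustration rather than a definitive configuration: the Service name, selector, and port numbers are hypothetical placeholders that you must adapt to your application.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx              # hypothetical Service name
  namespace: default
spec:
  type: LoadBalancer
  # Local mode: only nodes that run pods of this Service are added
  # to the SLB instance as backend servers.
  externalTrafficPolicy: Local
  selector:
    run: nginx             # hypothetical pod label
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
```

After you apply the change, you can run the describe command from the diagnostic procedure again to check whether the quota error event is still reported.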
The loadbalancer does not support backend servers of eni type

Shared-resource SLB instances do not support elastic network interfaces (ENIs).

Solution: If you want to specify an ENI as a backend server, create a high-performance SLB instance by adding the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-spec: "slb.s1.small" annotation to the Service.
Notice Make sure that the annotations that you add meet the requirements of the CCM version. For more information about the correlation between annotations and CCM versions, see Common annotations.
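
As a rough sketch of the solution above, the following manifest combines ENI mode with a high-performance instance specification. Both annotations appear in this topic. The Service name, selector, and ports are hypothetical, and the manifest assumes that the cluster network plug-in supports ENI mode.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx              # hypothetical Service name
  namespace: default
  annotations:
    # Attach pod ENIs to the SLB instance as backend servers.
    service.beta.kubernetes.io/backend-type: "eni"
    # Request a high-performance instance instead of a shared-resource one.
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-spec: "slb.s1.small"
spec:
  type: LoadBalancer
  selector:
    run: nginx             # hypothetical pod label
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
```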
There are no available nodes for LoadBalancer

No backend server is associated with the SLB instance. Check whether pods are associated with the Service and whether the pods run as expected.

Solution:
  • If no pod is associated with the Service, associate application pods with the Service.
  • If the pods that are associated with the Service do not run as expected, identify the cause and troubleshoot the error. For more information, see Troubleshoot application errors in ACK.
  • If no backend server is associated with the SLB instance, but the pods run as expected, check whether the pods are deployed on master nodes. If the pods are deployed on master nodes, evict the pods to worker nodes. If the pods are not deployed on master nodes, submit a ticket.

  • alicloud: not able to find loadbalancer named [%s] in openapi, but it's defined in service.loaderbalancer.ingress. this may happen when you removed loadbalancerid annotation
  • alicloud: can not find loadbalancer, but it's defined in service

The system fails to associate a Service with the SLB instance.

Solution: Log on to the SLB console, select the region where the SLB instance is deployed, and then find the SLB instance based on the EXTERNAL-IP of the Service.
  1. If the SLB instance does not exist and the Service is no longer used, delete the Service.
  2. If the SLB instance exists, perform the following steps:
    1. If the SLB instance is created in the SLB console, add the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id annotation to the Service. For more information, see Use annotations to configure load balancing.
    2. If the SLB instance is automatically created by the CCM, check whether the kubernetes.do.not.delete label is added to the SLB instance. If the label is not added to the SLB instance, add the label to the SLB instance. For more information, see How do I rename an SLB instance when the CCM version is V1.9.3.10 or earlier?
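
For the case in which the SLB instance was created in the SLB console, the annotation can be added as in the following sketch. The Service name, selector, and ports are hypothetical, and ${YOUR_SLB_ID} is a placeholder for the instance ID shown in the SLB console.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx              # hypothetical Service name
  namespace: default
  annotations:
    # ID of the existing SLB instance that the Service should use.
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id: "${YOUR_SLB_ID}"
spec:
  type: LoadBalancer
  selector:
    run: nginx             # hypothetical pod label
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
```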
ORDER.ARREARAGE Message: The account is arrearage.

Your account has overdue payments.

PAY.INSUFFICIENT_BALANCE Message: Your account does not have enough balance.

Your account balance is insufficient. Top up your account balance.
Status Code: 400 Code: Throttlingxxx

API throttling is triggered for SLB.
Solution:
  1. Go to the Quota Management page in the SLB console and check whether the SLB resource quotas are sufficient.
  2. Run the following command to check whether errors occur in the Service. If errors occur in the Service, refer to the information provided in this table to troubleshoot the errors.
    kubectl -n {your-namespace} describe svc {your-svc-name}
Status Code: 400 Code: RspoolVipExist Message: there are vips associating with this vServer group.

The listener that is associated with the vServer group cannot be deleted.
Solution:
  1. Check whether the annotation of the Service contains the ID of the SLB instance. Example: service.beta.kubernetes.io/alibaba-cloud-loadbalancer-id: {your-slb-id}.

    If the annotation of the Service contains the ID of the SLB instance, the SLB instance is reused.

  2. Log on to the SLB console and delete the listener that uses the Service port. For more information about how to delete listeners for an SLB instance, see Manage forwarding rules for a listener.
Status Code: 400 Code: NetworkConflict

The reused internal-facing SLB instance and the cluster are not deployed in the same virtual private cloud (VPC).

Solution: Make sure that your SLB instance and the cluster are deployed in the same VPC.

Status Code: 400 Code: VSwitchAvailableIpNotExist Message: The specified VSwitch has no available ip.

The idle IP addresses in the vSwitch are insufficient.

Solution: Specify another vSwitch in the same VPC by using the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-vswitch-id: "${YOUR_VSWITCH_ID}" annotation.
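
As a rough illustration of the solution above, the following manifest pins the SLB instance to a specific vSwitch. It is a sketch under assumptions: the Service name, selector, and ports are placeholders, and the address-type annotation is not mentioned in this topic. It is included only on the assumption that the Service uses an internal-facing SLB instance, so check it against the annotation reference for your CCM version.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx              # hypothetical Service name
  namespace: default
  annotations:
    # Assumption: the Service requests an internal-facing SLB instance.
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-address-type: "intranet"
    # A vSwitch in the same VPC as the cluster that still has idle IP addresses.
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-vswitch-id: "${YOUR_VSWITCH_ID}"
spec:
  type: LoadBalancer
  selector:
    run: nginx             # hypothetical pod label
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
```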

The specified Port must be between 1 and 65535.

The targetPort field does not support STRING type values in ENI mode.

Solution: Set the targetPort field in the Service YAML file to a value of the INTEGER type or update the CCM. For more information about how to update the CCM, see Manually update the CCM.
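
The difference between the two value types can be illustrated with a ports fragment of a Service. This is a sketch only: the port numbers are examples, and the named port http in the commented-out line is a hypothetical name.

```yaml
spec:
  ports:
  - port: 80
    protocol: TCP
    # INTEGER value: refers to the container port by number.
    targetPort: 80
    # STRING value: refers to a named container port. In ENI mode, this
    # form causes the error on CCM versions that do not support it.
    # targetPort: http
```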

Status Code: 400 Code: ShareSlbHaltSales Message: The share instance has been discontinued.

By default, earlier versions of the CCM automatically create shared-resource SLB instances, which are no longer available for purchase.

Solution: Manually update the CCM.

can not change ResourceGroupId once created

You cannot modify the resource group of an SLB instance after the SLB instance is created.

Solution: Delete the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-resource-group-id: "rg-xxxx" annotation from the Service.

can not find eniid for ip x.x.x.x in vpc vpc-xxxx

The specified IP address of the ENI cannot be found in the VPC.

Solution: Check whether the service.beta.kubernetes.io/backend-type: eni annotation is added to the Service. If the annotation is added to the Service, check whether Flannel is used as the network plug-in of the cluster. If Flannel is used, delete the annotation from the Service. Flannel does not support ENI mode.

Troubleshooting

You can refer to the information provided in the following table to troubleshoot errors other than Service errors.

The entries are grouped by category. Each entry lists the symptom, followed by the topic that describes the solution.

Issues that occur when you access an SLB instance
  • Symptom: The SLB instance does not evenly distribute traffic.
    Solution: The SLB instance does not evenly distribute traffic
  • Symptom: The 503 error occurs when I access the SLB instance during application updates.
    Solution: The 503 error occurs when I access the SLB instance during application updates
  • Symptom: The SLB instance cannot be accessed from within the cluster.
    Solution: The IP address of the SLB instance that is associated with the LoadBalancer Service cannot be accessed from within the cluster
  • Symptom: The SLB instance cannot be accessed from outside the cluster.
    Solution: The SLB instance cannot be accessed from outside the cluster
  • Symptom: The following error occurs when a request is sent to an HTTPS port: The plain HTTP request was sent to HTTPS port.
    Solution: Errors occur when a request is sent to an HTTPS port

Issues related to SLB configurations
  • Symptom: The annotations of the Service do not take effect.
    Solution: What do I do if the annotations of a Service do not take effect?
  • Symptom: The configuration of the SLB instance is modified.
    Solution: Why is the configuration of an SLB instance modified?
  • Symptom: The system fails to reuse an existing SLB instance.
    Solution: Why does the system fail to use an existing SLB instance for more than one Services?
  • Symptom: No listener is created when an existing SLB instance is reused.
    Solution: Why is no listener created when I reuse an existing SLB instance?
  • Symptom: The endpoint of the Service is different from that specified for the backend server of the SLB instance.
    Solution: What do I do if the vServer groups of an SLB instance are not updated?

Issues related to SLB deletion
  • Symptom: The SLB instance is deleted.
    Solution: When is the SLB instance automatically deleted?
  • Symptom: The SLB instance is not deleted together with the Service.
    Solution: When is the SLB instance automatically deleted?

The SLB instance does not evenly distribute traffic

Causes

You do not specify a proper scheduling algorithm for the SLB instance.

Symptom

Traffic is not evenly distributed to the backend servers of an SLB instance.

Solutions
  • If you set externalTrafficPolicy to Local for a Service, set the scheduling algorithm of the SLB instance to weighted round-robin (WRR) by adding the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-scheduler: "wrr" annotation to the Service.
  • If long-lived connections are established to your Service, set the scheduling algorithm of the SLB instance to weighted least connections (WLC) by adding the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-scheduler: "wlc" annotation to the Service.
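
The first solution above might look like the following manifest. It is a minimal sketch: the Service name, selector, and ports are hypothetical, and you would replace "wrr" with "wlc" for the long-lived connection case.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx              # hypothetical Service name
  namespace: default
  annotations:
    # "wrr" = weighted round-robin; use "wlc" for long-lived connections.
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-scheduler: "wrr"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    run: nginx             # hypothetical pod label
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
```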

The 503 error occurs when I access the SLB instance during application updates

Causes

You do not configure connection draining for the SLB listener or graceful shutdown for the pod.

Symptom

The 503 error occurs when you access the SLB instance during application updates.

Solutions
  1. Configure connection draining for the SLB listener by using the service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain annotation. For more information about annotations, see Listeners.
  2. Set the preStop and readinessProbe parameters for the pod based on the network mode of the pod.
    • readinessProbe checks whether the container is ready to accept network traffic. The pod is added to the endpoint only after the pod passes the readiness probe, and the node is attached to the SLB instance only after ACK detects that the endpoint is updated. Some applications require a long time to start, so you must set a proper probing interval, delay period, and unhealthy threshold. If these values are too small, the application pods may restart repeatedly.
    • We recommend that you set the value of preStop to the time period that the application pods require to handle the remaining requests, and set the value of terminationGracePeriodSeconds to a time period that is 30 seconds longer than that of preStop.
    Example of a pod configuration:
    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
      namespace: default
    spec:
      containers:
      - name: nginx
        image: nginx
        # Liveness probing
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 30
          successThreshold: 1
          tcpSocket:
            port: 5084
          timeoutSeconds: 1
        # Readiness probing
        readinessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 30
          successThreshold: 1
          tcpSocket:
            port: 5084
          timeoutSeconds: 1
        # Graceful shutdown
        lifecycle:
          preStop:
            exec:
              command:
              - sleep
              # Command arguments must be strings, so the number is quoted.
              - "30"
      terminationGracePeriodSeconds: 60
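
For step 1 above, the connection draining annotation could be added to the Service as in the following sketch. Only the connection-drain annotation name comes from this topic. The value "on", the connection-drain-timeout annotation, and its value in seconds are assumptions, as are the Service name, selector, and ports, so verify them against the Listeners annotation reference for your CCM version.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx              # hypothetical Service name
  namespace: default
  annotations:
    # Assumption: "on" enables connection draining for the listener.
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain: "on"
    # Assumption: companion annotation that sets the drain timeout in seconds.
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-connection-drain-timeout: "30"
spec:
  type: LoadBalancer
  selector:
    run: nginx             # hypothetical pod label
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
```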

The SLB instance cannot be accessed from outside the cluster

Causes

Access control list (ACL) rules are configured for the SLB instance, or the SLB instance does not run as expected.

Symptom

You cannot access the SLB instance from outside the cluster.

Solutions
  1. Run the following command to query Service events and troubleshoot errors. For more information, see Service errors and solutions.
    kubectl -n {your-namespace} describe svc {your-svc-name}
  2. Check whether ACL rules are configured for the SLB instance.

    If ACL rules are configured for the SLB instance, check whether the client IP address is allowed to access the SLB instance. For more information about how to configure ACL rules for an SLB instance, see Overview.

  3. Check whether the SLB instance is associated with a vServer group.

    If no vServer group is associated, check whether the application pods are associated with the Service and whether the application pods run as expected. If the application pods do not run as expected, identify the causes and troubleshoot the errors. For more information, see Troubleshoot application errors in ACK.

  4. Check whether unhealthy backend servers are detected by the SLB listeners.

    If unhealthy backend servers are detected, check whether the application pods run as expected. For more information about the health checks of SLB, see Execute a health check script.

  5. If the issue persists, submit a ticket.

Backend HTTPS services cannot be accessed

Causes

After you specify the certificate information in the SLB instance, the SLB instance decrypts HTTPS requests and then sends HTTP requests to the backend pods.

Symptom

You cannot access backend HTTPS services.

Solutions

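Set targetPort to an HTTP port in the Service. targetPort specifies the backend port to which requests that arrive on the HTTPS listener port are forwarded. For example, the HTTPS port is 443 in the following NGINX Service. Because the SLB instance has already decrypted the requests, you must set targetPort for port 443 to the HTTP port of the pod, which is 80.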

Example:
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-protocol-port: "https:443"
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-cert-id: "${YOUR_CERT_ID}"
  name: nginx
  namespace: default
spec:
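  ports:
  # Each port must have a unique name when a Service exposes more than one port.
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  - name: https
    port: 443
    protocol: TCP
    # The SLB instance decrypts HTTPS requests, so forward them to the HTTP port.
    targetPort: 80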
  selector:
    run: nginx
  type: LoadBalancer