
Container Service for Kubernetes:Recommended configurations for high reliability

Last Updated: Dec 11, 2023

This topic describes the recommended configurations for creating a Container Service for Kubernetes (ACK) cluster where applications can run stably and reliably.

Recommended configurations for creating an ACK cluster

Disk type and disk size

Disk type

  • We recommend that you select an SSD.

  • When you create an ACK cluster, select Mount Data Disk for worker nodes. The data disk is mounted to /var/lib/docker to store local images, which prevents the root disk from running out of space as the number of images grows. Over time, the data disk may accumulate images that are no longer needed. To quickly remove these images, remove the node from the cluster, re-initialize the data disk, and then add the node back to the cluster.

Disk size

Docker images, system logs, and application logs are all stored on disks. Therefore, Kubernetes nodes require a large amount of disk space. When you create an ACK cluster, you must take into consideration the number of pods that you want to deploy on each node, the log size of each pod, the image size, the temporary data size, and the space reserved for the system.

We recommend that you reserve 8 GB of disk space for the operating system that runs on Elastic Compute Service (ECS) instances. The operating system occupies about 3 GB of disk space. The remaining space is used by Kubernetes resource objects.

Whether to create worker nodes when you create an ACK cluster

When you create an ACK cluster, you can set Node Type to the following values:

  • Pay-As-You-Go: creates worker nodes when you create the ACK cluster.

  • Subscription: does not create worker nodes when you create the ACK cluster. After the cluster is created, you can purchase ECS instances and add them to the cluster based on your requirements.

Network

  • If you want to connect your cluster to external services such as ApsaraDB RDS, you must use an existing virtual private cloud (VPC) instead of creating a VPC. VPCs are isolated from each other. You can create a vSwitch and attach the ECS instances of your cluster to the vSwitch. This facilitates cluster management.

  • You can select one of the following network plug-ins when you create an ACK cluster: Terway and Flannel. For more information, see Terway and Flannel.

  • Do not specify an excessively small CIDR block for the pod network. Otherwise, the number of nodes that the cluster can support is limited. To set this parameter, consider the value that you want to specify for the Pod Number for Node parameter in the Advanced Settings section. For example, if you set the CIDR block of the pod network to X.X.X.X/16, 256 × 256 (65,536) IP addresses are assigned to your cluster. If the Pod Number for Node parameter is set to 128, the cluster can support at most 65,536/128 = 512 nodes.

Use cross-zone deployment

Alibaba Cloud supports cross-region deployment. Each region contains multiple isolated locations known as zones, and each zone has its own independent power supply and network. Deploying nodes across zones enables cross-zone disaster recovery, but it also increases network latency between zones.

Associate a deployment set with a node pool

Deployment sets are used to manage the distribution of Elastic Compute Service (ECS) instances. ECS instances in a deployment set are distributed across multiple physical servers for high redundancy. This improves the availability of your applications and implements disaster recovery. A node pool that is associated with a deployment set contains ECS nodes that are distributed across multiple physical servers. You can then configure pod anti-affinity to spread your application pods across different ECS nodes, as shown in the following sketch. For more information, see Best practices for associating deployment sets with node pools.
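
The following manifest is a minimal sketch of this approach. The Deployment name, labels, and image are hypothetical; the key part is the required anti-affinity rule with the kubernetes.io/hostname topology key, which prevents two replicas of the application from being scheduled onto the same node.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app # Hypothetical application name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      affinity:
        podAntiAffinity:
          # Hard rule: never schedule two replicas onto the same node.
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web-app
            topologyKey: kubernetes.io/hostname
      containers:
      - name: web-app
        image: nginx # Placeholder image for illustration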

Recommended configurations for workloads

Claim resources for each pod

In an ACK cluster, too many pods may be scheduled onto a single node. This overloads the node so that it can no longer provide services.

To avoid this issue, you can specify the requests parameter and the limits parameter when you deploy a pod in your cluster. This ensures that the pod is deployed on a node with sufficient resources. In the following example, the NGINX pod requires 1 CPU core and 1,024 MiB of memory. When the pod is running, the upper limit of resource usage is 2 CPU cores and 4,096 MiB of memory.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx
    resources: # Resource claim
      requests: # Minimum resources required to schedule the pod
        memory: "1024Mi"
        cpu: "1000m" # 1 CPU core
      limits: # Maximum resources that the pod is allowed to use
        memory: "4096Mi"
        cpu: "2000m" # 2 CPU cores

ACK uses a static resource scheduling method and calculates the remaining resources on each node by using the following formula: Remaining resources = Total resources - Allocated resources. The allocated resources are the sum of the resources requested by the pods on the node, not the resources that are actually used. If you manually run a resource-consuming program on a node, ACK does not deduct the resources used by the program.

You must claim resources for all pods. For a pod that does not have resource claims, after it is scheduled to a node, the resources used by the pod are not deducted from the total resources of the node. As a result, too many pods may be scheduled to this node.

Wait until dependencies are ready instead of terminating an application during startup

Some applications have external dependencies. For example, an application may need to read data from a database or call the interface of another service. However, when the application starts, the database or the interface may not be ready. In manual O&M, the application is terminated when its external dependencies are not ready. This strategy is known as fail-fast, and it is not suitable for ACK clusters. Most O&M activities in ACK are automated. For example, you do not need to manually deploy an application, start the application on a selected node, or restart the application when it fails. Applications in ACK clusters are automatically restarted upon failures. You can also scale the number of pods by using the Horizontal Pod Autoscaler (HPA) to handle increased loads.

For example, Application A depends on Application B, and both applications run on the same node. When the node restarts for some reason, Application A may start before Application B. In this case, the dependency of Application A is not ready. With the fail-fast strategy, Application A is terminated and must be manually started after Application B starts.

A better approach in ACK is to poll the dependencies and wait until all of them are ready instead of terminating the application. You can achieve this by using init containers, as shown in the following sketch.
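
The following manifest is a minimal sketch of this pattern. The service name my-database and the application image are hypothetical. The init container polls the dependency until its DNS name resolves and exits successfully only then, so the main container does not start before the dependency is reachable.

apiVersion: v1
kind: Pod
metadata:
  name: app-a # Hypothetical application that depends on a database service
spec:
  initContainers:
  # Poll until the dependency's Service DNS name resolves, then exit.
  - name: wait-for-database
    image: busybox
    command: ['sh', '-c', 'until nslookup my-database; do echo waiting for my-database; sleep 2; done']
  containers:
  - name: app-a
    image: app-a-image # Placeholder image for illustration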

Configure the restart policy

It is common for the application processes that run in a pod to exit unexpectedly, for example, due to bugs in the code or excessive memory usage. The pod fails when its processes exit. You can set the restartPolicy parameter for the pod to ensure that the pod is automatically restarted upon failures.

apiVersion: v1
kind: Pod
metadata:
  name: tomcat
spec:
  containers:
  - name: tomcat
    image: tomcat
  restartPolicy: OnFailure # Pod-level field: restart the pod only when it exits with a failure

Valid values of the restartPolicy parameter:

  • Always: automatically restarts the pod in all cases.

  • OnFailure: automatically restarts the pod upon failures (the process exits with a non-zero exit code).

  • Never: never restarts the pod.

Configure liveness probes and readiness probes

A pod may be unable to provide services even if the pod is in the Running state. For example, the processes in a running pod may be deadlocked, in which case the pod cannot provide services. However, Kubernetes does not restart such a pod because the pod is still running. Therefore, configure liveness probes for all pods in a cluster. The probes check whether the pods are alive and can provide services. When a liveness probe detects exceptions in a pod, the pod is automatically restarted.

A readiness probe determines whether a pod is ready to provide services. An application takes some time to initialize during startup, and the pod where the application runs cannot provide services during the initialization. A readiness probe informs Ingresses and Services whether the pod is ready to receive network traffic. When a readiness probe detects errors in a pod, Kubernetes stops forwarding network traffic to the pod.

apiVersion: v1
kind: Pod
metadata:
  name: tomcat
spec:
  containers:
  - name: tomcat
    image: tomcat
    livenessProbe:
      httpGet: # Check whether the application responds over HTTP
        path: /index.jsp
        port: 8080
      initialDelaySeconds: 3 # Wait 3 seconds after startup before the first check
      periodSeconds: 3 # Check every 3 seconds
    readinessProbe:
      httpGet: # Traffic is forwarded to the pod only if this check succeeds
        path: /index.jsp
        port: 8080

Run only one process in each container

Users who are new to containers tend to use containers as virtual machines and run multiple processes in one container, such as monitoring processes, logging processes, the sshd process, and even systemd. This causes the following two issues:

  • It becomes difficult to determine the resource usage of the pod and to enforce resource limits.

  • The container engine can detect a process failure and restart the container only if the container runs a single process. If a container contains multiple processes and one of them is terminated, the container may keep running. The container engine is unaware of the terminated process and therefore does not act on the container, even though the container may no longer function as expected.

This does not mean that related processes cannot run together. In ACK, you can run closely coupled processes in separate containers of the same pod. For example, if you want NGINX and php-fpm to communicate with each other over a UNIX domain socket, you can use a pod that contains two containers and place the UNIX domain socket in a volume that is shared by the two containers, as shown in the following sketch.
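
The following manifest is a minimal sketch of this pattern. The images and the mount path are placeholders. The emptyDir volume is mounted into both containers, so the UNIX domain socket created by php-fpm is visible to NGINX.

apiVersion: v1
kind: Pod
metadata:
  name: nginx-php
spec:
  volumes:
  # Shared directory that holds the UNIX domain socket
  - name: socket-dir
    emptyDir: {}
  containers:
  - name: nginx
    image: nginx
    volumeMounts:
    - name: socket-dir
      mountPath: /var/run/php # Hypothetical socket location
  - name: php-fpm
    image: php:fpm
    volumeMounts:
    - name: socket-dir
      mountPath: /var/run/php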

Eliminate single points of failure (SPOF)

If an application runs on only one ECS instance, the application becomes unavailable while the ECS instance restarts after a failure. The application also becomes unavailable when it is upgraded or when a new version is released. Therefore, we recommend that you do not directly run applications in standalone pods. Instead, deploy applications by using Deployments or StatefulSets and run at least two pods for each application, as shown in the following sketch.
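
The following manifest is a minimal sketch, assuming a hypothetical application named my-app. With two or more replicas, one pod can continue to serve traffic while another pod is restarted or upgraded. You can combine this with the anti-affinity rule shown earlier so that the replicas also land on different nodes.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app # Hypothetical application name
spec:
  replicas: 2 # Run at least two pods to eliminate the SPOF
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: nginx # Placeholder image for illustration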

Recommended configurations for components

You can configure specific components when you create a cluster. You can also enable specific components for an existing cluster to extend cluster features. For more information about the components, see Step 4: Configure cluster components and click the documentation links to view the features. For component introductions and release notes, see Release notes for components.

Perform routine O&M

  • Logs

    When you create an ACK cluster, select Enable Log Service.

  • Monitoring

    ACK integrates the CloudMonitor service of Alibaba Cloud. You can dynamically monitor your cluster by configuring node monitoring. You can add alert rules to help you locate the cause of unexpected high resource usage on nodes.

    When you create an ACK cluster in the ACK console, if you select CloudMonitor Agent, the system automatically creates container monitoring groups in the CloudMonitor console. You can add alert rules to container monitoring groups. The alert rules apply to all nodes in the groups. Newly added nodes are automatically added to the groups, so you do not need to create alert rules for the new nodes. For more information, see Container Service Monitoring.

    For example, you can configure alert rules for ECS instances. To monitor ECS instances in routine O&M, set alert rules for CPU, memory, and disk usage. We recommend that you store /var/lib/docker on a separate disk.

Properly deploy the NGINX Ingress controller

When you deploy the NGINX Ingress controller, make sure that the controller pods are distributed across different nodes. This helps prevent resource contention among controller pods and SPOFs. You can schedule the controller pods to exclusive nodes to ensure the performance and stability of the NGINX Ingress controller. For more information, see Use exclusive nodes to ensure the performance and stability of the NGINX Ingress controller.

We recommend that you do not set resource limits for the NGINX Ingress controller pods. This helps prevent service interruptions that are caused by out of memory (OOM) errors. If resource limits are required, we recommend that you set the CPU limit to 1,000 millicores or greater, and set the memory limit to 2 GiB or greater. The format of the CPU limit in the YAML file is 1000m. For more information about how to configure the NGINX Ingress controller, see Best practices for the NGINX Ingress controller.
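
As a sketch, the container-level resource settings described above might look as follows. The surrounding Deployment spec is omitted, and the actual name and configuration of the controller container depend on your installation.

# Resource limits for the NGINX Ingress controller container (illustration only)
resources:
  limits:
    cpu: "1000m" # 1,000 millicores or greater
    memory: "2Gi" # 2 GiB or greater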

Properly deploy CoreDNS

When you deploy CoreDNS pods in a cluster, we recommend that you deploy the CoreDNS pods on different nodes across multiple zones. This prevents service disruptions when a single node or zone fails. By default, node-based soft anti-affinity settings are configured for CoreDNS, so some or all CoreDNS pods may still be deployed on the same node if nodes are insufficient. In this case, we recommend that you delete the affected CoreDNS pods so that they are rescheduled.
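
As an illustration only (not the default configuration that ships with ACK), a hard anti-affinity rule that forbids two CoreDNS pods from running on the same node might look as follows. The k8s-app: kube-dns label is commonly used by CoreDNS pods; verify the label in your cluster before applying such a rule.

# Sketch: hard pod anti-affinity for the CoreDNS Deployment (illustration only)
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          k8s-app: kube-dns # Label commonly used by CoreDNS pods
      topologyKey: kubernetes.io/hostname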

Do not deploy CoreDNS pods on cluster nodes whose CPU and memory resources are fully utilized. Otherwise, the DNS queries per second (QPS) and response time are adversely affected. For more information about how to configure CoreDNS, see Best practices for DNS services.