Notes for Kubernetes clusters at different stages - Enterprise Distributed Application Service

Apply these best practices when you create, monitor, and delete Kubernetes clusters managed by Enterprise Distributed Application Service (EDAS).

Create a cluster

This section covers EDAS-specific guidance. For detailed information about Container Service for Kubernetes (ACK) Pro clusters, see ACK Pro clusters overview.

Use ACK Pro managed clusters

For production environments, use ACK Pro managed clusters instead of ACK standard clusters. ACK Pro clusters provide stronger resource isolation and higher service level agreements (SLAs).

For more information, see ACK Pro clusters overview.

Choose the Terway network plug-in

Select the Terway network plug-in when you create a production cluster. Terway assigns a dedicated elastic network interface (ENI) to each pod, which delivers optimal network performance.

For setup details, see Create an ACK Pro cluster.

Select Alibaba Cloud Linux as the operating system

Use Alibaba Cloud Linux, the default operating system for ACK. It supports security hardening through reinforcement based on classified protection, and it provides long-term support with performance optimizations for Alibaba Cloud infrastructure.

For more information, see Use Alibaba Cloud Linux 2.

Use advanced security groups

Select advanced security groups for production clusters. Advanced security groups support a larger node capacity than basic security groups.

If your cluster uses basic security groups and you cannot scale out nodes or pods during application scaling, check whether the security group specifications are limiting node capacity.

For more information, see Security groups overview.

Do not modify ACK or EDAS security group rules

ACK manages security group rules automatically. The initial configuration covers the Classless Inter-Domain Routing (CIDR) blocks for Elastic Compute Service (ECS) instances and pods. Modifying or deleting these rules can break cluster features, including the following:

The cluster cannot be imported to EDAS.
Pods cannot be scheduled.
Pods lose connectivity with each other.
The kubectl logs command stops working.

Important

If a security group rule has already been modified, add the CIDR block of the cluster nodes back to the rule. To find the security group associated with your cluster, see View cluster resources.
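As a minimal sketch of the check described above, the following Python snippet reports which required CIDR blocks are no longer covered by the source ranges allowed in a security group. The function name and all CIDR values are hypothetical; in practice you would copy the real node and pod CIDR blocks and the rule sources from the console.

```python
import ipaddress


def uncovered_cidrs(required, allowed):
    """Return the required CIDR blocks that no allowed block fully contains."""
    allowed_nets = [ipaddress.ip_network(c) for c in allowed]
    missing = []
    for cidr in required:
        net = ipaddress.ip_network(cidr)
        # Compare only networks of the same IP version; subnet_of() raises
        # TypeError when an IPv4 network is compared with an IPv6 network.
        covered = any(
            net.version == a.version and net.subnet_of(a) for a in allowed_nets
        )
        if not covered:
            missing.append(cidr)
    return missing


# Hypothetical values: node CIDR and pod CIDR that the rules must allow.
required = ["192.168.0.0/24", "172.20.0.0/16"]
# Hypothetical source ranges currently present in the inbound rules.
allowed = ["192.168.0.0/16", "10.0.0.0/8"]

print(uncovered_cidrs(required, allowed))  # ['172.20.0.0/16']
```

Any block the function returns is a candidate to add back to the security group rule.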

Scale the Ingress controller for external traffic

If your cluster uses an Ingress to expose services externally, configure enough Ingress controller pod replicas to handle the expected traffic. Too few replicas limit overall throughput.

To determine the right replica count:

1. Open the Prometheus monitoring view for the Ingress controller.
2. Check CPU, memory, and request-rate metrics under load.
3. Increase replicas until resource utilization stays within acceptable thresholds.

For details on monitoring the Ingress controller, see Analyze and monitor the access log of nginx-ingress-controller.
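The replica sizing procedure above can be approximated with a proportional rule: scale the current replica count by the ratio of observed to target utilization and round up. This is the same idea the Kubernetes Horizontal Pod Autoscaler uses, shown here only as a rough starting point. The function name and the 60% target are assumptions, not values from EDAS or ACK.

```python
def replicas_for_target(current_replicas, observed_pct, target_pct):
    """Estimate the replica count that brings utilization down to target_pct.

    Utilization values are whole percentages (for example, 90 for 90%).
    Integer arithmetic keeps the ceiling division exact.
    """
    if current_replicas < 1:
        raise ValueError("current_replicas must be at least 1")
    if observed_pct < 0 or target_pct <= 0:
        raise ValueError("observed_pct must be >= 0 and target_pct must be > 0")
    needed = (current_replicas * observed_pct + target_pct - 1) // target_pct
    return max(1, needed)


# Hypothetical reading: 3 replicas at 90% CPU under load, 60% target.
print(replicas_for_target(3, 90, 60))  # 5
```

Apply the estimate to the most constrained metric (CPU or memory), then confirm under load that utilization actually stays within the threshold.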

Configure CoreDNS replicas

Enable CoreDNS and maintain a ratio of 1 CoreDNS pod for every 8 nodes in the cluster. This ratio keeps DNS-based service discovery performant as the cluster scales.

For configuration steps, see Best practices for DNS services.
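The 1:8 ratio translates into a simple calculation, sketched below. The function name is hypothetical, and the floor of two replicas is an assumption added for availability rather than a requirement stated in this topic.

```python
import math


def coredns_replicas(node_count, nodes_per_replica=8, min_replicas=2):
    """Return the CoreDNS replica count for a cluster of node_count nodes."""
    if node_count < 0:
        raise ValueError("node_count must not be negative")
    # One replica per 8 nodes, rounded up, but never below the assumed floor.
    return max(min_replicas, math.ceil(node_count / nodes_per_replica))


for nodes in (8, 17, 64):
    print(nodes, coredns_replicas(nodes))
# 8 2
# 17 3
# 64 8
```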

Monitor a cluster

Set up the following three monitoring layers to maintain visibility into cluster health. This section covers only Prometheus monitoring, node monitoring, and cluster log collection. For all available monitoring options, see Observability system overview.

Enable Prometheus monitoring

Use Prometheus to collect metrics for nodes and workloads. Configure alerts for indicators such as CPU utilization, memory pressure, and pod restart counts to detect issues before they affect your applications.
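The threshold logic behind such alerts can be illustrated with a short sketch. In a real cluster the conditions would be defined as Prometheus alert rules; the Python below only models the comparison. The rule names, metric keys, and thresholds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    name: str         # alert name
    metric: str       # key of the metric in the sample dictionary
    threshold: float  # the alert fires when the sample exceeds this value


# Hypothetical thresholds; tune them to your own workloads.
RULES = [
    AlertRule("HighCpu", "cpu_utilization", 0.80),
    AlertRule("MemoryPressure", "memory_utilization", 0.85),
    AlertRule("PodRestarts", "pod_restarts_last_hour", 3),
]


def firing_alerts(samples, rules=RULES):
    """Return the names of rules whose metric exceeds its threshold.

    A metric missing from samples is treated as 0, so it never fires.
    """
    return [r.name for r in rules if samples.get(r.metric, 0) > r.threshold]


samples = {
    "cpu_utilization": 0.92,
    "memory_utilization": 0.40,
    "pod_restarts_last_hour": 5,
}
print(firing_alerts(samples))  # ['HighCpu', 'PodRestarts']
```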

Enable CloudMonitor for nodes

Enable CloudMonitor for nodes and set up alerts for ECS instance resource usage, including CPU, memory, and disk utilization. This provides infrastructure-level visibility that complements Kubernetes-native metrics.

Collect cluster logs with Simple Log Service

Enable the cluster log collection feature to forward logs to Simple Log Service (SLS). Centralized log analysis helps you troubleshoot application errors, track access patterns, and audit cluster operations.

Delete a cluster

Warning

Follow the deletion steps in this exact order. Skipping steps or changing the order leaves residual resources, which can prevent Virtual Private Clouds (VPCs) from being deleted.

To remove a Kubernetes cluster from EDAS:

1. Delete the applications -- Remove all applications deployed in the cluster from the EDAS console. See Delete an application.
2. Cancel the cluster import -- Unregister the cluster from EDAS and clean up associated resources. See Cancel the import and clean up a Kubernetes cluster in the EDAS console.
3. Delete the cluster from ACK -- Remove the cluster itself from Container Service for Kubernetes. See Delete ACK clusters.
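The ordering constraint can be expressed as a small guard that only ever returns the earliest unfinished step. This is an illustrative sketch: the `ClusterState` fields and the function name are hypothetical, and the state would have to be filled in by hand or by your own tooling.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterState:
    app_count: int          # applications still deployed in the cluster through EDAS
    imported_to_edas: bool  # the cluster is still registered in EDAS
    exists_in_ack: bool     # the cluster still exists in ACK


def next_teardown_step(state: ClusterState) -> Optional[str]:
    """Return the next deletion step in the required order, or None when done."""
    if state.app_count > 0:
        return "delete the applications in the EDAS console"
    if state.imported_to_edas:
        return "cancel the cluster import in the EDAS console"
    if state.exists_in_ack:
        return "delete the cluster from ACK"
    return None


state = ClusterState(app_count=2, imported_to_edas=True, exists_in_ack=True)
print(next_teardown_step(state))  # delete the applications in the EDAS console
```

Because each check runs only after the earlier ones pass, the guard never suggests deleting the cluster from ACK while applications or the EDAS import still exist.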