Modern enterprise applications are increasingly composed of containerised microservices that must be deployed, scaled, and operated across hundreds or thousands of compute instances spanning multiple availability zones. The orchestration layer responsible for this work (scheduling pods to nodes, maintaining declared state, exposing services, attaching persistent storage, and enforcing security policy) has standardised on Kubernetes across the industry. Running Kubernetes at enterprise scale, however, is itself a substantial engineering undertaking. Control plane availability, node pool composition, pod-to-pod network performance, persistent storage integration, and the operational machinery of cluster upgrades and observability all require deliberate design choices that materially affect production reliability.
Most organisations approaching Kubernetes adoption face a common trade-off: operate the control plane independently and assume responsibility for etcd backups, certificate rotation, and version upgrades, or consume Kubernetes as a managed service and direct engineering effort toward application workloads. Alibaba Container Service for Kubernetes (ACK) is built for the second path. This article documents the architecture, configuration decisions, and operational considerations for running ACK clusters at enterprise scale, organised across four layers: cluster foundation, pod networking, persistent storage, and observability with auto-scaling.
The cluster foundation layer comprises the Kubernetes control plane and the worker node pools on which application pods are scheduled. ACK offers two cluster types relevant to enterprise scale: ACK Managed clusters, in which the control plane is fully operated by Alibaba Cloud; and ACK Pro clusters, which extend the managed offering with multi-replica etcd, dedicated control plane infrastructure, audit log integration, and a 99.95% control plane SLA. For workloads with availability requirements above 99.9%, ACK Pro is the appropriate baseline.
The control plane runs the Kubernetes API server (port 6443), controller manager, scheduler, and a multi-replica etcd cluster. etcd is the durable state store; its performance is bounded by disk write latency and network round-trip time, and operational guidance recommends keeping the etcd database size below 8 GB to maintain consistent compaction and snapshot behaviour. Cluster operators do not interact with etcd directly under the managed model, but workload patterns that inflate object counts (for example, high-frequency CronJobs without successfulJobsHistoryLimit configured, or unbounded Event accumulation) can pressure etcd and should be governed through ResourceQuota objects and TTL controllers.
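As an illustration of the kind of hygiene that keeps finished objects from accumulating in etcd, the following sketch (names, namespace, and image are placeholders) sets explicit history limits on a CronJob and a TTL on the Jobs it creates:

```yaml
# Hypothetical CronJob illustrating history limits and a Job TTL, so that
# completed objects are pruned rather than retained indefinitely in etcd.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: report-generator          # placeholder name
  namespace: batch-jobs           # placeholder namespace
spec:
  schedule: "*/5 * * * *"
  successfulJobsHistoryLimit: 3   # keep only the last 3 successful Jobs
  failedJobsHistoryLimit: 1       # keep only the last failed Job
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 3600   # TTL controller deletes finished Jobs after 1 hour
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: report
              image: registry.example.com/report:latest   # placeholder image
              resources:
                requests:
                  cpu: 250m
                  memory: 256Mi
```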
Worker capacity is organised into node pools. A node pool is a group of ECS instances that share configuration: instance type family, OS image, kubelet parameters, taints, and labels. Heterogeneous workloads benefit from multiple node pools: a general-purpose pool sized to the median request profile, a memory-optimised pool for in-memory caches and analytical workloads, and a GPU pool where inference or training is required. Workloads are directed to the appropriate pool through nodeSelector, nodeAffinity, and taint tolerations.
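A minimal sketch of directing a workload to a dedicated pool follows, assuming the pool's nodes carry a hypothetical workload-class=memory-optimised label and a matching taint (ACK node pools let you define labels and taints at pool creation time):

```yaml
# Hypothetical Deployment pinned to a memory-optimised node pool.
# The "workload-class" label and taint key are illustrative, not built-in.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cache-layer
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cache-layer
  template:
    metadata:
      labels:
        app: cache-layer
    spec:
      nodeSelector:
        workload-class: memory-optimised   # schedule only onto the memory pool
      tolerations:
        - key: workload-class              # tolerate the pool's taint
          operator: Equal
          value: memory-optimised
          effect: NoSchedule
      containers:
        - name: redis
          image: redis:7
          resources:
            requests:
              memory: 4Gi
              cpu: "1"
```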
Cluster Autoscaler integrates with node pools to scale worker counts in response to unschedulable pods. Default behaviour scales up when pods remain Pending beyond the scan interval (10 seconds by default), and scales down nodes that have remained under-utilised for 10 minutes. Spot instance node pools can be combined with on-demand pools to reduce compute cost on stateless, fault-tolerant workloads, with PodDisruptionBudgets configured to prevent cascading evictions when spot capacity is reclaimed by the underlying provider.
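For spot-backed pools, a PodDisruptionBudget of the following shape (selector and threshold are illustrative) limits how many replicas can be drained concurrently during voluntary disruptions such as autoscaler scale-down or node replacement:

```yaml
# Hypothetical PodDisruptionBudget: at least two replicas of the matched
# workload must remain available during voluntary evictions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkout-pdb
spec:
  minAvailable: 2          # maxUnavailable is the alternative form
  selector:
    matchLabels:
      app: checkout        # placeholder workload label
```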
ACK supports two CNI plug-ins: Flannel and Terway. Flannel implements an overlay network using VXLAN encapsulation, which simplifies setup but introduces per-packet processing overhead at scale. Terway is the recommended choice for production clusters; it allocates secondary ENIs (Elastic Network Interfaces) from the underlying VPC directly to pods, eliminating overlay encapsulation and providing native VPC routing between pods, ECS instances, and other VPC-resident services.
Terway operates in two modes. In exclusive ENI mode, each pod receives its own secondary ENI, which constrains pod density to the ENI quota of the underlying ECS instance type. In ENI multi-IP (shared ENI) mode, pods are assigned secondary IP addresses from a shared ENI, allowing significantly higher pod density per node, typically several dozen pods per ENI depending on instance type. ENI multi-IP is the appropriate selection for general microservice workloads where pod density is the priority; exclusive ENI mode is reserved for workloads requiring per-pod network bandwidth isolation.
Pod-to-pod and pod-to-service traffic within the cluster is routed natively through the VPC; cross-AZ latency is determined by the underlying VPC characteristics, typically 1–2 ms for intra-region cross-AZ traffic. Network Policies are enforced through Terway's integration with the Calico policy engine, allowing namespace-level and label-based ingress and egress restrictions without modifying application code.
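A sketch of a namespace-scoped ingress restriction of the kind enforced through the Calico policy engine follows; the namespace, labels, and port are placeholders:

```yaml
# Hypothetical NetworkPolicy: only pods labelled app=frontend in the same
# namespace may reach the payments pods on TCP 8080; other ingress is denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payments-allow-frontend
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - protocol: TCP
          port: 8080
```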
External traffic enters the cluster through Service resources of type LoadBalancer or through Ingress controllers. For LoadBalancer services, ACK provisions a CLB or NLB instance and binds it to the service through annotations on the Service object. For HTTP and HTTPS routing with host- and path-based rules, the ALB Ingress controller is the production-grade option, integrating directly with Application Load Balancer and supporting TLS termination, weighted traffic splitting, and integration with WAF for application-layer filtering. The Nginx Ingress controller remains available for clusters requiring direct compatibility with existing nginx configurations.
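A host- and path-based routing sketch with TLS termination is shown below. The ingressClassName value "alb" assumes an IngressClass of that name has been set up for the ALB Ingress controller in the cluster; hostnames, Service names, and the TLS secret are placeholders:

```yaml
# Hypothetical Ingress routing /api and / to different backend Services,
# with TLS terminated at the load balancer.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shop-ingress
spec:
  ingressClassName: alb          # assumed IngressClass for the ALB controller
  tls:
    - hosts:
        - shop.example.com
      secretName: shop-tls       # placeholder TLS secret
  rules:
    - host: shop.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-gateway
                port:
                  number: 80
          - path: /
            pathType: Prefix
            backend:
              service:
                name: storefront
                port:
                  number: 80
```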
Stateful workloads require durable, network-attached storage that follows pods across rescheduling events. ACK provides this through CSI drivers integrated with three Alibaba Cloud storage services: Cloud Disk (block), NAS (file), and OSS (object). Each is exposed to Kubernetes through dynamically-provisioned StorageClasses, allowing pods to request storage declaratively through PersistentVolumeClaims.
Cloud Disk provides single-attachment block storage suitable for workloads requiring low-latency random I/O: relational databases, search indices, and message queue brokers. ESSD (Enhanced SSD) categories range from PL0 through PL3, with PL1 (up to 50,000 IOPS per disk) appropriate for general database workloads and PL2 or PL3 reserved for I/O-intensive systems. A single ECS instance has a hard limit on attachable cloud disks (typically 16, varying by instance type), which constrains the maximum number of cloud-disk-backed pods that can be scheduled per node.
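A block-storage claim is a few lines of YAML. The StorageClass name below assumes the cluster exposes an ESSD-backed class such as alicloud-disk-essd; verify the exact names with kubectl get storageclass:

```yaml
# Hypothetical PVC requesting a 100 GiB ESSD-backed block volume.
# "alicloud-disk-essd" is an assumed StorageClass name.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: orders-db-data
spec:
  accessModes:
    - ReadWriteOnce              # cloud disks are single-attachment block devices
  storageClassName: alicloud-disk-essd
  resources:
    requests:
      storage: 100Gi
```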
NAS provides POSIX-compliant shared file storage that can be mounted concurrently by multiple pods across nodes. It is the appropriate choice for workloads requiring shared state: content management systems, build artefact caches, and machine learning training datasets where multiple workers consume the same input. NAS Performance type offers higher throughput at higher cost; Capacity type is suitable for archival and infrequent-access workloads.
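A shared-storage claim differs mainly in access mode. The StorageClass name below is a placeholder for a NAS-backed class configured in the cluster:

```yaml
# Hypothetical PVC for shared NAS storage mounted read-write by many pods.
# "nas-shared" is a placeholder StorageClass name.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: training-datasets
spec:
  accessModes:
    - ReadWriteMany              # NAS supports concurrent mounts across nodes
  storageClassName: nas-shared
  resources:
    requests:
      storage: 500Gi
```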
OSS is mounted through the ossfs FUSE driver. It is well-suited to backup targets, model artefact storage, and read-heavy reference data, but is not appropriate for general-purpose POSIX workloads owing to the eventual-consistency semantics and metadata operation latency inherent to object storage.
StatefulSets, combined with volumeClaimTemplates, provide stable network identity and persistent storage binding for applications such as Redis, Elasticsearch, and Kafka. The VolumeSnapshot CRD integrates with Cloud Disk's snapshot service for point-in-time backup and restore of block volumes.
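A trimmed sketch of a StatefulSet whose volumeClaimTemplates bind each replica to its own cloud disk follows; the image and StorageClass name are assumptions:

```yaml
# Hypothetical StatefulSet: each replica gets a stable identity (redis-0,
# redis-1, ...) and its own PVC created from volumeClaimTemplates.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis
spec:
  serviceName: redis-headless    # headless Service providing stable DNS names
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
        - name: redis
          image: redis:7
          volumeMounts:
            - name: data
              mountPath: /data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: alicloud-disk-essd   # assumed ESSD class, as above
        resources:
          requests:
            storage: 20Gi
```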
Production clusters require visibility into workload behaviour, resource utilisation, and control plane health. ACK integrates with three Alibaba Cloud observability services to provide this visibility without requiring self-hosted monitoring infrastructure.
Managed Service for Prometheus (ARMS Prometheus) collects metrics from kubelet, kube-state-metrics, node-exporter, and application-defined ServiceMonitor and PodMonitor resources. Default scrape intervals of 15–30 seconds balance metric resolution against storage and query cost. Pre-built dashboards cover cluster, node, namespace, and workload-level resource consumption, with PromQL alerting rules surfacing CPU saturation, memory pressure, and pod restart anomalies.
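A ServiceMonitor declaring an application scrape target might look like the sketch below, assuming the Prometheus Operator CRDs (monitoring.coreos.com/v1) are installed and the target Service exposes a named metrics port; all names are placeholders:

```yaml
# Hypothetical ServiceMonitor scraping /metrics from Services labelled
# app=api-gateway in the production namespace every 30 seconds.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: api-gateway-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: api-gateway
  namespaceSelector:
    matchNames:
      - production
  endpoints:
    - port: http-metrics         # named port on the target Service
      path: /metrics
      interval: 30s
```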
Simple Log Service (SLS) collects container stdout and stderr output and application file logs through the alibaba-log-controller and the Logtail DaemonSet. Log indexing supports full-text and structured queries across the cluster log volume; SQL extensions enable aggregation and trend analysis directly within the SLS console without exporting data to a separate analytical system.
Application performance traces are collected through ARMS APM via OpenTelemetry-compatible SDKs or automatic Java agent injection, surfacing distributed call paths, database query latency, and external service dependencies. Trace correlation with metrics and logs is achieved through shared TraceID propagation.
Workload-level scaling is governed by Horizontal Pod Autoscaler (HPA), which adjusts replica counts based on CPU, memory, or custom metrics sourced from Prometheus through the metrics adapter. HPA decisions interact with Cluster Autoscaler to propagate scaling to the node level: when HPA adds replicas that exceed available scheduling capacity, Cluster Autoscaler provisions additional nodes within the relevant pool. For workloads with predictable diurnal load patterns, CronHPA supplements reactive scaling by adjusting replica counts on a fixed schedule.
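A baseline CPU-driven HPA is sketched below; the target Deployment name and replica bounds are illustrative:

```yaml
# Hypothetical HPA scaling a Deployment between 3 and 30 replicas,
# targeting 70% average CPU utilisation of the pods' requests.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```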
Three operational factors materially affect the reliability of an ACK cluster under production load.
The four-layer architecture (cluster foundation, pod networking, persistent storage, and observability with auto-scaling) provides a structured path to operating Kubernetes at enterprise scale on Alibaba Cloud. Each layer carries distinct configuration decisions: cluster type and node pool composition for the foundation, CNI selection and ingress controller for networking, storage class taxonomy for stateful workloads, and metrics, logs, and trace integration for observability. The managed control plane removes etcd, scheduler, and API server operational burden, allowing engineering effort to concentrate on application architecture and workload reliability.
Engineers extending this architecture should evaluate three patterns based on their workload profile. Service mesh adoption through Alibaba Cloud Service Mesh (ASM) becomes valuable when traffic management, mutual TLS, and fine-grained observability requirements exceed what Ingress controllers and CNI-level network policies can express. Virtual nodes backed by Elastic Container Instance (ECI) enable burst capacity without provisioning additional ECS nodes, well-suited to variable batch workloads and CI build farms. For workloads with strict isolation requirements, sandboxed container runtimes such as Kata Containers provide VM-level isolation for individual pods while preserving the standard Kubernetes scheduling and lifecycle model.
Disclaimer: The views expressed herein are for reference only and don’t necessarily represent the official views of Alibaba Cloud.