
Container Service for Kubernetes: Application distribution overview

Last Updated: Apr 17, 2025

Distributed Cloud Container Platform for Kubernetes (ACK One) supports application distribution. You can use this feature to distribute an application from a Fleet instance to multiple clusters that are associated with the Fleet instance. You configure distribution policies on the Fleet instance to efficiently distribute matching Kubernetes resources to the clusters that the policies select. In addition, you can configure override policies to meet the differentiated deployment requirements of individual clusters and applications. Compared with GitOps, this distribution method does not require a Git repository.

How application distribution works

After you create resources for your application on a Fleet instance, you create a PropagationPolicy (for namespaced resources) or a ClusterPropagationPolicy (for cluster-scoped resources) to configure a distribution policy that distributes specific resources to the clusters that are associated with the Fleet instance. You can also create an OverridePolicy or a ClusterOverridePolicy to override specific fields of the resources during distribution based on your business requirements.

Note

You need to create a PropagationPolicy and an OverridePolicy only when you initialize your application; both policies remain in effect after they are created. In the PropagationPolicy, configure match rules for the resources that you want to distribute; in the OverridePolicy, configure match rules for the resources that you want to override. Subsequent resource upgrades or updates are automatically synchronized to the associated clusters. You can use the AMC command-line tool to query the distribution progress of an application in an associated cluster.
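The following manifests provide a minimal sketch of a PropagationPolicy paired with an OverridePolicy. The policy API version (policy.one.alibabacloud.com/v1alpha1) follows the Karmada-compatible style, and the application name, namespace, cluster IDs, and registry address are placeholder assumptions; replace them with the values from your environment.

```yaml
# Sketch: distribute a Deployment named "web-app" to two associated clusters,
# and rewrite its image registry in one of them. All names, cluster IDs, and
# the API group are assumptions for illustration.
apiVersion: policy.one.alibabacloud.com/v1alpha1
kind: PropagationPolicy
metadata:
  name: web-app-propagation
  namespace: demo
spec:
  resourceSelectors:              # match rules for the resources to distribute
    - apiVersion: apps/v1
      kind: Deployment
      name: web-app
  placement:
    clusterAffinity:
      clusterNames:               # placeholder IDs of associated clusters
        - cluster-beijing
        - cluster-hangzhou
---
apiVersion: policy.one.alibabacloud.com/v1alpha1
kind: OverridePolicy
metadata:
  name: web-app-override
  namespace: demo
spec:
  resourceSelectors:              # match rules for the resources to override
    - apiVersion: apps/v1
      kind: Deployment
      name: web-app
  overrideRules:
    - targetCluster:
        clusterNames:
          - cluster-beijing
      overriders:
        imageOverrider:           # rewrite only the registry part of the image
          - component: Registry
            operator: replace
            value: registry-beijing.example.com
```

After you submit both manifests on the Fleet instance by using kubectl, the Deployment is distributed to both clusters, and only the copy in cluster-beijing pulls its images from the overridden registry.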


Advanced features

Workload scheduling based on static weights and dynamic weights

In multi-cluster application scheduling scenarios, ACK One Fleet instances schedule pod replicas to multiple clusters based on distribution policies. The following distribution policies are supported:

  • Distribution policy based on static weights: The cluster administrator can specify a static weight for an associated cluster in the policy. The scheduler schedules a specific number of pod replicas to the cluster based on the specified weight. For more information about how to configure a distribution policy based on static weights, see replicaScheduling.

  • Distribution policy based on dynamic weights: The scheduler generates a dynamic weight for a cluster by calculating the number of pods that each associated cluster can host based on the amount of available resources in the cluster. The scheduler schedules a specific number of pod replicas to the cluster based on the dynamic weight. For more information about how to configure a distribution policy based on dynamic weights, see Dynamic distribution and descheduling.
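The following placement snippet is a sketch of how the two weighting modes can be declared in a PropagationPolicy, assuming the Karmada-compatible replicaScheduling fields; the cluster names and weights are placeholders.

```yaml
# Sketch: split replicas across two clusters at a 2:1 ratio by using static
# weights. Cluster names and weights are placeholder assumptions.
placement:
  clusterAffinity:
    clusterNames:
      - cluster-a
      - cluster-b
  replicaScheduling:
    replicaSchedulingType: Divided       # split replicas across clusters
    replicaDivisionPreference: Weighted
    weightPreference:
      staticWeightList:
        - targetCluster:
            clusterNames:
              - cluster-a
          weight: 2                      # cluster-a receives about 2/3 of the replicas
        - targetCluster:
            clusterNames:
              - cluster-b
          weight: 1
      # For dynamic weights based on available resources, replace
      # staticWeightList with:
      # dynamicWeight: AvailableReplicas
```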

Descheduling

In multi-cluster application scheduling scenarios, the amount of available resources in each associated cluster changes dynamically. As a result, pods may fail to be scheduled to a cluster for reasons such as low pod priorities or insufficient resources. When a pod fails to be scheduled in an associated cluster, the descheduler reschedules the pod to another cluster to ensure that the pod can run normally. Descheduling is enabled by default. For more information about how to verify descheduling, see Verify descheduling.

Application-level failover

In multi-cluster application scheduling scenarios, you can improve resource utilization by colocating online services and offline services in the same cluster. To prevent online services from being affected by offline services, you can assign offline services lower priorities than online services or run offline services on preemptible instances. In this case, offline services may stop running for reasons such as node exceptions, resource preemption by higher-priority pods, or the release of preemptible instances.

To address the preceding issue, ACK One Fleet instances support application-level failover. When a job or task stops running, the application-level failover feature automatically migrates the job or task to another cluster. For more information, see How to use Kube Queue on a Fleet instance and schedule PyTorchJob by using gang scheduling.
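The following snippet is a sketch of how application-level failover can be declared in a PropagationPolicy, assuming the Karmada-compatible failover fields; whether these fields are enabled on your Fleet instance, as well as the toleration and purge-mode values shown, are assumptions.

```yaml
# Sketch: migrate an unhealthy application to another cluster. The values
# below are example assumptions, not recommended settings.
spec:
  failover:
    application:
      decisionConditions:
        tolerationSeconds: 120   # how long the application may stay unhealthy
                                 # before failover is triggered
      purgeMode: Graciously      # remove the replica from the original cluster
                                 # only after it becomes healthy elsewhere
```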

Multi-cluster gang scheduling

Gang scheduling ensures that a group of correlated pods are scheduled at the same time. If the scheduling requirements are not met, none of the pods is scheduled. Gang scheduling provides a solution to job scheduling in all-or-nothing scenarios. In multi-cluster application distribution scenarios, multi-cluster gang scheduling uses resource pre-allocation or dynamic resource checks to schedule a group of correlated pods to the same cluster.

PyTorch and TensorFlow training jobs use a master-worker architecture and Spark jobs use a driver-executor architecture. When you run a PyTorch or TensorFlow training job that uses multiple GPUs on multiple nodes or run a Spark job, you can use the multi-cluster gang scheduling feature to schedule the pods of the job to the same cluster. This allows you to ensure communication between the master pod and workers and between the driver pod and executor pods.
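The following manifest is a sketch that keeps an entire PyTorchJob in a single cluster by using the Karmada-compatible spreadConstraints fields. It illustrates whole-job placement only and does not show the resource pre-allocation or dynamic resource checks that multi-cluster gang scheduling performs; the job name, namespace, and API group are assumptions.

```yaml
# Sketch: propagate a PyTorchJob as a unit to exactly one associated cluster
# so that the master pod and worker pods land together. Names are placeholders.
apiVersion: policy.one.alibabacloud.com/v1alpha1
kind: PropagationPolicy
metadata:
  name: pytorch-job-propagation
  namespace: training
spec:
  resourceSelectors:
    - apiVersion: kubeflow.org/v1
      kind: PyTorchJob
      name: demo-training-job
  placement:
    spreadConstraints:
      - spreadByField: cluster   # group candidate placements by cluster
        minGroups: 1             # place the job in exactly one cluster
        maxGroups: 1
```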

With the multi-cluster gang scheduling, descheduling, and application-level failover features, ACK One Fleet instances ensure that AI jobs are scheduled to clusters that provide sufficient resources and run normally. For more information, see Job distribution.

Distributable resources

The following table describes the resources that are supported by multi-cluster application distribution and differentiated deployment.

Note

By default, if you have the permissions to create resources on a Fleet instance, you have the permissions to distribute resources from the Fleet instance to the clusters that are associated with the instance.

| Resource level | Resource type            | APIVersion              | Distribution policy | Override policy |
| -------------- | ------------------------ | ----------------------- | ------------------- | --------------- |
| Cluster        | Namespace                | v1                      | Supported           | Supported       |
| Cluster        | PersistentVolume         | v1                      | Supported           | Supported       |
| Cluster        | StorageClass             | storage.k8s.io/v1       | Supported           | Supported       |
| Cluster        | CustomResourceDefinition | apiextensions.k8s.io/v1 | Supported           | Supported       |
| Namespace      | Deployment               | apps/v1                 | Supported           | Supported       |
| Namespace      | StatefulSet              | apps/v1                 | Supported           | Supported       |
| Namespace      | DaemonSet                | apps/v1                 | Supported           | Supported       |
| Namespace      | Job                      | batch/v1                | Supported           | Supported       |
| Namespace      | CronJob                  | batch/v1                | Supported           | Supported       |
| Namespace      | Ingress                  | networking.k8s.io/v1    | Supported           | Supported       |
| Namespace      | Service                  | v1                      | Supported           | Supported       |
| Namespace      | PersistentVolumeClaim    | v1                      | Supported           | Supported       |
| Namespace      | ConfigMap                | v1                      | Supported           | Supported       |
| Namespace      | Secret                   | v1                      | Supported           | Supported       |
| Namespace      | Pod                      | v1                      | Supported           | Supported       |
| Namespace      | LimitRange               | v1                      | Supported           | Supported       |
| Namespace      | ResourceQuota            | v1                      | Supported           | Supported       |
| Namespace      | HorizontalPodAutoscaler  | autoscaling/v2          | Supported           | Supported       |
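Cluster-level resources in the preceding table, such as StorageClass, are distributed by using the cluster-scoped ClusterPropagationPolicy instead of a namespaced PropagationPolicy. The following manifest is a minimal sketch; the API version, StorageClass name, and cluster IDs are assumptions.

```yaml
# Sketch: distribute a cluster-scoped StorageClass to two associated clusters.
# Names and cluster IDs are placeholder assumptions.
apiVersion: policy.one.alibabacloud.com/v1alpha1
kind: ClusterPropagationPolicy
metadata:
  name: storageclass-propagation
spec:
  resourceSelectors:
    - apiVersion: storage.k8s.io/v1
      kind: StorageClass
      name: fast-ssd
  placement:
    clusterAffinity:
      clusterNames:
        - cluster-a
        - cluster-b
```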

References

| Feature | Description | References |
| ------- | ----------- | ---------- |
| Use distribution policies to deploy applications | This topic describes how to use kubectl to create a PropagationPolicy to distribute specific resources to the clusters that are associated with the Fleet instance and create an OverridePolicy to override specific resources during distribution. | Getting started with application distribution |
| Policy description | This topic describes how to create a distribution policy and an override policy, and describes the parameters in the policy template. You can read this topic to obtain a comprehensive understanding of application distribution policies. | PropagationPolicy and OverridePolicy |
| Use AMC to query application status in associated clusters | This topic describes how to run the kubectl amc command to query the distribution progress of an application in an associated cluster. | Use AMC command line |