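Use the ack-co-scheduler component for co-scheduling - Container Service for Kubernetes

Compared with the native Kubernetes scheduler, the scheduler of Alibaba Cloud Container Service for Kubernetes (ACK) provides additional features, such as gang scheduling, CPU topology-aware scheduling, and ECI elastic scheduling. This topic describes how to install the ack-co-scheduler component in a registered cluster so that you can use the ACK scheduling capabilities in your on-premises cluster. This gives you convenient access to the capabilities that ACK adds for big data and AI applications and helps improve application efficiency.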

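Prerequisites

A registered cluster is created, and a self-managed Kubernetes cluster is connected to the registered cluster. For more information, see Create a registered cluster.
The system components meet the version requirements in the following table.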
Component	Version requirement
Kubernetes	1.18.8 or later
Helm	3.0 or later
Docker	19.03.5
Operating system	CentOS 7.6, CentOS 7.7, Ubuntu 16.04, Ubuntu 18.04, Alibaba Cloud Linux

Usage notes

When you deploy a job, you must set the scheduler name to ack-co-scheduler. To do this, set .template.spec.schedulerName to ack-co-scheduler.
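
As a rough illustration of this requirement, the following Python sketch checks that every pod template in a manifest sets the scheduler name before the manifest is submitted. It assumes the manifest has already been parsed into plain dictionaries and lists. The helper name `templates_missing_scheduler` is hypothetical and is not part of any ACK tooling.

```python
SCHEDULER_NAME = "ack-co-scheduler"


def templates_missing_scheduler(manifest, path=""):
    """Return the paths of pod templates that do not use ack-co-scheduler.

    Hypothetical helper: walks a parsed manifest and inspects every
    "template" mapping it finds, so it covers both a Deployment
    (spec.template) and a TFJob (spec.tfReplicaSpecs.<role>.template).
    """
    missing = []
    if isinstance(manifest, dict):
        for key, value in manifest.items():
            child = f"{path}.{key}" if path else key
            if key == "template" and isinstance(value, dict):
                spec = value.get("spec")
                if not isinstance(spec, dict) or spec.get("schedulerName") != SCHEDULER_NAME:
                    missing.append(child)
            # Keep walking so nested structures are also checked.
            missing.extend(templates_missing_scheduler(value, child))
    elif isinstance(manifest, list):
        for index, item in enumerate(manifest):
            missing.extend(templates_missing_scheduler(item, f"{path}[{index}]"))
    return missing


# Example input: the Worker template forgets to set schedulerName.
tfjob = {
    "kind": "TFJob",
    "spec": {
        "tfReplicaSpecs": {
            "PS": {"template": {"spec": {"schedulerName": "ack-co-scheduler"}}},
            "Worker": {"template": {"spec": {}}},
        }
    },
}
print(templates_missing_scheduler(tfjob))
```

An empty list means every pod template in the manifest names ack-co-scheduler. Any returned path points to a template that would otherwise be handled by the default scheduler.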

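Install the ack-co-scheduler component

Install by using onectl

Install and configure onectl on your local machine. For more information, see Use onectl to manage registered clusters.

Run the following command to install the ack-co-scheduler component: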

onectl addon install ack-co-scheduler

Expected output:

Addon ack-co-scheduler, version **** installed.

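Install from the console

Log on to the Container Service management console and choose Clusters in the left-side navigation pane.
On the Clusters page, click the name of the target cluster, and then choose Operations > Add-ons in the left-side navigation pane.
On the Add-ons page, click the Others tab, find the ack-co-scheduler component, and click Install in the lower-right corner of the card.
In the dialog box that appears, click OK.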

Gang scheduling

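ACK implements gang scheduling on top of the new kube-scheduler framework. This addresses the native scheduler's lack of support for all-or-nothing job scheduling.

Use the following template to submit a distributed TensorFlow job to the cluster. For more information about how to run distributed TensorFlow jobs, see Use gang scheduling.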

apiVersion: "kubeflow.org/v1"
kind: "TFJob"
metadata:
  name: "tf-smoke-gpu"
spec:
  tfReplicaSpecs:
    PS:
      replicas: 1
      template:
        metadata:
          creationTimestamp: null
          labels:
            pod-group.scheduling.sigs.k8s.io/name: tf-smoke-gpu
            pod-group.scheduling.sigs.k8s.io/min-available: "2"
        spec:
          schedulerName: ack-co-scheduler   # Set the scheduler name to ack-co-scheduler.
          containers:
          - args:
            - python
            - tf_cnn_benchmarks.py
            - --batch_size=32
            - --model=resnet50
            - --variable_update=parameter_server
            - --flush_stdout=true
            - --num_gpus=1
            - --local_parameter_device=cpu
            - --device=cpu
            - --data_format=NHWC
            image: registry.cn-hangzhou.aliyuncs.com/kubeflow-images-public/tf-benchmarks-cpu:v20171202-bdab599-dirty-284af3
            name: tensorflow
            ports:
            - containerPort: 2222
              name: tfjob-port
            resources:
              limits:
                cpu: '10'
            workingDir: /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
          restartPolicy: OnFailure
    Worker:
      replicas: 4
      template:
        metadata:
          creationTimestamp: null
          labels:
            pod-group.scheduling.sigs.k8s.io/name: tf-smoke-gpu
            pod-group.scheduling.sigs.k8s.io/min-available: "2"
        spec:
          schedulerName: ack-co-scheduler
          containers:
          - args:
            - python
            - tf_cnn_benchmarks.py
            - --batch_size=32
            - --model=resnet50
            - --variable_update=parameter_server
            - --flush_stdout=true
            - --num_gpus=1
            - --local_parameter_device=cpu
            - --device=gpu
            - --data_format=NHWC
            image: registry.cn-hangzhou.aliyuncs.com/kubeflow-images-public/tf-benchmarks-cpu:v20171202-bdab599-dirty-284af3
            name: tensorflow
            ports:
            - containerPort: 2222
              name: tfjob-port
            resources:
              limits:
                cpu: 10
            workingDir: /opt/tf-benchmarks/scripts/tf_cnn_benchmarks
          restartPolicy: OnFailure
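
The two pod-group labels in the template tie the PS and Worker pods into one group and set the minimum number of pods that must be schedulable together. The following Python sketch illustrates the all-or-nothing idea on a toy model that considers CPU capacity only. It is a simplification under stated assumptions (first-fit placement, hypothetical function and node names) and not the scheduler's actual algorithm.

```python
def gang_place(pod_cpu_requests, node_free_cpu, min_available):
    """Toy all-or-nothing placement.

    pod_cpu_requests: CPU request of each pod in the group.
    node_free_cpu: mapping of node name -> free CPU cores.
    Returns {pod_index: node_name}, or an empty dict when fewer than
    min_available pods fit, in which case no pod is placed at all.
    """
    free = dict(node_free_cpu)  # copy, so a rejected attempt reserves nothing
    placement = {}
    for index, cpu in enumerate(pod_cpu_requests):
        for node, available in free.items():
            if available >= cpu:
                free[node] = available - cpu  # reserve capacity on this node
                placement[index] = node
                break
    if len(placement) < min_available:
        return {}
    return placement


# Five pods that request 10 CPU cores each, with min-available set to 2.
group = [10, 10, 10, 10, 10]
print(gang_place(group, {"node-a": 10}, 2))                # {}
print(gang_place(group, {"node-a": 10, "node-b": 20}, 2))  # {0: 'node-a', 1: 'node-b', 2: 'node-b'}
```

In the first call only one pod fits, which is below the minimum, so nothing is placed. In the second call three pods fit, which meets the minimum, so the group is admitted.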

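CPU topology-aware scheduling

Before you use CPU topology-aware scheduling, deploy the resource-controller component. For more information, see Manage components.

Use the following template to enable CPU topology-aware scheduling in a Deployment. For more information about CPU topology-aware scheduling, see Enable CPU topology-aware scheduling.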

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-numa
  labels:
    app: nginx-numa
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-numa
  template:
    metadata:
      annotations:
        cpuset-scheduler: "true"
      labels:
        app: nginx-numa
    spec:
      schedulerName: ack-co-scheduler # Set the scheduler name to ack-co-scheduler.
      containers:
      - name: nginx-numa
        image: nginx:1.13.3
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 4
          limits:
            cpu: 4

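ECI elastic scheduling

ECI elastic scheduling is an elastic scheduling policy provided by Alibaba Cloud. When you deploy a service, you can add annotations to declare that only ECS resources or only ECI elastic resources are used, or that ECI resources are requested automatically when ECS resources are insufficient. This lets you meet the elasticity requirements of different workloads.

Before you use ECI elastic scheduling, deploy the ack-virtual-node component in the cluster. For more information, see Use ECI in ACK.

Use the following template to enable ECI elastic scheduling in a Deployment. For more information about how to use ECI elastic scheduling, see Use ElasticResource to implement ECI elastic scheduling (no longer maintained).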

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 4
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      name: nginx
      annotations:
        alibabacloud.com/burst-resource: eci # Add this annotation to select the resource type for elastic scheduling.
      labels:
        app: nginx
    spec:
      schedulerName: ack-co-scheduler # Set the scheduler name to ack-co-scheduler.
      containers:
      - name: nginx
        image: nginx
        resources:
          limits:
            cpu: 2
          requests:
            cpu: 2

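Under template.metadata, add the pod annotation alibabacloud.com/burst-resource to declare the type of resource used for elastic scheduling. Valid values of alibabacloud.com/burst-resource:

Annotation not set (default): only the existing ECS resources in the cluster are used.
eci: ECI elastic resources are used when ECS resources in the cluster are insufficient.
eci_only: only ECI elastic resources are used. ECS resources in the cluster are not used.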
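
The three cases above can be summarized as a small lookup. The following Python sketch is illustrative only: `allowed_resources` is a hypothetical helper, and the "ecs" and "eci" strings in its return value are labels chosen for this example rather than values defined by ACK.

```python
BURST_ANNOTATION = "alibabacloud.com/burst-resource"


def allowed_resources(annotations):
    """Return the resource types a pod may use, in order of preference.

    Hypothetical helper that mirrors the three documented cases.
    """
    value = (annotations or {}).get(BURST_ANNOTATION)
    if value is None:
        return ["ecs"]         # annotation not set: existing ECS resources only
    if value == "eci":
        return ["ecs", "eci"]  # ECI is used when ECS resources are insufficient
    if value == "eci_only":
        return ["eci"]         # ECI only; cluster ECS resources are not used
    raise ValueError(f"unsupported {BURST_ANNOTATION} value: {value!r}")


print(allowed_resources({}))                              # ['ecs']
print(allowed_resources({BURST_ANNOTATION: "eci"}))       # ['ecs', 'eci']
print(allowed_resources({BURST_ANNOTATION: "eci_only"}))  # ['eci']
```

Raising an error for any other value is a design choice of this sketch. The source does not describe how the scheduler treats unrecognized values.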

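GPU sharing

For more information about how to use GPU sharing, see Run the GPU sharing scheduling component, Monitor and isolate GPU resources, and Manage shared GPUs by using node pools.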
