Schedule ACS Pods across regions

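In AI large-model training and inference scenarios, GPU resources in a single region cover a limited set of GPU models and are subject to inventory fluctuations, which can leave workloads short of compute or waiting in a queue. Alibaba Cloud ACK One registered clusters use the virtual node technology provided by ACK Virtual Node to connect serverless compute resources in multiple regions to a Kubernetes cluster, so that GPU resources across regions can be scheduled dynamically and managed in one place. This solution removes the single-region resource bottleneck: it obtains heterogeneous compute from a target region on demand and addresses GPU model compatibility and inventory limits. As a result, it improves resource utilization and business continuity for AI tasks while reducing the complexity and cost of hybrid cloud deployments.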

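How it works

The region information refers to the Region ID of the region whose serverless compute the data center needs to access. For example, the Region ID of the Zhangjiakou region is cn-zhangjiakou.
For each serverless Pod, ACK Virtual Node creates a serverless compute instance in the cloud, so no additional nodes need to be maintained.
The data center and the Alibaba Cloud VPCs in the different regions are interconnected over dedicated lines.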

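Prerequisites

An ACK One registered cluster is created and connected to a Kubernetes cluster in the data center or on another cloud provider (version 1.24 or later is recommended). For details, see Create a registered cluster.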

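Install the component and enable multi-region serverless compute scheduling

Install the ack-virtual-node component.
1. Configure RAM permissions for the ack-virtual-node component.
2. Install the ack-virtual-node component.
Note
If the component is already installed, make sure that its version is 2.13.0 or later. If it is not, upgrade the component.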

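Configure the ack-virtual-node component.

On the component management page, find the ack-virtual-node component and click Configure.

Configure the global parameters.

Parameter	Description	Example value
Use VPC internal network access	Whether both the image address and the API address used by the component use VPC internal domain names.	Selected
APIServerHost	The API server address of the data center Kubernetes cluster.	192.168.1.1
APIServerPort	The API server port of the data center Kubernetes cluster.	6443
Enable multi-region Virtual Node	Whether to enable multi-region serverless compute. If enabled, the corresponding region information must be configured.	Selected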

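Configure the primary region.

Parameter	Description	Example value
Region ID	The ID of the region used by the serverless compute.	cn-beijing
VPC ID	The ID of the VPC used by the serverless compute.	vpc-xxxxx
vSwitch ID(s)	The IDs of the vSwitches used by the serverless compute. Separate multiple vSwitch IDs with commas.	vsw-xxxxx,vsw-xxxxx
SecurityGroup ID	The ID of the security group used by the serverless compute.	sg-xxxxx
Whether the region used by Virtual Node is the default region	Use this region as the default region. Important: exactly one region can serve as the primary region.	Selected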

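Configure secondary regions. To configure more than one, click Add in the lower-right corner.

Parameter	Description	Example value
Region ID	The ID of the region used by the serverless compute.	cn-hangzhou
VPC ID	The ID of the VPC used by the serverless compute.	vpc-xxxxx
vSwitch ID(s)	The IDs of the vSwitches used by the serverless compute. Separate multiple vSwitch IDs with commas.	vsw-xxxxx,vsw-xxxxx
SecurityGroup ID	The ID of the security group used by the serverless compute.	sg-xxxxx

After the configuration is complete, click OK.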

Examples

CPU scenario

Use the default region.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-default-region
  name: nginx-deployment-default-region
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-default-region
  template:
    metadata:
      labels:
        alibabacloud.com/acs: "true"
        alibabacloud.com/compute-class: general-purpose 
        alibabacloud.com/compute-qos: default 
        app: nginx-default-region
    spec:  
      containers:
        - image: 'mirrors-ssl.aliyuncs.com/nginx:stable-alpine'
          imagePullPolicy: IfNotPresent
          name: nginx
          ports:
            - containerPort: 80
              protocol: TCP

Specify the serverless compute region. To do so, add the alibabacloud.com/serverless-region-id: <RegionID> label to the Pod template.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-specified-region
  name: nginx-deployment-specified-region
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-specified-region
  template:
    metadata:
      labels:
        alibabacloud.com/acs: "true" 
        alibabacloud.com/compute-class: general-purpose 
        alibabacloud.com/compute-qos: default 
        alibabacloud.com/serverless-region-id: cn-beijing # This label specifies the region
        app: nginx-specified-region
    spec:  
      containers:
        - image: 'mirrors-ssl.aliyuncs.com/nginx:stable-alpine'
          imagePullPolicy: IfNotPresent
          name: nginx
          ports:
            - containerPort: 80
              protocol: TCP
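
Because the labels in the Deployment above are applied to the Pod template, the same region label can presumably be set on a standalone Pod as well. The following is a minimal sketch, not an official sample: it assumes that cn-hangzhou has been configured as a secondary region (as in the example values in the configuration tables), and the Pod name nginx-secondary-region is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-secondary-region  # hypothetical name, for illustration only
  namespace: default
  labels:
    alibabacloud.com/acs: "true"
    alibabacloud.com/compute-class: general-purpose
    alibabacloud.com/compute-qos: default
    # Assumption: cn-hangzhou is configured as a secondary region in ack-virtual-node
    alibabacloud.com/serverless-region-id: cn-hangzhou
    app: nginx-secondary-region
spec:
  containers:
    - name: nginx
      image: 'mirrors-ssl.aliyuncs.com/nginx:stable-alpine'
      imagePullPolicy: IfNotPresent
      ports:
        - containerPort: 80
          protocol: TCP
```

After applying the manifest, running kubectl get pod nginx-secondary-region -o wide shows which node the Pod was scheduled to, which is one way to check that the region label took effect.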

GPU scenario

Use the default region.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-gpu-default-region
  name: nginx-gpu-deployment-default-region
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-gpu-default-region
  template:
    metadata:
      labels:
        alibabacloud.com/acs: "true"
        alibabacloud.com/compute-class: gpu
        alibabacloud.com/compute-qos: default
        alibabacloud.com/gpu-model-series: example-model  # GPU model; replace with the actual model, for example T4
        app: nginx-gpu-default-region
    spec:  
      containers:
        - image: 'mirrors-ssl.aliyuncs.com/nginx:stable-alpine'
          imagePullPolicy: IfNotPresent
          name: nginx
          ports:
            - containerPort: 80
              protocol: TCP
          resources:
            limits:
              cpu: 1
              memory: 1Gi
              nvidia.com/gpu: "1"
            requests:
              cpu: 1
              memory: 1Gi
              nvidia.com/gpu: "1"

Specify the serverless compute region. To do so, add the alibabacloud.com/serverless-region-id: <RegionID> label to the Pod template.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx-gpu-specified-region
  name: nginx-gpu-deployment-specified-region
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-gpu-specified-region
  template:
    metadata:
      labels:
        alibabacloud.com/acs: "true" 
        alibabacloud.com/compute-class: gpu
        alibabacloud.com/compute-qos: default
        alibabacloud.com/gpu-model-series: example-model  # GPU model; replace with the actual model, for example T4
        alibabacloud.com/serverless-region-id: cn-beijing # This label specifies the region
        app: nginx-gpu-specified-region
    spec:  
      containers:
        - image: 'mirrors-ssl.aliyuncs.com/nginx:stable-alpine'
          imagePullPolicy: IfNotPresent
          name: nginx
          ports:
            - containerPort: 80
              protocol: TCP
          resources:
            limits:
              cpu: 1
              memory: 1Gi
              nvidia.com/gpu: "1"
            requests:
              cpu: 1
              memory: 1Gi
              nvidia.com/gpu: "1"
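
Training and batch inference workloads are often run as Jobs rather than Deployments. The sketch below shows how the same labels might be carried over to a Kubernetes Job, assuming that the labels behave identically on a Job's Pod template. The Job name, container name, and command are hypothetical, and the nginx image is only a stand-in for a real training image.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-job-specified-region  # hypothetical name, for illustration only
  namespace: default
spec:
  backoffLimit: 0  # do not retry the Pod on failure
  template:
    metadata:
      labels:
        alibabacloud.com/acs: "true"
        alibabacloud.com/compute-class: gpu
        alibabacloud.com/compute-qos: default
        alibabacloud.com/gpu-model-series: example-model  # placeholder; replace with the actual GPU model
        alibabacloud.com/serverless-region-id: cn-beijing  # region in which to run the Job's Pod
    spec:
      restartPolicy: Never  # Jobs require Never or OnFailure
      containers:
        - name: gpu-task  # hypothetical container name
          image: 'mirrors-ssl.aliyuncs.com/nginx:stable-alpine'  # stand-in for a real training image
          command: ["sh", "-c", "echo task finished"]  # placeholder workload
          resources:
            limits:
              cpu: 1
              memory: 1Gi
              nvidia.com/gpu: "1"
            requests:
              cpu: 1
              memory: 1Gi
              nvidia.com/gpu: "1"
```

Changing only the alibabacloud.com/serverless-region-id value would move the Job to another configured region, which is the fallback this solution is meant to enable when the chosen GPU model is out of stock in one region.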