This document shows how to run a typical reinforcement learning job on an ACK cluster by using the VeRL framework and the Qwen2.5-3B-Instruct model, including environment preparation, image building, job submission, resource monitoring, and best practices.
Container Service for Kubernetes (ACK) provides an efficient, elastic, and scalable containerized platform for enterprises. Reinforcement learning (RL), a key branch of artificial intelligence, often involves substantial computing resources, distributed training, and complex environment simulations. With ACK, you can easily deploy, manage, and scale RL training jobs by using the scheduling capabilities of Kubernetes and the elastic infrastructure of Alibaba Cloud. The following figure shows the component architecture for this job.

Prerequisites
You have created an ACK managed cluster.
We recommend using GPU instances to accelerate training. This example uses one Lingjun node with eight GU8TF GPUs.
You have obtained the cluster kubeconfig and connected to the cluster by using kubectl.
You have installed the KubeRay Operator component.
(Optional) You have activated Object Storage Service (OSS) to persist model checkpoints, logs, and training data.
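Before you continue, you can quickly verify these prerequisites from the command line. A minimal sketch; the KubeRay Operator's namespace and pod labels depend on how the component was installed:

# Confirm that GPU nodes report allocatable nvidia.com/gpu resources
kubectl describe nodes | grep -A 5 "Allocatable" | grep nvidia.com/gpu
# Confirm that the KubeRay Operator pod is running (namespace and name may vary with your installation)
kubectl get pods -A | grep -i kuberay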
Step 1: Prepare the training image
This example uses the VeRL framework to run a reinforcement learning job. You can use the official VeRL image or build your own. If you build your own image, ensure that it includes all required dependencies, such as VeRL, vLLM, SGLang, and Ray. Here is an example Dockerfile:
FROM verl/verl:vllm012.latest
WORKDIR /home/verl
COPY . .
RUN apt update && apt install -y openssh-server vim
RUN apt remove python3-blinker -y; pip install -e .
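After preparing the Dockerfile, build the image and push it to a registry that your cluster can pull from. A minimal sketch, assuming a hypothetical Container Registry (ACR) repository; substitute your own registry address, namespace, and tag:

# Build the training image from the Dockerfile above
docker build -t registry.cn-hangzhou.cr.aliyuncs.com/<namespace>/verl:custom .
# Log in and push to your image registry (address and credentials are placeholders)
docker login registry.cn-hangzhou.cr.aliyuncs.com
docker push registry.cn-hangzhou.cr.aliyuncs.com/<namespace>/verl:custom

Step 2: Configure MCP Server and ACK Sandbox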
Install the MCP Server and Sandbox components.
Open-source version
# Clone the code repository
git clone https://github.com/openkruise/agents
cd agents
# Generate the deployment YAML for the agents operator
kubectl kustomize config/default > operator-install.yaml
# Modify the configuration in operator-install.yaml as needed, then apply it
kubectl apply -f operator-install.yaml
# Generate the deployment YAML for the test sandbox-manager. For more information, see
# https://github.com/openkruise/agents/blob/master/config/sandbox-manager/README.md
kubectl kustomize config/sandbox-manager > sandbox-manager.yaml
# The MCP code has not been merged yet, so you must manually change the sandbox-manager image to:
# baicun-business-registry.cn-beijing.cr.aliyuncs.com/baicun-dev/sandbox:sandbox-manager-v12
kubectl apply -f sandbox-manager.yaml
# Verify that the management pods are running correctly
kubectl get pod -l "app.kubernetes.io/name=sandbox-manager" -A
kubectl get pod -l "app.kubernetes.io/name=sandbox-controller-manager" -A

Marketplace version
On the Clusters page, click the name of the target cluster. In the left navigation pane, choose Add-ons.
Install the Ingress Controller and Sandbox-related components.
Install ack-agent-sandbox-controller with the default configuration.
Install ack-sandbox-manager:
Prepare an E2B domain name.
For detailed instructions on preparing a domain name, configuring DNS resolution, and applying for a certificate, see Use in a production environment.
Configure the component parameters.
Set className to alb (this example uses an installed ALB Ingress Controller), set domain to your actual domain name, and set adminApiKey to a custom API key. Keep other settings at their default values. After installation, an Ingress named sandbox-manager is created in the sandbox-system namespace. If you are using the ALB Ingress Controller, you must also add an HTTPS:443 listener configuration for both the ALB instance and the Ingress.
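To confirm the result, you can check the generated Ingress:

# Verify that the sandbox-manager Ingress exists and has an address assigned
kubectl get ingress sandbox-manager -n sandbox-system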
Save the following content as sandbox.yaml and run kubectl apply -f sandbox.yaml to deploy the Sandbox definition. The SandboxSet creates a warm pool of size 3. During reinforcement learning, the SandboxManager continuously consumes Sandboxes from this warm pool.

---
apiVersion: v1
kind: Service
metadata:
  name: mcp-sandbox
spec:
  selector:
    app.kubernetes.io/instance: release-name
    app.kubernetes.io/name: ack-sandbox-manager
    component: sandbox-manager
  type: ClusterIP
  sessionAffinity: None
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
  ports:
    - name: vllm
      protocol: TCP
      port: 8000
      targetPort: 18082
---
apiVersion: agents.kruise.io/v1alpha1
kind: SandboxSet
metadata:
  annotations:
    # Enable the Envd initialization capability of SandboxManager.
    e2b.agents.kruise.io/should-init-envd: "true"
  name: code-interpreter
  namespace: default
spec:
  # The size of the warm pool. We recommend setting this slightly larger than the estimated request burst.
  replicas: 3
  template:
    spec:
      initContainers:
        - name: init
          image: registry-cn-hangzhou.ack.aliyuncs.com/acs/agent-runtime:v0.0.1
          imagePullPolicy: IfNotPresent
          terminationMessagePolicy: File
          volumeMounts:
            - name: envd-volume
              mountPath: /mnt/envd
          env:
            - name: ENVD_DIR
              value: /mnt/envd
          restartPolicy: Always
      containers:
        - name: sandbox
          image: acs-image-test-01-registry.cn-hangzhou.cr.aliyuncs.com/e2b/code-interpreter:v1.6
          imagePullPolicy: IfNotPresent
          terminationMessagePolicy: File
          env:
            - name: ENVD_DIR
              value: /mnt/envd
          volumeMounts:
            - name: envd-volume
              mountPath: /mnt/envd
          lifecycle:
            postStart:
              exec:
                command:
                  - bash
                  - /mnt/envd/envd-run.sh
          startupProbe:
            failureThreshold: 20
            successThreshold: 1
            httpGet:
              path: /health
              port: 49999
              scheme: HTTP
            initialDelaySeconds: 1
            periodSeconds: 2
            timeoutSeconds: 1
      # Ensure fast container termination to increase the probability of reuse.
      terminationGracePeriodSeconds: 1
      restartPolicy: Always
      dnsPolicy: ClusterFirst
      volumes:
        - name: envd-volume
          emptyDir: {}
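After the manifest is applied, you can check that the warm pool has been created. A minimal sketch, assuming the sandbox pods run in the default namespace (the exact labels placed on sandbox pods depend on the controller version):

# Check the SandboxSet and its warm pool of sandbox pods
kubectl get sandboxset code-interpreter -n default
kubectl get pods -n default | grep code-interpreter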
(Optional) Step 3: Prepare the dataset
In VeRL, you can download datasets from a remote source by specifying data.train_files. However, because datasets are often large and require preprocessing, we recommend using a preprocessing job to download, preprocess, and upload the data to cloud storage in a production environment.
Save the following content as data.yaml and run kubectl apply -f data.yaml to download data from Hugging Face, preprocess it, and upload it to an OSS bucket. Note that the preprocessing script below expects GSM8K-style question and answer columns; adjust it if you download a different dataset.

---
apiVersion: v1
kind: Secret
metadata:
  name: hf-oss-credentials
  namespace: default
type: Opaque
stringData:
  # Hugging Face token
  HF_TOKEN: "hf_xxxxx"
  # Alibaba Cloud OSS credentials (the alibabacloud-oss-v2 SDK uses environment variables for authentication)
  akId: "xxx"
  akSecret: "xxx"
  OSS_REGION: "xxx"
  OSS_BUCKET: "xxx"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: preprocess-script
  namespace: default
data:
  preprocess.py: |
    #!/usr/bin/env python3
    """
    Example dataset preprocessing script
    """
    import os
    import re
    from datasets import load_from_disk

    # Dataset identifier recorded in each sample. Defaults to the GSM8K repository.
    data_source = os.environ.get("DATASET_NAME", "openai/gsm8k")

    def extract_solution(solution_str):
        """Extract the final numeric answer after `####` from a GSM8K-style solution."""
        match = re.search(r"#### (\-?[0-9\.\,]+)", solution_str)
        assert match is not None, f"Cannot parse answer from: {solution_str}"
        return match.group(1).replace(",", "")

    def preprocess_dataset(input_dir, output_dir):
        """Preprocess the dataset"""
        print(f"Loading dataset from {input_dir}")
        dataset = load_from_disk(input_dir)
        train_dataset = dataset["train"]
        test_dataset = dataset["test"]

        instruction_following = "Let's think step by step and output the final answer after `####`."

        # Add a row to each data item that represents a unique ID.
        def make_map_fn(split):
            def process_fn(example, idx):
                question_raw = example.pop("question")
                question = question_raw + " " + instruction_following
                answer_raw = example.pop("answer")
                solution = extract_solution(answer_raw)
                data = {
                    "data_source": data_source,
                    "agent_name": "tool_agent",
                    "prompt": [
                        {
                            "role": "system",
                            "content": (
                                "You are a math expert. You are given a question and you need to solve it step by step. "
                                "Reasoning step by step before any tool call. "
                                "You should use the `calc_gsm8k_reward` tool after step by step solving the question, "
                                "before generate final answer at least once and refine your answer if necessary. "
                                "Put your final answer in the format of `#### <answer>`."
                            ),
                        },
                        {
                            "role": "user",
                            "content": question,
                        },
                    ],
                    "ability": "math",
                    "reward_model": {"style": "rule", "ground_truth": solution},
                    "extra_info": {
                        "split": split,
                        "index": idx,
                        "answer": answer_raw,
                        "question": question_raw,
                        "need_tools_kwargs": True,
                        "tools_kwargs": {
                            "calc_gsm8k_reward": {
                                "create_kwargs": {"ground_truth": solution},
                                # "execute_kwargs": {},
                                # "calc_reward_kwargs": {},
                                # "release_kwargs": {},
                            },
                        },
                        "interaction_kwargs": {
                            "query": question,
                            "ground_truth": solution,
                        },
                    },
                }
                return data

            return process_fn

        train_dataset = train_dataset.map(function=make_map_fn("train"), with_indices=True, num_proc=8)
        test_dataset = test_dataset.map(function=make_map_fn("test"), with_indices=True, num_proc=8)

        # Save the processed dataset
        os.makedirs(output_dir, exist_ok=True)
        train_dataset.to_parquet(os.path.join(output_dir, "train.parquet"))
        test_dataset.to_parquet(os.path.join(output_dir, "test.parquet"))
        print(f"Processed dataset saved to {output_dir}")
        return output_dir

    if __name__ == "__main__":
        input_path = os.environ.get("INPUT_PATH", "/data/raw")
        output_path = os.environ.get("OUTPUT_PATH", "/data/processed")
        preprocess_dataset(input_path, output_path)
---
apiVersion: batch/v1
kind: Job
metadata:
  name: dataset-pipeline
  namespace: default
  labels:
    app: dataset-pipeline
spec:
  backoffLimit: 3
  template:
    metadata:
      labels:
        app: dataset-pipeline
    spec:
      restartPolicy: OnFailure
      volumes:
        # Preprocessing script
        - name: scripts
          configMap:
            name: preprocess-script
            defaultMode: 0755
      containers:
        - name: dataset-pipeline
          image: python:3.10-slim
          command:
            - /bin/bash
            - -c
            - |
              set -e

              #==========================================
              # Step 1: Install all dependencies
              #==========================================
              echo "=== Installing dependencies ==="
              pip install --no-cache-dir datasets huggingface_hub pandas numpy alibabacloud-oss-v2 Pillow

              #==========================================
              # Step 2: Download the dataset from Hugging Face
              #==========================================
              echo "=== Downloading dataset from Hugging Face ==="
              python3 << 'EOF'
              import os
              from datasets import load_dataset
              from huggingface_hub import login

              # Log in to Hugging Face (required for private datasets)
              hf_token = os.environ.get("HF_TOKEN")
              if hf_token:
                  login(token=hf_token)

              # Download the dataset
              dataset_name = os.environ.get("DATASET_NAME", "hiyouga/geometry3k")
              dataset_config = os.environ.get("DATASET_CONFIG", None)
              print(f"Downloading dataset: {dataset_name}")
              dataset = load_dataset(dataset_name, dataset_config)

              # Save locally
              output_path = "/data/raw"
              dataset.save_to_disk(output_path)
              print(f"Dataset saved to {output_path}")
              EOF
              echo "=== Download completed ==="

              #==========================================
              # Step 3: Run the preprocessing script
              #==========================================
              echo "=== Running preprocessing script ==="
              python3 /scripts/preprocess.py
              echo "=== Preprocessing completed ==="

              #==========================================
              # Step 4: Upload to OSS (using the alibabacloud-oss-v2 SDK)
              #==========================================
              echo "=== Uploading to OSS ==="
              python3 << 'EOF'
              import os
              from pathlib import Path
              import alibabacloud_oss_v2 as oss

              # OSS configuration
              bucket_name = os.environ["OSS_BUCKET"]
              region = os.environ["OSS_REGION"]
              oss_prefix = os.environ.get("OSS_PREFIX", "data/geo3k-processed/")
              local_path = os.environ.get("OUTPUT_PATH", "/data/processed")

              # Use the environment variable credentials provider (automatically reads OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET)
              credentials_provider = oss.credentials.EnvironmentVariableCredentialsProvider()

              # Load the default configuration and set the credentials provider
              cfg = oss.config.load_default()
              cfg.credentials_provider = credentials_provider
              cfg.region = region

              # Create an OSS client
              client = oss.Client(cfg)

              def upload_directory(local_dir, oss_prefix):
                  """Recursively upload a directory to OSS"""
                  local_path = Path(local_dir)
                  uploaded_count = 0
                  failed_count = 0
                  for file_path in local_path.rglob("*"):
                      if file_path.is_file():
                          relative_path = file_path.relative_to(local_path)
                          oss_key = f"{oss_prefix}{relative_path}"
                          try:
                              # Read file content
                              with open(file_path, 'rb') as f:
                                  data = f.read()
                              # Upload to OSS
                              result = client.put_object(oss.PutObjectRequest(
                                  bucket=bucket_name,
                                  key=oss_key,
                                  body=data,
                              ))
                              print(f"Uploaded: {file_path} -> {oss_key} (status: {result.status_code})")
                              uploaded_count += 1
                          except Exception as e:
                              print(f"Failed to upload {file_path}: {e}")
                              failed_count += 1
                  return uploaded_count, failed_count

              uploaded, failed = upload_directory(local_path, oss_prefix)
              print(f"=== Upload completed: {uploaded} files uploaded, {failed} files failed ===")
              if failed > 0:
                  raise Exception(f"{failed} files failed to upload")
              EOF
              echo "=== Pipeline completed successfully ==="
          env:
            # Hugging Face configuration
            - name: HF_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-oss-credentials
                  key: HF_TOKEN
            - name: DATASET_NAME
              value: "hiyouga/geometry3k"
            - name: HF_HOME
              value: "/tmp/huggingface"
            # Preprocessing configuration
            - name: INPUT_PATH
              value: "/data/raw"
            - name: OUTPUT_PATH
              value: "/data/processed"
            # OSS configuration
            - name: OSS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: hf-oss-credentials
                  key: akId
            - name: OSS_ACCESS_KEY_SECRET
              valueFrom:
                secretKeyRef:
                  name: hf-oss-credentials
                  key: akSecret
            - name: OSS_REGION
              valueFrom:
                secretKeyRef:
                  name: hf-oss-credentials
                  key: OSS_REGION
            - name: OSS_BUCKET
              valueFrom:
                secretKeyRef:
                  name: hf-oss-credentials
                  key: OSS_BUCKET
            - name: OSS_PREFIX
              value: "data/geo3k-processed/"
          volumeMounts:
            - name: scripts
              mountPath: /scripts
          resources:
            requests:
              memory: "2Gi"
              cpu: "1"
            limits:
              memory: "16Gi"
              cpu: "4"
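You can follow the pipeline's progress and confirm completion with standard Job commands:

# Stream the pipeline logs and check the Job status
kubectl logs -f job/dataset-pipeline -n default
kubectl get job dataset-pipeline -n default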
Step 4: Apply job configuration
Save the following content as pvpvc.yaml and run kubectl apply -f pvpvc.yaml to statically provision the OSS bucket by creating a PersistentVolume and a PersistentVolumeClaim. The following example uses an AccessKey pair for authentication. For RRSA authentication, see Use a static PersistentVolume with ossfs 2.0.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: ym-dataset
  labels:
    alicloud-pvname: ym-dataset
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: ym-dataset # Must be the same as the PV name.
    nodePublishSecretRef:
      name: hf-oss-credentials
      namespace: default
    volumeAttributes:
      bucket: "xxxx" # Replace with your actual bucket name.
      url: "oss-ap-southeast-1-internal.aliyuncs.com" # Replace with your actual OSS endpoint.
      otherOpts: "-o umask=022 -o max_stat_cache_size=100000 -o allow_other -o dbglevel=debug -o curldbg"
      path: "/"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ym-dataset
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
  selector:
    matchLabels:
      alicloud-pvname: ym-dataset
# (Optional) The model can be downloaded on demand by specifying a Hugging Face repository path.
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ym-models
  labels:
    alicloud-pvname: ym-models
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: ym-models # Must be the same as the PV name.
    nodePublishSecretRef:
      name: hf-oss-credentials
      namespace: default
    volumeAttributes:
      bucket: "xxxx" # Replace with your actual bucket name.
      url: "oss-ap-southeast-1-internal.aliyuncs.com" # Replace with your actual OSS endpoint.
      otherOpts: "-o umask=022 -o max_stat_cache_size=100000 -o allow_other -o dbglevel=debug -o curldbg"
      path: "/"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ym-models
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 20Gi
  selector:
    matchLabels:
      alicloud-pvname: ym-models
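Before moving on, confirm that both claims are bound:

# Both PVCs should report STATUS Bound once the OSS volumes are matched
kubectl get pvc ym-dataset ym-models -n default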
Save the following content as configs.yaml and run kubectl apply -f configs.yaml to apply the job-related configurations. Replace the url value in mcp_server.json with the Sandbox MCP Ingress endpoint and set api_key to your key (if an API key is needed, you can add an NGINX container to the Ray cluster as a proxy). Because this file must be valid JSON, set the values directly instead of annotating them with comments.

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: gsm8k-configs
  namespace: default
data:
  gsm8k_multiturn_grpo.yaml: |
    hydra:
      searchpath:
        - file://verl/trainer/config
    defaults:
      - ppo_trainer
      - _self_
    data:
      max_prompt_length: 1024
      max_response_length: 1024
      train_batch_size: 256
      return_raw_chat: True
    actor_rollout_ref:
      hybrid_engine: True
      rollout:
        name: vllm
        multi_turn:
          enable: True
          max_assistant_turns: 5
  mcp_server.json: |
    {
      "mcpServers": {
        "Tavily Expert": {
          "url": "xxxxx",
          "api_key": "xxxxx"
        }
      }
    }
  gsm8k_mcp_tool_config.yaml: |
    tools:
      - class_name: verl.tools.mcp_search_tool.MCPSearchTool
        config:
          rate_limit: 120
          timeout: 120
          type: mcp
          mcp:
            mcp_servers_config_path: /var/configs/mcp_server.json
            tool_selected_list:
              - run_code_once
      - class_name: "verl.tools.gsm8k_tool.Gsm8kTool"
        config:
          type: native
        tool_schema:
          type: "function"
          function:
            name: "calc_gsm8k_reward"
            description: "A tool for calculating the reward of gsm8k. (1.0 if parsed answer is correct, 0.0 if parsed answer is incorrect or not correctly parsed)"
            parameters:
              type: "object"
              properties:
                answer:
                  type: "string"
                  description: "The model's answer to the GSM8K math problem, must be digits"
              required: ["answer"]
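You can confirm that the ConfigMap was created and inspect the rendered files:

# List the configuration files stored in the ConfigMap
kubectl get configmap gsm8k-configs -n default -o yaml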
Step 5: Submit the job
In VeRL, you can use the MCPSearchTool to query tools provided by the MCP Server. At the start of each rollout, an AgentLoop connects to the MCP Server and calls tools during the multi-turn conversation.
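Before submitting the job, you can sanity-check that the MCP endpoint is reachable from inside the cluster. A minimal sketch using the mcp-sandbox Service created in Step 2; the exact HTTP path and response depend on your MCP server implementation:

# Launch a one-off pod and probe the MCP service port
kubectl run mcp-probe --rm -it --restart=Never --image=curlimages/curl -- \
  curl -v http://mcp-sandbox.default.svc.cluster.local:8000/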
Save the following content as rayjob.yaml and run kubectl apply -f rayjob.yaml to submit the reinforcement learning job.

---
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-example
  namespace: default
spec:
  shutdownAfterJobFinishes: false
  # ttlSecondsAfterFinished: 300
  runtimeEnvYAML: |
    working_dir: /home/verl
  submissionMode: SidecarMode
  entrypoint: |
    python3 -m verl.trainer.main_ppo \
      --config-path=/var/configs \
      --config-name='gsm8k_multiturn_grpo' \
      algorithm.adv_estimator=grpo \
      data.train_batch_size=16 \
      data.max_prompt_length=1024 \
      data.max_response_length=1024 \
      data.filter_overlong_prompts=True \
      data.truncation='error' \
      data.return_raw_chat=True \
      actor_rollout_ref.model.path=/var/model/Qwen2.5-3B-Instruct \
      actor_rollout_ref.actor.optim.lr=1e-6 \
      actor_rollout_ref.model.use_remove_padding=True \
      actor_rollout_ref.actor.ppo_mini_batch_size=8 \
      actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=1 \
      actor_rollout_ref.actor.use_kl_loss=True \
      actor_rollout_ref.actor.kl_loss_coef=0.001 \
      actor_rollout_ref.actor.kl_loss_type=low_var_kl \
      actor_rollout_ref.actor.entropy_coeff=0 \
      actor_rollout_ref.model.enable_gradient_checkpointing=True \
      actor_rollout_ref.actor.fsdp_config.param_offload=False \
      actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
      actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=1 \
      actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
      actor_rollout_ref.rollout.name=vllm \
      actor_rollout_ref.rollout.mode=async \
      actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
      actor_rollout_ref.rollout.n=16 \
      actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=1 \
      actor_rollout_ref.ref.fsdp_config.param_offload=True \
      actor_rollout_ref.rollout.trace.backend=mlflow \
      actor_rollout_ref.rollout.trace.token2text=True \
      algorithm.use_kl_in_reward=False \
      trainer.critic_warmup=0 \
      trainer.logger='["console","mlflow"]' \
      trainer.project_name='gsm8k_tool-agent' \
      trainer.experiment_name='qwen2.5-3b_function_rm-gsm8k-vllm-tool-agent-verify-n16' \
      trainer.n_gpus_per_node=8 \
      trainer.nnodes=1 \
      trainer.save_freq=1 \
      trainer.test_freq=20 \
      trainer.total_training_steps=1 \
      data.train_files=/var/model-dataset/processed-gsm8k/train20.parquet \
      data.val_files=/var/model-dataset/processed-gsm8k/test100.parquet \
      actor_rollout_ref.rollout.multi_turn.tool_config_path="/var/configs/gsm8k_mcp_tool_config.yaml" \
      actor_rollout_ref.actor.checkpoint.save_contents='["hf_model", "model"]' \
      trainer.total_epochs=1
  rayClusterSpec:
    headGroupSpec:
      rayStartParams:
        dashboard-host: 0.0.0.0
      serviceType: ClusterIP
      template:
        metadata:
          annotations: {}
          labels: {}
        spec:
          affinity: {}
          tolerations:
            # Tolerate the Lingjun node taint.
            - key: node-role.alibabacloud.com/lingjun
              operator: Exists
          containers:
            - name: ray-head
              image: registry-ap-southeast-1.ack.aliyuncs.com/dev/verl:vllm012.latest.43dc9a44
              imagePullPolicy: IfNotPresent
              env:
                - name: VERL_ROOT
                  value: /home/verl
              resources:
                limits:
                  cpu: "100"
                  memory: 500Gi
                  nvidia.com/gpu: "8"
              securityContext:
                runAsUser: 0
              volumeMounts:
                - mountPath: /var/configs
                  name: configs
                - mountPath: /var/model
                  name: model
                - mountPath: /var/model-dataset
                  name: model-dataset
          imagePullSecrets:
            - name: regcred-hangzhou
            - name: regcred-ap-southeast
          volumes:
            - name: configs
              configMap:
                name: gsm8k-configs
            - name: model
              persistentVolumeClaim:
                claimName: ym-models
            - name: model-dataset
              persistentVolumeClaim:
                claimName: ym-dataset
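After submission, you can monitor the job and open the Ray dashboard. A minimal sketch; the head pod and service names are generated by KubeRay, so look them up first:

# Watch the RayJob status until it reaches RUNNING / SUCCEEDED
kubectl get rayjob rayjob-example -n default -w

# Find the Ray head service created for this job, then port-forward the dashboard
kubectl get svc -n default | grep rayjob-example
kubectl port-forward svc/<head-service-name> 8265:8265 -n default

# Tail the training logs from the head pod
kubectl logs -f -n default -l ray.io/node-type=head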
