
Container Service for Kubernetes: Securely Deploy vLLM Inference Services in ACK Heterogeneous Confidential Computing Clusters

Last Updated: Mar 07, 2026

Large Language Model (LLM) inference involves sensitive data and core model assets. Running LLMs in untrusted environments risks data and model leakage. ACK Confidential AI (ACK-CAI), a confidential AI solution provided by ACK, integrates hardware confidential computing technologies such as Intel TDX and GPU trusted execution environment (TEE) to provide end-to-end security for model inference.

You can use ACK-CAI to deploy vLLM model inference services in ACK heterogeneous confidential computing clusters. This ensures secure isolation and encryption protection for models and data. The advantages are as follows:

  • Hardware-level security isolation: Build a hardware-level trusted execution environment (TEE) using Intel® TDX and NVIDIA GPU TEE technology. This ensures the confidentiality and integrity of models and data during computation.

  • Trusted key distribution: Strictly verify the runtime environment using a remote attestation mechanism. After successful verification, the dedicated Trustee service distributes model decryption keys to the trusted environment.

  • End-to-end data encryption: Establish a client-to-server encrypted channel using a Trusted Network Gateway (TNG). This protects the security of inference requests and response data during transmission.

  • Non-intrusive to applications: Security components are automatically injected into Pods through a Kubernetes admission webhook. You enable protection for an application simply by adding an annotation, without modifying business code or images.

How it works

ACK-CAI provides transparent confidential computing capabilities for application Pods by dynamically injecting a set of Sidecar containers named Trustiflux. Its core security mechanism relies on remote attestation, ensuring that models and data are accessed only in a trusted environment.


Core components

  1. ACK heterogeneous confidential computing cluster: A Kubernetes cluster built on TDX confidential instances and GPU confidential computing capabilities.

  2. Trustee remote attestation service: Provides runtime environment trustworthiness verification and distributes model decryption keys after successful verification.

  3. Runtime Trustiflux: A confidential computing runtime component provided as a Sidecar, including the following core modules:

    • Attestation Agent (AA): Performs remote attestation and key retrieval.

    • Confidential Data Hub (CDH): Handles ciphertext data decryption.

    • Trusted Network Gateway Server (TNG Server): Establishes a secure communication channel.

    • Cachefs: Provides model decryption support.

  4. Inference service: A container that carries out actual Large Language Model inference tasks.

  5. Inference program: The client-side program for accessing model inference services.

  6. Trusted Network Gateway Client (TNG Client): Establishes a secure communication channel with the cluster, ensuring communication security.

Core security mechanisms

The solution primarily includes two security mechanisms:

  • Remote attestation-based encrypted model distribution:

    1. When the Pod starts, the Attestation Agent (AA) in the Sidecar sends a request to the Trustee remote attestation service.

    2. The Trustee service performs trustworthiness verification for the CPU (TDX) and GPU confidential environments.

    3. After successful verification, the Trustee service securely distributes the model decryption key to the Pod.

    4. The Confidential Data Hub (CDH) and Cachefs in the Sidecar use this key to decrypt the encrypted model files and mount them into the inference service container.

  • Remote attestation-based end-to-end encrypted inference:

    1. The end user's inference program sends requests through the local Trusted Network Gateway Client (TNG Client).

    2. Requests remain encrypted throughout transmission, preventing man-in-the-middle attacks.

    3. After a request reaches the server, the Trusted Network Gateway (TNG) module in the Sidecar decrypts it, and the inference service then processes it.

    4. The TNG encrypts the inference results and securely returns them to the client.

Process and environment guide

Deploying and accessing a secure vLLM inference service involves the following stages:

  • Step 1: Prepare the encrypted model
    Purpose: Encrypt the inference model and upload it to Object Storage Service (OSS) for secure storage at rest.
    Environment: A dedicated data preparation server.

  • Step 2: Deploy the Trustee remote attestation service
    Purpose: Deploy a dedicated Trustee verification service as the root of trust to verify the environment and distribute keys.
    Environment: A dedicated Trustee server.

  • Step 3: Configure the ACK confidential computing cluster
    Purpose: Create and configure Kubernetes nodes for confidential computing tasks.
    Environment: The Alibaba Cloud Management Console (ACK, ECS) and the shell environment of an ecs.gn8v-tee instance.

  • Step 4: Deploy ACK-CAI components
    Purpose: Install CAI components in the cluster to dynamically inject security capabilities into applications.
    Environment: The ACK console.

  • Step 5: Deploy the vLLM model inference service
    Purpose: Deploy the vLLM service to the cluster using Helm, and enable confidential computing protection via an annotation.
    Environment: A machine with kubectl and Helm configured, connected to the API server.

  • Step 6: Securely access the inference service
    Purpose: Start a client security agent to access the deployed model service through an encrypted channel.
    Environment: The client environment.

Step 1: Prepare the encrypted model

This section uses an encryption tool to process model data and uploads it to Object Storage Service (OSS), preparing for subsequent encrypted distribution.

Execution environment: To achieve secure isolation, prepare a temporary ECS instance to download, encrypt, and upload the model. We recommend placing the ECS instance in the same region as the OSS bucket so that the encrypted model data can be uploaded at high speed over the private network.
Model files are large, so this process takes a long time. To try the solution quickly, you can skip this step, use the encrypted model example files instead, and proceed directly to Step 2: Deploy the Trustee remote attestation service.

1. Download a model

Before deploying a model to the cloud, encrypt it and upload it to cloud storage. The decryption key is managed by the Trustee remote attestation service, which releases it only to verified environments. Perform model encryption in a local or trusted environment. This example uses the Qwen2.5-3B-Instruct LLM.

Note

If you already have a model, you can skip this section and proceed to 2. Encrypt the model.

The Qwen2.5-3B-Instruct model requires Python 3.9 or later. To download the model using the ModelScope tool, run the following commands in the terminal.

pip3 install modelscope importlib-metadata
modelscope download --model Qwen/Qwen2.5-3B-Instruct

After the download completes, the model is stored in ~/.cache/modelscope/hub/models/Qwen/Qwen2.5-3B-Instruct/.
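Before encrypting, you can optionally sanity-check the downloaded files (a minimal sketch; the reported size depends on the model):

# List the downloaded model files and show their total size
ls ~/.cache/modelscope/hub/models/Qwen/Qwen2.5-3B-Instruct/
du -sh ~/.cache/modelscope/hub/models/Qwen/Qwen2.5-3B-Instruct/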

2. Encrypt the model

Currently, you can encrypt models in Gocryptfs mode (based on the open AES-256-GCM standard).

  1. Install the Gocryptfs tool to encrypt the model. Currently, only Gocryptfs v2.4.0 with default encryption parameters is supported. You can choose one of the following installation methods:

    Method 1: (Recommended) Install from a yum source

    If you use the Alinux 3 or AnolisOS 23 operating system, you can use a yum source to install Gocryptfs.

    Alinux 3
    sudo yum install gocryptfs -y
    AnolisOS 23
    sudo yum install anolis-epao-release -y
    sudo yum install gocryptfs -y

    Method 2: Directly download the precompiled binary file

    # Download the precompiled Gocryptfs package.
    wget https://github.jobcher.com/gh/https://github.com/rfjakob/gocryptfs/releases/download/v2.4.0/gocryptfs_v2.4.0_linux-static_amd64.tar.gz
    
    # Decompress and install the package.
    tar xf gocryptfs_v2.4.0_linux-static_amd64.tar.gz
    sudo install -m 0755 ./gocryptfs /usr/local/bin
  2. Create a Gocryptfs key file to use as the model encryption key. You must upload this key to the Trustee remote attestation service for management in a subsequent step.

    In this topic, 0Bn4Q1wwY9fN3P is used as the key to encrypt the model. The key content is stored in the cachefs-password file. You can also customize the key. In practice, we recommend that you use a randomly generated strong key.

    cat << EOF > ~/cachefs-password
    0Bn4Q1wwY9fN3P
    EOF
  3. Use the created key to encrypt the model.

    1. Configure the path of the plaintext model.

      Note

      Specify the path where the plaintext model you just downloaded is located. If you have other models, replace the path with the actual path of your target model.

      PLAINTEXT_MODEL_PATH=~/.cache/modelscope/hub/models/Qwen/Qwen2.5-3B-Instruct/
    2. Use Gocryptfs to encrypt the model directory tree.

      After the following commands complete, the model is stored as ciphertext in the ~/mount/cipher directory.

      mkdir -p ~/mount
      cd ~/mount
      mkdir -p cipher plain
      
      # Install Gocryptfs runtime dependencies.
      sudo yum install -y fuse
      
      # Initialize Gocryptfs.
      cat ~/cachefs-password | gocryptfs -init cipher
      
      # Mount to plain.
      cat ~/cachefs-password | gocryptfs cipher plain
      
      # Copy the AI model into the plaintext mount; gocryptfs transparently
      # writes the encrypted form to ~/mount/cipher.
      cp -r ${PLAINTEXT_MODEL_PATH}/. ~/mount/plain
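      The ciphertext in ~/mount/cipher is what you will upload. Once the copy finishes, it is good practice to unmount the plaintext view so that no decrypted data remains accessible (a minimal sketch; fusermount is part of the fuse package installed above):

      # Detach the plaintext FUSE mount; only the ciphertext in ~/mount/cipher remains
      fusermount -u ~/mount/plain

      # Optional: confirm that the ciphertext directory is populated
      ls ~/mount/cipher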

3. Upload the model

Prepare an OSS Bucket in the same region where you will deploy the heterogeneous instance. Upload the encrypted model to Alibaba Cloud OSS. This lets you pull and deploy the model from the heterogeneous instance later.

Take OSS as an example: create a bucket with a directory named qwen-encrypted, such as oss://examplebucket/qwen-encrypted/. For more information, see Quick Start for the console. Because the model files are large, we recommend using ossbrowser to upload the encrypted model to this directory.
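If you prefer the command line to ossbrowser, ossutil can upload the directory as well (a sketch; configure ossutil credentials first and replace the bucket and path with your own):

# Recursively upload the ciphertext directory to the target OSS prefix
ossutil cp -r ~/mount/cipher/ oss://examplebucket/qwen-encrypted/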

Step 2: Deploy the Trustee remote attestation service

Following the zero trust principle, any confidential computing environment must pass verification before it can access sensitive data such as model decryption keys. This step deploys a dedicated Trustee service to verify the runtime environment of models and inference services. This ensures that model decryption keys are injected only after the environment is confirmed trustworthy, and that the environment's trustworthiness is verified again whenever a client initiates an inference request.

Execution environment: A dedicated, standalone server deployed outside the ACK cluster, such as an ECS instance or an on-premises private server.

1. Select deployment solutions

Based on the principles of security isolation and trust independence, Trustee must be deployed on a standalone server outside the ACK heterogeneous confidential computing cluster. There are two recommended options, depending on your trust-level requirements:

Trust level: The more software and hardware control a cloud service provider has over the Trustee deployment environment, the lower the trust level, because the Trustee service acts as the root of trust for remote attestation and for distributing secrets to confidential and trusted computing environments in the cloud. Under a strict trust model, the Trustee owner must have full control over all software and hardware in the deployment environment, ensuring that it runs in a customer-controlled trusted environment.
  • ECS instances

    Create an additional ECS instance within the same VPC as the ACK cluster to specifically run the Trustee service. This allows for efficient and secure communication over the Alibaba Cloud private network, and ensures complete logical and physical isolation between the Trustee service and the confidential computing environment.

  • On-premises private servers

    For scenarios with extremely high security requirements, deploy Trustee in your own data center or on-premises server. Connect it to the cloud VPC network via a leased line or VPN. This ensures that you have full control over the software and hardware environment of the root of trust, unaffected by cloud vendors.

Before use, ensure the server has public network access and that port 8081 is open.

2. Deploy the Trustee service

Trustee is packaged in RPM format and included in the official YUM repositories of Alibaba Cloud Linux 3.x and Anolis (8.x and later). Install it using a system package management tool. After installation, systemd automatically manages and starts the service.

  1. On the prepared server, execute the following command to install and start Trustee using the YUM repository:

    sudo yum install -y trustee-1.5.2

    Trustee automatically starts and listens on port 8081 by default. You can directly access it over the network as a URL, using the deployment environment IP plus the service port number, for example, http://<trustee-ip>:8081/api.

    Here, <trustee-ip> is the IP address of the server where Trustee is deployed.
    If you use Trustee in a production environment, we recommend configuring HTTPS access for Trustee to enhance security.
  2. Run the following command to check the health status of service components:

    If the jq tool is not installed, run sudo yum install -y jq first.
    # Replace <trustee-ip> with the Trustee server IP
    curl http://<trustee-ip>:8081/api/services-health | jq

    If all component statuses in the output are ok, the service is healthy.

    {
      "gateway": {
        "status": "ok",
        "timestamp": "2025-08-26T13:46:13+08:00"
      },
      "kbs": {
        "status": "ok",
        "timestamp": "2025-08-26T13:46:13+08:00"
      },
      "as": {
        "status": "ok",
        "timestamp": "2025-08-26T13:46:13+08:00"
      },
      "rvps": {
        "status": "ok",
        "timestamp": "2025-08-26T13:46:13+08:00"
      }
    }

Common Trustee service management commands

The Trustee service is managed by systemd. Use the systemctl command for lifecycle management. Common operations include the following:

  • Start service: systemctl start trustee

  • Stop service: systemctl stop trustee

  • Restart service: systemctl restart trustee

  • Check status: systemctl status trustee
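Because the service runs under systemd, its logs go to the journal. To troubleshoot startup or attestation issues, you can follow them (using the trustee unit name shown above):

# Follow the Trustee service logs
journalctl -u trustee -f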

3. Import model decryption keys to the Trustee instance

After the Trustee service is deployed, you must provide it with the model decryption key. This key is the basis for subsequent remote attestation and secure key distribution to inference services.

Trustee manages keys by mapping local file paths to resource IDs. The following operations create and import a model decryption key into the default key storage directory.

  1. Execute the following commands to create a key directory (create a subdirectory named aliyun in the local directory /opt/trustee/kbs/repository/default/) and write the key content:

    Replace <model decryption key> with the actual key string. This example uses 0Bn4Q1wwY9fN3P.
    sudo mkdir -p /opt/trustee/kbs/repository/default/aliyun/
    sudo sh -c 'echo -n "<model decryption key>" > /opt/trustee/kbs/repository/default/aliyun/model-decryption-key'
  2. Verify the key ID.

    After completing the above operations, the key stored at the file path .../aliyun/model-decryption-key will have a corresponding key ID in the Trustee system: kbs:///default/aliyun/model-decryption-key.
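    To double-check the mapping before wiring it into the cluster, you can read the key file back on the Trustee server (a minimal sketch; the expected output is the key string you imported):

    # Print the stored key; the trailing echo adds a newline because the key was written without one
    sudo cat /opt/trustee/kbs/repository/default/aliyun/model-decryption-key; echo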

Step 3: Configure the ACK confidential computing cluster

This section builds an underlying infrastructure with hardware-level security isolation capabilities for running confidential computing tasks. It involves creating an ACK cluster and adding ecs.gn8v-tee instances with Intel TDX and NVIDIA TEE capabilities as worker nodes.

Execution environment: ECS, ACK console (for creating clusters, node pools, and ECS instances), and the Shell environment of the created ecs.gn8v-tee instance (for installing drivers).
  1. Create an ACK managed cluster Pro edition in the China (Beijing) region. For more information, see Create an ACK managed cluster.

  2. Create a node pool for the cluster to manage confidential computing instances. For more information, see Create and manage node pools.

    • vSwitch: Select the virtual switch in China (Beijing) Zone L.

    • Scaling Mode: Keep default configurations. Do not enable automatic elastic scaling.

    • Instance type: ecs.gn8v-tee.4xlarge and above.

    • Operating System: Alibaba Cloud Linux 3.2104 LTS 64-bit.

    • System Disk: 100 GiB or more.

    • Expected Number of Nodes: The initial number of nodes in the node pool. Keep the default value, which is 0.

    • Node Labels: Add labels (Key: ack.aliyun.com/nvidia-driver-version, Value: 550.144.03) to specify the NVIDIA driver version.

  3. Create EGS confidential computing instances as cluster nodes. For more information, see Custom purchase instances.

    • Region: China (Beijing).

    • Network and zone: VPC consistent with the cluster VPC, Zone L.

    • Instance type: ecs.gn8v-tee.4xlarge and above.

      gn8v-tee instance types have CPU and GPU confidential computing features enabled by default, so you do not need to separately select the confidential VM option.
    • Image: Alibaba Cloud Linux 3.2104 LTS 64-bit.

  4. Log on to the created EGS instance and install the NVIDIA drivers and CUDA toolkit. For more information, see Step 1: Install NVIDIA drivers and CUDA toolkit. (A driver verification sketch follows this list.)

  5. Add the EGS instance to the previously created node pool. Select manual addition as the method. For more information, see Add existing nodes.
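After installing the drivers in step 4, you can confirm on the EGS instance that the GPU and the expected driver version (550.144.03, matching the node label above) are visible before adding the node to the node pool. A minimal check:

# Verify that the NVIDIA driver is loaded and reports the expected version
nvidia-smi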

Step 4: Deploy ACK-CAI components

The ACK-CAI component enables non-intrusive confidential computing capabilities for applications in the cluster. It includes a webhook controller that automatically injects Sidecar containers into Pods based on their annotations. These Sidecar containers provide remote attestation, model decryption, and secure communication.

Execution environment: ACK console.
  1. Log on to the Container Service Management Console. In the navigation pane on the left, click Clusters.

  2. On the Clusters page, click the name of your cluster. In the navigation pane on the left, click Applications > Helm.

  3. Click Create and follow the on-screen prompts to install the latest version of ACK-CAI.

    After installation is complete, view the deployment status in the Helm Chart list.
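You can also confirm the installation from the command line on a machine with cluster access. A minimal sketch (the exact release and webhook configuration names depend on the chart, so treat the grep pattern as an assumption):

# List installed Helm releases across all namespaces
helm list -A

# Check that a mutating admission webhook for ACK-CAI is registered
kubectl get mutatingwebhookconfigurations | grep -i cai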

Step 5: Deploy the vLLM model inference service

After the basic environment and security components are ready, this section uses Helm to deploy the vLLM service, adding specific annotations to declare that the application requires ACK-CAI security enhancements.

Execution environment: A machine with kubectl and Helm configured and access to the cluster. You can work directly in Workbench or CloudShell.
  1. Create an empty Helm Chart directory.

    mkdir -p ack-cai-vllm-demo
    cd ack-cai-vllm-demo
  2. Initialize a Helm Chart for deploying the vLLM service.

    This Helm Chart pins the vLLM inference service to confidential computing GPU nodes through node affinity and mounts the encrypted model stored in OSS through the CSI plugin.

    The following script initializes the Helm Chart:

    # Create template file
    mkdir -p ./templates
    cat <<EOF >templates/vllm.yaml
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv-oss
      namespace: {{ .Release.Namespace }}
      labels:
        alicloud-pvname: pv-oss
    spec:
      capacity:
        storage: 5Gi
      accessModes:
        - ReadOnlyMany
      persistentVolumeReclaimPolicy: Retain
      csi:
        driver: ossplugin.csi.alibabacloud.com
        volumeHandle: pv-oss
        volumeAttributes:
          bucket: {{ .Values.oss.bucket }}
          path: {{ .Values.oss.path }}
          url: {{ .Values.oss.url }}
          otherOpts: "-o umask=022 -o max_stat_cache_size=0 -o allow_other"
        nodePublishSecretRef:
          name: oss-secret
          namespace: {{ .Release.Namespace }}
    
    ---
    
    apiVersion: v1
    kind: Secret
    metadata:
      name: oss-secret
      namespace: {{ .Release.Namespace }}
    stringData:
      akId: {{ .Values.oss.akId }}
      akSecret: {{ .Values.oss.akSecret }}
    
    ---
    
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: pvc-oss
      namespace: {{ .Release.Namespace }}
    spec:
      accessModes:
        - ReadOnlyMany
      resources:
        requests:
          storage: 5Gi
      selector:
        matchLabels:
          alicloud-pvname: pv-oss
    
    ---
    
    apiVersion: v1
    kind: Service
    metadata:
      name: cai-vllm-svc
      namespace: {{ .Release.Namespace }}
      {{- if .Values.loadbalancer}}
      {{- if .Values.loadbalancer.aclId }}
      annotations:
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-acl-status: "on"
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-acl-id: {{ .Values.loadbalancer.aclId }}
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-acl-type: "white"
      {{- end }}
      {{- end }}
      labels:
        app: cai-vllm
    spec:
      ports:
      - port: 8080
        protocol: TCP
        targetPort: 8080
      selector:
        app: cai-vllm
      type: LoadBalancer
    
    ---
    
    apiVersion: v1
    kind: Pod
    metadata:
      name: cai-vllm
      namespace: {{ .Release.Namespace }}
      labels:
        app: cai-vllm
        trustiflux.alibaba.com/confidential-computing-mode: "ACK-CAI"
      annotations:
        trustiflux.alibaba.com/ack-cai-options: |
    {{ .Values.caiOptions | indent 6 }}
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node.kubernetes.io/instance-type
                operator: In
                values:
                  - ecs.gn8v-tee.4xlarge
                  - ecs.gn8v-tee.6xlarge
                  - ecs.gn8v-tee-8x.16xlarge
                  - ecs.gn8v-tee-8x.48xlarge
      containers:
        - name: inference-service
          image: egslingjun-registry.cn-wulanchabu.cr.aliyuncs.com/egslingjun/llm-inference:vllm0.5.4-deepgpu-llm24.7-pytorch2.4.0-cuda12.4-ubuntu22.04
          command:
            - bash
          args: ["-c", "vllm serve /tmp/model --port 8080 --host 0.0.0.0 --served-model-name qwen2.5-3b-instruct --device cuda --dtype auto"]
          ports:
            - containerPort: 8080
          resources:
            limits:
              nvidia.com/gpu: 1  # Request 1 GPU card for this container
          volumeMounts:
            - name: pvc-oss
              mountPath: "/tmp/model"
    
      volumes:
        - name: pvc-oss
          persistentVolumeClaim:
            claimName: pvc-oss
    
    EOF
    
    # Create Helm Chart description file
    cat <<EOF > ./Chart.yaml
    apiVersion: v2
    name: vllm
    description: A test based on vllm for ack-cai
    type: application
    version: 0.1.0
    appVersion: "0.1.0"
    EOF
    
    # Create empty Helm Chart variables file values.yaml
    touch values.yaml
    
    
    
  3. Edit the values.yaml file and fill in your environment-specific values.

    Replace <trustee-ip> with the Trustee server address, and replace the OSS parameter placeholders with your actual values.
    caiOptions: |
      {
          "cipher-text-volume": "pvc-oss",
          "model-decryption-key-id" : "kbs:///default/aliyun/model-decryption-key",
          "trustee-address": "http://<trustee-ip>:8081/api"
      }
    oss:
      bucket: "conf-ai"                          # Replace with the OSS Bucket name where encrypted models are stored
      path: "/qwen2.5-3b-gocryptfs/"             # Replace with the path of the encrypted model file within the OSS Bucket
      url: "https://oss-cn-beijing-internal.aliyuncs.com"   # Replace with the OSS Endpoint
      akId: "xxxxx"                              # Replace with the Alibaba Cloud AK ID
      akSecret: "xxxxx"                          # Replace with the Alibaba Cloud AK Secret
  4. Deploy the vLLM service.

    helm install vllm . -n default
  5. Check if the CAI component's Sidecar containers are successfully injected into the Pod.

    kubectl get pod cai-vllm -n default -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{range .state.running}Running{end}{.state.*.reason}{"\n"}{end}{range .status.containerStatuses[*]}{.name}{"\t"}{range .state.running}Running{end}{.state.*.reason}{"\n"}{end}'

    In the expected output, the following 5 containers are displayed, indicating successful injection. Wait until all containers change from PodInitializing to Running, which indicates that the service has started.

    cai-sidecar-attestation-agent   Running
    cai-sidecar-confidential-data-hub       Running
    cai-sidecar-tng Running
    cai-sidecar-cachefs     Running
    inference-service       Running
  6. Get and record the access address of the vLLM service.

    kubectl get service cai-vllm-svc -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].port}{"\n"}'

    The expected output is a URL in the format http://<vllm-ip>:<port>, for example:

    http://182.XX.XX.225:8080
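After the service is up, the release can be updated or removed with standard Helm commands (a sketch; the release name vllm and namespace default match the install command above):

# Apply changes after editing values.yaml or the templates
helm upgrade vllm . -n default

# Remove the inference service and its associated resources
helm uninstall vllm -n default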

Step 6: Securely access the inference service

In addition to server-side security protection, establish an end-to-end encrypted communication link from client to server. This ensures the security of inference data during transmission. This section starts a TNG security gateway on the client, creating a local proxy that automatically encrypts all requests sent to the vLLM service and decrypts received responses.

Execution environment: Client environment, which is any machine that needs to invoke the vLLM inference service.
  1. Start the TNG gateway on the client to establish a secure communication channel.

    The TNG gateway creates a local proxy on the client to encrypt requests sent to the server.
    Replace <trustee-ip> with the Trustee server address.
    docker run -d \
        --network=host \
        confidential-ai-registry.cn-shanghai.cr.aliyuncs.com/product/tng:2.2.4 \
        tng launch --config-content '
          {
            "add_ingress": [
              {
                "http_proxy": {
                  "proxy_listen": {
                    "host": "0.0.0.0",
                    "port": 41000
                  }
                },
                "encap_in_http": {},
                "verify": {
     "as_addr": "http://<trustee-ip>:8081/api/attestation-service/",
                  "policy_ids": [
                    "default"
                  ]
                }
              }
            ]
          }
    '
  2. Access the vLLM service through the TNG proxy.

    Replace <vllm-ip>:<port> with the previously obtained access address of the vLLM service.
    # Set the http_proxy environment variable
    export http_proxy=http://127.0.0.1:41000
    
    # Send a curl request
    curl http://<vllm-ip>:<port>/v1/completions \
      -H "Content-type: application/json" \
      -d '{
        "model": "qwen2.5-3b-instruct",
        "prompt": "San Francisco is a",
        "max_tokens": 7,
        "temperature": 0
        }'
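The vLLM image serves an OpenAI-compatible API, so other endpoints work through the same proxy. A sketch using the chat completions endpoint (assuming the deployed server version exposes /v1/chat/completions, which recent vLLM releases do):

# Send a chat-style request through the TNG proxy (http_proxy is still set)
curl http://<vllm-ip>:<port>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-3b-instruct",
    "messages": [{"role": "user", "content": "What is confidential computing?"}],
    "max_tokens": 64
    }'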

Reference information

caiOptions configuration description

caiOptions accepts a JSON-formatted configuration object. The ACK-CAI admission webhook parses these parameters and dynamically injects and configures the necessary security components (such as AA and CDH) into the Pod, enabling transparent encryption and decryption, remote attestation, and trusted networking.

The following is a complete caiOptions configuration example.

{
  "cipher-text-volume": "pvc-oss",
  "model-decryption-key-id": "kbs:///default/aliyun/model-decryption-key",
  "trustee-address": "http://<trustee-ip>:8081/api",
  "aa-version": "1.3.1",
  "cdh-version": "1.3.1",
  "tng-version": "2.2.4",
  "cachefs-version": "1.0.7-2.6.1",
  "tdx-ra-enable": true,
  "gpu-ra-enable": true,
  "tng-http-secure-ports": [
    {
      "port": 8080
    }
  ]
}
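Because caiOptions is embedded as a multi-line string in a Pod annotation, a stray comma or quote only surfaces at deploy time. You can validate the JSON up front (a minimal sketch using jq):

# Pretty-print the configuration; jq exits non-zero if the JSON is malformed
jq . <<'EOF'
{
  "cipher-text-volume": "pvc-oss",
  "model-decryption-key-id": "kbs:///default/aliyun/model-decryption-key",
  "trustee-address": "http://<trustee-ip>:8081/api"
}
EOF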

Configuration item details:

  • cipher-text-volume (required): The name of the PVC that stores the encrypted model data. ACK-CAI automatically decrypts the data mounted through this PVC in the trusted environment.

  • model-decryption-key-id (required): The KBS URI of the model decryption key, in the format kbs:///<repository>/<group>/<key>.

  • trustee-address (required): The address of the Trustee service, used for remote attestation and key retrieval.

  • aa-version (optional): The version of the Attestation Agent (AA) component.

  • cdh-version (optional): The version of the Confidential Data Hub (CDH) component.

  • tng-version (optional): The version of the Trusted Network Gateway (TNG) component.

  • cachefs-version (optional): The version of the Cachefs component.

  • tdx-ra-enable (optional): Whether to enable remote attestation support for the CPU (TDX confidential instances). Default: true.

  • gpu-ra-enable (optional): Whether to enable remote attestation support for the GPU. Default: true.

  • tng-http-secure-ports (optional): Configures TNG to apply TLS encryption to traffic on specific HTTP ports. Accepts an array of objects, where each object describes a port encryption rule, for example:

"tng-http-secure-ports": [
  {
    "port": 8080,
    "allow-insecure-request-regexes": [
      "/api/builtin/.*"
    ]
  }
]

Where:

  • port: The HTTP service port number that TNG needs to protect with TLS encryption.

  • allow-insecure-request-regexes: An array of regular expressions. If the path of an HTTP request sent to the specified port matches any regular expression in the array, TNG does not encrypt that request.

Encrypted model example files

The following provides encrypted models available for testing and their related configuration information. These models are stored in public-read OSS Buckets and are encrypted using the specified method.


  • Qwen3-32B
    Encryption method and storage location: Gocryptfs at conf-ai:/qwen3-32b-gocryptfs/; Sam at conf-ai:/qwen3-32b-sam/
    Encryption password: 0Bn4Q1wwY9fN3P
    Public-read OSS region: cn-beijing
    OSS endpoint: oss-cn-beijing-internal.aliyuncs.com

  • Qwen2.5-3B-Instruct
    Encryption method and storage location: Gocryptfs at conf-ai:/qwen2.5-3b-gocryptfs/; Sam at conf-ai:/qwen2.5-3b-sam/
    Encryption password: 0Bn4Q1wwY9fN3P
    Public-read OSS region: cn-beijing
    OSS endpoint: oss-cn-beijing-internal.aliyuncs.com