Securely Deploy vLLM Inference Services in ACK Heterogeneous Confidential Computing Clusters

Large Language Model (LLM) inference involves sensitive data and core model assets. Running LLMs in untrusted environments risks data and model leakage. ACK Confidential AI (ACK-CAI), a confidential AI solution provided by ACK, integrates hardware confidential computing technologies such as Intel TDX and GPU trusted execution environment (TEE) to provide end-to-end security for model inference.

You can use ACK-CAI to deploy vLLM model inference services in ACK heterogeneous confidential computing clusters. This ensures secure isolation and encryption protection for models and data. The advantages are as follows:

Hardware-level security isolation: Intel® TDX and NVIDIA GPU TEE technology together form a hardware-level trusted execution environment (TEE) that protects the confidentiality and integrity of models and data during computation.
Trusted key distribution: A remote attestation mechanism strictly verifies the runtime environment. Only after verification succeeds does the dedicated Trustee service distribute model decryption keys to the trusted environment.
End-to-end data encryption: A Trusted Network Gateway (TNG) establishes an encrypted channel from client to server, which protects inference requests and responses in transit.
Non-intrusive to applications: A Kubernetes webhook automatically injects the security components into Pods. You enable the security capabilities with an annotation, without modifying business code or images.

How it works

ACK-CAI provides transparent confidential computing capabilities for application Pods by dynamically injecting a set of Sidecar containers named Trustiflux. Its core security mechanism relies on remote attestation, ensuring that models and data are accessed only in a trusted environment.

Core components

ACK heterogeneous confidential computing cluster: A Kubernetes cluster built on TDX confidential instances and GPU confidential computing capabilities.
Trustee remote attestation service: Provides runtime environment trustworthiness verification and distributes model decryption keys after successful verification.
Trustiflux runtime: A confidential computing runtime component provided as a Sidecar, including the following core modules:
- Attestation Agent (AA): Performs remote attestation and key retrieval.
- Confidential Data Hub (CDH): Handles ciphertext data decryption.
- Trusted Network Gateway Server (TNG Server): Establishes a secure communication channel.
- Cachefs: Provides model decryption support.
Inference service: A container that carries out actual Large Language Model inference tasks.
Inference program: The client-side program for accessing model inference services.
Trusted Network Gateway Client (TNG Client): Establishes a secure communication channel with the cluster, ensuring communication security.

Core security mechanisms

The solution primarily includes two security mechanisms:

Remote attestation-based encrypted model distribution:
1. When the Pod starts, the Attestation Agent (AA) in the Sidecar sends a request to the Trustee remote attestation service.
2. The Trustee service performs trustworthiness verification for the CPU (TDX) and GPU confidential environments.
3. After successful verification, the Trustee service securely distributes the model decryption key to the Pod.
4. The Confidential Data Hub (CDH) and Cachefs in the Sidecar use this key to decrypt the encrypted model files and mount them into the inference service container.
Remote attestation-based end-to-end encrypted inference:
1. The end user's inference program sends requests through the local Trusted Network Gateway Client (TNG Client).
2. Requests remain encrypted throughout transmission, preventing man-in-the-middle attacks.
3. On the server side, the Trusted Network Gateway Server (TNG Server) in the Sidecar decrypts the requests and passes them to the inference service for processing.
4. The TNG Server encrypts the inference results and securely returns them to the client.

Process and environment guide

Deploying and accessing a secure vLLM inference service involves the following stages:

Step	Purpose	Environment
Step 1: Prepare the encrypted model	Encrypt the inference model and upload it to Object Storage Service (OSS) to ensure secure static storage.	A dedicated data preparation server
Step 2: Deploy the Trustee remote attestation service	Deploy a dedicated Trustee verification service as the root of trust to verify the environment and distribute keys.	A dedicated Trustee server
Step 3: Configure the ACK confidential computing cluster	Create and configure Kubernetes nodes for confidential computing tasks.	Alibaba Cloud Management Console (ACK, ECS); shell environment of an ecs.gn8v-tee instance
Step 4: Deploy ACK-CAI components	Install CAI components in the cluster to dynamically inject security capabilities into applications.	ACK console
Step 5: Deploy the vLLM model inference service	Deploy the vLLM service to the cluster using Helm, and enable confidential computing protection via Annotation.	A machine with kubectl and Helm configured, connected to the API Server
Step 6: Securely access the inference service	Start a client security agent to access the deployed model service through an encrypted channel.	Client environment

Step 1: Prepare encrypted models

This section uses an encryption tool to process model data and uploads it to Object Storage Service (OSS), preparing for subsequent encrypted distribution.

Execution environment: To achieve secure isolation, prepare a temporary ECS instance on which to download, encrypt, and upload models. We recommend placing the ECS instance in the same region as the OSS Bucket so that the encrypted model data can be uploaded at high speed over the private network.

Model files are large, so this step takes a long time. To try the solution quickly, you can skip this step, use the encrypted model example files, and proceed directly to Step 2: Deploy the Trustee remote attestation service.

1. Download a model

Before deploying a model to the cloud, encrypt it and upload it to cloud storage. The decryption key is managed by KMS and controlled by the remote attestation service. Perform model encryption in a local or trusted environment. This example uses the Qwen2.5-3B-Instruct LLM.

Note

If you already have a model, you can skip this section and proceed to 2. Encrypt the model.

The Qwen2.5-3B-Instruct model requires Python 3.9 or later. To download the model using the ModelScope tool, run the following command in the terminal.

pip3 install modelscope importlib-metadata
modelscope download --model Qwen/Qwen2.5-3B-Instruct

After the download succeeds, the model is stored in ~/.cache/modelscope/hub/models/Qwen/Qwen2.5-3B-Instruct/.

2. Encrypt models

Currently, you can encrypt models using Gocryptfs encryption mode (based on the AES256-GCM open standard).

Install the Gocryptfs tool to encrypt models. Only Gocryptfs v2.4.0 with its default encryption parameters is currently supported. Choose one of the following installation methods:

Method 1: (Recommended) Install from a YUM repository

If you use the Alinux 3 or AnolisOS 23 operating system, you can install Gocryptfs from a YUM repository.

Alinux 3

sudo yum install gocryptfs -y

AnolisOS 23

sudo yum install anolis-epao-release -y
sudo yum install gocryptfs -y

Method 2: Directly download the precompiled binary file

# Download the precompiled Gocryptfs package.
wget https://github.jobcher.com/gh/https://github.com/rfjakob/gocryptfs/releases/download/v2.4.0/gocryptfs_v2.4.0_linux-static_amd64.tar.gz

# Decompress and install the package.
tar xf gocryptfs_v2.4.0_linux-static_amd64.tar.gz
sudo install -m 0755 ./gocryptfs /usr/local/bin

Create a Gocryptfs key file to use as the model encryption key. You must upload this key to the Trustee remote attestation service for management in a subsequent step.
In this topic, 0Bn4Q1wwY9fN3P is used as the key to encrypt the model. The key content is stored in the cachefs-password file. You can also customize the key. In practice, we recommend that you use a randomly generated strong key.
```
cat << EOF > ~/cachefs-password
0Bn4Q1wwY9fN3P
EOF
```
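
To follow the recommendation above and use a random key instead of the sample value, something like the following Python sketch could generate one. It is not part of the official procedure: the function names are illustrative, and it assumes that a single-line, URL-safe passphrase is acceptable to Gocryptfs and to the Cachefs component.

```python
import os
import secrets


def generate_passphrase(num_bytes: int = 32) -> str:
    """Return a random URL-safe passphrase built from num_bytes of entropy."""
    return secrets.token_urlsafe(num_bytes)


def write_key_file(path: str, passphrase: str) -> None:
    """Write the passphrase to path with owner-only permissions."""
    # O_EXCL makes the call fail instead of overwriting an existing key file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as key_file:
        # The trailing newline matches the heredoc form used in this topic.
        key_file.write(passphrase + "\n")
```

For example, `write_key_file(os.path.expanduser("~/cachefs-password"), generate_passphrase())` creates the key file used in the following steps. The same string, without the trailing newline, is what you later import into Trustee.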

Use the created key to encrypt the model.

Configure the path of the plaintext model.
Note
Specify the path where the plaintext model you just downloaded is located. If you have other models, replace the path with the actual path of your target model.
```
PLAINTEXT_MODEL_PATH=~/.cache/modelscope/hub/models/Qwen/Qwen2.5-3B-Instruct/
```

Use Gocryptfs to encrypt the model directory tree.

Files copied into the mounted plain directory are written as ciphertext to the ~/mount/cipher directory.

mkdir -p ~/mount
cd ~/mount
mkdir -p cipher plain

# Install Gocryptfs runtime dependencies.
sudo yum install -y fuse

# Initialize Gocryptfs.
cat ~/cachefs-password | gocryptfs -init cipher

# Mount to plain.
cat ~/cachefs-password | gocryptfs cipher plain

# Copy the AI model to ~/mount/plain.
cp -r ${PLAINTEXT_MODEL_PATH}/. ~/mount/plain
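
Before uploading, it can be worth checking that the cipher directory looks as expected. The sketch below is a hypothetical helper (`check_cipher_dir` is not part of Gocryptfs or ACK-CAI). It relies on two properties of a Gocryptfs directory created with default parameters: `gocryptfs -init` writes a `gocryptfs.conf` file into the cipher directory, and file names are stored encrypted.

```python
from pathlib import Path


def check_cipher_dir(cipher_dir: str, plaintext_names: list[str]) -> list[str]:
    """Return a list of problems found in a Gocryptfs cipher directory."""
    problems = []
    root = Path(cipher_dir)
    # gocryptfs -init writes its configuration file into the cipher directory.
    if not (root / "gocryptfs.conf").is_file():
        problems.append("gocryptfs.conf is missing; was 'gocryptfs -init' run?")
    # With default parameters file names are encrypted, so no plaintext
    # model file name should be visible anywhere in the cipher directory tree.
    stored = {path.name for path in root.rglob("*")}
    for name in plaintext_names:
        if name in stored:
            problems.append(f"plaintext file name found in cipher dir: {name}")
    return problems
```

For example, `check_cipher_dir(str(Path.home() / "mount" / "cipher"), ["config.json"])` should return an empty list, where `config.json` stands for any file name taken from your plaintext model directory.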

3. Upload the model

Prepare an OSS Bucket in the same region where you will deploy the heterogeneous instance, and upload the encrypted model to it. The heterogeneous instance later pulls the model from this Bucket for deployment.

For example, create a bucket and a directory named qwen-encrypted in it, such as oss://examplebucket/qwen-encrypted/. For more information, see Quick Start for the console. Because the model files are large, we recommend using ossbrowser to upload the encrypted model to this directory.
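
Whichever upload tool you use, the layout of the cipher directory, including gocryptfs.conf, should be kept intact below the OSS directory so that the ciphertext can be mounted later. As a rough sketch, the following hypothetical helper (`plan_upload` is an illustrative name, not an OSS API) lists which local file maps to which object key. It does not perform the upload itself.

```python
from pathlib import Path


def plan_upload(cipher_dir: str, prefix: str) -> list[tuple[str, str]]:
    """Map every file under cipher_dir to an object key below prefix."""
    root = Path(cipher_dir)
    clean_prefix = prefix.strip("/")
    plan = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Keep the relative layout so the directory can be mounted as-is.
            relative = path.relative_to(root).as_posix()
            key = f"{clean_prefix}/{relative}" if clean_prefix else relative
            plan.append((str(path), key))
    return plan
```

With the example directory from this topic, `plan_upload(str(Path.home() / "mount" / "cipher"), "qwen-encrypted/")` yields pairs such as the local `gocryptfs.conf` path and the key `qwen-encrypted/gocryptfs.conf`.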

Step 2: Deploy the Trustee remote attestation service

Following the zero trust principle, any confidential computing environment must pass verification before gaining permissions to access sensitive data, such as model decryption keys. This step deploys a dedicated Trustee service to verify the runtime environment of models and inference services. This ensures that model decryption keys are injected only when the environment is confirmed trustworthy, and verifies the environment's trustworthiness when the client-side initiates an inference request.

Execution environment: A dedicated, standalone server deployed outside the ACK cluster, such as an ECS instance or an on-premises private server.

1. Select deployment solutions

Based on the principles of security isolation and trust independence, Trustee must be deployed on a standalone server outside the ACK heterogeneous confidential computing cluster. There are two recommended solutions, depending on different trust level requirements:

Trust level: The more control a cloud service provider has over the software and hardware of the Trustee deployment environment, the lower the trust level. This is because the Trustee service acts as the root of trust for remote attestation and for distributing confidential computing and trusted computing resources in the cloud. Under a strict trust model, the Trustee owner must have full control over all software and hardware in the deployment environment, so that Trustee runs in a customer-controlled trusted environment.

ECS instances

Create an additional ECS instance within the same VPC as the ACK cluster to specifically run the Trustee service. This allows for efficient and secure communication over the Alibaba Cloud private network, and ensures complete logical and physical isolation between the Trustee service and the confidential computing environment.

On-premises private servers

For scenarios with extremely high security requirements, deploy Trustee in your own data center or on-premises server. Connect it to the cloud VPC network via a leased line or VPN. This ensures that you have full control over the software and hardware environment of the root of trust, unaffected by cloud vendors.

Before use, ensure the server has public network access and that port 8081 is open.

2. Deploy the Trustee service

Trustee is packaged in RPM format and included in the official YUM repositories of Alibaba Cloud Linux 3.x and Anolis (8.x and later). Install it using a system package management tool. After installation, systemd automatically manages and starts the service.

On the prepared server, run the following command to install Trustee from the YUM repository. The service starts automatically after installation:
```
sudo yum install -y trustee-1.5.2
```
Trustee listens on port 8081 by default. You can reach it over the network at a URL made up of the server IP address and the service port, for example, http://<trustee-ip>:8081/api.

Here, <trustee-ip> is the IP address of the server where Trustee is deployed.

If you use Trustee in a production environment, we recommend configuring HTTPS access for Trustee to enhance security.

Run the following command to check the health status of the service components. The command pipes the response to jq for formatting. If jq is not installed, run sudo yum install -y jq first.

# Replace <trustee-ip> with the Trustee server IP
curl http://<trustee-ip>:8081/api/services-health | jq

In the expected output, a status of ok for every component indicates that the service is running normally.

{
  "gateway": {
    "status": "ok",
    "timestamp": "2025-08-26T13:46:13+08:00"
  },
  "kbs": {
    "status": "ok",
    "timestamp": "2025-08-26T13:46:13+08:00"
  },
  "as": {
    "status": "ok",
    "timestamp": "2025-08-26T13:46:13+08:00"
  },
  "rvps": {
    "status": "ok",
    "timestamp": "2025-08-26T13:46:13+08:00"
  }
}
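
If you want to automate this check, a small script can flag any component that does not report ok. The following is a minimal sketch that assumes the response has the shape shown above. The function names are illustrative and not part of Trustee.

```python
import json
import urllib.request


def unhealthy_services(health: dict) -> list[str]:
    """Return the names of components whose status is not "ok"."""
    return sorted(
        name for name, info in health.items() if info.get("status") != "ok"
    )


def fetch_health(trustee_ip: str, timeout: float = 5.0) -> dict:
    """Fetch the services-health document from a Trustee server."""
    url = f"http://{trustee_ip}:8081/api/services-health"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling `unhealthy_services(fetch_health("<trustee-ip>"))` with your Trustee server IP returns an empty list when every component is healthy.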

Common Trustee service management commands

The Trustee service is managed by systemd. Use the systemctl command for lifecycle management. Common operations include the following:

Start service: systemctl start trustee
Stop service: systemctl stop trustee
Restart service: systemctl restart trustee
Check status: systemctl status trustee

3. Import model decryption keys to the Trustee instance

After the Trustee service is deployed, you must provide it with the model decryption key. This key is the basis for subsequent remote attestation and secure key distribution to inference services.

Trustee manages keys by mapping local file paths to resource IDs. The following operations create and import a model decryption key into the default key storage directory.

Execute the following commands to create a key directory (create a subdirectory named aliyun in the local directory /opt/trustee/kbs/repository/default/) and write the key content:

Replace <model decryption key> with the actual key string. This example uses 0Bn4Q1wwY9fN3P.
```
sudo mkdir -p /opt/trustee/kbs/repository/default/aliyun/
sudo sh -c 'echo -n "<model decryption key>" > /opt/trustee/kbs/repository/default/aliyun/model-decryption-key'
```
Verify the key ID.

After you complete the preceding operations, the key stored at the file path .../aliyun/model-decryption-key has the following key ID in the Trustee system: kbs:///default/aliyun/model-decryption-key.
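
The path-to-ID mapping can be illustrated with a short Python sketch. It assumes that the pattern shown in this topic generalizes, that is, a file at `<repository>/<group>/<key>` below /opt/trustee/kbs/repository corresponds to `kbs:///<repository>/<group>/<key>`. The helper name `key_id_for` is hypothetical.

```python
from pathlib import PurePosixPath

# Key storage root used in this topic.
KBS_REPOSITORY_ROOT = PurePosixPath("/opt/trustee/kbs/repository")


def key_id_for(path: str) -> str:
    """Derive the kbs:/// key ID for a key file under the repository root."""
    # relative_to raises ValueError if path is outside the repository root.
    parts = PurePosixPath(path).relative_to(KBS_REPOSITORY_ROOT).parts
    if len(parts) != 3:
        raise ValueError("expected <repository>/<group>/<key> below the root")
    repository, group, key = parts
    return f"kbs:///{repository}/{group}/{key}"
```

For the key created above, `key_id_for("/opt/trustee/kbs/repository/default/aliyun/model-decryption-key")` returns `kbs:///default/aliyun/model-decryption-key`.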

Step 3: Configure the ACK confidential computing cluster

This section builds an underlying infrastructure with hardware-level security isolation capabilities for running confidential computing tasks. It involves creating an ACK cluster and adding ecs.gn8v-tee instances with Intel TDX and NVIDIA TEE capabilities as worker nodes.

Execution environment: ECS, ACK console (for creating clusters, node pools, and ECS instances), and the Shell environment of the created ecs.gn8v-tee instance (for installing drivers).

1. Create an ACK managed cluster Pro edition in the China (Beijing) region. For more information, see Create an ACK managed cluster.
2. Create a node pool for the cluster to manage confidential computing instances. For more information, see Create and manage node pools.
- vSwitch: Select the vSwitch in China (Beijing) Zone L.
- Scaling Mode: Keep the default configuration. Do not enable auto scaling.
- Instance type: ecs.gn8v-tee.4xlarge or larger.
- Operating System: Alibaba Cloud Linux 3.2104 LTS 64-bit.
- System Disk: 100 GiB or more.
- Expected Number of Nodes: The initial number of nodes in the node pool. Keep the default value, 0.
- Node Labels: Add a label (Key: ack.aliyun.com/nvidia-driver-version, Value: 550.144.03) to specify the NVIDIA driver version.
3. Create EGS confidential computing instances as cluster nodes. For more information, see Custom purchase instances.
- Region: China (Beijing).
- Network and zone: The same VPC as the cluster, Zone L.
- Instance type: ecs.gn8v-tee.4xlarge or larger.
  
  gn8v-tee instance types have CPU and GPU confidential computing features enabled by default. You do not need to additionally select confidential virtual machines.
- Image: Alibaba Cloud Linux 3.2104 LTS 64-bit.
4. Log on to the created EGS instance and install the NVIDIA drivers and CUDA toolkit. For more information, see Step 1: Install NVIDIA drivers and CUDA toolkit.
5. Add the EGS instance to the previously created node pool. Choose the manual method. For more information, see Add existing nodes.

Step 4: Deploy ACK-CAI components

The ACK-CAI component enables non-intrusive confidential computing capabilities for applications in the cluster. It includes a Webhook controller that automatically injects Sidecar containers into Pods based on their Annotations. These Sidecar containers provide remote attestation, model decryption, and secure communication.

Execution environment: ACK console.

1. Log on to the Container Service Management Console. In the navigation pane on the left, click Clusters.
2. On the Clusters page, click the name of your cluster. In the navigation pane on the left, click Applications > Helm.
3. Click Create and follow the on-screen prompts to install the latest version of ACK-CAI.

After the installation is complete, view the deployment status in the Helm Chart list.

Step 5: Deploy the vLLM model inference service

After the basic environment and security components are ready, this section uses Helm to deploy the vLLM service and adds specific Annotations to declare that the application requires security enhancement by ACK-CAI.

Execution environment: A machine with kubectl and Helm configured and access to the cluster. You can also run the commands directly in Workbench or CloudShell.

Create an empty Helm Chart directory.

mkdir -p ack-cai-vllm-demo
cd ack-cai-vllm-demo

Initialize a Helm Chart for deploying the vLLM service.

This Helm Chart uses node affinity to schedule the vLLM inference service only onto confidential computing GPU nodes, and uses the OSS CSI plugin to mount the encrypted model stored in OSS.

Helm Chart initialization script

# Create template file
mkdir -p ./templates
cat <<EOF >templates/vllm.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-oss
  namespace: {{ .Release.Namespace }}
  labels:
    alicloud-pvname: pv-oss
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: ossplugin.csi.alibabacloud.com
    volumeHandle: pv-oss
    volumeAttributes:
      bucket: {{ .Values.oss.bucket }}
      path: {{ .Values.oss.path }}
      url: {{ .Values.oss.url }}
      otherOpts: "-o umask=022 -o max_stat_cache_size=0 -o allow_other"
    nodePublishSecretRef:
      name: oss-secret
      namespace: {{ .Release.Namespace }}

---

apiVersion: v1
kind: Secret
metadata:
  name: oss-secret
  namespace: {{ .Release.Namespace }}
stringData:
  akId: {{ .Values.oss.akId }}
  akSecret: {{ .Values.oss.akSecret }}

---

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-oss
  namespace: {{ .Release.Namespace }}
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 5Gi
  selector:
    matchLabels:
      alicloud-pvname: pv-oss

---

apiVersion: v1
kind: Service
metadata:
  name: cai-vllm-svc
  namespace: {{ .Release.Namespace }}
  {{- if .Values.loadbalancer}}
  {{- if .Values.loadbalancer.aclId }}
  annotations:
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-acl-status: "on"
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-acl-id: {{ .Values.loadbalancer.aclId }}
    service.beta.kubernetes.io/alibaba-cloud-loadbalancer-acl-type: "white"
  {{- end }}
  {{- end }}
  labels:
    app: cai-vllm
spec:
  ports:
  - port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: cai-vllm
  type: LoadBalancer

---

apiVersion: v1
kind: Pod
metadata:
  name: cai-vllm
  namespace: {{ .Release.Namespace }}
  labels:
    app: cai-vllm
    trustiflux.alibaba.com/confidential-computing-mode: "ACK-CAI"
  annotations:
    trustiflux.alibaba.com/ack-cai-options: |
{{ .Values.caiOptions | indent 6 }}
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node.kubernetes.io/instance-type
            operator: In
            values:
              - ecs.gn8v-tee.4xlarge
              - ecs.gn8v-tee.6xlarge
              - ecs.gn8v-tee-8x.16xlarge
              - ecs.gn8v-tee-8x.48xlarge
  containers:
    - name: inference-service
      image: egslingjun-registry.cn-wulanchabu.cr.aliyuncs.com/egslingjun/llm-inference:vllm0.5.4-deepgpu-llm24.7-pytorch2.4.0-cuda12.4-ubuntu22.04
      command:
        - bash
      args: ["-c", "vllm serve /tmp/model --port 8080 --host 0.0.0.0 --served-model-name qwen2.5-3b-instruct --device cuda --dtype auto"]
      ports:
        - containerPort: 8080
      resources:
        limits:
          nvidia.com/gpu: 1  # Request 1 GPU card for this container
      volumeMounts:
        - name: pvc-oss
          mountPath: "/tmp/model"

  volumes:
    - name: pvc-oss
      persistentVolumeClaim:
        claimName: pvc-oss

EOF

# Create Helm Chart description file
cat <<EOF > ./Chart.yaml
apiVersion: v2
name: vllm
description: A test based on vllm for ack-cai
type: application
version: 0.1.0
appVersion: "0.1.0"
EOF

# Create empty Helm Chart variables file values.yaml
touch values.yaml

Edit the values.yaml file and fill in the environment context.

Replace <trustee-ip> with the Trustee address, and replace the OSS parameters with your actual values.

caiOptions: |
  {
      "cipher-text-volume": "pvc-oss",
      "model-decryption-key-id" : "kbs:///default/aliyun/model-decryption-key",
      "trustee-address": "http://<trustee-ip>:8081/api"
  }
oss:
  bucket: "conf-ai"                          # Replace with the OSS Bucket name where encrypted models are stored
  path: "/qwen2.5-3b-gocryptfs/"             # Replace with the path of the encrypted model file within the OSS Bucket
  url: "https://oss-cn-beijing-internal.aliyuncs.com"   # Replace with the OSS Endpoint
  akId: "xxxxx"                              # Replace with the Alibaba Cloud AK ID
  akSecret: "xxxxx"                          # Replace with the Alibaba Cloud AK Secret
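
Because caiOptions is passed as a JSON string inside values.yaml, a typo is easy to miss until the Pod fails to start. A quick pre-flight check along the following lines may help. This is only a sketch: `check_cai_options` is a hypothetical helper, and the rules it applies are limited to the three required keys and the key ID format described in this topic.

```python
import json

REQUIRED_KEYS = ("cipher-text-volume", "model-decryption-key-id", "trustee-address")


def check_cai_options(raw: str) -> list[str]:
    """Return a list of problems found in a caiOptions JSON string."""
    try:
        options = json.loads(raw)
    except json.JSONDecodeError as error:
        return [f"not valid JSON: {error.msg}"]
    if not isinstance(options, dict):
        return ["caiOptions must be a JSON object"]
    problems = [
        f"missing required key: {key}"
        for key in REQUIRED_KEYS
        if not options.get(key)
    ]
    key_id = str(options.get("model-decryption-key-id", ""))
    if key_id and not key_id.startswith("kbs:///"):
        problems.append("model-decryption-key-id should start with kbs:///")
    # Catch the placeholder from this topic being left in place.
    if "<trustee-ip>" in str(options.get("trustee-address", "")):
        problems.append("trustee-address still contains the <trustee-ip> placeholder")
    return problems
```

An empty list means that none of these basic problems were found. It does not prove that the Trustee address is reachable or that the key exists.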

Deploy the vLLM service.
```
helm install vllm . -n default
```

Check if the CAI component's Sidecar containers are successfully injected into the Pod.

kubectl get pod cai-vllm -n default -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{range .state.running}Running{end}{.state.*.reason}{"\n"}{end}{range .status.containerStatuses[*]}{.name}{"\t"}{range .state.running}Running{end}{.state.*.reason}{"\n"}{end}'

The expected output lists the following five containers, which indicates that the injection succeeded. Wait until every container changes from PodInitializing to Running, which indicates that the service has started.

cai-sidecar-attestation-agent   Running
cai-sidecar-confidential-data-hub       Running
cai-sidecar-tng Running
cai-sidecar-cachefs     Running
inference-service       Running
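
When scripting the deployment, the output of the preceding kubectl command can be checked programmatically. The sketch below assumes the two-column name and state format shown above and the five container names listed in this topic. `not_running` is an illustrative helper, not part of ACK-CAI.

```python
EXPECTED_CONTAINERS = {
    "cai-sidecar-attestation-agent",
    "cai-sidecar-confidential-data-hub",
    "cai-sidecar-tng",
    "cai-sidecar-cachefs",
    "inference-service",
}


def not_running(status_output: str) -> dict[str, str]:
    """Map each expected container that is not Running to its reported state."""
    states = {}
    for line in status_output.splitlines():
        fields = line.split()
        if fields:
            states[fields[0]] = fields[1] if len(fields) > 1 else "Unknown"
    # Containers absent from the output are reported as "Missing".
    return {
        name: states.get(name, "Missing")
        for name in sorted(EXPECTED_CONTAINERS)
        if states.get(name) != "Running"
    }
```

An empty dictionary means that all five containers are injected and running. Any other result shows which container to inspect.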

Get and record the access address of the vLLM service.

kubectl get service cai-vllm-svc -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].port}{"\n"}'

The expected output is a URL in the form http://<vllm-ip>:<port>, similar to the following:

http://182.XX.XX.225:8080

Step 6: Securely access the inference service

In addition to server-side security protection, establish an end-to-end encrypted communication link from client to server. This ensures the security of inference data during transmission. This section starts a TNG security gateway on the client, creating a local proxy that automatically encrypts all requests sent to the vLLM service and decrypts received responses.

Execution environment: Client environment, which is any machine that needs to invoke the vLLM inference service.

Start the TNG gateway on the client to establish a secure communication channel.

The TNG gateway creates a local proxy on the client to encrypt requests sent to the server.

Replace <trustee-ip> with the Trustee address.

docker run -d \
    --network=host \
    confidential-ai-registry.cn-shanghai.cr.aliyuncs.com/product/tng:2.2.4 \
    tng launch --config-content '
      {
        "add_ingress": [
          {
            "http_proxy": {
              "proxy_listen": {
                "host": "0.0.0.0",
                "port": 41000
              }
            },
            "encap_in_http": {},
            "verify": {
              "as_addr": "http://<trustee-ip>:8081/api/attestation-service/",
              "policy_ids": [
                "default"
              ]
            }
          }
        ]
      }
'

Access the vLLM service through the TNG proxy.

Replace <vllm-ip>:<port> with the previously obtained access address of the vLLM service.

# Set the http_proxy environment variable
export http_proxy=http://127.0.0.1:41000

# Send a curl request
curl http://<vllm-ip>:<port>/v1/completions \
  -H "Content-type: application/json" \
  -d '{
    "model": "qwen2.5-3b-instruct",
    "prompt": "San Francisco is a",
    "max_tokens": 7,
    "temperature": 0
    }'
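
The same request can also be sent from application code. The following sketch uses only the Python standard library and routes the call through the local TNG proxy explicitly instead of relying on the http_proxy environment variable. The function names are illustrative, and the proxy address assumes the TNG client configuration shown above.

```python
import json
import urllib.request

# Local TNG proxy started in the previous step (port 41000).
TNG_PROXY = "http://127.0.0.1:41000"


def build_completion_request(
    base_url: str, prompt: str, max_tokens: int = 7
) -> urllib.request.Request:
    """Build a POST request for the /v1/completions endpoint."""
    body = {
        "model": "qwen2.5-3b-instruct",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_via_tng(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request through the TNG proxy and decode the JSON response."""
    proxy = urllib.request.ProxyHandler({"http": TNG_PROXY})
    opener = urllib.request.build_opener(proxy)
    with opener.open(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `send_via_tng(build_completion_request("http://<vllm-ip>:<port>", "San Francisco is a"))` mirrors the curl command above once you substitute your service address.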

Reference information

`caiOptions` configuration description

caiOptions accepts a JSON-formatted configuration object. The ACK-CAI Admission Webhook parses these parameters, then dynamically injects and configures the necessary security components (such as AA and CDH) in the Pod. This provides capabilities such as transparent encryption and decryption, remote attestation, and trusted networking.

The following is a complete caiOptions configuration example.

{
  "cipher-text-volume": "pvc-oss",
  "model-decryption-key-id": "kbs:///default/aliyun/model-decryption-key",
  "trustee-address": "http://<trustee-ip>:8081/api",
  "aa-version": "1.3.1",
  "cdh-version": "1.3.1",
  "tng-version": "2.2.4",
  "cachefs-version": "1.0.7-2.6.1",
  "tdx-ra-enable": true,
  "gpu-ra-enable": true,
  "tng-http-secure-ports": [
    {
      "port": 8080
    }
  ]
}

Configuration item details:

Configuration item	Required or optional	Description
`cipher-text-volume`	Required	The PVC name that stores encrypted model data. ACK-CAI automatically decrypts the data mounted by this PVC in a trusted environment.
`model-decryption-key-id`	Required	The KBS URI of the model decryption key. The format is `kbs:///<repository>/<group>/<key>`.
`trustee-address`	Required	The address of the Trustee service, used for remote attestation and key retrieval.
`aa-version`	Optional	The version of the Attestation Agent (AA) component.
`cdh-version`	Optional	The version of the Confidential Data Hub (CDH) component.
`tng-version`	Optional	The version of the Trusted Network Gateway (TNG) component.
`cachefs-version`	Optional	The version of the Cachefs component.
`tdx-ra-enable`	Optional	Whether to enable remote attestation support for CPU (TDX confidential instances). The default is `true`.
`gpu-ra-enable`	Optional	Whether to enable remote attestation support for GPU. The default is `true`.
`tng-http-secure-ports`	Optional	Configures TNG to apply TLS encryption to traffic on specific HTTP ports. The value is an array of objects, each of which is one port encryption rule, for example: `"tng-http-secure-ports": [ { "port": 8080, "allow-insecure-request-regexes": [ "/api/builtin/.*" ] } ]`. `port`: the HTTP service port that TNG protects with TLS encryption. `allow-insecure-request-regexes`: an array of regular expressions. If the path of an HTTP request sent to the specified port matches any regular expression in this array, TNG does not encrypt the request.
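
To make the effect of `tng-http-secure-ports` rules concrete, the following sketch models the decision described in the table. It is an illustration only, not TNG's implementation: `tng_encrypts` is a hypothetical function, and it assumes that a pattern must match the entire request path and that ports without a rule are not protected.

```python
import re


def tng_encrypts(port: int, path: str, secure_ports: list[dict]) -> bool:
    """Return True if a request to port and path would be encrypted."""
    for rule in secure_ports:
        if rule.get("port") != port:
            continue
        patterns = rule.get("allow-insecure-request-regexes", [])
        # Assumption: whole-path matching; the real matching mode may differ.
        return not any(re.fullmatch(pattern, path) for pattern in patterns)
    # Assumption: a port with no rule is outside TNG protection.
    return False
```

With the example rule from the table, a request to `/v1/completions` on port 8080 is encrypted, while a request to a path below `/api/builtin/` is passed through unencrypted.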

Encrypted model example files

The following provides encrypted models available for testing and their related configuration information. These models are stored in public-read OSS Buckets and are encrypted using the specified method.

Model name	Encryption method	Encryption password	Public-read OSS region	OSS endpoint	Model storage location
Qwen3-32B	Gocryptfs	0Bn4Q1wwY9fN3P	cn-beijing	oss-cn-beijing-internal.aliyuncs.com	conf-ai:/qwen3-32b-gocryptfs/
Qwen3-32B	Sam	0Bn4Q1wwY9fN3P	cn-beijing	oss-cn-beijing-internal.aliyuncs.com	conf-ai:/qwen3-32b-sam/
Qwen2.5-3B-Instruct	Gocryptfs	0Bn4Q1wwY9fN3P	cn-beijing	oss-cn-beijing-internal.aliyuncs.com	conf-ai:/qwen2.5-3b-gocryptfs/
Qwen2.5-3B-Instruct	Sam	0Bn4Q1wwY9fN3P	cn-beijing	oss-cn-beijing-internal.aliyuncs.com	conf-ai:/qwen2.5-3b-sam/