Large Language Model (LLM) inference involves sensitive data and core model assets, which are at risk of exposure when running in untrusted environments. Container Service for Kubernetes (ACK) Confidential AI (ACK-CAI) provides end-to-end security for model inference by integrating hardware-based confidential computing technologies, such as Intel Trust Domain Extensions (TDX) and GPU Trusted Execution Environments (TEEs).
ACK-CAI lets you deploy vLLM inference services in an ACK heterogeneous confidential computing cluster. This provides secure isolation and encrypted protection for your models and data. Key advantages:
Hardware-level security isolation: Builds a hardware-based TEE using Intel® TDX and NVIDIA GPU TEE, ensuring the confidentiality and integrity of models and data during computation.
Trusted key distribution: Uses a remote attestation mechanism to strictly verify the integrity of the execution environment. Only after successful verification does a separate trustee service release the model decryption key to the trusted environment.
End-to-end data encryption: Establishes an encrypted channel from the client to the server through a Trusted Network Gateway (TNG), protecting inference requests and responses during transmission.
Non-intrusive for applications: Automatically injects security components into pods using a Kubernetes webhook. You can enable confidential computing capabilities for your application with a simple annotation, requiring no changes to your business code or container images.
How it works
ACK-CAI enables transparent confidential computing capabilities by dynamically injecting a set of sidecar containers called Trustiflux into your application pods. The core security mechanism is based on remote attestation, which ensures that models and data are only accessed within a verified, trusted environment.
Process and environment guide
Deploying and accessing a secure vLLM inference service involves the following stages:
| Step | Purpose | Environment |
| --- | --- | --- |
| Step 1: Prepare the encrypted model | Encrypt the inference model and upload it to Object Storage Service (OSS) to ensure its confidentiality at rest. | A separate server for data preparation. |
| Step 2: Deploy the trustee remote attestation service | Deploy a standalone trustee service to act as the root of trust for verifying the environment and distributing keys. | A separate trustee server. |
| Step 3: Configure the ACK confidential computing cluster | Create and configure Kubernetes nodes to run confidential computing tasks. | ECS and ACK consoles. |
| Step 4: Deploy the ACK-CAI component | Install the CAI components in the cluster to enable the dynamic injection of security capabilities. | ACK console. |
| Step 5: Deploy the vLLM inference service | Deploy the vLLM service to the cluster using Helm and enable confidential computing protection with an annotation. | A machine with kubectl and Helm configured and connected to the API server. |
| Step 6: Securely access the inference service | Start the client-side security proxy to access the deployed model service over an encrypted channel. | Client environment. |
Step 1: Prepare the encrypted model
This step covers how to encrypt your model data and upload it to OSS in preparation for secure distribution.
Execution environment: To ensure security, perform these steps on a temporary, isolated ECS instance. For optimal performance, this instance should be in the same region as your OSS bucket to leverage high-speed internal network uploads.
The model files are large, and this process can be time-consuming. To quickly test the solution, you can skip this step. Use the sample encrypted model file and proceed to Step 2: Deploy the trustee remote attestation service.
1. Download a model
Before deploying a model, you must first encrypt it and then upload it to cloud storage. The model decryption key is hosted by the trustee's Key Broker Service (KBS) and is released only after remote attestation succeeds. Perform model encryption operations in a local or trusted environment. This solution uses Qwen2.5-3B-Instruct as an example.
If you already have a model, you do not need to download one. Skip to 2. Encrypt the model.
Run the following commands in the terminal to download Qwen2.5-3B-Instruct with the ModelScope tool (requires Python 3.9 or later):

```
pip3 install modelscope importlib-metadata
modelscope download --model Qwen/Qwen2.5-3B-Instruct
```

The command downloads the model to `~/.cache/modelscope/hub/models/Qwen/Qwen2.5-3B-Instruct/`.
2. Encrypt the model
This solution encrypts the model with gocryptfs, an open-source tool based on the AES-256-GCM standard.
Install the gocryptfs tool. Currently, only gocryptfs v2.4.0 with the default parameters is supported. Choose one of the following installation methods:

Method 1: (Recommended) Install from the yum repository

If you use the Alinux3 or AnolisOS 23 operating system, you can install gocryptfs from the yum repository.

Alinux 3

```
sudo yum install gocryptfs -y
```

AnolisOS 23

```
sudo yum install anolis-epao-release -y
sudo yum install gocryptfs -y
```

Method 2: Download the precompiled binary

```
# Download the precompiled gocryptfs package
wget https://github.jobcher.com/gh/https://github.com/rfjakob/gocryptfs/releases/download/v2.4.0/gocryptfs_v2.4.0_linux-static_amd64.tar.gz
# Extract and install
tar xf gocryptfs_v2.4.0_linux-static_amd64.tar.gz
sudo install -m 0755 ./gocryptfs /usr/local/bin
```

Create a gocryptfs key file to serve as the model encryption key. In a later step, you will upload this key to the trustee.
In this solution, `alibaba@1688` is the encryption key, stored in the `cachefs-password` file. You can customize the key; in practice, it is best to use a randomly generated strong key instead.

```
cat << EOF > ~/cachefs-password
alibaba@1688
EOF
```
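If you prefer a randomly generated key, a minimal sketch (assuming `openssl` is available on the instance) is:

```
# Generate a random 32-byte, Base64-encoded key and store it as the gocryptfs password
openssl rand -base64 32 > ~/cachefs-password
# Restrict access to the key file
chmod 600 ~/cachefs-password
```

If you generate a key this way, remember to import the same value into the trustee in Step 2.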
Use the key to encrypt the model.

Configure the path of the plaintext model.

Note: Set this to the path of the model you just downloaded, or replace it with the path of your own model.

```
PLAINTEXT_MODEL_PATH=~/.cache/modelscope/hub/models/Qwen/Qwen2.5-3B-Instruct/
```

Use gocryptfs to encrypt the model directory tree. After encryption, the model is stored in encrypted form in the `./cipher` directory.

```
mkdir -p ~/mount
cd ~/mount
mkdir -p cipher plain
# Install the gocryptfs runtime dependency
sudo yum install -y fuse
# Initialize gocryptfs
cat ~/cachefs-password | gocryptfs -init cipher
# Mount the encrypted directory at plain
cat ~/cachefs-password | gocryptfs cipher plain
# Copy the AI model to ~/mount/plain
cp -r ${PLAINTEXT_MODEL_PATH} ~/mount/plain
```
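To sanity-check the result, you can unmount the plaintext view and confirm that only ciphertext remains on disk (gocryptfs encrypts both file names and contents):

```
# Unmount the plaintext view; only encrypted data remains in ./cipher
fusermount -u ~/mount/plain
# Expect gocryptfs.conf, gocryptfs.diriv, and encrypted file names
ls ~/mount/cipher
```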
3. Upload the model
Prepare an OSS bucket in the same region as the heterogeneous (GPU-accelerated) instance that you will deploy, and upload the encrypted model to it. This allows the instance to pull the model data in subsequent steps.
Refer to the Get started by using the OSS console guide to create a bucket and a directory named qwen-encrypted (for example, oss://examplebucket/qwen-encrypted/). Because the model files are large, we recommend using ossbrowser to upload the encrypted model to this directory.
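If you prefer the command line to ossbrowser, an upload with ossutil might look like the following sketch. It assumes ossutil is installed and configured with your credentials; the bucket name is the example placeholder from above:

```
# Recursively upload the encrypted directory tree to the target OSS prefix
ossutil cp -r ~/mount/cipher/ oss://examplebucket/qwen-encrypted/
```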
Step 2: Deploy the trustee remote attestation service
Following Zero Trust principles, any confidential computing environment must be verified before it can access sensitive data, such as the model decryption keys.
The standalone trustee service, which you will deploy in this step, acts as the central authority for this verification process. It is responsible for:
Verifying the execution environment of the model and inference service.
Ensuring that model decryption keys are released only to a verified, trusted environment.
Enabling clients to confirm the trustworthiness of the service when initiating an inference request.
Execution environment: A dedicated, separate server deployed outside the ACK cluster, such as an ECS instance or an on-premises server.
1. Choose a deployment solution
Based on your required trust level, choose between the following solutions:
ECS instance
Deploying the trustee service on a separate ECS instance within the same VPC provides both logical isolation and high-speed, secure internal network communication with your ACK cluster.
On-premises server
For maximum security, deploy the trustee service in your data center and connect it to your virtual private cloud (VPC) via a leased line or VPN. This ensures that you have full control over the hardware and software environment of the root of trust, independent of the cloud provider.
Before you start, make sure the server has Internet access enabled, and that port 8081 is open.
2. Deploy the trustee service
Install the trustee RPM package from the official YUM repository (available on Alibaba Cloud Linux 3.x and Anolis 8.x or later).

```
yum install trustee-1.5.2
```

The service starts automatically and listens on port 8081. You can access it over the network at `http://<trustee-ip>:8081/api`. Replace `<trustee-ip>` with the IP address of the server where trustee is deployed.

For production environments, configure HTTPS access for trustee to enhance security.
Verify the health of the service.

Run `sudo yum install -y jq` to install the jq tool.

```
# Replace <trustee-ip> with the IP address of the trustee server
curl http://<trustee-ip>:8081/api/services-health | jq
```

A successful response shows the status of all components as `ok`:

```
{
  "gateway": {
    "status": "ok",
    "timestamp": "2025-08-26T13:46:13+08:00"
  },
  "kbs": {
    "status": "ok",
    "timestamp": "2025-08-26T13:46:13+08:00"
  },
  "as": {
    "status": "ok",
    "timestamp": "2025-08-26T13:46:13+08:00"
  },
  "rvps": {
    "status": "ok",
    "timestamp": "2025-08-26T13:46:13+08:00"
  }
}
```
3. Import the model decryption key into the trustee instance
Once the Trustee service is deployed, you must provide it with the model decryption key. This key is required for the remote attestation and secure key distribution process.
Trustee manages keys by mapping local file paths to resource IDs. The following steps create and import a model decryption key in the default storage folder.
Create a directory and file to store the decryption key on the trustee server. This creates a subdirectory named `aliyun` in the local folder `/opt/trustee/kbs/repository/default/`.

Replace `<model-decryption-key>` with the key you used in Step 1. In this example, the key is `alibaba@1688`.

```
sudo mkdir -p /opt/trustee/kbs/repository/default/aliyun/
sudo sh -c 'echo -n "<model-decryption-key>" > /opt/trustee/kbs/repository/default/aliyun/model-decryption-key'
```

Verify the key ID.

The key is now stored at the file path `/opt/trustee/kbs/repository/default/aliyun/model-decryption-key`, which corresponds to the key ID `kbs:///default/aliyun/model-decryption-key` in the trustee system.
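To double-check the import, you can read the file back on the trustee server. This only confirms local storage; retrieving the key through the KBS API additionally requires a successful remote attestation:

```
# Confirm the key file exists and holds the expected value
sudo ls -l /opt/trustee/kbs/repository/default/aliyun/
sudo cat /opt/trustee/kbs/repository/default/aliyun/model-decryption-key; echo
```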
Step 3: Configure the ACK confidential computing cluster
In this step, you will build the underlying infrastructure: an ACK cluster whose worker nodes are ecs.gn8v-tee instances, which provide both Intel TDX and NVIDIA GPU TEE capabilities.
Execution environment: The ECS and ACK consoles (for creating clusters, node pools, and ECS instances), and the shell environment of the created ecs.gn8v-tee instance (for installing drivers).

Create an ACK managed Pro cluster in the China (Beijing) region.
Create a node pool for the cluster to manage the confidential computing instances.

- vSwitch: Select a vSwitch in Zone L of the China (Beijing) region.
- Scaling Mode: Keep the default configuration. Do not enable auto scaling.
- Instance Type: `ecs.gn8v-tee.4xlarge` or a higher specification.
- Operating System: Alibaba Cloud Linux 3.2104 LTS 64-bit.
- System Disk: 100 GiB or larger.
- Expected Nodes: The initial number of nodes in the node pool. Keep the default value of 0.
- Node Labels: Add a label (Key: `ack.aliyun.com/nvidia-driver-version`, Value: `550.144.03`) to specify the NVIDIA driver version.
Create an Elastic GPU Service (EGS) confidential computing instance to serve as a cluster node. See Create an instance on the Custom Launch tab.

- Region: China (Beijing).
- Network and Zone: The VPC must be the same as the cluster's VPC. This example uses a vSwitch in Zone L.
- Instance Type: `ecs.gn8v-tee.4xlarge` or a higher specification. The `gn8v-tee` instance types have CPU and GPU confidential computing features enabled by default; you do not need to select Confidential VM.
- Image: Alibaba Cloud Linux 3.2104 LTS 64-bit.
Log on to the created EGS instance and install the NVIDIA driver and CUDA toolkit.
Add the EGS instance to the created node pool. Select Manual as the method for adding the instance. See Add existing ECS instances.
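After the instance joins the node pool, you can confirm from a machine with kubectl access that the node carries the driver label and advertises GPU capacity (the label value matches the node pool configuration above; replace `<node-name>` with your node's name):

```
# The node should appear with the expected driver label
kubectl get nodes -l ack.aliyun.com/nvidia-driver-version=550.144.03
# The node should advertise GPU resources
kubectl describe node <node-name> | grep -i "nvidia.com/gpu"
```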
Step 4: Deploy the ACK-CAI component
The ACK-CAI component includes a webhook controller that automatically injects the necessary sidecar containers into pods based on their annotations. These sidecars handle remote attestation, model decryption, and secure communication.
Execution environment: ACK console.
Log on to the ACK console. In the navigation pane on the left, click Clusters.
On the Clusters page, find the cluster you want and click its name. In the left-side navigation pane, choose Applications > Helm.
Click Deploy and install the latest version of ack-cai.
In the Parameters step, change the `tag` to `1.1.1` in the YAML template.

You can then view the deployment status in the Helm chart list.
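To confirm the installation from the command line (using a machine with kubectl and Helm configured, as in the next step), you can check the Helm release and look for the CAI controller pods. Namespace and pod names can vary by version, so the grep patterns below are assumptions:

```
# The ack-cai release should appear as deployed
helm list -A | grep ack-cai
# Look for the CAI webhook/controller pods across namespaces
kubectl get pods -A | grep -i cai
```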
Step 5: Deploy the vLLM inference service
Deploy the vLLM service using Helm, and add the caiOptions annotation to enable confidential computing protection.
Execution environment: A machine with kubectl and Helm configured and able to access the cluster. You can use Workbench or CloudShell.
Create a new folder for the Helm chart.
```
mkdir -p ack-cai-vllm-demo
cd ack-cai-vllm-demo
```

Initialize a Helm chart to deploy the vLLM service.
This Helm chart configures the vLLM inference service with node affinity to ensure it runs only on confidential computing GPU-accelerated nodes. It also uses a CSI plugin to mount an OSS bucket for model storage.
Edit the `values.yaml` file to provide your environment-specific information.

Replace `<trustee-ip>` with the trustee address, and replace the OSS configurations with your actual values.

```
caiOptions: |
  {
    "cipher-text-volume": "pvc-oss",
    "model-decryption-key-id": "kbs:///default/aliyun/model-decryption-key",
    "trustee-address": "http://<trustee-ip>:8081/api"
  }
oss:
  bucket: "conf-ai" # Replace with the name of the OSS bucket that stores the encrypted model.
  path: "/qwen2.5-3b-gocryptfs/" # Replace with the path to the encrypted model file in the OSS bucket.
  url: "https://oss-cn-beijing-internal.aliyuncs.com" # Replace with the OSS endpoint.
  akId: "xxxxx" # Replace with your Alibaba Cloud AccessKey ID.
  akSecret: "xxxxx" # Replace with your Alibaba Cloud AccessKey secret.
```
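Before installing, you can optionally render the chart locally to confirm that your values are picked up. This is a standard Helm check, not specific to CAI:

```
# Render the templates without installing; inspect the annotations and the OSS volume definitions
helm template vllm . | less
# Alternatively, perform a dry run against the cluster
helm install vllm . -n default --dry-run
```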
Deploy the vLLM service using Helm.

```
helm install vllm . -n default
```

Verify that the CAI component's sidecar containers have been successfully injected into the pod.

```
kubectl get pod cai-vllm -n default -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{range .state.running}Running{end}{.state.*.reason}{"\n"}{end}{range .status.containerStatuses[*]}{.name}{"\t"}{range .state.running}Running{end}{.state.*.reason}{"\n"}{end}'
```

The expected output lists the following five containers, which means the injection succeeded. Wait for all containers to change from `PodInitializing` to `Running`, which indicates that the service has started.

```
cai-sidecar-attestation-agent	Running
cai-sidecar-confidential-data-hub	Running
cai-sidecar-tng	Running
cai-sidecar-cachefs	Running
inference-service	Running
```
Get and record the vLLM service endpoint.

```
kubectl get service cai-vllm-svc -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].port}{"\n"}'
```

The expected output is a URL in the format `http://<vllm-ip>:<port>`:

```
http://182.XX.XX.225:8080
```
Step 6: Securely access the inference service
To ensure end-to-end security, you must use the TNG client gateway to proxy your requests. The proxy automatically encrypts all requests sent to the vLLM service and decrypts the responses.
Execution environment: The client machine from which you want to call the inference service.
Start the TNG gateway on the client to establish a secure communication channel.
The TNG gateway creates a local proxy on the client to encrypt requests sent to the server.
Replace `<trustee-ip>` with the trustee address.

```
docker run -d \
  --network=host \
  confidential-ai-registry.cn-shanghai.cr.aliyuncs.com/product/tng:2.2.4 \
  tng launch --config-content '
  {
    "add_ingress": [
      {
        "http_proxy": {
          "proxy_listen": {
            "host": "0.0.0.0",
            "port": 41000
          }
        },
        "encap_in_http": {},
        "verify": {
          "as_addr": "http://<trustee-ip>:8081/api/attestation-service/",
          "policy_ids": [
            "default"
          ]
        }
      }
    ]
  }'
```
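You can quickly confirm that the proxy came up and is listening on port 41000:

```
# The tng container should be running
docker ps --filter ancestor=confidential-ai-registry.cn-shanghai.cr.aliyuncs.com/product/tng:2.2.4
# The local proxy should be listening on port 41000
ss -ltn | grep 41000
```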
Access the vLLM service through the TNG proxy.

Replace `<vllm-ip>:<port>` with the endpoint of the vLLM service that you obtained earlier.

```
# Set the http_proxy environment variable
export http_proxy=http://127.0.0.1:41000
# Send a curl request
curl http://<vllm-ip>:<port>/v1/completions \
  -H "Content-type: application/json" \
  -d '{
    "model": "qwen2.5-3b-instruct",
    "prompt": "San Francisco is a",
    "max_tokens": 7,
    "temperature": 0
  }'
```
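As an optional negative test, you can unset the proxy variable and send the same request directly. Assuming TNG enforces its encrypted encapsulation on the service port, the plaintext request should fail or time out instead of returning a completion:

```
# Bypass the TNG proxy; the server side expects encapsulated, encrypted traffic
unset http_proxy
curl --max-time 10 http://<vllm-ip>:<port>/v1/completions \
  -H "Content-type: application/json" \
  -d '{"model": "qwen2.5-3b-instruct", "prompt": "test", "max_tokens": 1}'
```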
Reference
Configuring caiOptions
The caiOptions annotation accepts a configuration object in JSON format. The ACK CAI admission webhook parses these parameters and uses them to dynamically inject and configure the security components, such as the Attestation Agent (AA) and Confidential Data Hub (CDH), into the pod. This enables features such as transparent encryption and decryption, remote attestation, and trusted networking.
The following is a complete example of a caiOptions configuration.
```
{
  "cipher-text-volume": "pvc-oss",
  "model-decryption-key-id": "kbs:///default/aliyun/model-decryption-key",
  "trustee-address": "http://<trustee-ip>:8081/api",
  "aa-version": "1.3.1",
  "cdh-version": "1.3.1",
  "tng-version": "2.2.4",
  "cachefs-version": "1.0.7-2.6.1",
  "tdx-ra-enable": true,
  "gpu-ra-enable": true,
  "tng-http-secure-ports": [
    {
      "port": 8080
    }
  ]
}
```

The following table details the configurations:
| Parameter | Required | Description |
| --- | --- | --- |
| `cipher-text-volume` | Yes | The name of the persistent volume claim (PVC) that stores the encrypted model. ACK-CAI automatically decrypts the data mounted from this PVC in the trusted environment. |
| `model-decryption-key-id` | Yes | The Key Broker Service (KBS) URI of the model decryption key, in the format `kbs:///<repository>/<type>/<tag>` (for example, `kbs:///default/aliyun/model-decryption-key`). |
| `trustee-address` | Yes | The address of the trustee service, used for remote attestation and key retrieval. |
| `aa-version` | No | The version of the Attestation Agent (AA) component. |
| `cdh-version` | No | The version of the Confidential Data Hub (CDH) component. |
| `tng-version` | No | The version of the TNG component. |
| `cachefs-version` | No | The version of the Cachefs component. |
| `tdx-ra-enable` | No | Specifies whether to enable remote attestation support for the CPU (TDX confidential instance). |
| `gpu-ra-enable` | No | Specifies whether to enable remote attestation support for the GPU. |
| `tng-http-secure-ports` | No | Configures TNG to use TLS to encrypt traffic for specific HTTP ports. Accepts an array of objects, where each object represents a port encryption rule. |
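To see which caiOptions actually took effect on a running pod, inspect its annotations. The exact annotation key is set by the Helm chart and the webhook, so the live pod is the authoritative reference:

```
# Print all annotations on the injected pod and look for the CAI options JSON
kubectl get pod cai-vllm -n default -o jsonpath='{.metadata.annotations}'
```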
Sample encrypted model files
For testing purposes, you can use the following publicly available encrypted models. They are stored in an OSS bucket and have been encrypted with gocryptfs as described in Step 1.