Large Language Model (LLM) inference involves sensitive data and core model assets. Running LLMs in untrusted environments risks data and model leakage. ACK Confidential AI (ACK-CAI), a confidential AI solution provided by ACK, integrates hardware confidential computing technologies such as Intel TDX and GPU trusted execution environment (TEE) to provide end-to-end security for model inference.
You can use ACK-CAI to deploy vLLM model inference services in ACK heterogeneous confidential computing clusters. This ensures secure isolation and encryption protection for models and data. The advantages are as follows:
- Hardware-level security isolation: Build a hardware-level trusted execution environment (TEE) using Intel® TDX and NVIDIA GPU TEE technology. This ensures the confidentiality and integrity of models and data during computation.
- Trusted key distribution: Strictly verify the runtime environment using a remote attestation mechanism. After successful verification, the dedicated Trustee service distributes model decryption keys to the trusted environment.
- End-to-end data encryption: Establish a client-to-server encrypted channel using a Trusted Network Gateway (TNG). This protects the security of inference requests and response data during transmission.
- Non-intrusive to applications: Automatically inject security components into Pods based on Kubernetes Webhook. Simply enable security capabilities for applications using an Annotation, without modifying business code or images.
How it works
ACK-CAI provides transparent confidential computing capabilities for application Pods by dynamically injecting a set of Sidecar containers named Trustiflux. Its core security mechanism relies on remote attestation, ensuring that models and data are accessed only in a trusted environment.
Process and environment guide
Deploying and accessing a secure vLLM inference service involves the following stages:
| Step | Purpose | Environment |
| --- | --- | --- |
| Step 1: Prepare encrypted models | Encrypt the inference model and upload it to Object Storage Service (OSS) to ensure secure static storage. | A dedicated data preparation server |
| Step 2: Deploy the Trustee remote attestation service | Deploy a dedicated Trustee verification service as the root of trust to verify the environment and distribute keys. | A dedicated Trustee server |
| Step 3: Configure the ACK confidential computing cluster | Create and configure Kubernetes nodes for confidential computing tasks. | ACK console and ECS |
| Step 4: Deploy ACK-CAI components | Install CAI components in the cluster to dynamically inject security capabilities into applications. | ACK console |
| Step 5: Deploy the vLLM model inference service | Deploy the vLLM service to the cluster using Helm, and enable confidential computing protection via Annotation. | A machine with kubectl and Helm configured, connected to the API Server |
| Step 6: Securely access the inference service | Start a client security agent to access the deployed model service through an encrypted channel. | Client environment |
Step 1: Prepare encrypted models
This section uses an encryption tool to process model data and uploads it to Object Storage Service (OSS), preparing for subsequent encrypted distribution.
Execution environment: To achieve secure isolation, prepare a temporary ECS instance to download, encrypt, and upload models. We recommend that the ECS instance is in the same region as the OSS Bucket for high-speed upload of encrypted model data over the private network.
Model files are large, and the encryption process takes a long time. To quickly experience the solution, you can skip this section, use the encrypted model example files for a trial, and proceed directly to Step 2: Deploy the Trustee remote attestation service.
1. Download a model
Before deploying a model to the cloud, encrypt it and upload it to cloud storage. The decryption key is managed by KMS and controlled by the remote attestation service. Perform model encryption in a local or trusted environment. This example uses the Qwen2.5-3B-Instruct LLM.
If you already have a model, you can skip this section and proceed to 2. Encrypt the model.
The Qwen2.5-3B-Instruct model requires Python 3.9 or later. To download the model using the ModelScope tool, run the following commands in the terminal:

```shell
pip3 install modelscope importlib-metadata
modelscope download --model Qwen/Qwen2.5-3B-Instruct
```

After the download succeeds, the model is stored in `~/.cache/modelscope/hub/models/Qwen/Qwen2.5-3B-Instruct/`.
2. Encrypt models
Currently, you can encrypt models using the Gocryptfs encryption mode (based on the open AES256-GCM standard).
Install the Gocryptfs tool to encrypt models. Only Gocryptfs v2.4.0 with default encryption parameters is currently supported. You can choose one of the following installation methods:
Method 1: (Recommended) Install from a yum source
If you use the Alinux 3 or AnolisOS 23 operating system, you can use a yum source to install Gocryptfs.
Alinux 3

```shell
sudo yum install gocryptfs -y
```

AnolisOS 23

```shell
sudo yum install anolis-epao-release -y
sudo yum install gocryptfs -y
```

Method 2: Directly download the precompiled binary file

```shell
# Download the precompiled Gocryptfs package.
wget https://github.jobcher.com/gh/https://github.com/rfjakob/gocryptfs/releases/download/v2.4.0/gocryptfs_v2.4.0_linux-static_amd64.tar.gz
# Decompress and install the package.
tar xf gocryptfs_v2.4.0_linux-static_amd64.tar.gz
sudo install -m 0755 ./gocryptfs /usr/local/bin
```

Create a Gocryptfs key file to use as the model encryption key. You must upload this key to the Trustee remote attestation service for management in a subsequent step.
In this topic, `0Bn4Q1wwY9fN3P` is used as the key to encrypt the model. The key content is stored in the `cachefs-password` file. You can also customize the key. In practice, we recommend that you use a randomly generated strong key.

```shell
cat << EOF > ~/cachefs-password
0Bn4Q1wwY9fN3P
EOF
```

Use the created key to encrypt the model.

1. Configure the path of the plaintext model.

   Note: Specify the path of the plaintext model that you just downloaded. If you use another model, replace the path with the actual path of your target model.

   ```shell
   PLAINTEXT_MODEL_PATH=~/.cache/modelscope/hub/models/Qwen/Qwen2.5-3B-Instruct/
   ```

2. Use Gocryptfs to encrypt the model directory tree.

   After the encryption is complete, the model is stored as ciphertext in the `./cipher` directory.

   ```shell
   mkdir -p ~/mount
   cd ~/mount
   mkdir -p cipher plain
   # Install Gocryptfs runtime dependencies.
   sudo yum install -y fuse
   # Initialize Gocryptfs.
   cat ~/cachefs-password | gocryptfs -init cipher
   # Mount to plain.
   cat ~/cachefs-password | gocryptfs cipher plain
   # Move the AI model to ~/mount/plain.
   cp -r ${PLAINTEXT_MODEL_PATH}/. ~/mount/plain
   ```
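As recommended above, production deployments should use a randomly generated strong key rather than the demonstration value. A minimal sketch for generating a 32-character hexadecimal key and writing it to a `cachefs-password` file (written to the current directory here; assumes standard coreutils only):

```shell
# Generate 16 random bytes and encode them as 32 hex characters.
# This is a sketch for producing a high-entropy key; any comparable
# random source works equally well with Gocryptfs.
KEY=$(head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n')
printf '%s' "$KEY" > ./cachefs-password
echo "key length: $(wc -c < ./cachefs-password)"
# → key length: 32
```

The `printf '%s'` avoids a trailing newline in the key file, matching the `echo -n` convention used when the key is later imported into Trustee.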
3. Upload the model
Prepare an OSS Bucket in the same region where you will deploy the heterogeneous instance. Upload the encrypted model to Alibaba Cloud OSS. This lets you pull and deploy the model from the heterogeneous instance later.
Take OSS as an example. You can create a bucket and a directory named qwen-encrypted, such as oss://examplebucket/qwen-encrypted/. For more information, see Quick Start for the console. Because the model file is large, we recommend using ossbrowser to upload the encrypted model to this directory.
Step 2: Deploy the Trustee remote attestation service
Following the zero trust principle, any confidential computing environment must pass verification before gaining permissions to access sensitive data, such as model decryption keys. This step deploys a dedicated Trustee service to verify the runtime environment of models and inference services. This ensures that model decryption keys are injected only when the environment is confirmed trustworthy, and verifies the environment's trustworthiness when the client-side initiates an inference request.
Execution environment: A dedicated, standalone server deployed outside the ACK cluster, such as an ECS instance or an on-premises private server.
1. Select deployment solutions
Based on the principles of security isolation and trust independence, Trustee must be deployed on a standalone server outside the ACK heterogeneous confidential computing cluster. There are two recommended solutions, depending on different trust level requirements:
Trust level: The higher the degree of software and hardware control a cloud service provider has over the Trustee deployment environment, the lower the trust level. This is because the Trustee service acts as the root of trust for remote attestation and distributing confidential computing/trusted computing resources in the cloud. Under a strict trust model, the Trustee owner must have full control over all software and hardware in the deployment environment, ensuring it runs in a customer-controlled trusted environment.
-
ECS instances
Create an additional ECS instance within the same VPC as the ACK cluster to specifically run the Trustee service. This allows for efficient and secure communication over the Alibaba Cloud private network, and ensures complete logical and physical isolation between the Trustee service and the confidential computing environment.
-
On-premises private servers
For scenarios with extremely high security requirements, deploy Trustee in your own data center or on-premises server. Connect it to the cloud VPC network via a leased line or VPN. This ensures that you have full control over the software and hardware environment of the root of trust, unaffected by cloud vendors.
Before use, ensure the server has public network access and that port 8081 is open.
2. Deploy the Trustee service
Trustee is packaged in RPM format and included in the official YUM repositories of Alibaba Cloud Linux 3.x and Anolis (8.x and later). Install it using a system package management tool. After installation, systemd automatically manages and starts the service.
- On the prepared server, execute the following command to install and start Trustee from the YUM repository:

  ```shell
  yum install trustee-1.5.2
  ```

  Trustee starts automatically and listens on port 8081 by default. You can access it over the network as a URL composed of the deployment environment IP address plus the service port, for example `http://<trustee-ip>:8081/api`, where `<trustee-ip>` is the IP address of the server where Trustee is deployed. If you use Trustee in a production environment, we recommend that you configure HTTPS access for Trustee to enhance security.

- Run the following command to check the health status of the service components. You can execute `sudo yum install -y jq` to install the jq tool.

  ```shell
  # Replace <trustee-ip> with the Trustee server IP
  curl http://<trustee-ip>:8081/api/services-health | jq
  ```

  In the expected output, if all service statuses are `ok`, the service is normal:

  ```json
  {
    "gateway": { "status": "ok", "timestamp": "2025-08-26T13:46:13+08:00" },
    "kbs": { "status": "ok", "timestamp": "2025-08-26T13:46:13+08:00" },
    "as": { "status": "ok", "timestamp": "2025-08-26T13:46:13+08:00" },
    "rvps": { "status": "ok", "timestamp": "2025-08-26T13:46:13+08:00" }
  }
  ```
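For automation, the health response can also be checked without jq by counting the `"status": "ok"` entries. A sketch, with a compact sample response inlined in place of the live `curl` call (an assumption for illustration):

```shell
# In a real check, fetch the response with:
#   RESPONSE=$(curl -s http://<trustee-ip>:8081/api/services-health)
# The sample below mirrors the expected output shown above.
RESPONSE='{"gateway":{"status":"ok"},"kbs":{"status":"ok"},"as":{"status":"ok"},"rvps":{"status":"ok"}}'
# Tolerate optional whitespace around the colon.
OK_COUNT=$(printf '%s' "$RESPONSE" | grep -o '"status"[[:space:]]*:[[:space:]]*"ok"' | wc -l)
if [ "$OK_COUNT" -eq 4 ]; then
  echo "all services healthy"
else
  echo "service degraded: only $OK_COUNT of 4 components report ok"
fi
# → all services healthy
```

The four components counted (gateway, kbs, as, rvps) are the ones listed in the expected output above.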
3. Import model decryption keys to the Trustee instance
After the Trustee service is deployed, you must provide it with the model decryption key. This key is the basis for subsequent remote attestation and secure key distribution to inference services.
Trustee manages keys by mapping local file paths to resource IDs. The following operations create and import a model decryption key into the default key storage directory.
- Execute the following commands to create a key directory (a subdirectory named `aliyun` under the local directory `/opt/trustee/kbs/repository/default/`) and write the key content.

  Replace `<model decryption key>` with the actual key string. This example uses `0Bn4Q1wwY9fN3P`.

  ```shell
  sudo mkdir -p /opt/trustee/kbs/repository/default/aliyun/
  sudo sh -c 'echo -n "<model decryption key>" > /opt/trustee/kbs/repository/default/aliyun/model-decryption-key'
  ```

- Verify the key ID.

  After completing the above operations, the key stored at the file path `.../aliyun/model-decryption-key` has a corresponding key ID in the Trustee system: `kbs:///default/aliyun/model-decryption-key`.
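The file-path-to-resource-ID mapping described above can be expressed as a one-liner. A sketch, assuming the default repository base directory `/opt/trustee/kbs/repository` used in this step:

```shell
# Derive the Trustee key ID from the key file path by replacing the
# repository base directory with the kbs:// scheme prefix.
BASE=/opt/trustee/kbs/repository
KEY_FILE=$BASE/default/aliyun/model-decryption-key
KEY_ID="kbs://${KEY_FILE#"$BASE"}"
echo "$KEY_ID"
# → kbs:///default/aliyun/model-decryption-key
```

The resulting ID is exactly the `model-decryption-key-id` value used later in the vLLM Helm chart's caiOptions.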
Step 3: Configure the ACK confidential computing cluster
This section builds an underlying infrastructure with hardware-level security isolation capabilities for running confidential computing tasks. It involves creating an ACK cluster and adding ecs.gn8v-tee instances with Intel TDX and NVIDIA TEE capabilities as worker nodes.
Execution environment: ECS, ACK console (for creating clusters, node pools, and ECS instances), and the Shell environment of the created ecs.gn8v-tee instance (for installing drivers).
1. Create an ACK managed cluster Pro edition in the China (Beijing) region. For more information, see Create an ACK managed cluster.

2. Create a node pool for the cluster to manage confidential computing instances. For more information, see Create and manage node pools.

   - vSwitch: Select the virtual switch in China (Beijing) Zone L.
   - Scaling Mode: Keep the default configuration. Do not enable automatic elastic scaling.
   - Instance Type: ecs.gn8v-tee.4xlarge or above.
   - Operating System: Alibaba Cloud Linux 3.2104 LTS 64-bit.
   - System Disk: 100 GiB or more.
   - Expected Number of Nodes: The initial number of nodes in the node pool. Keep the default configuration, which is 0.
   - Node Labels: Add a label (Key: `ack.aliyun.com/nvidia-driver-version`, Value: `550.144.03`) to specify the NVIDIA driver version.

3. Create EGS confidential computing instances as cluster nodes. For more information, see Custom purchase instances.

   - Region: China (Beijing).
   - Network and Zone: A VPC consistent with the cluster VPC, Zone L.
   - Instance Type: ecs.gn8v-tee.4xlarge or above. gn8v-tee instance types have CPU and GPU confidential computing features enabled by default, so you do not need to additionally select confidential virtual machines.
   - Image: Alibaba Cloud Linux 3.2104 LTS 64-bit.

4. Log on to the created EGS instance and install the NVIDIA driver and CUDA toolkit. For more information, see Step 1: Install NVIDIA drivers and CUDA toolkit.

5. Add the EGS instance to the previously created node pool, selecting manual addition as the method. For more information, see Add existing nodes.
Step 4: Deploy ACK-CAI components
The ACK-CAI component enables non-intrusive confidential computing capabilities for applications in the cluster. It includes a Webhook controller that automatically injects Sidecar containers into Pods based on their Annotations. These Sidecar containers provide remote attestation, model decryption, and secure communication.
Execution environment: ACK console.
1. Log on to the Container Service Management Console. In the left-side navigation pane, click Clusters.
2. On the Clusters page, click the name of your cluster. In the left-side navigation pane, click .
3. Click Create and follow the on-screen prompts to install the latest version of ACK-CAI. After the installation is complete, you can view the deployment status in the Helm chart list.
Step 5: Deploy the vLLM model inference service
After the basic environment and security components are ready, this section uses Helm to deploy the vLLM service. Add specific Annotations to declare that the application requires security enhancements by ACK-CAI.
Execution environment: A machine with kubectl and Helm configured and access to the cluster. You can use Workbench or CloudShell directly.
1. Create an empty Helm Chart directory.

   ```shell
   mkdir -p ack-cai-vllm-demo
   cd ack-cai-vllm-demo
   ```

2. Initialize a Helm Chart for deploying the vLLM service.

   This Helm Chart forcibly schedules the vLLM inference service to confidential computing GPU nodes and uses a CSI plugin to mount models from OSS.

3. Edit the values.yaml file and fill in the environment context.

   Replace `<trustee-ip>` with the Trustee address, and fill in the actual OSS parameter information.

   ```yaml
   caiOptions: |
     {
       "cipher-text-volume": "pvc-oss",
       "model-decryption-key-id": "kbs:///default/aliyun/model-decryption-key",
       "trustee-address": "http://<trustee-ip>:8081/api"
     }
   oss:
     bucket: "conf-ai"                                    # Replace with the OSS Bucket name where the encrypted model is stored
     path: "/qwen2.5-3b-gocryptfs/"                       # Replace with the path of the encrypted model file within the OSS Bucket
     url: "https://oss-cn-beijing-internal.aliyuncs.com"  # Replace with the OSS Endpoint
     akId: "xxxxx"                                        # Replace with the Alibaba Cloud AK ID
     akSecret: "xxxxx"                                    # Replace with the Alibaba Cloud AK Secret
   ```

4. Deploy the vLLM service.

   ```shell
   helm install vllm . -n default
   ```

5. Check whether the CAI component's Sidecar containers are successfully injected into the Pod.

   ```shell
   kubectl get pod cai-vllm -n default -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{range .state.running}Running{end}{.state.*.reason}{"\n"}{end}{range .status.containerStatuses[*]}{.name}{"\t"}{range .state.running}Running{end}{.state.*.reason}{"\n"}{end}'
   ```

   In the expected output, the following five containers are displayed, indicating successful injection. Wait until all containers change from `PodInitializing` to `Running`, which indicates that the service has started.

   ```
   cai-sidecar-attestation-agent      Running
   cai-sidecar-confidential-data-hub  Running
   cai-sidecar-tng                    Running
   cai-sidecar-cachefs                Running
   inference-service                  Running
   ```

6. Get and record the access address of the vLLM service.

   ```shell
   kubectl get service cai-vllm-svc -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].port}{"\n"}'
   ```

   The expected output returns a URL in the format `http://<vllm-ip>:<port>`, for example:

   ```
   http://182.XX.XX.225:8080
   ```
Step 6: Securely access the inference service
In addition to server-side security protection, establish an end-to-end encrypted communication link from client to server. This ensures the security of inference data during transmission. This section starts a TNG security gateway on the client, creating a local proxy that automatically encrypts all requests sent to the vLLM service and decrypts received responses.
Execution environment: Client environment, which is any machine that needs to invoke the vLLM inference service.
1. Start the TNG gateway on the client to establish a secure communication channel.

   The TNG gateway creates a local proxy on the client to encrypt requests sent to the server. Replace `<trustee-ip>` with the Trustee address.

   ```shell
   docker run -d \
     --network=host \
     confidential-ai-registry.cn-shanghai.cr.aliyuncs.com/product/tng:2.2.4 \
     tng launch --config-content '
   {
     "add_ingress": [
       {
         "http_proxy": {
           "proxy_listen": {
             "host": "0.0.0.0",
             "port": 41000
           }
         },
         "encap_in_http": {},
         "verify": {
           "as_addr": "http://<trustee-ip>:8081/api/attestation-service/",
           "policy_ids": [
             "default"
           ]
         }
       }
     ]
   }
   '
   ```

2. Access the vLLM service through the TNG proxy.

   Replace `<vllm-ip>:<port>` with the previously obtained access address of the vLLM service.

   ```shell
   # Set the http_proxy environment variable
   export http_proxy=http://127.0.0.1:41000
   # Send a curl request
   curl http://<vllm-ip>:<port>/v1/completions \
     -H "Content-Type: application/json" \
     -d '{
       "model": "qwen2.5-3b-instruct",
       "prompt": "San Francisco is a",
       "max_tokens": 7,
       "temperature": 0
     }'
   ```
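To script against the service, the generated text can be extracted from the OpenAI-compatible completion response. A sketch using sed on a sample response body (the response content below is illustrative, not actual service output; for production use, prefer a JSON parser such as jq):

```shell
# Sample OpenAI-compatible /v1/completions response (illustrative only).
RESPONSE='{"id":"cmpl-1","object":"text_completion","choices":[{"index":0,"text":" city in California"}]}'
# Extract the first "text" field with a simple sed capture group.
COMPLETION=$(printf '%s' "$RESPONSE" | sed 's/.*"text":"\([^"]*\)".*/\1/')
echo "completion:$COMPLETION"
# → completion: city in California
```

Note that the sed approach breaks on escaped quotes inside the generated text, which is why jq is preferable outside quick checks.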
Reference information
caiOptions configuration description
caiOptions accepts a JSON-formatted configuration object. The ACK-CAI Admission Webhook parses these parameters and dynamically injects and configures the necessary security components (such as AA and CDH) into the Pod, enabling transparent encryption and decryption, remote attestation, and trusted networking.
The following is a complete caiOptions configuration example.
```json
{
  "cipher-text-volume": "pvc-oss",
  "model-decryption-key-id": "kbs:///default/aliyun/model-decryption-key",
  "trustee-address": "http://<trustee-ip>:8081/api",
  "aa-version": "1.3.1",
  "cdh-version": "1.3.1",
  "tng-version": "2.2.4",
  "cachefs-version": "1.0.7-2.6.1",
  "tdx-ra-enable": true,
  "gpu-ra-enable": true,
  "tng-http-secure-ports": [
    {
      "port": 8080
    }
  ]
}
```
Configuration item details:
| Configuration item | Required | Description |
| --- | --- | --- |
| `cipher-text-volume` | Required | The name of the PVC that stores the encrypted model data. ACK-CAI automatically decrypts the data mounted by this PVC in a trusted environment. |
| `model-decryption-key-id` | Required | The KBS URI of the model decryption key, for example `kbs:///default/aliyun/model-decryption-key`. |
| `trustee-address` | Required | The address of the Trustee service, used for remote attestation and key retrieval. |
| `aa-version` | Optional | The version of the Attestation Agent (AA) component. |
| `cdh-version` | Optional | The version of the Confidential Data Hub (CDH) component. |
| `tng-version` | Optional | The version of the Trusted Network Gateway (TNG) component. |
| `cachefs-version` | Optional | The version of the Cachefs component. |
| `tdx-ra-enable` | Optional | Specifies whether to enable remote attestation support for the CPU (TDX confidential instances). |
| `gpu-ra-enable` | Optional | Specifies whether to enable remote attestation support for the GPU. |
| `tng-http-secure-ports` | Optional | Configures TNG to perform TLS encryption for traffic on specific HTTP ports. Accepts an array of objects, where each object represents a port encryption rule. |
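Before deploying, a quick sanity check that the three required caiOptions keys are present can be scripted. A sketch, with the minimal example configuration inlined (the check is an illustration, not part of the product tooling):

```shell
# Minimal caiOptions value, inlined for the check.
CAI_OPTIONS='{"cipher-text-volume":"pvc-oss","model-decryption-key-id":"kbs:///default/aliyun/model-decryption-key","trustee-address":"http://<trustee-ip>:8081/api"}'
MISSING=0
# The three keys marked Required in the table above.
for key in cipher-text-volume model-decryption-key-id trustee-address; do
  printf '%s' "$CAI_OPTIONS" | grep -q "\"$key\"" || { echo "missing required key: $key"; MISSING=1; }
done
[ "$MISSING" -eq 0 ] && echo "all required keys present"
# → all required keys present
```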
Encrypted model example files
The following provides encrypted models available for testing and their related configuration information. These models are stored in public-read OSS Buckets and are encrypted using the specified method.