
Container Service for Kubernetes: Securely deploy vLLM inference services in an ACK heterogeneous confidential computing cluster

Last Updated: Sep 15, 2025

Large Language Model (LLM) inference involves sensitive data and core model assets, which are at risk of exposure when running in untrusted environments. Container Service for Kubernetes (ACK) Confidential AI (ACK-CAI) provides end-to-end security for model inference by integrating hardware-based confidential computing technologies, such as Intel Trust Domain Extensions (TDX) and GPU Trusted Execution Environments (TEEs).

ACK-CAI lets you deploy vLLM inference services in an ACK heterogeneous confidential computing cluster. This provides secure isolation and encrypted protection for your models and data. Key advantages:

  • Hardware-level security isolation: Builds a hardware-based TEE using Intel® TDX and NVIDIA GPU TEE, ensuring the confidentiality and integrity of models and data during computation.

  • Trusted key distribution: Uses a remote attestation mechanism to strictly verify the integrity of the execution environment. Only after successful verification does a separate trustee service release the model decryption key to the trusted environment.

  • End-to-end data encryption: Establishes an encrypted channel from the client to the server through a Trusted Network Gateway (TNG), protecting inference requests and responses during transmission.

  • Non-intrusive for applications: Automatically injects security components into pods using a Kubernetes webhook. You can enable confidential computing capabilities for your application with a simple annotation, requiring no changes to your business code or container images.

How it works

ACK-CAI enables transparent confidential computing capabilities by dynamically injecting a set of sidecar containers called Trustiflux into your application pods. The core security mechanism is based on remote attestation, which ensures that models and data are only accessed within a verified, trusted environment.
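
For reference, the only pod-level change required to enable protection is the label and annotation shown in the following minimal sketch. The pod name, image, and trustee address are placeholders; the label key, annotation key, and caiOptions fields match those used in the Step 5 Helm chart and the caiOptions reference below.

# Minimal sketch of a pod that opts in to ACK-CAI. Only the label and annotation
# are CAI-specific; the admission webhook injects the Trustiflux sidecars at creation time.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: my-inference-pod                      # placeholder name
  labels:
    trustiflux.alibaba.com/confidential-computing-mode: "ACK-CAI"
  annotations:
    trustiflux.alibaba.com/ack-cai-options: |
      {
        "cipher-text-volume": "pvc-oss",
        "model-decryption-key-id": "kbs:///default/aliyun/model-decryption-key",
        "trustee-address": "http://<trustee-ip>:8081/api"
      }
spec:
  containers:
    - name: inference-service
      image: <your-inference-image>           # placeholder image
EOF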


Core components

  1. ACK heterogeneous confidential computing cluster: A Kubernetes cluster built on TDX confidential instances and GPU confidential computing capabilities.

  2. Trustee remote attestation service: Verifies the trustworthiness of the execution environment and distributes the model decryption key after successful verification.

  3. Runtime Trustiflux: A confidential computing runtime component delivered as a sidecar, which includes the following core modules:

    • Attestation Agent (AA): Performs remote attestation and retrieves decryption keys.

    • Confidential Data Hub (CDH): Handles the decryption of encrypted data.

    • TNG server: Establishes a secure communication channel for the service.

    • Cachefs: Provides the underlying support for model decryption.

  4. Inference service: The container responsible for running the actual LLM inference tasks.

  5. Inference program: The client-side application used to access the model inference service.

  6. TNG client: Establishes a secure communication channel with the cluster to ensure communication security.

Core security mechanisms

  • Encrypted model distribution with remote attestation:

    1. When a pod starts, the AA in the sidecar sends a request to the trustee remote attestation service.

    2. The trustee service verifies the integrity of the confidential environments for the CPU (TDX) and the GPU.

    3. After verification, the trustee service securely distributes the model decryption key to the pod.

    4. The CDH and Cachefs in the sidecar use this key to decrypt the encrypted model files, which are then mounted on the inference service container.

  • End-to-end encrypted inference with remote attestation:

    1. The end user's inference program sends requests through the local TNG Client.

    2. The request remains encrypted throughout its transit to prevent man-in-the-middle attacks.

    3. At the server, the request is decrypted by the TNG module in the sidecar before being passed to the inference service.

    4. The inference result is then encrypted by the TNG and securely returned to the client.

Process and environment guide

Deploying and accessing a secure vLLM inference service involves the following stages:

Step 1: Prepare the encrypted model
  • Purpose: Encrypt the inference model and upload it to Object Storage Service (OSS) to ensure its confidentiality at rest.
  • Environment: A separate server for data preparation.

Step 2: Deploy the trustee remote attestation service
  • Purpose: Deploy a standalone trustee service to act as the root of trust for verifying the environment and distributing keys.
  • Environment: A separate trustee server.

Step 3: Configure the ACK confidential computing cluster
  • Purpose: Create and configure Kubernetes nodes to run confidential computing tasks.
  • Environment: The ACK console and Elastic Compute Service (ECS) console, plus the shell environment of the ecs.gn8v-tee instance.

Step 4: Deploy the ACK-CAI component
  • Purpose: Install the CAI components in the cluster to enable the dynamic injection of security capabilities.
  • Environment: ACK console.

Step 5: Deploy the vLLM service
  • Purpose: Deploy the vLLM service to the cluster using Helm and enable confidential computing protection with an annotation.
  • Environment: A machine with kubectl and Helm configured, connected to the API server.

Step 6: Securely access the inference service
  • Purpose: Start the client-side security proxy to access the deployed model service over an encrypted channel.
  • Environment: Client environment.

Step 1: Prepare the encrypted model

This step covers how to encrypt your model data and upload it to OSS in preparation for secure distribution.

Execution environment: To ensure security, perform these steps on a temporary, isolated ECS instance. For optimal performance, this instance should be in the same region as your OSS bucket to leverage high-speed internal network uploads.
The model files are large, and this process can be time-consuming. To quickly test the solution, you can skip this step. Use the sample encrypted model file and proceed to Step 2: Deploy the trustee remote attestation service.

1. Download a model

Before deploying a model, you must first encrypt it and then upload it to cloud storage. The model decryption key is hosted by the trustee remote attestation service and released only after the execution environment is verified. Perform the model encryption operations in a local or trusted environment. This solution uses Qwen2.5-3B-Instruct as an example.

Note

If you already have a model, you do not need to download one. Skip to 2. Encrypt the model.

Run the following command in the terminal to download Qwen2.5-3B-Instruct using the ModelScope tool (requires Python 3.9 or later).

pip3 install modelscope importlib-metadata
modelscope download --model Qwen/Qwen2.5-3B-Instruct

The command will download the model to ~/.cache/modelscope/hub/models/Qwen/Qwen2.5-3B-Instruct/.

2. Encrypt the model

This solution supports model encryption using the gocryptfs encryption mode, which is based on the open-source AES-256-GCM standard.

  1. Install the Gocryptfs tool; currently only Gocryptfs V2.4.0 with the default parameters is supported. Choose one of the following installation methods:

    Method 1: (Recommended) Install from yum source

    If you use the Alinux 3 or AnolisOS 23 operating system, you can install gocryptfs from the yum repository.

    # Alinux 3
    sudo yum install gocryptfs -y

    # AnolisOS 23
    sudo yum install anolis-epao-release -y
    sudo yum install gocryptfs -y

    Method 2: Download the precompiled binary directly

    # Download precompiled Gocryptfs package
    wget https://github.jobcher.com/gh/https://github.com/rfjakob/gocryptfs/releases/download/v2.4.0/gocryptfs_v2.4.0_linux-static_amd64.tar.gz
    
    # Extract and install
    tar xf gocryptfs_v2.4.0_linux-static_amd64.tar.gz
    sudo install -m 0755 ./gocryptfs /usr/local/bin
  2. Create a Gocryptfs key file as the key for model encryption. In subsequent steps, you will need to upload this key to Trustee.

    In this solution, alibaba@1688 is used as the encryption key and is stored in the cachefs-password file. You can customize this key, but in production, use a randomly generated strong key instead.

    cat << EOF > ~/cachefs-password
    alibaba@1688
    EOF
  3. Use the key to encrypt the model.

    1. Configure the path of the plaintext model.

      Note

      Configure the path of the plaintext model you just downloaded, or replace it with the path of your own model.

      PLAINTEXT_MODEL_PATH=~/.cache/modelscope/hub/models/Qwen/Qwen2.5-3B-Instruct/
    2. Use Gocryptfs to encrypt the model directory tree.

      After encryption, the model will be stored in encrypted form in the ./cipher directory.

      mkdir -p ~/mount
      cd ~/mount
      mkdir -p cipher plain
      
      # Install Gocryptfs runtime dependencies
      sudo yum install -y fuse
      
      # initialize gocryptfs
      cat ~/cachefs-password | gocryptfs -init cipher
      
      # mount to plain
      cat ~/cachefs-password | gocryptfs cipher plain
      
      # move AI model to ~/mount/plain
      cp -r ${PLAINTEXT_MODEL_PATH} ~/mount/plain
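
      After the copy finishes, you can unmount the plaintext view and confirm that the encrypted files are in place before uploading (a quick sanity check; gocryptfs writes the ciphertext to ~/mount/cipher):

      # Unmount the plaintext view; the decrypted files are no longer accessible afterwards
      fusermount -u ~/mount/plain

      # The encrypted model tree now lives in ~/mount/cipher and is ready for upload
      ls ~/mount/cipher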

3. Upload the model

Prepare an OSS bucket in the same region as the heterogeneous instance that you plan to deploy, and upload the encrypted model to it. This allows you to pull and deploy the model data from the heterogeneous instance in subsequent steps.

Refer to the Get started by using the OSS console guide to create a storage space (bucket) and a directory named qwen-encrypted (for example, oss://examplebucket/qwen-encrypted/). Due to the large size of model files, we recommend using ossbrowser to upload the encrypted model to this directory.
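
If you prefer the command line over ossbrowser, the following sketch shows an equivalent upload with the ossutil tool (this assumes ossutil is installed and configured with your credentials, and that oss://examplebucket/qwen-encrypted/ is replaced with your actual bucket path):

# Recursively upload the encrypted model directory to the OSS bucket
ossutil cp -r ~/mount/cipher oss://examplebucket/qwen-encrypted/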

Step 2: Deploy the trustee remote attestation service

Following Zero Trust principles, any confidential computing environment must be verified before it can access sensitive data, such as the model decryption keys.

The standalone trustee service, which you will deploy in this step, acts as the central authority for this verification process. It is responsible for:

  • Verifying the execution environment of the model and inference service.

  • Ensuring that model decryption keys are released only to a verified, trusted environment.

  • Enabling clients to confirm the trustworthiness of the service when initiating an inference request.

Execution environment: A dedicated, separate server deployed outside the ACK cluster, such as an ECS instance or an on-premises server.

1. Choose a deployment solution

Based on your required trust level, choose between the following solutions:

  • ECS instance

    Deploying the trustee service on a separate ECS instance within the same VPC provides both logical isolation and high-speed, secure internal network communication with your ACK cluster.

  • On-premises server

    For maximum security, deploy the trustee service in your data center and connect it to your virtual private cloud (VPC) via a leased line or VPN. This ensures that you have full control over the hardware and software environment of the root of trust, independent of the cloud provider.

Before you start, make sure the server has Internet access enabled, and that port 8081 is open.

2. Deploy the trustee service

  1. Install the trustee RPM package from the official YUM repository (available on Alibaba Cloud Linux 3.x and Anolis 8.x+).

    yum install trustee-1.5.2

    The service will start automatically and listen on port 8081. You can access it directly over the network using the URL http://<trustee-ip>:8081/api.

    Replace <trustee-ip> with the IP address of the server where trustee is deployed.
    For production environments, configure HTTPS access for trustee to enhance security.
  2. Verify the health of the service.

    Run sudo yum install -y jq to install the jq tool.
    # Replace <trustee-ip> with the IP address of the trustee server
    curl http://<trustee-ip>:8081/api/services-health | jq

    A successful response will show the status of all components as ok.

    {
      "gateway": {
        "status": "ok",
        "timestamp": "2025-08-26T13:46:13+08:00"
      },
      "kbs": {
        "status": "ok",
        "timestamp": "2025-08-26T13:46:13+08:00"
      },
      "as": {
        "status": "ok",
        "timestamp": "2025-08-26T13:46:13+08:00"
      },
      "rvps": {
        "status": "ok",
        "timestamp": "2025-08-26T13:46:13+08:00"
      }
    }

Common management commands

The trustee service is managed by systemd. You can use the systemctl command to manage its lifecycle. Common operations include:

  • Start the service: systemctl start trustee

  • Stop the service: systemctl stop trustee

  • Restart the service: systemctl restart trustee

  • View the status: systemctl status trustee

3. Import the model decryption key into the trustee instance

Once the Trustee service is deployed, you must provide it with the model decryption key. This key is required for the remote attestation and secure key distribution process.

Trustee manages keys by mapping local file paths to resource IDs. The following steps create and import a model decryption key in the default storage folder.

  1. Create a directory and file to store the decryption key on the trustee server. This creates a subdirectory named aliyun in the local folder /opt/trustee/kbs/repository/default/.

    Replace <model-decryption-key> with the key you used in Step 1. In this example, the key is alibaba@1688.
    sudo mkdir -p /opt/trustee/kbs/repository/default/aliyun/
    sudo sh -c 'echo -n "<model-decryption-key>" > /opt/trustee/kbs/repository/default/aliyun/model-decryption-key'
  2. Verify the key ID.

    The key is now stored at the file path .../aliyun/model-decryption-key with the key ID kbs:///default/aliyun/model-decryption-key in the trustee system.
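
    As a quick check, you can confirm on the trustee server that the key file exists and contains the expected value:

    # Print the key file that trustee maps to kbs:///default/aliyun/model-decryption-key
    sudo cat /opt/trustee/kbs/repository/default/aliyun/model-decryption-key; echo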

Step 3: Configure the ACK confidential computing cluster

In this step, you will build the underlying infrastructure: an ACK cluster whose worker nodes are ecs.gn8v-tee instances that provide both Intel TDX and NVIDIA TEE capabilities.

Execution environment: The ECS and ACK consoles (for creating clusters, node pools, and ECS instances), and the shell environment of the created ecs.gn8v-tee instance (for installing drivers).
  1. Create an ACK managed Pro cluster in the China (Beijing) region.

  2. Create a node pool for the cluster to manage the confidential computing instances.

    • vSwitch: Select a virtual switch in Zone L of the China (Beijing) region.

    • Scaling Mode: Keep the default configurations. Do not enable auto scaling.

    • Instance Type: ecs.gn8v-tee.4xlarge or a higher specification.

    • Operating System: Alibaba Cloud Linux 3.2104 LTS 64-bit.

    • System Disk: 100 GiB or larger.

    • Expected Nodes: The initial number of nodes in the node pool. Keep the default configuration of 0.

    • Node Labels: Add a label (Key: ack.aliyun.com/nvidia-driver-version, Value: 550.144.03) to specify the NVIDIA driver version.

  3. Create an Elastic GPU Service (EGS) confidential computing instance to serve as a cluster node. See Create an instance on the Custom Launch tab.

    • Region: China (Beijing).

    • Network and Zone: The VPC must be the same as the cluster's VPC. This example uses one in Zone L.

    • Instance: ecs.gn8v-tee.4xlarge or a higher specification.

      The gn8v-tee instance types have CPU and GPU confidential computing features enabled by default. You do not need to select confidential VM.
    • Image: Alibaba Cloud Linux 3.2104 LTS 64-bit.

  4. Log on to the created EGS instance and install the NVIDIA driver and CUDA toolkit.

  5. Add the EGS instance to the created node pool. Select Manual as the method for adding the instance. See Add existing ECS instances.
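
You can then verify on the EGS instance that the GPU driver is loaded and that the instance is running as a TDX guest. This is a minimal sanity check; the TDX guest device name may vary with the kernel version.

# Confirm that the NVIDIA driver and the GPU are visible
nvidia-smi

# Confirm that the instance exposes the TDX guest device (name may differ by kernel version)
ls /dev/tdx_guest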

Step 4: Deploy the ACK-CAI component

The ACK-CAI component includes a webhook controller that automatically injects the necessary sidecar containers into pods based on their annotations. These sidecars handle remote attestation, model decryption, and secure communication.

Execution environment: ACK console.
  1. Log on to the ACK console. In the navigation pane on the left, click Clusters.

  2. On the Clusters page, find the cluster you want and click its name. In the left-side navigation pane, choose Applications > Helm.

  3. Click Deploy and install the latest version of ack-cai.

    In the Parameters step, change the tag to 1.1.1 in the YAML template.

    You can now view the deployment status in the Helm chart list.
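
    If you have command-line access to the cluster, you can also confirm the installation from a shell. This is a simple sketch; the release name and namespace depend on how the chart was installed.

    # List Helm releases across namespaces and look for the ack-cai release
    helm list -A | grep -i cai

    # Confirm that the CAI controller pods are running (namespace may vary)
    kubectl get pods -A | grep -i cai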

Step 5: Deploy the vLLM inference service

Deploy the vLLM service using Helm and add the specific annotation to enable confidential computing protection.

Execution environment: A machine with kubectl and Helm configured and able to access the cluster. You can use Workbench or CloudShell.
  1. Create a new folder for the Helm chart.

    mkdir -p ack-cai-vllm-demo
    cd ack-cai-vllm-demo
  2. Initialize a Helm chart to deploy the vLLM service.

    This Helm chart configures the vLLM inference service with node affinity to ensure it runs only on confidential computing GPU-accelerated nodes. It also uses a CSI plugin to mount an OSS bucket for model storage.

    The following script initializes the Helm chart:

    # Create the template file
    mkdir -p ./templates
    cat <<EOF >templates/vllm.yaml
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv-oss
      namespace: {{ .Release.Namespace }}
      labels:
        alicloud-pvname: pv-oss
    spec:
      capacity:
        storage: 5Gi
      accessModes:
        - ReadOnlyMany
      persistentVolumeReclaimPolicy: Retain
      csi:
        driver: ossplugin.csi.alibabacloud.com
        volumeHandle: pv-oss
        volumeAttributes:
          bucket: {{ .Values.oss.bucket }}
          path: {{ .Values.oss.path }}
          url: {{ .Values.oss.url }}
          otherOpts: "-o umask=022 -o max_stat_cache_size=0 -o allow_other"
        nodePublishSecretRef:
          name: oss-secret
          namespace: {{ .Release.Namespace }}
    
    ---
    
    apiVersion: v1
    kind: Secret
    metadata:
      name: oss-secret
      namespace: {{ .Release.Namespace }}
    stringData:
      akId: {{ .Values.oss.akId }}
      akSecret: {{ .Values.oss.akSecret }}
    
    ---
    
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: pvc-oss
      namespace: {{ .Release.Namespace }}
    spec:
      accessModes:
        - ReadOnlyMany
      resources:
        requests:
          storage: 5Gi
      selector:
        matchLabels:
          alicloud-pvname: pv-oss
    
    ---
    
    apiVersion: v1
    kind: Service
    metadata:
      name: cai-vllm-svc
      namespace: {{ .Release.Namespace }}
      {{- if .Values.loadbalancer}}
      {{- if .Values.loadbalancer.aclId }}
      annotations:
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-acl-status: "on"
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-acl-id: {{ .Values.loadbalancer.aclId }}
        service.beta.kubernetes.io/alibaba-cloud-loadbalancer-acl-type: "white"
      {{- end }}
      {{- end }}
      labels:
        app: cai-vllm
    spec:
      ports:
      - port: 8080
        protocol: TCP
        targetPort: 8080
      selector:
        app: cai-vllm
      type: LoadBalancer
    
    ---
    
    apiVersion: v1
    kind: Pod
    metadata:
      name: cai-vllm
      namespace: {{ .Release.Namespace }}
      labels:
        app: cai-vllm
        trustiflux.alibaba.com/confidential-computing-mode: "ACK-CAI"
      annotations:
        trustiflux.alibaba.com/ack-cai-options: |
    {{ .Values.caiOptions | indent 6 }}
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node.kubernetes.io/instance-type
                operator: In
                values:
                  - ecs.gn8v-tee.4xlarge
                  - ecs.gn8v-tee.6xlarge
                  - ecs.gn8v-tee-8x.16xlarge
                  - ecs.gn8v-tee-8x.48xlarge
      containers:
        - name: inference-service
          image: egslingjun-registry.cn-wulanchabu.cr.aliyuncs.com/egslingjun/llm-inference:vllm0.5.4-deepgpu-llm24.7-pytorch2.4.0-cuda12.4-ubuntu22.04
          command:
            - bash
          args: ["-c", "vllm serve /tmp/model --port 8080 --host 0.0.0.0 --served-model-name qwen2.5-3b-instruct --device cuda --dtype auto"]
          ports:
            - containerPort: 8080
          resources:
            limits:
              nvidia.com/gpu: 1  # Request 1 GPU for this container
          volumeMounts:
            - name: pvc-oss
              mountPath: "/tmp/model"
    
      volumes:
        - name: pvc-oss
          persistentVolumeClaim:
            claimName: pvc-oss
    
    EOF
    
    # Create the Helm chart description file
    cat <<EOF > ./Chart.yaml
    apiVersion: v2
    name: vllm
    description: A test based on vllm for ack-cai
    type: application
    version: 0.1.0
    appVersion: "0.1.0"
    EOF
    
    # Create a variables file for the Helm chart named values.yaml
    touch values.yaml
    
    
    
  3. Edit the values.yaml file to provide your environment-specific information.

    Replace <trustee-ip> with the trustee address and replace the OSS configurations with your actual values.
    caiOptions: |
      {
          "cipher-text-volume": "pvc-oss",
          "model-decryption-key-id" : "kbs:///default/aliyun/model-decryption-key",
          "trustee-address": "http://<trustee-ip>:8081/api"
      }
    oss:
      bucket: "conf-ai"                          # Replace with the name of the OSS bucket that stores the encrypted model.
      path: "/qwen2.5-3b-gocryptfs/"             # Replace with the path to the encrypted model file in the OSS Bucket.
      url: "https://oss-cn-beijing-internal.aliyuncs.com"   # Replace with the OSS endpoint.
      akId: "xxxxx"                              # Replace with your Alibaba Cloud AccessKey ID.
      akSecret: "xxxxx"                          # Replace with your Alibaba Cloud AccessKey secret.
  4. Deploy the vLLM service using Helm.

    helm install vllm . -n default
  5. Verify that the CAI component's sidecar containers have been successfully injected into the pod.

    kubectl get pod cai-vllm -n default -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{range .state.running}Running{end}{.state.*.reason}{"\n"}{end}{range .status.containerStatuses[*]}{.name}{"\t"}{range .state.running}Running{end}{.state.*.reason}{"\n"}{end}'

    The expected output shows the following five containers, meaning the injection succeeded. Wait for all containers to change from PodInitializing to Running. This indicates that the service has started.

    cai-sidecar-attestation-agent   Running
    cai-sidecar-confidential-data-hub       Running
    cai-sidecar-tng Running
    cai-sidecar-cachefs     Running
    inference-service       Running
  6. Get and record the vLLM service endpoint.

    kubectl get service cai-vllm-svc -o jsonpath='http://{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].port}{"\n"}'

    The expected output is a URL in the format http://<vllm-ip>:<port>.

    http://182.XX.XX.225:8080

Step 6: Securely access the inference service

To ensure end-to-end security, you must use the TNG client gateway to proxy your requests. The proxy automatically encrypts all requests sent to the vLLM service and decrypts the responses.

Execution environment: The client machine from which you want to call the inference service.
  1. Start the TNG gateway on the client to establish a secure communication channel.

    The TNG gateway creates a local proxy on the client to encrypt requests sent to the server.
    Replace <trustee-ip> with the address of the trustee server.
    docker run -d \
        --network=host \
        confidential-ai-registry.cn-shanghai.cr.aliyuncs.com/product/tng:2.2.4 \
        tng launch --config-content '
          {
            "add_ingress": [
              {
                "http_proxy": {
                  "proxy_listen": {
                    "host": "0.0.0.0",
                    "port": 41000
                  }
                },
                "encap_in_http": {},
                "verify": {
     "as_addr": "http://<trustee-ip>:8081/api/attestation-service/",
                  "policy_ids": [
                    "default"
                  ]
                }
              }
            ]
          }
    '
  2. Access the vLLM service through the TNG proxy.

    Replace <vllm-ip>:<port> with the endpoint of the vLLM service you obtained earlier.
    # Set the http_proxy environment variable
    export http_proxy=http://127.0.0.1:41000
    
    # Send a curl request
    curl http://<vllm-ip>:<port>/v1/completions \
      -H "Content-type: application/json" \
      -d '{
        "model": "qwen2.5-3b-instruct",
        "prompt": "San Francisco is a",
        "max_tokens": 7,
        "temperature": 0
        }'
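
    When you finish testing, you can stop routing traffic through the local proxy and shut down the TNG gateway container. The container ID below is a placeholder; use the ID shown by docker ps.

    # Stop using the local proxy for subsequent commands in this shell
    unset http_proxy

    # Find the TNG gateway container and stop it
    docker ps --filter ancestor=confidential-ai-registry.cn-shanghai.cr.aliyuncs.com/product/tng:2.2.4
    docker stop <container-id>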

Reference

Configuring caiOptions

The caiOptions annotation accepts a configuration object in JSON format. The ACK CAI admission webhook parses these parameters and uses them to dynamically inject and configure the security components, such as AA and CDH, into the pod. This enables features such as transparent encryption and decryption, remote attestation, and trusted networking.

The following is a complete example of a caiOptions configuration.

{
  "cipher-text-volume": "pvc-oss",
  "model-decryption-key-id": "kbs:///default/aliyun/model-decryption-key",
  "trustee-address": "http://<trustee-ip>:8081/api",
  "aa-version": "1.3.1",
  "cdh-version": "1.3.1",
  "tng-version": "2.2.4",
  "cachefs-version": "1.0.7-2.6.1",
  "tdx-ra-enable": true,
  "gpu-ra-enable": true,
  "tng-http-secure-ports": [
    {
      "port": 8080
    }
  ]
}

The configurations are described below:

  • cipher-text-volume (required): The name of the persistent volume claim (PVC) that stores the encrypted model. ACK-CAI automatically decrypts the data mounted from this PVC in the trusted environment.

  • model-decryption-key-id (required): The Key Broker Service (KBS) URI of the model decryption key, in the format kbs:///<repository>/<group>/<key>.

  • trustee-address (required): The address of the trustee service, used for remote attestation and key retrieval.

  • aa-version (optional): The version of the AA component.

  • cdh-version (optional): The version of the CDH component.

  • tng-version (optional): The version of the TNG component.

  • cachefs-version (optional): The version of the Cachefs component.

  • tdx-ra-enable (optional): Specifies whether to enable remote attestation support for the CPU (TDX confidential instance). Default: true.

  • gpu-ra-enable (optional): Specifies whether to enable remote attestation support for the GPU. Default: true.

  • tng-http-secure-ports (optional): Configures TNG to use TLS to encrypt traffic for specific HTTP ports. It accepts an array of objects, where each object represents a port encryption rule. For example:

    "tng-http-secure-ports": [
      {
        "port": 8080,
        "allow-insecure-request-regexes": [
          "/api/builtin/.*"
        ]
      }
    ]

    • port: The HTTP service port number that requires TLS encryption protection by TNG.

    • allow-insecure-request-regexes: An array of regular expressions. For any HTTP request sent to the specified port, if its path matches any regular expression in this array, TNG does not encrypt the request.

Sample encrypted model files

For testing purposes, you can use the following publicly available encrypted models. They are stored in an OSS bucket and have been encrypted using the specified method.

Model name: Qwen2.5-3B-Instruct

  • Encryption method and storage location: Gocryptfs (model storage location: conf-ai:/qwen2.5-3b-gocryptfs/) or Sam (model storage location: conf-ai:/qwen2.5-3b-sam/)

  • Encryption password: alibaba@1688

  • Public-read OSS region: cn-beijing

  • OSS endpoint: oss-cn-beijing-internal.aliyuncs.com