
Elastic Compute Service:Deploy CoCo on Alibaba Cloud bare metal

Last Updated: Apr 01, 2026

Introduction

Enterprises rely on containerized applications for faster delivery, but this introduces security challenges. Running sensitive workloads in shared environments like multi-tenant clouds or edge locations exposes them to new risks. Traditional container isolation cannot fully protect against threats from the underlying infrastructure, including malicious administrators, compromised kernels, or firmware-level attacks.

The CNCF Confidential Containers (CoCo) project bridges the gap between cloud-native agility and data security. By encapsulating Kubernetes pods in a hardware-based confidential VM, it extends the protections of confidential computing to complex containerized workloads with minimal changes to existing applications. It encrypts runtime memory, making it inaccessible to the host, and provides remote attestation to verify the environment's integrity before injecting secrets.

This document shows how to deploy Confidential Containers on an 8th-generation ecs.ebmg8i.48xlarge bare metal instance. Using Intel Trust Domain Extensions (TDX), you will build a trusted and isolated runtime environment, making security a default feature of your cloud-native infrastructure.

Architecture

[Architecture diagram: a single-node Kubernetes cluster in which each pod runs inside an Intel TDX confidential VM]

The diagram shows the single-node architecture of Confidential Containers. Each Kubernetes pod runs inside a TDX confidential VM, which protects data at runtime. All data processing occurs within this secure boundary, ensuring security throughout the entire data lifecycle.

Objectives

  • Set up a single-node Kubernetes cluster (v1.32.0) using kubeadm.

  • Deploy Confidential Containers v0.17.0 on that cluster.

Procedure

Step 1: Create a TDX bare metal instance

Create an 8th-generation Alibaba Cloud bare metal instance that supports Intel TDX technology.

  1. Go to the instance purchase page.

  2. Select a billing method, region, and availability zone. The required community image is only available in specific availability zones. For this example, select Beijing Zone I.

  3. In the Instance configuration section, click Elastic Bare Metal Server, and then select the ecs.ebmg8i.48xlarge instance type.

  4. In the Image configuration section, select Community Image and search for the image using the image ID m-2ze2ucup4c5bvgx751lx. Below the image selection, you must select the Confidential VM checkbox to enable the TDX feature.

    Important

    This is a custom Ubuntu-based image with pre-installed drivers that support TDX. The default SSH login user is ubuntu, not root.

  5. Configure settings for the network, storage, bandwidth, security group, and management. For detailed descriptions of each configuration item, see Configuration Item Descriptions.

  6. Before you create the instance, review the overall configuration on the right side of the page and configure options such as the subscription duration. Ensure that all settings meet your requirements.

  7. Click Confirm Order to create the instance.

    Instance creation typically takes 3 to 5 minutes. You can go to the Instances page in the console to check the status. When the instance status changes to Running, the instance is ready.
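
You can also create the instance programmatically. The following is a minimal sketch using the Alibaba Cloud CLI; the security group and vSwitch IDs are placeholders, and the SecurityOptions.ConfidentialComputingMode parameter name is an assumption that you should verify against the current RunInstances API reference before use.

# Hypothetical CLI equivalent of the console steps above.
# Replace sg-xxxx and vsw-xxxx with your own security group and vSwitch IDs.
aliyun ecs RunInstances \
  --RegionId cn-beijing \
  --InstanceType ecs.ebmg8i.48xlarge \
  --ImageId m-2ze2ucup4c5bvgx751lx \
  --SecurityGroupId sg-xxxx \
  --VSwitchId vsw-xxxx \
  --SecurityOptions.ConfidentialComputingMode TDX  # assumption: check the API reference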

Step 2: Prepare the Kubernetes node

After creating the instance, connect to it and install the required software to configure it as a single-node Kubernetes cluster.

  1. Install containerd

    To run containers, you must install containerd as the container runtime.

    # Update the package sources and install dependencies.
    sudo apt-get update
    sudo apt-get install -y ca-certificates curl
    
    # Add Docker's official GPG key.
    sudo install -m 0755 -d /etc/apt/keyrings
    sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
    sudo chmod a+r /etc/apt/keyrings/docker.asc
    
    # Add Docker's APT repository.
    echo \
      "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
      $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
      sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
      
    # Install containerd.
    sudo apt-get update
    sudo apt-get install -y containerd.io
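
    A quick sanity check confirms that the runtime is installed and active (this assumes systemd manages the service, which is the case on this image):

    containerd --version
    systemctl is-active containerd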
  2. Configure containerd

    Generate the default configuration file, and then modify key parameters for Kubernetes compatibility.

    # Generate the default configuration file.
    sudo mkdir -p /etc/containerd
    containerd config default | sudo tee /etc/containerd/config.toml
    
    # Replace the Kubernetes image registry with an Alibaba Cloud mirror to accelerate image pulls.
    sudo sed -i -E 's#registry.k8s.io#registry.aliyuncs.com/google_containers#g' /etc/containerd/config.toml
    
    # Change the cgroup driver to systemd to meet Kubernetes requirements.
    sudo sed -i 's#SystemdCgroup = false#SystemdCgroup = true#g' /etc/containerd/config.toml
    
    # Restart the containerd service to apply the changes.
    sudo systemctl restart containerd
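
    You can verify that both edits took effect; these commands only read the configuration file:

    grep 'SystemdCgroup' /etc/containerd/config.toml
    grep -m1 'registry.aliyuncs.com' /etc/containerd/config.toml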
  3. Configure node kernel parameters

    To meet the networking and runtime requirements for Kubernetes, disable the swap partition and load the necessary kernel modules.

    # Temporarily disable swap.
    sudo swapoff -a
    # Permanently disable swap by commenting out the swap entry in the fstab file.
    sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
    
    # Load the overlay and br_netfilter kernel modules.
    sudo modprobe overlay
    sudo modprobe br_netfilter
    
    # Configure the modules to load automatically at boot.
    sudo tee /etc/modules-load.d/k8s.conf <<EOF
    overlay
    br_netfilter
    EOF
    
    # Configure kernel parameters to allow IP forwarding and bridged traffic processing.
    sudo tee /etc/sysctl.d/kubernetes.conf <<EOT
    net.bridge.bridge-nf-call-ip6tables = 1
    net.bridge.bridge-nf-call-iptables = 1
    net.ipv4.ip_forward = 1
    EOT
    
    # Apply all sysctl configurations.
    sudo sysctl --system
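
    You can read the parameters back to confirm that they are active:

    # Each command should print a value of 1.
    sysctl net.ipv4.ip_forward
    sysctl net.bridge.bridge-nf-call-iptables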

Step 3: Install a single-node Kubernetes cluster

Use the kubeadm tool to quickly set up a single-node Kubernetes cluster.

  1. Install Kubernetes components

    Install kubeadm, kubelet, and kubectl.

    Ensure you use a Kubernetes version compatible with Confidential Containers. This document uses v1.32.0 as an example.

    # Add the GPG key for the Kubernetes APT repository.
    curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.32/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
    
    # Add the Kubernetes APT repository.
    echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.32/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
    
    # Install the Kubernetes tools.
    sudo apt-get update
    sudo apt-get install -y kubectl kubeadm kubelet
    sudo apt-mark hold kubelet kubeadm kubectl
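
    Verify the pinned versions before initializing the cluster:

    # Both commands should report v1.32.x.
    kubeadm version -o short
    kubectl version --client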
  2. Initialize the Kubernetes control plane

    Use the kubeadm init command to initialize the control plane.

    # Get the IP address of the primary network interface to use as the API server's advertise address.
    # Note: This command assumes the primary interface is eth0. If your environment is different, replace it with the correct IP address.
    NODE_IP=$(ip -o -4 addr show dev eth0 | awk '{split($4,a,"/");print a[1]}')
    
    # Initialize the cluster.
    sudo kubeadm init --pod-network-cidr=10.10.0.0/16 \
      --apiserver-advertise-address ${NODE_IP} \
      --kubernetes-version v1.32.0 \
      --image-repository registry.aliyuncs.com/google_containers

    When initialization finishes, follow the output instructions to configure kubectl access credentials.

    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config
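
    At this point, kubectl can reach the cluster. The node appears but reports NotReady until a network plugin is installed in the next step:

    kubectl get nodes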
  3. Install a network plugin

    Deploy the Flannel CNI plugin to enable pod-to-pod communication in the cluster.

    # Install the Flannel network plugin. kubectl uses the credentials configured in the previous step.
    
    FLANNEL_VERSION=v0.27.4
    wget https://github.com/flannel-io/flannel/releases/download/${FLANNEL_VERSION}/kube-flannel.yml
    
    # Modify the Flannel manifest to set the correct pod network CIDR and use a regional image mirror.
    sed -i -E 's#([0-9]{1,3}\.){3}[0-9]{1,3}/[0-9]+#10.10.0.0/16#g' kube-flannel.yml
    sed -i -E 's#ghcr.io/flannel-io#confidential-ai-registry.cn-shanghai.cr.aliyuncs.com/product#g' kube-flannel.yml
    
    # Deploy Flannel.
    kubectl apply -f kube-flannel.yml
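
    Wait for the Flannel pod to reach the Running state; the node should then report Ready. Recent Flannel releases deploy into the kube-flannel namespace:

    kubectl get pods -n kube-flannel
    kubectl get nodes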
  4. Remove the control plane taint

    To allow workloads to be scheduled on this single-node cluster, remove the NoSchedule taint from the control plane node.

    NODE_NAME=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
    kubectl taint nodes $NODE_NAME node-role.kubernetes.io/control-plane-
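
    Confirm that the taint has been removed; the Taints field should show <none>:

    kubectl describe node $NODE_NAME | grep Taints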

Step 4: Deploy CoCo components

Confidential Containers offers multiple deployment methods. Choose either the Helm chart method or the Operator method.

Helm chart

Deploy CoCo using a Helm chart for more flexible configuration.

  1. Install Helm

    If Helm is not already installed in your environment, install it.

    curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
    chmod 700 get_helm.sh
    ./get_helm.sh
  2. Configure the CoCo Helm chart

    Clone the CoCo charts repository and create a custom values.yaml file to enable the TDX-related runtimes.

    # Clone the charts repository and check out the specified version.
    git clone https://github.com/confidential-containers/charts.git
    cd charts
    git reset --hard v0.17.0
    
    # Create a TDX-specific configuration file.
    cat << EOF > values-tdx.yaml
    architecture: x86_64
    
    # Point the CoCo image registry to a regional mirror address.
    kata-as-coco-runtime:
      image:
        reference: registry-cn-hangzhou.ack.aliyuncs.com/dev/coco-kata-deploy
      imagePullPolicy: Always
      k8sDistribution: k8s
      debug: false
    
      # Snapshotter configuration
      snapshotter:
        setup: ["nydus"]
    
      # Enable the TDX-related shims.
      shims:
        qemu-tdx:
          enabled: true
          supportedArches:
            - amd64
          containerd:
            snapshotter: nydus
            forceGuestPull: false
          crio:
            guestPull: true
          agent:
            httpsProxy: ""
            noProxy: ""
        qemu-coco-dev:
          enabled: true
          supportedArches:
            - amd64
          allowedHypervisorAnnotations: []
          containerd:
            snapshotter: nydus
            forceGuestPull: false
          crio:
            guestPull: true
          agent:
            httpsProxy: ""
            noProxy: ""
        # Explicitly disable other unnecessary TEE shims.
        qemu-snp:
          enabled: false
        qemu-se:
          enabled: false
    
    # Enable RuntimeClass creation.
    runtimeClasses:
      enabled: true
      createDefault: false
      defaultName: "kata"
      # Default shim per architecture
      defaultShim:
        amd64: qemu-tdx
    EOF
  3. Deploy CoCo

    Use Helm to install CoCo with your custom configuration.

    helm install coco oci://ghcr.io/confidential-containers/charts/confidential-containers \
      -f values-tdx.yaml \
      --namespace coco-system \
      --create-namespace \
      --version 0.17.0
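
    Check that the release deployed and that its pods in the coco-system namespace reach the Running state:

    helm list -n coco-system
    kubectl get pods -n coco-system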
  4. Create the RuntimeClass

    The Helm configuration above disables the creation of the default RuntimeClass. Therefore, you must manually create the required RuntimeClass objects to reference them in your pods.

    cat <<EOF | kubectl apply -f -
    apiVersion: node.k8s.io/v1
    kind: RuntimeClass
    metadata:
      name: kata-qemu-tdx
    handler: kata-qemu-tdx
    ---
    apiVersion: node.k8s.io/v1
    kind: RuntimeClass
    metadata:
      name: kata-qemu-coco-dev
    handler: kata-qemu-coco-dev
    EOF
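
    Verify that both RuntimeClass objects exist:

    kubectl get runtimeclass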

Operator

Alternatively, deploy the CoCo components with the Confidential Containers Operator. The following commands build two kustomize overlays: one installs the operator release pinned to v0.17.0, and the other deploys the default CcRuntime sample. Both overlays redirect the upstream images to regional mirrors.

mkdir -p kustomize && cd kustomize

mkdir release && mkdir -p ccruntime/default

cat <<EOF > release/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- "github.com/confidential-containers/operator/config/release?ref=v0.17.0"

images:
- name: quay.io/confidential-containers/operator
  newName: registry-cn-hangzhou.ack.aliyuncs.com/dev/coco-operator
EOF

cat <<EOF > ccruntime/default/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- "github.com/confidential-containers/operator/config/samples/ccruntime/default?ref=v0.17.0"

images:
- name: quay.io/confidential-containers/reqs-payload
  newName: registry-cn-hangzhou.ack.aliyuncs.com/dev/coco-reqs-payload
- name: quay.io/kata-containers/kata-deploy-ci
  newName: registry-cn-hangzhou.ack.aliyuncs.com/dev/coco-kata-deploy-ci
- name: quay.io/kata-containers/kata-deploy
  newName: registry-cn-hangzhou.ack.aliyuncs.com/dev/coco-kata-deploy
EOF

# Allow the current node to act as a worker.
for NODE_NAME in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  kubectl label node $NODE_NAME node.kubernetes.io/worker=
done

kubectl apply -k release
kubectl apply -k ccruntime/default
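
# The operator installs into the confidential-containers-system namespace.
# Wait for its pods to become Running, then confirm the RuntimeClass objects exist.
kubectl get pods -n confidential-containers-system
kubectl get runtimeclass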

Step 5: Deploy and verify a sample pod

After deploying all components, run a sample pod and verify that it is running in a TDX confidential VM.

  1. Label the node

    Add a label to the worker node to indicate that it supports the Kata runtime.

    NODE_NAME=$(kubectl get nodes -o jsonpath='{.items[0].metadata.name}')
    kubectl label node $NODE_NAME katacontainers.io/kata-runtime=true
  2. Deploy a sample pod

    Create a pod and specify the TDX runtime in its spec by using runtimeClassName.

    cat <<EOF | kubectl apply -f -
    apiVersion: v1
    kind: Pod
    metadata:
      name: coco-demo-pod
    spec:
      runtimeClassName: kata-qemu-tdx
      containers:
        - image: alibaba-cloud-linux-3-registry.cn-hangzhou.cr.aliyuncs.com/alinux3/alinux3:latest
          name: hello-alinux
          command:
            - "sleep"
            - "infinity"
    EOF
  3. Verify the deployment status

    Check that the pod is running successfully and confirm that it is using the correct runtime class.

    # Wait for the pod status to become Running.
    kubectl get pod coco-demo-pod
    
    # Describe the pod and confirm that the "Runtime Class" field is "kata-qemu-tdx".
    kubectl describe pod coco-demo-pod | grep "Runtime Class"
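
    As an additional check, compare the kernel inside the pod with the host kernel. Because Kata runs the pod in a separate guest VM, the two versions normally differ:

    # Kernel version inside the confidential VM guest.
    kubectl exec coco-demo-pod -- uname -r
    # Kernel version on the host.
    uname -r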
  4. Verify the TDX environment

    Log in to the bare metal instance and check the kernel logs to confirm that the TDX module is initialized and in use.

    # Run this command on the node. Reading the kernel log typically requires sudo on Ubuntu.
    sudo dmesg | grep -i tdx

    If you see output containing TDX module initialized or similar TDX-related messages, the TDX environment is activated, and the Kata runtime is using this hardware feature to create confidential VMs.

Costs and risks

  • Cost breakdown: The primary cost of this solution is the ecs.ebmg8i.48xlarge bare metal instance. This high-specification instance type is intended for production workloads with strict security and performance requirements, so even initial evaluation and testing carry a significant cost.

  • Limitations and risks:

    • Hardware and region dependency: This solution depends on a specific bare metal instance type and the availability zones that support it.

    • Single point of failure: This tutorial demonstrates a single-node cluster, which does not provide high availability and is not suitable for a production environment. A production deployment requires a multi-node cluster and a high-availability architecture.

    • Version compatibility: The versions of Confidential Containers, Kubernetes, and related components are tightly coupled. Upgrades and maintenance require careful compatibility testing.