
Container Service for Kubernetes: Fix vulnerability CVE-2025-23266

Last Updated: Mar 26, 2026

CVE-2025-23266 is a critical container escape vulnerability in NVIDIA Container Toolkit 1.17.7 and earlier. A successful exploit allows an attacker to break out of the container, execute arbitrary commands on the host, and access sensitive host data. Remediate affected clusters immediately.

For the official vulnerability disclosure, see NVIDIA Security Bulletin 5659.

Affected scope

Your cluster is affected if both conditions are true:

  • Kubernetes version is 1.32 or earlier

  • At least one GPU-accelerated node runs NVIDIA Container Toolkit 1.17.7 or earlier

Note

This vulnerability does not affect deployments that use the Container Device Interface (CDI).

Known attack scenarios require running a malicious container image and accessing GPU resources through the NVIDIA Container Toolkit.

Check the NVIDIA Container Toolkit version

Log on to each GPU-accelerated node and run:

nvidia-container-cli --version

Sample output (unaffected version):

cli-version: 1.17.8
lib-version: 1.17.8
build date: 2025-05-30T13:47+00:00
build revision: 6eda4d76c8c5f8fc174e4abca83e513fb4dd63b0
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64

If cli-version is 1.17.7 or earlier, the node is affected.
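
The comparison can also be scripted so you do not have to eyeball the output on each node. A minimal sketch, assuming only that `nvidia-container-cli` is on the PATH; the hard-coded "1.17.7" below is a stand-in for the parsed value:

```shell
# Hedged sketch: decide "affected" vs "not affected" from the cli-version.
# "1.17.7" is a placeholder for the value parsed from the real command, e.g.:
#   ver="$(nvidia-container-cli --version | awk -F': ' '/^cli-version/ {print $2}')"
ver="1.17.7"
fixed="1.17.8"
# sort -V orders version strings numerically; if $ver sorts first and is not
# already the patched release, the node runs an older (affected) toolkit.
if [ "$(printf '%s\n%s\n' "$ver" "$fixed" | sort -V | head -n1)" = "$ver" ] && [ "$ver" != "$fixed" ]; then
    echo "affected"
else
    echo "not affected"
fi
```

Running this with `ver="1.17.7"` prints `affected`; with `ver="1.17.8"` it prints `not affected`.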

Preventive measures

While you prepare to apply the fix, restrict image pulls to trusted registries only. Enable the container security policy rule in the policy governance feature.

Solution

New GPU-accelerated nodes

ACK edge clusters running Kubernetes 1.20 or later

Nodes created on or after August 4, 2025 automatically install the patched version (1.17.8) of the NVIDIA Container Toolkit. No further action is required.

Clusters running Kubernetes earlier than 1.20

Upgrade the cluster before creating new nodes to ensure new nodes receive the patched version. See Upgrade clusters.

Existing GPU-accelerated nodes

All GPU-accelerated nodes created before August 4, 2025 require a manual fix.

Important

Apply fixes in batches to maintain system stability.

Fix edge nodes

The fix procedure drains each node, upgrades the NVIDIA Container Toolkit to version 1.17.8, and then restores the node to service.

Prerequisites

Before you begin, ensure that you have:

  • SSH access to the target GPU-accelerated edge node

  • kubectl access to the cluster with sufficient permissions to cordon and drain nodes

Step 1: Drain the node

Draining a node safely migrates its workloads to other available nodes before you make changes.

  1. Mark the node as unschedulable:

    kubectl cordon <NODE_NAME>
  2. Drain the node:

    kubectl drain <NODE_NAME> --grace-period=120 --ignore-daemonsets=true
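
Because the fix should be applied in batches, the cordon/drain pair can be looped over a small set of nodes at a time. A minimal sketch with hypothetical node names; `KUBECTL` defaults to `echo` so you can preview the commands before running them for real:

```shell
# Hedged sketch: cordon and drain a batch of GPU nodes.
# gpu-node-1 / gpu-node-2 are placeholder names; replace with your own.
KUBECTL="echo kubectl"   # set KUBECTL=kubectl to actually execute
for node in gpu-node-1 gpu-node-2; do
    $KUBECTL cordon "$node"
    $KUBECTL drain "$node" --grace-period=120 --ignore-daemonsets=true
done
```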

Step 2: Apply the fix

Log on to the affected node and run the following commands.

  1. Set the REGION and INTERCONNECT_MODE environment variables. Replace the example values with your actual configuration:

    Parameter          Description                                                                    Example
    REGION             Region ID of your ACK edge cluster. See Supported regions.                     cn-hangzhou
    INTERCONNECT_MODE  Network access type: basic (Internet access) or private (leased line access).  basic
    export REGION="cn-hangzhou" INTERCONNECT_MODE="basic"
  2. Run the fix script:

    #!/bin/bash
    set -e
    
    # REGION and INTERCONNECT_MODE must be exported before running this script.
    if [[ -z "$REGION" ]]; then
        echo "Error: REGION is not set" >&2
        exit 1
    fi
    
    if [[ -z "$INTERCONNECT_MODE" ]]; then
        echo "Error: INTERCONNECT_MODE is not set" >&2
        exit 1
    fi
    
    NV_TOOLKIT_VERSION=1.17.8
    
    # Leased-line (private) access goes through the internal OSS endpoint.
    INTERNAL=$( [ "$INTERCONNECT_MODE" = "private" ] && echo "-internal" || echo "" )
    PACKAGE="upgrade_nvidia-container-toolkit-${NV_TOOLKIT_VERSION}.tar.gz"
    
    cd /tmp
    
    # Download and unpack the upgrade package from the region-local OSS bucket.
    export PKG_URL_PREFIX="http://aliacs-k8s-${REGION}.oss-${REGION}${INTERNAL}.aliyuncs.com"
    curl -o "${PACKAGE}" "${PKG_URL_PREFIX}/public/pkg/nvidia-container-runtime/${PACKAGE}"
    
    tar -xf "${PACKAGE}"
    
    cd pkg/nvidia-container-runtime/upgrade/common
    
    # Perform the in-place upgrade of the NVIDIA Container Toolkit.
    bash upgrade-nvidia-container-toolkit.sh
  3. Verify the output:

    • INFO No need to upgrade current nvidia-container-toolkit(1.17.8): the node was already running the patched version. No changes were made.
    • INFO succeed to upgrade nvidia container toolkit: the node was vulnerable and has been successfully patched.
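
For reference, the download URL used by the fix script is assembled from REGION and INTERCONNECT_MODE as shown below. The values are examples only; no network access is performed here:

```shell
# Hedged sketch: how the fix script builds the OSS endpoint prefix.
# Example values; with INTERCONNECT_MODE=private the "-internal" suffix
# selects the internal (leased line) endpoint.
REGION="cn-hangzhou"
INTERCONNECT_MODE="private"
INTERNAL=$( [ "$INTERCONNECT_MODE" = "private" ] && echo "-internal" || echo "" )
echo "http://aliacs-k8s-${REGION}.oss-${REGION}${INTERNAL}.aliyuncs.com"
# → http://aliacs-k8s-cn-hangzhou.oss-cn-hangzhou-internal.aliyuncs.com
```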

Step 3: Restore the node

Return the node to service:

kubectl uncordon <NODE_NAME>

Step 4 (optional): Verify GPU functionality

Deploy a GPU workload to confirm the node is functioning correctly after the fix. Use the sample YAML templates from: