
Container Compute Service: inference-nv-pytorch 25.08

Last Updated: Mar 26, 2026

This release upgrades vLLM to v0.10.0 and SGLang to v0.4.10.post2. No bug fixes are included.

What's new

  • vLLM upgraded to v0.10.0

  • SGLang upgraded to v0.4.10.post2

Image variants

Two image variants are available: one built around vLLM and one around SGLang.

| Field | vLLM variant | SGLang variant |
| --- | --- | --- |
| Tag | 25.08-vllm0.10.0-pytorch2.7-cu128-20250811-serverless | 25.08-sglang0.4.10.post2-pytorch2.7-cu128-20250808-serverless |
| Use case | Large model inference | Large model inference |
| Framework | PyTorch | PyTorch |
| Minimum driver | NVIDIA Driver >= 570 | NVIDIA Driver >= 570 |

System components

vLLM image

| Package type | Package | Version |
| --- | --- | --- |
| OS | Ubuntu | 24.04 |
| Runtime | Python | 3.12 |
| Runtime | Torch | 2.7.1+cu128 |
| Runtime | CUDA | 12.8 |
| Library | NCCL | 2.27.5 |
| Library | flash_attn | 2.8.2 |
| Library | triton | 3.3.1 |
| Library | xformers | 0.0.31 |
| Library | xfuser | 0.4.4 |
| Library | xgrammar | 0.1.21 |
| Library | ray | 2.48.0 |
| Library | transformers | 4.55.0 |
| Library | diffusers | 0.34.0 |
| Library | imageio | 2.37.0 |
| Library | imageio-ffmpeg | 0.6.0 |
| Library | vllm | 0.10.0 |
| DeepGPU | deepgpu-torch | 0.0.24+torch2.7.0cu128 |
| DeepGPU | deepgpu-comfyui | 1.1.7 |

SGLang image

| Package type | Package | Version |
| --- | --- | --- |
| OS | Ubuntu | 24.04 |
| Runtime | Python | 3.12 |
| Runtime | Torch | 2.7.1+cu128 |
| Runtime | CUDA | 12.8 |
| Library | NCCL | 2.27.5 |
| Library | flash_attn | 2.8.2 |
| Library | flash_mla | 1.0.0+41b611f |
| Library | flashinfer-python | 0.2.9rc2 |
| Library | triton | 3.3.1 |
| Library | xgrammar | 0.1.22 |
| Library | torchao | 0.9.0 |
| Library | transformers | 4.54.1 |
| Library | diffusers | 0.34.0 |
| Library | imageio | 2.37.0 |
| Library | imageio-ffmpeg | 0.6.0 |
| Library | sgl-kernel | 0.2.8 |
| Library | sglang | 0.4.10.post2 |
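To confirm that a running container matches the component versions listed above, you can query the installed package metadata. This is a minimal sketch using only the standard library; the package names are the ones from the tables, and any package not present simply reports "not installed".

```python
# Report installed versions of key inference-stack packages.
# importlib.metadata reads the metadata of installed distributions,
# so this works without importing the (potentially heavy) packages themselves.
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str) -> str:
    """Return the installed version of a package, or 'not installed'."""
    try:
        return version(package)
    except PackageNotFoundError:
        return "not installed"


# Package names as listed in the component tables above.
for pkg in ("vllm", "sglang", "transformers", "triton"):
    print(f"{pkg}: {installed_version(pkg)}")
```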

Image registry

Internet images

egslingjun-registry.cn-wulanchabu.cr.aliyuncs.com/egslingjun/inference-nv-pytorch:25.08-vllm0.10.0-pytorch2.7-cu128-20250811-serverless

egslingjun-registry.cn-wulanchabu.cr.aliyuncs.com/egslingjun/inference-nv-pytorch:25.08-sglang0.4.10.post2-pytorch2.7-cu128-20250808-serverless

VPC images

acs-registry-vpc.{region-id}.cr.aliyuncs.com/egslingjun/{image:tag}

Replace the placeholders with your actual values:

| Placeholder | Description | Example |
| --- | --- | --- |
| {region-id} | Region where your ACS is activated | cn-beijing, cn-wulanchabu |
| {image:tag} | Image name and tag | inference-nv-pytorch:25.08-vllm0.10.0-pytorch2.7-cu128-20250811-serverless |
Important

VPC image pulls are currently supported only in the China (Beijing) region.
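Substituting the placeholders is a simple string fill-in. The sketch below builds the full VPC reference for the vLLM image, assuming the China (Beijing) region (currently the only region where VPC pulls are supported):

```python
# Build the full VPC image reference by filling in the documented placeholders.
REGION_ID = "cn-beijing"  # VPC image pulls are currently Beijing-only
IMAGE_TAG = "inference-nv-pytorch:25.08-vllm0.10.0-pytorch2.7-cu128-20250811-serverless"

vpc_image = f"acs-registry-vpc.{REGION_ID}.cr.aliyuncs.com/egslingjun/{IMAGE_TAG}"
print(vpc_image)
```

The printed reference is what you would pass to `docker pull` or specify in your workload YAML.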

Note

Both images are compatible with ACS products and Lingjun multi-tenant products. They are not compatible with Lingjun single-tenant products.

Driver requirements

NVIDIA Driver release >= 570
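Before pulling either image you can verify the host driver meets this requirement. A minimal sketch, assuming `nvidia-smi` is on the PATH (the version-parsing helper works on any driver version string):

```python
# Check that the host NVIDIA driver satisfies the image requirement (>= 570).
import shutil
import subprocess

MIN_DRIVER_MAJOR = 570


def driver_ok(version: str, minimum: int = MIN_DRIVER_MAJOR) -> bool:
    """Return True if a version string like '570.86.10' meets the minimum major version."""
    major = int(version.split(".")[0])
    return major >= minimum


if shutil.which("nvidia-smi"):
    # Query only the driver version, one line per GPU; take the first GPU's value.
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout.strip().splitlines()[0]
    print(out, "OK" if driver_ok(out) else "driver too old")
```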

Quick start

The following example pulls the vLLM image and runs an inference test using the Qwen2.5-7B-Instruct model.

Note

To use inference-nv-pytorch images in ACS, select the image on the Artifacts page when creating a workload in the console, or specify the image reference in a YAML file.

  1. Pull the container image.

    docker pull egslingjun-registry.cn-wulanchabu.cr.aliyuncs.com/egslingjun/inference-nv-pytorch:[tag]
  2. Download the Qwen2.5-7B-Instruct model from ModelScope.

    pip install modelscope
    cd /mnt
    modelscope download --model Qwen/Qwen2.5-7B-Instruct --local_dir ./Qwen2.5-7B-Instruct
  3. Start the container.

    docker run -d -t --network=host --privileged --init --ipc=host \
    --ulimit memlock=-1 --ulimit stack=67108864 \
    -v /mnt/:/mnt/ \
    egslingjun-registry.cn-wulanchabu.cr.aliyuncs.com/egslingjun/inference-nv-pytorch:[tag]
  4. Start the vLLM inference service inside the container.

    python3 -m vllm.entrypoints.openai.api_server \
    --model /mnt/Qwen2.5-7B-Instruct \
    --trust-remote-code --disable-custom-all-reduce \
    --tensor-parallel-size 1
  5. Send a test request from the client.

    curl http://localhost:8000/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{
        "model": "/mnt/Qwen2.5-7B-Instruct",
        "messages": [
        {"role": "system", "content": "You are a friendly AI assistant."},
        {"role": "user", "content": "Tell me about deep learning."}
        ]}'

    For more information about vLLM, see the vLLM documentation.
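The curl request in step 5 can also be sent from Python. This is a standard-library sketch that assumes the server from step 4 is reachable at the default port 8000; the model path matches the one passed to `--model` above.

```python
# Send a chat completion request to the vLLM OpenAI-compatible endpoint.
import json
import urllib.request


def build_payload(prompt: str) -> dict:
    """Mirror the JSON body of the curl example in step 5."""
    return {
        "model": "/mnt/Qwen2.5-7B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a friendly AI assistant."},
            {"role": "user", "content": prompt},
        ],
    }


def chat_request(prompt: str, base_url: str = "http://localhost:8000") -> dict:
    """POST the payload and return the decoded JSON response."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```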

Known issues

  • The deepgpu-comfyui plug-in for Wanx model video generation currently supports only the GN8IS and G49E GPU instance types.