All Products
Search
Document Center

Alibaba Cloud Linux:Use AC2 images on ECS instances

Last Updated:Nov 21, 2025

Alibaba Cloud AI Containers (AC2) provides a set of AI container images that are optimized for and deeply integrated with Alibaba Cloud infrastructure, such as Elastic Compute Service (ECS) and Container Service for Kubernetes (ACK). These images can significantly reduce the cost of deploying AI application environments. This topic describes how to configure the runtime environment on an ECS instance and run an AC2 container for PyTorch training.

Purchase and connect to an ECS instance

To use AC2 on an ECS instance, you must first purchase and connect to an ECS instance. For more information, see Create and manage an ECS instance in the console (express version).

When you purchase an instance, Alibaba Cloud provides a wide selection of images. We recommend using the Alibaba Cloud Linux 3.2104 LTS 64-bit image to ensure full service support.

Configure the ECS instance

Install Docker

Before you can use AC2, you must set up the Docker runtime environment. The installation steps for Docker vary depending on the operating system. For more information, see Install and use Docker and Docker Compose. The following steps show how to set up the Docker runtime environment on Alibaba Cloud Linux 3.

  1. Add the Docker Community Edition (Docker-CE) software repository and install a compatible plugin.

    sudo dnf config-manager --add-repo=https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
    sudo dnf install -y --repo alinux3-plus dnf-plugin-releasever-adapter
  2. Install Docker-CE.

    sudo dnf install -y docker-ce
  3. After the installation is complete, run the following command to view the Docker version number and check whether the installation was successful.

    docker -v
  4. Start the Docker service and enable it to start on system startup.

    sudo systemctl start docker  # Start the Docker service
    sudo systemctl enable docker # Set the Docker service to start on system startup
  5. Check the status of the Docker service.

    systemctl status docker

    The output in the following figure indicates that the Docker service has started successfully.

    docker-started-systemctl-status.png

Install the NVIDIA Container Toolkit (for GPU-accelerated instances)

For GPU-accelerated instances, you must install the NVIDIA Container Toolkit to enable GPU support for containers. The NVIDIA Container Toolkit is supported on Alibaba Cloud Linux 3. Run the following commands to complete the configuration on a GPU-accelerated instance.

  1. Add the NVIDIA CUDA repository and enable the driver for installation. You can switch driver versions by enabling different module streams, such as 570-open or open-dkms. Run dnf module list nvidia-driver to view all available versions. For information about recommended driver versions for different instances, see Automatically install or load a Tesla driver when you create a GPU-accelerated instance.

    sudo dnf config-manager --add-repo=https://developer.download.nvidia.com/compute/cuda/repos/rhel8/x86_64/cuda-rhel8.repo
    sudo dnf module enable nvidia-driver:570-open
    sudo dnf install nvidia-driver-cuda kmod-nvidia-open-dkms
    To uninstall or switch the driver version, run the following commands to remove the installed driver and reset the module configuration.
    dnf module remove --all nvidia-driver
    dnf module reset nvidia-driver
  2. Install the NVIDIA Container Toolkit.

    sudo dnf install -y nvidia-container-toolkit
  3. Restart the Docker daemon process to enable GPU access for containers.

    sudo systemctl restart docker

    After the service restarts, Docker can pass through GPUs. When you create a container, you can add the --gpus <gpu-request> parameter to specify the GPUs to pass through.

After the ECS instance is configured, you can pull AC2 images for development and deployment. For more information, see Alibaba Cloud AI Containers image list.

Use AC2 for training

AC2 provides images for many training frameworks, such as PyTorch and TensorFlow. These images integrate various AI runtime frameworks and contain verified runtime drivers and acceleration libraries for different platforms, such as NVIDIA drivers and CUDA. This can save developers significant time during environment deployment.

PyTorch framework images

Run a PyTorch CPU image to train a model

  1. Run the CPU container image.

    Replace <ac2-pytorch-cpu-image> with the address of the PyTorch CPU image.

    sudo docker run -it --rm <ac2-pytorch-cpu-image>
  2. Download the sample code.

    git clone https://github.com/pytorch/examples.git
  3. Download the PyTorch sample code and use the --no-cuda parameter to specify that the training task runs on the CPU.

    python3 examples/mnist/main.py --no-cuda

Run a PyTorch GPU image to train a model

  1. Run the GPU container image. Use the --gpus all parameter to allow the container to use all GPUs.

    Replace <ac2-pytorch-gpu-image> with the address of the PyTorch GPU image.

    sudo docker run -it --rm --gpus all <ac2-pytorch-gpu-image>
  2. Download the sample code.

    git clone https://github.com/pytorch/examples.git
  3. Download the PyTorch sample code. By default, the GPU is used for training.

    python3 examples/mnist/main.py

Clean up the runtime environment

After the training tasks are complete, you can follow these steps to clean up the container environment.

  1. Run the following command to list all containers.

    sudo docker ps -a

    The command output lists information such as the container ID (CONTAINER ID), image (IMAGE), status (STATUS), and container name (NAMES).

    docker-ps-output.png

  2. Run the following commands to remove a container. Before you remove a container, make sure that it is stopped.

    Replace <CONTAINER ID> with the ID of the container that you want to remove.

    sudo docker stop <CONTAINER ID> # Stop the container
    sudo docker rm <CONTAINER ID>   # Remove the container
  3. Run the following command to list the downloaded images.

    sudo docker images

    The output shows information such as the image repository, image tag, and image ID.

    docker-images-output.png

  4. Run the following command to delete the specified image.

    Replace <IMAGE ID> with the ID of the image that you want to delete.

    sudo docker rmi <IMAGE ID>