Elastic GPU Service: Deploy an NGC environment on a GPU-accelerated instance

Last Updated: Jul 21, 2023

This topic describes how to deploy an NVIDIA GPU Cloud (NGC) environment on a GPU-accelerated instance. In this example, the TensorFlow deep learning framework is used.

Prerequisites

  • An NGC account is created from the NGC website.

  • The NGC API key is obtained from the NGC website and saved to your computer.

    Note

    When you log on to an NGC container environment, the system verifies your NGC API key.

Background information

  • NGC is a deep learning ecosystem that is developed by NVIDIA. NGC allows developers to access software stacks for free and use the stacks to build development environments for deep learning.

    Alibaba Cloud provides instances of the gn5 instance family that are configured with NGC, and offers NGC container images that are optimized for NVIDIA Pascal GPUs in Alibaba Cloud Marketplace. With these images, developers can quickly deploy NGC container environments in which the development environment is pre-installed, and access optimized deep learning frameworks. This shortens the time required to develop and deploy services. The NGC container images also provide optimized algorithm frameworks and are continuously updated.

  • The NGC website provides various image versions for mainstream deep learning frameworks, such as Caffe, Caffe2, Microsoft Cognitive Toolkit (CNTK), MXNet, TensorFlow, Theano, and Torch. You can select an image based on your business requirements to deploy an environment.

  • You can deploy an NGC environment on an instance that belongs to one of the following instance families:

    • gn4, gn5, gn5i, gn6v, gn6i, gn6e, gn7i, gn7e, and gn7s

    • ebmgn5i, ebmgn6i, ebmgn6v, ebmgn6e, ebmgn7i, and ebmgn7e
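As a minimal sketch, the following check tests whether an instance type name belongs to one of the instance families listed above. It assumes that instance type names follow the pattern ecs.<family>-<spec>.<size> or ecs.<family>.<size> (for example, ecs.gn5-c4g1.xlarge). The helper names instance_family and is_ngc_supported are hypothetical and are not part of any Alibaba Cloud SDK.

```python
import re

# Instance families listed in this topic.
SUPPORTED_FAMILIES = {
    "gn4", "gn5", "gn5i", "gn6v", "gn6i", "gn6e", "gn7i", "gn7e", "gn7s",
    "ebmgn5i", "ebmgn6i", "ebmgn6v", "ebmgn6e", "ebmgn7i", "ebmgn7e",
}

# Assumed naming pattern: "ecs.<family>[-<spec>].<size>".
_TYPE_PATTERN = re.compile(r"^ecs\.([a-z0-9]+)(?:-[a-z0-9]+)?\.[a-z0-9]+$")


def instance_family(instance_type):
    """Return the family part of an instance type name, or None if it does not match."""
    match = _TYPE_PATTERN.match(instance_type)
    return match.group(1) if match else None


def is_ngc_supported(instance_type):
    """Return True if the instance type belongs to a family listed in this topic."""
    return instance_family(instance_type) in SUPPORTED_FAMILIES
```

For example, is_ngc_supported("ecs.gn5-c4g1.xlarge") returns True under the assumed naming pattern, and a name that does not match the pattern is treated as unsupported.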

Procedure

The following section describes how to create a GPU-accelerated instance and deploy an NGC environment on the instance. In this example, a GPU-accelerated instance of the gn5 instance family is used.

  1. Create a GPU-accelerated instance of the gn5 instance family.

    For more information, see Create an instance by using the wizard. When you configure parameters for the instance, take note of the following items:

    Parameter: Description

    • Region: Select only one of the following regions: China (Qingdao), China (Beijing), China (Hohhot), China (Hangzhou), China (Shanghai), China (Shenzhen), China (Hong Kong), Singapore, Australia (Sydney), US (Silicon Valley), US (Virginia), and Germany (Frankfurt).

    • Instance: Select an instance of the gn5 instance family.

    • Image:

      1. Click Marketplace Image and click Select from Alibaba Cloud Marketplace (including operating system).

      2. In the dialog box that appears, find NVIDIA GPU Cloud Virtual Machine Image.

      3. Click Use.

    • Public IP Address: Select Assign Public IPv4 Address.

      Note

      If you do not select Assign Public IPv4 Address, you can bind an elastic IP address (EIP) to the instance after the instance is created. For more information, see Associate or disassociate an EIP.

    • Security Group: Select a security group. You must enable TCP port 22 of the security group. If you need your instance to support HTTPS or Deep Learning GPU Training System (DIGITS) 6, you must enable TCP port 443 for HTTPS or TCP port 5000 for DIGITS 6.
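After the instance is running, the security group settings can be spot-checked from your computer. The following is a minimal sketch that attempts a TCP connection to each port named above. The helper names is_port_open and check_ports are hypothetical, and the address 203.0.113.10 in the usage note is a placeholder for the public IP address of your instance.

```python
import socket

# Ports named in this topic: 22 (SSH), 443 (HTTPS), and 5000 (DIGITS 6).
REQUIRED_PORTS = {22: "SSH", 443: "HTTPS", 5000: "DIGITS 6"}


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host, ports=REQUIRED_PORTS):
    """Map each port to whether a TCP connection could be established."""
    return {port: is_port_open(host, port) for port in ports}
```

For example, check_ports("203.0.113.10") returns a dictionary keyed by port number. A False value means only that the connection failed: the port may be blocked by the security group, or no service may be listening on it.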

  2. After the GPU-accelerated instance is created, log on to the ECS console to obtain the public IP address of the instance.

  3. Connect to the GPU-accelerated instance.

    Use the logon credential that you selected when you created the instance to connect to the instance.

  4. Enter the NGC API key that you obtained from the NGC website, and then press the Enter key to log on to the NGC container environment.
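The step above relies on the interactive prompt. For scripted setups, a rough equivalent is to log on to the NGC registry with docker login. The sketch below assumes the NGC convention in which the user name is the literal string $oauthtoken and the password is the API key. The helper names and the NGC_API_KEY environment variable in the usage note are chosen here for illustration only.

```python
import subprocess

NGC_REGISTRY = "nvcr.io"
# Literal user name expected by the NGC registry (assumption stated above);
# it is not a shell variable and must not be expanded.
NGC_USERNAME = "$oauthtoken"


def docker_login_command(registry=NGC_REGISTRY, username=NGC_USERNAME):
    """Build the docker login command; the password is supplied on stdin."""
    return ["docker", "login", "--username", username, "--password-stdin", registry]


def ngc_docker_login(api_key):
    """Run docker login against the NGC registry, passing the API key via stdin."""
    return subprocess.run(
        docker_login_command(), input=api_key, text=True, check=True
    )
```

A possible call is ngc_docker_login(os.environ["NGC_API_KEY"]) after importing os. Passing the key on standard input keeps it out of the process list and the shell history.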

  5. Run the nvidia-smi command.

    You can view information about the GPU that the instance uses, such as the GPU model and the driver version. The following figure shows the information about the GPU.
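To read the same information from a script, nvidia-smi can emit CSV output. The following is a minimal sketch that queries the GPU name, driver version, and total memory, and parses the result. The helper names parse_gpu_csv and query_gpus are hypothetical, and the parser assumes that no field value contains a comma.

```python
import subprocess

# Fields requested from nvidia-smi through --query-gpu.
QUERY_FIELDS = ["name", "driver_version", "memory.total"]


def parse_gpu_csv(text, fields=QUERY_FIELDS):
    """Parse 'csv,noheader' output into one dictionary per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        # Skip lines that do not have exactly one value per requested field.
        if len(values) == len(fields):
            gpus.append(dict(zip(fields, values)))
    return gpus


def query_gpus():
    """Run nvidia-smi on this machine and return the parsed GPU list."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader",
    ]
    output = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_gpu_csv(output)
```

Calling query_gpus() on the instance returns a list with one entry per GPU. The actual values depend on the instance type and the installed driver.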

  6. Build a TensorFlow deep learning framework.

    1. Log on to the NGC website. In the left-side navigation pane, click Containers.

    2. On the Containers page, enter TensorFlow in the search box and click the TensorFlow card.

    3. On the TensorFlow page, click Copy Image Path to copy the path of the TensorFlow image version that you want to use, and then run the docker pull command with that path on the instance to download the image.

      For example, to download the tensorflow:18.03 image, copy the path nvcr.io/nvidia/tensorflow:18.03-py3 from the page and run docker pull nvcr.io/nvidia/tensorflow:18.03-py3.

    4. View the downloaded image.

      docker image ls

    5. Run the container to deploy the TensorFlow development environment.

      nvidia-docker run --rm -it nvcr.io/nvidia/tensorflow:18.03-py3

  7. Run the following commands to test TensorFlow:

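    python                                      # Start the Python interpreter in the container.
    import tensorflow as tf
    hello = tf.constant('Hello, TensorFlow!')   # Create a constant string tensor.
    sess = tf.Session()                         # Open a session (TensorFlow 1.x API).
    sess.run(hello)                             # Evaluate the tensor; the interpreter shows the result.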

    If TensorFlow loads the GPU correctly, the result is as shown in the following figure.

  8. Run the following command to save the settings that you configured for the TensorFlow image.

    docker commit -m "commit docker" CONTAINER_ID nvcr.io/nvidia/tensorflow:18.03-py3
    # Replace CONTAINER_ID with the ID of the running container. Run the docker ps command to view the ID.

    Important

    If you do not run this command, the changes are lost the next time you log on.
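Because the container in this example is started with --rm, it is removed when you exit it, so the commit must be run from a second session on the instance while the container is still running. The following is a minimal sketch that looks up the container ID and commits it in one step. The helper names are hypothetical, and the sketch assumes that at least one container created from the image is running. After a commit overwrites the tag, the ancestor filter may no longer match the running container, in which case docker ps remains the way to find the ID.

```python
import subprocess

IMAGE = "nvcr.io/nvidia/tensorflow:18.03-py3"


def running_container_ids(image=IMAGE):
    """Return the IDs of running containers that were created from the image."""
    cmd = ["docker", "ps", "--quiet", "--filter", "ancestor=" + image]
    output = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return output.split()


def commit_command(container_id, target_image=IMAGE, message="commit docker"):
    """Build the docker commit command used in this topic."""
    return ["docker", "commit", "-m", message, container_id, target_image]


def commit_first_running(image=IMAGE):
    """Commit the first running container found for the image and return its ID."""
    ids = running_container_ids(image)
    if not ids:
        raise RuntimeError("no running container found for " + image)
    subprocess.run(commit_command(ids[0], image), check=True)
    return ids[0]
```

Committing to a separate tag instead of nvcr.io/nvidia/tensorflow:18.03-py3 would keep the original image available under its own name. That is a design choice outside the procedure described in this topic.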