This topic describes how to use the NGC-based Real-time Acceleration Platform for Integrated Data Science (RAPIDS) libraries that are installed on a GPU instance to accelerate tasks for data science and machine learning as well as improve the efficiency of computing resources.


RAPIDS is an open-source suite of data processing and machine learning libraries developed by NVIDIA that enables GPU-acceleration for data science workflows and machine learning. For more information about RAPIDS, visit the RAPIDS website.

NVIDIA GPU Cloud (NGC) is a deep learning ecosystem developed by NVIDIA to provide developers with free access to deep learning and machine learning software stacks to build corresponding development environments. The NGC website provides RAPIDS Docker images, which come pre-installed with development environments.

JupyterLab is an interactive development environment that makes it easy to browse, edit, and run code files on your servers.

Dask is a lightweight big data framework that can improve the efficiency of parallel computing.

This topic provides code that is modified based on the NVIDIA RAPIDS Demo code and dataset and demonstrates how to use RAPIDS to accelerate an end-to-end task from the Extract-Transform-Load (ETL) phase to the Machine Learning (ML) Training phase on a GPU instance. The RAPIDS cuDF library is used in the ETL phase whereas the XGBoost model is used in the ML Training phase. The example code is based on the Dask framework and runs on a single machine.
Note: To obtain the official NVIDIA RAPIDS Demo code, visit Mortgage Demo.


  • Register an Alibaba Cloud account and complete real-name verification. For more information, see Account Management FAQ and Real-name registration FAQ.
  • Register an NGC account on the NGC registration page.
  • Obtain an NGC API Key by performing the following steps:
    1. Log on to the NGC website.
    2. In the left-side navigation pane, click CONFIGURATION. On the Setup page that appears, click Get API Key.
    3. On the API Key page that appears, click Generate API Key.
    4. In the Generate a New API Key message that appears, click Confirm.
      Note: A new NGC API Key overwrites any previous API Key. Before you generate a new API Key, make sure that the previous one is no longer needed.
    5. Copy the API Key to your local disk.


If you did not create the GPU instance from an image that has RAPIDS pre-installed, perform the following steps to use RAPIDS to accelerate machine learning tasks:
  1. Obtain the RAPIDS image download command.
  2. Deploy the RAPIDS environment.
  3. Run RAPIDS Demo.

Step 1: Obtain the RAPIDS image download command

  1. Log on to the NGC website.
  2. On the page that appears, click the MACHINE LEARNING tab. On the MACHINE LEARNING tab that appears, click RAPIDS.
  3. Obtain the docker pull command.

    The example code in this topic is based on the RAPIDS v0.8 image. Note that if you use another image, the corresponding command may differ.

    1. On the RAPIDS page that appears, click the Tags tab.
    2. Copy the tag information as needed. For this example, copy 0.8-cuda10.0-runtime-ubuntu16.04-gcc5-py3.6.
    3. Return to the top of the page, find the Pull Command section, and copy the displayed command. Paste the copied command into a text editor. Then, replace the image version with the tag information obtained in the preceding step and save the file as a TXT file. For this example, replace cuda9.2-runtime-ubuntu16.04 with 0.8-cuda10.0-runtime-ubuntu16.04-gcc5-py3.6.

      The saved docker pull command is used to download the RAPIDS image in Step 2: Deploy the RAPIDS environment.
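
The tag replacement described in Step 1 can also be scripted. The following is a minimal sketch and is not part of the NGC tooling: the function name with_tag is hypothetical, and the repository path in the usage line is a placeholder. Use the command that you copied from the Pull Command section instead.

```python
def with_tag(pull_command: str, new_tag: str) -> str:
    """Return pull_command with the image tag replaced by new_tag."""
    command = pull_command.strip()
    image = command.split()[-1]  # the last token is the image reference
    # Only a colon that comes after the last "/" separates the tag, so a
    # registry port such as "registry:5000/image" is not mistaken for a tag.
    slash = image.rfind("/")
    colon = image.rfind(":")
    repository = image[:colon] if colon > slash else image
    prefix = command[: len(command) - len(image)]
    return f"{prefix}{repository}:{new_tag}"


# Hypothetical usage; "example.registry/rapidsai" is a placeholder path.
copied = "docker pull example.registry/rapidsai:cuda9.2-runtime-ubuntu16.04"
print(with_tag(copied, "0.8-cuda10.0-runtime-ubuntu16.04-gcc5-py3.6"))
```

Splitting on the last colon only when it follows the last slash keeps the sketch safe for registries that include a port number.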

Step 2: Deploy the RAPIDS environment

  1. Create a GPU instance.

    For more information about how to create a GPU instance, see Create an instance by using the provided wizard.

    • Instance Type: RAPIDS supports only GPU models that use the NVIDIA Pascal architecture or later. Therefore, you must select an instance type that meets the GPU requirements. The following instance types are available: gn6i, gn6v, and gn5i. For more information, see Instance families. We recommend that you select an instance type with more memory, such as gn6i, gn6v, or gn5. For this example, select a GPU instance that has 16 GB of memory.
    • Image: Select NVIDIA GPU Cloud Virtual Machine Image in the Image Marketplace dialog box.
    • Network Billing Method: Select Assign Public IP Address or attach an EIP Address after you create a GPU instance.
    • Security Group: Select a security group for which the following ports are enabled:
      • TCP port 22, used for SSH logon
      • TCP port 8888, used to access JupyterLab
      • TCP port 8786 and TCP port 8787, used to access Dask
  2. Connect to the GPU instance.

    For more information, see Methods to connect to a Linux instance.

  3. Enter the NGC API Key and press Enter to log on to the NGC container.
  4. Optional. Run the nvidia-smi command to view GPU information, such as GPU model and GPU driver version.

    We recommend that you check the GPU information to identify any potential issues. For example, if an earlier GPU driver version is installed, it may not be supported by the target Docker image.

  5. Run the docker pull command obtained in Step 1: Obtain the RAPIDS image download command to download the RAPIDS image.
    docker pull
  6. Optional. View information about the downloaded image.

    We recommend that you ensure that the correct image is downloaded.

    docker images
  7. Run the NGC container to deploy the RAPIDS environment.
    docker run --runtime=nvidia \
            --rm -it \
            -p 8888:8888 \
            -p 8787:8787 \
            -p 8786:8786 \
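
After the container is running, you may want to confirm from your local machine that the ports listed in the security group settings are reachable. The following sketch uses only the Python standard library. The function names are hypothetical, the service labels follow the Dask default port assignments, and the IP address in the usage comment is a placeholder. Note that ports 8888, 8786, and 8787 accept connections only while JupyterLab and Dask are running.

```python
import socket

# Ports listed in the security group requirements of this topic.
REQUIRED_PORTS = {
    22: "SSH",
    8888: "JupyterLab",
    8786: "Dask scheduler",
    8787: "Dask dashboard",
}


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolved
        return False


def check_ports(host: str, ports=REQUIRED_PORTS) -> dict:
    """Map each required port to True (reachable) or False (not reachable)."""
    return {port: port_is_open(host, port) for port in ports}


# Hypothetical usage; replace the placeholder with your instance's public IP:
# for port, ok in check_ports("203.0.113.10").items():
#     print(port, REQUIRED_PORTS[port], "open" if ok else "closed")
```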

Step 3: Run RAPIDS Demo

  1. Download the dataset and the Demo file on a GPU instance.
    # Obtain the apt source address and download demos.
    source_address=$(curl|head -n 1)
    cd /rapids
    wget $source_address/rapids_notebooks_v0.8.tar.gz
    tar -xzvf rapids_notebooks_v0.8.tar.gz
    cd /rapids/rapids_notebooks_v0.8/xgboost
    wget $source_address/data/mortgage/mortgage_2000_1gb.tgz
  2. Start JupyterLab on the GPU instance by running the following commands:

    We recommend that you run these commands directly.

    # Run the following command to start JupyterLab and set the logon password:
    cd /rapids/rapids_notebooks_v0.8/xgboost
    jupyter-lab --allow-root --ip= --no-browser --NotebookApp.token='YOUR PASSWORD'
    # Exit JupyterLab.
    sh ../utils/
    • You can also run the sh ../utils/ script to start JupyterLab. However, you cannot set the logon password if you run the script.
    • You can also press Ctrl+C twice to exit.
  3. Open your browser and enter http://<IP address of your GPU instance>:8888 in the address bar to access JupyterLab.
    Note: We recommend that you use Google Chrome.
    If you set the logon password when you start JupyterLab, you will be directed to a page to enter your password.
  4. Run the notebook code.

    A mortgage repayment task is used in this example. For more information, see Code running. The notebook code includes the following files:

    • xgboost_E2E.ipynb: an XGBoost Demo file. You can double-click this file to view its details, or click the Execute icon to run one cell at a time.
    • mortgage_2000_1gb.tgz: the mortgage repayment training data of year 2000. Files in the perf folder are split into files of 1 GB each to maximize the usage of GPU memory.
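
Before you run the notebook, you may want to confirm what the archive downloaded in step 1 contains. The following sketch uses the standard tarfile module. The function names list_archive and summarize are hypothetical, and the path in the usage comment assumes the directory used in step 1.

```python
import tarfile


def list_archive(path: str):
    """Return (member name, size in bytes) for every regular file in the archive."""
    # "r:*" lets tarfile detect the compression type. Listing a large
    # compressed archive requires reading it once, so this can take a while.
    with tarfile.open(path, "r:*") as archive:
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]


def summarize(members, folder: str = "perf"):
    """Return (file count, total bytes) for files located under the given folder."""
    sizes = [size for name, size in members if folder in name.split("/")]
    return len(sizes), sum(sizes)


# Hypothetical usage, assuming the archive was downloaded as in step 1:
# members = list_archive("/rapids/rapids_notebooks_v0.8/xgboost/mortgage_2000_1gb.tgz")
# count, total = summarize(members, "perf")
# print(f"{count} performance files, {total / 2**30:.2f} GiB in total")
```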

Code running

In this example, XGBoost is used to demonstrate the end-to-end process from data pre-processing to training the XGBoost model. The process involves the following three phases:

  • Extract-Transform-Load (ETL): performed on the GPU instance in most cases. Data is extracted, transformed, and then loaded into the data warehouse.
  • Data Conversion: performed on the GPU instance. Data processed in the ETL phase is converted into the DMatrix format so that XGBoost can use it to train the model.
  • ML-Training: performed on the GPU instance by default. XGBoost is used to train the gradient boosting decision tree (GBDT).

The notebook code runs as follows:

  1. Prepare the dataset.

    In this example, the Shell script downloads the mortgage repayment training data of year 2000 (mortgage_2000_1gb.tgz) by default.

    If you want to obtain more data for XGBoost model training, you can set the download_url parameter to specify the URL as needed. For more information, visit Mortgage Data.

    The following figure shows an example.
  2. Set one or more parameters as needed.
    The following parameters are available:
    • start_year: the first year of the training data to select. In the ETL phase, data generated between start_year and end_year is processed.
    • end_year: the last year of the training data to select. In the ETL phase, data generated between start_year and end_year is processed.
    • train_with_gpu: specifies whether to use the GPU for XGBoost model training. Default value: True.
    • gpu_count: the number of workers to start. Default value: 1. The value must not exceed the number of GPUs in the GPU instance.
    • part_count: the number of performance files used for model training. Default value: 2 * gpu_count. If the value is too large, an insufficient memory error occurs in the Data Conversion phase and the error message is written to the log files.
    The following figure shows an example.
  3. Start Dask.

    The notebook code starts the Dask scheduler and then starts workers based on the value of gpu_count for ETL and model training. After Dask is started, you can monitor tasks on the Dask Dashboard. For more information, see Dask Dashboard.

    The following figure shows an example.
  4. Start the ETL phase.

    In this phase, tables are joined, grouped, aggregated, and split. The data is stored as cuDF DataFrames, which are similar to pandas DataFrames.

    The following figure shows an example.
  5. Start the Data Conversion phase.

    In this phase, data in the DataFrame format is converted into the DMatrix format for XGBoost model training. Each worker processes one DMatrix object.

    The following figure shows an example.
  6. Start the ML Training phase.

    In this phase, dask-xgboost starts model training and coordinates communication among the Dask workers. Under the hood, dask-xgboost calls XGBoost to perform the training.

    The following figure shows an example.
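
The parameters set in step 2 determine how much work the later phases receive. The following sketch is an illustration and not the notebook's own code. DemoConfig, validate_gpu_count, and plan_tasks are hypothetical names, the year defaults are assumptions based on the year-2000 dataset, and the split into four quarters per year is an assumption suggested by the quarter argument of process_quarter_gpu.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class DemoConfig:
    start_year: int = 2000            # assumed default, matches the demo dataset
    end_year: int = 2000              # assumed default, matches the demo dataset
    train_with_gpu: bool = True       # documented default
    gpu_count: int = 1                # documented default
    part_count: Optional[int] = None  # None means "use 2 * gpu_count"

    def __post_init__(self):
        if self.end_year < self.start_year:
            raise ValueError("end_year must not be earlier than start_year")
        if self.gpu_count < 1:
            raise ValueError("gpu_count must be at least 1")
        if self.part_count is None:
            self.part_count = 2 * self.gpu_count  # documented default


def validate_gpu_count(config: DemoConfig, available_gpus: int) -> None:
    """Reject a worker count that exceeds the GPUs present in the instance."""
    if config.gpu_count > available_gpus:
        raise ValueError(
            f"gpu_count={config.gpu_count} exceeds {available_gpus} available GPU(s)"
        )


def plan_tasks(config: DemoConfig) -> List[Tuple[int, int]]:
    """List the (year, quarter) pairs covered by start_year through end_year."""
    return [
        (year, quarter)
        for year in range(config.start_year, config.end_year + 1)
        for quarter in range(1, 5)
    ]


config = DemoConfig(gpu_count=2)
validate_gpu_count(config, available_gpus=2)
print(config.part_count, plan_tasks(config))
```

Keeping part_count derived from gpu_count unless it is set explicitly mirrors the documented default and makes it harder to request more performance files than the GPU memory can hold.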

Dask Dashboard

Dask Dashboard supports task progress tracking, task performance problem identification, and fault debugging.

After Dask is started, enter http://<IP address of your GPU instance>:8787/status in the address bar of your browser to go to the Dask Dashboard.
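
If the dashboard page does not load, a quick scripted check can tell you whether the status page responds at all. The following sketch uses only the Python standard library. The function names are hypothetical and the IP address in the usage comment is a placeholder.

```python
import http.client
import urllib.request


def dashboard_url(host: str, port: int = 8787) -> str:
    """Build the status page URL described in this section."""
    return f"http://{host}:{port}/status"


def dashboard_is_up(host: str, port: int = 8787, timeout: float = 3.0) -> bool:
    """Return True if the status page answers with HTTP 200 within timeout."""
    try:
        with urllib.request.urlopen(dashboard_url(host, port), timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are subclasses of OSError, so refused
        # connections, timeouts, and non-2xx responses all end up here.
        return False


# Hypothetical usage; replace the placeholder with your instance's public IP:
# print(dashboard_is_up("203.0.113.10"))
```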

Related functions

Operations and the corresponding functions:
Download a file.
  • def download_file_from_url(url, filename):
Decompress a file.
  • def decompress_file(filename, path):
Obtain the number of GPUs in the current machine.
  • def get_gpu_nums():
Manage the GPU memory.
  • def initialize_rmm_pool():
  • def initialize_rmm_no_pool():
  • def run_dask_task(func, **kwargs):
Submit a Dask task.
  • def process_quarter_gpu(year=2000, quarter=1, perf_file=""):
  • def run_gpu_workflow(quarter=1, year=2000, perf_file="", **kwargs):
Use cuDF to load data from a CSV file.
  • def gpu_load_performance_csv(performance_path, **kwargs):
  • def gpu_load_acquisition_csv(acquisition_path, **kwargs):
  • def gpu_load_names(**kwargs):
Process data and extract features for training machine learning models.
  • def null_workaround(df, **kwargs):
  • def create_ever_features(gdf, **kwargs):
  • def join_ever_delinq_features(everdf_tmp, delinq_merge, **kwargs):
  • def create_joined_df(gdf, everdf, **kwargs):
  • def create_12_mon_features(joined_df, **kwargs):
  • def combine_joined_12_mon(joined_df, testdf, **kwargs):
  • def final_performance_delinquency(gdf, joined_df, **kwargs):
  • def join_perf_acq_gdfs(perf, acq, **kwargs):
  • def last_mile_cleaning(df, **kwargs):
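
The notebook's implementations of these functions are not reproduced in this topic. As a rough illustration of what the first three helpers might look like, the following sketch uses only the Python standard library. Only the function names and signatures come from the list above. The bodies are assumptions, and get_gpu_nums assumes that nvidia-smi is on the PATH and returns 0 when it is not.

```python
import shutil
import subprocess
import tarfile
import urllib.request


def download_file_from_url(url, filename):
    """Download url and save the response body as filename."""
    with urllib.request.urlopen(url) as response, open(filename, "wb") as out:
        shutil.copyfileobj(response, out)  # streams in chunks, not all at once


def decompress_file(filename, path):
    """Extract a tar archive (optionally compressed) into path."""
    # Only extract archives from a source you trust.
    with tarfile.open(filename, "r:*") as archive:
        archive.extractall(path)


def get_gpu_nums():
    """Count the GPUs reported by nvidia-smi; return 0 if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return 0
    result = subprocess.run(
        ["nvidia-smi", "--list-gpus"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return 0
    # Each GPU is reported on its own line that starts with "GPU <index>:".
    return sum(1 for line in result.stdout.splitlines() if line.startswith("GPU "))
```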