This topic describes how to use RAPIDS libraries (based on the NGC environment) that are installed on a GPU instance to accelerate tasks for data science and machine learning and improve the efficiency of computing resources.

Background information

The following concepts are used in the example provided in this topic:

  • RAPIDS (Real-time Acceleration Platform for Integrated Data Science) is a suite of GPU acceleration libraries developed by NVIDIA for data science and machine learning. For more information, visit the RAPIDS website.
  • NVIDIA GPU Cloud (NGC) is a deep learning ecosystem developed by NVIDIA to provide developers with free access to deep learning and machine learning software stacks, which allows them to quickly build the corresponding environments. The NGC website provides RAPIDS Docker images that come with pre-installed environments.
  • JupyterLab is an interactive development environment that helps you browse, edit, and run code files on your servers.
  • Dask is a lightweight framework for parallel computing that can improve the efficiency of big data processing.
  • The example provided in this topic uses code modified from the NVIDIA RAPIDS Demo and its corresponding dataset to demonstrate how to use RAPIDS to accelerate an end-to-end task, from ETL to ML training, on a GPU instance. The cuDF library of RAPIDS is used in the Extract-Transform-Load (ETL) phase, and the XGBoost model is used in the ML training phase. The example code is based on the Dask framework and runs on a single machine.
Note To obtain the official RAPIDS Demo code of NVIDIA, see Mortgage Demo.


Preparations

  • Register an Alibaba Cloud account and complete the real-name verification. For more information, see Account management FAQs and Real-name registration FAQs.
  • Go to the NGC registration page and register an account.
  • Obtain an NGC API Key by following these steps:
    1. Log on to the NGC website.
    2. Go to the CONFIGURATION page, and then click Get API Key.
    3. Click Generate API Key.
    4. In the displayed dialog box, click Confirm.
      Note A new NGC API Key overwrites any previous API Key. Before you generate a new API Key, make sure that the previous one is no longer in use.
    5. Copy the API Key to your local disk.

Procedure 1: Obtain the RAPIDS image download command

  1. Log on to the NGC website.
  2. Go to the MACHINE LEARNING page, and then click the RAPIDS image.

  3. Obtain the docker pull command.

    The example code in this topic is based on the RAPIDS v0.6 image. Note that if you use another image, the corresponding command may differ.

    1. Click the Tags tab.

    2. Locate and copy the Tag information. In this example, select 0.6-cuda10.0-runtime-ubuntu16.04-gcc5-py3.6. Then, open a text editor and paste the Tag information.

    3. Find the Pull Command area and copy the displayed command. Then, paste the command to the text editor. After that, replace the image version with the Tag information from the preceding step, and save the TXT file. In this example, replace cuda9.2-runtime-ubuntu16.04 with 0.6-cuda10.0-runtime-ubuntu16.04-gcc5-py3.6.

Procedure 2: Deploy the RAPIDS environment

  1. Create a GPU instance.

    For more information, see Create an instance by using the wizard.

    • Instance Type: RAPIDS can be deployed only on GPU instances that use the NVIDIA Pascal or a later architecture. Currently, you can select the following instance types: gn6i, gn6v, gn5, and gn5i. For more information, see Instance type families. We recommend that you select an instance type with larger GPU memory, such as gn6i, gn6v, or gn5. In this example, a GPU instance with 16 GB of GPU memory is selected.
    • Image: In this example, use the NVIDIA GPU Cloud VM Image.

    • Network Billing Method: Select Assign public IP, or Attach an ENI after you create a GPU instance.
    • Security Group: Select a security group that enables the following ports:
      • TCP port 22, which is used to enable logon through SSH
      • TCP port 8888, which is used to access JupyterLab
      • TCP port 8786 and TCP port 8787, which are used to access Dask
  2. Connect to the GPU instance.

    For more information, see Connect to a Linux instance.

  3. When prompted, enter the NGC API Key and press Enter to log on to the NGC container.

  4. Optional. Run the nvidia-smi command to view GPU information, such as GPU model and GPU driver version.

    We recommend that you check the GPU information to identify any potential issues. For example, if an earlier NGC driver version is used, it may not be supported by the target Docker image.

  5. Run the docker pull command obtained in Procedure 1: Obtain the RAPIDS image download command to download the RAPIDS image.
    docker pull nvcr.io/nvidia/rapidsai/rapidsai:0.6-cuda10.0-runtime-ubuntu16.04-gcc5-py3.6
  6. Optional. Check the information of the downloaded image to ensure that the correct image is downloaded.
    docker images
  7. Run the NGC container to deploy the RAPIDS environment.
    docker run --runtime=nvidia \
            --rm -it \
            -p 8888:8888 \
            -p 8787:8787 \
            -p 8786:8786 \
            nvcr.io/nvidia/rapidsai/rapidsai:0.6-cuda10.0-runtime-ubuntu16.04-gcc5-py3.6

Procedure 3: Run RAPIDS Demo

  1. On the GPU instance, download the dataset and the Demo file.
    # Obtain the apt source address and download the script (used to download training data, notebook, and utils).
    $ source_address=$(curl|head -n 1)
    $ source_address="${source_address}/opsx/ecs/linux/binary/machine_learning/"
    $ wget $source_address/rapids_notebooks_v0.6/utils/
    # Run the downloaded script.
    $ sh ./
    # Go to the download directory to view the downloaded file.
    $ apt update
    $ apt install tree
    $ tree /rapids/rapids_notebooks_v0.6/
    We recommend that you check that the downloaded content contains five folders and 16 files.

  2. Start JupyterLab on the GPU instance by running the following commands:
    # Go to the working directory.
    $ cd /rapids/rapids_notebooks_v0.6/xgboost
    # Run the following command to start JupyterLab and set the logon password:
    $ jupyter-lab --allow-root --ip=0.0.0.0 --no-browser --NotebookApp.token='logon password'
    # Exit.
    $ sh ../utils/

    You can also run the $ sh ../utils/ script to start JupyterLab. However, you cannot set the logon password if you start JupyterLab this way. To exit, press Ctrl+C twice.

  3. Open your browser and go to http://<IP address of your GPU instance>:8888 to access JupyterLab. If you set a logon password when you started JupyterLab, enter the password as prompted.
    Note We recommend that you use Google Chrome.

  4. Run the NoteBook code.

    Log on to JupyterLab and view the NoteBook code. A mortgage repayment task is used in this example. For more information, see Code running process. Details of the example code include:

    • A folder named mortgage_2000_1gb that contains decompressed training data. Specifically, this folder contains the acq folder, perf folder and names.csv file.
    • A file named xgboost_E2E.ipynb, which is an XGBoost Demo file. You can double-click this file to view its details, or click the Execute button to execute one cell at a time.

    • A file named mortgage_2000_1gb.tgz, which contains the mortgage repayment training data of the year 2000. The files in the perf folder are split so that each file is no larger than 1 GB, which helps utilize the GPU memory more efficiently.

Code running process

In this example, XGBoost is used to demonstrate the end-to-end code running process from data pre-processing to training the XGBoost data model. The process involves the following three phases:

  • ETL (Extract-Transform-Load), which is completed on the GPU instance to extract and transform the data and then load the data to the data warehouse.
  • Data Conversion, which is completed on the GPU instance to convert the data processed in the ETL phase into DMatrix-format data so that XGBoost can be used to train the data model.
  • ML Training, which is completed on the GPU instance by default, where XGBoost is used to train a gradient boosting decision tree (GBDT) model.

The NoteBook code is run as follows:

  1. Prepare the dataset.

    In this example, a Shell script downloads the mortgage repayment training data (mortgage_2000_1gb.tgz) and decompresses it to the mortgage_2000_1gb folder.

    If you want to obtain more data for XGBoost model training, you can set the download_url parameter to specify the required URL. For more information, see Mortgage Data.

    The following figure shows an example.
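    The demo wraps this step in the helper functions download_file_from_url and decompress_file. A minimal standard-library sketch of what such helpers do (the demo's actual implementations may differ):

    ```python
    import tarfile
    import urllib.request


    def download_file_from_url(url, filename):
        """Download a file from a URL to the local disk (sketch)."""
        urllib.request.urlretrieve(url, filename)


    def decompress_file(filename, path):
        """Extract a .tgz archive, such as mortgage_2000_1gb.tgz, into path."""
        with tarfile.open(filename, "r:gz") as archive:
            archive.extractall(path)
    ```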

  2. Set one or more parameters as needed.
    • start_year: Specifies the start year from which training data is selected. In the ETL phase, data generated between start_year and end_year is processed.
    • end_year: Specifies the end year up to which training data is selected.
    • train_with_gpu: Specifies whether to use the GPU for XGBoost model training. Default value: True.
    • gpu_count: Specifies the number of workers to be started. Default value: 1. Set this parameter to a value that is no greater than the number of GPUs in the GPU instance.
    • part_count: Specifies the number of performance files used for data model training. Default value: 2 × gpu_count. If the value is too large, an insufficient memory error occurs in the Data Conversion phase and an error message is displayed in the NoteBook backend.

    The following figure shows an example.
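    The parameters above can be sketched as follows. The values are examples only; the names match the table, but the notebook's own defaults and validation may differ.

    ```python
    # Example parameter settings for the demo NoteBook (illustrative values).
    start_year = 2000        # first year of training data processed in the ETL phase
    end_year = 2000          # last year of training data processed in the ETL phase
    train_with_gpu = True    # train the XGBoost model on GPU (default: True)
    gpu_count = 1            # number of workers; no greater than the number of GPUs
    part_count = 2 * gpu_count  # default number of performance files to train on
    ```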

  3. Start Dask.

    The NoteBook code starts the Dask Scheduler and, according to the gpu_count setting, also starts the workers that are used for ETL and data model training.

    The following figure shows an example.
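    The scheduler-plus-workers setup can be sketched with dask.distributed. This is a simplified, CPU-only stand-in: the demo starts one GPU worker per GPU, whereas this example uses in-process workers.

    ```python
    from dask.distributed import Client, LocalCluster

    gpu_count = 1  # the demo starts one worker per GPU
    # Start a local Dask scheduler plus workers (CPU threads here, not GPUs).
    cluster = LocalCluster(n_workers=gpu_count, processes=False)
    client = Client(cluster)

    # Work submitted to the client is distributed across the workers.
    result = client.submit(lambda x: x + 1, 41).result()
    print(result)  # 42

    client.close()
    cluster.close()
    ```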

  4. Start the ETL phase.

    In this phase, tables are joined, grouped, aggregated, and split. The data format is the DataFrame of the cuDF library (similar to the pandas DataFrame).

    The following figure shows an example.
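    Because cuDF deliberately mirrors the pandas DataFrame API, the kind of work done in this phase can be sketched with pandas. The toy tables below are illustrative stand-ins; in the demo, the same merge/groupby calls run on cuDF DataFrames on the GPU.

    ```python
    import pandas as pd

    # Toy stand-ins for the performance (perf) and acquisition (acq) tables.
    perf = pd.DataFrame({"loan_id": [1, 1, 2], "delinquency_12": [0, 1, 0]})
    acq = pd.DataFrame({"loan_id": [1, 2], "orig_interest_rate": [3.5, 4.0]})

    # Join the tables, then group and aggregate -- the same pattern the
    # ETL phase applies with cuDF.
    joined = perf.merge(acq, on="loan_id", how="left")
    features = joined.groupby("loan_id", as_index=False).agg(
        {"delinquency_12": "max", "orig_interest_rate": "first"}
    )
    print(features)
    ```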

  5. Start the Data Conversion phase.

    In this phase, DataFrame-format data is converted into DMatrix-format data for XGBoost model training. Each worker processes one DMatrix object.

    The following figure shows an example.

  6. Start the ML Training phase.

    In this phase, data model training is started by dask-xgboost, which supports collaborative communication among Dask workers. At the bottom layer, xgboost is called to perform the actual model training.

    The following figure shows an example.
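    At a much smaller scale, the Data Conversion and ML Training phases reduce to the standard xgboost calls below. This is a single-worker, CPU-only sketch with synthetic data; in the demo, each Dask worker builds its own DMatrix and dask-xgboost coordinates the training, with a GPU tree method used when train_with_gpu is True.

    ```python
    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.random((256, 4))             # toy feature matrix
    y = (X[:, 0] > 0.5).astype(int)      # toy binary label

    # Data Conversion: wrap the features and labels in a DMatrix.
    dtrain = xgb.DMatrix(X, label=y)

    # ML Training: fit a gradient boosting decision tree (GBDT) model.
    params = {"objective": "binary:logistic", "max_depth": 4}
    booster = xgb.train(params, dtrain, num_boost_round=20)
    preds = booster.predict(dtrain)      # one prediction per training row
    ```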

Related functions

Download a file.
  • def download_file_from_url(url, filename):
Decompress a file.
  • def decompress_file(filename, path):
Obtain the number of GPUs on the current machine.
  • def get_gpu_nums():
Manage the GPU memory.
  • def initialize_rmm_pool():
  • def initialize_rmm_no_pool():
  • def run_dask_task(func, **kwargs):
Submit a Dask task.
  • def process_quarter_gpu(year=2000, quarter=1, perf_file=""):
  • def run_gpu_workflow(quarter=1, year=2000, perf_file="", **kwargs):
Use cuDF to load data from a CSV file.
  • def gpu_load_performance_csv(performance_path, **kwargs):
  • def gpu_load_acquisition_csv(acquisition_path, **kwargs):
  • def gpu_load_names(**kwargs):
Process and extract characteristics of data for training machine learning models.
  • def null_workaround(df, **kwargs):
  • def create_ever_features(gdf, **kwargs):
  • def join_ever_delinq_features(everdf_tmp, delinq_merge, **kwargs):
  • def create_joined_df(gdf, everdf, **kwargs):
  • def create_12_mon_features(joined_df, **kwargs):
  • def combine_joined_12_mon(joined_df, testdf, **kwargs):
  • def final_performance_delinquency(gdf, joined_df, **kwargs):
  • def join_perf_acq_gdfs(perf, acq, **kwargs):
  • def last_mile_cleaning(df, **kwargs):
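As an illustration of the feature-processing helpers, null_workaround replaces nulls in numeric columns with a sentinel value so that later steps need not handle missing data. A simplified pandas sketch of that idea (the demo's version operates on cuDF DataFrames, and its exact column handling may differ):

```python
import pandas as pd


def null_workaround(df, sentinel=-1):
    """Replace nulls in numeric columns with a sentinel value (sketch)."""
    for column, dtype in df.dtypes.items():
        if str(dtype) in ("float64", "float32", "int64", "int32"):
            df[column] = df[column].fillna(sentinel)
    return df


frame = pd.DataFrame({"rate": [3.5, None, 4.0], "name": ["a", None, "c"]})
cleaned = null_workaround(frame)
print(cleaned["rate"].tolist())  # [3.5, -1.0, 4.0]
```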