This topic describes how to use the RAPIDS (Real-time Acceleration Platform for Integrated Data Science) libraries from NVIDIA GPU Cloud (NGC) on a GPU-accelerated instance to accelerate data science and machine learning tasks and improve the utilization of computing resources.

Background information

RAPIDS is an open source suite of data processing and machine learning libraries developed by NVIDIA to enable GPU acceleration for data science and machine learning workloads. For more information about RAPIDS, visit the RAPIDS website.

NVIDIA GPU Cloud (NGC) is a deep learning ecosystem developed by NVIDIA that provides developers with free access to deep learning and machine learning software stacks for building development environments. The NGC website provides RAPIDS Docker images that come with pre-installed development environments.

JupyterLab is an interactive development environment that makes it easy to browse, edit, and run code files on your servers.

Dask is a lightweight big data framework that can make parallel computing more efficient.

This topic provides sample code, modified from the NVIDIA RAPIDS Demo code and datasets, that demonstrates how to use RAPIDS to accelerate an end-to-end task from the Extract-Transform-Load (ETL) phase to the Machine Learning (ML) Training phase on a GPU-accelerated instance. The RAPIDS cuDF library is used in the ETL phase, and the XGBoost model is used in the ML Training phase. The sample code is based on the Dask framework and runs on a single machine.
Note To obtain the official RAPIDS Demo code of NVIDIA, visit Mortgage Demo.


If you do not use a RAPIDS pre-installed image to create a GPU-accelerated instance, perform the following steps to use RAPIDS to accelerate machine learning tasks:

Step 1: Obtain the NGC API key

  1. Create an account on the NGC registration page.
  2. Log on to the NGC website.
  3. In the upper-right corner, click the username and select Setup. On the Setup page, click Get API Key.
  4. On the API Key page, click Generate API Key.
  5. In the Generate a New API Key message, click Confirm.
    Note A new NGC API key overwrites the previous API key. Before you obtain a new API key, you must make sure that the previous API key is no longer needed.
  6. Copy the API key and save it to your local storage device.
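If you later log on to the registry manually (rather than through the prompt described in Step 3), the API key doubles as the Docker password for nvcr.io. The following is a minimal sketch, assuming the key is stored in an environment variable named NGC_API_KEY (the helper name ngc_login is hypothetical):

```shell
# Hypothetical helper: authenticate Docker against the NGC registry.
# For nvcr.io the username is always the literal string "$oauthtoken";
# the password is the NGC API key obtained in the steps above.
ngc_login() {
    printf '%s\n' "$1" | docker login nvcr.io --username '$oauthtoken' --password-stdin
}

# Usage: ngc_login "$NGC_API_KEY"
```

Note that the username for nvcr.io is the literal string $oauthtoken, not your NGC account name.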

Step 2: Obtain the RAPIDS image download command

Perform the following steps to obtain the download command of the RAPIDS image:

  1. Log on to the NGC website.
  2. On the page that appears, click the CONTAINERS tab and search for the RAPIDS image.
  3. Obtain the docker pull command.
    The sample code in this topic is based on the RAPIDS v0.8 image. Therefore, use the image that is tagged with v0.8 when you run the sample code. If you use another image, the corresponding commands may differ.
    1. On the RAPIDS page, click the Tags tab.
    2. Copy the tag information. 0.8-cuda10.0-runtime-ubuntu16.04-gcc5-py3.6 is copied in this example.
    3. Return to the top of the page, find the Pull Command section, and copy the displayed command. Paste the copied command to the text editor. Then, replace the image version with the tag information obtained in the preceding step and save the file.
      In this example, cuda9.2-runtime-ubuntu16.04 is replaced with 0.8-cuda10.0-runtime-ubuntu16.04-gcc5-py3.6.

      The saved docker pull command is used to download the RAPIDS image. For more information about how to download the RAPIDS image, see the Step 3: Deploy the RAPIDS environment section.
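For example, assuming the registry path shown in the Pull Command section is nvcr.io/nvidia/rapidsai/rapidsai (verify this against your own page), the tag substitution can be scripted as follows:

```shell
# Tag copied from the Tags tab in the preceding step.
tag="0.8-cuda10.0-runtime-ubuntu16.04-gcc5-py3.6"
# Registry path as shown in the Pull Command section (assumed here).
image="nvcr.io/nvidia/rapidsai/rapidsai:${tag}"
echo "docker pull ${image}"
```

The echoed command is the one used in Step 3 to download the image.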

Step 3: Deploy the RAPIDS environment

Perform the following steps to deploy the RAPIDS environment:

  1. Create a GPU-accelerated instance. For more information, see Create an instance by using the wizard.
    Configure the following parameters:
    • Instance Type: RAPIDS supports only GPU models based on the NVIDIA Pascal or a later architecture. Therefore, you must select an instance type that meets the GPU requirements. The following instance families are available: gn6i, gn6v, gn5, and gn5i. For more information, see Instance families. We recommend that you select an instance type with larger GPU memory, such as one from the gn6i, gn6v, or gn5 family. In this example, a GPU-accelerated instance with 16 GB of GPU memory is used.
    • Image: Select NVIDIA GPU Cloud Virtual Machine Image in the Image Marketplace dialog box.
    • Public IP Address: Select Assign Public IPv4 Address or attach an elastic IP address (EIP) after you create the GPU-accelerated instance. For more information, see Associate an EIP with an ECS instance.
    • Security Group: Select a security group for which the following ports are enabled:
      • TCP port 22, used for SSH logon
      • TCP port 8888, used to access JupyterLab
      • TCP port 8786 and TCP port 8787, used to access Dask
  2. Connect to the GPU-accelerated instance. For more information, see Connection methods.
  3. Enter the NGC API key that you obtained in Step 1 and press the Enter key to log on to the NGC container registry.
  4. Optional: Run the nvidia-smi command to view GPU information, such as the GPU model and the GPU driver version.
    We recommend that you check the GPU information to identify potential issues. For example, if an earlier GPU driver version is used, the Docker image may not support it.
  5. Run the docker pull command to download the RAPIDS image.
    For more information about how to obtain the docker pull command, see the Step 2: Obtain the RAPIDS image download command section.
    docker pull nvcr.io/nvidia/rapidsai/rapidsai:0.8-cuda10.0-runtime-ubuntu16.04-gcc5-py3.6
  6. Optional: View the downloaded image.
    We recommend that you view the Docker image information to ensure that the correct image is downloaded.
    docker images
  7. Run the NGC container to deploy the RAPIDS environment.
    docker run --runtime=nvidia \
            --rm -it \
            -p 8888:8888 \
            -p 8787:8787 \
            -p 8786:8786 \
            nvcr.io/nvidia/rapidsai/rapidsai:0.8-cuda10.0-runtime-ubuntu16.04-gcc5-py3.6

Step 4: Run RAPIDS Demo

Perform the following steps to run RAPIDS Demo:

  1. Download the dataset and the Demo file on the GPU-accelerated instance.
    # Get apt source address and download demos.
    source_address=$(curl <URL-of-the-demo-file-server> | head -n 1)
    cd /rapids
    wget $source_address/rapids_notebooks_v0.8.tar.gz
    tar -xzvf rapids_notebooks_v0.8.tar.gz
    cd /rapids/rapids_notebooks_v0.8/xgboost
    wget $source_address/data/mortgage/mortgage_2000_1gb.tgz
  2. Start JupyterLab on the GPU-accelerated instance.
    We recommend that you run the following commands to start JupyterLab directly.
    # Run the following command to start JupyterLab and set the password.
    cd /rapids/rapids_notebooks_v0.8/xgboost
    jupyter-lab --allow-root --ip=0.0.0.0 --no-browser --NotebookApp.token='YOUR PASSWORD'
    # Exit JupyterLab.
    sh ../utils/
    • You can also run the sh ../utils/ script to start JupyterLab. However, you cannot set the logon password if you run the script.
    • You can also press Ctrl+C twice to exit JupyterLab.
  3. Open your browser and enter http://<public IP address of your GPU-accelerated instance>:8888 in the address bar to access JupyterLab.
    Note We recommend that you use Google Chrome.
    If you set the logon password when you start JupyterLab, you are directed to a page to enter your password.
  4. Run the NoteBook code.
    A mortgage repayment task is used in this example. For more information, see the Code running section. The NoteBook code includes the following content:
    • xgboost_E2E.ipynb: an XGBoost Demo file. You can double-click this file to view its details, or click the Execute icon to run one cell at a time.
    • mortgage_2000_1gb.tgz: the mortgage repayment training data of the year 2000. Files in the perf folder are split into files of 1 GB to maximize the usage of GPU video memory.

Code running

In this example, XGBoost is used to demonstrate the end-to-end code running process from data pre-processing to training the XGBoost data model. The process involves the following phases:
  • ETL: performed on the GPU-accelerated instance in most cases. Data is extracted, transformed, and then loaded into the data warehouse.
  • Data Conversion: performed on the GPU-accelerated instance. Data processed in the ETL phase is converted into the DMatrix format so that it can be used by XGBoost to train the data model.
  • ML-Training: performed on the GPU-accelerated instance by default. XGBoost is used to train the gradient boosting decision tree (GBDT).
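Because cuDF mirrors the pandas DataFrame API, the join/group/aggregate work of the ETL phase can be illustrated with plain pandas. The tables and column names below are toy stand-ins for illustration, not the real mortgage schema:

```python
import pandas as pd

# Toy stand-ins for the mortgage performance and acquisition tables.
perf = pd.DataFrame({
    "loan_id": [1, 1, 2, 2],
    "current_actual_upb": [100.0, 95.0, 200.0, 190.0],
})
acq = pd.DataFrame({
    "loan_id": [1, 2],
    "orig_interest_rate": [3.5, 4.0],
})

# Join the two tables on loan_id, then group and aggregate, mirroring the
# join/group/aggregate steps of the ETL phase. In the demo, the same API
# calls run on the GPU through cuDF.
joined = perf.merge(acq, on="loan_id", how="left")
agg = joined.groupby("loan_id", as_index=False)["current_actual_upb"].mean()
print(agg)
```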

Perform the following steps to run the NoteBook code:

  1. Prepare the dataset.

    In this example, the shell script downloads the mortgage_2000_1gb.tgz file by default, which contains the mortgage repayment training data of the year 2000.

    If you want to obtain more data for XGBoost model training, you can set the download_url parameter to specify the URL. For more information, visit Mortgage Data.

    The following figure shows an example.
  2. Set relevant parameters.
    The following parameters are available:
    • start_year: the start year from which training data is selected. In the ETL phase, data generated in the range from start_year to end_year is processed.
    • end_year: the end year until which training data is selected. In the ETL phase, data generated in the range from start_year to end_year is processed.
    • train_with_gpu: specifies whether to use GPUs to train the XGBoost model. Default value: True.
    • gpu_count: the number of workers to start. Default value: 1. You can set the parameter based on your needs, but the value must not exceed the number of GPUs on the GPU-accelerated instance.
    • part_count: the number of performance files used for model training. Default value: 2 × gpu_count. If the value is too large, an out-of-memory error occurs in the Data Conversion phase, and the error message is recorded in the NoteBook background log.
    The following figure shows an example.
  3. Start Dask.

    The NoteBook code starts the Dask scheduler and then starts workers based on the gpu_count value for ETL and model training. After Dask is started, you can monitor tasks on the Dask Dashboard. For more information about how to start Dask, see the Dask Dashboard section.

    The following figure shows an example.
  4. Start the ETL phase.

    In this phase, tables are joined, grouped, aggregated, and sliced. The data is stored in the DataFrame format of the cuDF library, which is similar to a pandas DataFrame.

    The following figure shows an example.
  5. Start the Data Conversion phase.

    In this phase, DataFrame-format data is converted into the DMatrix format for XGBoost model training. Each worker processes one DMatrix object.

    The following figure shows an example.
  6. Start the ML Training phase.

    In this phase, model training is started by dask-xgboost, which supports collaborative communication among Dask workers. At the bottom layer, XGBoost is called to perform the actual training.

    The following figure shows an example.
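The parameter rules from step 2 above can be sketched as a small checker. This is an illustration only; the helper resolve_params does not exist in the NoteBook, which reads these values directly:

```python
def resolve_params(gpu_count, total_gpus, part_count=None,
                   start_year=2000, end_year=2000):
    """Hypothetical checker mirroring the NoteBook parameter rules."""
    if not 1 <= gpu_count <= total_gpus:
        raise ValueError("gpu_count must not exceed the GPUs on the instance")
    if start_year > end_year:
        raise ValueError("start_year must not be later than end_year")
    if part_count is None:
        # Default documented above: twice the number of workers.
        part_count = 2 * gpu_count
    return {"gpu_count": gpu_count, "part_count": part_count,
            "start_year": start_year, "end_year": end_year}

print(resolve_params(gpu_count=1, total_gpus=1))
```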

Dask Dashboard

Dask Dashboard supports task progress tracking, task performance problem identification, and fault debugging.

After Dask is started, you can enter http://<public IP address of your GPU-accelerated instance>:8787/status in the address bar of your browser to go to the Dask Dashboard page.

Related functions

The following list describes the operations and their related functions:
  • Download a file: def download_file_from_url(url, filename):
  • Decompress a file: def decompress_file(filename, path):
  • Obtain the number of GPUs on the current machine: def get_gpu_nums():
  • Manage GPU memory:
    • def initialize_rmm_pool():
    • def initialize_rmm_no_pool():
    • def run_dask_task(func, **kwargs):
  • Submit a Dask task:
    • def process_quarter_gpu(year=2000, quarter=1, perf_file=""):
    • def run_gpu_workflow(quarter=1, year=2000, perf_file="", **kwargs):
  • Use cuDF to load data from a CSV file:
    • def gpu_load_performance_csv(performance_path, **kwargs):
    • def gpu_load_acquisition_csv(acquisition_path, **kwargs):
    • def gpu_load_names(**kwargs):
  • Process data and extract features for training machine learning models:
    • def null_workaround(df, **kwargs):
    • def create_ever_features(gdf, **kwargs):
    • def join_ever_delinq_features(everdf_tmp, delinq_merge, **kwargs):
    • def create_joined_df(gdf, everdf, **kwargs):
    • def create_12_mon_features(joined_df, **kwargs):
    • def combine_joined_12_mon(joined_df, testdf, **kwargs):
    • def final_performance_delinquency(gdf, joined_df, **kwargs):
    • def join_perf_acq_gdfs(perf, acq, **kwargs):
    • def last_mile_cleaning(df, **kwargs):
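The first two helpers above can be sketched with the Python standard library alone. This is a hedged sketch; the real NoteBook implementations may differ in details such as progress reporting:

```python
import os
import tarfile
import urllib.request

def download_file_from_url(url, filename):
    """Download url to filename, skipping the transfer if the file exists."""
    if not os.path.isfile(filename):
        urllib.request.urlretrieve(url, filename)

def decompress_file(filename, path):
    """Extract a .tgz/.tar.gz archive, such as mortgage_2000_1gb.tgz, into path."""
    os.makedirs(path, exist_ok=True)
    with tarfile.open(filename, "r:gz") as archive:
        archive.extractall(path)
```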