This topic describes how to use the RAPIDS (Real-time Acceleration Platform for Integrated Data Science) libraries from NVIDIA GPU Cloud (NGC) on a GPU-accelerated instance to accelerate data science and machine learning tasks and improve the utilization of computing resources.

Background information

RAPIDS is an open source suite of data processing and machine learning libraries developed by NVIDIA to enable GPU acceleration for data science and machine learning workloads. For more information about RAPIDS, visit the RAPIDS website.

NVIDIA GPU Cloud (NGC) is a deep learning ecosystem developed by NVIDIA that provides developers with free access to deep learning and machine learning software stacks for building development environments. The NGC website provides RAPIDS Docker images that come with pre-installed development environments.

JupyterLab is an interactive development environment that makes it easy to browse, edit, and run code files on your servers.

Dask is a lightweight big data framework that can make parallel computing more efficient.

This topic provides sample code, modified from the NVIDIA RAPIDS Demo code and datasets, that demonstrates how to use RAPIDS to accelerate an end-to-end task from the Extract-Transform-Load (ETL) phase to the Machine Learning (ML) Training phase on a GPU-accelerated instance. The RAPIDS cuDF library is used in the ETL phase, and the XGBoost model is used in the ML Training phase. The sample code is based on the Dask framework and runs on a single machine.
Note To obtain the official RAPIDS Demo code of NVIDIA, visit Mortgage Demo.


If you do not use a RAPIDS pre-installed image to create a GPU-accelerated instance, perform the following steps to use RAPIDS to accelerate machine learning tasks:

Step 1: Obtain the NGC API key

  1. Create an account on the NGC registration page.
  2. Log on to the NGC website.
  3. In the upper-right corner, click the username and select Setup. On the Setup page, click Get API Key.
  4. On the API Key page, click Generate API Key.
  5. In the Generate a New API Key message, click Confirm.
    Note A new NGC API key overwrites the previous API key. Before you obtain a new API key, you must make sure that the previous API key is no longer needed.
  6. Copy the API key and save it to your local storage device.
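If you later log on to the registry manually (rather than through the prompt described in Step 3), the API key doubles as the Docker password for nvcr.io. The following is a minimal sketch, assuming the key is stored in an environment variable named NGC_API_KEY (the helper name ngc_login is hypothetical):

```shell
# Hypothetical helper: authenticate Docker against the NGC registry.
# For nvcr.io the username is always the literal string "$oauthtoken";
# the password is the NGC API key obtained in the steps above.
ngc_login() {
    printf '%s\n' "$1" | docker login nvcr.io --username '$oauthtoken' --password-stdin
}

# Usage: ngc_login "$NGC_API_KEY"
```

Note that the username for nvcr.io is the literal string $oauthtoken, not your NGC account name.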

Step 2: Obtain the RAPIDS image download command

Perform the following steps to obtain the download command of the RAPIDS image:

  1. Log on to the NGC website.
  2. On the page that appears, click the CONTAINERS tab and search for the RAPIDS image.
  3. Obtain the docker pull command.
    The sample code in this topic is based on the RAPIDS v0.8 image. Therefore, use the image that is tagged with v0.8 when you run the sample code. If you use another image, the corresponding commands may differ.
    1. On the RAPIDS page, click the Tags tab.
    2. Copy the tag information. 0.8-cuda10.0-runtime-ubuntu16.04-gcc5-py3.6 is copied in this example.
    3. Return to the top of the page, find the Pull Command section, and copy the displayed command. Paste the copied command to the text editor. Then, replace the image version with the tag information obtained in the preceding step and save the file.
      In this example, cuda9.2-runtime-ubuntu16.04 is replaced with 0.8-cuda10.0-runtime-ubuntu16.04-gcc5-py3.6.

      The saved docker pull command is used to download the RAPIDS image. For more information about how to download the RAPIDS image, see the Step 3: Deploy the RAPIDS environment section.
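For example, assuming the registry path shown in the Pull Command section is nvcr.io/nvidia/rapidsai/rapidsai (verify this against your own page), the tag substitution can be scripted as follows:

```shell
# Tag copied from the Tags tab in the preceding step.
tag="0.8-cuda10.0-runtime-ubuntu16.04-gcc5-py3.6"
# Registry path as shown in the Pull Command section (assumed here).
image="nvcr.io/nvidia/rapidsai/rapidsai:${tag}"
echo "docker pull ${image}"
```

The echoed command is the one used in Step 3 to download the image.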

Step 3: Deploy the RAPIDS environment

Perform the following steps to deploy the RAPIDS environment:

  1. Create a GPU-accelerated instance. For more information, see Create an instance by using the wizard.
    Configure the following parameters:
    • Instance Type: RAPIDS supports only GPU models based on the NVIDIA Pascal or a later architecture. Therefore, you must select an instance type that meets the GPU requirements. The following instance families are available: gn6i, gn6v, gn5, and gn5i. For more information, see Instance families. We recommend that you select an instance type with larger GPU memory, such as one from the gn6i, gn6v, or gn5 family. In this example, a GPU-accelerated instance with 16 GB of GPU memory is used.
    • Image: Select NVIDIA GPU Cloud Virtual Machine Image in the Image Marketplace dialog box.
    • Public IP Address: Select Assign Public IPv4 Address or attach an elastic IP address (EIP) after you create the GPU-accelerated instance. For more information, see Associate an EIP with an ECS instance.
    • Security Group: Select a security group for which the following ports are enabled:
      • TCP port 22, used for SSH logon
      • TCP port 8888, used to access JupyterLab
      • TCP port 8786 and TCP port 8787, used to access Dask
  2. Connect to the GPU-accelerated instance. For more information, see Connection methods.
  3. Enter the NGC API key that you obtained in Step 1 and press the Enter key to log on to the NGC container registry.
  4. Optional: Run the nvidia-smi command to view GPU information, such as the GPU model and the GPU driver version.
    We recommend that you check the GPU information to identify potential issues. For example, if an earlier GPU driver version is used, the Docker image may not support it.
  5. Run the docker pull command to download the RAPIDS image.
    For more information about how to obtain the docker pull command, see the Step 2: Obtain the RAPIDS image download command section.
    docker pull nvcr.io/nvidia/rapidsai/rapidsai:0.8-cuda10.0-runtime-ubuntu16.04-gcc5-py3.6
  6. Optional: View the downloaded image.
    We recommend that you view the Docker image information to ensure that the correct image is downloaded.
    docker images
  7. Run the NGC container to deploy the RAPIDS environment.
    docker run --runtime=nvidia \
            --rm -it \
            -p 8888:8888 \
            -p 8787:8787 \
            -p 8786:8786 \
            nvcr.io/nvidia/rapidsai/rapidsai:0.8-cuda10.0-runtime-ubuntu16.04-gcc5-py3.6

Step 4: Run RAPIDS Demo

Perform the following steps to run RAPIDS Demo:

  1. Download the dataset and the Demo file on the GPU-accelerated instance.
    # Get apt source address and download demos.
    source_address=$(curl <URL-of-the-demo-file-server> | head -n 1)
    cd /rapids
    wget $source_address/rapids_notebooks_v0.8.tar.gz
    tar -xzvf rapids_notebooks_v0.8.tar.gz
    cd /rapids/rapids_notebooks_v0.8/xgboost
    wget $source_address/data/mortgage/mortgage_2000_1gb.tgz
  2. Start JupyterLab on the GPU-accelerated instance.
    We recommend that you run the following commands to start JupyterLab directly.
    # Run the following command to start JupyterLab and set the password.
    cd /rapids/rapids_notebooks_v0.8/xgboost
    jupyter-lab --allow-root --ip=0.0.0.0 --no-browser --NotebookApp.token='YOUR PASSWORD'
    # Exit JupyterLab.
    sh ../utils/
    • You can also run the sh ../utils/ script to start JupyterLab. However, you cannot set the logon password if you run the script.
    • You can also press Ctrl+C twice to exit JupyterLab.
  3. Open your browser and enter http://<public IP address of your GPU-accelerated instance>:8888 in the address bar to access JupyterLab.
    Note We recommend that you use Google Chrome.
    If you set the logon password when you start JupyterLab, you are directed to a page to enter your password.
  4. Run the NoteBook code.
    A mortgage repayment task is used in this example. For more information, see the Code running section. The NoteBook code includes the following content:
    • xgboost_E2E.ipynb: an XGBoost Demo file. You can double-click this file to view its details, or click the Execute icon to run one cell at a time.
    • mortgage_2000_1gb.tgz: the mortgage repayment training data of the year 2000. Files in the perf folder are split into files of 1 GB to maximize the usage of GPU video memory.

Code running

In this example, XGBoost is used to demonstrate the end-to-end code running process from data pre-processing to training the XGBoost data model. The process involves the following phases:
  • ETL: performed on the GPU-accelerated instance in most cases. Data is extracted, transformed, and then loaded into the data warehouse.
  • Data Conversion: performed on the GPU-accelerated instance. Data processed in the ETL phase is converted into the DMatrix format so that it can be used by XGBoost to train the data model.
  • ML-Training: performed on the GPU-accelerated instance by default. XGBoost is used to train the gradient boosting decision tree (GBDT).
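Because cuDF mirrors the pandas DataFrame API, the join/group/aggregate work of the ETL phase can be illustrated with plain pandas. The tables and column names below are toy stand-ins for illustration, not the real mortgage schema:

```python
import pandas as pd

# Toy stand-ins for the mortgage performance and acquisition tables.
perf = pd.DataFrame({
    "loan_id": [1, 1, 2, 2],
    "current_actual_upb": [100.0, 95.0, 200.0, 190.0],
})
acq = pd.DataFrame({
    "loan_id": [1, 2],
    "orig_interest_rate": [3.5, 4.0],
})

# Join the two tables on loan_id, then group and aggregate, mirroring the
# join/group/aggregate steps of the ETL phase. In the demo, the same API
# calls run on the GPU through cuDF.
joined = perf.merge(acq, on="loan_id", how="left")
agg = joined.groupby("loan_id", as_index=False)["current_actual_upb"].mean()
print(agg)
```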

Perform the following steps to run the NoteBook code:

  1. Prepare the dataset.

    In this example, the shell script downloads the mortgage_2000_1gb.tgz file by default, which contains the mortgage repayment training data of the year 2000.

    If you want to obtain more data for XGBoost model training, you can set the download_url parameter to specify the URL. For more information, visit Mortgage Data.

    The following figure shows an example.
  2. Set relevant parameters.
    The following parameters are available:
    • start_year: the start year from which training data is selected. In the ETL phase, data generated in the range from start_year to end_year is processed.
    • end_year: the end year until which training data is selected. In the ETL phase, data generated in the range from start_year to end_year is processed.
    • train_with_gpu: specifies whether to use GPUs to train the XGBoost model. Default value: True.
    • gpu_count: the number of workers to start. Default value: 1. You can set the parameter based on your needs, but the value must not exceed the number of GPUs on the GPU-accelerated instance.
    • part_count: the number of performance files used for model training. Default value: 2 × gpu_count. If the value is too large, an out-of-memory error occurs in the Data Conversion phase, and the error message is recorded in the NoteBook background log.
    The following figure shows an example.
  3. Start Dask.

    The NoteBook code starts the Dask scheduler and then starts workers based on the gpu_count value for ETL and model training. After Dask is started, you can monitor tasks on the Dask Dashboard. For more information about how to start Dask, see the Dask Dashboard section.

    The following figure shows an example.
  4. Start the ETL phase.

    In this phase, tables are joined, grouped, aggregated, and sliced. The data is stored in the DataFrame format of the cuDF library, which is similar to a pandas DataFrame.

    The following figure shows an example.
  5. Start the Data Conversion phase.

    In this phase, DataFrame-format data is converted into the DMatrix format for XGBoost model training. Each worker processes one DMatrix object.

    The following figure shows an example.
  6. Start the ML Training phase.

    In this phase, model training is started by dask-xgboost, which supports collaborative communication among Dask workers. At the bottom layer, XGBoost is called to perform the actual training.

    The following figure shows an example.
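The parameter rules from step 2 above can be sketched as a small checker. This is an illustration only; the helper resolve_params does not exist in the NoteBook, which reads these values directly:

```python
def resolve_params(gpu_count, total_gpus, part_count=None,
                   start_year=2000, end_year=2000):
    """Hypothetical checker mirroring the NoteBook parameter rules."""
    if not 1 <= gpu_count <= total_gpus:
        raise ValueError("gpu_count must not exceed the GPUs on the instance")
    if start_year > end_year:
        raise ValueError("start_year must not be later than end_year")
    if part_count is None:
        # Default documented above: twice the number of workers.
        part_count = 2 * gpu_count
    return {"gpu_count": gpu_count, "part_count": part_count,
            "start_year": start_year, "end_year": end_year}

print(resolve_params(gpu_count=1, total_gpus=1))
```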

Dask Dashboard

Dask Dashboard supports task progress tracking, task performance problem identification, and fault debugging.

After Dask is started, you can enter http://<public IP address of your GPU-accelerated instance>:8787/status in the address bar of your browser to go to the Dask Dashboard page.

Related functions

The following list describes the operations and their related functions:
  • Download a file: def download_file_from_url(url, filename):
  • Decompress a file: def decompress_file(filename, path):
  • Obtain the number of GPUs on the current machine: def get_gpu_nums():
  • Manage GPU memory:
    • def initialize_rmm_pool():
    • def initialize_rmm_no_pool():
    • def run_dask_task(func, **kwargs):
  • Submit a Dask task:
    • def process_quarter_gpu(year=2000, quarter=1, perf_file=""):
    • def run_gpu_workflow(quarter=1, year=2000, perf_file="", **kwargs):
  • Use cuDF to load data from a CSV file:
    • def gpu_load_performance_csv(performance_path, **kwargs):
    • def gpu_load_acquisition_csv(acquisition_path, **kwargs):
    • def gpu_load_names(**kwargs):
  • Process data and extract features for training machine learning models:
    • def null_workaround(df, **kwargs):
    • def create_ever_features(gdf, **kwargs):
    • def join_ever_delinq_features(everdf_tmp, delinq_merge, **kwargs):
    • def create_joined_df(gdf, everdf, **kwargs):
    • def create_12_mon_features(joined_df, **kwargs):
    • def combine_joined_12_mon(joined_df, testdf, **kwargs):
    • def final_performance_delinquency(gdf, joined_df, **kwargs):
    • def join_perf_acq_gdfs(perf, acq, **kwargs):
    • def last_mile_cleaning(df, **kwargs):
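The first two helpers above can be sketched with the Python standard library alone. This is a hedged sketch; the real NoteBook implementations may differ in details such as progress reporting:

```python
import os
import tarfile
import urllib.request

def download_file_from_url(url, filename):
    """Download url to filename, skipping the transfer if the file exists."""
    if not os.path.isfile(filename):
        urllib.request.urlretrieve(url, filename)

def decompress_file(filename, path):
    """Extract a .tgz/.tar.gz archive, such as mortgage_2000_1gb.tgz, into path."""
    os.makedirs(path, exist_ok=True)
    with tarfile.open(filename, "r:gz") as archive:
        archive.extractall(path)
```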