This topic describes how to use the Real-time Acceleration Platform for Integrated Data Science (RAPIDS) libraries, installed from an NGC image on a GPU-accelerated instance, to accelerate data science and machine learning tasks and make more efficient use of computing resources.
Background information
RAPIDS is an open source suite of data processing and machine learning libraries developed by NVIDIA to enable GPU acceleration for data science and machine learning. For more information about RAPIDS, visit the RAPIDS website.
NVIDIA GPU Cloud (NGC) is a deep learning ecosystem developed by NVIDIA. It gives developers free access to deep learning and machine learning software stacks that they can use to build development environments. The NGC website provides RAPIDS Docker images, which come with the development environment pre-installed.
JupyterLab is an interactive development environment that makes it easy to browse, edit, and run code files on your servers.
Dask is a lightweight Python framework for parallel computing. It splits work into tasks and schedules them across multiple workers, such as the GPUs of an instance.
Procedure
Step 1: Obtain the NGC API key
Step 2: Obtain the RAPIDS image download command
Perform the following steps to obtain the download command of the RAPIDS image:
Step 3: Deploy the RAPIDS environment
Perform the following steps to deploy the RAPIDS environment:
Step 4: Run the RAPIDS demo
Perform the following steps to run the RAPIDS demo:
Code running
- ETL: performed on the GPU-accelerated instance in most cases. Data is extracted, transformed, and then loaded into the data warehouse.
- Data Conversion: performed on the GPU-accelerated instance. Data processed in the ETL phase is converted into the DMatrix format so that XGBoost can use it to train the model.
- ML-Training: performed on the GPU-accelerated instance by default. XGBoost is used to train a gradient boosting decision tree (GBDT) model.
Perform the following steps to run the notebook code:
Dask Dashboard
Dask Dashboard supports task progress tracking, identification of task performance problems, and fault debugging.
Enter the following URL in the address bar of your browser to go to the Dask Dashboard page:
http://<IP address of your GPU-accelerated instance>:8787/status
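The dashboard address can also be built and checked from a script. The following is a minimal sketch that uses only the Python standard library. The function names `dask_dashboard_url` and `dashboard_is_reachable` are hypothetical helpers introduced for illustration, and the IP address in the usage note is a placeholder. Port 8787 and the `/status` path are taken from the URL above.

```python
import http.client
import urllib.request

# Port and path taken from the dashboard URL in this topic.
DASK_DASHBOARD_PORT = 8787


def dask_dashboard_url(host, port=DASK_DASHBOARD_PORT):
    """Build the Dask Dashboard status URL for the given instance address."""
    return f"http://{host}:{port}/status"


def dashboard_is_reachable(host, port=DASK_DASHBOARD_PORT, timeout=3.0):
    """Return True if the dashboard answers with HTTP 200, False otherwise."""
    url = dask_dashboard_url(host, port)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, DNS failures, and HTTP errors.
        return False
```

For example, `dask_dashboard_url("192.0.2.10")` returns the address to open in the browser. If `dashboard_is_reachable` returns False, a likely cause is that the Dask tasks have not started yet or that port 8787 is not open to your client.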
Related functions
Operation | Function name |
---|---|
Download a file. | def download_file_from_url(url, filename): |
Decompress a file. | def decompress_file(filename, path): |
Obtain the number of GPUs in the current machine. | def get_gpu_nums(): |
Manage the GPU memory. | |
Submit a Dask task. | |
Use cuDF to load data from a CSV file. | |
Process and extract features of data for training machine learning models. | |
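To illustrate what the first three functions in the table might look like, the following is a minimal sketch that uses only the Python standard library. Only the signatures come from the table. The function bodies are assumptions and are not the demo's actual code: the sketch assumes that the archive is a tar file (optionally compressed) and that GPUs can be counted by listing them with `nvidia-smi -L`.

```python
import os
import shutil
import subprocess
import tarfile
import urllib.request


def download_file_from_url(url, filename):
    """Download url to filename. Skip the download if the file already exists."""
    if os.path.exists(filename):
        return filename
    # Write to a temporary name first so that an interrupted download
    # never leaves a truncated file under the final name.
    tmp_name = filename + ".part"
    with urllib.request.urlopen(url) as response, open(tmp_name, "wb") as out:
        shutil.copyfileobj(response, out)
    os.replace(tmp_name, filename)
    return filename


def decompress_file(filename, path):
    """Extract a tar archive (plain, gzip, bz2, or xz) into the directory path.

    Assumption: the archive comes from a trusted source.
    """
    os.makedirs(path, exist_ok=True)
    with tarfile.open(filename, "r:*") as archive:
        archive.extractall(path)
    return path


def get_gpu_nums():
    """Return the number of GPUs reported by `nvidia-smi -L`, or 0 if unavailable."""
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        # nvidia-smi is missing or failed, so report that no GPU is visible.
        return 0
    # Each physical GPU is listed on a line such as "GPU 0: <name> (UUID: ...)".
    return sum(1 for line in result.stdout.splitlines() if line.startswith("GPU "))
```

A typical sequence is to call `download_file_from_url` for the dataset archive, pass the downloaded file to `decompress_file`, and then use the value returned by `get_gpu_nums` to decide how many Dask workers to start.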