This topic describes how to install FastGPU and use it to build training tasks on a client that runs 64-bit Ubuntu 18.04.
Prerequisites
Note: To build AI computing tasks, you can install FastGPU on an ECS instance, an on-premises machine, or Alibaba Cloud Shell and use it as the client.
Background information
FastGPU contains the following components:
- The runtime component ncluster: provides interfaces to deploy offline AI training and inference scripts to Alibaba Cloud IaaS resources. For more information, see the description of the ncluster runtime component.
- The command-line component ecluster: provides command-line tools to manage the status of Alibaba Cloud AI computing tasks and the lifecycle of clusters. For more information, see Description of ecluster.
Install FastGPU
- Download the FastGPU package to the client.
wget https://ali-perseus-release.oss-cn-huhehaote.aliyuncs.com/fastgpu/ncluster-1.0.8-py3-none-any.whl
- Install FastGPU.
pip install ncluster-1.0.8-py3-none-any.whl
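The two steps above can also be combined into a small script. The following is a minimal sketch, not an official installer: the variable names, the install_fastgpu function, and the RUN_INSTALL switch are hypothetical names introduced for illustration. It assumes that wget and pip are already available on the client, and it uses pip show to confirm that the ncluster package is registered after installation.

```shell
#!/usr/bin/env bash
# Sketch only: FASTGPU_VERSION, FASTGPU_WHEEL, FASTGPU_URL, install_fastgpu,
# and RUN_INSTALL are hypothetical names, not part of FastGPU.

FASTGPU_VERSION="1.0.8"
FASTGPU_WHEEL="ncluster-${FASTGPU_VERSION}-py3-none-any.whl"
FASTGPU_URL="https://ali-perseus-release.oss-cn-huhehaote.aliyuncs.com/fastgpu/${FASTGPU_WHEEL}"

install_fastgpu() {
  # Download the wheel, install it, then print the installed package metadata.
  # Each step runs only if the previous one succeeded.
  wget -O "${FASTGPU_WHEEL}" "${FASTGPU_URL}" &&
    pip install "${FASTGPU_WHEEL}" &&
    pip show ncluster
}

# The installation runs only when RUN_INSTALL=1 is set, so the script can be
# inspected or sourced without touching the network.
if [ "${RUN_INSTALL:-0}" = "1" ]; then
  install_fastgpu
fi
```

For example, if the script is saved as install_fastgpu.sh (a file name chosen here for illustration), running RUN_INSTALL=1 bash install_fastgpu.sh performs the download and installation.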
Run the FastGPU demo
FastGPU provides the following training scenario demos, which you can download from GitHub.
- GTC-demo: gesture recognition training with PyTorch.
- InsightFace: facial recognition training with MXNet.
- Bert: natural language processing training with TensorFlow.
The following operations use the BERT model to show how to use FastGPU in Cloud Shell. The demo automatically creates an instance of the ecs.gn6v-c10g1.20xlarge instance type, which has eight V100 GPUs. Task deployment takes about 2.5 minutes and training takes about 11.5 minutes, for a total of about 14 minutes. The training precision is above 0.88.