Preparations for creating a training job - Platform for AI

This topic describes what you need to prepare before submitting training jobs, including compute resources, an image, a dataset, and a code build. Platform for AI (PAI) allows you to specify datasets stored in Apsara File Storage NAS (NAS) file systems, Cloud Parallel File Storage (CPFS) file systems, or Object Storage Service (OSS) buckets and code builds stored in Git repositories.

Prerequisites

If you use OSS to store data, make sure that the role that you use is granted the permissions to access OSS. Otherwise, I/O errors may occur when the system accesses the data stored in your OSS bucket. For more information about how to grant a service-linked role the permissions to access OSS, see Grant the permissions that are required to use DLC.

Limits

OSS is a distributed object storage service rather than a file system. When you use OSS to store data, some file system features are not supported. For example, you cannot append data to or overwrite existing files in OSS buckets.
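Because existing files in an OSS bucket cannot be appended to or overwritten, training code that writes output usually needs to create a new file for each write. The following Python sketch shows one way to do this, assuming the output directory is wherever your OSS dataset is mounted. The function name `write_new_object` and the file naming scheme are hypothetical and are not part of PAI.

```python
import os
import tempfile


def write_new_object(directory: str, prefix: str, step: int, payload: bytes) -> str:
    """Write payload to a new file named after the step; never append or overwrite."""
    path = os.path.join(directory, f"{prefix}-{step:08d}.bin")
    # Mode "xb" creates the file and raises FileExistsError if it already exists,
    # so an existing file is never modified.
    with open(path, "xb") as handle:
        handle.write(payload)
    return path


if __name__ == "__main__":
    # A temporary local directory stands in for the mounted OSS path (assumption).
    with tempfile.TemporaryDirectory() as output_dir:
        print(write_new_object(output_dir, "checkpoint", 100, b"model-state"))
```

Writing one file per step keeps every write a fresh create operation. Whether exclusive-create mode behaves the same way on your OSS mount is an assumption that you should verify in your environment.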

Step 1: Prepare resources

Before you submit a training job, you need to prepare computing resources for the training. Select one of the following resources:

The public resource group
After you complete Deep Learning Containers (DLC) authorization, the system automatically prepares a public resource group for you. You do not need to manually create a resource group. For more information, see Grant the permissions that are required to use DLC. You can select the public resource group when you configure a job on the Create Job page in your workspace.
General computing resources
You can create a dedicated resource group, purchase the required general computing resources, and allocate computing resources in the dedicated resource group by creating resource quotas and associating them with workspaces. After you associate a resource quota with a workspace, you can use the resource quota to run training jobs in the workspace. For more information, see Resource quota for general computing resources.
Intelligent computing LINGJUN resources
If you want to leverage the high performance offered by LINGJUN resources, you need to prepare the intelligent computing LINGJUN resources for the training jobs and associate the resources with the workspace. For more information, see Resource quota for intelligent computing LINGJUN resources.

Step 2: Prepare an image

Before you submit a training job, you need to prepare the image for the training environment. Select one of the following image types:

Community image: If you use a general development environment, you can select a public standard image from open source communities without further configuration.
Alibaba Cloud image: PAI provides official images based on different frameworks that are optimized for Alibaba Cloud services. These images are suitable for training jobs that use Alibaba Cloud services and offer better compatibility and performance.
Custom image: If you have specific requirements on training environments or dependencies, you can create a custom image to meet your business requirements.

The following table lists the community images and Alibaba Cloud images that are available when you submit a distributed training job.

Type	Framework	Image
Community image	TensorFlow	tensorflow-training:2.3-cpu-py36-ubuntu18.04
		tensorflow-training:2.3-gpu-py36-cu101-ubuntu18.04
		tensorflow-training:1.15-cpu-py36-ubuntu18.04
		tensorflow-training:1.15-gpu-py36-cu100-ubuntu18.04
	PyTorch	pytorch-training:1.6.0-gpu-py37-cu101-ubuntu18.04
		pytorch-training:1.7.1-gpu-py37-cu110-ubuntu18.04
Alibaba Cloud image	TensorFlow	tensorflow-training:1.12.2PAI-cpu-py27-ubuntu16.04
		tensorflow-training:1.12.2PAI-mkl-cpu-py27-ubuntu16.04
		tensorflow-training:1.12.2PAI-gpu-py27-cu100-ubuntu16.04
		tensorflow-training:1.12.2PAI-cpu-py36-ubuntu16.04
		tensorflow-training:1.12.2PAI-mkl-cpu-py36-ubuntu16.04
		tensorflow-training:1.12.2PAI-gpu-py36-cu100-ubuntu16.04
		tensorflow-training:1.15.0PAI-gpu-py27-cu100-ubuntu16.04
		tensorflow-training:1.15.0PAI-gpu-py36-cu100-ubuntu16.04
	PyTorch	pytorch-training:1.3.1PAI-gpu-py37-cu100-ubuntu16.04
		pytorch-training:1.4.0PAI-gpu-py37-cu100-ubuntu16.04
		pytorch-training:1.5.1PAI-gpu-py37-cu100-ubuntu16.04
		pytorch-training:1.6.0PAI-gpu-py37-cu100-ubuntu16.04
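
The image names in the preceding table appear to follow a common pattern: framework, framework version (with a PAI suffix for Alibaba Cloud images), device type, Python version, an optional CUDA version, and the Ubuntu release. The following Python sketch splits a name into those parts. The pattern is inferred from the names listed here and is not an official naming specification. `ImageTag` and `parse_image_tag` are hypothetical helper names.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the image names listed in this topic (assumption).
_TAG_PATTERN = re.compile(
    r"^(?P<framework>[a-z]+)-training:"
    r"(?P<version>\d+(?:\.\d+)*)(?P<pai>PAI\d*)?"
    r"-(?P<device>mkl-cpu|cpu|gpu)"
    r"-py(?P<python>\d{2})"
    r"(?:-cu(?P<cuda>\d{3}))?"
    r"-(?P<os_release>ubuntu\d+\.\d+)$"
)


class ImageTag(NamedTuple):
    framework: str
    version: str
    pai_build: bool
    device: str
    python: str
    cuda: Optional[str]
    os_release: str


def parse_image_tag(name: str) -> ImageTag:
    """Split an image name such as 'pytorch-training:1.6.0PAI-gpu-py37-cu100-ubuntu16.04'."""
    match = _TAG_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized image name: {name!r}")
    py = match.group("python")
    cuda = match.group("cuda")
    return ImageTag(
        framework=match.group("framework"),
        version=match.group("version"),
        pai_build=match.group("pai") is not None,
        device=match.group("device"),
        python=f"{py[0]}.{py[1:]}",  # "36" becomes "3.6"
        cuda=f"{cuda[:-1]}.{cuda[-1]}" if cuda else None,  # "101" becomes "10.1"
        os_release=match.group("os_release"),
    )


if __name__ == "__main__":
    print(parse_image_tag("tensorflow-training:2.3-gpu-py36-cu101-ubuntu18.04"))
```

A parser like this can help you filter the list, for example to keep only GPU images for a given CUDA version.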

Community image

Images

Standard images provided by the community. They support resources of various types.

registry.${region}.aliyuncs.com/pai-dlc/pytorch-training:1.6.0-gpu-py37-cu101-ubuntu18.04
registry.${region}.aliyuncs.com/pai-dlc/pytorch-training:1.7.1-gpu-py37-cu110-ubuntu18.04
registry.${region}.aliyuncs.com/pai-dlc/tensorflow-training:2.3.0-cpu-py36-ubuntu18.04
registry.${region}.aliyuncs.com/pai-dlc/tensorflow-training:2.3.0-gpu-py36-cu101-ubuntu18.04
registry.${region}.aliyuncs.com/pai-dlc/tensorflow-training:1.15.4-cpu-py36-ubuntu18.04
registry.${region}.aliyuncs.com/pai-dlc/tensorflow-training:1.15.4-gpu-py36-cu100-ubuntu18.04

Replace ${region} with a specific region. Example values:

cn-hangzhou
cn-shanghai
cn-qingdao
cn-beijing
cn-zhangjiakou
cn-huhehaote
cn-shenzhen
cn-chengdu
cn-hongkong
ap-southeast-1

The following table lists the URLs of the community images when ${region} is set to cn-hangzhou.

${region}	Framework	CPU/GPU	Python version	Image URL
cn-hangzhou	TensorFlow 2.3	CPU	3.6 (py36)	registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:2.3.0-cpu-py36-ubuntu18.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:2.3.0-cpu-py36-ubuntu18.04
	TensorFlow 2.3	GPU	3.6 (py36)	registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:2.3.0-gpu-py36-cu101-ubuntu18.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:2.3.0-gpu-py36-cu101-ubuntu18.04
	TensorFlow 1.15	CPU	3.6 (py36)	registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.15.4-cpu-py36-ubuntu18.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.15.4-cpu-py36-ubuntu18.04
	TensorFlow 1.15	GPU	3.6 (py36)	registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.15.4-gpu-py36-cu100-ubuntu18.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.15.4-gpu-py36-cu100-ubuntu18.04
	PyTorch 1.6	GPU	3.7 (py37)	registry.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.6.0-gpu-py37-cu101-ubuntu18.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.6.0-gpu-py37-cu101-ubuntu18.04
	PyTorch 1.7	GPU	3.7 (py37)	registry.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.7.1-gpu-py37-cu110-ubuntu18.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.7.1-gpu-py37-cu110-ubuntu18.04
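
Each row of the preceding table pairs two addresses that differ only in the host prefix: `registry` and `registry-vpc`. The following Python sketch builds both addresses from a region and an image name by filling in the `${region}` placeholder. The helper name `image_urls` is hypothetical, and the reading that the `registry-vpc` address is intended for access from within a VPC is an assumption based on the host name.

```python
from typing import Tuple

# Namespace taken from the image URLs listed in this topic.
REGISTRY_NAMESPACE = "pai-dlc"


def image_urls(region: str, image: str) -> Tuple[str, str]:
    """Return the (registry, registry-vpc) addresses for an image in a region."""
    if not region or "/" in region:
        raise ValueError(f"invalid region: {region!r}")
    if not image:
        raise ValueError("image name must not be empty")
    public = f"registry.{region}.aliyuncs.com/{REGISTRY_NAMESPACE}/{image}"
    vpc = f"registry-vpc.{region}.aliyuncs.com/{REGISTRY_NAMESPACE}/{image}"
    return public, vpc


if __name__ == "__main__":
    for url in image_urls("cn-hangzhou", "pytorch-training:1.7.1-gpu-py37-cu110-ubuntu18.04"):
        print(url)
```

The function does not check whether the image is actually published in the given region. Confirm availability before you reference a generated address in a job.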

Image versions

This section describes the operating systems, Python versions, and third-party libraries supported by each community image.

tensorflow-training:2.3-cpu-py36-ubuntu18.04

Operating system: Ubuntu 18.04.5 LTS
Python version: 3.6.9

Third-party libraries: The following table lists the third-party libraries and versions.

Third-party library and version
absl-py 0.11.0	asn1crypto 0.24.0	astunparse 1.6.3	cachetools 4.2.0
certifi 2020.12.5	cryptography 2.1.4	gast 0.3.3	google-auth 1.24.0
google-auth-oauthlib 0.4.2	google-pasta 0.2.0	grpcio 1.34.0	h5py 2.10.0
idna 2.6	importlib-metadata 3.3.0	Keras-Preprocessing 1.1.2	keyring 10.6.0
keyrings.alt 3.0	Markdown 3.3.3	numpy 1.18.5	oauthlib 3.1.0
opt-einsum 3.3.0	pip 20.2.4	protobuf 3.14.0	pyasn1 0.4.8
pyasn1-modules 0.2.8	pycrypto 2.6.1	pygobject 3.26.1	pyxdg 0.25
requests 2.25.1	requests-oauthlib 1.3.0	rsa 4.6	SecretStorage 2.3.1
setuptools 51.1.1	six 1.15.0	tensorboard 2.4.0	tensorboard-plugin-wit 1.7.0
tensorflow 2.3.2	tensorflow-estimator 2.3.0	termcolor 1.1.0	typing-extensions 3.7.4.3
urllib3 1.26.2	werkzeug 1.0.1	wheel 0.30.0	wrapt 1.12.1
zipp 3.4.0
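
If your training code depends on specific library versions, you can compare a library table such as the preceding one with the versions that your code expects. The following Python sketch parses tab-separated "name version" cells into a dictionary and reports differences. The function names `parse_library_table` and `find_mismatches` are hypothetical, and names are compared exactly as written, so differences in letter case count as mismatches.

```python
from typing import Dict, List, Optional, Tuple


def parse_library_table(text: str) -> Dict[str, str]:
    """Parse tab-separated 'name version' cells into a {name: version} mapping."""
    libraries: Dict[str, str] = {}
    for line in text.splitlines():
        for cell in line.split("\t"):
            cell = cell.strip()
            # The version is the part after the last space in the cell.
            name, _, version = cell.rpartition(" ")
            if name:
                libraries[name] = version
    return libraries


def find_mismatches(
    listed: Dict[str, str], required: Dict[str, str]
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (name, required version, listed version or None) for each difference."""
    mismatches = []
    for name, version in sorted(required.items()):
        actual = listed.get(name)
        if actual != version:
            mismatches.append((name, version, actual))
    return mismatches


if __name__ == "__main__":
    table = "numpy 1.18.5\th5py 2.10.0\nsix 1.15.0"
    print(find_mismatches(parse_library_table(table), {"numpy": "1.19.2", "six": "1.15.0"}))
```

When a required version differs from the listed one, a custom image is one way to pin the dependency.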

tensorflow-training:2.3-gpu-py36-cu101-ubuntu18.04

Operating system: Ubuntu 18.04.5 LTS
Python version: 3.6.9
CUDA version: 10.1

Third-party libraries: The following table lists the supported third-party libraries and versions.

Third-party library and version
absl-py 0.11.0	asn1crypto 0.24.0	astunparse 1.6.3	cachetools 4.2.0
certifi 2020.12.5	cryptography 2.1.4	grpcio 1.34.0	gast 0.3.3
google-auth 1.24.0	google-auth-oauthlib 0.4.2	google-pasta 0.2.0	h5py 2.10.0
idna 2.6	importlib-metadata 3.3.0	Keras-Preprocessing 1.1.2	keyrings.alt 3.0
keyring 10.6.0	Markdown 3.3.3	numpy 1.18.5	oauthlib 3.1.0
opt-einsum 3.3.0	python-apt 1.6.5+ubuntu0.5	pip 20.2.4	protobuf 3.14.0
pyasn1 0.4.8	pyasn1-modules 0.2.8	pycrypto 2.6.1	pygobject 3.26.1
pyxdg 0.25	requests 2.25.1	requests-oauthlib 1.3.0	rsa 4.6
SecretStorage 2.3.1	setuptools 51.1.1	six 1.15.0	tensorboard 2.4.0
tensorboard-plugin-wit 1.7.0	tensorflow-gpu 2.3.2	tensorflow-estimator 2.3.0	termcolor 1.1.0
typing-extensions 3.7.4.3	urllib3 1.26.2	werkzeug 1.0.1	wheel 0.30.0
wrapt 1.12.1	zipp 3.4.0

tensorflow-training:1.15-cpu-py36-ubuntu18.04

Operating system: Ubuntu 18.04.5 LTS
Python version: 3.6.9

Third-party libraries: The following table lists the third-party libraries and versions.

Third-party library and version
absl-py 0.11.0	asn1crypto 0.24.0	astor 0.8.1	cryptography 2.1.4
gast 0.2.2	google-pasta 0.2.0	grpcio 1.34.0	h5py 2.10.0
idna 2.6	importlib-metadata 3.3.0	Keras-Preprocessing 1.1.2	Keras-Applications 1.0.8
keyring 10.6.0	keyrings.alt 3.0	Markdown 3.3.3	numpy 1.18.5
opt-einsum 3.3.0	pip 20.3.3	protobuf 3.14.0	pycrypto 2.6.1
pygobject 3.26.1	pyxdg 0.25	SecretStorage 2.3.1	setuptools 51.1.1
six 1.11.0	tensorboard 1.15.0	tensorflow 1.15.5	tensorflow-estimator 1.15.1
termcolor 1.1.0	typing-extensions 3.7.4.3	werkzeug 1.0.1	wheel 0.30.0
wrapt 1.12.1	zipp 3.4.0

tensorflow-training:1.15-gpu-py36-cu100-ubuntu18.04

Operating system: Ubuntu 18.04.5 LTS
Python version: 3.6.9
CUDA version: 10.0

Third-party libraries: The following table lists the third-party libraries and versions.

Third-party library and version
absl-py 0.11.0	asn1crypto 0.24.0	astor 0.8.1	cryptography 2.1.4
gast 0.2.2	google-pasta 0.2.0	grpcio 1.34.0	h5py 2.10.0
idna 2.6	importlib-metadata 3.3.0	Keras-Preprocessing 1.1.2	Keras-Applications 1.0.8
keyring 10.6.0	keyrings.alt 3.0	Markdown 3.3.3	numpy 1.18.5
opt-einsum 3.3.0	pip 20.3.3	protobuf 3.14.0	pycrypto 2.6.1
pygobject 3.26.1	pyxdg 0.25	SecretStorage 2.3.1	setuptools 51.1.1
six 1.11.0	tensorboard 1.15.0	tensorflow-gpu 1.15.5	tensorflow-estimator 1.15.1
termcolor 1.1.0	typing-extensions 3.7.4.3	werkzeug 1.0.1	wheel 0.30.0
wrapt 1.12.1	zipp 3.4.0	python-apt 1.6.5+ubuntu0.5

pytorch-training:1.6.0-gpu-py37-cu101-ubuntu18.04

Operating system: Ubuntu 18.04.4 LTS
Python version: 3.7.7
CUDA version: 10.1

Third-party libraries: The following table lists the third-party libraries and versions.

Third-party library and version
backcall 0.2.0	beautifulsoup4 4.9.1	certifi 2020.6.20	cffi 1.14.0
cryptography 2.9.2	conda 4.8.3	conda-build 3.18.11	conda-package-handling 1.7.0
decorator 4.4.2	filelock 3.0.12	glob2 0.7	ipython-genutils 0.2.0
idna 2.9	ipython 7.16.1	jedi 0.17.1	Jinja2 2.11.2
libarchive-c 2.9	MarkupSafe 1.1.1	mkl-fft 1.1.0	mkl-service 2.3.0
mkl-random 1.1.1	numpy 1.18.5	olefile 0.46	PyYAML 5.3.1
parso 0.7.0	pexpect 4.8.0	pickleshare 0.7.5	Pillow 7.2.0
pip 20.0.2	pkginfo 1.5.0.1	prompt-toolkit 3.0.5	psutil 5.7.0
ptyprocess 0.6.0	pycosat 0.6.3	pycparser 2.20	Pygments 2.6.1
pyOpenSSL 19.1.0	PySocks 1.7.1	pytz 2020.1	ruamel-yaml 0.15.87
requests 2.23.0	soupsieve 2.0.1	setuptools 46.4.0.post20200518	six 1.14.0
traitlets 4.3.3	torch 1.6.0	torchvision 0.7.0	tqdm 4.46.0
urllib3 1.25.8	wheel 0.34.2	wcwidth 0.2.5

pytorch-training:1.7.1-gpu-py37-cu110-ubuntu18.04

Operating system: Ubuntu 18.04.5 LTS
Python version: 3.8.5
CUDA version: 11.0

Third-party libraries: The following table lists the third-party libraries and versions.

Third-party library and version
backcall 0.2.0	beautifulsoup4 4.9.3	brotlipy 0.7.0	certifi 2020.12.5
cffi 1.14.3	cryptography 3.2.1	conda 4.9.2	conda-build 3.21.4
conda-package-handling 1.7.2	dnspython 2.1.0	decorator 4.4.2	filelock 3.0.12
glob2 0.7	ipython-genutils 0.2.0	idna 2.10	ipython 7.19.0
Jinja2 2.11.2	jedi 0.17.2	libarchive-c 2.9	mkl-service 2.3.0
MarkupSafe 1.1.1	mkl-fft 1.2.0	mkl-random 1.1.1	numpy 1.19.2
olefile 0.46	PyYAML 5.3.1	parso 0.7.0	pexpect 4.8.0
pickleshare 0.7.5	Pillow 8.1.0	pip 20.2.4	pkginfo 1.7.0
prompt-toolkit 3.0.8	psutil 5.7.2	ptyprocess 0.7.0	pycosat 0.6.3
pycparser 2.20	Pygments 2.7.4	pyOpenSSL 19.1.0	PySocks 1.7.1
python-etcd 0.4.5	pytz 2020.5	ruamel-yaml 0.15.87	requests 2.24.0
soupsieve 2.1	setuptools 50.3.1.post20201107	six 1.15.0	typing-extensions 3.7.4.3
torch 1.7.1	torchelastic 0.2.1	torchvision 0.8.2	tqdm 4.51.0
traitlets 5.0.5	urllib3 1.25.11	wheel 0.35.1	wcwidth 0.2.5

Alibaba Cloud image

Images

Official images provided by Alibaba Cloud.

registry.${region}.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-cpu-py27-ubuntu16.04
registry.${region}.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-mkl-cpu-py27-ubuntu16.04
registry.${region}.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-gpu-py27-cu100-ubuntu16.04

registry.${region}.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-cpu-py36-ubuntu16.04
registry.${region}.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-mkl-cpu-py36-ubuntu16.04
registry.${region}.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-gpu-py36-cu100-ubuntu16.04

registry.${region}.aliyuncs.com/pai-dlc/tensorflow-training:1.15.0PAI-gpu-py27-cu100-ubuntu16.04
registry.${region}.aliyuncs.com/pai-dlc/tensorflow-training:1.15.0PAI-gpu-py36-cu100-ubuntu16.04

registry.${region}.aliyuncs.com/pai-dlc/pytorch-training:1.3.1PAI-gpu-py37-cu100-ubuntu16.04
registry.${region}.aliyuncs.com/pai-dlc/pytorch-training:1.4.0PAI-gpu-py37-cu100-ubuntu16.04
registry.${region}.aliyuncs.com/pai-dlc/pytorch-training:1.5.1PAI-gpu-py37-cu100-ubuntu16.04
registry.${region}.aliyuncs.com/pai-dlc/pytorch-training:1.6.0PAI-gpu-py37-cu100-ubuntu16.04

registry.${region}.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-cpu-py27-ubuntu18.04
registry.${region}.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-gpu-py27-cu101-ubuntu18.04
registry.${region}.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-cpu-py36-ubuntu18.04
registry.${region}.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-gpu-py36-cu101-ubuntu18.04
registry.${region}.aliyuncs.com/pai-dlc/tensorflow-training:1.15.4PAI-cpu-py36-ubuntu18.04
registry.${region}.aliyuncs.com/pai-dlc/tensorflow-training:1.15.4PAI-gpu-py36-cu101-ubuntu18.04

Replace ${region} with a specific region. Example values:

cn-hangzhou
cn-shanghai
cn-qingdao
cn-beijing
cn-zhangjiakou
cn-huhehaote
cn-shenzhen
cn-chengdu
cn-hongkong
ap-southeast-1

The following table lists the URLs of the Alibaba Cloud images when ${region} is set to cn-hangzhou.

${region}	Framework	CPU/GPU	Python version	Image URL
cn-hangzhou	TensorFlow 1.12	CPU	2.7 (py27)	registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-cpu-py27-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-cpu-py27-ubuntu16.04 registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI2011-cpu-py27-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI2011-cpu-py27-ubuntu16.04 registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-cpu-py27-ubuntu18.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-cpu-py27-ubuntu18.04
		MKL-CPU	2.7 (py27)	registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-mkl-cpu-py27-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-mkl-cpu-py27-ubuntu16.04 registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI2011-mkl-cpu-py27-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI2011-mkl-cpu-py27-ubuntu16.04
		GPU	2.7 (py27)	registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-gpu-py27-cu100-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-gpu-py27-cu100-ubuntu16.04 registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI2011-gpu-py27-cu100-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI2011-gpu-py27-cu100-ubuntu16.04 registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-gpu-py27-cu101-ubuntu18.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-gpu-py27-cu101-ubuntu18.04
		CPU	3.6 (py36)	registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-cpu-py36-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-cpu-py36-ubuntu16.04 registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI2011-cpu-py36-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI2011-cpu-py36-ubuntu16.04 registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-cpu-py36-ubuntu18.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-cpu-py36-ubuntu18.04
		MKL-CPU	3.6 (py36)	registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-mkl-cpu-py36-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-mkl-cpu-py36-ubuntu16.04 registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI2011-mkl-cpu-py36-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI2011-mkl-cpu-py36-ubuntu16.04
		GPU	3.6 (py36)	registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-gpu-py36-cu100-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-gpu-py36-cu100-ubuntu16.04 registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI2011-gpu-py36-cu100-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI2011-gpu-py36-cu100-ubuntu16.04 registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-gpu-py36-cu101-ubuntu18.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.12.2PAI-gpu-py36-cu101-ubuntu18.04
	TensorFlow 1.15	GPU	2.7 (py27)	registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.15.0PAI-gpu-py27-cu100-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.15.0PAI-gpu-py27-cu100-ubuntu16.04 registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.15.0PAI2011-gpu-py27-cu100-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.15.0PAI2011-gpu-py27-cu100-ubuntu16.04
		CPU	3.6 (py36)	registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.15.4PAI-cpu-py36-ubuntu18.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.15.4PAI-cpu-py36-ubuntu18.04
		GPU	3.6 (py36)	registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.15.0PAI-gpu-py36-cu100-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.15.0PAI-gpu-py36-cu100-ubuntu16.04 registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.15.0PAI2011-gpu-py36-cu100-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.15.0PAI2011-gpu-py36-cu100-ubuntu16.04 registry.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.15.4PAI-gpu-py36-cu101-ubuntu18.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/tensorflow-training:1.15.4PAI-gpu-py36-cu101-ubuntu18.04
	PyTorch 1.3	GPU	3.7 (py37)	registry.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.3.1PAI-gpu-py37-cu100-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.3.1PAI-gpu-py37-cu100-ubuntu16.04 registry.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.3.1PAI2011-gpu-py37-cu100-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.3.1PAI2011-gpu-py37-cu100-ubuntu16.04
	PyTorch 1.4	GPU	3.7 (py37)	registry.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.4.0PAI-gpu-py37-cu100-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.4.0PAI-gpu-py37-cu100-ubuntu16.04 registry.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.4.0PAI2011-gpu-py37-cu100-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.4.0PAI2011-gpu-py37-cu100-ubuntu16.04
	PyTorch 1.5	GPU	3.7 (py37)	registry.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.5.1PAI-gpu-py37-cu100-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.5.1PAI-gpu-py37-cu100-ubuntu16.04 registry.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.5.1PAI2011-gpu-py37-cu100-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.5.1PAI2011-gpu-py37-cu100-ubuntu16.04
	PyTorch 1.6	GPU	3.7 (py37)	registry.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.6.0PAI-gpu-py37-cu100-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.6.0PAI-gpu-py37-cu100-ubuntu16.04 registry.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.6.0PAI2011-gpu-py37-cu100-ubuntu16.04 registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.6.0PAI2011-gpu-py37-cu100-ubuntu16.04

Image versions

This section describes the operating systems, Python versions, and third-party libraries supported by each Alibaba Cloud image.

tensorflow-training:1.12.2PAI-cpu-py27-ubuntu16.04

Operating system: Ubuntu 16.04.6 LTS
Python version: 2.7.18 Anaconda

Third-party libraries: The following table lists the third-party libraries and versions.

Third-party library and version
absl-py 0.11.0	aliyun-python-sdk-core 2.13.15	aliyun-python-sdk-kms 2.14.0	astor 0.8.1
backports.weakref 1.0.post1	certifi 2020.6.20	crcmod 1.7	Cython 0.29.14
enum34 1.1.6	funcsigs 1.0.2	futures 3.3.0	gast 0.4.0
grpcio 1.27.2	h5py 2.10.0	jmespath 0.10.0	Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2	Markdown 3.1.1	mkl-fft 1.0.15	mkl-random 1.1.0
mkl-service 2.3.0	mock 3.0.5	numpy 1.16.4	opencv-python 4.2.0.32
oss2 2.9.1	paiio 0.1.0	pip 9.0.1	protobuf 3.14.0
pycryptodome 3.9.7	pyodps 0.10.4	pypai 1.1.0+tensorflow.1.12.2pai2011	requests 2.13.0
setuptools 36.4.0	six 1.15.0	tensorboard 1.12.2	tensorflow 1.12.2PAI2011
termcolor 1.1.0	toposort 1.5	Werkzeug 1.0.1	wheel 0.35.1

tensorflow-training:1.12.2PAI-mkl-cpu-py27-ubuntu16.04

Operating system: Ubuntu 16.04.6 LTS
Python version: 2.7.18 Anaconda

Third-party libraries: The following table lists the third-party libraries and versions.

Third-party library and version
absl-py 0.11.0	aliyun-python-sdk-core 2.13.15	aliyun-python-sdk-kms 2.14.0	astor 0.8.1
backports.weakref 1.0.post1	certifi 2020.6.20	crcmod 1.7	Cython 0.29.14
enum34 1.1.6	funcsigs 1.0.2	futures 3.3.0	gast 0.4.0
grpcio 1.27.2	h5py 2.10.0	jmespath 0.10.0	Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2	Markdown 3.1.1	mkl-fft 1.0.15	mkl-random 1.1.0
mkl-service 2.3.0	mock 3.0.5	numpy 1.16.4	opencv-python 4.2.0.32
oss2 2.9.1	paiio 0.1.0	pip 9.0.1	protobuf 3.14.0
pycryptodome 3.9.7	pyodps 0.10.4	pypai 1.1.0+tensorflow.1.12.2pai2011	requests 2.13.0
setuptools 36.4.0	six 1.15.0	tensorboard 1.12.2	tensorflow 1.12.2PAI2011
termcolor 1.1.0	toposort 1.5	Werkzeug 1.0.1	wheel 0.35.1

tensorflow-training:1.12.2PAI-gpu-py27-cu100-ubuntu16.04

Operating system: Ubuntu 16.04.6 LTS
Python version: 2.7.18 Anaconda
CUDA version: 10.0

Third-party libraries: The following table lists the third-party libraries and versions.

Third-party library and version
absl-py 0.11.0	aliyun-python-sdk-core 2.13.15	aliyun-python-sdk-kms 2.14.0	astor 0.8.1
backports.weakref 1.0.post1	certifi 2020.6.20	crcmod 1.7	Cython 0.29.14
enum34 1.1.6	funcsigs 1.0.2	futures 3.3.0	gast 0.4.0
grpcio 1.27.2	h5py 2.10.0	jmespath 0.10.0	Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2	Markdown 3.1.1	mkl-fft 1.0.15	mkl-random 1.1.0
mkl-service 2.3.0	mock 3.0.5	numpy 1.16.4	opencv-python 4.2.0.32
oss2 2.9.1	paiio 0.1.0	pip 9.0.1	protobuf 3.14.0
pycryptodome 3.9.7	pyodps 0.10.4	pypai 1.1.0+tensorflow.gpu.1.12.2pai2011	requests 2.13.0
setuptools 36.4.0	six 1.15.0	tensorboard 1.12.2	tensorflow-gpu 1.12.2PAI2011
termcolor 1.1.0	toposort 1.5	Werkzeug 1.0.1	wheel 0.35.1
subprocess32 3.5.4	tao-wrapper 0.1.1	whale 0.0.2

tensorflow-training:1.12.2PAI-cpu-py36-ubuntu16.04

Operating system: Ubuntu 16.04.6 LTS
Python version: 3.6.12 Anaconda

Third-party libraries: The following table lists the third-party libraries and versions.

Third-party library and version
absl-py 0.11.0	aliyun-python-sdk-core 2.13.29	aliyun-python-sdk-core-v3 2.13.11	aliyun-python-sdk-kms 2.14.0
astor 0.8.1	cached-property 1.5.2	certifi 2020.12.5	crcmod 1.7
Cython 0.29.21	gast 0.4.0	grpcio 1.31.0	h5py 3.1.0
importlib-metadata 3.4.0	jmespath 0.10.0	Keras-Applications 1.0.8	Keras-Preprocessing 1.1.2
Markdown 3.3.3	mkl-fft 1.2.0	mkl-random 1.1.1	mkl-service 2.3.0
numpy 1.16.4	opencv-python 4.2.0.32	oss2 2.12.1	paiio 0.1.0
pip 20.2.4	protobuf 3.14.0	pycryptodome 3.9.9	pyodps 0.10.4
pypai 1.1.0+tensorflow.1.12.2pai2011	requests 2.13.0	setuptools 50.3.1.post20201107	six 1.15.0
tensorboard 1.12.2	tensorflow 1.12.2PAI2011	termcolor 1.1.0	toposort 1.5
typing-extensions 3.7.4.3	Werkzeug 1.0.1	wheel 0.35.1	zipp 3.4.0

tensorflow-training:1.12.2PAI-mkl-cpu-py36-ubuntu16.04

Operating system: Ubuntu 16.04.6 LTS
Python version: 3.6.12 Anaconda

Third-party libraries: The following table lists the third-party libraries and versions.

Third-party library and version
absl-py 0.11.0	aliyun-python-sdk-core 2.13.29	aliyun-python-sdk-core-v3 2.13.11	aliyun-python-sdk-kms 2.14.0
astor 0.8.1	cached-property 1.5.2	certifi 2020.12.5	crcmod 1.7
Cython 0.29.21	gast 0.4.0	grpcio 1.31.0	h5py 3.1.0
importlib-metadata 3.4.0	jmespath 0.10.0	Keras-Applications 1.0.8	Keras-Preprocessing 1.1.2
Markdown 3.3.3	mkl-fft 1.2.0	mkl-random 1.1.1	mkl-service 2.3.0
numpy 1.16.4	opencv-python 4.2.0.32	oss2 2.12.1	paiio 0.1.0
pip 20.2.4	protobuf 3.14.0	pycryptodome 3.9.9	pyodps 0.10.4
pypai 1.1.0+tensorflow.1.12.2pai2011	requests 2.13.0	setuptools 50.3.1.post20201107	six 1.15.0
tensorboard 1.12.2	tensorflow 1.12.2PAI2011	termcolor 1.1.0	toposort 1.5
typing-extensions 3.7.4.3	Werkzeug 1.0.1	wheel 0.35.1	zipp 3.4.0

tensorflow-training:1.12.2PAI-gpu-py36-cu100-ubuntu16.04

Operating system: Ubuntu 16.04.6 LTS
Python version: 3.6.12 Anaconda
CUDA version: 10.0

Third-party libraries: The following table lists the third-party libraries and versions.

Third-party library and version
absl-py 0.11.0	aliyun-python-sdk-core 2.13.29	aliyun-python-sdk-core-v3 2.13.11	aliyun-python-sdk-kms 2.14.0
astor 0.8.1	cached-property 1.5.2	certifi 2020.12.5	crcmod 1.7
Cython 0.29.21	gast 0.4.0	grpcio 1.31.0	h5py 3.1.0
importlib-metadata 3.4.0	jmespath 0.10.0	Keras-Applications 1.0.8	Keras-Preprocessing 1.1.2
Markdown 3.3.3	mkl-fft 1.2.0	mkl-random 1.1.1	mkl-service 2.3.0
numpy 1.16.4	opencv-python 4.2.0.32	oss2 2.12.1	paiio 0.1.0
pip 20.2.4	protobuf 3.14.0	pycryptodome 3.9.9	pyodps 0.10.4
pypai 1.1.0+tensorflow.gpu.1.12.2pai2011	requests 2.13.0	setuptools 50.3.1.post20201107	six 1.15.0
tensorboard 1.12.2	tensorflow-gpu 1.12.2PAI2011	termcolor 1.1.0	toposort 1.5
typing-extensions 3.7.4.3	Werkzeug 1.0.1	wheel 0.35.1	zipp 3.4.0
subprocess32 3.5.4	tao-wrapper 0.1.1	whale 0.0.2

tensorflow-training:1.15.0PAI-gpu-py27-cu100-ubuntu16.04

Operating system: Ubuntu 16.04.6 LTS
Python version: 2.7.18 Anaconda
CUDA version: 10.0

Third-party libraries: The following table lists the third-party libraries and versions.

Third-party library and version
absl-py 0.11.0	aliyun-python-sdk-core 2.13.15	aliyun-python-sdk-kms 2.14.0	astor 0.8.1
backports.weakref 1.0.post1	certifi 2020.6.20	crcmod 1.7	Cython 0.29.14
enum34 1.1.6	funcsigs 1.0.2	functools32 3.2.3.post2	futures 3.3.0
gast 0.2.2	google-pasta 0.2.0	opt-einsum 2.3.2	tensorflow-estimator 1.15.1
grpcio 1.27.2	h5py 2.10.0	jmespath 0.10.0	Keras-Applications 1.0.8
Keras-Preprocessing 1.1.2	Markdown 3.1.1	mkl-fft 1.0.15	mkl-random 1.1.0
mkl-service 2.3.0	mock 3.0.5	numpy 1.16.4	opencv-python 4.2.0.32
oss2 2.9.1	paiio 0.1.0	pip 9.0.1	protobuf 3.14.0
pycryptodome 3.9.7	pyodps 0.10.4	pypai 1.1.0+tensorflow.gpu.1.15.0	requests 2.13.0
setuptools 44.1.1	six 1.15.0	tensorboard 1.15.0	tensorflow-gpu 1.15.0
termcolor 1.1.0	toposort 1.5	Werkzeug 1.0.1	wheel 0.35.1
subprocess32 3.5.4	tao-wrapper 0.1.1	whale 0.0.2	wrapt 1.12.1

tensorflow-training:1.15.0PAI-gpu-py36-cu100-ubuntu16.04

Operating system: Ubuntu 16.04.6 LTS
Python version: 3.6.12 Anaconda
CUDA version: 10.0

Third-party libraries: The following table lists the third-party libraries and versions.

Third-party library and version
absl-py 0.11.0	aliyun-python-sdk-core 2.13.29	aliyun-python-sdk-core-v3 2.13.11	aliyun-python-sdk-kms 2.14.0
astor 0.8.1	cached-property 1.5.2	certifi 2020.12.5	crcmod 1.7
Cython 0.29.21	gast 0.2.2	grpcio 1.31.0	h5py 3.1.0
importlib-metadata 3.4.0	jmespath 0.10.0	Keras-Applications 1.0.8	Keras-Preprocessing 1.1.2
Markdown 3.3.3	mkl-fft 1.2.0	mkl-random 1.1.1	mkl-service 2.3.0
numpy 1.16.4	opencv-python 4.2.0.32	oss2 2.12.1	paiio 0.1.0
pip 20.2.4	protobuf 3.14.0	pycryptodome 3.9.9	pyodps 0.10.4
pypai 1.1.0+tensorflow.gpu.1.15.0	requests 2.13.0	setuptools 50.3.1.post20201107	six 1.15.0
tensorboard 1.15.0	tensorflow-gpu 1.15.0	termcolor 1.1.0	toposort 1.5
typing-extensions 3.7.4.3	Werkzeug 1.0.1	wheel 0.35.1	zipp 3.4.0
subprocess32 3.5.4	tao-wrapper 0.1.1	whale 0.0.2	google-pasta 0.2.0
opt-einsum 3.3.0	tensorflow-estimator 1.15.1	wrapt 1.12.1

pytorch-training:1.3.1PAI-gpu-py37-cu100-ubuntu16.04

Operating system: Ubuntu 16.04.6 LTS
Python version: 3.7.4
CUDA version: 10.0

Third-party libraries: The following table lists the third-party libraries and versions.

Third-party library and version
absl-py 0.11.0	aiohttp 3.7.3	apex 0.1	asn1crypto 1.2.0
async-timeout 3.0.1	attrs 20.3.0	blinker 1.4	cachetools 4.2.0
certifi 2020.12.5	cffi 1.13.0	cryptography 2.8	click 7.1.2
conda 4.9.2	conda-package-handling 1.6.0	future 0.18.2	grpcio 1.31.0
google-auth 1.24.0	google-auth-oauthlib 0.4.2	importlib-metadata 2.0.0	idna 2.8
multidict 4.7.6	Markdown 3.3.3	mkl-fft 1.2.0	mkl-random 1.1.1
mkl-service 2.3.0	nvidia-dali 0.15.0	numpy 1.19.2	oauthlib 3.1.0
PySocks 1.7.1	Pillow 8.1.0	pip 20.2.4	protobuf 3.13.0
pyasn1 0.4.8	pyasn1-modules 0.2.8	pycosat 0.6.3	pycparser 2.19
PyJWT 2.0.0	pyOpenSSL 19.0.0	ruamel-yaml 0.15.46	requests 2.22.0
requests-oauthlib 1.3.0	rsa 4.7	six 1.15.0	sailfish 1.0.1
setuptools 50.3.1.post20201107	typing-extensions 3.7.4.3	tensorboard 2.3.0	tensorboard-plugin-wit 1.6.0
torch 1.3.1+ali	torchsummary 1.5.1	torchvision 0.4.2	tqdm 4.36.1
urllib3 1.24.2	Werkzeug 1.0.1	wheel 0.35.1	yarl 1.5.1
zipp 3.4.0

pytorch-training:1.4.0PAI-gpu-py37-cu100-ubuntu16.04

Operating system: Ubuntu 16.04.6 LTS
Python version: 3.7.4
CUDA version: 10.0

Third-party libraries: The following table lists the third-party libraries and versions.

Third-party library and version
absl-py 0.11.0	aiohttp 3.7.3	apex 0.1	asn1crypto 1.2.0
async-timeout 3.0.1	attrs 20.3.0	blinker 1.4	cachetools 4.2.0
certifi 2020.12.5	cffi 1.13.0	cryptography 2.8	click 7.1.2
conda 4.9.2	conda-package-handling 1.6.0	future 0.18.2	grpcio 1.31.0
google-auth 1.24.0	google-auth-oauthlib 0.4.2	importlib-metadata 2.0.0	idna 2.8
multidict 4.7.6	Markdown 3.3.3	mkl-fft 1.2.0	mkl-random 1.1.1
mkl-service 2.3.0	nvidia-dali 0.15.0	numpy 1.19.2	oauthlib 3.1.0
PySocks 1.7.1	Pillow 8.1.0	pip 20.2.4	protobuf 3.13.0
pyasn1 0.4.8	pyasn1-modules 0.2.8	pycosat 0.6.3	pycparser 2.19
PyJWT 2.0.0	pyOpenSSL 19.0.0	ruamel-yaml 0.15.46	requests 2.22.0
requests-oauthlib 1.3.0	rsa 4.7	six 1.15.0	setuptools 50.3.1.post20201107
typing-extensions 3.7.4.3	tensorboard 2.3.0	tensorboard-plugin-wit 1.6.0	torch 1.4.0+ali
torchsummary 1.5.1	torchvision 0.5.0	tqdm 4.36.1	urllib3 1.24.2
wheel 0.35.1	Werkzeug 1.0.1	yarl 1.5.1	zipp 3.4.0

pytorch-training:1.5.1PAI-gpu-py37-cu100-ubuntu16.04

Operating system: Ubuntu 16.04.6 LTS
Python version: 3.7.4
CUDA version: 10.0

Third-party libraries: The following table lists the third-party libraries and versions.

Third-party library and version
absl-py 0.11.0	aiohttp 3.7.3	apex 0.1	asn1crypto 1.2.0
async-timeout 3.0.1	attrs 20.3.0	blinker 1.4	cachetools 4.2.0
certifi 2020.12.5	cffi 1.13.0	cryptography 2.8	click 7.1.2
conda 4.9.2	conda-package-handling 1.6.0	future 0.18.2	grpcio 1.31.0
google-auth 1.24.0	google-auth-oauthlib 0.4.2	importlib-metadata 2.0.0	idna 2.8
multidict 4.7.6	Markdown 3.3.3	mkl-fft 1.2.0	mkl-random 1.1.1
mkl-service 2.3.0	nvidia-dali 0.15.0	numpy 1.19.2	oauthlib 3.1.0
PySocks 1.7.1	Pillow 8.1.0	pip 20.2.4	protobuf 3.13.0
pyasn1 0.4.8	pyasn1-modules 0.2.8	pycosat 0.6.3	pycparser 2.19
PyJWT 2.0.0	pyOpenSSL 19.0.0	rsa 4.7	requests 2.22.0
requests-oauthlib 1.3.0	ruamel-yaml 0.15.46	six 1.15.0	sailfish 1.0.1
setuptools 50.3.1.post20201107	typing-extensions 3.7.4.3	tensorboard 2.3.0	tensorboard-plugin-wit 1.6.0
torch 1.5.1+ali	torchsummary 1.5.1	torchvision 0.6.1	tqdm 4.36.1
urllib3 1.24.2	wheel 0.35.1	Werkzeug 1.0.1	yarl 1.5.1
zipp 3.4.0

pytorch-training:1.6.0PAI-gpu-py37-cu100-ubuntu16.04

Operating system: Ubuntu 16.04.6 LTS
Python version: 3.7.4
CUDA version: 10.0

Third-party libraries: The following table lists the third-party libraries and versions.

Third-party library and version
absl-py 0.11.0	aiohttp 3.7.3	asn1crypto 1.2.0	async-timeout 3.0.1
attrs 20.3.0	blinker 1.4	cachetools 4.2.0	certifi 2020.12.5
cffi 1.13.0	cryptography 2.8	click 7.1.2	conda 4.9.2
conda-package-handling 1.6.0	future 0.18.2	grpcio 1.31.0	google-auth 1.24.0
google-auth-oauthlib 0.4.2	importlib-metadata 2.0.0	idna 2.8	multidict 4.7.6
Markdown 3.3.3	mkl-fft 1.2.0	mkl-random 1.1.1	mkl-service 2.3.0
nvidia-dali 0.15.0	numpy 1.19.2	oauthlib 3.1.0	PySocks 1.7.1
Pillow 8.1.0	pip 20.2.4	protobuf 3.13.0	pyasn1 0.4.8
pyasn1-modules 0.2.8	pycosat 0.6.3	pycparser 2.19	PyJWT 2.0.0
pyOpenSSL 19.0.0	ruamel-yaml 0.15.46	requests 2.22.0	requests-oauthlib 1.3.0
rsa 4.7	six 1.15.0	setuptools 50.3.1.post20201107	typing-extensions 3.7.4.3
tensorboard 2.3.0	tensorboard-plugin-wit 1.6.0	torch 1.6.0+ali	torchsummary 1.5.1
torchvision 0.7.0	tqdm 4.36.1	urllib3 1.24.2	Werkzeug 1.0.1
wheel 0.35.1	yarl 1.5.1	zipp 3.4.0
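
The library tables above can also be checked against a running container. The following is a minimal sketch, not part of PAI: it assumes that you copy the tab-separated data rows of one table (without the "Third-party library and version" header) into a string, and the helper names `parse_library_table` and `find_mismatches` are hypothetical. It uses only the Python standard library, and falls back to the importlib-metadata backport that these Python 3.7 images list.

```python
from typing import Dict, List, Optional, Tuple

try:  # Python 3.8 and later ship importlib.metadata in the standard library
    from importlib import metadata as importlib_metadata
except ImportError:  # Python 3.7: use the importlib-metadata backport package
    import importlib_metadata


def parse_library_table(text: str) -> Dict[str, str]:
    """Parse tab-separated "name version" cells into a {name: version} dict."""
    libraries: Dict[str, str] = {}
    for line in text.splitlines():
        for cell in line.split("\t"):
            cell = cell.strip()
            if not cell:
                continue
            # The version is the last space-separated token of the cell.
            name, _, version = cell.rpartition(" ")
            if name:
                libraries[name] = version
    return libraries


def find_mismatches(
    expected: Dict[str, str]
) -> List[Tuple[str, str, Optional[str]]]:
    """Return (name, expected, installed) for every package that differs.

    installed is None when the package is not installed at all.
    """
    mismatches = []
    for name, version in expected.items():
        try:
            installed = importlib_metadata.version(name)
        except importlib_metadata.PackageNotFoundError:
            installed = None
        if installed != version:
            mismatches.append((name, version, installed))
    return mismatches


if __name__ == "__main__":
    # Two rows copied from the pytorch-training:1.6.0PAI table above.
    rows = (
        "tensorboard 2.3.0\ttensorboard-plugin-wit 1.6.0\t"
        "torch 1.6.0+ali\ttorchsummary 1.5.1\n"
        "torchvision 0.7.0\ttqdm 4.36.1\turllib3 1.24.2\tWerkzeug 1.0.1"
    )
    for name, want, have in find_mismatches(parse_library_table(rows)):
        print(f"{name}: expected {want}, installed {have}")
```

Running the script inside the container prints one line for each package whose installed version differs from the table. An empty output means that the listed rows match the environment.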

Custom image

A custom image is an image that you uploaded to PAI. If you use a custom image, we recommend that you add it as an AI asset on the AI Computing Asset Management > Images page so that multiple training jobs can use it. For more information, see View and add images.

Important

If you use a custom image to submit training jobs that run on LINGJUN resources, review the usage notes before you submit the jobs. For more information, see RDMA (intelligent computing LINGJUN resources).

Step 3: Prepare a dataset

Before you submit a deep learning job, upload the dataset that the job requires to a supported storage service, such as an OSS bucket or a NAS file system, and register the dataset so that the job can use it.

Supported dataset types

Datasets of the following types are supported: OSS, General-purpose NAS, Extreme NAS, CPFS, and CPFS for Lingjun.

You can enable the dataset acceleration feature for OSS and CPFS datasets. When you submit a distributed training job, dataset acceleration improves data read efficiency.
If you use LINGJUN resources to run DLC jobs, you can enable dataset acceleration only for OSS datasets.

Create a dataset

For information about how to configure the parameters, see Create and manage datasets. Take note of the following items:

When you create a dataset for training jobs, you need to select Alibaba Cloud Storage Service and set Property to Folder.
Unlike NAS, OSS is a distributed object storage service, not a file system. As a result, some file system features are not supported when you store data in OSS. For example, you cannot append data to or overwrite existing files in OSS buckets.
If you select a CPFS dataset, you must also configure the virtual private cloud (VPC). The VPC must be the same as the one that is configured for the CPFS dataset. Otherwise, exceptions may occur, and the DLC training jobs are removed from the queue after you submit them.
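
Because existing files in OSS buckets cannot be appended to or overwritten, training code that writes output to an OSS dataset can give each output its own file name. The following is a minimal sketch of that pattern under stated assumptions: the helper `write_checkpoint`, the file naming scheme, and the output directory are hypothetical. It runs against a local temporary directory, so the behavior of an actual OSS mount may differ.

```python
import os
import tempfile


def write_checkpoint(directory: str, step: int, data: bytes) -> str:
    """Write data to a new file named after the training step.

    The file is opened in exclusive-create mode ("x"), so the call raises
    FileExistsError instead of overwriting or appending to an existing file.
    """
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, f"checkpoint-{step:08d}.bin")
    with open(path, "xb") as handle:
        handle.write(data)
    return path


if __name__ == "__main__":
    # A temporary local directory stands in for the dataset mount path.
    output_dir = tempfile.mkdtemp()
    for step in (100, 200):
        print(write_checkpoint(output_dir, step, b"model-state"))
```

Each step produces a separate file, so no existing file is modified. Writing the same step twice fails with an error rather than silently replacing earlier output.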

Step 4: Prepare a code build

Before you submit a deep learning job, you need to add the code required by the job to a code build. We recommend that you go to the AI Computing Asset Management > Source Code Repositories page and add the code build as an AI asset. This way, the code build can be used by multiple training jobs. For more information, see Code builds.

References

After you complete the preparations, you can create a training job. For more information, see Submit a training job.