【DSW Gallery】How to use the SDK to submit DLC training tasks

Overview
PAI-DLC (Deep Learning Containers) is a deep learning training platform built on Alibaba Cloud Container Service for Kubernetes (ACK). It provides a flexible, stable, easy-to-use, and high-performance environment for deep learning training.
In addition to submitting tasks through the PAI console, the PAI platform provides a complete SDK and OpenAPI for code-based task submission, which makes PAI-DLC more flexible for daily production use.
This article introduces how to use the SDK provided by PAI-DLC to submit training tasks.
If you need to submit tasks to PAI-DLC public or exclusive resource groups through the console, see Submitting Tasks (via the Training page).
Prerequisites
• PAI-DLC is activated and authorization is complete. For details, see Cloud Product Dependency and Authorization: DLC.
• A resource group cluster is prepared for running the training task (this article uses the public DLC resource group for demonstration).
• You have obtained the AccessKey ID and AccessKey Secret of your Alibaba Cloud account. For details, see Obtaining an AccessKey.
Step 1: Install the Python SDK
• Install the AI Workspace SDK
• Install the SDK for PAI-DLC
• Install the SDK for prepaid resource group management (optional)
!pip install alibabacloud-aiworkspace20210204 -U -q
!pip install alibabacloud-pai-dlc20201203 -U -q
# Install the SDK required for prepaid resource group management
!pip install https://sdk-portal-us-prod.oss-accelerate.aliyuncs.com/downloads/u-b8602de7-c468-436c-8a02-2eca4a30d376-python-paistudio.zip -U -q
Step 2: Preparations before submitting the task
When running training tasks, PAI-DLC can use resources from the PAI platform's AI asset modules (including datasets, images, code sources, etc.) so that development materials can be reused. You also need to prepare an AI workspace in advance to integrate the related workflows.
from __future__ import print_function
import json
import time
from alibabacloud_tea_openapi.models import Config
from alibabacloud_pai_dlc20201203.client import Client as DLCClient
from alibabacloud_pai_dlc20201203.models import (
    ListJobsRequest,
    ListEcsSpecsRequest,
    CreateJobRequest,
    UpdateJobRequest,
    JobSpec
)
from alibabacloud_aiworkspace20210204.client import Client as AIWorkspaceClient
from alibabacloud_aiworkspace20210204.models import (
    ListWorkspacesRequest,
    CreateDatasetRequest,
    ListDatasetsRequest,
    ListImagesRequest,
    ListCodeSourcesRequest,
    ListResourcesRequest
)
from alibabacloud_paistudio20220112.models import (ListResourceGroupMachineGroupsRequest)
from alibabacloud_paistudio20220112.client import Client as StudioClient
# client config
region_id = 'cn-beijing' # Region, can be cn-hangzhou, cn-shanghai, cn-shenzhen, etc.
access_key_id = '**your access_key_id**'
access_key_secret = '**your access_key_secret**'
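# Alternatively, load the AccessKey pair from environment variables so secrets
# are not hardcoded in the notebook. The variable names below follow a common
# Alibaba Cloud convention and are an assumption; adjust them as needed.
# import os
# access_key_id = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_ID')
# access_key_secret = os.environ.get('ALIBABA_CLOUD_ACCESS_KEY_SECRET')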
# Read AK information from a local config file, if present
import os.path
config_path = "/mnt/data/pai.config"
if os.path.isfile(config_path):
    with open(config_path) as f:
        # Assumed format: JSON with "access_key_id" and "access_key_secret"
        # fields; adjust the parsing to match your actual file.
        config = json.load(f)
        access_key_id = config['access_key_id']
        access_key_secret = config['access_key_secret']

workspace_client = AIWorkspaceClient(
    Config(access_key_id=access_key_id,
           access_key_secret=access_key_secret,
           region_id=region_id,
           endpoint='aiworkspace.{}.aliyuncs.com'.format(region_id)))

dlc_client = DLCClient(
    Config(access_key_id=access_key_id,
           access_key_secret=access_key_secret,
           region_id=region_id,
           endpoint='pai-dlc.{}.aliyuncs.com'.format(region_id)))

# The Studio client is only required for prepaid resource groups
studio_client = StudioClient(
    Config(access_key_id=access_key_id,
           access_key_secret=access_key_secret,
           region_id=region_id))
Prepare an AI workspace (required)
A workspace is the top-level concept in PAI. It provides unified computing resource management and member permission management for teams, with the goal of giving AI developers full-process development tools and AI asset management capabilities that support team collaboration.
When PAI is activated, a default workspace is automatically created for the user.
workspace_name = '**Existing AI workspace name**'
# Get a list of workspaces.
workspaces = workspace_client.list_workspaces(ListWorkspacesRequest(
    page_number=1,
    page_size=10,
    workspace_name=workspace_name,
))
if len(workspaces.body.workspaces) == 0:
    raise RuntimeError('Please specify the correct workspace_name')

for workspace in workspaces.body.workspaces:
    print(workspace.workspace_name, workspace.workspace_id,
          workspace.status, workspace.creator)

# Get the workspace ID used to submit the task
workspace_id = workspaces.body.workspaces[0].workspace_id
Prepare an image (required)
To run a training task, you must explicitly specify the image used by the compute nodes. PAI-DLC supports several types of images:
• Community images: standard images provided by the community. For details about the different images, see Community Image Version Details.
• PAI platform images: official images provided by Alibaba Cloud PAI, supporting different resource types, Python versions, and the deep learning frameworks TensorFlow and PyTorch. For the list of images, see the list of public images.
• User-defined images: custom images that you have added to PAI. Before selecting one, you need to add the custom image to PAI. For details, see Viewing and Adding an Image.
• Image address: a custom image specified by its Docker registry image URL. After selecting this option, configure an image URL that is accessible from the public network (see the sketch after the query code below).
# Get the image list; you can filter with labels
images = workspace_client.list_images(ListImagesRequest(
    labels=','.join(['system.supported.dlc=true',
                     'system.framework=Tensorflow 1.15',
                     'system.pythonVersion=3.6',
                     'system.chipType=CPU']),
    verbose=True))

# View all available images
for image in images.body.images:
    print(image.image_id, image.image_uri)

# Get the image used to submit the task; here we take the first one as an example
image_uri = images.body.images[0].image_uri
print('image_uri', image_uri)
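If you use a custom image address instead, you can skip the query above and set image_uri directly. A minimal sketch with an illustrative registry URL (replace it with your own):
# Alternatively, point image_uri at your own Docker registry (illustrative URL).
# The registry must be reachable from the DLC cluster network.
# image_uri = 'registry.cn-beijing.aliyuncs.com/your-namespace/your-image:tag'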
Prepare node specifications (required for pay-as-you-go)
For training tasks, you need to select the specifications of the compute nodes. For the detailed specification list and fees, see the PAI-DLC Billing Instructions.
# Get a list of node specifications for DLC.
ecs_specs = dlc_client.list_ecs_specs(ListEcsSpecsRequest(page_size=100, sort_by='Memory', order='asc'))
# for spec in ecs_specs.body.ecs_specs:
#     print(spec.instance_type, spec.cpu, spec.memory, spec.gpu_type)
# Get the node specification used to submit the job
ecs_spec = ecs_specs.body.ecs_specs[0].instance_type
print('ecs_spec', ecs_spec)
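If you use a prepaid (subscription) resource group instead of pay-as-you-go, the node specifications come from the machine groups of your resource group rather than from ListEcsSpecs. Below is a minimal sketch using the Studio client prepared above; the resource group ID is a placeholder, and the exact call signature is an assumption based on the SDK's naming conventions, so verify it against your installed alibabacloud_paistudio20220112 version:
# Query the machine groups of a prepaid resource group (hypothetical resource group ID).
# NOTE: the call signature and response fields may vary by SDK version;
# inspect the raw response body first.
resource_group_id = '**your prepaid resource group ID**'
machine_groups = studio_client.list_resource_group_machine_groups(
    resource_group_id, ListResourceGroupMachineGroupsRequest())
print(machine_groups.body)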
Prepare a dataset (optional)
High-quality datasets are the foundation of high-accuracy models and the core goal of data preparation. The PAI platform provides a dataset management module that supports registering various types of data (local data, data stored on Alibaba Cloud, etc.) as datasets, and also supports scanning OSS folders to generate index datasets in preparation for model training.
dataset_name = 'example-nas-data'
nas_id = '**The ID of the created NAS**'

def create_nas_dataset(client, region, workspace_id, name,
                       nas_id, nas_path, mount_path):
    '''Create a NAS dataset.'''
    response = client.create_dataset(CreateDatasetRequest(
        workspace_id=workspace_id,
        name=name,
        data_type='COMMON',
        data_source_type='NAS',
        property='DIRECTORY',
        uri=f'nas://{nas_id}.{region}{nas_path}',
        accessibility='PRIVATE',
        source_type='USER',
        options=json.dumps({
            'mountPath': mount_path
        })
    ))
    return response.body.dataset_id
def create_oss_dataset(client, region, workspace_id, name,
                       oss_bucket, oss_endpoint, oss_path, mount_path):
    '''Create an OSS dataset.'''
    response = client.create_dataset(CreateDatasetRequest(
        workspace_id=workspace_id,
        name=name,
        data_type='COMMON',
        data_source_type='OSS',
        property='DIRECTORY',
        uri=f'oss://{oss_bucket}.{oss_endpoint}{oss_path}',
        accessibility='PRIVATE',
        source_type='USER',
        options=json.dumps({
            'mountPath': mount_path
        })
    ))
    return response.body.dataset_id
# datasets = workspace_client.list_datasets(ListDatasetsRequest(
#     workspace_id=workspace_id,
#     name=dataset_name, properties='DIRECTORY'))
# for dataset in datasets.body.datasets:
#     print(dataset.name, dataset.dataset_id, dataset.uri, dataset.options)
#
# if len(datasets.body.datasets) == 0:
#     dataset_id = create_nas_dataset(
#         client=workspace_client,
#         region=region_id,
#         workspace_id=workspace_id,
#         name=dataset_name,
#         nas_id=nas_id,
#         nas_path='/',
#         mount_path='/mnt/data/example-nas')
#     print('create dataset with id: {}'.format(dataset_id))
# else:
#     dataset_id = datasets.body.datasets[0].dataset_id
# print('dataset_id', dataset_id)
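With the workspace, image, and node specification prepared, the materials can be combined into a job submission. The following is a minimal sketch only: the display name, job type, and command are illustrative placeholders, and mounting the dataset created above is left as a commented-out option.
# A minimal sketch of submitting a single-worker job with the prepared materials.
# display_name, job_type, and user_command are illustrative placeholders.
create_job_resp = dlc_client.create_job(CreateJobRequest(
    workspace_id=str(workspace_id),
    display_name='example-dlc-job',
    job_type='TFJob',
    job_specs=[
        JobSpec(
            type='Worker',
            image=image_uri,
            pod_count=1,
            ecs_spec=ecs_spec,
        )
    ],
    user_command='echo "Hello PAI-DLC!"',
    # data_sources=[...],  # optionally mount the dataset created above
))
job_id = create_job_resp.body.job_id
print('job_id:', job_id)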
