When the test data or hyperparameters of a training job are updated and continuous incremental training or model fine-tuning is required, you can use the periodic scheduling feature of DataWorks to automatically submit Deep Learning Containers (DLC) jobs at specific points in time. This topic describes how to submit a DLC job at a scheduled time.
Background information
You can use one of the following methods to configure periodic scheduling for a DLC job:
Method 1: Load a DLC job on a PAI DLC node in the DataWorks console and configure job scheduling
Method 2: Create a script task and configure job scheduling
Prerequisites
The permissions that are required to use DLC are obtained. For more information, see Grant the permissions that are required to use DLC.
DataWorks is authorized to access PAI.
You can complete the authorization with one click on the authorization page. For more information about the service-linked role that is created based on the authorization, see Role 1: AliyunServiceRoleForDataworksEngine. Only an Alibaba Cloud account or a RAM user to which the AliyunDataWorksFullAccess policy is attached can perform one-click authorization.
A workflow is created.
In DataStudio, development operations on different compute engines are organized by workflow. Therefore, you must create a workflow before you can create a node. For more information, see Create a workflow.
Precautions
Each time a PAI DLC node is run, a new DLC task is generated on the DLC platform of PAI. To prevent multiple tasks that have the same name from being generated in PAI when you use DataWorks to periodically schedule PAI DLC nodes, we recommend that you configure an appropriate scheduling cycle based on your business requirements when you develop DLC tasks in DataWorks. We also recommend that you add a datetime variable to the task name and assign a time-based scheduling parameter to the variable as a value. This way, you can add a date and time to the task name. For more information, see the Step 2: Develop a PAI DLC task section in this topic.
You cannot use the shared resource group for scheduling to run PAI DLC tasks.
The operations described in this topic are performed in the China (Shanghai) region. You can perform operations in other regions based on the instructions displayed in the DataWorks console.
Method 1: Load a DLC job on a PAI DLC node in the DataWorks console and configure job scheduling
Step 1: Create a DLC job
Log on to the Platform for AI (PAI) console. Go to the Distributed Training Jobs page, and create a DLC job. In this topic, a PyTorch-based DLC job is used as an example. For information about how to create a PyTorch-based DLC job, see Submit a standalone training job that uses PyTorch.
Step 2: Create a PAI DLC node
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
On the DataStudio page, find the desired workflow, right-click the workflow name, and then choose .
In the Create Node dialog box, configure the Name parameter and click Confirm. Then, you can use the node to develop tasks and configure task scheduling properties.
On the DLC node tab, search for the DLC job that you created by name in the Load PAI DLC Job section and load the job.
After you load the job, the DLC node editor generates node code based on the configurations of the task in PAI. You can modify task configurations based on the code. For more information, see the "Step 2: Develop a PAI DLC task" section in the Create and use a PAI DLC node topic.
Step 3: Configure job scheduling
In the right-side pane of the node tab, click Properties. In the Properties panel, you can view configuration items such as General properties, Scheduling Parameter, Schedule, Resource Group, and Dependencies. Configure the parameters in the Schedule section. DataWorks automatically schedules and runs node tasks based on the specified scheduling cycle. For more information, see Overview.
Before you commit the node, you must configure the Rerun and Parent Nodes parameters on the Properties tab.
To prevent multiple tasks that have the same name from being generated in PAI when you use DataWorks to periodically schedule PAI DLC nodes, we recommend that you specify an appropriate scheduling cycle based on your business requirements and include a time-based variable in the task name, as shown in the sketch below.
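The following sketch illustrates the naming pattern that the Precautions section recommends. The field name (name) and the variable name (bizdate) are illustrative assumptions; adapt them to the code that the DLC node editor generated for your job and to the Scheduling Parameter section of the Properties panel.
# In the DLC node code, reference a variable in the task name (illustrative field name):
name=dlc_training_job_${bizdate}
# In the Scheduling Parameter section of the Properties panel, assign a time-based value to the variable
# (bizdate with the built-in $bizdate value is a common choice):
bizdate=$bizdate
# At run time, DataWorks replaces ${bizdate} with the data timestamp of the scheduling cycle,
# so each run submits a task with a unique name, for example dlc_training_job_20240101.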
Step 4: Debug the task code
To check whether the node is configured as expected, perform the following operations.
Optional. Select a resource group and assign custom parameters to variables.
Click the icon in the top toolbar of the configuration tab of the node. In the Parameters dialog box, select the resource group for scheduling that you want to use to debug and run the task code. If you use scheduling parameters in the task code, you can assign values to the variables for debugging. For more information about the value assignment logic of scheduling parameters, see What are the differences in the value assignment logic of scheduling parameters among the Run, Run with Parameters, and Perform Smoke Testing in Development Environment modes?
Save and run the task code.
In the top toolbar, click the icon to save the task code. Then, click the icon to run the task code.
Optional. Perform smoke testing.
You can perform smoke testing on the task in the development environment to check whether the task is run as expected when you commit the task or after you commit the task. For more information, see Perform smoke testing.
Step 5: Submit the task
After a task on a node is configured, you must commit and deploy the task. After you commit and deploy the task, the system runs the task on a regular basis based on scheduling configurations.
Click the icon in the top toolbar to save the task. Then, click the icon in the top toolbar to commit the task.
In the Submit dialog box, configure the Change description parameter. Then, determine whether to review the task code after it is committed, based on your business requirements.
Note: You must configure the Rerun and Parent Nodes parameters on the Properties tab before you commit the task.
You can use the code review feature to ensure the code quality of tasks and prevent task execution errors caused by invalid task code. If you enable the code review feature, the task code that is committed can be deployed only after the task code passes the code review. For more information, see Code review.
If you use a workspace in standard mode, you must deploy the task in the production environment after you commit the task. To deploy a task on a node, click Deploy in the upper-right corner of the configuration tab of the node. For more information, see Deploy nodes.
Step 6: View operations logs
After you commit and deploy the task, the task is periodically run based on the scheduling configurations. You can click Operation Center in the upper-right corner of the configuration tab of the corresponding node to go to Operation Center and view the scheduling status of the task. For more information, see View and manage auto triggered nodes.
Method 2: Create a script task and configure job scheduling
Step 1: Create an exclusive resource group for scheduling
Create an exclusive resource group for scheduling in the DataWorks console. For more information, see Create an exclusive resource group for Data Integration.
Step 2: Associate the exclusive resource group with a workspace
Associate the exclusive resource group with a workspace. This way, you can select the resource group in the workspace when you submit a job. For more information, see Step 2: Associate the exclusive resource group for scheduling with a workspace.
Step 3: Install the DLC package
To install the package, contact technical support to obtain the required permissions.
Create a command
Log on to the DataWorks console. In the left-side navigation pane, click Resource Group. The Exclusive Resource Groups tab appears on the Resource Groups page.
Find the exclusive resource group for scheduling that you created. Click the icon in the Actions column and then click O&M Assistant. On the O&M Assistant page, click Create Command, configure the following key parameters, and then click OK.
Parameter
Description
Command Type
The type of the command. Select Manual Installation.
Command Content
The content of the command. Enter the following content:
wget -P /home/admin/usertools/tools/ https://dlc-release.oss-cn-zhangjiakou.aliyuncs.com/console/public/latest/dlc --no-check-certificate
chmod +x /home/admin/usertools/tools/dlc
Installation Directories
The directory used for installation. Save the command to the /home/admin/usertools/tools/ directory.
Timeout
The timeout period of the command. Unit: seconds. If the command times out, the system forcibly stops the command. We recommend that you set the parameter to 60.
Run a command.
On the O&M Assistant page, find the command and click Run command in the Actions column.

In the Run command panel, click Run.
View the command execution results.
On the O&M Assistant page, find the command and click View Result in the Actions column.

In the Command Execution Result dialog box, view the command execution result. If the execution progress is 100%, the DLC package is installed.
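To confirm that the binary is usable on the resource group before you continue, you can create and run one more command in the O&M Assistant. The following check is a minimal sketch that assumes the DLC command-line tool prints its usage information when it is called with --help; adjust the check if your version behaves differently.
# Print the usage information of the installed DLC command-line tool.
# A usage message in the command execution result indicates that the binary was downloaded and is executable.
/home/admin/usertools/tools/dlc --help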

Step 4: Create a workflow
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
Move the pointer over the icon and choose the option for creating a node. In the Create Node dialog box, configure the Name and Path parameters. Click Confirm to create the Shell node.
Step 5: Submit a job for testing
To configure automatic job submission at specific points in time, a job node is required. Before you submit a job, create an initial job node and run a smoke test on the node. If an initial node is available, go to Step 6.
Modify the deployment script.
On the configuration tab of the workflow, double-click the Shell node that you created. In this example, the node is named Deployment.
On the Shell node tab, enter the following commands:
# Generate a job description file.
cat << EOF > jobfile
name=dataworks-job
workers=1
worker_spec=ecs.g6.large
worker_image=registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.7.1-gpu-py37-cu110-ubuntu18.04
command=echo $(date)
EOF
# Submit the job.
/home/admin/usertools/tools/dlc submit pytorchjob \
  --access_id=<access_id> \
  --access_key=<access_key> \
  --endpoint=pai-dlc.cn-hangzhou.aliyuncs.com \
  --region=cn-hangzhou \
  --job_file=./jobfile \
  --interactive
jobfile indicates the job description file. For more information about the job configuration, see Commands used to submit jobs. You must configure the endpoint parameter based on the region where you want to deploy the job.
The following list shows the endpoint of each supported region:
China (Shanghai): pai-dlc.cn-shanghai.aliyuncs.com
China (Beijing): pai-dlc.cn-beijing.aliyuncs.com
China (Hangzhou): pai-dlc.cn-hangzhou.aliyuncs.com
China (Shenzhen): pai-dlc.cn-shenzhen.aliyuncs.com
China (Hong Kong): pai-dlc.cn-hongkong.aliyuncs.com
Singapore: pai-dlc.ap-southeast-1.aliyuncs.com
Malaysia (Kuala Lumpur): pai-dlc.ap-southeast-3.aliyuncs.com
Germany (Frankfurt): pai-dlc.eu-central-1.aliyuncs.com
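As noted in the Precautions section, you may want each scheduled run to submit a job with a unique, time-stamped name. The following variant of the deployment script is a minimal sketch of that idea. It assumes that a scheduling parameter such as $bizdate is configured for the Shell node and is passed to the script as the first positional argument ($1); the job name pattern is illustrative.
# Read the data timestamp that DataWorks passes as the first script argument.
bizdate=$1
# Generate a job description file whose job name changes with every scheduling cycle.
cat << EOF > jobfile
name=dataworks-job-${bizdate}
workers=1
worker_spec=ecs.g6.large
worker_image=registry-vpc.cn-hangzhou.aliyuncs.com/pai-dlc/pytorch-training:1.7.1-gpu-py37-cu110-ubuntu18.04
command=echo $(date)
EOF
# Submit the job in the same way as in the original script.
/home/admin/usertools/tools/dlc submit pytorchjob \
  --access_id=<access_id> \
  --access_key=<access_key> \
  --endpoint=pai-dlc.cn-hangzhou.aliyuncs.com \
  --region=cn-hangzhou \
  --job_file=./jobfile \
  --interactive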
Run the script.
In the upper part of the Shell node tab, click the icon. In the Warning message, click Continue to Run.
In the Runtime Parameters dialog box, set the Resource Group parameter to the created exclusive resource group.
Then, click OK.
After the script is run, a job is generated. You can go to the DLC page of the default workspace to view the generated job.
Step 6: Perform job scheduling
Run the scheduling job.
In the right-side pane of the Shell node tab, click the Properties tab.
In the Schedule section of the Properties page, configure the Scheduling Cycle and Rerun parameters.
In the Dependencies section, click Use Root Node next to the Parent Nodes field.
Configure dependencies. For more information, see Configure same-cycle scheduling dependencies.
On the Shell node tab, click the icon to save the configurations. Then, click the icon to commit the scheduled node.
View the instances of the scheduling node.
In the upper-right corner of the Shell node tab, click Operation Center.
On the Operation Center page, choose .
On the instance list page, view the scheduled time for automatic job submission in the Schedule column.
In the Actions column, choose to view the operations logs of each scheduled job submission.
References
You can view and manage DLC jobs that are submitted in the PAI console.