DataWorks: PAI DLC node

Last Updated: Jun 17, 2026

Deep Learning Containers (DLC) from Platform for AI (PAI) provides a flexible, stable, easy-to-use, and high-performance environment for distributed training. DataWorks offers a PAI DLC node that allows you to directly load DLC tasks and configure scheduling dependencies to run them periodically.

Prerequisites

  • You have authorized DataWorks to access Platform for AI (PAI).

    You can go to the authorization page to grant the required permissions in a single click. For details about the policy, see AliyunServiceRoleForDataWorksEngine. Only an Alibaba Cloud account or a RAM user that has the AliyunDataWorksFullAccess policy attached can perform this one-click authorization.

  • You have created a project directory. For more information, see Project directory.

  • You have created a PAI DLC node. For more information, see Create a node in a workflow.

Procedure

  1. On the PAI DLC node editing page, develop the task code.

    Develop task code

    You can write the DLC task code in either of the following ways based on your business requirements:

    From existing task

    Search for an existing DLC task in Platform for AI (PAI) by name and load it. After the task is loaded, the PAI DLC node editor generates the node code from the task's PAI configuration. You can then edit this code.

    From scratch

    Write the task code directly in the PAI DLC node editor in DataWorks.

    You can define variables by using the ${variable_name} format. Then, in the Scheduling Settings pane on the right, assign values to these variables in the Scheduling Parameters section. This allows you to pass dynamic parameters to the code for scheduled runs. For more information about how to use scheduling parameters, see Sources and expressions of scheduling parameters. The following code is an example.

    # Submit a PyTorch job by using DLC. Parameters used in this example:
    #   --name                 The DLC task name. We recommend that you use a variable name or the DataWorks node name.
    #   --command              The command to execute. In this example, the command is echo 'hi'.
    #   --workspace_id         The workspace in which the task runs.
    #   --priority             The task priority. Valid values: 1 to 9. 1 indicates the lowest priority, and 9 indicates the highest priority.
    #   --workers              The number of task nodes. If this value is greater than 1, the task is a distributed task that can run concurrently on multiple nodes.
    #   --worker_image         The image path for the worker.
    #   --image_repo_username  The username for private image authorization.
    #   --image_repo_password  The password for private image authorization.
    #   --data_source_uris     Mount an OSS data source to the specified path in the container. In this example, the mount type is jindo.
    #   --worker_spec          The node specification, which specifies the compute node type.
    dlc submit pytorchjob \
        --name=test \
        --command='echo '\''hi'\''' \
        --workspace_id=309801 \
        --priority=1 \
        --workers=1 \
        --worker_image=<image> \
        --image_repo_username=<username> \
        --image_repo_password=<password> \
        --data_source_uris=oss://oss-cn-shenzhen.aliyuncs.com/::/mnt/data/:{mountType:jindo} \
        --worker_spec=ecs.g6.xlarge
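
    The ${variable_name} format described above can be previewed locally before you schedule the node. The following is a minimal sketch, not DataWorks behavior itself: it assumes that a scheduling parameter named bizdate has been assigned in the Scheduling Parameters section and that the placeholder is replaced as plain text before the code runs. The parameter name bizdate, the script name train.py, and the helper function render are hypothetical and used only for illustration. The sketch only prints the rendered command and never calls the dlc CLI.

```shell
#!/usr/bin/env bash
# Local dry run only: nothing here calls the dlc CLI or DataWorks.

# Node code as it might be typed in the PAI DLC node editor, with a literal
# ${bizdate} placeholder. bizdate and train.py are hypothetical names.
node_code='dlc submit pytorchjob \
    --name=train_${bizdate} \
    --command='\''python train.py --date=${bizdate}'\'' \
    --workers=1 \
    --worker_image=<image> \
    --worker_spec=ecs.g6.xlarge'

# render VALUE: replace every literal ${bizdate} in node_code with VALUE,
# mimicking (as an assumption) a plain text substitution before execution.
render() {
  # [$] matches a literal dollar sign, so the shell never expands the placeholder.
  printf '%s\n' "$node_code" | sed 's/[$]{bizdate}/'"$1"'/g'
}

# Preview the command for an assumed business date of 20260616.
render 20260616
```

    In this sketch the placeholder appears in both --name and --command, so each scheduled run would get a distinct task name and pass the same date to the training script.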

  2. After you finish writing the PAI DLC task, run the node task.

    1. In Run Configuration, select and configure the Resource Group.

      Select a resource group for scheduling that has passed the connectivity test with your data source. For more information, see Create and manage resource groups.

    2. On the toolbar, click Run to run the node task.

  3. To run the node task on a regular basis, configure the schedule settings based on your business requirements. For more information, see Configure schedule settings.

  4. After the node task is configured, you must deploy the node. For more information, see Deploy nodes.

  5. After the task is deployed, you can go to Operation Center to view the running status of the scheduled task. For more information, see View scheduled tasks.