Machine Learning Designer is a visual modeling tool provided by Platform for AI (PAI) for end-to-end machine learning development. DataWorks provides PAI Designer nodes, which you can use to load Machine Learning Designer pipelines. The pipeline tasks are then scheduled periodically based on the scheduling configurations of the PAI Designer nodes.
Prerequisites
DataWorks is authorized to access PAI.
You can perform one-click authorization on the authorization page. For more information about the policy, see Role 1: AliyunServiceRoleForDataworksEngine. Only an Alibaba Cloud account or a RAM user to which the AliyunDataWorksFullAccess policy is attached can perform one-click authorization.
A workflow is created.
In DataStudio, development operations are performed on different development engines based on workflows. Therefore, you must create a workflow before you can create a node. For more information, see Create a workflow.
Step 1: Create a PAI Designer node
Go to the DataStudio page.
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.
Right-click the desired workflow and choose .
In the Create Node dialog box, configure the Name and Path parameters and click Confirm. You can develop and configure the related pipeline task on the configuration tab of the node later.
Step 2: Develop tasks on the PAI Designer node
Develop a task: Write a Machine Learning Designer pipeline
To load an existing pipeline when you edit the PAI Designer node, you must create the pipeline in PAI in advance. You can then load the pipeline by searching for it by name. On the configuration tab of the PAI Designer node, you can create a pipeline by using one of the following methods:
Create a blank pipeline.
You can create a blank pipeline, add components, and perform drag-and-drop operations on the components to build a model based on your business requirements. For more information, see Create a blank pipeline.
Create a pipeline from a preset template.
Machine Learning Designer provides preset templates for you to quickly create pipelines that are similar to the templates. You can modify components in a preset template or the configurations of components to build a model. For more information, see Create a pipeline from a preset template.
Create a pipeline from a custom template.
You can save a stable pipeline as a custom template for other members in your workspace to use and edit. For more information, see Create a pipeline from a custom template.
For information about how to create a preset template for visualized data modeling, see Use DataWorks tasks to schedule pipelines in Machine Learning Designer.
For information about how to create a custom pipeline for visualized data modeling, see Create a custom pipeline.
Develop SQL code: Use scheduling parameters
DataWorks provides scheduling parameters whose values are dynamically replaced in the code of a node based on the configurations of the scheduling parameters in periodic scheduling scenarios. You can define variables in the node code in the ${Variable} format and assign values to the variables in the Scheduling Parameter section of the Properties tab. For information about the supported formats of scheduling parameters, see Supported formats of scheduling parameters.
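The replacement described above can be pictured with a minimal Python sketch. It only illustrates the idea of substituting ${Variable} placeholders with assigned values and is not how DataWorks implements scheduling parameters. The function name render_node_code and the value 20240101 are hypothetical and chosen for illustration.

```python
from string import Template


def render_node_code(code: str, params: dict[str, str]) -> str:
    """Replace ${name} placeholders in node code with assigned values.

    Illustrative only: a real scheduler resolves the values from the
    scheduling parameter configuration at run time.
    """
    # Template.substitute raises KeyError if a variable has no assigned value.
    return Template(code).substitute(params)


# Node code that references a variable in the ${Variable} format.
node_code = r"--command='echo '\''${Variable}'\'';'"

# Hypothetical value assigned to the variable for this run.
rendered = render_node_code(node_code, {"Variable": "20240101"})
print(rendered)
```

In this sketch, a variable that has no assigned value raises an error instead of being left in the code. That is a design choice of the sketch, not documented DataWorks behavior.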
Sample code of scheduling parameters:
--command='echo '\''${Variable}'\'';' \ --Scheduling parameters are supported.
Step 3: Configure task scheduling properties
If you want to periodically run tasks on the created node, click Properties in the right-side navigation pane of the node configuration tab to configure the scheduling information of the node based on your business requirements. For more information, see Overview.
You must configure the Rerun and Parent Nodes parameters on the Properties tab before you commit the node.
Step 4: Debug the node code
You can perform the following operations to check whether the node is configured as expected based on your business requirements:
Optional. Select a resource group and assign custom parameters to variables.
Click the icon in the top toolbar of the configuration tab of the node. In the Parameters dialog box, select the resource group for scheduling that you want to use to debug and run tasks on the node. If you use scheduling parameters in your node code, you can assign values to the variables in the node code for debugging. For more information about the assignment logic of scheduling parameters, see What are the differences in the value assignment logic of scheduling parameters among the Run, Run with Parameters, and Perform Smoke Testing in Development Environment modes?
Save and execute SQL statements.
In the top toolbar, click the icon to save SQL statements. Then, click the icon to execute the SQL statements.
Optional. Perform smoke testing.
When you commit the node or after you commit the node, you can perform smoke testing on the node in the development environment to check whether the node runs as expected. For more information, see Perform smoke testing.
Step 5: Commit the node
After the node is configured, you must commit and deploy the node. After you commit and deploy the node, the system periodically runs tasks on the node based on the scheduling configurations.
Click the icon in the top toolbar to save the code.
Click the icon in the top toolbar to commit the node.
In the Submit dialog box, configure the Change description parameter. Then, decide whether the node code must be reviewed after you commit the node, based on your business requirements.
Note: You must configure the Rerun and Parent Nodes parameters on the Properties tab before you commit the node.
You can use the code review feature to ensure the code quality of nodes and prevent execution errors caused by invalid node code. If you enable the code review feature, the node code that is committed can be deployed only after the node code passes the code review. For more information, see Code review.
If you use a workspace in standard mode, you must deploy the node to the production environment after you commit the node. To deploy a node, click Deploy in the upper-right corner of the configuration tab of the node. For more information, see Deploy nodes.
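The conditions in this step can be summarized in a short Python sketch that lists what remains before a node reaches production. This is a reading aid under stated assumptions, not a DataWorks API: the NodeState class, its field names, and the steps_to_production function are hypothetical, and the sketch assumes that a workspace that is not in standard mode needs no separate deployment step.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical summary of a node's state before it is committed."""
    rerun_configured: bool          # Rerun parameter set on the Properties tab
    parent_nodes_configured: bool   # Parent Nodes parameter set on the Properties tab
    code_review_enabled: bool       # code review feature enabled
    standard_mode: bool             # workspace is in standard mode


def steps_to_production(state: NodeState) -> list[str]:
    """Return the remaining steps, in order, based on the rules in this step."""
    steps = []
    # Rerun and Parent Nodes must be configured before the node is committed.
    if not state.rerun_configured:
        steps.append("configure Rerun")
    if not state.parent_nodes_configured:
        steps.append("configure Parent Nodes")
    steps.append("commit")
    # With code review enabled, committed code can be deployed only after review.
    if state.code_review_enabled:
        steps.append("pass code review")
    # In standard mode, the node must be deployed to production after commit.
    if state.standard_mode:
        steps.append("deploy")
    return steps


print(steps_to_production(NodeState(False, True, True, True)))
```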
What to do next
After you commit and deploy the node, tasks are periodically run on the node based on the node configurations. You can click Operation Center in the upper-right corner of the configuration tab of the node to go to Operation Center and view the scheduling status of the node. For more information, see View and manage auto triggered tasks.