Offline scheduling for a PAI Designer pipeline - Platform For AI

You can integrate PAI Designer with DataWorks to schedule pipelines for offline execution. This helps you periodically update models and automate your model training process. This topic describes how to use DataWorks to schedule a PAI Designer pipeline and how to automatically sync PAI models to OSS as part of the scheduled task.

Prerequisites

All nodes in the pipeline must have run successfully.
You have activated DataWorks and created a workflow. For more information, see Create a workflow.

The workflow must be in the same workspace as the PAI Designer pipeline. Otherwise, you cannot select the created workflow from the Path drop-down list when you create an offline scheduling task.
If your DataWorks workspace is in standard mode, you must sync the offline-trained model to the production environment before you can run a periodic scheduling task. This is because MaxCompute data is isolated between the development and production environments. For more information, see Periodically schedule a batch prediction pipeline.

Procedure

Note

One PAI Designer pipeline can back multiple DataWorks nodes (a 1:N relationship). You can create multiple Designer-type nodes in DataWorks based on a single PAI Designer pipeline.

Go to Visualized Modeling, select a workspace to open the PAI Designer page, and then double-click the target pipeline to open it.
(Optional) To synchronize a Designer model to OSS during periodic scheduling, add a Model Export component to the pipeline.
1. On the Pipeline Attributes tab, set the Data Storage parameter to the OSS path where you want to store the model.
2. To export a model file in PMML format, click the target model component (for example, Logistic Regression for Binary Classification), go to the Field Settings tab, and select Whether To Generate PMML.
  
  Note
  You can skip this step if the component does not support this feature or if you do not need a PMML file.
3. Connect a Model Export component downstream of the model component. For configuration details, see Model Export.
Use DataWorks to schedule the offline execution of the PAI Designer pipeline.
1. In the upper-left corner of the canvas, click Periodic Scheduling, and then click Create Scheduling Node. This redirects you to DataWorks for offline scheduling. In the Create Node dialog box, enter a name for the node and click Confirm.
2. On the node editing page, select your PAI Designer pipeline from the Select PAI Designer Pipeline drop-down list.
  
  After you select a pipeline, you can click Reload to refresh its content. To modify the pipeline, click Edit in PAI Designer to go to the pipeline editing page.
3. Click Properties on the right side of the node editing area to configure the scheduling properties for the node. For more information, see Configure node scheduling properties.
  
  The Properties panel includes sections such as General, Scheduling Parameter, Schedule, Resource Group, and Dependencies. You can configure the scheduling cycle in the Schedule section. DataWorks then automatically runs the node task based on the configured cycle.
  
  Note
  DataWorks scheduling may occasionally report a "Start Container timeout" error. This is usually an intermittent timeout issue. We recommend that you enable Auto Rerun upon Failure when you configure the scheduling properties. When enabled, the scheduling system automatically retries a failed task (excluding manual terminations) based on the configured number of retries and interval.
4. Click the save and commit icons in the toolbar in sequence, and follow the on-screen instructions to save and commit the node.
  
  If your workspace is in standard mode, after the node is committed, click Deploy at the top of the page. For more information, see Deploy nodes.
5. Click Operation Center at the top of the page to view the running status and operational logs of the machine learning task.
  
  You can also perform operations such as backfill and pipeline test runs. For more information, see Manage scheduled tasks.
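
The Auto Rerun upon Failure behavior recommended in step 3 can be pictured with a small sketch. This is not DataWorks code or a DataWorks API. It only illustrates the described policy, assuming a fixed number of reruns, a fixed wait between them, and no rerun after a manual termination. The names run_with_auto_rerun, ManuallyTerminated, max_reruns, and interval_seconds are hypothetical and chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ManuallyTerminated(Exception):
    """Hypothetical marker for a run that a user stopped on purpose."""


def run_with_auto_rerun(
    task: Callable[[], T],
    max_reruns: int,
    interval_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run task, rerunning it after a failure up to max_reruns times."""
    reruns_used = 0
    while True:
        try:
            return task()
        except ManuallyTerminated:
            # A manual stop is not a failure to recover from: do not rerun.
            raise
        except Exception:
            if reruns_used >= max_reruns:
                raise  # Reruns exhausted: surface the last error.
            reruns_used += 1
            sleep(interval_seconds)  # Wait for the configured interval.


# Usage: a stand-in task that times out twice and then succeeds.
calls = {"count": 0}


def flaky_task() -> str:
    calls["count"] += 1
    if calls["count"] <= 2:
        raise TimeoutError("Start Container timeout")
    return "succeeded"


waits = []  # Record the waits instead of actually sleeping.
result = run_with_auto_rerun(
    flaky_task, max_reruns=3, interval_seconds=120, sleep=waits.append
)
print(result, calls["count"], waits)
```

With three reruns allowed, the two intermittent timeouts are absorbed and the third attempt succeeds. With fewer reruns than failures, the last error would be raised, which corresponds to the scheduled task ending in a failed state.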

References

Model prediction and deployment.
PAI Designer allows you to use the Update EAS Service (Beta) component to update an online model service on a schedule. For more information, see Schedule updates for online model services.