Global variable for pipelines - Platform For AI - Alibaba Cloud Documentation Center

When multiple components in a pipeline depend on the same value, updating that value in every component individually is error-prone. Global variables let you define a value once at the pipeline level and reference it in any component using ${variableName}. For DataWorks offline scheduling pipelines, a global variable can be bound to a DataWorks scheduling parameter so the pipeline receives the correct date on each scheduled run automatically.

How it works

Each global variable has a name and a value. To reference a variable in a component, enter ${variableName} in the relevant field.

When the pipeline runs, PAI replaces every ${variableName} reference with the variable's current value. For DataWorks offline scheduling, define the variable in PAI, then create a DataWorks scheduling parameter with the same name to replace the timed scheduling parameter at runtime.

Configure a global variable

Open a pipeline, then click a blank area on the canvas.
On the right panel, click Add Global Variable.
Enter a Variable Name and a Variable Value.
In any component where you want to use the variable, enter ${variableName} in the relevant field.

Example 1: Share a parameter across components in an online pipeline

This example uses the global variable para1 to rename the column status to ifHealth in a SQL script, without hardcoding the value in the component.

Create a pipeline based on the preset template Heart Disease Prediction, then delete all nodes except the first two.
Click a blank area on the canvas. On the right panel, set Variable Name to para1 and Variable Value to ifHealth.
Open the SQL component and reference the variable in the SQL script: ${para1}.
Run the pipeline. After the run completes, right-click the SQL component and select View Data > SQL Script Output Port. The column status is now renamed to ifHealth.

Example 2: Link an offline scheduling pipeline to DataWorks

This example uses the global variable gDate to pass a business date from DataWorks into the pipeline at runtime, so the pipeline automatically queries data for the correct date on each scheduled run.

Create a table named dwtest on the MaxCompute console. For the table schema, see SQL references.
Create a pipeline, then click a blank area on the canvas and create a global variable named gDate.
Configure the pipeline components:
- Read Table: Set Table Name to dwtest.
- SQL Script: Reference gDate in the SQL script:
```
select * from ${t1} where ds=${gDate}
```
Run the pipeline. After the run completes, right-click the SQL Script component and select View Data > SQL Script Output Port to verify the data for gDate.
Click Periodic Scheduling in the upper-left corner of the canvas, then click Create Scheduling Node to open DataWorks. In the Create Node dialog, enter a name and click Confirm.
Select PAI Designer Experiment, then click Properties on the right side of the screen. Configure the following settings (leave all others as default):
- Scheduling Parameter: Add a parameter named gDate with the value $bizdate. The scheduling parameter name must match the global variable name exactly.
- Schedule: Set Rerun to Allow Regardless of Running Status.
- Dependencies: Select Add Root Node.
In the toolbar, click the and icons to save and submit the node.
Click Operation Center at the top of the page to view the running status and operation log. From here, you can also trigger data backfill and pipeline trial runs. For details, see View and manage auto triggered tasks.

Platform For AI:Global variables

How it works

Configure a global variable

Example 1: Share a parameter across components in an online pipeline

Example 2: Link an offline scheduling pipeline to DataWorks

What's next