All Products
Search
Document Center

Platform For AI:Time Window SQL

Last Updated:Apr 19, 2023

You can use the multi-date loop execution feature to execute multiple day-level SQL tasks within a certain period. For example, you can run several SQL tasks in parallel to extract daily insights from the past seven days of behavior data to backfill data for a recommendation engine.

Limits

  • This feature is applicable to only the day-level data backfill loops.

  • Disable the multi-date loop execution before you use Periodic Scheduling to schedule your pipeline. This ensures that no extra data backfill tasks are performed in the production environment, and ensures that you do not generate more than the necessary data.

  • If you set the Maximum number of concurrent parameter on the Parameters Setting tab, the settings take effect only on the node for which it is configured. If you want to run data backfill on multiple nodes, take note of the total concurrency limit supported by the resources of the current project.

Usage notes

The Time Window SQL component supports up to four inputs and one output. Take note of the following items:

  • When you draw a line between an upstream component and Time Window SQL component, the input that you select determines the mapped name of the input table. The mapped name of the input table can be t1, t2, t3, or t4. You can directly use the ${t1}, ${t2}, ${t3}, ${t4} variable to indicate the input table, instead of specifying the original table name.

  • You can use the ${o1} variable in the SQL script to indicate the output table.

  • You can use the ${lifecycle} variable in the SQL script to obtain the lifecycle settings for temporary tables in the workspace. The default lifecycle is 28 days. For more information about the lifecycle of temporary tables in a workspace, see Manage workspaces.

  • If you select Whether the system adds a create table statement, the last statement in the SQL script must be a SELECT statement. Aside from that, you can include any other SQL statement you require inside the script. The system automatically creates a temporary table to store the query results of SELECT statements. If you specify a CREATE TABLE statement in the SQL script to create a temporary table, you need to configure the lifecycle of the table. For more information, see Manage workspaces.

Configure the component in Machine Learning Designer

Machine Learning Designer allows you to configure the component in the Machine Learning Platform for AI (PAI) console. The following table describes the parameters.

Parameter

Description

Business base date

You can set this parameter in one of the following ways:

Whether to open multi-date loop execution

Multi-date loop execution is enabled by default. If multi-date loop execution is disabled, this component functions the same as the SQL script component.

Execution time window

The value can contain integers and time ranges. Separate time ranges with commas (,).

The system calculates the execution time based on the Business base date and starts subtasks at the specified time. Up to 100 subtasks can be executed.

For example, if you set the Business base date to 20230210 and the Execution time window to (-4,-2],0, then tasks are executed on data obtained for 20230207, 20230208, and 20230210.

Maximum number of concurrent

We recommend that you do not run a large number of concurrent tasks at a time to avoid resource contention.

Date format

The value is used to generate the ${pai.system.cycledate} system variable. Valid values:

  • yyyyMMdd (default)

  • yyyy-MM-dd

  • yyyy/MM/dd

Example: If you set the Business base date to 20230210 and the Date format to yyyy-MM-dd, then the ${pai.system.cycledate} variable in the SQL script is converted to 2023-02-10.

Whether the system adds a create table statement

  • If this feature is enabled, the last statement in the SQL script must be a SELECT statement. The system automatically creates a temporary table to store the query results of the SELECT statement.

  • If this feature is disabled, you need to create a data table ${o1} in the SQL statement to pass to the downstream.

SQL Script

If the multi-date loop execution is enabled, you need to replace the date with the system variable ${pai.system.cycledate}..

Example: if you set the Business base date to 20230210 and specify select * from ${t1} where dt=${pai.system.cycledate}, then the following three tasks are executed:

  • select * from ${t1} where dt=2022-02-07

  • select * from ${t1} where dt=2022-02-08

  • select * from ${t1} where dt=2022-02-10

Other features are the same as the SQL script component.