In typical data development scenarios, the code of a node often depends on values that change over time or as requirements change, such as dates and times. Instead of modifying these values in the code manually, you can use the scheduling parameter feature of DataWorks to assign them dynamically.

After scheduling parameters are configured, auto triggered nodes automatically replace them with the required values when the nodes run. Configurable parameters in DataWorks are classified into system parameters and custom parameters. We recommend that you use custom parameters.

Parameter types

Configurable scheduling parameters in DataWorks are classified into system parameters and custom parameters.
  • You can directly reference system parameters in the code.
  • To use a custom parameter, assign it to a variable in the format of Variable name=Parameter, and reference the variable name in the code.
  • System parameters: bdp.system.bizdate and bdp.system.cyctime
    • Applicable to: all nodes.
    • Configuration method: reference ${bdp.system.bizdate} and ${bdp.system.cyctime} directly in the code.
    • Parameter value: none. No value needs to be assigned.
  • Custom parameters (recommended)
    • ODPS SQL nodes and sync nodes: reference ${key1} and ${key2} in the code, and assign values in the format of key1=value1 key2=value2. Examples:
      • Constant parameters: key1="abc" key2=1234
      • Variables: key1=${yyyymmdd}, whose value is calculated based on the value of bdp.system.bizdate, and key2=$[yyyymmddhh24miss], whose value is calculated based on the value of bdp.system.cyctime
    • PyODPS nodes: assign values in the same format of key1=value1 key2=value2. DataWorks adds a dictionary object named args to the global variables, and the code reads the values as args['key1'] and args['key2'].
    • Shell nodes: reference $1, $2, ... in the code, and assign values in the format of value1 value2. Examples:
      • Constant parameters: "abc" 1234
      • Variables: ${yyyymmdd}, whose value is calculated based on the value of bdp.system.bizdate, and $[yyyymmddhh24miss], whose value is calculated based on the value of bdp.system.cyctime

System parameters

DataWorks provides the following system parameters:
  • ${bdp.system.cyctime}: the scheduled time to run an instance. Default format: yyyymmddhh24miss.

    This parameter can specify the hour and minutes of the scheduled time. The value of this parameter is the same as that of the cyctime parameter.

  • ${bdp.system.bizdate}: the timestamp of data to be analyzed by an instance. Default format: yyyymmdd. The default data timestamp is one day before the scheduled time. The value of this parameter is the same as that of the bizdate parameter.

The date of the scheduled time is one day after the data timestamp: Scheduled time = Data timestamp + 1 day.
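The one-day offset can be checked with a short Python sketch. It assumes the default offset and the default formats described above (yyyymmddhh24miss for the scheduled time, yyyymmdd for the data timestamp); the function names are hypothetical and are not part of DataWorks.

```python
from datetime import datetime, timedelta


def bizdate_from_cyctime(cyctime: str) -> str:
    """Default data timestamp (yyyymmdd) for a scheduled time (yyyymmddhh24miss)."""
    scheduled = datetime.strptime(cyctime, "%Y%m%d%H%M%S")
    return (scheduled - timedelta(days=1)).strftime("%Y%m%d")


def scheduled_date_from_bizdate(bizdate: str) -> str:
    """Date part (yyyymmdd) of the scheduled time for a given data timestamp."""
    data_day = datetime.strptime(bizdate, "%Y%m%d")
    return (data_day + timedelta(days=1)).strftime("%Y%m%d")


print(bizdate_from_cyctime("20191101000000"))   # 20191031
print(scheduled_date_from_bizdate("20190720"))  # 20190721
```

Using calendar arithmetic rather than string manipulation keeps month and year boundaries correct, for example 20190301 maps back to 20190228.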

The following figure shows how to assign values to system parameters and custom parameters of an ODPS SQL node, and how to reference the parameters in the code. The left section is the code editor, and the right section is the parameter configuration area.

Click the Properties tab in the right-side navigation pane, and assign values to custom parameters in the Arguments field of the General section. Note the following when you configure parameters:
  • When you configure a parameter in the format of Variable name=Parameter, do not add spaces on either side of the equal sign (=). For example, enter bizdate=$bizdate.
  • Separate multiple parameters with spaces. A parameter value cannot contain spaces. If the code needs a value that contains a space, split the value into multiple variables, assign a value to each variable separately, and separate the variables with a space in the code. For example, enter bizdate=$bizdate datetime=${yyyymmdd}.
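The two rules above can be illustrated with a minimal Python sketch of how an Arguments string splits into assignments. This is a hypothetical parser written only to mirror the stated rules; it is not the parser that DataWorks uses, and `parse_arguments` is an illustrative name.

```python
def parse_arguments(field: str) -> dict:
    """Split a hypothetical Arguments value into {variable: parameter} pairs.

    Assumed rules, taken from the notes above: assignments are separated by
    spaces, and no space is allowed on either side of '='.
    """
    assignments = {}
    for token in field.split():
        name, sep, value = token.partition("=")
        if not sep or not name or not value:
            # A space around '=' splits one assignment into broken tokens.
            raise ValueError(f"not a Variable name=Parameter assignment: {token!r}")
        assignments[name] = value
    return assignments


print(parse_arguments("bizdate=$bizdate datetime=${yyyymmdd}"))
```

With "bizdate = $bizdate", the first token is bizdate without an equal sign, so the sketch rejects it. This is why spaces around the equal sign break the configuration.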

Custom parameters

DataWorks supports the following custom parameters: constant parameters, built-in parameters such as $bizdate and $cyctime, ${...}, and $[...].

Note For more information about how to reference parameters for different types of nodes, see the Parameter types section.

The following section uses an ODPS SQL node as an example and describes how to assign values to custom parameters for the node on the Properties tab of the node configuration tab.

Assume that the current date is November 1, 2019, and the node is scheduled to run at 00:00 every day. The following list shows various custom parameters, their sample code, assigned values, and replaced values.
  • ${yyyymmdd}: sample code pt=${datetime1}; assigned value datetime1=${yyyymmdd}; replaced value datetime1=20191031.
  • $[yyyymmddhh24miss]: sample code pt=${datetime2}; assigned value datetime2=$[yyyymmddhh24miss]; replaced value datetime2=20191101000000.
  • $bizdate, the data timestamp: sample code pt=${datetime3}; assigned value datetime3=$bizdate; replaced value datetime3=20191031.
  • $cyctime, the scheduled time, accurate to seconds: sample code pt=${datetime4}; assigned value datetime4=$cyctime; replaced value datetime4=20191101000000.
  • $gmtdate, the scheduled time, accurate to the day: sample code pt=${datetime5}; assigned value datetime5=$gmtdate; replaced value datetime5=20191101.
  • $bizmonth, the month of the data timestamp: sample code pt=${datetime6}; assigned value datetime6=$bizmonth. Replaced values:
    • If the current date is November 1, 2019: datetime6=201910.
    • If the selected data timestamp is November 2, 2019: datetime6=201910.
    • If the selected data timestamp is October 31, 2019: datetime6=201910.
Note Do not add spaces on either side of the equal sign (=) for a parameter.
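The replaced values above can be reproduced with a small Python sketch. It assumes the default one-day offset between $cyctime and $bizdate, and it uses the $bizmonth rule given in this topic: if the data timestamp falls in the current month, use the previous month; otherwise, use the month of the data timestamp. The `today` argument stands for the current date, and the function names are hypothetical.

```python
from datetime import date, datetime, timedelta


def bizmonth(bizdate: date, today: date) -> str:
    """$bizmonth as described in this topic (yyyymm)."""
    if (bizdate.year, bizdate.month) == (today.year, today.month):
        # Data timestamp in the current month: step back to the previous month.
        return (bizdate.replace(day=1) - timedelta(days=1)).strftime("%Y%m")
    return bizdate.strftime("%Y%m")


def builtin_values(cyctime: str, today: date) -> dict:
    """Approximate $bizdate, $cyctime, $gmtdate, and $bizmonth for one instance."""
    scheduled = datetime.strptime(cyctime, "%Y%m%d%H%M%S")
    bizdate = (scheduled - timedelta(days=1)).date()  # default: one day earlier
    return {
        "bizdate": bizdate.strftime("%Y%m%d"),
        "cyctime": scheduled.strftime("%Y%m%d%H%M%S"),
        # Assumes the instance runs on its scheduled date; for retroactive
        # instances, data timestamp + 1 day gives the same date.
        "gmtdate": scheduled.strftime("%Y%m%d"),
        "bizmonth": bizmonth(bizdate, today),
    }


print(builtin_values("20191101000000", date(2019, 11, 1)))
```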
  • Built-in parameters
    • $jobid: the ID of the workflow to which a node belongs. Example: jobid=$jobid.
    • $nodeid: the ID of a node. Example: nodeid=$nodeid.
    • $taskid: the instance ID of a node. Example: taskid=$taskid.
    • $bizdate: the data timestamp, in the format of yyyymmdd. This parameter is widely used. By default, its value is one day before the scheduled time to run a node.
    • $cyctime: the scheduled time to run a node, in the format of yyyymmddhh24miss. The time is accurate to seconds. If no scheduled time is configured for a node scheduled by day, $cyctime is set to 00:00 of the day. This parameter is usually used for nodes scheduled by hour or minute. Example: cyctime=$cyctime.
    • $gmtdate: the current date, in the format of yyyymmdd. By default, its value is the current date. During retroactive data generation, its value is the data timestamp plus one day.
    • $bizmonth: the month of the data timestamp, in the format of yyyymm.
      • If the month of the data timestamp is the current month, the value of $bizmonth is the month before the month of the data timestamp.
      • Otherwise, the value of $bizmonth is the month of the data timestamp.
  • ${...} custom parameter

    You can customize a time format based on the value of $bizdate, where yyyy indicates the four-digit year, yy indicates the two-digit year, mm indicates the month, and dd indicates the day. You can combine these elements freely, for example, ${yyyy}, ${yyyymm}, ${yyyymmdd}, and ${yyyy-mm-dd}.

    $bizdate is accurate to the day. Therefore, ${...} can only specify the year, month, or day.

    The following table describes how to specify other intervals based on $bizdate.
    Interval Expression
    N years later ${yyyy+N}
    N years before ${yyyy-N}
    N months later ${yyyymm+N}
    N months before ${yyyymm-N}
    N weeks later ${yyyymmdd+7*N}
    N weeks before ${yyyymmdd-7*N}
    N days later ${yyyymmdd+N}
    N days before ${yyyymmdd-N}
    N days after the specified date ${yyyymmdd+N}
    N days before the specified date ${yyyymmdd-N}
    N years after the specified year, in the format of yyyy ${yyyy+N}
    N years before the specified year, in the format of yyyy ${yyyy-N}
    N years after the specified year, in the format of yy ${yy+N}
    N years before the specified year, in the format of yy ${yy-N}
  • $[...] custom parameter

    You can customize a time format based on the value of $cyctime, where yyyy indicates the four-digit year, yy indicates the two-digit year, mm indicates the month, dd indicates the day, hh24 indicates the hour in the 24-hour format, hh indicates the hour in the 12-hour format, mi indicates the minutes, and ss indicates the seconds. You can combine these elements freely, for example, $[yyyymmdd], $[yyyy-mm-dd], $[hh24miss], $[hh24:mi:ss], and $[yyyymmddhh24miss].

    $cyctime is accurate to seconds. Therefore, $[...] can specify the hour, minutes, or seconds.

    The following table describes how to specify other intervals based on $cyctime.
    Interval Expression
    N years later $[add_months(yyyymmdd,12*N)]
    N years before $[add_months(yyyymmdd,-12*N)]
    N months later $[add_months(yyyymmdd,N)]
    N months before $[add_months(yyyymmdd,-N)]
    N weeks later $[yyyymmdd+7*N]
    N weeks before $[yyyymmdd-7*N]
    N days later $[yyyymmdd+N]
    N days before $[yyyymmdd-N]
    N hours later $[hh24miss+N/24]
    N hours before $[hh24miss-N/24]
    N minutes later $[hh24miss+N/24/60]
    N minutes before $[hh24miss-N/24/60]

$[yyyymmddhh24miss] specifies the date (yyyymmdd) and time (hh24miss) as a single value, for example, 20191121000000.

If you want a space between the date and the time, configure two variables and separate them with a space in the code. On the Properties tab of the node configuration tab, assign $[yyyymmdd] and $[hh24miss] to two separate variables.
Note DataWorks supports immediate instance generation and computes parameter values with daylight saving time taken into account, so that nodes run properly when daylight saving time begins or ends. Assume that the time zone is UTC-8:
  • When daylight saving time begins, 23 instances are generated on that day for a node scheduled by hour. Because the clock skips from 02:00 to 03:00, 10 minutes before 03:00 is 01:50.
  • When daylight saving time ends, 24 instances are generated on that day for a node scheduled by hour. 10 minutes before 03:00 is 02:50.

If a node scheduled by day, week, or month is scheduled to run within the skipped period of the day when the daylight saving time begins, a node instance is generated and run at 00:00 on that day.
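The 01:50 and 02:50 results come from subtracting 10 minutes in absolute time rather than on the wall clock. The following Python sketch shows this under stated assumptions: the zone is US Pacific time (UTC-8 standard time, UTC-7 daylight saving time), and the 2019 transition instants are hard-coded instead of being read from a time zone database. The helper names are hypothetical. Python's own arithmetic on aware datetimes that share a tzinfo works on the wall clock, so the sketch converts to UTC before subtracting.

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")

# Assumed 2019 US Pacific transitions, hard-coded as UTC instants.
DST_START = datetime(2019, 3, 10, 10, 0, tzinfo=timezone.utc)  # 02:00 PST -> 03:00 PDT
DST_END = datetime(2019, 11, 3, 9, 0, tzinfo=timezone.utc)     # 02:00 PDT -> 01:00 PST


def zone_at(instant: datetime) -> timezone:
    """UTC offset in effect at a UTC instant (valid for 2019 only)."""
    return PDT if DST_START <= instant < DST_END else PST


def minutes_before(wall_clock: datetime, zone: timezone, minutes: int) -> datetime:
    """Subtract minutes in absolute time, then show the result on the local clock."""
    instant = wall_clock.replace(tzinfo=zone).astimezone(timezone.utc)
    earlier = instant - timedelta(minutes=minutes)
    return earlier.astimezone(zone_at(earlier))


print(minutes_before(datetime(2019, 3, 10, 3, 0), PDT, 10).strftime("%H:%M"))  # 01:50
print(minutes_before(datetime(2019, 11, 3, 3, 0), PST, 10).strftime("%H:%M"))  # 02:50
```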

Differences between the time parameters configured by using $[] and ${}

  • $bizdate: the data timestamp. By default, the value of this parameter is one day before the scheduled time to run a node.
  • $cyctime: the scheduled time to run a node. If no scheduled time is configured for a node scheduled by day, $cyctime is set to 00:00 of the day. The time is accurate to seconds. This parameter is usually used for nodes scheduled by hour or minute.

    For example, the scheduled time is 00:30 on the current day, that is, yyyy-mm-dd 00:30:00.

  • If a time parameter is configured by using ${}, $bizdate is used as the baseline. During retroactive data generation, the time parameter is replaced based on the selected data timestamp.
  • If a time parameter is configured by using $[], $cyctime is used as the baseline. Time arithmetic works as in Oracle, where 1 is one day, so 1/24 is one hour. During retroactive data generation, the time parameter is replaced based on the selected data timestamp plus one day.

    For example, if the data timestamp is set to 20190720 for retroactive data generation, the date part of $cyctime is 20190721.

Assume that $cyctime=20190720103000:
  • $[yyyy]=2019, $[yy]=19, $[mm]=07, $[dd]=20, $[yyyy-mm-dd]=2019-07-20, $[hh24:mi:ss]=10:30:00, and $[yyyy-mm-dd] $[hh24:mi:ss]=2019-07-20 10:30:00
  • $[hh24:mi:ss-1/24]=09:30:00
  • $[yyyy-mm-dd] $[hh24:mi:ss-1/24/60]=2019-07-20 10:29:00
  • $[yyyy-mm-dd] $[hh24:mi:ss-1/24]=2019-07-20 09:30:00
  • $[add_months(yyyymmdd,-1)]=20190620
  • $[add_months(yyyymmdd,-12*1)]=20180720
  • $[hh24]=10
  • $[mi]=30
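To make the Oracle-style arithmetic concrete, the following Python sketch evaluates the $[...] expressions listed above against $cyctime=20190720103000. It is a hypothetical evaluator written for illustration, not the DataWorks implementation: it supports only the format elements, day-fraction offsets, and add_months form described in this topic, and it uses exact fractions so that 1/24 and 1/24/60 become exactly one hour and one minute.

```python
import calendar
import re
from datetime import datetime, timedelta
from fractions import Fraction

# Hypothetical evaluator for $[...] bodies; not the DataWorks implementation.
TOKENS = {"yyyy": "%Y", "yy": "%y", "mm": "%m", "dd": "%d",
          "hh24": "%H", "hh": "%I", "mi": "%M", "ss": "%S"}
TOKEN_RE = re.compile(r"yyyy|yy|mm|dd|hh24|hh|mi|ss")
OFFSET_RE = re.compile(r"^(?P<fmt>[ymdhis24:-]+?)(?P<off>[+-]\d+(?:[*/]\d+)*)?$")
ADD_MONTHS_RE = re.compile(r"^add_months\((?P<fmt>[ymdhis24:-]+),(?P<n>-?\d+(?:\*\d+)*)\)$")


def to_strftime(fmt):
    """Translate format elements such as hh24 or mi into strftime directives."""
    return TOKEN_RE.sub(lambda t: TOKENS[t.group()], fmt)


def evaluate(expr):
    """Evaluate a signed expression such as '-1/24/60' or '12*1' exactly, left to right."""
    sign = -1 if expr.startswith("-") else 1
    parts = re.split(r"([*/])", expr.lstrip("+-"))
    value = Fraction(int(parts[0]))
    for op, operand in zip(parts[1::2], parts[2::2]):
        value = value * int(operand) if op == "*" else value / int(operand)
    return sign * value


def add_months(dt, months):
    """Shift by whole months, clamping the day to the target month's length."""
    years, month_index = divmod(dt.month - 1 + months, 12)
    year, month = dt.year + years, month_index + 1
    return dt.replace(year=year, month=month,
                      day=min(dt.day, calendar.monthrange(year, month)[1]))


def eval_bracket(expr, cyctime):
    """Replace one $[...] body using $cyctime (yyyymmddhh24miss) as the baseline."""
    expr = expr.replace(" ", "")
    base = datetime.strptime(cyctime, "%Y%m%d%H%M%S")
    m = ADD_MONTHS_RE.match(expr)
    if m:
        shifted = add_months(base, int(evaluate(m.group("n"))))
        return shifted.strftime(to_strftime(m.group("fmt")))
    m = OFFSET_RE.match(expr)
    if m is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    shifted = base
    if m.group("off"):
        # Offsets are in days: 1/24 is one hour, 1/24/60 is one minute.
        seconds = evaluate(m.group("off")) * 86400
        shifted = base + timedelta(seconds=int(seconds))
    return shifted.strftime(to_strftime(m.group("fmt")))
```

For example, eval_bracket("add_months(yyyymmdd,-1)", "20190720103000") returns "20190620", and eval_bracket("hh24:mi:ss-1/24", "20190720103000") returns "09:30:00".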
Examples of the time parameters configured by using ${}
  • The code of an ODPS SQL node includes pt=${datetime}, and the parameter configured for the node is datetime=${yyyy-mm-dd}. If the data timestamp is July 20, 2019, ${yyyy-mm-dd} is replaced as follows: pt=2019-07-20.
  • The code of an ODPS SQL node includes pt=${datetime}, and the parameter configured for the node is datetime=${yyyymmdd-2}. If the data timestamp is July 20, 2019, ${yyyymmdd-2} is replaced as follows: pt=20190718.
  • The code of an ODPS SQL node includes pt=${datetime}, and the parameter configured for the node is datetime=${yyyymm-2}. If the data timestamp is July 20, 2019, ${yyyymm-2} is replaced as follows: pt=201905.
  • The code of an ODPS SQL node includes pt=${datetime}, and the parameter configured for the node is datetime=${yyyy-2}. If the data timestamp is July 20, 2019, ${yyyy-2} is replaced as follows: pt=2017.
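The ${...} rules can be sketched the same way, with $bizdate (yyyymmdd) as the baseline. This hypothetical Python evaluator is not the DataWorks implementation. It assumes that the offset applies to the finest unit in the format: days if dd is present, otherwise months if mm is present, otherwise years. That assumption reproduces the examples above.

```python
import calendar
import re
from datetime import date, datetime, timedelta

# Hypothetical evaluator for ${...} bodies; not the DataWorks implementation.
EXPR_RE = re.compile(r"^(?P<fmt>[ymd-]+?)(?:(?P<op>[+-])(?P<n>\d+(?:\*\d+)*))?$")
TOKENS = {"yyyy": "%Y", "yy": "%y", "mm": "%m", "dd": "%d"}
TOKEN_RE = re.compile(r"yyyy|yy|mm|dd")


def add_months(d: date, months: int) -> date:
    """Shift by whole months, clamping the day to the target month's length."""
    years, month_index = divmod(d.month - 1 + months, 12)
    year, month = d.year + years, month_index + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def eval_brace(expr: str, bizdate: str) -> str:
    """Replace one ${...} body using $bizdate (yyyymmdd) as the baseline."""
    m = EXPR_RE.match(expr)
    if m is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    fmt = m.group("fmt")
    d = datetime.strptime(bizdate, "%Y%m%d").date()
    if m.group("op"):
        amount = 1
        for factor in m.group("n").split("*"):  # supports forms such as 7*N
            amount *= int(factor)
        if m.group("op") == "-":
            amount = -amount
        # Assumption: the offset applies to the finest unit present in the format.
        if "dd" in fmt:
            d += timedelta(days=amount)
        elif "mm" in fmt:
            d = add_months(d, amount)
        else:
            d = add_months(d, 12 * amount)
    return d.strftime(TOKEN_RE.sub(lambda t: TOKENS[t.group()], fmt))
```

For example, eval_brace("yyyymm-2", "20190720") returns "201905", matching the third example above.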

If you assign values to multiple parameters when configuring an ODPS SQL node, separate the parameters with spaces. For example, enter startdatetime=$bizdate enddatetime=${yyyymmdd+1} starttime=${yyyy-mm-dd} endtime=${yyyy-mm-dd+1}.

Test scheduling parameters

On the node configuration tab, if you click the Run or Run with Arguments icon in the top navigation bar, the values that you enter manually are assigned to the variables in the code. Therefore, you cannot use these icons to check whether the parameter values configured on the Properties tab are as expected.

The parameter values are replaced only when the node is run in Operation Center. To test the parameter values on the node configuration tab, click the Run Smoke Test in Development Environment icon in the top navigation bar. In the dialog box that appears, you can enter a data timestamp to simulate automatic node scheduling and obtain the replaced values of scheduling parameters at the specified data timestamp.

Note
  • If you modify the scheduling parameters of a node before you run a smoke test in the development environment, save and commit the node again so that the smoke test uses the updated code.
  • When you run a smoke test in the development environment, a fee is charged for the generated test instance.

View the replaced parameter values in Operation Center

On the DataStudio page, click the DataWorks icon in the upper-left corner and choose All Products > Operation Center.

In the left-side navigation pane, choose Cycle Task Maintenance > Cycle Task to view the dependencies and scheduling configuration of a node in the production environment.

Each time you deploy a node to the production environment, you can check whether the node is configured as expected. For example, you can check the values of the scheduling parameters.

For auto triggered nodes, you can check whether the replaced values of the scheduling parameters of the instances generated for each node every day meet your expectations. The values of the scheduling parameters of the instances generated for auto triggered nodes are replaced when the instances are generated. If an error occurs in the business logic, right-click the target node in the directed acyclic graph (DAG) and select View Node Details. On the Node Information page that appears, check the replaced values of the scheduling parameters of ancestor and descendant nodes.

If you modify the scheduling parameters of an auto triggered node on the DataStudio page, commit the node, and then deploy the node to the production environment, you must check whether the parameter values of the node meet your expectations in the Arguments field on the Node Information page.

Notice
  • Do not rerun instances that have run successfully or failed. After you modify the scheduling parameters of a node, instances are generated for the node again.
  • If the recurrence is modified for a node whose instances are generated immediately after the node is deployed, the modified scheduling parameters take effect for instances that are not run in the production environment. For more information about how to generate instances immediately after nodes are deployed, see Immediate instance generation.

Examples

  • If you use system parameters, you can reference them directly in the code instead of configuring them in the Arguments field on the Properties tab.
  • If you use custom parameters, you must assign values in the format of Variable name=Custom parameter in the Arguments field on the Properties tab, and reference the variable names in the code.
  • The parameter configuration procedure of a Shell node is similar to that of an ODPS SQL node, except that the variable naming rules are different.
  • Variable names for a Shell node cannot be customized and must follow the $1, $2, $3, ... format. If a Shell node has 10 or more parameters, use ${10} to reference the tenth variable.
  • PyODPS nodes do not replace ${param_name} strings in the code, so that the code itself is not modified. Instead, before the code runs, DataWorks adds a dictionary object named args to the global variables, from which you can obtain the values of the scheduling parameters.

    For example, on the Properties tab of the node configuration tab, enter def=${yyyymmdd} in the Arguments field of the General section. The following figure shows how to obtain the value of the parameter in the code.
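A minimal sketch of the code side, assuming the Arguments field contains def=${yyyymmdd}. Inside a real PyODPS node, DataWorks provides args. The fallback dictionary and its value below are hypothetical, so that the snippet also runs outside DataWorks, and the value is assumed to be a string.

```python
# In a DataWorks PyODPS node, `args` is already defined as a global dictionary.
# Outside DataWorks, fall back to a hypothetical value so the sketch still runs.
try:
    args
except NameError:
    args = {"def": "20191031"}  # hypothetical replaced value of def=${yyyymmdd}

# The key is the variable name from the Arguments field, not the ${...} expression.
partition_spec = "pt=" + args["def"]
print(partition_spec)
```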

FAQ

  • Q: The table partition format is pt=yyyy-mm-dd hh24:mi:ss, but spaces are not allowed in scheduling parameters. How can I configure the format of $[yyyy-mm-dd hh24:mi:ss]?

    A: Use the custom variables datetime=$[yyyy-mm-dd] and hour=$[hh24:mi:ss] to obtain the date and time, respectively. Then, join them together to form pt=${datetime} ${hour} in the code.

    Note Separate the two variables with a space.
  • Q: The table partition is pt=${datetime} ${hour} in the code. To obtain the data for the last hour when the node is run, the custom variables datetime=$[yyyymmdd] and hour=$[hh24-1/24] can be used to obtain the date and time, respectively. However, for an instance running at 00:00, it analyzes data for 23:00 of the current day, instead of 23:00 of the previous day. What measures can I take in this case?
    A: Modify the formula of datetime to $[yyyymmdd-1/24] and keep the formula of hour unchanged at $[hh24-1/24]. The node is run as follows:
    • For an instance that is scheduled to run at 2015-10-27 00:00:00, the values of $[yyyymmdd-1/24] and $[hh24-1/24] are 20151026 and 23, respectively, because one hour before the scheduled time falls on the previous day.
    • For an instance that is scheduled to run at 2015-10-27 01:00:00, the values of $[yyyymmdd-1/24] and $[hh24-1/24] are 20151027 and 00, respectively, because one hour before the scheduled time still falls on the current day.
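The fix works because both variables are derived from the same shifted time. The following Python sketch compares the original pair of variables with the fixed pair for one scheduled time; the helper name `partitions` is hypothetical.

```python
from datetime import datetime, timedelta


def partitions(cyctime: str) -> dict:
    """Compare the original and fixed (datetime, hour) pairs for one scheduled time."""
    scheduled = datetime.strptime(cyctime, "%Y%m%d%H%M%S")
    previous_hour = scheduled - timedelta(hours=1)
    return {
        # datetime=$[yyyymmdd] hour=$[hh24-1/24]: only the hour is shifted.
        "original": (scheduled.strftime("%Y%m%d"), previous_hour.strftime("%H")),
        # datetime=$[yyyymmdd-1/24] hour=$[hh24-1/24]: both are shifted by one hour.
        "fixed": (previous_hour.strftime("%Y%m%d"), previous_hour.strftime("%H")),
    }


print(partitions("20151027000000"))
# {'original': ('20151027', '23'), 'fixed': ('20151026', '23')}
```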
DataWorks offers the following node running modes:
  • You can run a node in one of the following ways on the DataStudio page:
    • Run: After you click the Run icon for the first time, you must manually assign values to the variables in the code. DataWorks records these values. If you modify the code, the variables still use the values assigned when the node is run for the first time.
    • Run with Arguments: You must manually assign values to the variables in the code.
    • Run Smoke Test in Development Environment: You can enter a data timestamp to simulate automatic node scheduling and obtain the replaced values of scheduling parameters at the specified data timestamp.
  • Run a node in the production environment: The scheduling system automatically replaces the values of scheduling parameters, including system parameters and custom parameters, based on the scheduled running time of the current instance.
  • Test a node or generate retroactive data: You must specify the data timestamp. The scheduled time of each instance can be calculated according to the formula described earlier in this topic.