Reuse SQL Logic with Script Templates in DataWorks - DataWorks

How it works

A script template works like a function in a programming language: define the logic once, call it repeatedly with different arguments. The template body contains @@{parameter_name} placeholders for variable table names and string values. When an SQL Snippet node references the template and supplies concrete parameter values, DataWorks substitutes the placeholders and generates executable SQL.

For example, a template body might contain:

INSERT OVERWRITE TABLE @@{myoutput} PARTITION (pt='${bizdate}')
SELECT ...
FROM @@{myinputtable}
WHERE r2.rank <= @@{topn};

When the SQL Snippet node binds myinputtable to company_sales_record, myoutput to company_sales_top_n, and topn to 10, DataWorks substitutes those values and runs the resulting SQL.

Script templates work exclusively with SQL Snippet nodes. To run a template-based pipeline, create an SQL Snippet node that references the template.

Key concepts

Concept	Description
Script template	A reusable SQL code process with named input and output parameters in `@@{parameter_name}` format
SQL Snippet node	The node type that references a script template and supplies concrete parameter values
Input parameter (Table)	Binds a source table to the template. The bound table must have the same number of fields and compatible field types as the parameter definition.
Input parameter (String)	Binds a string value or variable to the template, such as a filter threshold or a province name
Output parameter	Defines the output table that the SQL code process generates. Must be of type Table.
Workspace-level template	Available only to members of the current workspace after deployment
Public template	Published to the current tenant so all tenant users can use it

Parameter definitions are for reference only. DataWorks does not enforce a schema check immediately. Mismatches between the parameter definition and the actual table schema cause a runtime error when the node runs.

Limitations

SQL Snippet nodes require DataWorks Standard Edition or a higher edition. For details, see Differences among DataWorks editions.
Workspace-level templates appear on the Workspace-Specific tab. Only workspace members can use them.
Public templates appear on the Public tab. All users within the tenant can use them.

Prerequisites

Before you begin, ensure that you have:

The Development role in the target workspace. To assign roles, see the Add a RAM user to a workspace as a member and assign roles to the member section in "Manage permissions on workspace-level services."
DataWorks Standard Edition or a higher edition

Define a script template

Step 1: Open the Snippets pane

Log on to the DataWorks console. In the top navigation bar, select the target region. In the left-side navigation pane, choose Data Development and O&M > Data Development. Select the target workspace and click Go to Data Development.
In the left-side navigation pane of the DataStudio page, click Snippets.

If Snippets is not visible, click the icon in the lower-left corner and follow the instructions in Configure settings in the DataStudio Modules section to add it.

Step 2: Create the template

In the Snippets pane, use the actions shown in the following figure to create and name a script template.

Templates created by member accounts in the current workspace appear on the Workspace-Specific tab. Templates created within the tenant appear on the Public tab.

Step 3: Configure the template

The configuration tab has three parts: the SQL code process, input parameters, and output parameters.

SQL code process

Write the SQL logic in the code editor. Replace variable table names and string values with @@{parameter_name} placeholders. DataWorks substitutes these placeholders with the actual values when an SQL Snippet node runs.

Input parameters

Add one or more input parameters. Each parameter has one of the following types:

Table: Binds a source table. Define the expected schema as a text block so that users know what schema to supply. Define the schema in the following format:

Name of Field 1  Type of Field 1  Description of Field 1
Name of Field 2  Type of Field 2  Description of Field 2
...
Name of Field n  Type of Field n  Description of Field n

Example:

area_id   string  'Region ID'
city_id   string  'City ID'
order_amt double  'Order amount'

String: Binds a string value, such as a numeric threshold or a filter condition. Optionally set a default value that is used when no value is supplied at reference time. Use cases:
- Export the top N cities by sales: use a String parameter to pass the value of N.
- Filter by province: use a String parameter to pass the province name.

Output parameters

Add one or more output parameters of type Table. Define the expected output schema for reference.

Example:

area_id   string  'Region ID'
city_id   string  'City ID'
order_amt double  'Order amount'
rank      bigint  'Ranking'

A template can contain multiple input and output parameters.

Step 4: Save and commit the template

In the toolbar on the configuration tab, click the icon to save, then click the icon to commit. After the template is committed, it is available for SQL Snippet nodes to reference.

Reference a script template

Prerequisites

Before you begin, ensure that you have:

A created script template. See Define a script template.
An SQL Snippet node. See Create and manage ODPS nodes.

Reference the template

On the configuration tab of the SQL Snippet node, follow the steps in the figure below to select and configure a script template.

Select the script template to reference.
- If a newer version of the template is available, click Update Code to pull it in.
- To inspect the template definition, click Edit Snippet.
Configure the input and output parameter values based on your pipeline's tables and requirements.

Manage script templates

Publish a template to all tenant users

After deployment, a template is workspace-level by default. To make it available to all users in the tenant, click the Publish Snippet icon (marked 1 in the figure) in the toolbar on the template's configuration tab.

View reference records

On the Snippet Nodes tab (marked 2 in the figure), see all nodes that reference the current template. Review this list before modifying a template to understand the downstream impact.

Upgrade a template

Edit the SQL code or parameter settings in the template, then save and recommit. Each recommit creates a new version. View the version history on the Versions tab (marked 3 in the figure) in the right-side navigation pane.

Impact on referencing nodes

When a template is upgraded, you can determine whether to use the latest version of the template in your SQL Snippet node. To adopt the new version, open the SQL Snippet node, click Update Code, verify that the new parameter settings work for your pipeline, and then commit and deploy the node.

Example scenario

Developer C creates version V1.0 of a template. User A references V1.0. Developer C later upgrades to V2.0. User A sees that a newer version is available, reviews the V2.0 details, and decides whether to update their node to use V2.0.

Configuration tab reference

Feature	Description
Save	Saves the current template settings
Steal Lock	Takes over the edit lock from another user so you can edit the template, if you are not the owner of the script template
Submit	Commits the template to the development environment
Publish Snippet	Makes the template available to all users in the current tenant
Parse I/O Parameters	Parses input and output parameters from the SQL code
Run	Runs the template in the development environment
Stop	Stops a running template
Format Code	Formats the SQL code by keyword
Parameters	View basic information and configure input and output parameters
Versions	View all deployed versions of the template
Snippet Nodes	View all nodes that reference the template

Best practices: top-N sales ranking

This example walks through creating a get_top_n script template that returns the top N cities by total sales in each region, then referencing it from an SQL Snippet node.

Prerequisites

Before you begin, ensure that you have:

An SQL Snippet node. See Create and manage ODPS nodes.
An input table and an output table created in an ODPS SQL node.

Step 1: Define the get_top_n template

Create a script template named get_top_n. See Define a script template for the full procedure. Use the following configuration.

Parameter settings

Category	Parameter	Type	Description	Parameter definition
Input parameter	`myinputtable`	Table	Sales data table	`area_id string`, `city_id string`, `order_amt double`, `rank bigint`
Input parameter	`topn`	String	Number of top cities to return	N/A
Output parameter	`myoutput`	Table	Rankings of the top cities per region	`area_id string`, `city_id string`, `order_amt double`, `rank bigint`

SQL code process

The template uses @@{myinputtable}, @@{myoutput}, and @@{topn} as placeholders. DataWorks substitutes them with the bound values when the node runs.

INSERT OVERWRITE TABLE  @@{myoutput}  PARTITION (pt='${bizdate}')
 SELECT r3.area_id,
 r3.city_id,
 r3.order_amt,
 r3.rank
from (
SELECT
   area_id,
   city_id,
   rank,
   order_amt_1505468133993_sum as order_amt ,
   order_number_1505468133991_sum,
   profit_amt_1505468134000_sum
FROM
   (SELECT
   area_id,
   city_id,
   ROW_NUMBER() OVER (PARTITION BY r1.area_id ORDER BY r1.order_amt_1505468133993_sum DESC) AS rank,
   order_amt_1505468133993_sum,
   order_number_1505468133991_sum,
   profit_amt_1505468134000_sum
FROM
   (SELECT
   area AS area_id,
   city AS city_id,
   SUM(order_amt) AS order_amt_1505468133993_sum,
   SUM(order_number) AS order_number_1505468133991_sum,
   SUM(profit_amt) AS profit_amt_1505468134000_sum
FROM
  @@{myinputtable}
WHERE
   SUBSTR(pt, 1, 8) IN ( '${bizdate}' )
GROUP BY
   area,
   city )
   r1 ) r2
WHERE
   r2.rank >= 1 AND r2.rank <= @@{topn}
ORDER BY
   area_id,
   rank limit 10000) r3;

Step 2: Reference the template from an SQL Snippet node

Create an SQL Snippet node named xc_Referenced script template_get_top_n, select the get_top_n template, and configure parameters as shown below.

Parameter configuration

myinputtable: bind to company_sales_record, a partitioned table with a 365-day lifecycle.

CREATE TABLE IF NOT EXISTS company_sales_record
(
    order_id         STRING COMMENT 'Order ID (PK)',
    report_date      STRING COMMENT 'Order generation date',
    customer_name    STRING COMMENT 'Customer name',
    order_level      STRING COMMENT 'Order level',
    order_number     DOUBLE COMMENT 'Number of orders',
    order_amt        DOUBLE COMMENT 'Order amount',
    back_point       DOUBLE COMMENT 'Discount',
    shipping_type    STRING COMMENT 'Transportation method',
    profit_amt       DOUBLE COMMENT 'Amount of profit',
    price            DOUBLE COMMENT 'Unit price',
    shipping_cost    DOUBLE COMMENT 'Cost of transportation',
    area             STRING COMMENT 'Region',
    province         STRING COMMENT 'Province',
    city             STRING COMMENT 'City',
    product_type     STRING COMMENT 'Product type',
    product_sub_type STRING COMMENT 'Product subtype',
    product_name     STRING COMMENT 'Product name',
    product_box      STRING COMMENT 'Product packaging',
    shipping_date    STRING COMMENT 'Date of transportation'
)
COMMENT 'Detailed sales data'
PARTITIONED BY
(
    pt               STRING
)
LIFECYCLE 365;

topn: set to 10 to return the top 10 cities by total sales in each region.

myoutput: bind to company_sales_top_n.

CREATE TABLE IF NOT EXISTS company_sales_top_n
(
area STRING COMMENT 'Region',
city STRING COMMENT 'City',
sales_amount DOUBLE COMMENT 'Sales amount',
rank BIGINT COMMENT 'Ranking'
)
COMMENT 'Company sales rankings'
PARTITIONED BY (pt STRING COMMENT '')
LIFECYCLE 365;

What's next

After completing development, configure the node for production:

Configure scheduling properties: Set up periodic scheduling, rerun settings, and scheduling dependencies. See Overview.
Debug the node: Test the code logic before deployment. See Debugging procedure.
Deploy the node: Deploy to activate periodic scheduling. See Deploy nodes.