This topic describes the definition and composition of script templates and how to create a script template.

Definition

A script template defines an SQL code process that involves multiple input and output parameters. Each SQL code process references one or more source tables. You can filter source table data, join source tables, and aggregate them to generate a result table required for new business.

Value

In actual business, many SQL code processes are similar. The input and output tables in these processes may have the same or compatible schema but different names. In this case, developers can abstract an SQL code process as a script template to reuse the SQL code. The script template extracts input parameters from input tables and generates output parameters in output tables.

To create SQL script templates, you can select script templates from the script template list according to your business process and configure specific input and output tables in your business for the selected script templates, without repeatedly copying the code. This greatly improves the development efficiency and avoids repeated development. You can deploy and run the created SQL script templates in the same way as other SQL nodes.

Composition

Similar to a function, a script template consists of input parameters, output parameters, and an SQL code process.

Input parameters

The input parameters of a script template have the properties such as the parameter name, parameter type, parameter description, and parameter definition. The parameter type can be table or string.
  • A table-type parameter specifies the table to be referenced in an SQL code process. When using a script template, you can specify the input table required for the specific business.
  • A string-type parameter specifies the variable control parameter in an SQL code process. For example, to export only the sales amount of the top N cities in each region in a result table of an SQL code process, you can use a string-type parameter to specify the value of N.

    To export the total sales amount of a province in a result table of an SQL code process, you can set a string-type parameter to specify the province and obtain the sales data of the specified province.

  • The parameter description specifies the role of a parameter in an SQL code process.
  • The parameter definition is a text definition of the table schema, which is required only for table-type parameters. When specifying the parameter definition for a table-type parameter, you must provide an input table that contains the same field names and compatible types defined by the table-type parameter so that the SQL code process can run properly. Otherwise, an error is returned when the SQL code process runs because the specified field name cannot be found in the input table. The input table must contain the field names and types defined by the table-type parameter. The input table can contain other fields. The field names and types in the input table can be in any order. The parameter definition is for reference only.
  • We recommend that you enter the parameter definition in the following format:
    Name of field 1 Type of field 1 Description of field 1 
    Name of field 2 Type of field 2 Description of field 2 
    Name of field n Type of field n Description of field n
    Example:
    area_id string 'Region ID' 
    city_id string 'City ID' 
    order_amt double 'Order amount'

Output parameters

  • The output parameters of a script template have the properties such as the parameter name, parameter type, parameter description, and parameter definition. The parameter type must be table. A string-type output parameter has no logical meaning.
  • A table-type parameter specifies the table to be generated in an SQL code process. When using a script template, you can specify the result table that the SQL code process generates for the specific business.
  • The parameter description specifies the role of a parameter in an SQL code process.
  • The parameter definition is a text definition of the table schema. When specifying the parameter definition for a table-type parameter, you must provide an output table that contains the same number of fields and compatible types defined by the table-type parameter so that the SQL code process can run properly. Otherwise, an error is returned when the SQL code process runs because the number of fields does not match or the field type is incompatible. The field names of the output table do not need to be consistent with those defined by the table-type parameter. The parameter definition is for reference only.
  • We recommend that you enter the parameter definition in the following format:
    Name of field 1 Type of field 1 Description of field 1 
    Name of field 2 Type of field 2 Description of field 2 
    Name of field n Type of field n Description of field n
    Example:
    area_id string 'Region ID' 
    city_id string 'City ID' 
    order_amt double 'Order amount' 
    rank bigint 'Ranking'

SQL code process

The parameters in an SQL code process are referenced in the following format: @@{Parameter name}.

By containing an abstract SQL code process, a script template controls and processes an input table based on input parameters to generate an output table with business value.

To develop an SQL code process, you must use input and output parameters in the code properly to make sure that they can be set as needed and correct SQL code can be generated and run during the process.

Create a script template

  1. Log on to the DataWorks console. In the left-side navigation pane, click Workspaces. On the Workspaces page, find the target workspace and click Data Analytics in the Actions column.
  2. In the left-side navigation pane, click Snippets.
  3. Move the pointer over Create and choose Create > Snippet.
  4. In the Create Snippet dialog box, configure Snippet Name, Description, and Location.
  5. Click Submit.

Source table schema

The following table describes the schema of a source MySQL table that contains sales data.
Field Type Description
order_id varchar The ID of the order.
report_date datetime The date of the order.
customer_name varchar The name of the customer.
order_level varchar The priority of the order.
order_number double The number of orders.
order_amt double The amount of the order.
back_point double The discount.
shipping_type varchar The method of transportation.
profit_amt double The amount of the profit.
price double The unit price.
shipping_cost double The cost of transportation.
area varchar The region.
province varchar The province.
city varchar The city.
product_type varchar The type of the product.
product_sub_type varchar The subtype of the product.
product_name varchar The name of the product.
product_box varchar The packaging of the product.
shipping_date datetime The date of transportation.

Business implication

Script template name: get_top_n

This script template uses the specified sales data table as the table-type input parameter, the number of the top cities as the string-type input parameter, and the total sales amount of the cities for ranking. Using this SQL code process, you can easily obtain the rankings of the specified top cities in each region.

Script template parameters

Input parameter 1
  • Parameter name: myinputtable
  • Type: table
Input parameter 2
  • Parameter name: topn
  • Type: string
Output parameter 3
  • Parameter name: myoutput
  • Type: table
Parameter definition:
  • area_id string
  • city_id string
  • order_amt double
  • rank bigint
A sample table creation statement is as follows:
CREATE TABLE IF NOT EXISTS company_sales_top_n
( 
area STRING COMMENT 'Region', 
city STRING COMMENT 'City', 
sales_amount DOUBLE COMMENT 'Sales amount', 
rank BIGINT COMMENT 'Ranking'
)
COMMENT 'Company sales rankings'
PARTITIONED BY (pt STRING COMMENT '')
LIFECYCLE 365;

Example of defining an SQL code process

INSERT OVERWRITE TABLE @@{myoutput} PARTITION (pt='${bizdate}')
    SELECT r3.area_id,
    r3.city_id,
    r3.order_amt,
    r3.rank
from (
SELECT
    area_id,
    city_id,
    rank,
    order_amt_1505468133993_sum as order_amt ,
    order_number_150546813****_sum,
    profit_amt_15054681****_sum
FROM
    (SELECT
    area_id,
    city_id,
    ROW_NUMBER() OVER (PARTITION BY r1.area_id ORDER BY r1.order_amt_1505468133993_sum DESC) 
AS rank,
    order_amt_15054681****_sum, 
    order_number_15054681****sum,
    profit_amt_1505468****_sum
FROM     
    (SELECT area AS area_id,
     city AS city_id,
     SUM(order_amt) AS order_amt_1505468****_sum,
     SUM(order_number) AS order_number_15054681****_sum,
     SUM(profit_amt) AS profit_amt_1505468****_sum
FROM
    @@{myinputtable}
WHERE
    SUBSTR(pt, 1, 8) IN ( '${bizdate}' )
GROUP BY 
    area,
    city )
    r1 ) r2
WHERE
    r2.rank >= 1 AND r2.rank <= @@{topn}
ORDER BY
    area_id, 
    rank limit 10000) r3;

Sharing scope

Script templates can be shared within a workspace or made public.

By default, a deployed script template is visible and available to users within the current workspace. The developer of a script template can click the Publish Snippet icon to make the general-purpose script template public to the current tenant account so that all users under the account can view and use the script template.

As shown in the following figure, you can view the Publish Snippet icon on the configuration tab of a script template. If the icon is clickable, the script template is made public.

Use of script templates

For more information about how to use a developed script template, see Use a script template.

Reference records

In the script template list, double-click a script template. On the configuration tab that appears, click the Snippet Nodes tab in the right-side navigation pane to view the reference records of the script template.Snippet Nodes tab