ODPS SQL nodes use SQL-like syntax and can process terabytes of data in distributed scenarios that do not require real-time processing. This topic describes how to create an ODPS SQL node.

Prerequisites

A MaxCompute compute engine instance is associated with your workspace. The MaxCompute service is available in a workspace only after you associate a MaxCompute compute engine instance with the workspace on the Workspace Management page. For more information, see Configure a workspace.

Background information

ODPS SQL nodes are online analytical processing (OLAP) applications that are designed to process large amounts of data. In most cases, a long period of time is required from job preparation to job committing. Therefore, ODPS SQL nodes are suitable for batch jobs that process thousands to tens of thousands of transactions, not for services that require real-time responses.

Limits

ODPS SQL nodes have the following limits:
  • You cannot use a SET or USE statement separately in the code of an ODPS SQL node. Such statements must be committed and executed together with other SQL statements. For example, you can use a SET statement together with a CREATE TABLE statement.
    set a=b;
    create table name(id string);
  • You cannot add comments to statements that contain keywords, such as SET and USE statements, in the code of an ODPS SQL node. For example, a comment is not allowed in the following code:
    create table name(id string);
    set a=b; --Comment   // You cannot add a comment. 
    create table name1(id string);
  • You cannot add comments to the end of a complete statement in the code of an ODPS SQL node. Examples:
    Note If a semicolon (;) is added to the end of an SQL statement, the SQL statement is considered complete.
    select *   --Comment   // This statement is incomplete. You can add a comment. 
    from dual;--Comment   // This statement is complete. You cannot add a comment. 
    show tables;
  • The running of an ODPS SQL node during data development and the scheduled running of an ODPS SQL node have the following differences:
    • Running during data development: All statements that contain keywords, such as SET statements, are collected from the node code and executed before the other SQL statements.
    • Scheduled running: All SQL statements are executed in sequence.
    For example, assume that the node contains the following statements:
    set a=b;
    create table name1(id string);
    set c=d;
    create table name2(id string);
    The following table describes the differences between the two execution modes for the preceding SQL statements.

    SQL statement           Running during data development     Scheduled running
    First SQL statement     set a=b;                            set a=b;
                            set c=d;                            create table name1(id string);
                            create table name1(id string);
    Second SQL statement    set a=b;                            set c=d;
                            set c=d;                            create table name2(id string);
                            create table name2(id string);
  • You must configure a scheduling parameter in the format of key=value. Do not add spaces before or after the equal sign (=). Examples:
    time={yyyymmdd hh:mm:ss} -- The format is invalid.
    a =b -- The format is invalid.
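    For contrast, the following formats are valid. The built-in variable $bizdate is a standard DataWorks scheduling parameter; the parameter name a is only an example, as in the preceding code.

    ```
    a=b
    bizdate=$bizdate
    ```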
  • If you use keywords such as bizdate and date as scheduling parameters, you must specify the values in the format of yyyymmdd. If you want to use other time formats, do not use the preceding keywords as scheduling parameters. Example:
    bizdate=201908 -- The format is invalid. 
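    A valid counterpart specifies the value in the yyyymmdd format:

    ```
    bizdate=20190801
    ```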
  • You can use only statements that start with SELECT, READ, or WITH to query the result data of a node during data development. Otherwise, no results are returned.
  • If you want to execute multiple SQL statements, separate the SQL statements with semicolons (;). Specify each SQL statement in a separate line.
    • Incorrect example
      create table1;create table2
    • Correct example
      create table1;
      create table2;
  • If new data types are used for the additional functions of MaxCompute V2.0, you must add set odps.sql.type.system.odps2=true; before the SQL statements that use the functions, and commit and execute the code together with the SQL statements.
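    For example, the following sketch enables the MaxCompute V2.0 data type edition before a CREATE TABLE statement that uses TINYINT, one of the new data types. The table and column names are hypothetical.

    ```sql
    -- Enable the MaxCompute V2.0 data type edition for this statement block.
    set odps.sql.type.system.odps2=true;
    -- TINYINT is a MaxCompute V2.0 data type, so the preceding SET statement is required.
    create table if not exists t_v2_types (id bigint, flag tinyint);
    ```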
  • If you want to add comments to SQL statements, do not use semicolons (;) in comments.
    Incorrect example:
    create table1;  -- Create a table named table1; then, create a table named table2.
    create table2;
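    Correct example (the comment no longer contains a semicolon):

    ```sql
    create table1;  -- Create a table named table1 and then a table named table2.
    create table2;
    ```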
  • The code of an ODPS SQL node can be a maximum of 200 KB in size and can contain a maximum of 200 SQL statements.

Procedure

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region in which the workspace that you want to manage resides. Find the workspace and click DataStudio in the Actions column.
  2. In the Scheduled Workflow pane, move the pointer over the Create icon and choose Create Node > MaxCompute > ODPS SQL.
    Alternatively, you can click a workflow in the Scheduled Workflow pane, right-click MaxCompute, and then choose Create Node > ODPS SQL.
  3. In the Create Node dialog box, configure the Name and Path parameters.
    Note The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
  4. Click Commit.
  5. On the configuration tab of the ODPS SQL node, write and run code in the code editor.
    After the node is created, write the code of the ODPS SQL node. The code must conform to the SQL syntax. For more information about SQL syntax, see Overview of MaxCompute SQL.
    Note Due to the adjustment made by the International Organization for Standardization (ISO) on the UTC+8 time zone, differences exist between the actual time and the output time when you execute related SQL statements in DataWorks. For the years 1900 to 1928, the time difference is 352 seconds. For years before 1900, the time difference is 9 seconds.
    You cannot use a SET statement separately in the node code. The SET statement must be executed together with other SQL statements. For example, you can execute a SET statement together with a SELECT statement.
    set odps.sql.allow.fullscan=true;
    select 1;

    For more information about SET statements, see SET operations.

    Example: Create a table, insert data into the table, and then query data in the table.

    1. Create a table named test1.
      CREATE TABLE IF NOT EXISTS test1 
      ( id BIGINT COMMENT '' ,
        name STRING COMMENT '' ,
        age BIGINT COMMENT '' ,
        sex STRING COMMENT '');
    2. Insert data into the table.
      INSERT INTO test1 VALUES (1,'Adam',43,'Male');
      INSERT INTO test1 VALUES (1,'Carlos',32,'Male');
      INSERT INTO test1 VALUES (1,'Claire',27,'Female');
      INSERT INTO test1 VALUES (1,'Alan',24,'Male');
      INSERT INTO test1 VALUES (1,'Alice',35,'Female');
      INSERT INTO test1 VALUES (1,'Sofia',22,'Female');
      INSERT INTO test1 VALUES (1,'Raju',55,'Male');
    3. Query data in the table.
      select * from test1;
    4. After you enter the preceding SQL statements in the code editor, click the Run icon in the toolbar. DataWorks executes the SQL statements from top to bottom and then displays logs.
      Note
      • If multiple MaxCompute compute engine instances are associated with the current workspace, select a desired MaxCompute compute engine instance before you click the Run icon.
      • If the MaxCompute compute engine instance that you selected uses a shared resource group that is charged based on the pay-as-you-go billing method, click the Cost Estimation icon in the toolbar to estimate the cost. You can view your bill for the accurate expenses that are generated when you run this node.
      If you execute an INSERT INTO statement, unexpected data duplication may occur. DataWorks does not re-execute the INSERT INTO statement. However, DataWorks may rerun the nodes that contain the statement. We recommend that you do not use the INSERT INTO statement. If DataWorks runs a node that contains the INSERT INTO statement, the following information appears in run logs:
      The INSERT INTO statement may cause repeated data insertion. DataWorks does not re-execute the INSERT INTO statement, but it may rerun nodes that contain the statement. We recommend that you do not use the INSERT INTO statement.  
      If you continue to use the INSERT INTO statement, you are deemed to be aware of the risks and are willing to take the consequences of potential data duplication. 
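      If a node that writes data may be rerun, an idempotent alternative is the INSERT OVERWRITE statement, which replaces the data in the destination table instead of appending to it. The following sketch assumes a hypothetical staging table named test1_staging:

      ```sql
      -- Rerunning this statement produces the same result instead of duplicate rows.
      insert overwrite table test1
      select * from test1_staging;
      ```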
    5. After the SQL statements are executed, click the Save icon in the toolbar to save the SQL code.
    6. View the query result.
      DataWorks displays the query result in a workbook.
      You can view or manage the query result in the workbook, or copy the query result to an Excel file on your on-premises machine.
      Button          Description
      Hide Column     Select one or more columns and click Hide Column in the lower part to hide the selected columns.
      Copy Row        Select one or more rows and click Copy Row in the lower part to copy the selected rows.
      Copy Column     Select one or more columns and click Copy Column in the lower part to copy the selected columns.
      Copy Selected   Select one or more cells in the workbook and click Copy Selected in the lower part to copy the selected cells.
      Data Analysis   Click Data Analysis in the lower part to go to the workbook editing page.
      Search          Click Search in the lower part to search for data in the workbook. After you click the button, a search box appears in the upper-right corner of the Result tab.
      Download        Download the result data in the GBK or UTF-8 encoding format.
  6. On the node configuration tab, click the Properties tab in the right-side navigation pane and configure scheduling properties for the node. For more information, see Configure basic properties.
  7. Commit the node.
    Important You can commit the node only after you configure the Rerun and Parent Nodes parameters.
    1. Click the Submit icon in the toolbar.
    2. In the Commit Node dialog box, configure the Change description parameter.
      If an alert appears, indicating that the input and output that you configured do not match those identified in the code lineage analysis process, you can ignore the alert, or click Cancel and then modify the dependencies.
    3. Click OK.
    In a workspace in standard mode, you must click Deploy in the upper-right corner after you commit a node. For more information, see Deploy nodes.
  8. Perform O&M operations on the node. For more information, see O&M overview of auto triggered nodes.