This topic describes how to create an ODPS SQL node. ODPS SQL nodes can process terabytes of data in distributed scenarios that do not require real-time processing by using the SQL-like syntax.

Prerequisites

The MaxCompute folder appears on the page only after you add a MaxCompute compute engine instance on the Workspace Management page. For more information, see Configure a workspace.

Background information

Generally, it takes a long time from preparing to committing a job. You can use ODPS SQL nodes to process thousands to tens of thousands of transactions. ODPS SQL nodes are online analytical processing (OLAP) applications that are designed to deal with large amounts of data..

Limits

ODPS SQL nodes have the following limits:
  • You cannot use the SET, USE, or SQL alias statement independently in the code of an ODPS SQL node. They must be executed with other SQL statements. For example, you can use a SET statement together with a CREATE TABLE statement.
    set a=b;
    create table name(id string);
  • You cannot add comments to statements containing keywords, including the SET, USE, and SQL alias statements, in the code of an ODPS SQL node. For example, the comment is not allowed in the following code:
    create table name(id string);
    set a=b; // You cannot add a comment here. 
    create table name1(id string);
  • The execution of an ODPS SQL node during workflow development and the scheduled execution of an ODPS SQL node have the following differences:
    • Execution during workflow development: combines all the statements containing keywords, including the SET, USE, and SQL alias statements, in the node code and executes them before you execute other SQL statements.
    • Scheduled execution: executes all SQL statements in sequence.
    set a=b;
    create table name1(id string);
    set c=d;
    create table name2(id string);
    The following table describes the differences between the two execution modes for the preceding SQL statements.
    SQL statement Execution during workflow development Scheduled execution
    First SQL statement
    set a=b;
    set c=d;
    create table name1(id string);
    set a=b;
    create table name1(id string);
    Second SQL statement
    set a=b;
    set c=d;
    create table name2(id string);
    set c=d;
    create table name2(id string);
  • You must specify a scheduling parameter in the format of key=value. Do not add spaces before or after the equal sign (=). Examples:
    time = {yyyymmdd hh:mm:ss} // The format is invalid.
    a =b // The format is invalid.
  • If you use keywords such as bizdate and date as scheduling parameters, you must specify the values in the format of yyyymmdd. If you want to use other time formats, do not use the preceding keywords as scheduling parameters. Example:
    bizdate=201908 // Incorrect format. 
  • You can only use statements starting with SELECT, READ, or WITH to query the result data for a node during the workflow development. Otherwise, no results are returned.
  • Separate multiple SQL statements with semicolons (;). Each SQL statement must occupy a line.
    • Incorrect example
      create table1;create table2
    • Correct example
      create table1;
      create table2;
  • If extension functions in MaxCompute V2.0 use new data types, you must add set odps.sql.type.system.odps2=true; before the SQL statements that use the functions, and commit and execute the code together with the SQL statements.
  • If you want to add comments to SQL statements, you cannot use semicolons (;) in comments.
    Incorrect example:
    create table1; // Create a table named table1; Then, create a table named table2.
    create table2;
  • An ODPS SQL node can contain up to 130 KB of SQL code and up to 99 SQL statements.

Procedure

  1. Go to the DataStudio page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. In the top navigation bar, select the region where your workspace resides, find the workspace, and then click Data Analytics in the Actions column.
  2. In the Scheduled Workflow pane, move the pointer over the Create icon and choose MaxCompute > ODPS SQL.
    Alternatively, you can click a workflow in the Scheduled Workflow pane, right-click MaxCompute, and then choose Create > ODPS SQL.
  3. In the Create Node dialog box, set the Node Name and Location parameters.
    Note The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
  4. Click Commit.
  5. Write and run the code of the ODPS SQL node.
    After the node is created, write the code of the ODPS SQL node. The code must conform to the syntax. For more information about SQL syntax, see Overview of MaxCompute SQL.
    Note Due to the adjustment made by the International Organization for Standardization (ISO) on the UTC+8 time zone, differences exist between the real time and the output time when you execute related SQL statements in DataWorks. Between the years of 1900 and 1928, the time difference is 352 seconds. Before the year of 1900, the time difference is 9 seconds.
    You cannot use the SET statement independently in the node code. The SET statement must be executed together with other SQL statements. For example, you can use a SET statement together with a SELECT statement.
    set project odps.sql.allow.fullscan=true;
    select 1;

    For more information about the SET statement, see SET operations.

    The following example creates a table, inserts data to the table, and queries data in the table:

    1. Create a table named test1.
      CREATE TABLE IF NOT EXISTS test1 
      ( id BIGINT COMMENT '' ,
        name STRING COMMENT '' ,
        age BIGINT COMMENT '' ,
        sex STRING COMMENT '');
    2. Insert data to the table.
      INSERT INTO test1 VALUES (1,'Zhang San',43,'Male');
      INSERT INTO test1 VALUES (1,'Li Si',32,'Male');
      INSERT INTO test1 VALUES (1,'Chen Xia',27,'Female');
      INSERT INTO test1 VALUES (1,'Wang Wu',24,'Male');
      INSERT INTO test1 VALUES (1,'Ma Jing',35,'Female');
      INSERT INTO test1 VALUES (1,'Zhao Qian',22,'Female');
      INSERT INTO test1 VALUES (1,'Zhou Zhuang',55,'Male');
    3. Query data in the table.
      select * from test1;
    4. After you enter the preceding SQL statements in the code editor, click the Run icon icon in the toolbar. DataWorks executes your SQL statements from top to bottom and displays logs.
      Run icon
      Note
      • If multiple MaxCompute compute engine instances are associated with the current workspace, select one MaxCompute compute engine instance before you click the Run icon.
      • If the MaxCompute compute engine instance that you selected uses the custom resource group in pay-as-you-go mode, click the Estimate MaxCompute Computing Cost icon icon in the toolbar to estimate the cost. The actual expenses generated for running this node are subject to the bill.
      The INSERT INTO statement may result in unexpected data duplication. DataWorks does not re-execute the INSERT INTO statement, but it may rerun nodes that contain the statement. We recommend that you avoid using the INSERT INTO statement. If DataWorks runs a node that contains the INSERT INTO statement, the following information appears in operational logs:
      The INSERT INTO statement may cause repeated data insertion. DataWorks does not re-execute the INSERT INTO statement, but it may rerun nodes that contain the statement. We recommend that you avoid using the INSERT INTO statement.  
      If you continue to use the INSERT INTO statement, you are deemed to be aware of the risks and are willing to take the consequences of potential data duplication. 
    5. After the SQL statements are executed, click the Save icon icon in the toolbar to save the SQL code.
    6. View the query result.
      DataWorks displays the query result in a workbook.
      You can view or manage the query result in the workbook, or copy the query result to an Excel file on your on-premises machine. View the query result
      Button Description
      Hide Column Select one or more columns and click Hide Column in the lower part to hide the selected columns.
      Copy Row Select one or more rows and click Copy Row in the lower part to copy the selected rows.
      Copy Column Select one or more columns and click Copy Column in the lower part to copy the selected columns.
      Copy Selected Select one or more cells in the workbook and click Copy Selected in the lower part to copy the selected cells.
      Data Analysis Click Data Analysis in the lower part to go to the workbook editing page.
      Search Click Search in the lower part to search for data in the workbook. After you click the button, a search box appears in the upper-right corner of the Result tab.
      Download Download the result data in the encoding format of GBK or UTF-8.
  6. On the node configuration tab, click the Properties tab in the right-side navigation pane. In the Properties panel, configure the scheduling properties for the node. For more information, see Configure basic properties.
  7. Commit the node.
    Notice You can commit the node only after you specify the Rerun and Parent Nodes parameters.
    1. Click the Submit icon icon in the toolbar.
    2. In the Commit Node dialog box, set the Change description parameter and select I confirm to proceed with the commit operation.
      If an alert appears, indicating that the input and output that you set do not match with those identified in code lineage analysis, you can choose to ignore the alert or click Cancel and then modify the dependencies.
    3. Click OK.
    In a workspace in standard mode, you must click Deploy in the upper-right corner after you commit the node. For more information, see Deploy nodes.
  8. Test the node. For more information, see View auto triggered nodes.