Using the SQL-like syntax, ODPS SQL nodes can process terabytes of data in distributed scenarios that do not require real-time processing.

Generally, it takes a long time from preparing to committing a job. You can use ODPS SQL nodes to process thousands to ten thousands of transactions. ODPS SQL nodes are online analytical processing (OLAP) applications designed to deal with large amounts of data.

Create an ODPS SQL node

  1. Log on to the DataWorks console. In the left-side navigation pane, click Workspaces. On the Workspaces page, find the target workspace and click Data Analytics in the Actions column.
  2. Move the pointer over the Create icon and choose MaxCompute > ODPS SQL.

    You can also find the target workflow, right-click MaxCompute, and choose Create > ODPS SQL.

  3. In the Create Node dialog box that appears, enter the node name, select the target folder, and click Commit.
    Note A node name can be up to 128 characters in length.
  4. Edit the code of the ODPS SQL node on the node configuration tab.

    Edit the code of the ODPS SQL node. The code must conform to the required syntax. For more information about SQL syntax, see SQL summary.

    Note Due to the adjustment made by the International Organization for Standardization (ISO) on the UTC+8 time zone, there might be differences between the real time and the output time when you run the related SQL statements through DataWorks. Between the years of 1900 and 1928, the time difference is 352 seconds. Before the year of 1900, the time difference is 9 seconds.
    Currently, you cannot use SET statements independently in the node code. The SET statements must be run together with other SQL statements. For example, you can use a SET statement as follows:
    setproject odps.sql.allow.fullscan=true;
    select 1;
    Example: Create a table, insert data to the table, and query data in the table.
    1. Create a table named test1.
      CREATE TABLE IF NOT EXISTS test1 
      ( id BIGINT COMMENT '' ,
        name STRING COMMENT '' ,
        age BIGINT COMMENT '' ,
        sex STRING COMMENT '');
    2. Insert data to the table.
      INSERT INTO test1 VALUES (1,'Zhang San',43,'Male');
      INSERT INTO test1 VALUES (1,'Li Si',32,'Male');
      INSERT INTO test1 VALUES (1,'Chen Xia',27,'Female');
      INSERT INTO test1 VALUES (1,'Wang Wu',24,'Male');
      INSERT INTO test1 VALUES (1,'Ma Jing',35,'Female');
      INSERT INTO test1 VALUES (1,'Zhao Qian',22,'Female');
      INSERT INTO test1 VALUES (1,'Zhou Zhuang',55,'Male');
    3. Query data in the table.
      select * from test1;
    4. After you enter the preceding SQL statements in the code editor, click the Run icon in the toolbar. DataWorks runs your SQL statements from top to bottom and generates logs.Run
      Note The INSERT INTO statement may result in unexpected data duplication. Although DataWorks does not rerun the INSERT SQL statement, it may rerun corresponding nodes. We recommend that you avoid using the INSERT INTO statement. When DataWorks runs the INSERT INTO statement, the following information appears in logs:
      The INSERT INTO statement in SQL may cause repeated data insertion. Although SQL-level retries have been revoked for the INSERT INTO statement, there is still the possibility of retrying at the task level. We recommend that you avoid the use of the INSERT INTO statement. 
      If you continue to use INSERT INTO statements, we deem that you are aware of the associated risks and are willing to take the consequences of potential data duplication.
    5. Check the returned result and click the Save icon in the toolbar to save the SQL statements.
  5. View the query result.

    DataWorks displays the query result in a workbook.

    You can view or manage the query result in the workbook, or copy the query result to a local Excel file.

    Action Description
    Hide Column Select one or more columns and click Hide Column at the bottom to hide the selected columns.
    Copy Row Select one or more rows and click Copy Row at the bottom to copy the selected rows.
    Copy Column Select one or more columns and click Copy Column at the bottom to copy the selected columns.
    Copy Selected Select one or more cells in the workbook and click Copy Selected at the bottom to copy the selected cells.
    Data Analysis Click Data Analysis at the bottom to go to the workbook editing page.
    Search Click Search at the bottom to search for data in the workbook. After you click the button, a search box appears in the upper right corner of the Results area.
    Download Download the result data in the encoding format of GBK or UTF-8.
  6. Estimate the cost of running the node.

    Click the Expense Estimate icon to estimate the cost of running the ODPS SQL node.

    The estimated cost appears after you click the Expense Estimate icon. If an error appears in the Expense Estimate column, move the pointer over X to view the cause of the error.

  7. Configure the node properties.

    Click the Properties tab in the right-side navigation pane. On the Properties tab that appears, set the relevant parameters. For more information, see Properties.

  8. Commit the node.

    After the node properties are configured, click the Save icon in the upper-left corner. Then, commit or commit and unlock the node to the development environment.

  9. Deploy the node.

    For more information, see Deploy a node.

  10. Test the node in the production environment.

Note

  • You cannot use SET statements, USE statements, or SQL alias statements independently in the code of an ODPS SQL node. They must be run together with other SQL statements. For example, you can use a SET statement as follows:
    set a=b;
    create table name(id string);
  • You cannot add comments to statements containing keywords, including SET statements, USE statements, and SQL alias statements, in the code of an ODPS SQL node. For example, the following comment is not allowed:
    create table name(id string);
    set a=b; // Comment. You cannot add a comment here.
    create table name1(id string);
  • The running of an ODPS SQL node during workflow development and the scheduled running of an ODPS SQL node have the following differences:
    • Running during workflow development: combines all the statements containing keywords, including SET statements, USE statements, and SQL alias statements, in the node code and runs them before running other SQL statements.
    • Scheduled running: runs all SQL statements in sequence.
    set a=b;
    create table name1(id string);
    set c=d;
    create table name2(id string);
    The following table shows the differences between the two running modes for the preceding SQL statements.
    SQL statement Running during workflow development Scheduled running
    First SQL statement
    set a=b;
    set c=d;
    create table name1(id string);
    set a=b;
    create table name1(id string);
    Second SQL statement
    set a=b;
    set c=d;
    create table name2(id string);
    set c=d;
    create table name2(id string);
  • You must specify a scheduling parameter in the format of key=value. Do not add any spaces before or after the equation mark (=). Examples:
    time = {yyyymmdd hh:mm:ss} // Incorrect format.
    a =b // Incorrect format.  
  • If you use keywords such as bizdate and date as scheduling parameters, you must specify the values in the format of yyyymmdd. If you want to use other time formats, do not use the preceding keywords as scheduling parameters. Example:
    bizdate=201908 // Incorrect format.
  • You can only use statements starting with SELECT, READ, or WITH to query the result data for a node during the workflow development. Otherwise, no results are returned.
  • Separate multiple SQL statements with semicolons (;) and place them in different lines.
    • Incorrect example
      create table1;create table2
    • Correct example
      create table1;
      create table2;