The SQL engine of MaxCompute supports Script Mode SQL. If you use Script Mode SQL to compile an SQL script file, all statements in the file are compiled together. The script file is then submitted to MaxCompute to generate an execution plan. This way, the statements in a script file are scheduled in one queue and run all at once. This allows you to fully utilize resources in MaxCompute.

Note If you use Script Mode SQL, you cannot estimate the expense of SQL statements by using the cost SQL command described in cost sql. You can view your bill for the accurate expense. For more information, see View billing details.
Script Mode SQL allows you to compile SQL statements in a way similar to using a common programming language. You do not need to consider how to organize statements.

Syntax

--set
set odps.sql.type.system.odps2=true;
[set odps.stage.reducer.num=xxx;]
[...]
--ddl
create table table1 xxx;
[create table table2 xxx;]
[...]
--dml
@var1 := SELECT [ALL | DISTINCT] select_expr, select_expr, ...
        FROM table3
        [WHERE where_condition];
@var2 := SELECT [ALL | DISTINCT] select_expr, select_expr, ...
        FROM table4
        [WHERE where_condition];
@var3 := SELECT [ALL | DISTINCT] var1.select_expr, var2.select_expr, ...
        FROM @var1 join @var2 on ... ;
INSERT OVERWRITE|INTO TABLE [PARTITION (partcol1=val1, partcol2=val2 ...)]
        SELECT [ALL | DISTINCT] select_expr, select_expr, ...
        FROM @var3;    
[@var4 := SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var1 
        UNION ALL | UNION 
        SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var2;    
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name 
        AS 
        SELECT [ALL | DISTINCT] select_expr, select_expr, ...
        FROM @var4;]
[...]

Description

  • Script Mode SQL supports SET statements, DML statements, and some DDL statements. The DDL statements that are used to display results, such as DESC and SHOW, are not supported.
  • A complete script file must be a list of SET, DDL, and DML statements in sequence. A number of statements can be provided for each type. Statements of different types can also be empty, but cannot be mixed.
  • The at sign @ indicates a variable.
  • A script file can contain a maximum of one statement that is used to display results, such as a SELECT statement. If a script file contains two or more such statements, an error is returned. We recommend that you do not include SELECT statements in the script file.
  • A script file can contain a maximum of one CREATE TABLE AS statement, and must end with this statement. We recommend that you do not include the CREATE TABLE and INSERT statements in the same script file.
  • If Script Mode SQL is used and a statement in the script file fails to execute, all the other statements in the script file also fail.
  • If Script Mode SQL is used, a job can be generated to process data only after all the statements in a script file is prepared and successfully executed.
  • If Script Mode SQL is used and a table is written and then read, the following error is returned:
    INSERT OVERWRITE table src2 SELECT * FROM src WHERE key > 0;
    @a := SELECT * FROM src2;
    SELECT * FROM @a;
    To avoid this error, you can modify your script file. Sample script after modification:
    @a := SELECT * FROM src WHERE key > 0;
    INSERT OVERWRITE table src2 SELECT * FROM @a;
    SELECT * FROM @a;
  • Scenarios:
    • A single statement with nested subqueries needs to be rewritten, or a complex script must be split into multiple statements.
    • Data from different data sources may be prepared at different time points, and the time gap may be large. For example, the data from one data source is prepared at 01:00, and that from another data source is prepared at 07:00. In this case, table variables are not suitable for packaging statements of different types in a script file.
    • You need to assign a constant value to a variable and then execute the SELECT * FROM Variable statement to convert the variable to a scalar. This scalar is used with other columns for calculation. The constant value can also be stored in a table that has only one row. For more information about the syntax of the SELECT * FROM Variable statement, see Subqueries. Sample statements:
      @a := SELECT 10; -- Assign the constant value 10 to @a. You can also execute the SELECT col1 FROM t1 statement to store the constant value to table t1 that has only one row.
      @b := SELECT key,value+(SELECT * FROM @a) FROM t2 WHERE key >10000; -- Use the value in @a with value in table t2 for calculation.
      SELECT * FROM @b;

Example

CREATE TABLE IF NOT EXISTS dest(key STRING, value BIGINT) PARTITIONED BY (d STRING);
CREATE TABLE IF NOT EXISTS dest2(key STRING, value BIGINT) PARTITIONED  BY (d STRING);
@a := SELECT * FROM src WHERE value >0;
@b := SELECT * FROM src2 WHERE key is not null;
@c := SELECT * FROM src3 WHERE value is not null;
@d := SELECT a.key,b.value FROM @a LEFT OUTER JOIN @b ON a.key=b.key AND b.value>0;
@e := SELECT a.key,c.value FROM @a INNER JOIN @c ON a.key=c.key;
@f := SELECT * FROM @d UNION SELECT * FROM @e UNION SELECT * FROM @a;
INSERT OVERWRITE table dest PARTITION (d='20171111') SELECT * FROM @f;
@g := SELECT e.key,c.value FROM @e JOIN @c ON e.key=c.key;
INSERT OVERWRITE TABLE dest2 PARTITION (d='20171111') SELECT * FROM @g;

Required tools

Procedure

  • Use MaxCompute Studio

    Before you use MaxCompute Studio for Script Mode SQL, make sure that MaxCompute Studio has been installed, the project link has been added, and the MaxCompute SQL script file has been created. For more information, see Install IntelliJ IDEA, Manage project connections, and Create a MaxCompute Script module.

    After you compile and submit the script file, you can view the graphic execution plan. Although the script file contains multiple statements, the graphic execution plan is still shown in a directed acyclic graph (DAG).
  • Use MaxCompute client (odpscmd)

    You must use the MaxCompute client (odpscmd) of a version later than 0.27 to submit the script file. We recommend that you download and install the latest version from odpscmd. After you install the latest version, use the -s parameter to submit the script file.

    Edit the source code file myscript.sql by using the following odpscmd command:
    odpscmd -s myscript.mysql;
    Note Similar to -f and -e, -s is a command line option for the MaxCompute client (odpscmd). -s is not a command in an interactive environment. If the MaxCompute client (odpscmd) is used in an interactive environment, Script Mode SQL and table variables are not supported.
  • Use DataWorks
    In the DataWorks console, you can create an ODPS Script node, as shown in the following figure.ODPS Script node

    After the node is created, use Script Mode SQL to edit the script file and then click the Run icon in the toolbar to submit the script file to MaxCompute. You can then use the Logview URL in the output to view the graphic execution plan and results.