All Products
Search
Document Center

MaxCompute:SQL in script mode

Last Updated:Jun 01, 2023

The SQL engine of MaxCompute supports the script mode. If you use the script mode to compile an SQL script file, all statements in the file are compiled at the same time. You do not need to compile each statement. After the script file is compiled, the script file is submitted to MaxCompute, and an execution plan is generated. This way, the statements in the script file are scheduled in one queue and are executed once. This allows you to fully utilize the resources in MaxCompute.

Note
  • In script mode, you cannot estimate the expense of SQL statements by using the Cost SQL command described in Cost estimation. You can view your bill for the accurate expense in the Alibaba Cloud Management Console. For more information, see View billing details.
  • The number of tables that are referenced in a script file cannot exceed 10,000. If a view is included in a script file, the system calculates the number of tables that are referenced when the view is created. If a table is referenced multiple times in a script file, the number of times the table is referenced is counted into the number of tables that are referenced.
The script mode allows you to compile SQL statements similar to the way you compile SQL statements in a common programming language. You do not need to consider how to organize statements.

Syntax

--set
set odps.sql.type.system.odps2=true;
[set odps.stage.reducer.num=xxx;]
[...]
--ddl
create table table1 xxx;
[create table table2 xxx;]
[...]
--dml
@var1 := SELECT [ALL | DISTINCT] select_expr, select_expr, ...
        FROM table3
        [WHERE where_condition];
@var2 := SELECT [ALL | DISTINCT] select_expr, select_expr, ...
        FROM table4
        [WHERE where_condition];
@var3 := SELECT [ALL | DISTINCT] var1.select_expr, var2.select_expr, ...
        FROM @var1 join @var2 on ...;
INSERT OVERWRITE|INTO TABLE [PARTITION (partcol1=val1, partcol2=val2 ...)]
        SELECT [ALL | DISTINCT] select_expr, select_expr, ...
        FROM @var3;    
[@var4 := SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var1 
        UNION ALL | UNION 
        SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var2;    
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name 
        AS 
        SELECT [ALL | DISTINCT] select_expr, select_expr, ...
        FROM @var4;]
[...]

Syntax description

  • The script mode supports three types of statements: SET statements, DML statements, and some DDL statements. The DDL statements that are used to display results, such as DESC and SHOW, are not supported.
  • A complete script file must be a list of SET, DDL, and DML statements in sequence. A script file can contain multiple statements of each type, but statements of the same type must be placed together. You must place these three types of statements in the preceding sequence.
  • The at sign (@) declares a variable.
  • A script file can contain a maximum of one statement that is used to display results, such as a SELECT statement. If a script file contains more than one such a statement, an error is returned. We recommend that you do not include SELECT statements that are used to display results in the script file.
  • A script file can contain a maximum of one CREATE TABLE AS statement and must end with this statement. We recommend that you do not include the statements used to create tables and INSERT statements in the same script command.
  • In script mode, if a statement in the script file fails to be executed, all the other statements in the script file also fail.
  • In script mode, a job can be generated to process data only after all the input data is prepared and inserted.
  • In script mode, if a table is written and then read, the following error is returned:
    INSERT OVERWRITE table src2 SELECT * FROM src WHERE key > 0;
    @a := SELECT * FROM src2;
    SELECT * FROM @a;
    To prevent this error, you can modify your script file based on the following sample script:
    @a := SELECT * FROM src WHERE key > 0;
    INSERT OVERWRITE table src2 SELECT * FROM @a;
    SELECT * FROM @a;
  • You can use the script mode in the following scenarios:
    • A single statement with nested subqueries needs to be rewritten, or a complex script must be split into multiple statements.
    • Data from different data sources are prepared at different time points, and the time gap is large. For example, the data from one data source is prepared at 01:00, and that from another data source is prepared at 07:00. In this case, the table variables are not suitable for packaging the statements of different types in a script file.
    • You need to assign a constant value to a variable and execute the SELECT * FROM Variable statement to convert the variable into a scalar value. This scalar value is used with other columns for calculation. The constant value can also be stored in a table that has only one row. The following statements show an example. For more information about the syntax of the SELECT * FROM Variable statement, see Subqueries.
      @a := SELECT 10; -- Assign the constant value 10 to @a. You can also execute the SELECT col1 FROM t1 statement to store the constant value to table t1 that has only one row. 
      @b := SELECT key,value+(SELECT * FROM @a) FROM t2 WHERE key >10000; -- Use the value of @a with value in table t2 for calculation. 
      SELECT * FROM @b;

Example

The following code shows how to run MaxCompute SQL in script mode:
CREATE TABLE IF NOT EXISTS dest(key STRING, value BIGINT) PARTITIONED BY (d STRING);
CREATE TABLE IF NOT EXISTS dest2(key STRING, value BIGINT) PARTITIONED  BY (d STRING);
@a := SELECT * FROM src WHERE value >0;
@b := SELECT * FROM src2 WHERE key is not null;
@c := SELECT * FROM src3 WHERE value is not null;
@d := SELECT a.key,b.value FROM @a LEFT OUTER JOIN @b ON a.key=b.key AND b.value>0;
@e := SELECT a.key,c.value FROM @a INNER JOIN @c ON a.key=c.key;
@f := SELECT * FROM @d UNION SELECT * FROM @e UNION SELECT * FROM @a;
INSERT OVERWRITE table dest PARTITION (d='20171111') SELECT * FROM @f;
@g := SELECT e.key,c.value FROM @e JOIN @c ON e.key=c.key;
INSERT OVERWRITE TABLE dest2 PARTITION (d='20171111') SELECT * FROM @g;

Required tools

Methods to use the script mode

  • Use MaxCompute Studio
    Before you use MaxCompute Studio, make sure that MaxCompute Studio is installed, the project link is added, and a MaxCompute SQL script file is created. For more information, see Install IntelliJ IDEA, Manage project connections, and Create a MaxCompute script module.
    After you compile and submit the script file, you can view the graphic execution plan. Although the script file contains multiple statements, the graphic execution plan is shown in a directed acyclic graph (DAG).
  • Use the MaxCompute client

    You must use the MaxCompute client of a version later than 0.27 to submit the script file. We recommend that you install the latest version of the MaxCompute client installation package. After you install the latest version of the package, use the -s option to submit the script file.

    Edit the source code file myscript.sql in script code and call odpscmd in the system command line to run the following command. For more information about how to run the MaxCompute client by using the system command line, see Run the MaxCompute client.
    ..\bin>odpscmd -s myscript.sql
    Note Similar to the -f and -e options, the -s option is a command line option for the MaxCompute client. The -s option is not a command in an interactive environment. If the MaxCompute client (odpscmd) is used in an interactive environment, the script mode and table variables are not supported.
  • Use DataWorks
    In the DataWorks console, you can create an ODPS Script node, as shown in the following figure. ODPS Script node

    After the node is created, use the script mode to edit the script file and click the Run icon in the toolbar to submit the script file to MaxCompute. This way, you can use the Logview URL in the output to view the graphic execution plan and results.

  • Use the SDK for Java
    You can create an SQL script file in the SDK for Java. For more information about the SDK for Java, see SDK for Java. The following example shows how to create an SQL script file:
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    
    import com.aliyun.odps.Instance;
    import com.aliyun.odps.Odps;
    import com.aliyun.odps.OdpsException;
    import com.aliyun.odps.account.Account;
    import com.aliyun.odps.account.AliyunAccount;
    import com.aliyun.odps.data.Record;
    import com.aliyun.odps.task.SQLTask;
    
    public class SdkTest {
    
      public static void main(String[] args) throws OdpsException {
    
        Account account = new AliyunAccount("your access_id", "your access_key");
        Odps odps = new Odps(account);
        odps.setDefaultProject("your project_name");
        odps.setEndpoint("your end_point");
    
        String sqlScript = "@a := SELECT * FROM jdbc_test;\n"
                           + "SELECT * FROM @a;";
    
        // Make sure that you add the following configuration to the SQL script file.
        Map<String, String> hints = new HashMap<>();
        hints.put("odps.sql.submit.mode", "script");
    
        Instance instance = SQLTask.run(odps, "your project_name", sqlScript, hints, null);
        instance.waitForSuccess();
    
        List<Record> recordList = SQLTask.getResult(instance);
        for (Record record : recordList) {
          System.out.println(record.get(0));
          System.out.println(record.get(1));
        }
      }
    
    }