The SQL engine of MaxCompute supports the script mode. If you use the script mode to compile an SQL script file, all statements in the file are compiled at the same time. You do not need to compile each statement. After the script file is compiled, it is submitted to MaxCompute, and an execution plan is generated. This way, the statements in the script file are scheduled in one queue and are executed once. This allows you to fully utilize the resources in MaxCompute.
--set set odps.sql.type.system.odps2=true; [set odps.stage.reducer.num=xxx;] [...] --ddl create table table1 xxx; [create table table2 xxx;] [...] --dml @var1 := SELECT [ALL | DISTINCT] select_expr, select_expr, ... FROM table3 [WHERE where_condition]; @var2 := SELECT [ALL | DISTINCT] select_expr, select_expr, ... FROM table4 [WHERE where_condition]; @var3 := SELECT [ALL | DISTINCT] var1.select_expr, var2.select_expr, ... FROM @var1 join @var2 on ...; INSERT OVERWRITE|INTO TABLE [PARTITION (partcol1=val1, partcol2=val2 ...)] SELECT [ALL | DISTINCT] select_expr, select_expr, ... FROM @var3; [@var4 := SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var1 UNION ALL | UNION SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var2; CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name AS SELECT [ALL | DISTINCT] select_expr, select_expr, ... FROM @var4;] [...]
- The script mode supports three types of statements: SET statements, DML statements, and some DDL statements. The DDL statements that are used to display results, such as DESC and SHOW, are not supported.
- A complete script file must be a list of SET, DDL, and DML statements in sequence. A script file can contain multiple statements of each type, but statements of the same type must be placed together. You must place these three types of statements in the preceding sequence.
- The at sign (
@) declares a variable.
- A script file can contain a maximum of one statement that is used to display results, such as a SELECT statement. If a script file contains more than one such a statement, an error is returned. We recommend that you do not include SELECT statements that are used to display results in the script file.
- A script file can contain a maximum of one CREATE TABLE AS statement and must end with this statement. We recommend that you do not include the statements used to create tables and INSERT statements in the same script command.
- In script mode, if a statement in the script file fails to be executed, all the other statements in the script file also fail.
- In script mode, a job can be generated to process data only after all the input data is prepared and inserted.
- In script mode, if a table is written and then read, the following error is returned:
INSERT OVERWRITE table src2 SELECT * FROM src WHERE key > 0; @a := SELECT * FROM src2; SELECT * FROM @a;To prevent this error, you can modify your script file based on the following sample script:
@a := SELECT * FROM src WHERE key > 0; INSERT OVERWRITE table src2 SELECT * FROM @a; SELECT * FROM @a;
- You can use the script mode in the following scenarios:
- A single statement with nested subqueries needs to be rewritten, or a complex script must be split into multiple statements.
- Data from different data sources are prepared at different time points, and the time gap is large. For example, the data from one data source is prepared at 01:00, and that from another data source is prepared at 07:00. In this case, the table variables are not suitable for packaging the statements of different types in a script file.
- You need to assign a constant value to a variable and execute the
SELECT * FROM Variablestatement to convert the variable into a scalar value. This scalar value is used with other columns for calculation. The constant value can also be stored in a table that has only one row. The following statements show an example. For more information about the syntax of the SELECT * FROM Variable statement, see Subqueries.
@a := SELECT 10; -- Assign the constant value 10 to @a. You can also execute the SELECT col1 FROM t1 statement to store the constant value to table t1 that has only one row. @b := SELECT key,value+(SELECT * FROM @a) FROM t2 WHERE key >10000; -- Use the value of @a with value in table t2 for calculation. SELECT * FROM @b;
CREATE TABLE IF NOT EXISTS dest(key STRING, value BIGINT) PARTITIONED BY (d STRING); CREATE TABLE IF NOT EXISTS dest2(key STRING, value BIGINT) PARTITIONED BY (d STRING); @a := SELECT * FROM src WHERE value >0; @b := SELECT * FROM src2 WHERE key is not null; @c := SELECT * FROM src3 WHERE value is not null; @d := SELECT a.key,b.value FROM @a LEFT OUTER JOIN @b ON a.key=b.key AND b.value>0; @e := SELECT a.key,c.value FROM @a INNER JOIN @c ON a.key=c.key; @f := SELECT * FROM @d UNION SELECT * FROM @e UNION SELECT * FROM @a; INSERT OVERWRITE table dest PARTITION (d='20171111') SELECT * FROM @f; @g := SELECT e.key,c.value FROM @e JOIN @c ON e.key=c.key; INSERT OVERWRITE TABLE dest2 PARTITION (d='20171111') SELECT * FROM @g;
Methods to use the script mode
- Use MaxCompute Studio
Before you use MaxCompute Studio, make sure that MaxCompute Studio is installed, the project link is added, and a MaxCompute SQL script file is created. For more information, see Install IntelliJ IDEA, Manage project connections, and Create a MaxCompute script module.After you compile and submit the script file, you can view the graphic execution plan. Although the script file contains multiple statements, the graphic execution plan is shown in a directed acyclic graph (DAG).
- Use the MaxCompute client
You must use the MaxCompute client of a version later than 0.27 to submit the script file. We recommend that you install the latest version of the MaxCompute client installation package. After you install the latest version of the package, use the
-soption to submit the script file.Edit the source code file myscript.sql in script code and call odpscmd in the system command line to run the following command. For more information about how to run the MaxCompute client by using the system command line, see Run the MaxCompute client.
..\bin>odpscmd -s myscript.sqlNote Similar to the
-soption is a command line option for the MaxCompute client. The -s option is not a command in an interactive environment. If the MaxCompute client (odpscmd) is used in an interactive environment, the script mode and table variables are not supported.
- Use DataWorks
In the DataWorks console, you can create an ODPS Script node, as shown in the following figure.
After the node is created, use the script mode to edit the script file and click the Run icon in the toolbar to submit the script file to MaxCompute. This way, you can use the Logview URL in the output to view the graphic execution plan and results.