The SQL engine of MaxCompute supports the script mode. If you use the script mode to compile an SQL script file, all statements in the file are compiled together. Then, the script file is submitted to MaxCompute, and an execution plan is generated. This way, the statements in the script file are scheduled in one queue and are executed once. This allows you to fully utilize resources in MaxCompute.
--set set odps.sql.type.system.odps2=true; [set odps.stage.reducer.num=xxx;] [...] --ddl create table table1 xxx; [create table table2 xxx;] [...] --dml @var1 := SELECT [ALL | DISTINCT] select_expr, select_expr, ... FROM table3 [WHERE where_condition]; @var2 := SELECT [ALL | DISTINCT] select_expr, select_expr, ... FROM table4 [WHERE where_condition]; @var3 := SELECT [ALL | DISTINCT] var1.select_expr, var2.select_expr, ... FROM @var1 join @var2 on ...; INSERT OVERWRITE|INTO TABLE [PARTITION (partcol1=val1, partcol2=val2 ...)] SELECT [ALL | DISTINCT] select_expr, select_expr, ... FROM @var3; [@var4 := SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var1 UNION ALL | UNION SELECT [ALL | DISTINCT] var1.select_expr, var.select_expr, ... FROM @var2; CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name AS SELECT [ALL | DISTINCT] select_expr, select_expr, ... FROM @var4;] [...]
- The script mode supports SET statements, DML statements, and some DDL statements. The DDL statements that are used to display results, such as DESC and SHOW, are not supported.
- A complete script file must be a list of SET, DDL, and DML statements in sequence. A script file can contain multiple statements of each type. However, a script file does not need to contain all types of statements. You must place the statements of the same type in the preceding sequence.
- The at sign (
@) declares a variable.
- A script file can contain a maximum of one statement that is used to display results, such as a SELECT statement. If a script file contains two or more such statements, an error is returned. We recommend that you do not include SELECT statements that are used to display results in the script file.
- A script file can contain a maximum of one CREATE TABLE AS statement and must end with this statement. We recommend that you do not include the statements used to create tables and INSERT statements in the same script file.
- If the script mode is used and a statement in the script file fails to be executed, all the other statements in the script file also fail.
- If the script mode is used, a job can be generated to process data only after all the statements in a script file are prepared and successfully executed.
- If the script mode is used and a table is written and then read, the following error is returned:
INSERT OVERWRITE table src2 SELECT * FROM src WHERE key > 0; @a := SELECT * FROM src2; SELECT * FROM @a;To avoid this error, you can modify your script file based on the following sample script:
@a := SELECT * FROM src WHERE key > 0; INSERT OVERWRITE table src2 SELECT * FROM @a; SELECT * FROM @a;
- You can use the script mode in the following scenarios:
- A single statement with nested subqueries needs to be rewritten, or a complex script must be split into multiple statements.
- Data from different data sources are prepared at different time points, and the time gap is large. For example, the data from one data source is prepared at 01:00, and that from another data source is prepared at 07:00. In this case, table variables are not suitable for packaging the statements of different types in a script file.
- You need to assign a constant value to a variable and execute the
SELECT * FROM Variablestatement to convert the variable to a scalar value. This scalar value is used with other columns for calculation. The constant value can also be stored in a table that has only one row. For more information about the syntax of the SELECT * FROM Variable statement, see Subqueries. Sample statements:
@a := SELECT 10; -- Assign the constant value 10 to @a. You can also execute the SELECT col1 FROM t1 statement to store the constant value to table t1 that has only one row. @b := SELECT key,value+(SELECT * FROM @a) FROM t2 WHERE key >10000; -- Use the value of @a with value in table t2 for calculation. SELECT * FROM @b;
CREATE TABLE IF NOT EXISTS dest(key STRING, value BIGINT) PARTITIONED BY (d STRING); CREATE TABLE IF NOT EXISTS dest2(key STRING, value BIGINT) PARTITIONED BY (d STRING); @a := SELECT * FROM src WHERE value >0; @b := SELECT * FROM src2 WHERE key is not null; @c := SELECT * FROM src3 WHERE value is not null; @d := SELECT a.key,b.value FROM @a LEFT OUTER JOIN @b ON a.key=b.key AND b.value>0; @e := SELECT a.key,c.value FROM @a INNER JOIN @c ON a.key=c.key; @f := SELECT * FROM @d UNION SELECT * FROM @e UNION SELECT * FROM @a; INSERT OVERWRITE table dest PARTITION (d='20171111') SELECT * FROM @f; @g := SELECT e.key,c.value FROM @e JOIN @c ON e.key=c.key; INSERT OVERWRITE TABLE dest2 PARTITION (d='20171111') SELECT * FROM @g;
Methods to use the script mode
- Use MaxCompute Studio
Before you use the script mode in MaxCompute Studio, make sure that MaxCompute Studio is installed, connections to projects are configured, and the MaxCompute SQL script file is created. For more information, see Install IntelliJ IDEA, Manage project connections, and Create a MaxCompute script module. The following figure shows the script editor.After you compile and submit the script file, you can view the graphic execution plan. Although the script file contains multiple statements, the graphic execution plan is still shown in a directed acyclic graph (DAG).
- Use the MaxCompute client
You must use the MaxCompute client of a version later than 0.27 to submit the script file. We recommend that you install the latest version of the client. After you install the latest version, use the
-soption to submit the script file.Edit the source code file myscript.sql by using the following command:
..\bin>odpscmd -s myscript.sqlNote Similar to the
-soption is a command line option for the MaxCompute client. The -s option is not a command for an interactive environment. If the MaxCompute client is used in an interactive environment, the script mode and table variables are not supported.
- Use DataWorks
In the DataWorks console, create an ODPS Script node.
After the node is created, use the script mode to edit the script file and click the Run icon in the toolbar to submit the script file to MaxCompute. Then, you can use the Logview URL in the output to view the graphic execution plan and results.