Run MaxCompute SQL Tasks in DataWorks with Automated Scheduling - MaxCompute

Prerequisites

Before you begin, ensure that you have:

Created an ODPS SQL node. For more information, see Create and manage ODPS nodes.

Limits

Constraint	Limit
Node code size	128 KB
SQL statements per node	200
Query result rows displayed	10,000
Query result data returned	10 MB

Comment restrictions:

Do not add comments to a SET or USE statement in isolation.
Do not use semicolons (;) inside comments.
Do not add a comment at the end of a complete SQL statement. A semicolon marks the end of a statement.

SET and USE statement restrictions:

Do not use a SET or USE statement alone in an ODPS SQL node. Execute them together with other SQL statements.
Only SELECT or WITH statements return formatted result sets.

How SET and USE statements execute

SET and USE statements behave differently depending on the environment.

Sample node code:

SET a=b;
CREATE TABLE name1(id STRING);
SET c=d;
CREATE TABLE name2(id STRING);

The following table shows how these statements are grouped and executed in each environment:

SQL statement	DataStudio	Scheduling environment
First	`SET a=b;` `SET c=d;` `CREATE TABLE name1(id STRING);`	`SET a=b;` `CREATE TABLE name1(id STRING);`
Second	`SET a=b;` `SET c=d;` `CREATE TABLE name2(id STRING);`	`SET c=d;` `CREATE TABLE name2(id STRING);`

DataStudio: All SET and USE statements in the node are combined and run before any other SQL statement.
Scheduling environment: All statements run sequentially, in the order they appear.

Write SQL statements

MaxCompute SQL supports DDL, DML, and DQL statements along with MaxCompute-specific syntax. Common SQL statements such as SELECT, INSERT, UPDATE, and DELETE are supported. For a full syntax reference, see Overview of MaxCompute SQL.

Create a table

Use CREATE TABLE to create non-partitioned, partitioned, external, or clustered tables. For full syntax, see CREATE TABLE.

-- Create a partitioned table.
CREATE TABLE IF NOT EXISTS students
  (id BIGINT, name STRING, age BIGINT, birth DATE)
PARTITIONED BY (gender STRING);

Insert data

Use INSERT INTO or INSERT OVERWRITE to load data into a table. For syntax details, see INSERT INTO and INSERT OVERWRITE.

INSERT INTO students PARTITION(gender='boy')  VALUES (1, 'John',  15, DATE '2008-05-15');
INSERT INTO students PARTITION(gender='boy')  VALUES (2, 'Jack',  17, DATE '2006-07-20');
INSERT INTO students PARTITION(gender='girl') VALUES (3, 'Alice', 20, DATE '2003-04-20');
INSERT INTO students PARTITION(gender='girl') VALUES (4, 'Lily',  21, DATE '2002-01-08');
INSERT INTO students PARTITION(gender='boy')  VALUES (5, 'Bob',   17, DATE '2006-09-12');

Important

INSERT INTO can produce duplicate rows. Use INSERT OVERWRITE to replace existing partition data instead.

Query data

Use SELECT for nested queries, sorting, and grouped queries. For syntax details, see SELECT syntax.

-- Enable full table scan for this session.
SET odps.sql.allow.fullscan=true;

-- Query all students in the 'boy' partition, sorted by ID.
SELECT * FROM students WHERE gender='boy' ORDER BY id;

RAM users do not have permission to query MaxCompute tables in the production environment by default. To grant access, request the required permissions in Security Center in the DataWorks console. For details, see Manage permissions on data in a MaxCompute compute engine instance and Manage user permissions by using commands.

Use scheduling parameters

To pass dynamic values into node code at runtime, define variables in the ${Variable name} format, then assign values to them in the Scheduling Parameter section of the Properties tab. For supported formats, see Supported formats of scheduling parameters.

Use MaxCompute V2.0 data types

If your SQL uses functions that rely on MaxCompute V2.0 data types, add SET odps.sql.type.system.odps2=true; before those statements and submit both together. For details, see MaxCompute V2.0 data type edition.

Use SQL functions

MaxCompute supports built-in functions and user-defined functions (UDFs) for data processing and analysis.

Built-in functions

Call built-in functions directly without any setup. The following example uses the dateadd function to add one month to each value in the birth column.

-- Enable full table scan for this session.
SET odps.sql.allow.fullscan=true;

SELECT id, name, age, birth, dateadd(birth, 1, 'mm') AS birth_dateadd
FROM students;

For a full list of built-in functions, see Overview of built-in functions.

User-defined functions (UDFs)

To use a UDF, write the function code, upload it as a resource, and register the function. For step-by-step instructions, see Create and use a MaxCompute UDF.

View running results and logs

After running a query, results appear in the workbook. A maximum of 10,000 rows and 10 MB of result data can be returned.

Sample output — students in the 'boy' partition, sorted by ID:

+----+------+-----+------------+--------+
| id | name | age | birth      | gender |
+----+------+-----+------------+--------+
| 1  | John | 15  | 2008-05-15 | boy    |
| 2  | Jack | 17  | 2006-07-20 | boy    |
| 5  | Bob  | 17  | 2006-09-12 | boy    |
+----+------+-----+------------+--------+

Sample output — birth column with `dateadd` applied:

+----+-------+-----+------------+---------------+
| id | name  | age | birth      | birth_dateadd |
+----+-------+-----+------------+---------------+
| 4  | Lily  | 21  | 2002-01-08 | 2002-02-08    |
| 3  | Alice | 20  | 2003-04-20 | 2003-05-20    |
| 2  | Jack  | 17  | 2006-07-20 | 2006-08-20    |
| 1  | John  | 15  | 2008-05-15 | 2008-06-15    |
| 5  | Bob   | 17  | 2006-09-12 | 2006-10-12    |
+----+-------+-----+------------+---------------+

On the Result tab, sort results by column or download them. For details, see Process query results.

To view operation logs, click the Runtime Log tab. Logs open in LogView. For details, see Use LogView V2.0 to view job information.

Due to an International Organization for Standardization (ISO) adjustment to the UTC+8 time zone, timestamps for years between 1900 and 1928 differ from the actual time by 352 seconds, and timestamps before 1900 differ by 9 seconds.

What's next

Explore more advanced use cases with ODPS SQL nodes: