DataWorks provides the ODPS SQL node, which you can use to periodically schedule MaxCompute SQL tasks and integrate them with other node types for co-scheduling. MaxCompute SQL tasks use an SQL-like syntax and are suitable for distributed processing scenarios that involve large volumes of data (terabytes) but do not require high real-time performance. This topic describes how to develop SQL tasks in DataWorks and provides some precautions.
Node introduction
ODPS SQL is used to process and query data in MaxCompute. It supports common SQL operations, such as SELECT, INSERT, UPDATE, and DELETE, along with syntax and functions specific to MaxCompute. Using ODPS SQL, you can write SQL-like statements to query and process data without needing to write complex data processing logic. For more information about SQL syntax, see SQL overview.
Limits
The following table describes the limits on developing ODPS SQL nodes in DataWorks.
Category | Description |
Comment usage | Supports single-line comments ( For more information, see MaxCompute SQL comments. The following limits also apply to comments.
|
SQL submission | ODPS SQL does not support running SET or USE statements alone. They must be executed with a specific SQL statement. |
SQL development | The SQL code size cannot exceed 128 KB. The number of SQL commands cannot exceed 200. |
Query results | Only SQL statements that start with SELECT or WITH can output a formatted result set. Query results have the following limits:
Note If you encounter limits on query results, you can download the results to your local computer for viewing in one of the following ways:
|
Precautions
Ensure that the account used to run the ODPS SQL task has the required permissions on the corresponding MaxCompute project. For more information, see DataWorks permissions on MaxCompute and MaxCompute permissions.
The execution of MaxCompute SQL tasks consumes quota resources. If a task runs for a long time, go to the MaxCompute console to check the quota resource consumption and ensure that sufficient resources are available to run the task. For more information, see Computing resources - Quota management.
When you develop an ODPS SQL node task, you must enclose special parameters, such as an OSS address, in double quotation marks. Otherwise, the task cannot be parsed and executed.
In some extreme cases, such as an unexpected server power-off or an active-standby switchover, DataWorks may fail to completely stop the related MaxCompute task processes. If this occurs, stop the job in the corresponding project on the MaxCompute console. For more information, see Stop a job.
When you run statements with keywords, such as SET and USE, in different environments in DataWorks, the execution order varies.
In DataStudio: All keyword statements (SET and USE) in the current task code are merged and executed before all other SQL statements.
In the scheduling environment: The statements are executed in the order they are written.
Assume that the code defined in the node is as follows.
SET a=b; CREATE TABLE name1(id string); SET c=d; CREATE TABLE name2(id string);The following table describes the execution order in different environments.
SQL statement to execute
DataStudio
Scheduling environment
First SQL statement
SET a=b; SET c=d; CREATE TABLE name1(id string);SET a=b; CREATE TABLE name1(id string);Second SQL statement
SET a=b; SET c=d; CREATE TABLE name2(id string);SET c=d; CREATE TABLE name2(id string);
Edit code: Simple example
SQL commands
MaxCompute SQL commands use a syntax similar to standard SQL. They support Data Definition Language (DDL), Data Manipulation Language (DML), and Data Query Language (DQL) statements, along with commands specific to MaxCompute. For more information about the syntax requirements and usage examples of each SQL command, see SQL overview. The following example shows how to develop and run a simple SQL command.
When you use a MaxCompute V2.0 extension function that involves new data types, you must add
SET odps.sql.type.system.odps2=true;before the SQL statement of the function. Then, submit the statement and the function together to enable the new data types. For more information about V2.0 data types, see Data type editions (V2.0).DataWorks provides scheduling parameters that allow you to dynamically pass parameters to code in scheduling scenarios. In an ODPS SQL node, you can define variables in the
${Variable name}format within the code. You can then assign a value to the variable in the Parameters section on the Properties tab. For more information about the supported formats of scheduling parameters, see Supported formats of scheduling parameters.
Create a table
You can use the `
CREATE TABLE` statement to create non-partitioned tables, partitioned tables, foreign tables, and clustered tables. For more information, see CREATE TABLE. The following code provides an example:-- Create a partitioned table named students. CREATE TABLE if NOT EXISTS students ( id BIGINT, name STRING, age BIGINT, birth DATE) partitioned BY (gender STRING);Insert data
You can use the `
INSERT INTO` or `INSERT OVERWRITE` statement to insert or update data in a destination table. For more information, see Insert or overwrite data (INSERT INTO | INSERT OVERWRITE). The following code provides an example:-- Insert data. INSERT INTO students PARTITION(gender='boy') VALUES (1,'Zhang San',15,DATE '2008-05-15') ; INSERT INTO students PARTITION(gender='boy') VALUES (2,'Li Si',17,DATE '2006-07-20') ; INSERT INTO students PARTITION(gender='girl') VALUES (3,'Li Xia',20,DATE '2003-04-20') ; INSERT INTO students PARTITION(gender='girl') VALUES (4,'Wang Lan',21,DATE '2002-01-08') ; INSERT INTO students PARTITION(gender='boy') VALUES (5,'Wang Wu',17,DATE '2006-09-12') ;ImportantAvoid using the `
INSERT INTO` statement to insert data because this may cause unexpected data duplication. We recommend that you use the `INSERT OVERWRITE` statement instead. For more information, see Insert or overwrite data.Query data
You can use the `
SELECT` statement to perform operations such as nested queries, group queries, and sorting. For more information, see SELECT syntax. The following code provides an example:-- Enable a full table scan. This setting is valid only for the current session. SET odps.sql.allow.fullscan=true; -- Query the information of all male students and sort the results by ID in ascending order. SELECT * FROM students WHERE gender='boy' ORDER BY id;NoteBy default, Resource Access Management (RAM) users do not have permissions to query data from production tables. To obtain these permissions, you must request them in Security Center. For more information about preset database permissions and access control for MaxCompute in DataWorks, see MaxCompute data permission control. For more information about how to grant permissions using MaxCompute commands, see Manage user permissions using commands.
SQL functions
MaxCompute supports built-in functions and user-defined functions (UDFs) for data development and analysis. For more information about built-in functions, see Built-in functions. For more information about UDFs, see MaxCompute UDFs. The following example shows how to use a simple SQL function.
Built-in functions: Built-in functions are preset in MaxCompute and can be called directly. For example, based on the preceding table, you can use the `
dateadd` function to change the values in the `birth` column by a specified interval. The following code provides an example:-- Enable a full table scan. This setting is valid only for the current session. SET odps.sql.allow.fullscan=true; SELECT id, name, age, birth, dateadd(birth,1,'mm') AS birth_dateadd FROM students;User-defined functions (UDFs): To use a UDF, you must write the function code, upload the code as a resource, and register the function. For more information, see Create and use a UDF.
Running and results
The run results are displayed in a workbook. You can perform operations on the results in DataWorks, open the results in a workbook, or copy and paste the content to a local Excel file. For more information, see Task debugging process.
NoteDue to an adjustment to the China time zone information released by the International Organization for Standardization, a time difference may occur for specific periods when you run related SQL statements in DataWorks. This includes a 5-minute and 52-second difference for dates and times from 1900 to 1928, and a 9-second difference for dates and times before 1900.
Run logs: You can click the Running Log tab to view the Logview details. For more information, see Use Logview V2.0 to view job running information.
Results:
Query the information of all male students and sort the results by ID in ascending order.
+------------+------------+------------+------------+------------+ | id | name | age | birth | gender | +------------+------------+------------+------------+------------+ | 1 | Zhang San | 15 | 2008-05-15 | boy | | 2 | Li Si | 17 | 2006-07-20 | boy | | 5 | Wang Wu | 17 | 2006-09-12 | boy | +------------+------------+------------+------------+------------+Change the values in the birth column by a specified unit and interval.
+------------+------------+------------+------------+---------------+ | id | name | age | birth | birth_dateadd | +------------+------------+------------+------------+---------------+ | 4 | Wang Lan | 21 | 2002-01-08 | 2002-02-08 | | 3 | Li Xia | 20 | 2003-04-20 | 2003-05-20 | | 2 | Li Si | 17 | 2006-07-20 | 2006-08-20 | | 1 | Zhang San | 15 | 2008-05-15 | 2008-06-15 | | 5 | Wang Wu | 17 | 2006-09-12 | 2006-10-12 | +------------+------------+------------+------------+---------------+
On the query results page, you can sort the results by a specified field and perform operations such as downloading the results. For more information, see Process query results.
Edit code: Advanced examples
The following topics provide more complex examples of ODPS SQL tasks:
FAQ
Q: Why is my ODPS SQL task taking a long time to run or stuck in a waiting state?
A: The execution of MaxCompute SQL tasks consumes quota resources. If a task is waiting for resources or runs for a long time, go to the MaxCompute console to check the quota resource consumption and ensure that sufficient resources are available to run the task. For more information, see Computing resources - Quota management.
Q: Why do I receive the "You have No privilege 'odps:xxxx' on xxxx" error when I run an ODPS SQL task?
A: This error occurs because the account used to run the ODPS SQL task does not have the required permissions on the corresponding MaxCompute project. Ensure that the account has the necessary permissions. For more information, see MaxCompute permissions.