A MaxCompute SQL task uses an SQL-like syntax for distributed processing of large-scale data (TB-level) when real-time results are not required. This topic describes how to develop an SQL task in DataWorks and provides important usage notes.
Introduction
ODPS SQL is used to process and query data in MaxCompute. It supports common SQL operations, such as SELECT, INSERT, UPDATE, and DELETE, as well as syntax and functions that are specific to MaxCompute. With ODPS SQL, you can write SQL-like statements to query and process data without writing complex data processing logic. For more information about the SQL syntax, see SQL Overview.
Limitations
ODPS SQL nodes in DataWorks have the following limitations:
Category | Description |
Use of comments | Only single-line comments in the For more information, see MaxCompute SQL comments. The following limitations on comments also apply:
|
SQL submission | You cannot use SET or USE statements alone. You must execute them with other SQL statements. |
SQL development | The size of the SQL code cannot exceed 128 KB, and the number of SQL statements cannot exceed 200. |
Query result | Only SQL statements that start with SELECT or WITH can return a formatted result set. A query result is subject to the following limitations:
Note If you encounter these limitations, you can download the query result for local viewing by using one of the following methods:
|
Usage notes
Ensure that the account used to run the ODPS SQL task has the required permissions on the corresponding MaxCompute project. For more information, see DataWorks On MaxCompute permission control and MaxCompute permissions.
MaxCompute SQL tasks require quota resources to run. If a task runs for a long time, go to the MaxCompute console to check the quota consumption and ensure that sufficient resources are available to run the task. For more information, see Computing Resources - Quota Management.
When you develop a task on an ODPS SQL node, special parameters, such as an OSS address, must be enclosed in double quotation marks ("). Otherwise, a parsing error occurs and the task fails.
-
In some extreme cases, such as a server power outage or a primary/secondary switchover, DataWorks may not be able to completely terminate related MaxCompute tasks. In this situation, go to the corresponding MaxCompute project to terminate the job.
The execution order of keyword statements (SET and USE) varies by DataWorks environment.
In DataStudio: All keyword statements (SET and USE) are combined and executed before other SQL statements.
In the Scheduling environment: Statements are executed in their written order.
Assume that the following code is defined in a node.
SET a=b; CREATE TABLE name1(id string); SET c=d; CREATE TABLE name2(id string);The following table describes the execution order in different environments.
Executed SQL
DataStudio
Scheduling environment
First SQL statement
SET a=b; SET c=d; CREATE TABLE name1(id string);SET a=b; CREATE TABLE name1(id string);Second SQL statement
SET a=b; SET c=d; CREATE TABLE name2(id string);SET c=d; CREATE TABLE name2(id string);
Edit code: Simple examples
SQL commands
MaxCompute SQL commands use a syntax similar to standard SQL and support DDL, DML, and DQL statements, as well as specific MaxCompute syntax. For more information about the syntax requirements and usage examples of each SQL command, see SQL overview. The following simple example shows how to develop and run SQL commands.
When you use new data types with MaxCompute 2.0 extended functions, you need to add
SET odps.sql.type.system.odps2=true;before the SQL statement and submit them together so that the new data types can be used. For more information, see 2.0 Data Types Version.DataWorks provides scheduling parameters to enable dynamic parameter passing in scheduled scenarios. You can define variables in your ODPS SQL node code using the
${variable_name}format and assign values in the Properties > Parameter section. For more information about supported formats for scheduling parameters, see Supported formats for scheduling parameters.
Create a table
You can use the
CREATE TABLEstatement to create a non-partitioned table, partitioned table, external table, or clustered table. For more information, see CREATE TABLE. Sample SQL statement:-- Create a partitioned table named test1. CREATE TABLE if NOT EXISTS students ( id BIGINT, name STRING, age BIGINT, birth DATE) partitioned BY (gender STRING);Insert data
You can use the
INSERT INTOorINSERT OVERWRITEstatement to insert or update data in a destination table. For more information, see Insert or overwrite data (INSERT INTO | INSERT OVERWRITE). Sample SQL statements:-- Insert data. INSERT INTO students PARTITION(gender='boy') VALUES (1,'John',15,DATE '2008-05-15') ; INSERT INTO students PARTITION(gender='boy') VALUES (2,'Jack',17,DATE '2006-07-20') ; INSERT INTO students PARTITION(gender='girl') VALUES (3,'Alice',20,DATE '2003-04-20') ; INSERT INTO students PARTITION(gender='girl') VALUES (4,'Lily',21,DATE '2002-01-08') ; INSERT INTO students PARTITION(gender='boy') VALUES (5,'Bob',17,DATE '2006-09-12') ;ImportantAvoid using the
INSERT INTOstatement to insert data, as it may cause accidental data duplication. We recommend that you use theINSERT OVERWRITEstatement instead. For more information, see Insert or overwrite data.Query data
You can use the
SELECTstatement to perform operations such as nested queries, grouped queries, and sorting. For more information, see SELECT syntax. Sample SQL statements:-- (Optional) Enable full table scan at the project level. This operation requires high-level permissions. -- SETPROJECT odps.sql.allow.fullscan=true; -- Enable full table scan at the session level. This setting is valid only for the current session. SET odps.sql.allow.fullscan=true; -- Query the information about all boys and sort the results by ID in ascending order. SELECT * FROM students WHERE gender='boy' ORDER BY id;NoteBy default, a RAM user does not have query permissions on production tables. To query production tables, submit a request in Security Center. For information about MaxCompute data permission presets and access control in DataWorks, see MaxCompute data permission control details. For information about how to grant permissions by using commands in MaxCompute, see Manage user permissions by using commands.
SQL functions
MaxCompute supports built-in functions and user-defined functions (UDFs) for data development and analysis. For more information about built-in functions, see Overview of built-in functions. For more information about UDFs, see MaxCompute UDF overview. The following example shows how to use SQL functions.
Built-in functions: These functions are pre-installed in MaxCompute and can be called directly. Based on the preceding example, you can use the
dateaddfunction to modify thebirthcolumn by a specified unit and interval. Sample command:-- Enable full table scan at the session level. This setting is valid only for the current session. SET odps.sql.allow.fullscan=true; SELECT id, name, age, birth, dateadd(birth,1,'mm') AS birth_dateadd FROM students;User-defined functions (UDFs): To use a UDF, you must write the function code, upload it as a resource, and register the function. For more information, see Create and use a UDF.
Execution and results
Run results appear directly in a spreadsheet format. You can perform operations in DataWorks, open the results in a spreadsheet, or copy and paste the content into a local Excel file. For more information, see Debug tasks.
NoteDue to an adjustment in the time zone information for China released by the International Organization for Standardization (ISO), a time discrepancy may appear in the date display when you execute relevant SQL queries in DataWorks. The time difference is 5 minutes and 52 seconds for dates between 1900 and 1928, and 9 seconds for dates before 1900.
Runtime Log: Click the Operational Logs tab to view the Logview. For more information, see View job information by using Logview V2.0.
Returned results:
Query the information about all boys and sort the results by ID in ascending order.
+------------+------------+------------+------------+------------+ | id | name | age | birth | gender | +------------+------------+------------+------------+------------+ | 1 | John | 15 | 2008-05-15 | boy | | 2 | Jack | 17 | 2006-07-20 | boy | | 5 | Bob | 17 | 2006-09-12 | boy | +------------+------------+------------+------------+------------+Modify the birth column by a specified unit and interval.
+------------+------------+------------+------------+---------------+ | id | name | age | birth | birth_dateadd | +------------+------------+------------+------------+---------------+ | 4 | Lily | 21 | 2002-01-08 | 2002-02-08 | | 3 | Alice | 20 | 2003-04-20 | 2003-05-20 | | 2 | Jack | 17 | 2006-07-20 | 2006-08-20 | | 1 | John | 15 | 2008-05-15 | 2008-06-15 | | 5 | Bob | 17 | 2006-09-12 | 2006-10-12 | +------------+------------+------------+------------+---------------+
On the query result page, you can sort the results by a specific field and download the results. For more information, see Process query results.
Edit code: Advanced examples
For more advanced examples of ODPS SQL tasks, see the following topics:
FAQ
Q: Why does my ODPS SQL task remain in the waiting state for a long time?
A: ODPS SQL tasks require quota resources to run. If your task remains in the waiting state for a long time, go to the MaxCompute console to check the quota consumption and ensure that sufficient resources are available to run the task. For more information, see Computing Resources - Quota Management.
Q: Why does the error message "You have No privilege 'odps:xxxx' on xxxx" appear when I run an ODPS SQL task?
A: The account used to run the ODPS SQL task must have the required permissions on the corresponding MaxCompute project. For more information, see MaxCompute permissions.