All Products
Search
Document Center

DataWorks:Develop an ODPS SQL task

Last Updated:Mar 26, 2026

DataWorks provides the ODPS SQL node to periodically schedule MaxCompute SQL tasks and integrate them with other node types. MaxCompute SQL tasks use an SQL-like syntax and are suitable for distributed processing scenarios that involve large volumes of data (terabyte-scale) but do not require high real-time performance. This topic describes how to develop SQL tasks in DataWorks and provides important considerations.

Node overview

Use ODPS SQL to process and query data in MaxCompute. It supports common SQL operations such as SELECT, INSERT, UPDATE, and DELETE, along with MaxCompute-specific syntax and functions. Write SQL-like statements to query and process data without writing complex data processing logic. For more information about the SQL syntax, see SQL overview.

Prerequisites

Before you begin, ensure that:

Limits

The following limits apply when you develop ODPS SQL nodes in DataWorks:

CategoryLimit
CommentsOnly single-line comments in the format -- Comment are supported. Multi-line comments in the format /* Comment */ are not supported. For more information, see MaxCompute SQL comments. Additional restrictions: you cannot add a comment after a statement that contains only a keyword (such as SET or USE); you cannot use semicolons (;) in comments; you cannot add a comment at the end of a completed statement (a statement ending with ;).
SQL submissionSET and USE statements cannot run alone. They must be executed together with a specific SQL statement.
SQL code sizeThe SQL code cannot exceed 128 KB in size. The number of SQL commands cannot exceed 200.
Query output formatOnly statements that start with SELECT or WITH can output a formatted result set.
Query result rowsIf a query result contains more than 10,000 rows, a maximum of 10,000 rows are displayed.
Query result sizeIf the size of a query result exceeds 10 MB, the error Result is too large, exceed the limit size: 10MB is reported.
Note

If you hit the query result limits, download the results using one of these methods:

Important considerations

  • Enclose special parameters such as an OSS address in double quotation marks (""). Otherwise, the task may fail to parse and run.

  • In extreme cases such as a server power outage or an active/standby switchover, DataWorks may not fully stop the related MaxCompute task processes. If this happens, go to the computing resources of the corresponding project in MaxCompute to stop the job.

  • SET and USE statements behave differently depending on the DataWorks environment:

    • DataStudio for Data Development: All SET and USE statements in the current task are merged and used as preamble statements for all subsequent SQL statements.

    • Scheduling environment: Statements are executed in the order they are written.

    For example, given the following node code:

    SET a=b;
    CREATE TABLE name1(id string);
    SET c=d;
    CREATE TABLE name2(id string);

    The execution order in each environment is as follows:

    SQL statementDataStudio for Data DevelopmentScheduling O&M
    First SQL statementSET a=b; SET c=d; CREATE TABLE name1(id string);SET a=b; CREATE TABLE name1(id string);
    Second SQL statementSET a=b; SET c=d; CREATE TABLE name2(id string);SET c=d; CREATE TABLE name2(id string);

Edit code: Simple examples

SQL commands

MaxCompute SQL commands use a syntax similar to standard SQL. They support DDL, DML, and DQL statements, along with MaxCompute-specific commands. For more information about syntax requirements and usage examples, see SQL overview.

Note
  • When using new data types in MaxCompute V2.0 extension functions, add SET odps.sql.type.system.odps2=true; before the SQL statement and submit them together as a single job. This makes sure the new data types are processed correctly. For more information, see Data types (V2.0).

  • DataWorks scheduling parameters let you pass dynamic values to node code. Define variables in ${variable_name} format in your code, then assign values on the Scheduling Configuration tab under Parameter. For more information, see Supported formats of scheduling parameters.

Create a table

Use the CREATE TABLE statement to create non-partitioned tables, partitioned tables, foreign tables, and clustered tables. For more information, see CREATE TABLE.

-- Create a partitioned table named students.
CREATE TABLE if NOT EXISTS students
( id BIGINT,
  name STRING,
  age BIGINT,
  birth DATE)
partitioned BY (gender STRING);

Insert data

Use INSERT INTO or INSERT OVERWRITE to insert or overwrite data in a destination table. For more information, see Insert or overwrite data (INSERT INTO | INSERT OVERWRITE).

-- Insert data.
INSERT INTO students PARTITION(gender='boy') VALUES (1,'Zhang San',15,DATE '2008-05-15') ;
INSERT INTO students PARTITION(gender='boy') VALUES (2,'Li Si',17,DATE '2006-07-20') ;
INSERT INTO students PARTITION(gender='girl') VALUES (3,'Li Xia',20,DATE '2003-04-20') ;
INSERT INTO students PARTITION(gender='girl') VALUES (4,'Wang Lan',21,DATE '2002-01-08') ;
INSERT INTO students PARTITION(gender='boy') VALUES (5,'Wang Wu',17,DATE '2006-09-12') ;
Important

Avoid INSERT INTO for bulk loads — it can cause unexpected data duplication. Use INSERT OVERWRITE instead. For more information, see Insert or overwrite data.

Query data

Use the SELECT statement to perform nested queries, group queries, sorting, and more. For more information, see SELECT syntax.

-- (Optional) Enable full table scan at the project level. Requires high-level permissions.
-- SETPROJECT odps.sql.allow.fullscan=true;

-- Enable full table scan at the session level. Valid only for the current session.
SET odps.sql.allow.fullscan=true;
-- Query all male students, sorted by ID in ascending order.
SELECT * FROM students WHERE gender='boy' ORDER BY id;
Note

By default, Resource Access Management (RAM) users do not have permissions to query production tables. Request these permissions from Security Center. For more information, see Details of MaxCompute database permission control and Manage user permissions using commands.

SQL functions

MaxCompute supports built-in functions and user-defined functions (UDFs) for data development and analysis.

  • Built-in functions: Preset in MaxCompute and callable directly. For more information, see Built-in functions. The following example uses the dateadd function to shift values in the birth column by one month:

    -- Enable full table scan at the session level. Valid only for the current session.
    SET odps.sql.allow.fullscan=true;
    SELECT id, name, age, birth, dateadd(birth,1,'mm') AS birth_dateadd FROM students;
  • User-defined functions (UDFs): Write the function code, upload it as a resource, and register the function. For more information, see MaxCompute UDFs and Create and use a UDF.

Runtime and results

After running a query, results appear in a workbook. Open the results in the workbook, or copy and paste into a local Excel file. For more information, see Task debugging process.

Note

Due to an adjustment in China time zone information released by the International Organization for Standardization, a time difference may exist for certain historical dates when running related SQL statements in DataWorks: 5 minutes and 52 seconds for dates between 1900 and 1928, and 9 seconds for dates before 1900.

Click the Operational Log tab to view the log in Logview. For more information, see Use Logview V2.0 to view job running information.

The query results for the examples above are as follows:

  • Query all male students, sorted by ID in ascending order:

    +------------+------------+------------+------------+------------+
    | id         | name       | age        | birth      | gender     |
    +------------+------------+------------+------------+------------+
    | 1          | Zhang San  | 15         | 2008-05-15 | boy        |
    | 2          | Li Si      | 17         | 2006-07-20 | boy        |
    | 5          | Wang Wu    | 17         | 2006-09-12 | boy        |
    +------------+------------+------------+------------+------------+
  • Shift values in the birth column by one month:

    +------------+------------+------------+------------+---------------+
    | id         | name       | age        | birth      | birth_dateadd |
    +------------+------------+------------+------------+---------------+
    | 4          | Wang Lan   | 21         | 2002-01-08 | 2002-02-08    |
    | 3          | Li Xia     | 20         | 2003-04-20 | 2003-05-20    |
    | 2          | Li Si      | 17         | 2006-07-20 | 2006-08-20    |
    | 1          | Zhang San  | 15         | 2008-05-15 | 2008-06-15    |
    | 5          | Wang Wu    | 17         | 2006-09-12 | 2006-10-12    |
    +------------+------------+------------+------------+---------------+

On the query results page, sort results by a specified field and download them. For more information, see Process query results.

Edit code: Advanced examples

The following topics provide more advanced examples of ODPS SQL tasks:

FAQ

Why does my ODPS SQL task run for a long time or stay in the waiting state?

MaxCompute SQL tasks consume quota resources. Check quota usage in the MaxCompute console to confirm sufficient resources are available. For more information, see Computing resources - Quota management.

Why do I get the error "You have No privilege 'odps:xxxx' on xxxx" when running an ODPS SQL task?

The account running the task lacks the required permissions on the MaxCompute project. For more information, see MaxCompute permissions.