All Products
Search
Document Center

DataWorks:MaxCompute SQL node

Last Updated:Jun 09, 2026

The MaxCompute SQL node in DataWorks allows you to periodically schedule MaxCompute SQL tasks. These tasks can be integrated with other node types in a unified workflow. MaxCompute SQL uses a SQL-like syntax, making it ideal for distributed processing of large-scale (TB-level) data where real-time results are not critical. This topic explains how to develop MaxCompute SQL node tasks in DataWorks.

Introduction

MaxCompute SQL is used to process and query data in MaxCompute. It supports common SQL operations, such as SELECT, INSERT, UPDATE, and DELETE, along with specific MaxCompute syntax and functions. You can query and process data by writing SQL-like statements, eliminating the need for complex programming. For more information about the SQL syntax, see SQL overview.

Prerequisites

  • You have bound a MaxCompute compute engine to the DataWorks workspace.

  • (Optional, for RAM users) The RAM user responsible for task development must be a member of the workspace and have the Development or Workspace Administrator role. The Workspace Administrator role includes extensive permissions and should be granted with caution. For more information about how to add a member to a workspace, see Add members to a workspace.

    Note

    If you are using an Alibaba Cloud account, you can skip this step.

Limitations

The MaxCompute SQL node has the following limitations for SQL development:

Category

Description

Comments

Only single-line comments that start with -- are supported. Multi-line comments (/* ... */) are not supported.

For more information, see MaxCompute SQL comments.

The following limitations also apply to comments:

  • You cannot add a comment after a statement that contains only a keyword, such as SET or USE.

  • Do not use a semicolon (;) in a comment.

  • Do not add a comment after a statement that ends with a semicolon (;), which marks the end of the statement.

SQL submission

ODPS SQL does not support using SET and USE statements independently, and they must be executed with a specific SQL statement.

SQL development

The SQL code cannot exceed 128 KB in size, and the number of SQL commands cannot exceed 200.

Query results

Only statements that start with SELECT or WITH produce a formatted result set.

The following limitations apply to query results:

  • If a query result exceeds 10,000 rows, it is truncated to a maximum of 10,000 rows.

  • If a query result exceeds 10 MB in size, the error Result is too large, exceed the limit size: 10MB is reported.

Note

If you encounter these limitations, you can download the query results by using one of the following methods:

Usage notes

  • Ensure that the account used to run the MaxCompute SQL task has the required permissions on the corresponding MaxCompute project. For details, see DataWorks On MaxCompute permission control and MaxCompute permissions.

  • MaxCompute SQL tasks consume computing quota. If a task runs for a long time, go to the MaxCompute console to monitor quota consumption and ensure that sufficient resources are available to run the task. For more information, see Computing resources-quota management.

  • When you develop MaxCompute SQL node tasks, special parameters such as Object Storage Service (OSS) addresses must be enclosed in double quotation marks (""). If you do not add quotation marks, the task may fail to parse and run.

  • The execution order of statements that contain keywords such as SET and USE varies depending on the DataWorks environment. For more information, see Appendix 1: SQL execution order in different environments.

  • In some extreme cases, such as a server power outage or a primary/secondary switchover, DataWorks may not be able to completely terminate related MaxCompute tasks. In this situation, go to the corresponding MaxCompute project to terminate the job.

Create MaxCompute SQL node

For instructions, see Create a MaxCompute SQL node.

Develop MaxCompute SQL node

On the node editing page, perform the following development operations.

Develop SQL code

DataWorks provides scheduling parameters to dynamically pass values to your code at runtime. In a MaxCompute SQL node, you can define variables by using the ${variable_name} format and assign values to them in the Scheduling Parameters section of the Scheduling Settings pane. For information about the supported formats of scheduling parameters, see Sources and expressions for scheduling parameters. MaxCompute SQL commands are similar to standard SQL syntax and support Data Definition Language (DDL), Data Manipulation Language (DML), and Data Query Language (DQL) statements, as well as specific MaxCompute syntax. For detailed syntax and examples, see SQL overview.

The following sections provide examples for different use cases:

Note
  • If you use a new data type in a MaxCompute 2.0 extension function, you must add SET odps.sql.type.system.odps2=true; before the SQL statement of the function and submit them together. This command enables the new data types for the session. For more information about 2.0 data types, see Data type editions (2.0).

  • The execution order of MaxCompute SQL statements varies between Data Studio and Operation Center. For more information, see Appendix 1: SQL execution order in different environments.

Create table

You can use the CREATE TABLE statement to create non-partitioned tables, partitioned tables, external tables, and clustered tables. For more information, see CREATE TABLE. The following SQL statement is an example:

-- Create a partitioned table named students.
CREATE TABLE IF NOT EXISTS students
( id BIGINT,
  name STRING,
  age BIGINT,
  birth DATE)
partitioned BY (gender STRING); 
Insert data

You can use INSERT INTO or INSERT OVERWRITE statements to insert or overwrite data into the target table. For more information, see Insert or overwrite data (INSERT INTO | INSERT OVERWRITE).

Avoid using the INSERT INTO statement, which may cause unexpected data duplication. We recommend that you use INSERT OVERWRITE instead. For more information, see Insert or overwrite data.

The following code is an example:

-- Insert data.
INSERT OVERWRITE students PARTITION(gender='boy') VALUES (1,'ZhangSan',15,DATE '2008-05-15') ;

An INSERT statement can trigger a DDL field comparison to compare the SQL statement in the SELECT clause with the fields of the target table.

This feature is not supported if the three-layer schema model is enabled for the MaxCompute project but not at the tenant level.
-- Compare DDL fields.
INSERT OVERWRITE TABLE dws_user_info_all_di PARTITION (dt='${workflow.var}')
SELECT COALESCE(a.uid, b.uid) AS uid
    , b.gender
    , b.age_range
    , b.zodiac
    , a.region
    , a.device
    , a.identity
    , a.method
    , a.url
    , a.referer
    , a.time
-- The FROM/JOIN clauses are omitted here. Add them based on your business logic.
;
Query data

You can use a SELECT statement to perform operations such as nested queries, grouped queries, and sorting. For more information, see SELECT Syntax. The following is an SQL example:

-- (Optional) Enable full table scan at the project level. This operation requires high privileges.
-- SETPROJECT odps.sql.allow.fullscan=true;  
-- Enable full table scan at the session level. This is valid only for the current session.
SET odps.sql.allow.fullscan=true; 
-- Query the information of all male students and sort the results by ID in ascending order.
SELECT * FROM students WHERE gender='boy' ORDER BY id;
Note

By default, a RAM user cannot query production tables. To obtain these permissions, you must apply for them in Security Center. For information about data permission presets and access control for MaxCompute in DataWorks, see Data permission control details for MaxCompute. For information about how to grant permissions by using commands in MaxCompute, see Manage user permissions by using commands.

Use SQL functions

MaxCompute supports built-in functions and user-defined functions (UDFs) for data development and analysis. You can create and use SQL functions based on your business requirements. For more information about built-in functions, see Built-in functions overview. For more information about UDFs, see MaxCompute UDF overview. The following examples describe how to use SQL functions.

  • Built-in functions: Built-in functions are functions that are provided by MaxCompute. You can call them directly. Based on the preceding examples of creating a table, inserting data, and querying data, the following sample command uses the dateadd function to modify the birth column by a specified unit and amount:

    -- Enable full table scan at the session level. This is valid only for the current session.
    SET odps.sql.allow.fullscan=true; 
    SELECT id, name, age, birth, dateadd(birth,1,'mm') AS birth_dateadd FROM students;
  • User-defined function (UDF): To use a UDF, you must write the function code, upload the code as a resource, and then register the function. For specific steps, see Resource Management.

Debug MaxCompute SQL node

  1. In the node editor, configure the parameters in the Run Configuration pane on the right.

    Parameter

    Description

    Compute engine

    Select the MaxCompute compute engine that is bound to your workspace.

    Computing quota

    Select a computing quota that you created. The quota provides the required computing resources, such as CPU and memory, for the computing job.

    If no computing quota is available, click Create Computing Quota in the drop-down list to go to the MaxCompute console and configure a quota.

    Resource group

    Select a scheduling resource group that can connect to the compute engine. For more information, see Network connection solutions.

  2. In the parameter dialog box on the toolbar, select the created MaxCompute data source and click Run to run the MaxCompute SQL task.

View results

  • You can open the results in a spreadsheet or copy the content to a local file.

    Note

    Due to a historical time zone adjustment for China, SQL queries in DataWorks may return results with a time offset for specific periods. The offset is 5 minutes and 52 seconds for dates and times from 1900 to 1928, and 9 seconds for dates and times before 1900.

  • To view run logs, click the LogView link on the image tab. For more information, see View job information by using Logview 2.0.

  • To sort the results, click the drop-down arrow in a column header, select an ascending or descending order from the Sort list, and then click OK.

  • To view BLOB fields, double-click a BLOB cell in the run results. MaxCompute supports the BLOB data type for storing binary objects such as images and audio files. In the Current Field Value window that appears, you can preview the content. Images are rendered directly, and text is displayed in read-only format. You can switch between Blob View, Text View, and JSON View by using the buttons at the bottom of the window.

Next steps

  • Node scheduling configuration: If a node in the project directory needs to run periodically, you need to set the Scheduling Policy and configure related scheduling properties in the Scheduling Settings section on the right side of the node.

  • Publishing a node: If you need to publish a task to the production environment for execution, click the image icon to start the publishing process. Nodes in the project directory are periodically scheduled only after they are published to the production environment.

Appendix 1: SQL execution order in different environments

The execution order of statements that contain keywords, such as SET and USE, in a MaxCompute SQL node varies depending on the DataWorks environment.

  • In Data Studio: All SET and USE statements in your code are consolidated and executed before other SQL statements.

  • In Operation Center (for scheduling and O&M): The statements are executed in the order in which they are written.

Assume that the following code is defined in the node.

SET a=b;
CREATE TABLE name1(id string);
SET c=d;
CREATE TABLE name2(id string);

The execution order in different environments is as follows:

SQL statement

Data Studio

Scheduling and O&M

First SQL statement

SET a=b;
SET c=d;
CREATE TABLE name1(id string);
SET a=b;
CREATE TABLE name1(id string);

Second SQL statement

SET a=b;
SET c=d;
CREATE TABLE name2(id string);
SET c=d;
CREATE TABLE name2(id string);

Appendix 2: Data lakehouse practices

To read data from or write data to Data Lake Formation (DLF) tables in a MaxCompute SQL task, you can use a MaxCompute External Project. An External Project allows real-time access to metadata and data by mapping a DLF catalog. It delegates permission management to DLF and supports accessing and manipulating metadata for data stored in DLF-managed OSS. For more information, see Paimon DLF External Project.

Related topics

For more examples of MaxCompute SQL tasks, see the following topics: