All Products
Search
Document Center

MaxCompute:Basic differences between MaxCompute SQL and standard SQL and the related solutions

Last Updated:Dec 17, 2024

This topic describes basic differences between MaxCompute SQL and standard SQL and the related solutions

Basic differences between MaxCompute SQL and standard SQL

Item

Issue

Solution

Scenarios

MaxCompute SQL does not support transactional statements, including COMMIT and ROLLBACK. We recommend that you do not use INSERT INTO in MaxCompute SQL.

We recommend that the code be written with idempotence and support re-running. We recommend that you use INSERT OVERWRITE to write data.

MaxCompute SQL does not support indexes and primary key constraints.

N/A.

Specific fields do not support default values or default functions.

If a field has a default value, you can assign a value to the field during data writing. MaxCompute allows you to add default values to fields of the BIGINT, DOUBLE, BOOLEAN, and STRING types when you create a table.

MaxCompute SQL does not support auto-increment fields.

N/A.

Table partitions

A table can contain a maximum of 60,000 partitions. If the number of partitions exceeds 60,000, an error is reported.

Select appropriate partition key columns to reduce the number of partitions.

You cannot query data of more than 10,000 partitions at a time. Otherwise, an error is reported. If a partitioned table contains two levels of partitions and data is filtered based only on level-2 partitions in a query, an error may be reported when the total number of partitions is greater than 10,000.

For more information about the solution to this issue, see What do I do if the "a single instance cannot output data to more than 10000 partitions" error message appears when I execute the INSERT INTO or INSERT OVERWRITE statement?

Precision

The DOUBLE data type does not have high precision.

We recommend that you do not use an equal sign (=) to associate two fields of the DOUBLE type. We recommend that you subtract the values of the two fields. If the difference between the values is less than the preset value, the two values are considered the same. For example, if the result of ABS(a1-a2) is less than 0.000000001, a1 and a2 are considered the same.

MaxCompute supports the high-precision data type DECIMAL. However, higher precision may be required.

If your business requires higher precision, you can store data as the STRING type and then use user-defined functions (UDFs) to perform calculations.

Data type conversions

Various unexpected errors or code maintenance issues occur during data type conversions.

If you want to perform a JOIN operation on two fields of different data types, we recommend that you convert the field data types before you perform the JOIN operation.

Implicit conversion is performed between the DATE and STRING data types

If you want to pass a field of the STRING data type into a function that requires a field of the DATE data type, the conversion between the STRING and DATE data types is performed based on the yyyy-mm-dd hh:mi:ss format.

Differences between DDL and DML and the related solutions

Item

Issue

Solution

Table schema

The value of the partition key column can be changed, but the name of the partition key column cannot be changed.

For more information about the solution to this issue, see What is the difference between partitions and partition key columns?

Columns can be added but cannot be removed. The data types of columns cannot be changed.

For more information about the solution to this issue, see How do I change the data type of a column? and How do I delete a column from a table?

INSERT

For MaxCompute SQL, you must add the keyword TABLE after the INSERT INTO or INSERT OVERWRITE statement.

N/A.

When data is inserted into a table, the field mapping is performed based on the field sequence of the SELECT statement and the field sequence in the table rather than the aliases in the SELECT statement.

N/A.

UPDATE and DELETE

You can execute UPDATE and DELETE statements only on transactional tables.

For more information about the solution to this issue, see How do I delete data from a MaxCompute table or partition? and How do I update data in a MaxCompute table or partition?

SELECT

A MaxCompute SQL statement can perform MAPJOIN on a maximum of six small tables and can consecutively join a maximum of 16 tables.

For more information about the solution to this issue, see What do I do if the "Maximum 16 join inputs allowed" error message appears when I perform a JOIN operation?

IN and NOT IN

The number of partitions returned by the subquery that follows IN, NOT IN, EXISTS, or NOT EXISTS cannot exceed 1,000.

For more information about the solution to this issue, see When I execute a MaxCompute SQL statement with NOT IN that is followed by a subquery, the subquery is expected to return tens of thousands of data records. However, if a subquery that follows IN or NOT IN returns partition data, the maximum number of data records that can be returned is 1,000. How do I ensure that the subquery returns the expected number of data records and the logic of NOT IN is implemented? If your business can ensure uniqueness of the subquery results, you can remove DISTINCT to improve query performance.

10,000 results returned by MaxCompute SQL

MaxCompute limits the maximum number of data records that can be returned by a single SELECT statement.

For more information about the solution to this issue, see LIMIT.

You want to query a large number of data records.

For more information about the solution to this issue, see How do I obtain all data if the number of query results exceeds 10,000 when I use SQLTask to execute an SQL statement?

MAPJOIN

The JOIN clause does not support the Cartesian product.

You must use the ON keyword to configure join conditions in a JOIN clause.

If you want to use small tables as broadcast tables, you must use MAPJOIN HINT.

ORDER BY

ORDER BY must be used together with LIMIT N.

If you want to sort a large amount of data or even sort all data in a table, you can increase the value of N. For more information about the solution to this issue, see After I query data in MaxCompute, how are the query results sorted?

UNION ALL

The number of columns in all tables that are involved in UNION ALL must be consistent. Otherwise, an error is reported.

The number of columns and the names and data types of all columns involved in UNION ALL must be consistent.

A UNION ALL query must be nested within a subquery.

N/A.