This topic describes basic differences between MaxCompute SQL and standard SQL and the related solutions
Basic differences between MaxCompute SQL and standard SQL
Item | Issue | Solution |
Scenarios | MaxCompute SQL does not support transactional statements, including COMMIT and ROLLBACK. We recommend that you do not use INSERT INTO in MaxCompute SQL. | We recommend that the code be written with idempotence and support re-running. We recommend that you use INSERT OVERWRITE to write data. |
MaxCompute SQL does not support indexes and primary key constraints. | N/A. | |
Specific fields do not support default values or default functions. | If a field has a default value, you can assign a value to the field during data writing. MaxCompute allows you to add default values to fields of the BIGINT, DOUBLE, BOOLEAN, and STRING types when you create a table. | |
MaxCompute SQL does not support auto-increment fields. | N/A. | |
Table partitions | A table can contain a maximum of 60,000 partitions. If the number of partitions exceeds 60,000, an error is reported. | Select appropriate partition key columns to reduce the number of partitions. |
You cannot query data of more than 10,000 partitions at a time. Otherwise, an error is reported. If a partitioned table contains two levels of partitions and data is filtered based only on level-2 partitions in a query, an error may be reported when the total number of partitions is greater than 10,000. | For more information about the solution to this issue, see What do I do if the "a single instance cannot output data to more than 10000 partitions" error message appears when I execute the INSERT INTO or INSERT OVERWRITE statement? | |
Precision | The DOUBLE data type does not have high precision. | We recommend that you do not use an equal sign (=) to associate two fields of the DOUBLE type. We recommend that you subtract the values of the two fields. If the difference between the values is less than the preset value, the two values are considered the same. For example, if the result of |
MaxCompute supports the high-precision data type DECIMAL. However, higher precision may be required. | If your business requires higher precision, you can store data as the STRING type and then use user-defined functions (UDFs) to perform calculations. | |
Data type conversions | Various unexpected errors or code maintenance issues occur during data type conversions. | If you want to perform a JOIN operation on two fields of different data types, we recommend that you convert the field data types before you perform the JOIN operation. |
Implicit conversion is performed between the DATE and STRING data types | If you want to pass a field of the STRING data type into a function that requires a field of the DATE data type, the conversion between the STRING and DATE data types is performed based on the |
Differences between DDL and DML and the related solutions
Item | Issue | Solution |
Table schema | The value of the partition key column can be changed, but the name of the partition key column cannot be changed. | For more information about the solution to this issue, see What is the difference between partitions and partition key columns? |
Columns can be added but cannot be removed. The data types of columns cannot be changed. | For more information about the solution to this issue, see How do I change the data type of a column? and How do I delete a column from a table? | |
INSERT | For MaxCompute SQL, you must add the keyword TABLE after the INSERT INTO or INSERT OVERWRITE statement. | N/A. |
When data is inserted into a table, the field mapping is performed based on the field sequence of the SELECT statement and the field sequence in the table rather than the aliases in the SELECT statement. | N/A. | |
UPDATE and DELETE | You can execute UPDATE and DELETE statements only on transactional tables. | For more information about the solution to this issue, see How do I delete data from a MaxCompute table or partition? and How do I update data in a MaxCompute table or partition? |
SELECT | A MaxCompute SQL statement can perform MAPJOIN on a maximum of six small tables and can consecutively join a maximum of 16 tables. | For more information about the solution to this issue, see What do I do if the "Maximum 16 join inputs allowed" error message appears when I perform a JOIN operation? |
IN and NOT IN | The number of partitions returned by the subquery that follows IN, NOT IN, EXISTS, or NOT EXISTS cannot exceed 1,000. | For more information about the solution to this issue, see When I execute a MaxCompute SQL statement with NOT IN that is followed by a subquery, the subquery is expected to return tens of thousands of data records. However, if a subquery that follows IN or NOT IN returns partition data, the maximum number of data records that can be returned is 1,000. How do I ensure that the subquery returns the expected number of data records and the logic of NOT IN is implemented? If your business can ensure uniqueness of the subquery results, you can remove DISTINCT to improve query performance. |
10,000 results returned by MaxCompute SQL | MaxCompute limits the maximum number of data records that can be returned by a single SELECT statement. | For more information about the solution to this issue, see LIMIT. |
You want to query a large number of data records. | For more information about the solution to this issue, see How do I obtain all data if the number of query results exceeds 10,000 when I use SQLTask to execute an SQL statement? | |
MAPJOIN | The JOIN clause does not support the Cartesian product. | You must use the ON keyword to configure join conditions in a JOIN clause. If you want to use small tables as broadcast tables, you must use MAPJOIN HINT. |
ORDER BY | ORDER BY must be used together with LIMIT N. | If you want to sort a large amount of data or even sort all data in a table, you can increase the value of N. For more information about the solution to this issue, see After I query data in MaxCompute, how are the query results sorted? |
UNION ALL | The number of columns in all tables that are involved in UNION ALL must be consistent. Otherwise, an error is reported. | The number of columns and the names and data types of all columns involved in UNION ALL must be consistent. |
A UNION ALL query must be nested within a subquery. | N/A. |