This topic provides answers to some frequently asked questions about DML operations on tables in MaxCompute.

The FAQs in this topic fall into the following categories:
  • Data insertion or update
  • Data deletion

Is the original data damaged if an error occurs during an INSERT operation?

No, the original data is not damaged. MaxCompute operations are atomic: if the INSERT operation succeeds, the data is updated; if the operation fails, the data is rolled back.

What do I do if the Table xxx has n columns, but query has m columns error message appears when I execute the INSERT INTO or INSERT OVERWRITE statement?

When you execute the INSERT INTO or INSERT OVERWRITE statement to insert data, make sure that the field order, the field types, and the total number of fields in the result of the SELECT clause match those of the destination table. MaxCompute does not allow you to specify only some of the fields in an INSERT statement. If you want to insert null values or other default values into specific columns, specify them directly in the SELECT clause. For example, you can use select 'a', null, col_name from table_name; with an INSERT statement to insert the value a and a null value into the first two columns and the values of the col_name column into the third column.
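
The following minimal sketch shows this pattern. The names dest_table, src_table, and col_name are hypothetical:

    -- dest_table is assumed to have three STRING columns; the SELECT
    -- clause must return exactly three fields in the matching order.
    insert into table dest_table
    select 'a', null, col_name
    from src_table;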

What do I do if the "a single instance cannot output data to more than 10000 partitions" error message appears when I execute the INSERT INTO or INSERT OVERWRITE statement?

  • Problem description

    When I execute the INSERT INTO or INSERT OVERWRITE statement, the following error message appears:

    FAILED: ODPS-0123031:Partition exception - a single instance cannot output data to more than 10000 partitions
  • Cause

    A MaxCompute table can contain a maximum of 60,000 partitions. However, a single job can write data to a maximum of 10,000 partitions of the output table. In most cases, this error is caused by an incorrect choice of partition fields. For example, partitioning by an ID field generates an excessive number of partitions.

  • Solution

    In most cases, this error occurs because the output table of the job contains thousands of dynamic partitions. If the number of output partitions exceeds 10,000, check the business logic and the SQL syntax for errors. If neither contains an error, we recommend that you modify the partition fields of the partitioned table, or split the business logic into multiple jobs so that each job writes to no more than 10,000 partitions, as shown in the sketch that follows.
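
    For example, the following minimal sketch splits one insert into two jobs, each of which restricts the range of the partition field in its WHERE clause. The names dest_table, src_table, and dt are hypothetical:

    -- Job 1: write only the partitions in the first half of the range.
    insert overwrite table dest_table partition (dt)
    select col1, col2, dt from src_table
    where dt < '20240701';

    -- Job 2: write the remaining partitions in a separate job.
    insert overwrite table dest_table partition (dt)
    select col1, col2, dt from src_table
    where dt >= '20240701';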

What do I do if the "invalid dynamic partition value" error message appears when I insert a dynamic partition into a MaxCompute table?

  • Problem description

    When I insert a dynamic partition into a MaxCompute table, the following error message appears:

    FAILED: ODPS-0123031:Partition exception - invalid dynamic partition value: province=上海
  • Cause

    The dynamic partition value is invalid. Dynamic partitioning generates partitions based on the values of the specified fields. Values that contain special characters or Chinese characters, such as 上海 in this example, cannot be used as dynamic partition values.

  • Solution
    Before you insert dynamic partitions into a MaxCompute table, take note of the following rules (a workaround sketch follows this list):
    • When you insert dynamic partitions into a MaxCompute table in a distributed environment, a maximum of 512 dynamic partitions can exist in a single process.
    • An SQL statement that is used for dynamic partitioning cannot generate more than 2,000 dynamic partitions.
    • Partition values that are dynamically generated cannot be NULL.
    • If the destination table contains multi-level partitions, you can specify some partitions as static partitions in an INSERT statement. However, the static partitions must be high-level partitions.
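
    As a workaround for values such as province=上海, you can map non-ASCII values to ASCII equivalents in the SELECT clause before the insert. A minimal sketch, assuming hypothetical tables src_table and dest_table:

    -- Map Chinese partition values to ASCII strings so that the
    -- dynamically generated partition values are valid; unmapped
    -- values fall into a catch-all partition instead of becoming NULL.
    insert overwrite table dest_table partition (province)
    select col1, col2,
           case province
               when '上海' then 'shanghai'
               when '北京' then 'beijing'
               else 'other'
           end as province
    from src_table;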

What do I do if an error occurs when I insert data of the FLOAT type into a MaxCompute table?

For more information about the basic data types supported by the MaxCompute V2.0 data type edition, see Data type editions. MaxCompute SQL does not support constants of the FLOAT type. To insert data of this type, you can first use the CAST function to convert the data into the FLOAT type. For example, you can use cast(5.1 as float) to convert '5.1' of the STRING type into 5.1 of the FLOAT type.

If you want to use new data types, such as TINYINT, SMALLINT, INT, FLOAT, VARCHAR, TIMESTAMP, or BINARY, in MaxCompute SQL, you must run one of the following commands to enable the MaxCompute V2.0 data type edition for a session or project:
  • To enable the MaxCompute V2.0 data type edition for a session, you must add set odps.sql.type.system.odps2=true; before the SQL statement, and then commit and execute them together.
  • To enable the MaxCompute V2.0 data type edition for a project, you must run the setproject odps.sql.type.system.odps2=true; command for the project. This command must be run by the project owner.
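
For example, the following minimal sketch enables the flag for the session and inserts a FLOAT value by using CAST. The table name float_table is hypothetical:

    -- Enable the MaxCompute V2.0 data type edition for this session.
    set odps.sql.type.system.odps2=true;
    -- Cast the constant because FLOAT constants are not supported.
    insert into table float_table
    select cast(5.1 as float);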

After I execute the INSERT INTO SELECT statement and the SELECT statement for the same data, the returned results are different. Why?

  • Problem description

    After I separately execute the INSERT INTO SELECT statement and the SELECT statement for the same field of the STRING type, the decimal places in the returned result are different. In the result of the SELECT statement, two decimal places are reserved. In the result of the INSERT INTO SELECT statement, multiple decimal places are displayed.

  • Cause

    If the field in the INSERT INTO SELECT statement is of the STRING type, the field is first converted into the DOUBLE type. Then, the ROUND operation is performed when the field of the DOUBLE type is implicitly converted into the DECIMAL type. Because the DOUBLE type is not precise, multiple decimal places may still be displayed even after the ROUND operation is performed.

  • Solution
    We recommend that you explicitly convert data types. You can use the CAST function to explicitly convert the data to the DECIMAL type, as shown in the following snippet:

    case when pcm.abc is null then 0
         else round(cast(pcm.abc as decimal), 2)
    end abc
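
    For context, the snippet might be used in a complete statement such as the following sketch, in which pcm and dest_table are hypothetical names:

    insert overwrite table dest_table
    select case when pcm.abc is null then 0
                else round(cast(pcm.abc as decimal), 2)
           end as abc
    from pcm;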

When I insert data into a field of the VARCHAR(10) type in the destination table, is an error returned if the data length overflows?

When the data that is inserted into a field of the VARCHAR(10) type exceeds 10 characters in length, the data is truncated and no error is returned.
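
For example, in the following minimal sketch, the hypothetical table varchar_table is assumed to have a single VARCHAR(10) column. The 11-character source string is stored as abcdefghij without an error:

    -- The source string has 11 characters; only the first 10 are kept.
    insert into table varchar_table
    select 'abcdefghijk';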

What do I do if the "Transaction timeout because cannot acquire exclusive lock" error message appears when I execute a MaxCompute SQL statement?

  • Problem description

    When I execute a MaxCompute SQL statement, the following error message appears:

    Failed to run ddltask - Modify DDL meta encounter exception : ODPS-0121096:MetaStore transaction conflict - Reached maximum retry times because of OTSStorageTxnLockKeyFail(Inner exception: Transaction timeout because cannot acquire exclusive lock.) 
  • Cause

    MaxCompute allows multiple jobs to write data to a single table at the same time. When a job commits metadata, it must lock the metadata of the table, write the metadata, and then release the lock. If many jobs write data to the same table at the same time, the metadata of the table may remain locked, and some jobs cannot acquire the lock before the locking attempt times out. The timeout period of a locking attempt is approximately 30 seconds. If a job cannot acquire the lock within this period, the cannot acquire exclusive lock error message is returned. The lock takes effect at the table level. Therefore, this error is also returned if an excessive number of jobs write metadata to a table at the same time, or if a large amount of metadata is written to multiple partitions of a table.

  • Solution

    Check whether concurrent read and write operations are performed on the same table or the same partition of a table. We recommend that you avoid such concurrent operations.

How do I update data in a MaxCompute table or partition?

MaxCompute allows you to execute the UPDATE statement to update data of specific rows in transactional tables.

If the table is not a transactional table, you must import the data from the source table or partition into a new table or partition and apply the update during the import. The name of the new table or partition can be the same as that of the source, which effectively updates the data in the source table or partition. A sketch of both approaches follows.
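
The following minimal sketch shows both approaches. The tables txn_table and plain_table and the columns id and val are hypothetical:

    -- Transactional table: update specific rows directly.
    update txn_table set val = 'new_value' where id = 1;

    -- Non-transactional table: rewrite the table into itself and
    -- apply the update during the import.
    insert overwrite table plain_table
    select id,
           case when id = 1 then 'new_value' else val end as val
    from plain_table;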

How do I delete data from a MaxCompute table or partition?

MaxCompute allows you to execute the DELETE statement to delete data from specific rows in transactional tables.
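
For example, the following minimal sketch deletes specific rows from a hypothetical transactional table txn_table:

    delete from txn_table where id = 1;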

If the table is not a transactional table, use one of the following methods to delete data:
  • Run the drop table command to delete the table. The data is deleted together with the table.
  • If the table is a non-partitioned table, you can run the truncate table table_name; command to delete all data from the table, or use the insert overwrite statement to achieve the same effect.
    • Example 1: Delete the data records in which the value of Col is 1 from TableA. Sample command:
      insert overwrite table TableA select a,b,c.... from TableA where Col <> 1;
    • Example 2: Delete all data. Sample command:
      insert overwrite table TableA select a,b,c.... from TableA where 1=2;
  • If you want to delete data from a partitioned table, you can run the alter table table_name drop if exists partition(Partition key column='Specific partition value') command to delete the partition. This way, the data in the partition is deleted.
    For example, if the partition key column of the testtable table is ds, run the following command to delete the ds='20170520' partition:
    alter table testtable drop if exists partition (ds='20170520');
  • Execute an INSERT statement with a WHERE clause to import only the required data into a new table or partition. The source table and the destination table of an INSERT statement can be the same. Sample command:
    insert overwrite table sale_detail select * from sale_detail where name='mengyonghui';

If a non-partitioned table contains a large amount of data, how do I delete duplicate data from the table?

If each column in the non-partitioned table contains duplicate data, you can perform the GROUP BY operation on all columns. For example, if the columns of the non-partitioned table table1 are c1, c2, and c3, you can run the following command to delete duplicate data from the table:
insert overwrite table table1 select c1, c2, c3 from table1 group by c1, c2, c3;
Note Before you perform this operation, we recommend that you back up the data and evaluate, based on the amount of data, whether the cost of this operation is lower than the cost of importing the data again.