MaxCompute allows you to execute the DELETE or UPDATE statement to delete or update data of specific rows in transactional tables. This feature is in public preview. If you execute the DELETE or UPDATE statement to delete or update the data in a table, you cannot restore the data after it is deleted or updated. Before you use this feature, we recommend that you execute the SELECT and INSERT statements to back up the data to another table. During the public preview, you are not charged for executing the DELETE or UPDATE statement on transactional tables. However, you are charged for executing other query statements on transactional tables. In public preview, Alibaba Cloud does not ensure the usage of this feature in production environments. We recommend that you back up your data in advance.

You can execute the statements that are described in this topic on the following platforms:

Prerequisites

The Select and Update permissions to read data from and update data in a transactional table are granted. For more information, see Authorize users.

Feature description

Similar to traditional SQL statements, the DELETE and UPDATE statements in MaxCompute can be used to delete or update data of specific rows in tables.

Each time you execute the DELETE or UPDATE statement, a delta file is automatically generated, which stores the information about the delete or update operation. This file is invisible. The following part describes how the delta files are generated.
  • DELETE: A delta file contains the txnid and rowid fields, both of which have a value of the BIGINT type. The rowid field indicates the deleted row in the base files of the transactional table. The txnid field indicates the delete operation that is performed on the row.

    For example, a base file of the t1 table is f1 and the content of the base file is a, b, c, a, b. When you execute the delete from t1 where c1='a'; statement, a delta file named f1.delta is generated. Assume that the value of the txnid field is t0. If the content of the f1.delta file is ((0, t0), (3, t0)), the rows whose IDs are 0 and 3 are deleted in the t0 transaction. If you execute another DELETE statement on the t1 table again, another delta file named f2.delta is generated. The file name is generated based on the f1 base file. When you query the data in the t1 table, the system filters the deleted data based on the f1, f1.delta and f2.delta files and returns the data that is not deleted.

  • UPDATE: The logic of an UPDATE statement is converted to the logic of performing the DELETE and INSERT INTO operations.
The DELETE and UPDATE statements have the following benefits:
  • Reduce the amount of data to be read and written

    Before the DELETE and UPDATE statements are provided, MaxCompute allows you to execute only the INSERT INTO or INSERT OVERWRITE statement to update or delete data in tables. For more information, see Insert or update data into a table or static partitions (INSERT INTO and INSERT OVERWRITE). Assume that you want to update a small amount of data in a table or a partition of the table. If you want to perform an INSERT operation, you must first execute a SELECT statement to read all data from the table and update the data as needed. Then, you can perform the INSERT operation to insert all data into the table. This method is inefficient. However, if you use the DELETE or UPDATE statement in the preceding scenario, the system does not need to read all data in the table. This way, the amount of data to be read and written is reduced.

  • Read the table with the latest data

    Before the DELETE and UPDATE statements are provided, MaxCompute allows you to use history tables to update multiple data entries in a table. If you use a history table, you must add auxiliary columns such as start_date and end_date in the table. These columns indicate the lifecycle of a data entry. To query the latest data of a table, the system must identify the latest data from large amounts of data based on the timestamps. This is a time-consuming process. However, you can execute the DELETE or UPDATE statement to delete or update data. When you query the data in a table, the system reads the latest data of the table based on the base files and all delta files.

Notice After you execute the DELETE and UPDATE statements multiple times on a transactional table, the transactional table occupies a larger storage space. In this case, the costs of the storage and subsequent queries on the table are increased. In addition, the efficiency of subsequent queries is reduced. To resolve these issues, we recommend that you execute the ALTER TABLE COMPACT statement to merge the base files with all delta files on a regular basis. For more information, see ALTER TABLE COMPACT.

If multiple jobs are concurrently run on a table, conflicts may occur. For more information, see ACID semantics.

Scenarios

You can execute the DELETE or UPDATE statement to delete or update a small amount of data in tables or partitions of the tables at a low frequency. For example, delete or update less than 5% of the data in a table or a partition of the table on the next day after the data is generated.

The DELETE or UPDATE statement is not applicable if you want to delete or update data at a high frequency, or write data to tables in real time.

Limits

The DELETE statement, UPDATE statement, and transactional tables on which the DELETE or UPDATE statement is executed have the following limits:
  • You can execute the DELETE and UPDATE statements only on transactional tables. For more information about transactional tables, see Table operations.
  • When you create a clustered table or an external table, you cannot specify it as a transactional table.
  • You cannot change transactional tables to MaxCompute internal tables, external tables, or clustered tables, and vice versa.
  • Jobs from other systems, such as MaxCompute Spark, Machine Learning Platform for AI, and Graph, cannot access transactional tables.
  • CLONE TABLE and MERGE PARTITION operations are not supported.
  • Before you perform the UPDATE, DELETE, or INSERT OVERWRITE operation on transactional tables, you must perform SELECT and INSERT operations to back up data in transactional tables to other tables.

Usage notes

Take note of the following items when you execute the DELETE or UPDATE statement to delete or update data in tables or partitions of the tables:
  • In specific scenarios, you may want to perform a large number of delete or update operations on a table or perform subsequent write and read operations at a high frequency. To reduce the storage space that is occupied by the table, we recommend that you use the DELETE or UPDATE statement and merge the base files with all delta files after multiple delete or update operations. For more information, see ALTER TABLE COMPACT.
  • In specific scenarios, you may want to delete or update more than 5% of the data in a table or a partition of the table at a low frequency and perform frequent subsequent queries. We recommend that you execute the INSERT OVERWRITE or INSERT INTO statement in such scenarios. For more information, see Insert or update data into a table or static partitions (INSERT INTO and INSERT OVERWRITE).

    For example, you want to perform 10 delete or update operations each day and delete or update 10% of the data in a table each time. In this case, we recommend that you estimate the total cost and the consumption of the subsequent read performance if you execute the DELETE or UPDATE statement on the table. Then, compare the estimated result with that of executing the INSERT OVERWRITE or INSERT INTO statement. This helps you choose an efficient method.

  • Each time you execute the DELETE statement on a table, a delta file is automatically generated. As a result, the occupied storage space may not be reduced. If you want to execute the DELETE statement to delete data to reduce storage usage, you can merge the base files with all delta files. For more information, see ALTER TABLE COMPACT.

DELETE

You can execute the DELETE statement to delete one or more rows that meet the specified conditions in partitioned transactional tables or non-partitioned transactional tables.

  • Syntax
    delete from <table_name> [where <where_condition>];
  • Parameters
    • table_name: required. Specify the name of the transactional table on which you want to execute the DELETE statement.
    • where_condition: optional. Specify a WHERE clause that is used to filter data based on conditions. For more information, see where_condition. If you execute the DELETE statement on a table without a WHERE clause, all data in the table is deleted.
  • Examples
    • Example 1: Create a non-partitioned transactional table named acid_delete and insert data into the table. Then, execute the DELETE statement to delete the rows that meet the specified conditions from the table. The following statements show an example:
      -- Create a non-partitioned transactional table named acid_delete.
      create table if not exists acid_delete(id bigint) tblproperties ("transactional"="true"); 
      
      -- Insert data into the table.
      insert overwrite table acid_delete values(1),(2),(3),(2); 
      
      -- Query the table to check whether data is inserted.
      select * from acid_delete; 
      
      +------------+
      | id         |
      +------------+
      | 1          |
      | 2          |
      | 3          |
      | 2          |
      +------------+
      
      -- Delete the rows whose value of the id column is 2. If you want to execute the statement on the odpscmd client of MaxCompute, you must enter yes or no to confirm the deletion.
      delete from acid_delete where id = 2; 
      
      -- Query the table to check whether the table contains only the rows whose values of the id column are 1 and 3.
      select * from acid_delete; 
      
      +------------+
      | id         |
      +------------+
      | 1          |
      | 3          |
      +------------+
    • Example 2: Create a partitioned transactional table named acid_delete_pt and insert data into the table. Then, execute the DELETE statement to delete the rows that meet the specified conditions from the table. The following statements show an example:
      -- Create a partitioned transactional table named acid_delete_pt.
      create table if not exists acid_delete_pt(id bigint) partitioned by(ds string) tblproperties ("transactional"="true");
      
      -- Add partitions to the table.
      alter table acid_delete_pt add if not exists partition (ds= '2019');
      alter table acid_delete_pt add if not exists partition (ds= '2018');
      
      -- Insert data into the table.
      insert overwrite table acid_delete_pt partition (ds='2019') values(1),(2),(3);
      insert overwrite table acid_delete_pt partition (ds='2018') values(1),(2),(3);
      
      -- Query the table to check whether data is inserted.
      select * from acid_delete_pt;
      
      +------------+------------+
      | id         | ds         |
      +------------+------------+
      | 1          | 2018       |
      | 2          | 2018       |
      | 3          | 2018       |
      | 1          | 2019       |
      | 2          | 2019       |
      | 3          | 2019       |
      +------------+------------+
      
      -- Delete the rows whose values of the id and ds columns are 2 and 2019. If you want to execute the statement on the odpscmd client of MaxCompute, you must enter yes or no to confirm the deletion.
      delete from acid_delete_pt where ds='2019' and id = 2;
      
      -- Query the table to check whether the rows whose values of the id and ds columns are 2 and 2019 are deleted.
      select * from acid_delete_pt;
      
      +------------+------------+
      | id         | ds         |
      +------------+------------+
      | 1          | 2018       |
      | 2          | 2018       |
      | 3          | 2018       |
      | 1          | 2019       |
      | 3          | 2019       |
      +------------+------------+
    • Example 3: Create a destination table named acid_delete_t and an associated table named acid_delete_s. Then, delete the rows that meet the specified conditions from the destination table based on the associated table. The following statements show an example:
      -- Create a destination table named acid_delete_t and an associated table named acid_delete_s.
      create table if not exists acid_delete_t(id int,value1 int,value2 int) tblproperties ("transactional"="true");
      create table if not exists acid_delete_s(id int,value1 int,value2 int);
      
      -- Insert data into the tables.
      insert overwrite table acid_delete_t values(2,20,21),(3,30,31),(4,40,41);
      insert overwrite table acid_delete_s values(1,100,101),(2,200,201),(3,300,301);
      
      -- Delete the rows in the acid_delete_t table whose value of the id column does not match that of the rows in the acid_delete_s table. If you want to execute the statement on the odpscmd client of MaxCompute, you must enter yes or no to confirm the deletion.
      delete from acid_delete_t where not exists (select * from acid_delete_s where acid_delete_t.id=acid_delete_s.id);
      
      -- Query the acid_delete_t table to check whether the table contains only the rows whose values of the id column are 2 and 3.
      select * from acid_delete_t;
      
      +------------+------------+------------+
      | id         | value1     | value2     |
      +------------+------------+------------+
      | 2          | 20         | 21         |
      | 3          | 30         | 31         |
      +------------+------------+------------+

UPDATE

You can execute the UPDATE statement to update the values of one or more columns of the rows that meet the specified conditions in partitioned transactional tables or non-partitioned transactional tables.

  • Syntax
    update <table_name> set <col1_name> = <value1> [, <col2_name> = <value2> ...] [where <where_condition>];
    update <table_name> set (<col1_name> [, <col2_name> ...]) = (<value1> [, <value2> ...])[where <where_condition>];
  • Parameters
    • table_name: required. Specify the name of the transactional table on which you want to execute the UPDATE statement.
    • col1_name and col2_name: the columns that you want to update. You must specify at least one column.
    • value1 and value2: the new values that you want to assign to the specified columns. You must update the value of at least one column.
    • where_condition: optional. Specify a WHERE clause that is used to filter data based on conditions. For more information, see where_condition. If you execute the UPDATE statement on a table without a WHERE clause, all data in the table is updated.
  • Examples
    • Example 1: Create a non-partitioned table named acid_update and insert data into the table. Then, execute the UPDATE statement to update the columns of the rows that meet the specified conditions in the table. The following statements show an example:
      -- Create a non-partitioned table named acid_update.  
      create table if not exists acid_update(id bigint) tblproperties ("transactional"="true");
      
      -- Insert data into the table.
      insert overwrite table acid_update values(1),(2),(3),(2);
      
      -- Query the table to check whether data is inserted.
      select * from acid_update; 
      
      +------------+
      | id         |
      +------------+
      | 1          |
      | 2          |
      | 3          |
      | 2          |
      +------------+
      
      -- Update the value 2 in the id column to 4.
      update acid_update set id = 4 where id = 2; 
      
      -- Query the table to check whether the value 2 is updated to 4 in the id column.
      select * from acid_update; 
      
      +------------+
      | id         |
      +------------+
      | 1          |
      | 3          |
      | 4          |
      | 4          |
      +------------+
    • Example 2: Create a partitioned table named acid_update and insert data into the table. Then, execute the UPDATE statement to update the columns of the rows that meet the specified conditions in the table. The following statements show an example:
      -- Create a transactional table named acid_update_pt. 
      create table if not exists acid_update_pt(id bigint) partitioned by(ds string) tblproperties ("transactional"="true");
      
      -- Add partitions to the table.
      alter table acid_update_pt add if not exists partition (ds= '2019');
      
      -- Insert data into the table.
      insert overwrite table acid_update_pt partition (ds='2019') values(1),(2),(3);
      -- Query the table to check whether data is inserted.
      select * from acid_update_pt where ds = '2019';
      
      +------------+------------+
      | id         | ds         |
      +------------+------------+
      | 1          | 2019       |
      | 2          | 2019       |
      | 3          | 2019       |
      +------------+------------+
      
      -- Update the values in the id column of the rows whose value of the id column is 2 and whose value of the ds column is 2019. The values are updated to 4.
      update acid_update_pt set id = 4 where ds = '2019' and id = 2; 
      
      -- Query the table to check whether the value 2 is updated to 4 in the id column of the rows that meet the specified conditions.
      select * from acid_update_pt where ds = '2019';
      
      +------------+------------+
      | id         | ds         |
      +------------+------------+
      | 4          | 2019       |
      | 1          | 2019       |
      | 3          | 2019       |
      +------------+------------+
    • Example 3: Create a transactional table named acid_update_t that you want to update and an associated table named acid_update_s. Then, update the values of multiple columns at a time in the acid_update_t table. The following statements show an example:
      -- Create a transactional table named acid_update_t that you want to update and an associated table named acid_update_s.
      create table if not exists acid_update_t(id int,value1 int,value2 int) tblproperties ("transactional"="true");
      create table if not exists acid_update_s(id int,value1 int,value2 int);
      
      -- Insert data into the tables.
      insert overwrite table acid_update_t values(2,20,21),(3,30,31),(4,40,41);
      insert overwrite table acid_update_s values(1,100,101),(2,200,201),(3,300,301);
      --Method 1: Update the values of specific columns to constants.
      update acid_update_t set (value1, value2) = (60,61);
      -- Query the acid_update_t table to check whether data is updated as expected.
      select * from acid_update_t;
      
      +------------+------------+------------+
      | id         | value1     | value2     |
      +------------+------------+------------+
      | 2          | 60         | 61         |
      | 3          | 60         | 61         |
      | 4          | 60         | 61         |
      +------------+------------+------------+
      
      --Method 2: Use the data in the acid_update_s table to update all rows in the acid_update_t table. If specific rows in the acid_update_t table cannot be matched, null values are assigned to the rows.
      update acid_update_t set (value1, value2) = (select value1, value2 from acid_update_s where acid_update_t.id = acid_update_s.id);
      -- Query the acid_update_t table to check whether data is updated as expected.
      select * from acid_update_t;
      
      +------------+------------+------------+
      | id         | value1     | value2     |
      +------------+------------+------------+
      | 2          | 200        | 201        |
      | 3          | 300        | 301        |
      | 4          | NULL       | NULL       |
      +------------+------------+------------+
      
      -- Method 3: Use the data in the acid_update_s table to update the acid_update_t table. Add filter conditions for the acid_update_t table to update only the rows that intersect with those in the acid_update_s table.
      update acid_update_t set (value1, value2) = (select value1, value2 from acid_update_s where acid_update_t.id = acid_update_s.id) where acid_update_t.id in (select id from acid_update_s);
      -- Query the acid_update_t table to check whether data is updated as expected.
      select * from acid_update_t;
      +------------+------------+------------+
      | id         | value1     | value2     |
      +------------+------------+------------+
      | 4          | 40         | 41         |
      | 2          | 200        | 201        |
      | 3          | 300        | 301        |
      +------------+------------+------------+
      
      -- Method 4: Use the aggregate results of the acid_update_t and acid_update_s tables to update the acid_update_t table.
      update acid_update_t set (id, value1, value2) = (select id, max(value1),max(value2) from acid_update_s where acid_update_t.id = acid_update_s.id group by acid_update_s.id) where acid_update_t.id in (select id from acid_update_s);
      -- Query the acid_update_t table to check whether data is updated as expected.
      select * from acid_update_t;
      
      +------------+------------+------------+
      | id         | value1     | value2     |
      +------------+------------+------------+
      | 4          | 40         | 41         |
      | 2          | 200        | 201        |
      | 3          | 300        | 301        |
      +------------+------------+------------+

ALTER TABLE COMPACT

The base files and delta files of a transactional table occupy the physical storage. You cannot directly read such files. If you execute the UPDATE or DELETE statement on a transactional table, the data in the base files is not updated, but a delta file is generated for each operation. In this case, the more delete or update operations are performed, the more storage space the table occupies. As the number of delta files increases, you are charged more for the storage usage and subsequent queries.

If you execute the UPDATE or DELETE statement on a table or a partition of the table multiple times, a large number of delta files are generated. When the system reads data from the table, the system loads the delta files to identify the deleted and updated rows. A large number of delta files reduce the efficiency of reading data. In this case, you can merge the base files with the delta files to reduce storage usage and improve the efficiency.

  • Syntax
    alter table <table_name> [partition (<partition_key> = '<partition_value>' [, ...])] compact {minor|major};
  • Parameters
    • table_name: required. Specify the name of the transactional table for which you want to merge the base files and delta files.
    • partition_key: optional. Specify the name of a partition key column in the partitioned transactional table.
    • partition_value: optional. Specify the value of a partition key column in the partitioned transactional table.
    • major|minor: One of them must be specified. Differences between minor compaction and major compaction:
      • minor: merges each base file with all delta files that are generated based on the base file and deletes the delta files.
      • major: merges each base file with all delta files that are generated based on the base file, deletes the delta files, and merges small base files. If the size of a base file is less than 32 MB or a delta file is generated, the effect of merging files is equivalent to the effect of executing the INSERT OVERWRITE statement. However, if the size of a base file is greater than or equal to 32 MB or no delta files are generated, the data of the table is not overwritten.
  • Examples
    • Example 1: Merge files of the acid_delete table. The following statement shows an example:
      alter table acid_delete compact minor;
      The following result is returned:
      Summary:
      Nothing found to merge, set odps.merge.cross.paths=true if cross path merge is permitted.
      OK
    • Example 2: Merge files of the acid_update_pt table. The following statement shows an example:
      alter table acid_update_pt partition (ds = '2019') compact major;
      The following result is returned:
      Summary:
      table name: acid_update_pt /ds=2019  instance count: 2  run time: 6
        before merge, file count:        8  file size: 2613  file physical size: 7839
         after merge, file count:        2  file size: 679  file physical size: 2037
      
      OK

FAQ

Problem description: The error message ODPS-0010000:System internal error - fuxi job failed, caused by: Data Set should contains exactly one row is returned when you execute the UPDATE statement.

Causes: The rows that you want to update do not match the rows that are queried by subqueries. In this case, the system cannot determine which rows need to be updated. The following statement shows an example:
update store set (s_county, s_manager) = (select d_country, d_manager from store_delta sd where sd.s_store_sk = store.s_store_sk) where s_store_sk in (select s_store_sk from store_delta);
The select d_country, d_manager from store_delta sd where sd.s_store_sk = store.s_store_sk subquery is used to filter the data in the store_delta table. Then, the data that meets the specified condition is used to update the data of the store table. Assume that three rows that have the s_store_sk column in the store table are [1, 2, 3]. If the rows that have the s_store_sk column in the store_delta table are [1, 1] and do not match the rows in the store table, the preceding error message is returned.

Solution: Make sure that the rows that you want to update exactly match the rows that are queried by subqueries.