By Long Qingyun
It is very complex for most storage formats in big data to support random updates, because it needs to scan large files. Therefore, MaxCompute provides the latest feature: transactional tables to update and delete statements.
However, update and delete statements do not apply to frequent updates, data deletion, or real-time writing to target tables. Moreover, the statements are not supported for non-transactional tables. This article describes methods to update data through the insert overwrite statement.
create table update_table(ID int,
tranValue string,
last_update_user string) PARTITIONED by(dt STRING ) LIFECYCLE 1;
INSERT INTO update_table PARTITION (dt="20210510") VALUES
(1, 'value_01', 'creation'),
(2, 'value_02', 'creation'),
(3, 'value_03', 'creation'),
(4, 'value_04', 'creation'),
(5, 'value_05', 'creation'),
(6, 'value_06', 'creation'),
(7, 'value_07', 'creation'),
(8, 'value_08', 'creation'),
(9, 'value_09', 'creation'),
(10, 'value_10','creation');
Update the string to value_011 if the id is 1.
--Update a data record
INSERT OVERWRITE TABLE update_table PARTITION( dt)
SELECT id
,CASE WHEN id=1 THEN "value_011"
ELSE TranValue
END TranValue
,last_update_user
,dt
FROM update_table
WHERE dt = "20210510"
;
Create an incremental table and insert data based on the update.
create table update_table_inc(ID int,
TranValue string,
last_update_user string) LIFECYCLE 1;
INSERT INTO update_table_inc VALUES
(5, 'value_11', 'creation'),
(6, NULL, '20170410'),
(7, 'value22', '20170413');
If the id is 5 and 7, update the TranValue, because the TranValue is null and not updated when the id is 6
INSERT OVERWRITE TABLE update_table PARTITION( dt)
SELECT a.id
,CASE WHEN a.id=b.id and b.TranValue is not null THEN b.TranValue
ELSE a.TranValue
END TranValue
,CASE WHEN a.id=b.id and b.TranValue is not null THEN b.last_update_user
ELSE a.last_update_user
END last_update_user
,dt
FROM update_table a
LEFT JOIN update_table_inc b
ON a.id = b.id
WHERE a.dt = "20210510"
;
--Delete data
INSERT OVERWRITE TABLE update_table PARTITION( dt)
SELECT *
FROM update_table
WHERE dt = "20210510" and id !=4
;
135 posts | 18 followers
FollowAlibaba Cloud MaxCompute - January 15, 2019
Alibaba Cloud MaxCompute - July 20, 2022
Alibaba Cloud MaxCompute - January 18, 2019
Jack008 - June 10, 2020
Alibaba Cloud MaxCompute - December 6, 2021
Alibaba Cloud MaxCompute - May 30, 2019
135 posts | 18 followers
FollowDeploy custom Alibaba Cloud solutions for business-critical scenarios with Quick Start templates.
Learn MoreAlibaba Cloud experts provide retailers with a lightweight and customized big data consulting service to help you assess your big data maturity and plan your big data journey.
Learn MoreAlibaba Cloud provides big data consulting services to help enterprises leverage advanced data technology.
Learn MoreBuild a Data Lake with Alibaba Cloud Object Storage Service (OSS) with 99.9999999999% (12 9s) availability, 99.995% SLA, and high scalability
Learn MoreMore Posts by Alibaba Cloud MaxCompute