By digoal
Question:
1. After several current page updates occur in the same row, multiple tuple versions and corresponding ctid are generated: for example, ctid1 ... ctid6
(ctid6 is the latest version).
2. delete from x where ctid=(select ctid from xxx where ...)
- If it is not HOT, ctid6 goes to delete; If it is HOT, ctid6 to ctid1?
3. Before the deletion operation occurs, if vacuum HOT
occurs:
3.1. After vacuum, the ctid link will be contracted. Ctid1 is stored in the index and points to ctid6 in the heap page, and all versions of the garbage tuple will be recycled. All tuples corresponding to ctid1 to ctid5 in the page are the old version (invisible to the current delete statement) and meet the vacuum condition because it is the old version generated before delete is initiated. Vacuum shrinks ctid, and the entry becomes index->ctid1->ctid6->tuple
.
3.2 If it is not vacuum HOT, ctid1-ctid5 will be recycled, only leaving ctid6. It is also easier to understand, and there will be no controversy in this article.
4. Assume that it is HOT, delete from x where ctid = (At this time, ctid1 is returned or not?)
- Is the deletion operation successful or not? Is there a problem?
1. Create tables and indexes:
create table tbl (id int, c1 int, c2 int);
create index idx_tbl_1 on tbl (c1);
2. Disable autovacuum:
alter table tbl set (autovacuum_enabled =false);
3. Insert 1 record:
insert into tbl values (1,1,1);
4. Update 10 times (non-index fields):
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2 returning ctid,*;
5. Session 1:
set enable_indexonlyscan =off;
set enable_bitmapscan =off;
set enable_indexscan =on;
set enable_seqscan=off;
alter function pg_sleep(double precision) cost 100000; - Increase the cost, execute select first, then sleep, and then delete.
delete from tbl where ctid=(select ctid from tbl where c1=1 limit 1) and exists (select 1 from pg_sleep(30));
This way, the plan is very clear: select, sleep, and delete:
postgres=# explain delete from tbl where ctid=(select ctid from tbl where c1=1 limit 1) and exists (select 1 from pg_sleep(30));
QUERY PLAN
------------------------------------------------------------------------------------------
Delete on tbl (cost=251.06..252.17 rows=0 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.15..1.05 rows=1 width=6)
-> Index Scan using idx_tbl_1 on tbl tbl_1 (cost=0.15..9.13 rows=10 width=6)
Index Cond: (c1 = 1)
InitPlan 2 (returns $1)
-> Function Scan on pg_sleep (cost=250.00..250.01 rows=1 width=0)
-> Result (cost=0.00..1.11 rows=1 width=6)
One-Time Filter: $1
-> Tid Scan on tbl (cost=0.00..1.11 rows=1 width=6)
TID Cond: (ctid = $0)
(11 rows)
6. Session 2 (within 30 seconds): vacuum table, HOT contraction:
postgres=# vacuum verbose tbl; - Because it is HOT, there's only 1 version of the index, no garbage tuples
INFO: vacuuming "public.tbl"
INFO: table "tbl": found 10 removable, 1 nonremovable row versions in 1 out of 1 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 5431698
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM
postgres=# select ctid,* from tbl;
ctid | id | c1 | c2
--------+----+----+----
(0,11) | 1 | 1 | 2
(1 row)
7. What are the results after observing session 1? Delete successfully.
DELETE 1
After garbage collection is performed again, 1 garbage index version and 1 garbage tuple version are found.
postgres=# vacuum verbose tbl;
INFO: vacuuming "public.tbl"
INFO: scanned index "idx_tbl_1" to remove 1 row versions
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO: table "tbl": removed 1 dead item identifiers in 1 pages
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO: index "idx_tbl_1" now contains 0 row versions in 2 pages
DETAIL: 1 index row versions were removed.
0 index pages were newly deleted.
0 index pages are currently deleted, of which 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: table "tbl": found 1 removable, 0 nonremovable row versions in 1 out of 1 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 5431699
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: table "tbl": truncated 1 to 0 pages
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
VACUUM
In addition, if seqscan is used, there is no doubt that select will directly return ctid6. Step 5 is changed to the following, which will not cause any problems:
set enable_indexonlyscan =off;
set enable_bitmapscan =off;
set enable_indexscan =off;
set enable_seqscan=on;
alter function pg_sleep(double precision) cost 100000; - Increase the cost, execute select first, then sleep, and then delete.
delete from tbl where ctid=(select ctid from tbl where c1=1 limit 1) and exists (select 1 from pg_sleep(30));
1. Create tables and indexes:
create table tbl (id int, c1 int, c2 int);
create index idx_tbl_1 on tbl (c1);
2. Disable autovacuum:
alter table tbl set (autovacuum_enabled =false);
3. Insert 1 record:
insert into tbl values (1,1,1);
4. Update 10 times (non-index fields):
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2 returning ctid,*;
5. Session 1:
index scan
set enable_indexonlyscan =off;
set enable_bitmapscan =off;
set enable_indexscan =on;
set enable_seqscan=off;
alter function pg_sleep(double precision) cost 100000; - Increase the cost, execute select first, then sleep, and then delete.
select ctid from tbl where c1=1 and exists(select 1 from pg_sleep(30));
or seqscan:
set enable_indexonlyscan =off;
set enable_bitmapscan =off;
set enable_indexscan =off;
set enable_seqscan=on;
alter function pg_sleep(double precision) cost 100000; - Increase the cost, execute select first, then sleep, and then delete.
select ctid from tbl where c1=1 and exists(select 1 from pg_sleep(30));
6. Session 2 (within 30 seconds): vacuum table, HOT contraction:
postgres=# vacuum verbose tbl; - Because it is HOT, there's only 1 version of the index, no garbage tuples
INFO: vacuuming "public.tbl"
INFO: table "tbl": found 10 removable, 1 nonremovable row versions in 1 out of 1 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 5431698
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM
7. In session 1, the ctids returned by seq scan and index scan are 11:
ctid
--------
(0,11)
(1 row)
The HOT query does not return the ctid1 stored in the index, although the entry is ctid1.
create table tbl (id int, c1 int, c2 int);
create index idx_tbl_1 on tbl (id);
insert into tbl values (1,1,1);
set enable_indexonlyscan =off;
set enable_bitmapscan =off;
set enable_indexscan =on;
set enable_seqscan=off;
alter function pg_sleep(double precision) cost 100000; - Increase the cost, execute select first, then sleep, and then delete.
Concurrency problem recurrence:
session1: update tbl set id = 3 where ctid in (select ctid FROM TBL where id=1 and exists (select 1 from pg_sleep(30))); - Halfway through, only subqueries are performed. The result is (0,1)
session2: update tbl set id=2 where id = 1; An updated row is returned.
Session1: Make changes according to the ctid of the subquery, and you will find the line whose current id is already 2. Change the id of that line to 3 to return the update success, and then you modify one line successfully. Because there is a page offset from HOT, ctid(0,1) to ctid(0,2) and then to tuple2.
To solve this problem, you can use recheck, RR isolation level (equivalent to locking or throwing an error), or use for update to lock. Please visit this article for more:
Isolation of PostgreSQL CTID Physical Line Numbers in Concurrent DML Operations
Open-Source PolarDB for PostgreSQL on Shared Storage Block Device: Multi-Machine Deployment Practice
Combine OpenAI to Convert Natural Language Instructions into SQL Queries
digoal - May 16, 2019
digoal - October 18, 2022
digoal - February 3, 2020
digoal - May 18, 2021
digoal - April 15, 2021
digoal - March 25, 2020
ApsaraDB: Faster, Stronger, More Secure
Learn MoreAn online MPP warehousing service based on the Greenplum Database open source program
Learn MoreAn on-demand database hosting service for PostgreSQL with automated monitoring, backup and disaster recovery capabilities
Learn MoreLeverage cloud-native database solutions dedicated for FinTech.
Learn MoreMore Posts by digoal
Dikky Ryan Pratama May 5, 2023 at 4:13 am
I really enjoyed reading your article. Your writing style is engaging and kept me interested from start to finish.