×
Community Blog Is PostgreSQL HOT Vacuum Link Contraction Secure for DML Where CTID=ctid?

Is PostgreSQL HOT Vacuum Link Contraction Secure for DML Where CTID=ctid?

This article discusses the background and security of PostgreSQL HOT Vacuum Link Contraction.

By digoal

Background

Question:

1.  After several current page updates occur in the same row, multiple tuple versions and corresponding ctid are generated: for example, ctid1 ... ctid6 (ctid6 is the latest version).

2.  delete from x where ctid=(select ctid from xxx where ...) - If it is not HOT, ctid6 goes to delete; If it is HOT, ctid6 to ctid1?

3.  Before the deletion operation occurs, if vacuum HOT occurs:

3.1. After vacuum, the ctid link will be contracted. Ctid1 is stored in the index and points to ctid6 in the heap page, and all versions of the garbage tuple will be recycled. All tuples corresponding to ctid1 to ctid5 in the page are the old version (invisible to the current delete statement) and meet the vacuum condition because it is the old version generated before delete is initiated. Vacuum shrinks ctid, and the entry becomes index->ctid1->ctid6->tuple.

3.2 If it is not vacuum HOT, ctid1-ctid5 will be recycled, only leaving ctid6. It is also easier to understand, and there will be no controversy in this article.

4.  Assume that it is HOT, delete from x where ctid = (At this time, ctid1 is returned or not?) - Is the deletion operation successful or not? Is there a problem?

Test Methods

1.  Create tables and indexes:

create table tbl (id int, c1 int, c2 int);   
create index idx_tbl_1 on tbl (c1);  

2.  Disable autovacuum:

alter table tbl set (autovacuum_enabled =false);  

3.  Insert 1 record:

insert into tbl values (1,1,1);  

4.  Update 10 times (non-index fields):

update tbl set c2=2;  
update tbl set c2=2;  
update tbl set c2=2;  
update tbl set c2=2;  
update tbl set c2=2;  
update tbl set c2=2;  
update tbl set c2=2;  
update tbl set c2=2;  
update tbl set c2=2;  
update tbl set c2=2 returning ctid,*;  

5.  Session 1:

set enable_indexonlyscan =off;  
set enable_bitmapscan =off;  
set enable_indexscan =on;  
set enable_seqscan=off;  
alter function pg_sleep(double precision) cost 100000; - Increase the cost, execute select first, then sleep, and then delete.   
  
delete from tbl where ctid=(select ctid from tbl where c1=1 limit 1)  and exists (select 1 from pg_sleep(30));  

This way, the plan is very clear: select, sleep, and delete:

postgres=# explain delete from tbl where ctid=(select ctid from tbl where c1=1 limit 1)  and exists (select 1 from pg_sleep(30));  
                                        QUERY PLAN                                          
------------------------------------------------------------------------------------------  
 Delete on tbl  (cost=251.06..252.17 rows=0 width=0)  
   InitPlan 1 (returns $0)  
     ->  Limit  (cost=0.15..1.05 rows=1 width=6)  
           ->  Index Scan using idx_tbl_1 on tbl tbl_1  (cost=0.15..9.13 rows=10 width=6)  
                 Index Cond: (c1 = 1)  
   InitPlan 2 (returns $1)  
     ->  Function Scan on pg_sleep  (cost=250.00..250.01 rows=1 width=0)  
   ->  Result  (cost=0.00..1.11 rows=1 width=6)  
         One-Time Filter: $1  
         ->  Tid Scan on tbl  (cost=0.00..1.11 rows=1 width=6)  
               TID Cond: (ctid = $0)  
(11 rows)  

6.  Session 2 (within 30 seconds): vacuum table, HOT contraction:

postgres=# vacuum verbose tbl; - Because it is HOT, there's only 1 version of the index, no garbage tuples    
INFO:  vacuuming "public.tbl"  
INFO:  table "tbl": found 10 removable, 1 nonremovable row versions in 1 out of 1 pages  
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 5431698  
Skipped 0 pages due to buffer pins, 0 frozen pages.  
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.  
VACUUM  
postgres=# select ctid,* from tbl;  
  ctid  | id | c1 | c2   
--------+----+----+----  
 (0,11) |  1 |  1 |  2  
(1 row)  

7.  What are the results after observing session 1? Delete successfully.

DELETE 1    

After garbage collection is performed again, 1 garbage index version and 1 garbage tuple version are found.

postgres=# vacuum verbose tbl;   
INFO:  vacuuming "public.tbl"  
INFO:  scanned index "idx_tbl_1" to remove 1 row versions  
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s  
INFO:  table "tbl": removed 1 dead item identifiers in 1 pages  
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s  
INFO:  index "idx_tbl_1" now contains 0 row versions in 2 pages  
DETAIL:  1 index row versions were removed.  
0 index pages were newly deleted.  
0 index pages are currently deleted, of which 0 are currently reusable.  
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.  
INFO:  table "tbl": found 1 removable, 0 nonremovable row versions in 1 out of 1 pages  
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 5431699  
Skipped 0 pages due to buffer pins, 0 frozen pages.  
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.  
INFO:  table "tbl": truncated 1 to 0 pages  
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s  
VACUUM  

In addition, if seqscan is used, there is no doubt that select will directly return ctid6. Step 5 is changed to the following, which will not cause any problems:

set enable_indexonlyscan =off;  
set enable_bitmapscan =off;  
set enable_indexscan =off;  
set enable_seqscan=on;  
alter function pg_sleep(double precision) cost 100000; - Increase the cost, execute select first, then sleep, and then delete.   
  
delete from tbl where ctid=(select ctid from tbl where c1=1 limit 1)  and exists (select 1 from pg_sleep(30));  

Observe ctid

1.  Create tables and indexes:

create table tbl (id int, c1 int, c2 int);     
create index idx_tbl_1 on tbl (c1);    

2.  Disable autovacuum:

alter table tbl set (autovacuum_enabled =false);    

3.  Insert 1 record:

insert into tbl values (1,1,1);    

4.  Update 10 times (non-index fields):

update tbl set c2=2;    
update tbl set c2=2;    
update tbl set c2=2;    
update tbl set c2=2;    
update tbl set c2=2;    
update tbl set c2=2;    
update tbl set c2=2;    
update tbl set c2=2;    
update tbl set c2=2;    
update tbl set c2=2 returning ctid,*;    

5.  Session 1:

index scan

set enable_indexonlyscan =off;    
set enable_bitmapscan =off;    
set enable_indexscan =on;    
set enable_seqscan=off;    
alter function pg_sleep(double precision) cost 100000; - Increase the cost, execute select first, then sleep, and then delete.     
    
select ctid from tbl where c1=1 and exists(select 1 from pg_sleep(30));   

or seqscan:

set enable_indexonlyscan =off;    
set enable_bitmapscan =off;    
set enable_indexscan =off;    
set enable_seqscan=on;    
alter function pg_sleep(double precision) cost 100000; - Increase the cost, execute select first, then sleep, and then delete.     
    
select ctid from tbl where c1=1 and exists(select 1 from pg_sleep(30));   

6.  Session 2 (within 30 seconds): vacuum table, HOT contraction:

postgres=# vacuum verbose tbl; - Because it is HOT, there's only 1 version of the index, no garbage tuples      
INFO:  vacuuming "public.tbl"    
INFO:  table "tbl": found 10 removable, 1 nonremovable row versions in 1 out of 1 pages    
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 5431698    
Skipped 0 pages due to buffer pins, 0 frozen pages.    
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.    
VACUUM    

7.  In session 1, the ctids returned by seq scan and index scan are 11:

  ctid    
--------  
 (0,11)  
(1 row)  

The HOT query does not return the ctid1 stored in the index, although the entry is ctid1.

None of the above is important. The important thing is to update the same row concurrently.

create table tbl (id int, c1 int, c2 int);     
create index idx_tbl_1 on tbl (id);    
insert into tbl values (1,1,1);  
  
set enable_indexonlyscan =off;    
set enable_bitmapscan =off;    
set enable_indexscan =on;    
set enable_seqscan=off;    
alter function pg_sleep(double precision) cost 100000; - Increase the cost, execute select first, then sleep, and then delete.     

Concurrency problem recurrence:

session1:  update tbl set id = 3 where ctid in (select ctid FROM TBL where id=1 and exists (select 1 from pg_sleep(30))); - Halfway through, only subqueries are performed. The result is (0,1)     
session2:  update tbl set id=2 where id = 1; An updated row is returned.    
Session1: Make changes according to the ctid of the subquery, and you will find the line whose current id is already 2. Change the id of that line to 3 to return the update success, and then you modify one line successfully. Because there is a page offset from HOT, ctid(0,1) to ctid(0,2) and then to tuple2.   

To solve this problem, you can use recheck, RR isolation level (equivalent to locking or throwing an error), or use for update to lock. Please visit this article for more:

Isolation of PostgreSQL CTID Physical Line Numbers in Concurrent DML Operations

1 2 1
Share on

digoal

277 posts | 24 followers

You may also like

Comments

Dikky Ryan Pratama May 5, 2023 at 4:13 am

I really enjoyed reading your article. Your writing style is engaging and kept me interested from start to finish.