How to Use pageinspect to Diagnose and Optimize Query Performance Degradation in GIN (Inverted) Indexes

In this article, we discuss how the pageinspect plug-in can help diagnose and optimize query performance issues that arise due to the GIN (inverted) index merging delay.

By digoal

Background

A GIN index is a multi-value type inverted index in PostgreSQL. Accordingly, a record may involve keys in multiple GIN indexes. Therefore, real-time index merging in writing would sharply increase I/O and the write RT. To improve the write throughput, PostgreSQL (from now on referred to as PG) allows the GIN index merging delay. After enabling it, the data is written to the pending list first rather than directly to the index page. When the pending list reaches a certain size, or when the autovacuum corresponds to the table, the pending list is merged into the index.

During a query, if a PENDING LIST is not merged into the index, the pending list and the index information will be queried.

When the pending list is huge with a large amount of writes, and the merging (by autovacuum worker) fails to catch up, the query GIN index performance degrades.

Therefore, you can solve the problems after rooting out the causes.

Background Principles

https://www.postgresql.org/docs/11/static/sql-createindex.html

Different Parameters Accepted by GIN Indexes

1) fastupdate

This setting controls the usage of the fast update technique described in Section 66.4.1. It is a Boolean parameter: ON enables fast update, and OFF disables it. Alternative spellings of ON and OFF are allowed as described in Section 19.1. The default is ON.

Note:

Turning fastupdate off via ALTER INDEX prevents future insertions from going into the list of pending index entries but does not flush previous entries. You might want to VACUUM the table or call the gin_clean_pending_list function afterward to ensure the pending list is emptied.

2) gin_pending_list_limit

Custom gin_pending_list_limit parameter. You can specify this value in kilobytes.

Current settings:

postgres=# show gin_pending_list_limit ;    
 gin_pending_list_limit     
------------------------    
 4MB    
(1 row)

Different Parameters Accepted by BRIN Indexes

1) pages_per_range

Defines the number of table blocks that make up one block range for each entry of a BRIN index (see Section 67.1 for more details). The default is 128.

2) autosummarize

Defines whether a summarization run is invoked for the previous page range after detecting an insertion on the next one.

View the Pending List of the Index With pageinspect.

You can learn more at this link: https://www.postgresql.org/docs/11/static/pageinspect.html

postgres=# create extension pageinspect ;    
CREATE EXTENSION

Example

1) Create a table.

postgres=# create table t(id int, arr int[]);    
CREATE TABLE

2) Create an inverted index.

postgres=# create index idx_t_1 on t using gin (arr);    
CREATE INDEX

3) Create a function which generates a random array.

postgres=# create or replace function gen_rand_arr() returns int[] as $$    
  select array(select (100*random())::int from generate_series(1,64));    
$$ language sql strict;    
CREATE FUNCTION

4) Write test data.

postgres=# insert into t select generate_series(1,100000), gen_rand_arr();    
INSERT 0 100000    
postgres=# insert into t select generate_series(1,1000000), gen_rand_arr();    
INSERT 0 1000000

5) Check the size of the pending list of the current GIN index through the pageinspect plug-in. You can see that there are 356 pending pages, with 2,484 pieces of records.

Due to the huge records in the pending list, the query performance would decrease significantly.

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where arr @> array[1,2,3];    
                                                         QUERY PLAN                                                             
----------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.t  (cost=82.38..262.28 rows=11373 width=284) (actual time=82.444..141.559 rows=114906 loops=1)    
   Output: id, arr    
   Recheck Cond: (t.arr @> '{1,2,3}'::integer[])    
   Heap Blocks: exact=41304    
   Buffers: shared hit=42043    
   ->  Bitmap Index Scan on idx_t_1  (cost=0.00..79.92 rows=11373 width=0) (actual time=75.902..75.902 rows=114906 loops=1)    
         Index Cond: (t.arr @> '{1,2,3}'::integer[])    
         Buffers: shared hit=739      
 Planning Time: 0.092 ms    
 Execution Time: 152.260 ms    
(10 rows)

6) Query Test One (pending list is greater than 0).

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where arr @> array[1,2,3];    
                                                         QUERY PLAN                                                             
----------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.t  (cost=82.38..262.28 rows=11373 width=284) (actual time=82.444..141.559 rows=114906 loops=1)    
   Output: id, arr    
   Recheck Cond: (t.arr @> '{1,2,3}'::integer[])    
   Heap Blocks: exact=41304    
   Buffers: shared hit=42043    
   ->  Bitmap Index Scan on idx_t_1  (cost=0.00..79.92 rows=11373 width=0) (actual time=75.902..75.902 rows=114906 loops=1)    
         Index Cond: (t.arr @> '{1,2,3}'::integer[])    
         Buffers: shared hit=739      
 Planning Time: 0.092 ms    
 Execution Time: 152.260 ms    
(10 rows)

7) Vacuum table, forcibly merging the pending list.

set vacuum_cost_delay=0;    
    
postgres=# vacuum t;    
VACUUM

8) After merging the pending list, n_pending_tuples is 0.

postgres=# SELECT * FROM gin_metapage_info(get_raw_page('idx_t_1', 0));    
 pending_head | pending_tail | tail_free_size | n_pending_pages | n_pending_tuples | n_total_pages | n_entry_pages | n_data_pages | n_entries | version     
--------------+--------------+----------------+-----------------+------------------+---------------+---------------+--------------+-----------+---------    
   4294967295 |   4294967295 |              0 |               0 |                0 |          9978 |            41 |         9421 |       101 |       2    
(1 row)

9) Query Test Two (pending list = 0).

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from t where arr @> array[1,2,3];    
                                                          QUERY PLAN                                                              
------------------------------------------------------------------------------------------------------------------------------    
 Bitmap Heap Scan on public.t  (cost=792.36..1699.10 rows=117244 width=284) (actual time=79.861..139.603 rows=114906 loops=1)    
   Output: id, arr    
   Recheck Cond: (t.arr @> '{1,2,3}'::integer[])    
   Heap Blocks: exact=41304    
   Buffers: shared hit=41687    
   ->  Bitmap Index Scan on idx_t_1  (cost=0.00..766.95 rows=117244 width=0) (actual time=73.360..73.360 rows=114906 loops=1)    
         Index Cond: (t.arr @> '{1,2,3}'::integer[])    
         Buffers: shared hit=383   -- 大幅减少     
 Planning Time: 0.135 ms    
 Execution Time: 150.656 ms    
(10 rows)

The process is also similar to BRIN Indexes.

Cases in Real User Scenarios

PG is used as the business search engine for random field combination query, fuzzy query, and full-text search, among others. A large amount of data is updated according to KEY value with a query before each update (also by KEY value). However, some users place all fields including the key field in the GIN index for convenience.

What are the consequences?

1) The GIN index is used for queries.

2) A large number of update operations would increase the size of gin pending list.

3) With a large number of pending lists, list scanning cosumes the CPU, thus increasing the total time.

4) The GIN index uses the Bitmapscan, which consumes more CPU resources in total than index scan does due to CPU recheck. The CPU would break down quickly in high-concurrency scenarios.

Solution: Extract the key fields from the GIN index and put them in the B-tree index. The database preferentially chooses B-tree index, which queries faster with lower CPU consumption. If the key has other random field combination queries, it can still be retained in the GIN index. That is, the key fields are in both GIN index and B-tree index.

Reproduction

1) Install the plug-in.

create extension btree_gin;  
create extension pg_trgm;  
create extension pageinspect;

2) Create a table and index with problems. Put all fields in the GIN index with 10 million records.

create table test(id int, c1 int, c2 int, c3 int, c4 int, c5 text);  
  
insert into test select generate_series(1,10000000), random()*100, random()*1000, random()*10000, random()*100000, md5(random()::text);  
  
create index idx_test_1 on test using gin (id,c1,c2,c3,c4,c5 gin_trgm_ops);

3) Execute the plan.

explain (analyze,verbose,timing,costs,buffers) select * from test where id=1;  
explain (analyze,verbose,timing,costs,buffers) update test set c1=random()*100, c2=random()*1000, c3=random()*10000, c4=random()*100000 where id=1;  
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test where id=1;  
                                                    QUERY PLAN                                                       
-------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.test  (cost=0.10..0.12 rows=1 width=53) (actual time=0.143..0.143 rows=1 loops=1)  
   Output: id, c1, c2, c3, c4, c5  
   Recheck Cond: (test.id = 1)  
   Heap Blocks: exact=1  
   Buffers: shared hit=5  
   ->  Bitmap Index Scan on idx_test_1  (cost=0.00..0.10 rows=1 width=0) (actual time=0.098..0.098 rows=1 loops=1)  
         Index Cond: (test.id = 1)  
         Buffers: shared hit=4  
 Planning Time: 2.089 ms  
 Execution Time: 0.922 ms  
(10 rows)  
  
postgres=# explain (analyze,verbose,timing,costs,buffers) update test set c1=random()*100, c2=random()*1000, c3=random()*10000, c4=random()*100000 where id=1;  
                                                                                         QUERY PLAN                                                                                           
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Update on public.test  (cost=0.10..0.15 rows=1 width=59) (actual time=1.342..1.343 rows=0 loops=1)  
   Buffers: shared hit=9 read=1 dirtied=1  
   ->  Bitmap Heap Scan on public.test  (cost=0.10..0.15 rows=1 width=59) (actual time=0.053..0.054 rows=1 loops=1)  
         Output: id, (random() * '100'::double precision), (random() * '1000'::double precision), (random() * '10000'::double precision), (random() * '100000'::double precision), c5, ctid  
         Recheck Cond: (test.id = 1)  
         Heap Blocks: exact=1  
         Buffers: shared hit=5  
         ->  Bitmap Index Scan on idx_test_1  (cost=0.00..0.10 rows=1 width=0) (actual time=0.021..0.021 rows=1 loops=1)  
               Index Cond: (test.id = 1)  
               Buffers: shared hit=4  
 Planning Time: 0.149 ms  
 Execution Time: 1.496 ms  
(12 rows)

4) Stress testing and update: id in gin, 4,960 qps.

vi test.sql  
\set id random(1,10000000)  
update test set c1=random()*100, c2=random()*1000, c3=random()*10000, c4=random()*100000 where id=:id;  

pgbench -M prepared -n -r -P 1 -f ./test.sql -c 64 -j 64 -T 120  
  
transaction type: ./test.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 64  
number of threads: 64  
duration: 120 s  
number of transactions actually processed: 595656  
latency average = 12.887 ms  
latency stddev = 27.841 ms  
tps = 4960.755350 (including connections establishing)  
tps = 4963.713963 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.002  \set id random(1,10000000)  
        12.924  update test set c1=random()*100, c2=random()*1000, c3=random()*10000, c4=random()*100000 where id=:id;

During the stress testing, you can multiple pending pages for the GIN index.

SELECT * FROM gin_metapage_info(get_raw_page('idx_test_1', 0));    
  
postgres=# SELECT * FROM gin_metapage_info(get_raw_page('idx_test_1', 0));    
-[ RECORD 1 ]----+---------  
pending_head     | 175345  
pending_tail     | 178885  
tail_free_size   | 6640  
n_pending_pages  | 4627  
n_pending_tuples | 30024  
n_total_pages    | 175024  
n_entry_pages    | 76904  
n_data_pages     | 98119  
n_entries        | 10231456  
version          | 2

Optimization Method

Take out the key field and retain the fuzzy query fields in the GIN index. If necessary, you can still retain c1, c2, c3, and c4 in the GIN index.

create table test1(id int, c1 int, c2 int, c3 int, c4 int, c5 text);  
insert into test1 select * from test;  
create index idx_test1_1 on test1 using btree(id);  
create index idx_test1_2 on test1 using btree(c1);  
create index idx_test1_3 on test1 using btree(c2);  
create index idx_test1_4 on test1 using btree(c3);  
create index idx_test1_5 on test1 using btree(c4);  
create index idx_test1_6 on test1 using gin(c5 gin_trgm_ops);  
  
  
或  
create index idx_test1_1 on test1 using btree(id);  
create index idx_test1_2 on test1 using gin(id,c1,c2,c3,c4,c5 gin_trgm_ops);

Execute the plan.

explain (analyze,verbose,timing,costs,buffers) select * from test1 where id=1;  
explain (analyze,verbose,timing,costs,buffers) update test1 set c1=random()*100, c2=random()*1000, c3=random()*10000, c4=random()*100000 where id=1;  
  
postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test1 where id=1;  
                                                        QUERY PLAN                                                           
---------------------------------------------------------------------------------------------------------------------------  
 Index Scan using idx_test1_1 on public.test1  (cost=0.38..0.42 rows=1 width=53) (actual time=0.089..0.090 rows=1 loops=1)  
   Output: id, c1, c2, c3, c4, c5  
   Index Cond: (test1.id = 1)  
   Buffers: shared hit=1 read=3  
 Planning Time: 0.696 ms  
 Execution Time: 0.115 ms  
(6 rows)  
  
postgres=#   
postgres=# explain (analyze,verbose,timing,costs,buffers) update test1 set c1=random()*100, c2=random()*1000, c3=random()*10000, c4=random()*100000 where id=1;  
                                                                                         QUERY PLAN                                                                                           
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Update on public.test1  (cost=0.38..0.44 rows=1 width=59) (actual time=1.183..1.183 rows=0 loops=1)  
   Buffers: shared hit=11 read=13 dirtied=6  
   ->  Index Scan using idx_test1_1 on public.test1  (cost=0.38..0.44 rows=1 width=59) (actual time=0.019..0.021 rows=1 loops=1)  
         Output: id, (random() * '100'::double precision), (random() * '1000'::double precision), (random() * '10000'::double precision), (random() * '100000'::double precision), c5, ctid  
         Index Cond: (test1.id = 1)  
         Buffers: shared hit=4  
 Planning Time: 0.135 ms  
 Execution Time: 1.246 ms  
(8 rows)

Stress testing: qps is 43,231 with a 10 times increase.

vi test1.sql  
\set id random(1,10000000)  
update test1 set c1=random()*100, c2=random()*1000, c3=random()*10000, c4=random()*100000 where id=:id;  

pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 64 -j 64 -T 120  
  
transaction type: ./test1.sql  
scaling factor: 1  
query mode: prepared  
number of clients: 64  
number of threads: 64  
duration: 120 s  
number of transactions actually processed: 5187765  
latency average = 1.480 ms  
latency stddev = 30.987 ms  
tps = 43212.076731 (including connections establishing)  
tps = 43231.697380 (excluding connections establishing)  
statement latencies in milliseconds:  
         0.001  \set id random(1,10000000)  
         1.479  update test1 set c1=random()*100, c2=random()*1000, c3=random()*10000, c4=random()*100000 where id=:id;

During the stress testing, the GIN index is still generating pending lists. However, since the GIN index is not used for queries, the update efficiency does not decrease.

postgres=# SELECT * FROM gin_metapage_info(get_raw_page('idx_test1_6', 0));    
-[ RECORD 1 ]----+-------  
pending_head     | 125629  
pending_tail     | 177046  
tail_free_size   | 6840  
n_pending_pages  | 77183  
n_pending_tuples | 483977  
n_total_pages    | 80904  
n_entry_pages    | 23  
n_data_pages     | 80880  
n_entries        | 9248  
version          | 2

Performance Increased Nearly 10 Times

Summary

The database adopts merging delay to reduce the increase of write RT introduced by the index. If the database is under long-term heavy write pressure, the number of unmerged LISTs may be large and may affect the query performance.

You can use the pageinspect plug-in to view the size of the unmerged pending lists. In addition, you can merge the pending lists forcibly using vacuum to improve query performance.

Community

How to Use pageinspect to Diagnose and Optimize Query Performance Degradation in GIN (Inverted) Indexes

Background

Background Principles

Different Parameters Accepted by GIN Indexes

Different Parameters Accepted by BRIN Indexes

View the Pending List of the Index With pageinspect.

Example

Cases in Real User Scenarios

Reproduction

Optimization Method

Performance Increased Nearly 10 Times

Summary

References

Read previous post:

Read next post:

digoal

You may also like

Comments

digoal

Related Products

PolarDB for PostgreSQL

AnalyticDB for PostgreSQL

ApsaraDB RDS for PostgreSQL

Database for FinTech Solution