PostgreSQL v12: NBtree Index v4 Kernel Improvement Analysis

By digoal

Background

PostgreSQL v12 — B-tree index enhanced (duplicate key sort by ctid)

PostgreSQL v12 NBtree has evolved into the fourth version, with three main enhancements:

1) The heap ctid is added to the key value of the index leaf page to ensure that the heap tuples of the duplicate value are stored in the leaf page of the index according to the physical order of the heap rows.

The index scan is now more efficient (index & heap correlate = 1) in improving scan or return of heap tuples with multiple repeated values. Previously, bitmap scan (sort blockid) was used for improvement while introducing the CPU recheck index cond.

2) With ctid in the leaf page, the key value in the leaf page is unique (because ctid is unique). When you insert the duplicate index field values, the rightmost leaf page containing the field value is split, significantly reducing the wasted space of the leaf page. In the earlier PG versions, the leaf page that satisfied the conditions was split at random, which could be the right page or the middle page.

Splitting the leaf page in versions earlier than PG v12:

Splitting the leaf page in PG v12:

3) For the NBtree index of PG v12, the internal page (or branch page) will truncate unnecessary key values, such as multi-field indexes. This function is similar to that of include index. In PG v11, only the columns in the include are not displayed in the index internal page. In PG v12 as well, the key columns may not be displayed in the internal page, making the index smaller.

As described above, PostgreSQL v12 introduced the TID as part of the index key, which would waste an inordinate amount of space in the internal index pages. So the same commit introduced truncation of ““redundant” index attributes from internal pages. The TID is redundant, so are non-key attributes from an INCLUDE clause (these were also removed from the internal index pages in v11). But PostgreSQL v12 can also truncate those index attributes not needed for table row identification.

In our primary key index, bid is a redundant column and is truncated from internal index pages, which saves 8 bytes of space per index entry. Let's examine an internal index page with the pageinspect extension:

PG v11:

drop table abc;  
create table abc (c1 int, c2 int);  
create index idx_abc_1 on abc(c1,c2);  
create index idx_abc_2 on abc(c1) include (c2);  
insert into abc select random()*100, generate_series(1,1000000);  
  
postgres=# \di+ idx_abc_1  
                          List of relations  
 Schema |   Name    | Type  |  Owner   | Table | Size  | Description   
--------+-----------+-------+----------+-------+-------+-------------  
 public | idx_abc_1 | index | postgres | abc   | 40 MB |   
(1 row)  
  
postgres=# \di+ idx_abc_2  
                          List of relations  
 Schema |   Name    | Type  |  Owner   | Table | Size  | Description   
--------+-----------+-------+----------+-------+-------+-------------  
 public | idx_abc_2 | index | postgres | abc   | 27 MB |   
(1 row)

PG v12:

postgres=# \di+ idx_abc_1  
                          List of relations  
 Schema |   Name    | Type  |  Owner   | Table | Size  | Description   
--------+-----------+-------+----------+-------+-------+-------------  
 public | idx_abc_1 | index | postgres | abc   | 23 MB |   
(1 row)  
  
postgres=# \di+ idx_abc_2  
                          List of relations  
 Schema |   Name    | Type  |  Owner   | Table | Size  | Description   
--------+-----------+-------+----------+-------+-------+-------------  
 public | idx_abc_2 | index | postgres | abc   | 20 MB |   
(1 row)

You can see through pageinspect that PG v12 truncates some redundant key values in the internal page.

CREATE TABLE rel (  
   aid bigint NOT NULL,  
   bid bigint NOT NULL  
);  
   
ALTER TABLE rel  
   ADD CONSTRAINT rel_pkey PRIMARY KEY (aid, bid);  
   
CREATE INDEX rel_bid_idx ON rel (bid);  
   
\d rel  
                Table "public.rel"  
 Column |  Type  | Collation | Nullable | Default  
--------+--------+-----------+----------+---------  
 aid    | bigint |           | not null |   
 bid    | bigint |           | not null |   
Indexes:  
    "rel_pkey" PRIMARY KEY, btree (aid, bid)  
    "rel_bid_idx" btree (bid)  
  
INSERT INTO rel (aid, bid)  
   SELECT i, i / 10000  
   FROM generate_series(1, 20000000) AS i;  
   
/* set hint bits and calculate statistics */  
VACUUM (ANALYZE) rel;

PG v11:

SELECT * FROM bt_page_items('rel_pkey', 2550);  
   
 itemoffset |    ctid    | itemlen | nulls | vars |                      data                         
------------+------------+---------+-------+------+-------------------------------------------------  
          1 | (2667,88)  |      24 | f     | f    | cd 8f 0a 00 00 00 00 00 45 00 00 00 00 00 00 00  
          2 | (2462,0)   |       8 | f     | f    |   
          3 | (2463,15)  |      24 | f     | f    | d6 c0 09 00 00 00 00 00 3f 00 00 00 00 00 00 00  
          4 | (2464,91)  |      24 | f     | f    | db c1 09 00 00 00 00 00 3f 00 00 00 00 00 00 00  
          5 | (2465,167) |      24 | f     | f    | e0 c2 09 00 00 00 00 00 3f 00 00 00 00 00 00 00  
          6 | (2466,58)  |      24 | f     | f    | e5 c3 09 00 00 00 00 00 3f 00 00 00 00 00 00 00  
          7 | (2467,134) |      24 | f     | f    | ea c4 09 00 00 00 00 00 40 00 00 00 00 00 00 00  
          8 | (2468,25)  |      24 | f     | f    | ef c5 09 00 00 00 00 00 40 00 00 00 00 00 00 00  
          9 | (2469,101) |      24 | f     | f    | f4 c6 09 00 00 00 00 00 40 00 00 00 00 00 00 00  
         10 | (2470,177) |      24 | f     | f    | f9 c7 09 00 00 00 00 00 40 00 00 00 00 00 00 00  
...  
        205 | (2666,12)  |      24 | f     | f    | c8 8e 0a 00 00 00 00 00 45 00 00 00 00 00 00 00  
(205 rows)

PG v12:

SELECT * FROM bt_page_items('rel_pkey', 2700);  
   
 itemoffset |   ctid   | itemlen | nulls | vars |          data             
------------+----------+---------+-------+------+-------------------------  
          1 | (2862,1) |      16 | f     | f    | ab 59 0b 00 00 00 00 00  
          2 | (2576,0) |       8 | f     | f    |   
          3 | (2577,1) |      16 | f     | f    | 1f 38 0a 00 00 00 00 00  
          4 | (2578,1) |      16 | f     | f    | 24 39 0a 00 00 00 00 00  
          5 | (2579,1) |      16 | f     | f    | 29 3a 0a 00 00 00 00 00  
          6 | (2580,1) |      16 | f     | f    | 2e 3b 0a 00 00 00 00 00  
          7 | (2581,1) |      16 | f     | f    | 33 3c 0a 00 00 00 00 00  
          8 | (2582,1) |      16 | f     | f    | 38 3d 0a 00 00 00 00 00  
          9 | (2583,1) |      16 | f     | f    | 3d 3e 0a 00 00 00 00 00  
         10 | (2584,1) |      16 | f     | f    | 42 3f 0a 00 00 00 00 00  
...  
        286 | (2861,1) |      16 | f     | f    | a6 58 0b 00 00 00 00 00  
(286 rows)

In addition to the preceding three enhancements, PG v12 NBtree also features the following enhancements:

Reduce locking overhead for B-tree index inserts to improve performance.
Introduce REINDEX CONCURRENTLY to simplify rebuilding an index without downtime.
Improve performance for index-only scans on indexes with many attributes.
Add a view PG_stat_progress_create_index to report the progress of CREATE INDEX and REINDEX.

Select PG v12 NBtree v4 among pg_upgrade for updating to PG v12. We recommend using reindex CONCURRENTLY to rebuild indexes (this does not block the dml).

Community

PostgreSQL v12: NBtree Index v4 Kernel Improvement Analysis

Background

References

Read previous post:

Read next post:

digoal

You may also like

Comments

digoal

Related Products

PolarDB for PostgreSQL

AnalyticDB for PostgreSQL

Key Management Service

ApsaraDB RDS for PostgreSQL