On October 3, 2019, Alibaba Cloud released ApsaraDB RDS for PostgreSQL 12, the world's most powerful open-source database system. ApsaraDB RDS for PostgreSQL 12 offers significant improvements in features and performance. For example, it has improved the high-concurrency performance for large partitioned tables by a factor of 100, optimized B-tree index space and performance, implemented the standard JSON-formatted features of SQL Server 2016, and provides support for multi-column statistics on most common values (MCVs), inline common table expressions (CTEs), and pluggable table storage APIs. This article describes some features of ApsaraDB RDS for PostgreSQL 12.
ApsaraDB RDS for PostgreSQL has supported partitioned tables for a long time. Before ApsaraDB RDS for PostgreSQL 10.0, users needed to manually create partitioned tables by inheriting data. Support for declarative partitions was introduced in V10.0. Users create partitioned tables by running SQL statements, which facilitates the use of partitioned tables. ApsaraDB RDS for PostgreSQL 11 supports hash partitions and enhances partition pruning policies in the planning and execution phases to improve the query performance of partitioned tables. ApsaraDB RDS for PostgreSQL 12 further enhances the query and data import performance of partitioned tables. Query performance significantly improves when there are a large number of partitions.
Create an ApsaraDB RDS for PostgreSQL 11 instance and an ApsaraDB RDS for PostgreSQL 12 instance with the same specifications (4-core and 8 GB), and test the performance of copying 100 million lines of data with different partition quantities. As shown in the following figure, the import performance of ApsaraDB RDS for PostgreSQL 12 is always better than that of ApsaraDB RDS for PostgreSQL 11 as the number of partitions increases. The improved performance of data copy is due to the support for batch insertion of partitioned tables in ApsaraDB RDS for PostgreSQL 12. In earlier versions, only one line of data could be inserted each time.
In ApsaraDB RDS for PostgreSQL 10, each partitioned table was checked in sequence for required data. The processing of each partitioned table was similar to that of a common table. ApsaraDB RDS for PostgreSQL 11 introduced the partition pruning feature, allowing users to quickly locate the partitions to be accessed. ApsaraDB RDS for PostgreSQL 12 further optimizes the partition pruning feature in order to avoid loading metadata for each partition and generating internal structures, which decouples the planned query time consumption from irrelevant partitions. This optimization is related to partition-based filtering in query criteria. Better partition-based filtering leads to fewer partitions to process and better optimization results.
The following figure compares the query performance on the partition key (also the primary key) with different partition quantities. As shown in the figure, the more partitions, the greater the improvement in the performance of ApsaraDB RDS for PostgreSQL 12. The performance can improve by up to 150 times. As the number of partitions increases, the performance of ApsaraDB RDS for PostgreSQL 12 remains relatively stable.
Although the performance of partitioned tables is greatly improved, there is still a gap in performance compared with that of a single table. When you design the table structure, you still need to determine whether to perform partitioning and specify the number of partitions based on the scenario requirements.
B-tree indexes are widely used in database systems which effectively reduce the amount of query data to be accessed and improve query performance. Indexes are query optimization policies that exchange space for time. They consume some additional storage space, and their performance is crucial for queries. ApsaraDB RDS for PostgreSQL 12 improves the overall performance of standard B-tree indexes and reduces their disk space usage. For composite indexes, the space usage can be reduced by up to 40%, effectively saving disk space. The performance of B-tree indexes with duplicate entries is also improved. In addition, the REINDEX CONCURRENTLY command is introduced to recreate indexes without affecting your business.
Let's test the space usage of B-tree indexes. Create the following tables and indexes in ApsaraDB RDS for PostgreSQL 11 and ApsaraDB RDS for PostgreSQL 12 respectively, insert 20 million lines of data in each table, and run the VACUUM statement to update the statistics.
CREATE TABLE foo ( aid bigint NOT NULL, bid bigint NOT NULL ); ALTER TABLE foo ADD CONSTRAINT foo_pkey PRIMARY KEY (aid, bid); CREATE INDEX foo_bid_idx ON foo(bid); INSERT INTO foo (aid, bid) SELECT i, i / 10000 FROM generate_series(1, 20000000) AS i; VACUUM (ANALYZE) foo;
Check the size of the foo_bid_idx index in V11 and V12.
# PostgreSQL 11 postgres=> \di+ foo_bid_idx List of relations Schema | Name | Type | Owner | Table | Persistence | Size | Description --------+-------------+-------+-------------+-------+-------------+--------+------------- public | foo_bid_idx | index | postgres | foo | permanent | 544 MB | (1 row) 2 . # PostgreSQL 12 postgres=> \di+ foo_bid_idx List of relations Schema | Name | Type | Owner | Table | Persistence | Size | Description --------+-------------+-------+-------------+-------+-------------+--------+------------- public | foo_bid_idx | index | postgres | foo | permanent | 408 MB | (1 row)
Note that the index size in V11 is 33% greater than that of V12. When there are a large number of indexes, the new version saves a significant amount of storage space.
Non-B-tree indexes are also enhanced. For example, the overhead for generating WAL logs of GiST, GIN, and SP-GiST indexes is reduced. GiST indexes are used to create covering indexes. The distance operator of SP-GiST indexes is used for K-NN queries.
Earlier versions of ApsaraDB RDS for PostgreSQL supported JSON-formatted data types and queries of simple JSON-formatted data. However, if the JSON-formatted data was complex, such as data including a lot of nested data and arrays, it's not possible to query the values easily. Instead, use external plug-ins, such as jsQuery, that support SQL and JSON path languages.
ApsaraDB RDS for PostgreSQL 12 provides further support for unstructured data. It also supports JSON features and various path query methods introduced by the SQL Server 2016 standard, introduces the new data type jsonpath to indicate the path expression, and supports various complex JSON-format queries without relying on plug-ins. For more information about specific usage, see the relevant documentation.
For repeatedly executed PREPARE statements, ApsaraDB RDS for PostgreSQL caches their execution plans. When you run the PREPARE statement, ApsaraDB RDS for PostgreSQL automatically chooses to generate a new plan (a custom plan) or use a cached plan (a generic plan). However, in certain scenarios, the plan selected by the database may not be optimal. ApsaraDB RDS for PostgreSQL 12 provides the plan_cache_mode parameter to select a plan. For example, if the query parameter is always a constant, you may explicitly set this parameter so that the optimizer always uses a generic plan. This optimizes query performance by avoiding SQL parsing and rewriting.
Run the ** PREPARE statement** and use a custom plan for the first 5 execution: postgres=> prepare p(integer) as select aid from foo where aid=$1; PREPARE postgres=> EXPLAIN EXECUTE p(1); QUERY PLAN------------------------------------------------------------------------ Index Only Scan using foo_pkey on foo (cost=0.44..1.56 rows=1 width=8) Index Cond: (aid = 1) (2 rows) 1. # The results of the next four executions are omitted here. Use a generic plan at the sixth execution: > postgres=> EXPLAIN EXECUTE p(1); QUERY PLAN ------------------------------------------------------------------------- Index Only Scan using foo_pkey on foo (cost=0.44..1.56 rows=1 width=8) Index Cond: (aid = $1) (2 rows) Run the PREPARE statement again, set plan_cache_mode to force_generic_plan, and check the plan usage. You can see that a generic plan has been used at the first execution. > postgres=> DEALLOCATE p; DEALLOCATE postgres=> prepare p(integer) as select aid from foo where aid=$1; PREPARE (1)# Set plan_cache_mode to force_generic_plan. postgres=> set plan_cache_mode = force_generic_plan; SET postgres=> EXPLAIN EXECUTE p(1); QUERY PLAN ------------------------------------------------------------------------- Index Only Scan using foo_pkey on foo (cost=0.44..1.56 rows=1 width=8) Index Cond: (aid = $1) (2 rows)
Variable parameterization in the execution plan helps to determine whether to use a generic plan.
For a long time, ApsaraDB RDS for PostgreSQL only supported the heap table storage engine, and its implementation was highly coupled with other modules. Based on its index scalability, ApsaraDB RDS for PostgreSQL 12 abstracts a layer of storage engine access APIs to support multiple storage engines, such as Zheap, columnar store, KVStore, and memory engines.
The following figure shows the architecture of pluggable table storage APIs. On the basis of the original architecture, Table Access Manager is added to provide a unified table access API. Different storage engines can be connected through this API.
Currently, only the heap table storage engine is supported. More storage engines will be supported in the near future. If you are interested in storage engines, try to implement one.
postgres=> select * from pg_am; oid | amname | amhandler | amtype ------+--------+----------------------+-------- 2 | heap | heap_tableam_handler | t 403 | btree | bthandler | i 405 | hash | hashhandler | i 783 | gist | gisthandler | i 2742 | gin | ginhandler | i 4000 | spgist | spghandler | i 3580 | brin | brinhandler | i (7 rows)
ApsaraDB RDS for PostgreSQL 12 supports more plug-ins to meet specific business requirements in a variety of verticals and special scenarios. The following describes some interesting and frequently used plug-ins. For more plug-ins, see the list of supported plug-ins in ApsaraDB RDS for PostgreSQL.
ApsaraDB RDS for PostgreSQL 12 significantly improves the query performance of partitioned tables and B-tree indexes. It uses a parameter to control the selection of the execution plan for PREPARE statements and supports built-in SQL and JSON path languages and more plug-ins. Pluggable table access APIs are very important because they will support multiple storage engines in the future. Currently, only the heap table storage engine is supported, which does not have much effect on users.
In addition to the features described in this article, V12 also provides many other features, such as multi-column MCV statistics and inline CTEs. See the relevant documentation or purchase instances from Alibaba Cloud to learn more.
ApsaraDB - January 22, 2021
Alibaba Clouder - January 10, 2018
ApsaraDB - March 12, 2020
digoal - May 16, 2019
digoal - April 30, 2021
digoal - May 9, 2020
An online MPP warehousing service based on the Greenplum Database open source programLearn More
An on-demand database hosting service for PostgreSQL with automated monitoring, backup and disaster recovery capabilitiesLearn More
Fully managed and less trouble database servicesLearn More
ApsaraDB for HBase is a NoSQL database engine that is highly optimized and 100% compatible with the community edition of HBase.Learn More
More Posts by ApsaraDB