This article gives details about upgrading to PostgreSQL 14.
By Xie Guiqi
According to the Stack Overflow 2021 research report for developers, 72,517 developers worldwide participated in the research. PostgreSQL was selected as the most wanted database management system by global developers, which shows how much developers love it.
PostgreSQL has been popular worldwide for a long time because of its reliability and robustness. In the DB-Engines ranking, PostgreSQL is the first (and only) database to win the DBMS of the Year award for three times. The global development team of PostgreSQL announced the official release of PostgreSQL 14 on September 30 in 2021 after many expectations from users. As the leader in the cloud market, Alibaba Cloud is also conducting R&D and tuning based on PostgreSQL 14 in the first place. Now, users can use PostgreSQL 14 services on Alibaba Cloud.
If you are using ApsaraDB RDS for PostgreSQL 9.4/10/11/12/13, you can easily upgrade to ApsaraDB RDS for PostgreSQL 14 on the console through the Version Update function. You do not need to update the version step by step following the operation guidelines.
ApsaraDB RDS for PostgreSQL is fully compatible with open-source PostgreSQL, bringing the new features of PostgreSQL 14 to you:
In terms of the application performance for businesses, PostgreSQL 14 has been optimized and improved significantly.
- In scenarios with a large number of connections, the throughput of transactions has improved immensely. In some test scenarios, the throughput can be increased by up to two times. The B-tree index updates performance is improved that the B-tree index bloat caused by frequent table updates is reduced.
- The performance and features of partitioned tables continue to be enhanced. The partition pruning capabilities of the optimizer are improved. The creation of subplans corresponding to different partitions and memory usage are decreased, besides, duplicate cached plans is avoided, while the performance of SQL statements that only involve a few partitions is improved significantly. The option ‘CONCURRENTLY’ can be used in the statement to detach partitions without any locks that prevent writes on the tables.
- The pipeline mode is introduced to improve the data write throughput in a poor network environment, which can achieve performance close to the performance of the local network. System performance can be improved substantially both in high workload scenario with frequent updates or poor network environment with high-latency connections. It is especially suitable for time series and IoT services that require high performance for large write throughput even with high access latency across databases, LANs, or WANs.
- TOAST data supports the LZ4 compression to provide a higher compression ratio and better compression performance. It applies to all business scenarios especially for tables with large chunks of data in a field, such as the text type.
- The performance of foreign tables is continuously enhanced to support asynchronous parallel computing. It is especially suitable for data analysis across database instances, shards, databases or tables. This improves the efficiency of parallel computing for multiple instances. Keep connections in session level for foreign servers is supported, which boosts the performance for sharded database. However, there might be errors when handling frequent short transactions in previous version that requires a session restart for each remote access.
In terms of security, PostgreSQL 14 has two highlights:
- Two predefined non-login roles, pg_read_all_data and pg_write_all_data, are added. You can easily set read-only and read/write permissions to improve database management efficiency.
- The SCRAM-SHA-256 password encryption method is used by default to avoid possible key disclosure and protocol attacks of MD5 which is used in previous version. This improves the security of authentication.
Richer Data Types and SQL Statements
PostgreSQL has always had extensive support for data types that the rich extensions allows PostgreSQL to specialize in processing data types, and PostgreSQL 14 continues to optimize on top of that.
- Multirange is added to range type. It is compatible with all existing operators and indexes of range type, which makes it suitable for storing metric fluctuation data of sensors in many time series related scenarios like IoT. The value of a multirange can contain zero or more valid ranges, and most range operators works on multiranges, such as index acceleration and logical queries like containment, intersection, left-of/right-of/adjacent operators. The date_bin() function is added to binning by an arbitrary interval instead of full units. This is suitable for time series and analysis scenarios.
- SEARCH and CYCLE clauses are added to recursive queries, which makes it possible to produce breadth-first or depth-first search orders and detect cycles. This is suitable for social, risk control, graph and other scenarios.
- Heap scanning of a range of TIDs is used to improve the I/O performance of parallel sequential scans and random table scans, this is realized by allocating blocks in groups to parallel workers. And users are allowed to customize the range of data blocks to be scanned. After enabling Repeated Read mode, single-table parallel export and full-table parallel updates without lock conflicts can be easily achieved with the help of snapshot export. This is suitable for most of scenarios, especially scenarios with high concurrency and high throughput.
- JSONB subscripting and OUT parameters in procedures are also supported.
More Convenient Database Management
PostgreSQL 14 has made great efforts to improve the work efficiency and user experience of DBA.
- Garbage collection is enhanced. Indexing in the CONCURRENTLY mode does not cause index bloat, and the probability of garbage expansion for OLTP services is decreased.
- The system view pg_stat_progress_copy is added to report copy progress so that the import progress and speed can be observed.
- The system view pg_stat_replication_slots is added to report replication slot activities. Statistics related to database WAL logs and each slot are observable.
- The metric ‘waitstart’ is added to pg_locks for tracking the start time of lock waiting.
- Multiple metrics are added to the pg_stat_database system view, such as active_time, idle_time, and session_time, and these can be observed to determine whether the database has reached a bottleneck.
- The statistics on the number of SQL hard parse and soft parse are added to facilitate the consumption analysis and optimization at the SQL parsing layer.
- More powerful run-time statistics facilitate the understanding of I/O resource usage and WAL bottleneck analysis. The metrics include wal_buffer_full and wal_sync are added to statistics collectors like pg_stat_wal to records number of times WAL data was written to disk because WAL buffers becames full and number of times WAL files were synced to disk via issue_xlog_fsync request.
Stream Replication and Backup
Logical replication API is enhanced to support streaming large in-progress transactions. Logical replication supports logical decoding and sending without waiting for transaction completion, which reduces the replication latency of large and long transactions. This is suitable for scenarios, such as multi-IDC deployment and architectures with many subscribers. And it supports multi thread, allowing you to receive WAL logical logs while synchronizing full data. This can shorten the time consumed for the first synchronization of large tables to become consistent. Besides, the ALTER SUBSCRIPTION syntax is enhanced, publications can easily be added or removed from a subscription, which makes subscription more convenient to manage.
The section above introduces some important features of open-source PostgreSQL 14. In addition, Alibaba Cloud ApsaraDB RDS for PostgreSQL 14 also has following exclusive capabilities of earlier versions.
The Ganos plugins (a spatio-temporal information engine) are supported to store, index, query, analyze, and compute spatial and spatio-temporal data efficiently. Ganos has been refined by different GIS scenarios and is simple and efficient to use. It is one of the best practices in the geographic information system (GIS) field.
SGX for Fully Encrypted Database
The plugins of fully encrypted databases are supported. After a fully encrypted database is used, data is encrypted on the user side and passed to cloud databases. All data stored in the database, transmitted in the network, and cached in the memory are encrypted. No one except the data owner can access the plaintext data of the user to prevent end-to-end data leakage throughout the procedure. Fully encrypted databases are very suitable for privacy encryption scenarios like sensitive data in the finance, healthcare, government affairs, and marketing fields.
Logical Replication Slot Failover
By default, the replication slots created on the primary instance by PostgreSQL are not synchronized to the secondary instance via the streaming replication protocol. Once failover, the logical replication slots are lost, and consequently, the logical subscriptions are interrupted. The Logical Replication Slot Failover feature can synchronize all logical replication slots from the primary RDS instance to the secondary RDS instance.
Like always, it brings a lot of exciting changes, that’s why PostgreSQL users seem to be eager for new versions. The good news is that Alibaba Cloud ApsaraDB RDS for PostgreSQL 14 is available now. You can click this link to upgrade to the latest version or start a free trial!