If you want to reduce the storage usage and improve the read performance of an ApsaraDB RDS for PostgreSQL instance without the need to change the instance type, you can use the transparent page compression (TPC) feature. The feature compresses and decompresses page data in the buffer pool in real time. This effectively reduces storage costs and improves I/O throughput. However, the feature deteriorates CPU performance.
Prerequisites
Your RDS instance meets the following requirements:
The RDS instance runs PostgreSQL 14 or later.
The RDS instance runs a minor engine version of 20240530 or later. For more information about how to update a minor engine version, see Update the minor engine version.
Feature description
The TPC feature compresses and decompresses page data in the buffer pool in real time. Data is automatically compressed when the data is written to a disk and decompressed when the data is read from the disk. The compression and decompression processes are imperceptible to users.
The TPC feature is developed to reduce the storage usage and improve the read performance of your RDS instance. The feature reduces the disk I/O and the storage usage, improves cache efficiency, and accelerates data transmission by compressing data.
Scenarios
The CPU utilization is less than 50%. The IOPS or I/O throughput frequently reaches a bottleneck.
Benefits
Storage costs are reduced by approximately 50% on average.
The I/O usage is reduced by approximately 50% on average.
In read scenarios, transactions per second (TPS) is increased. In some read scenarios in which the I/O throughout reaches the upper limit, TPS is increased by up to 100%.
Impacts
The feature increases the CPU utilization of your RDS instance. The CPU utilization for data compression is increased by approximately 260%. The CPU utilization for data decompression is increased by approximately 40%.
In write scenarios, TPS is decreased.
The TPC feature performs poor on TOAST data.
Procedure
Create a tablespace for compression to use the TPC feature because the feature depends on tablespaces.
CREATE TABLESPACE rds_compress LOCATION '/data/postgresql/rds_compress' WITH(COMPRESSION='zstd');
ImportantDo not modify the name, path, and compression algorithm of the tablespace that is created in this step.
Create a table or an index for compression.
When you create or modify a table and an index, specify the tablespace for compression to use the TPC feature.
Table compression
-- When you create a table, specify the tablespace for compression to use the TPC feature. CREATE TABLE <tablename> ... TABLESPACE rds_compress; -- When you modify a table, change the tablespace of the table to the tablespace for compression. ALTER TABLE <tablename> SET TABLESPACE rds_compress;
Index compression
-- When you create an index, specify the tablespace for compression to use the TPC feature. CREATE INDEX <indexname> ... TABLESPACE rds_compress; -- When you modify an index, change the tablespace of the index to the tablespace for compression. ALTER INDEX <indexname> SET TABLESPACE rds_compress;
Change the default tablespace to the tablespace for compression. This way, the TPC feature is used for new tables and indexes by default.
-- Set the default tablespace to the tablespace for compression. SET default_tablespace TO 'rds_compress'; -- When you create a table or an index, you do not need to specify a tablespace. By default, the TPC feature is used. CREATE TABLE <tablename> ...; CREATE INDEX <indexname> ...;
Related queries
Check whether the table or index is created in the tablespace for compression.
In the psql command line, enter
\d+ <Table name>
to query the details of a table.If the table and index are created in the tablespace for compression, you can use the TPC feature. The following output shows the schema of the sysbench table.
Table "public.sbtest1" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description --------+----------------+-----------+----------+-------------------------------------+----------+-------------+--------------+------------- id | integer | | not null | nextval('sbtest1_id_seq'::regclass) | plain | | | k | integer | | not null | 0 | plain | | | c | character(120) | | not null | ''::bpchar | extended | | | pad | character(60) | | not null | ''::bpchar | extended | | | Indexes: "sbtest1_pkey" PRIMARY KEY, btree (id), tablespace "rds_compress" "k_1" btree (k), tablespace "rds_compress" Tablespace: "rds_compress" Access method: heap
Check whether data is compressed.
You can use functions such as
pg_database_size
,pg_tablespace_size
,pg_relation_size
,pg_table_size
,pg_index_size
, andpg_total_relation_size
to view the size of data in real time.NoteYou can use sysbench to separately insert the same amount of test data into a compressed table and a non-compressed table to check the disk usage of the tables. The disk usage of the compressed table is approximately 50% of the disk usage of the non-compressed table.
Calculate the compression ratio of a table.
SELECT pg_relation_size('<tablename>')::float / (relpages * 8192) from pg_class WHERE relname = '<tablename>';
NoteIn the preceding statement, the
pg_relation_size('<tablename>')
function is called to query the disk usage of a table in bytes from thepg_class
system directory and convert the obtained value into a floating-point number. Then, the floating-point number is divided by the value of(relpages × 8192)
to obtain the average size of each page in bytes.relpages
specifies the total number of pages in the table.8192
is the default page size in PostgreSQL, which is 8 KB in most cases. The result of the query indicates the average compression ratio for the table. A small value indicates a high compression ratio.
Disable the TPC feature for a table or index
If you no longer use the TPC feature for a table or index, execute the following statements to disable the TPC feature for the table or index.
ALTER TABLE <tablename> SET TABLESPACE pg_default;
ALTER INDEX <indexname> SET TABLESPACE pg_default;
Performance test
In this test, sysbench is used to evaluate the performance before and after you enable the TPC feature for your RDS instance.
The following table describes the configuration of the tested RDS instance.
Specifications | Storage capacity | Buffer pool |
8 cores, 32 GB | PL1 Enterprise SSD (ESSD) with a maximum I/O bandwidth of 350 MB/s | Default value: 8 GB |
Test 1 in extreme conditions (CPU or I/O resources exhausted)
The test is performed on the datasets of 8 GB, 80 GB, and 640 GB. The following code describes the configuration of key parameters for sysbench.
--tables=100 --table-size=<400000, 4000000, or 32000000> --report-interval=1 --time=100 --threads=64
You must configure the --table-size
parameter based on the size of the dataset. You must set the parameter to 400000 for the dataset of 8 GB, 4000000 for the dataset of 80 GB, and 32000000 for the dataset of 640 GB dataset.
Test results
otlp_read_only
Dataset size
Whether to enable TPC
TPS
I/O bandwidth (MB/s)
CPU utilization
8 GB
No
6,878
Read bandwidth: 0
100%
Yes
6,914
Read bandwidth: 0
100%
80 GB
No
5,939
Read bandwidth: 280
100%
Yes
5,945
Read bandwidth: 15
100%
640 GB
No
2,222
Read bandwidth: 350
44%
Yes
4,508
Read bandwidth: 320
100%
otlp_write_only
Dataset size
Whether to enable TPC
TPS
I/O bandwidth (MB/s)
CPU utilization
8 GB
No
22,151
Read bandwidth: 0
Write bandwidth: 100
100%
Yes
22,314
Read bandwidth: 0
Write bandwidth: 50
100%
80 GB
No
7,044
Read bandwidth: 80
Write bandwidth: 270
30%
Yes
5,493
Read bandwidth: 10
Write bandwidth: 180
100%
640 GB
No
2,375
Read bandwidth: 80
Write bandwidth: 270
20%
Yes
1,245
Read bandwidth: 25
Write bandwidth: 210
100%
Test 2 in non-extreme conditions (CPU and I/O resources not exhausted)
In this test, sysbench is used to test a dataset of 640 GB. The following code describes the configuration of key parameters for sysbench.
--tables=100 --table-size=32000000 --report-interval=1 --time=100 --threads=4
Test results
Test method | Whether to enable TPC | TPS | I/O bandwidth (MB/s) | CPU utilization |
otlp_read_only | No | 720 |
| 7.5% |
Yes | 795 |
| 10.6% | |
otlp_write_only | No | 1,497 |
| 6.8% |
Yes | 1,000 |
| 25% |
Conclusions
After you enable the TPC feature, take note of the following points:
The read performance is improved. If you use a large dataset, the read performance is significantly improved.
The write performance is decreased. If you use a large dataset, the write performance is significantly decreased.
The RDS instance consumes more CPU resources and reduces I/O consumption.
The TPC feature is suitable for scenarios in which a large number of read requests and a small number of write requests need to be processed. After you enable the TPC feature for your RDS instance, the performance is significantly improved especially when the CPU resources are sufficient and the usage of I/O resources is high.
FAQ
After I enable the TPC feature for my RDS instance, do pg_dump and pg_basebackup run as expected?
Yes, pg_dump and pg_basebackup run as expected. If you want to restore backup data that is generated by using pg_basebackup to another RDS instance, you must enable the TPC feature for the destination RDS instance.