By digoal
TimescaleDB is a time-series extension for PostgreSQL. As of version 1.5, it supports native data compression with an automatic compression policy. This functionality does not require any specific file system or external software, and it is simple to set up and configure.
For example, a sensor monitoring table has several fields, such as the time, the device ID, and the monitored metrics (CPU, disk I/O, energy consumption).
| time | device_id | CPU | disk_io | energy_consumption |
|---|---|---|---|---|
| 8/22/2019 0:00 | 1 | 88.2 | 20 | 0.8 |
| 8/22/2019 0:05 | 2 | 300.5 | 30 | 0.9 |
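Before configuring compression, the table needs to be a TimescaleDB hypertable. The following is a minimal sketch, assuming the column names from the example above; create_hypertable() is TimescaleDB's standard function for this, while the exact column types are only illustrative.

-- Illustrative schema for the example table (types assumed).
CREATE TABLE measurements (
  time                timestamptz NOT NULL,
  device_id           int         NOT NULL,
  cpu                 float8,
  disk_io             float8,
  energy_consumption  float8
);

-- Partition the table by the time column (chunk interval left at its default).
SELECT create_hypertable('measurements', 'time');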
You can set the table's compression segment field, sorting field, and the age threshold for automatic compression through an SQL interface, for example:
ALTER TABLE measurements SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'device_id'
);
ALTER TABLE measurements
SET (timescaledb.compress,
timescaledb.compress_orderby = 'device_id, time DESC');
SELECT add_compress_chunks_policy('measurements', INTERVAL '7 days');
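Besides the automatic policy above, chunks can also be compressed manually. The sketch below assumes TimescaleDB 1.5, where compress_chunk() and show_chunks() are available; it compresses every chunk whose data is older than 7 days, which has the same effect as the policy.

-- Manually compress all chunks older than 7 days (assumed TimescaleDB 1.5 functions).
SELECT compress_chunk(c)
FROM show_chunks('measurements', older_than => INTERVAL '7 days') AS c;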
Time-series workloads write recent data with high concurrency and high throughput, and recent data may also be updated and queried in detail. As time progresses, historical data is mainly needed for large-scale analysis, and its sheer volume makes storage space a problem that compression helps to solve.
Compression therefore needs to save storage space without affecting the efficiency of writing, updating, or detail queries on recent data. In addition, because historical data is usually queried at a large scale, compression should not hurt large-scale query efficiency either. In short, it must:
1) Keep writing, updating, and detail queries on recent data efficient
2) Be lossless
3) Allow efficient queries on certain fields of the compressed data
TimescaleDB's compression works by selecting a segment field, a sorting field, and a time threshold that determines how old data must be before it is compressed.
For example, compress data whose time field is more than 7 days old, segment the compressed data by device_id, and sort it within each segment in reverse chronological order.
The compressed result is shown below: arrays are used to store multiple values, so many records collapse into one row. Similar techniques have been covered in earlier articles, such as aggregation to improve the efficiency of range retrieval and the vops plug-in to improve analysis efficiency.
| time | device_id | CPU | disk_io | energy_consumption |
|---|---|---|---|---|
| [12:00:02, 12:00:01] | 1 | [88.2, 88.6] | [20, 25] | [0.8, 0.85] -- a chunk |
| [12:00:02, 12:00:01] | 2 | [300.5, 299.1] | [30, 40] | [0.9, 0.95] -- another chunk |
Multiple records are compressed into one record, usually with one chunk for every 1,000 records.
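The following plain-SQL sketch only illustrates the idea of packing many rows into one array-valued row per device; it is not TimescaleDB's actual internal format, and the measurements_packed table name is made up for the example.

-- Illustration only: pack each device's rows into arrays, one row per device.
CREATE TABLE measurements_packed AS
SELECT device_id,
       array_agg(time ORDER BY time DESC)               AS time,
       array_agg(cpu ORDER BY time DESC)                AS cpu,
       array_agg(disk_io ORDER BY time DESC)            AS disk_io,
       array_agg(energy_consumption ORDER BY time DESC) AS energy_consumption
FROM measurements
GROUP BY device_id;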
Because device_id is the segment field, queries that filter by device_id remain efficient on the compressed data.
TimescaleDB 1.5 does not support updating data that has already been compressed (this may be supported in the future).
Why compress only older data?
1) To keep writing, updating, and detail queries on recent data efficient: data can no longer be written or updated once it is compressed.
2) The arrival order of recent data may not be fully consistent with the time field. For example, a record whose timestamp is 20:00 may not be written until 21:00 (due to application delays or other problems). Therefore, only data that is older than a certain threshold, i.e., data that no longer changes, should be compressed.
For both reasons, you should compress older data rather than recent data.
How do you select the segment field?
Each compressed chunk must contain a reasonable number of records for the compression to pay off. For example, packing about 1,000 records into one chunk gives a high compression ratio (the packed values are then compressed, much like pglz or gzip applied to a column).
Therefore, the segment field cannot be the primary key: each primary key value identifies only one record, so there would be nothing to pack together.
Typically, you should select a field that appears in query predicates (where ...) and has many duplicate values, such as device_id.
How do you select the sorting field?
You typically select the second condition in the where clause, such as time. Suppose queries fetch the data of a specified device within a specified time period, for example, a certain 5-minute window. Then you can select the time field (ts) as the sorting field:
ALTER TABLE measurements
SET (timescaledb.compress,
timescaledb.compress_orderby = 'device_id, time DESC');
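For example, a query of the following shape (the device ID and time range values are made up) can use the segment field to locate the right compressed rows and the sort order to read the requested window:

-- Fetch one device's data for a specific 5-minute window.
SELECT time, cpu, disk_io, energy_consumption
FROM measurements
WHERE device_id = 1
  AND time >= '2019-08-22 12:00:00'
  AND time <  '2019-08-22 12:05:00'
ORDER BY time DESC;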
At the storage level, multiple records are converted into array-like values, which PostgreSQL's built-in column compression (TOAST with pglz) can then compress.
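As a rough illustration of this built-in compression (using a throwaway test table, not TimescaleDB internals): when a stored value such as a large array exceeds the TOAST threshold, PostgreSQL compresses it with pglz, which you can observe with pg_column_size().

-- 10,000 float8 values are 80,000 bytes uncompressed; the stored size is far smaller.
CREATE TEMP TABLE toast_demo (vals float8[]);
INSERT INTO toast_demo
SELECT array_agg(0.0::float8) FROM generate_series(1, 10000);
SELECT pg_column_size(vals) AS stored_bytes FROM toast_demo;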
One of the current limitations of TimescaleDB is that once chunks are converted into compressed columnar form, further modifications of the data (e.g., inserts, updates, deletes) or of the schema are not allowed without manually decompressing them first.
In other words, chunks are immutable in compressed form. Attempts to modify the chunks' data will either result in an error or fail silently (as preferred by users). Future releases plan to remove this limitation, so that compressed tables support inserts, updates, and deletes.