
Automatic Compression over Time with PostgreSQL

In this article, you will learn how to compress data over time with customized PostgreSQL compression algorithms.

The compression logic in PostgreSQL can be customized: for example, simple mean compression, maximum and minimum compression, or compression based on the swinging door trending (SDT) algorithm, sometimes translated as the "revolving door" algorithm.

Take the following table as an example:

create table tbl (  
  id serial8 primary key,  -- primary key  
  sid int,                 -- sensor ID  
  hid int,                 -- indicator ID  
  val float8,              -- collected value  
  ts timestamp             -- acquisition time 
);   
  
create index idx_tbl on tbl(ts); 
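
To try the examples end to end, you can load some hypothetical sensor data into tbl. The sensor IDs, indicator IDs, and sampling rate below are invented purely for illustration and are not part of the original article.

-- Hypothetical test data: sensors 1-2, indicators 1-2, one reading per minute for 3 days
insert into tbl (sid, hid, val, ts)
select sid, hid, random() * 100, now() - (i || ' minutes')::interval
from generate_series(1, 3 * 24 * 60) as i,
     generate_series(1, 2) as sid,
     generate_series(1, 2) as hid;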

The 5-minute compression table looks like this:

create table tbl_5min (  
  id serial8 primary key,  -- primary key 
  sid int,                 -- sensor ID  
  hid int,                 -- indicator ID  
  val float8,              -- inherited column: average value, convenient for period-over-period analysis  
  ts timestamp,            -- inherited column: interval start time, convenient for period-over-period analysis
  val_min float8,          -- minimum  
  val_max float8,          -- maximum  
  val_sum float8,          -- sum 
  val_count float8,        -- number of acquisitions 
  ts_start timestamp,      -- interval start time  
  ts_end timestamp         -- interval end time 
);   
  
alter table tbl_5min inherit tbl;  

The 30-minute compression table looks like this:

create table tbl_30min (  
  id serial8 primary key,  -- primary key 
  sid int,                 -- sensor ID  
  hid int,                 -- indicator ID  
  val float8,              -- inherited column: average value, convenient for period-over-period analysis  
  ts timestamp,            -- inherited column: interval start time, convenient for period-over-period analysis
  val_min float8,          -- minimum  
  val_max float8,          -- maximum  
  val_sum float8,          -- sum 
  val_count float8,        -- number of acquisitions 
  ts_start timestamp,      -- interval start time  
  ts_end timestamp         -- interval end time 
);   
  
alter table tbl_30min inherit tbl;  
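
Because tbl_5min and tbl_30min inherit from tbl, a query against tbl (without the ONLY keyword) transparently returns both the raw rows and the compressed rows, which is why val and ts are kept as inherited columns in the rollup tables. For example:

-- Raw and compressed rows together, newest first
select sid, hid, val, ts
from tbl              -- no ONLY, so tbl_5min and tbl_30min are included
where sid = 1 and hid = 1
order by ts desc
limit 20;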

Then you can compress the data with the following statements.

5-minute compression statement

with tmp1 as (  
  delete from only tbl where ts <= now() - interval '1 day' returning *  
)  
insert into tbl_5min  
  (sid, hid, val, ts, val_min, val_max, val_sum, val_count, ts_start, ts_end)   
select sid, hid, avg(val) as val, min(ts) as ts,
       min(val) as val_min, max(val) as val_max,
       sum(val) as val_sum, count(*) as val_count,
       min(ts) as ts_start, max(ts) as ts_end
from tmp1  
group by sid, hid,
         substring(to_char(ts, 'yyyymmddhh24mi'), 1, 10)
           || lpad(((substring(to_char(ts, 'yyyymmddhh24mi'), 11, 2)::int / 5) * 5)::text, 2, '0');  
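
The group by expression above rounds each timestamp down to its 5-minute bucket through string manipulation of the to_char output. On PostgreSQL 14 or later, the built-in date_bin function can express the same bucketing more directly; the following is a sketch of the idea rather than code from the original article.

-- Equivalent 5-minute bucketing with date_bin (PostgreSQL 14+)
select date_bin('5 minutes', ts, timestamp '2000-01-01') as bucket, count(*)
from tbl
group by 1
order by 1;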

30-minute compression statement

with tmp1 as (  
  delete from only tbl_5min where ts_start <= now() - interval '1 day' returning *  
)  
insert into tbl_30min  
  (sid, hid, val_min, val_max, val_sum, val_count, ts_start, ts_end)  
select sid, hid, min(val_min) as val_min, max(val_max) as val_max,
       sum(val_sum) as val_sum, sum(val_count) as val_count,
       min(ts_start) as ts_start, max(ts_end) as ts_end
from tmp1     
group by sid, hid,
         substring(to_char(ts_start, 'yyyymmddhh24mi'), 1, 10)
           || lpad(((substring(to_char(ts_start, 'yyyymmddhh24mi'), 11, 2)::int / 30) * 30)::text, 2, '0');  
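
These statements do not run by themselves; the original article leaves the scheduling to you. One option, assuming the pg_cron extension is available on your instance (it is not mentioned in the source), is to wrap each rollup in a function and register it as a cron job. The function and job names below are invented for this sketch.

-- Wrap the 5-minute rollup in a function (the 30-minute rollup can be wrapped the same way) ...
create or replace function rollup_5min() returns void language sql as $func$
  with tmp1 as (
    delete from only tbl where ts <= now() - interval '1 day' returning *
  )
  insert into tbl_5min (sid, hid, val, ts, val_min, val_max, val_sum, val_count, ts_start, ts_end)
  select sid, hid, avg(val), min(ts), min(val), max(val), sum(val), count(*), min(ts), max(ts)
  from tmp1
  group by sid, hid,
           substring(to_char(ts, 'yyyymmddhh24mi'), 1, 10)
             || lpad(((substring(to_char(ts, 'yyyymmddhh24mi'), 11, 2)::int / 5) * 5)::text, 2, '0');
$func$;

-- ... then run it hourly with pg_cron
select cron.schedule('0 * * * *', 'select rollup_5min()');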

For the design, demo, and more details, please refer to PostgreSQL Time-Series Data Case: Automatic Compression over Time.

Related Blog Posts

Using the Swinging Door Trending Algorithm in PostgreSQL

A good compression algorithm is extremely important in any IoT scenario that involves continuous monitoring and large numbers of data points. So the question arises: are there compression algorithms that can help with all of this while also ensuring that the data is not distorted in the process? If so, which algorithms are they?

To answer this question, in this blog, we will show you the Swinging Door Trending (SDT) algorithm and how you can use it in PostgreSQL for effective compression in IoT scenarios.

The Swinging Door Trending (SDT) algorithm is a linear trend compression algorithm. In essence, it replaces a series of continuous data points with a straight line determined by the start and end points.
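
The post linked above walks through SDT in PostgreSQL in detail. As a rough illustration of the idea only, the following minimal PL/pgSQL sketch keeps just the points needed to stay within a deviation e; the function name and interface are invented here, and this is not the code from that post.

-- Minimal SDT sketch (illustration only): returns the points that must be kept so that
-- linear interpolation between them stays within +/- e of every dropped point.
-- Assumes the input arrays are sorted by strictly increasing timestamp.
create or replace function sdt_compress(ts_in timestamp[], val_in float8[], e float8)
returns table (ts timestamp, val float8)
language plpgsql as $$
declare
  n int := array_length(ts_in, 1);
  t_a timestamp;   y_a float8;     -- last archived point (pivot of the "doors")
  t_prev timestamp; y_prev float8; -- previous incoming point
  s_up float8; s_low float8;       -- widest upper-door / narrowest lower-door slopes so far
  dt float8;
begin
  if n is null or n < 1 then return; end if;

  -- the first point is always archived
  t_a := ts_in[1]; y_a := val_in[1];
  ts := t_a; val := y_a; return next;
  t_prev := t_a; y_prev := y_a;
  s_up := '-infinity'; s_low := 'infinity';

  for i in 2 .. n loop
    dt := extract(epoch from ts_in[i] - t_a);
    s_up  := greatest(s_up, (val_in[i] - (y_a + e)) / dt);
    s_low := least(s_low,   (val_in[i] - (y_a - e)) / dt);

    if s_up > s_low then
      -- the doors have swung past parallel: archive the previous point and restart there
      ts := t_prev; val := y_prev; return next;
      t_a := t_prev; y_a := y_prev;
      dt := extract(epoch from ts_in[i] - t_a);
      s_up  := (val_in[i] - (y_a + e)) / dt;
      s_low := (val_in[i] - (y_a - e)) / dt;
    end if;

    t_prev := ts_in[i]; y_prev := val_in[i];
  end loop;

  -- always keep the last point
  if n > 1 then
    ts := t_prev; val := y_prev; return next;
  end if;
end;
$$;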

Double 11 Real-Time Monitoring System with Time Series Database

TimescaleDB is based on the traditional relational database Postgres and therefore does not support dynamic schema, but it does support pre-aggregation and downsampling. After Facebook's Gorilla was developed, time series compression drew a lot of attention. Proper compression can be performed based on the features of time series data. For example, if timestamps are consecutive and data is stable, time series compression using delta-of-delta or XOR encoding is much more efficient than general compression.
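
To see why delta-of-delta works so well on consecutive timestamps, you can compute the second difference of ts with window functions on the tbl table from the first section. This only illustrates the encoding idea; it is not Gorilla itself.

-- For a regularly sampled series the delta-of-delta is almost always zero,
-- which is what makes it cheap to encode.
select ts,
       ts - lag(ts) over w                                            as delta,
       (ts - lag(ts) over w) - (lag(ts) over w - lag(ts, 2) over w)   as delta_of_delta
from tbl
where sid = 1 and hid = 1
window w as (order by ts)
order by ts;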

With the combination of time series compression and block compression, this volume of data can be compressed to 413 GB, a compression ratio of around 15:1. Compared with block compression alone, adding time series compression increases compression effectiveness by 40%, which is of great significance for storage cost reduction. Another advantage is that block compression effectiveness is not affected, because time series compression is applied before block compression and the data is already very small by the time it is block-compressed. In large query scenarios, response time (RT) is reduced by half: when scanning data over a long time range, the data is already highly compressed, so I/O efficiency is very high.

Related Documentation

How to enable column-store and compression in AnalyticDB for PostgreSQL?

Row-store with no compression is used by default when you create a table in AnalyticDB for PostgreSQL. To enable column-store and compression, you must specify the column-store and compression options when creating the table.
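
As a rough sketch of what that looks like: AnalyticDB for PostgreSQL is based on Greenplum, so the storage options follow the Greenplum WITH clause. The table name and option values below are assumptions for illustration; check the product documentation for the options supported by your version.

-- Hypothetical example: append-optimized, column-oriented table with zlib compression
create table sensor_data (
  sid int,
  hid int,
  val float8,
  ts timestamp
)
with (appendonly = true, orientation = column, compresstype = zlib, compresslevel = 5)
distributed by (sid);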

Trajectory best practices - ApsaraDB for RDS

LZ4 is an advanced compression algorithm that offers a high compression ratio and fast execution. In this document, you will learn how to enable the LZ4 compression algorithm, and how to enable it for the entire database by default.
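
If the underlying instance runs PostgreSQL 14 or later, LZ4 can be enabled per column or set as the default TOAST compression method. The following is a sketch under that assumption only; the document above covers the product-specific steps, and the table and column names here are invented.

-- Per-column LZ4 compression (PostgreSQL 14+)
create table trajectory (
  id bigserial primary key,
  track jsonb compression lz4
);

-- Use LZ4 by default for newly created columns
set default_toast_compression = 'lz4';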

Related Products

AnalyticDB for PostgreSQL

ApsaraDB HybridDB for PostgreSQL is an online MPP (Massively Parallel Processing) data warehousing service based on the open source Greenplum Database.

ApsaraDB HybridDB provides online expansion and performance monitoring service to free your team from complicated MPP cluster operations and management (O&M). This enables database administrators, developers and data analysts to focus on upgrading enterprise productivity through SQL development.

Time Series Database (TSDB)

Time Series Database (TSDB) supports high-speed data reading and writing and offers high compression ratios for cost-efficient data storage. The service also supports precision reduction (downsampling), interpolation, multi-metric aggregate computing, and visualization of query results.

