
rule ELT - Real-Time Conversion Between Row Store and Column Store for Time Series Data

This article explains real-time conversion between row store and column store, with examples.

By digoal

Background

Time series data can be stored in rows or columns, but how do you choose the one that suits your workload?

  • The advantage of row storage is that querying whole rows (or many points within the same rows) is more efficient, because the values of a row are stored densely in the same blocks.
  • When a large number of consecutive values of a single point (column) need to be queried, row store is not suitable: the columns of the same row that are not queried are scanned as well, wasting IO and buffer. In this case, column store is better.
  • Column store is also more suitable for OLAP scenarios, which usually involve scanning and computing over large amounts of data in a small number of columns. Column store saves IO, and vectorization technology can be used to accelerate computation.
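The single-point case can be seen directly on the tables used later in this article. This is a hedged sketch (the wide table t and point table t2 are created in the example below); it assumes both tables hold the same data:

```sql
-- Aggregating one point from the wide table scans all 9 columns of every row:
select avg(c2) from t;

-- The same aggregate on the point table t2 scans only ts and c2,
-- so far less data is read from disk for the same answer:
select avg(c2) from t2;
```
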

How does a column store piece together the columns that belong to the same row of records?

  • You can use the PK (works for both fixed-length and variable-length columns) or the offset (only for fixed-length columns) to tie together the columns of the same row.

PostgreSQL stores data in HEAP tables (row store) by default. How can we convert a wide table into point tables in real time?

  • Use a rule

Example:

Create a wide table (representing all the dimensions (points) collected by a sensor; for example, eight dimensions plus a time field).

create table t (ts timestamp primary key, c1 int, c2 int, c3 int, c4 text, c5 int, c6 int, c7 int, c8 int);  

Create a point table:

create table t1 (ts timestamp primary key, c1 int);  
create table t2 (ts timestamp primary key, c2 int);  
create table t3 (ts timestamp primary key, c3 int);  
create table t4 (ts timestamp primary key, c4 text);  
create table t5 (ts timestamp primary key, c5 int);  
create table t6 (ts timestamp primary key, c6 int);  
create table t7 (ts timestamp primary key, c7 int);  
create table t8 (ts timestamp primary key, c8 int);  

Create a rule so that data written to the wide table is distributed to the point tables, with the ts field tying together the columns of the same row:

create rule r1 as on insert to t do instead (  
insert into t1 values (NEW.ts, NEW.c1);  
insert into t2 values (NEW.ts, NEW.c2);  
insert into t3 values (NEW.ts, NEW.c3);  
insert into t4 values (NEW.ts, NEW.c4);  
insert into t5 values (NEW.ts, NEW.c5);  
insert into t6 values (NEW.ts, NEW.c6);  
insert into t7 values (NEW.ts, NEW.c7);  
insert into t8 values (NEW.ts, NEW.c8)  
);  
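To confirm that the rule was registered, its definition can be read back from the pg_rules system view. A minimal check (the exact output formatting depends on your PostgreSQL version):

```sql
select rulename, tablename
from pg_rules
where tablename = 't';
-- returns one row with rulename = 'r1'
```
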

Write to the wide table, and the data will be automatically converted to the point tables:

postgres=# insert into t values (now(),1,1,1,'test',1,1,1,1);  
INSERT 0 1  
  
  
postgres=# select * from t;  
 ts | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8   
----+----+----+----+----+----+----+----+----  
(0 rows)  
  
postgres=# select * from t1;  
             ts             | c1   
----------------------------+----  
 2022-06-10 15:22:41.916042 |  1  
(1 row)  
  
postgres=# select * from t2;  
             ts             | c2   
----------------------------+----  
 2022-06-10 15:22:41.916042 |  1  
(1 row)  
  
postgres=# select * from t3;  
             ts             | c3   
----------------------------+----  
 2022-06-10 15:22:41.916042 |  1  
(1 row)  
  
postgres=# select * from t4;  
             ts             |  c4    
----------------------------+------  
 2022-06-10 15:22:41.916042 | test  
(1 row)  
  
postgres=# select * from t5;  
             ts             | c5   
----------------------------+----  
 2022-06-10 15:22:41.916042 |  1  
(1 row)  
  
postgres=# select * from t6;  
             ts             | c6   
----------------------------+----  
 2022-06-10 15:22:41.916042 |  1  
(1 row)  
  
postgres=# select * from t7;  
             ts             | c7   
----------------------------+----  
 2022-06-10 15:22:41.916042 |  1  
(1 row)  
  
postgres=# select * from t8;  
             ts             | c8   
----------------------------+----  
 2022-06-10 15:22:41.916042 |  1  
(1 row)  
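Since every point table shares the same ts primary key, a complete wide row can be reassembled by joining the point tables on ts, which is the "tie together by PK" approach described earlier. A sketch for the row inserted above:

```sql
-- Reassemble the wide row from the 8 point tables by joining on the shared PK.
select ts, c1, c2, c3, c4, c5, c6, c7, c8
from t1
join t2 using (ts)
join t3 using (ts)
join t4 using (ts)
join t5 using (ts)
join t6 using (ts)
join t7 using (ts)
join t8 using (ts);
```

The primary key index on ts in each point table makes these joins efficient when only a few rows are reassembled.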

How is the performance?

postgres=# \timing  
Timing is on.  
postgres=# insert into t select clock_timestamp(),1,1,1,'test',1,1,1,1 from generate_series(1,100000);  
INSERT 0 100000  
Time: 791.200 ms  
postgres=# insert into t select clock_timestamp(),1,1,1,'test',1,1,1,1 from generate_series(1,300000);  
INSERT 0 300000  
Time: 2233.316 ms (00:02.233)  
  
  
postgres=# \dt+
                                      List of relations
 Schema | Name | Type  |  Owner   | Persistence | Access method |    Size    | Description 
--------+------+-------+----------+-------------+---------------+------------+-------------
 public | t    | table | postgres | permanent   | heap          | 8192 bytes | 
 public | t1   | table | postgres | permanent   | heap          | 17 MB      | 
 public | t2   | table | postgres | permanent   | heap          | 17 MB      | 
 public | t3   | table | postgres | permanent   | heap          | 17 MB      | 
 public | t4   | table | postgres | permanent   | heap          | 17 MB      | 
 public | t5   | table | postgres | permanent   | heap          | 17 MB      | 
 public | t6   | table | postgres | permanent   | heap          | 17 MB      | 
 public | t7   | table | postgres | permanent   | heap          | 17 MB      | 
 public | t8   | table | postgres | permanent   | heap          | 17 MB      | 

Each wide row is split into 8 points, so counting by points, the processing rate of the first batch is 100,000 rows × 8 points / 0.791 s ≈ 1.01 million points per second.

The hardware is listed below:

Apple Notebook
MacBook Pro (13-inch, 2019, Four Thunderbolt 3 ports)
2.4GHz quad-core Intel Core i5
16GB 2133 MHz LPDDR3
256 GB disk