PostgreSQL Data Rotate Method Introduction: Overwrite History Data by Time

Background

In some business scenarios, data can be either hot or cold. In cases where a business may require only the data of the last day, week, or month, it is feasible to discard historical data. For example, in some monitoring scenarios, the monitoring data from the last week is retained while other historical data is discarded. The dilemma is how to discard historical data or implement rotation? Let's take a quick look at the available two options to resolve this issue.

1) Delete data generated more than one week ago.

delete from table where crt_time<=now()-interval '7 day';

This method incurs additional overhead, including writing redo logs and garbage collection. If a large amount of data needs to be deleted, statistics must be collected again. If the statistics are not collected in time, the information may be inaccurate. This may also lead to merge JOIN problems.

2) Use partition tables in polling mode and use truncate to clear partitions.

This method avoids problems caused by deletion. However, improper use may also lead to new problems. Both truncate and rename table are Data Definition Language (DDL) operations. We recommend adding a lock timeout mechanism for the DDL operations. Otherwise, DDL waiting will block other SQL statements.

Switching table names require a temporary table name that is not currently in use.

The following section uses a partition table as an example to describe how to rotate data.

Example

Assuming data is retained for one week, the following section describes how to rotate data according to the second method stated above.

Note: PostgreSQL supports DDL transactions.

Method

There are nine tables, one primary and eight partition tables, of which seven tables correspond to dow partitions and one table corresponds to the default partition (swap partition). The advantage of using constraints in dow partitions is that partitions are directly filtered according to the constraints during query.

Step 1) Use the following command to create the primary table.

create table test(id int primary key, info text, crt_time timestamp);

Step 2) Create partitions as shown below.

create table test0(like test including all, constraint ck check(extract(dow from crt_time)=0)) inherits(test);  
create table test1(like test including all, constraint ck check(extract(dow from crt_time)=1)) inherits(test);  
create table test2(like test including all, constraint ck check(extract(dow from crt_time)=2)) inherits(test);  
create table test3(like test including all, constraint ck check(extract(dow from crt_time)=3)) inherits(test);  
create table test4(like test including all, constraint ck check(extract(dow from crt_time)=4)) inherits(test);  
create table test5(like test including all, constraint ck check(extract(dow from crt_time)=5)) inherits(test);  
create table test6(like test including all, constraint ck check(extract(dow from crt_time)=6)) inherits(test);  
  
create table test_def(like test including all) inherits(test);

Step 3) Select, Update, and Delete data operations are performed directly in the primary table. Add time conditions to filter partitions.

postgres=#  explain select * from test where crt_time=now() and extract(dow from crt_time)=0; 
                                             QUERY PLAN                                              
-----------------------------------------------------------------------------------------------------
 Append  (cost=0.00..65.20 rows=3 width=44)
   ->  Seq Scan on test  (cost=0.00..0.00 rows=1 width=44)
         Filter: ((crt_time = now()) AND (date_part('dow'::text, crt_time) = '0'::double precision))
   ->  Seq Scan on test0  (cost=0.00..32.60 rows=1 width=44)
         Filter: ((crt_time = now()) AND (date_part('dow'::text, crt_time) = '0'::double precision))
   ->  Seq Scan on test_def  (cost=0.00..32.60 rows=1 width=44)
         Filter: ((crt_time = now()) AND (date_part('dow'::text, crt_time) = '0'::double precision))
(7 rows)

Step 4) Use the program to automatically generate table names according to the dow partitions of the time specified by crt_time to directly insert data into the partition tables.

If the program has trouble inserting data into the partitions, use a trigger or rule as shown in the example below.

create or replace function ins_tg() returns trigger as $$
declare
begin
  case extract(dow from NEW.crt_time)
  when 0 then insert into test0 values (NEW.*);
  when 1 then insert into test1 values (NEW.*);
  when 2 then insert into test2 values (NEW.*);
  when 3 then insert into test3 values (NEW.*);
  when 4 then insert into test4 values (NEW.*);
  when 5 then insert into test5 values (NEW.*);
  when 6 then insert into test6 values (NEW.*);
  end case;
  return null;
end;
$$ language plpgsql strict;

create trigger tg before insert on test for each row execute procedure ins_tg();

insert into test values (1,'test',now()+interval '1 day');
insert into test values (1,'test',now()+interval '2 day');
insert into test values (1,'test',now()+interval '3 day');
insert into test values (1,'test',now()+interval '4 day');
insert into test values (1,'test',now()+interval '5 day');
insert into test values (1,'test',now()+interval '6 day');
insert into test values (1,'test',now()+interval '7 day');

postgres=# select tableoid::regclass , * from test;
 tableoid | id | info |          crt_time          
----------+----+------+----------------------------
 test0    |  1 | test | 2017-03-26 14:40:48.066905
 test1    |  1 | test | 2017-03-27 14:40:50.450942
 test2    |  1 | test | 2017-03-28 14:40:52.271922
 test4    |  1 | test | 2017-03-23 14:40:22.551928
 test5    |  1 | test | 2017-03-24 14:40:24.643933
 test6    |  1 | test | 2017-03-25 14:40:28.138913
 test3    |  1 | test | 2017-03-22 14:40:20.586945
(7 rows)

Rotate Method

Process transactions once each day during idle periods. DDL supports transactions. If a transaction fails, it's recommended to retry it. Complete the following operations in a transaction:

1) Calculate the dow partitions for the next day.
2) Clear test_def constraints.
3) Clear test_def data.
4) Rename test_def to test_def_tmp (a nonexistent table name).
5) Rename the partition for the next day to test_def.
6) Add constraints to test_def_tmp.
7) Rename test_def_tmp to the partition for the next day.

Consider the following example.

1. 开始事务  
begin;  
  
2. 设置锁超时  
set lock_timeout = '60s';  
  
3. 查询明天的dow  
select extract(dow from current_date+1);  
 date_part   
-----------  
         3  
(1 row)  
  
4. 清除test_def约束  
alter table test_def drop constraint IF EXISTS ck;  

5. 清除test_def数据  
truncate test_def;  
  
6. 重命名test_def  
alter table test_def rename to test_def_tmp;  
  
7. 重命名明天的分区表  
alter table test3 rename to test_def;  
  
8. test_def_tmp添加约束(明天)  
alter table test_def_tmp add constraint ck check(extract(dow from crt_time)=3);  
  
9. test_def_tmp重命名为明天的分区  
alter table test_def_tmp rename to test3;  
   
10. 提交或回滚
commit;

如果失败，回滚事务。  
Rollback;

Take the following precautions to avoid any errors:

1) Lock timeout
2) Rollback upon transaction failure
3) Intermediate table name does not exist
4) Unified constraint name

Summary

Using the delete method to delete historical data may incur additional overhead, including writing redo logs and garbage collection. If a large amount of data needs to be deleted, make sure to collect statistics again. The information may be inaccurate if the statistics are not collected in time, in addition to causing merge Join problems. Since PostgreSQL supports DDL encapsulation in transactions, partition tables can be used in polling mode.

Community

PostgreSQL Data Rotate Method Introduction: Overwrite History Data by Time

Background

Example

Method

Rotate Method

Summary

Read previous post:

Read next post:

digoal

You may also like

Comments

digoal

Related Products

PolarDB for PostgreSQL

AnalyticDB for PostgreSQL

Tair

Time Series Database (TSDB)