
PolarDB:Schedule the archiving of partitioned table data to cold storage

Last Updated:Aug 09, 2024

This topic describes the best practice for creating a scheduled task to automate the archiving of partitioned table data to cold storage.

Background information

For tables that are partitioned by time range, the frequency of accessing older partitions decreases with time. To reduce storage costs, you can configure a scheduled task to automate the archiving of partitioned table data to cold storage.

PolarDB for PostgreSQL (Compatible with Oracle) clusters allow you to use the pg_cron extension to automate the archiving of partitioned table data to cold storage.

Create the pg_cron extension

CREATE EXTENSION IF NOT EXISTS pg_cron;
Note

The pg_cron extension can be created only in the postgres database. Connect to the postgres database by using a privileged account before you create the extension. If you do not have a privileged account, log on to the console to create one.

Procedure

This example uses the db01 database to show how to automatically archive partitioned table data to cold storage.

  1. Prepare data.

    -- Create a partitioned table.
    db01=> CREATE TABLE traj(
      tr_id serial,
      tr_lon float,
      tr_lat float,
      tr_time timestamp(6)
    )PARTITION BY RANGE (tr_time);
    
    db01=> CREATE TABLE traj_202301 PARTITION OF traj
        FOR VALUES FROM ('2023-01-01') TO ('2023-02-01');
    db01=> CREATE TABLE traj_202302 PARTITION OF traj
        FOR VALUES FROM ('2023-02-01') TO ('2023-03-01');
    db01=> CREATE TABLE traj_202303 PARTITION OF traj
        FOR VALUES FROM ('2023-03-01') TO ('2023-04-01');
    db01=> CREATE TABLE traj_202304 PARTITION OF traj
        FOR VALUES FROM ('2023-04-01') TO ('2023-05-01');
        
    -- Write test data to the partitioned table.
    db01=> INSERT INTO traj(tr_lon,tr_lat,tr_time) values(112.35, 37.12, '2023-01-01');
    db01=> INSERT INTO traj(tr_lon,tr_lat,tr_time) values(112.35, 37.12, '2023-02-01');
    db01=> INSERT INTO traj(tr_lon,tr_lat,tr_time) values(112.35, 37.12, '2023-03-01');
    db01=> INSERT INTO traj(tr_lon,tr_lat,tr_time) values(112.35, 37.12, '2023-04-01');
    
    -- Create an index on the partitioned table.
    db01=> CREATE INDEX traj_idx on traj(tr_id);
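
    Before you configure archiving, you can optionally verify that each row landed in the expected partition. This sanity check is not part of the original procedure; it uses the standard tableoid pseudo-column, which resolves to the partition that physically stores each row:

    ```sql
    -- Count rows per partition. Each INSERT above targets a different month,
    -- so every partition is expected to hold exactly one row.
    db01=> SELECT tableoid::regclass AS partition, count(*)
           FROM traj
           GROUP BY 1
           ORDER BY 1;
    ```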
  2. Create the OSFS toolkit extension.

    db01=> CREATE EXTENSION IF NOT EXISTS polar_osfs_toolkit;
  3. Create a scheduled task.

    Use the privileged account to connect to the postgres database, and then execute the following SQL statement to create a task named task1 for the db01 database. The task checks the partitions of the traj table every minute, automatically archives partitions whose data is older than three days to cold storage, and returns the task ID.

    -- Execute the task every minute.
    postgres=> SELECT cron.schedule_in_database('task1', '* * * * *', 'select polar_alter_subpartition_to_oss_interval(''traj'', ''3 days''::interval);', 'db01');
     schedule_in_database 
    ----------------------
                        1
    Note

    For more information about how to use the polar_alter_subpartition_to_oss_interval function, see polar_alter_subpartition_to_oss_interval.

    In addition to configuring a task that is executed every minute, you can use pg_cron to configure tasks to run at specific points in time each day or on specific days each month.

    -- Run the task at 10:00 AM every day. The time is in GMT.
    postgres=> SELECT cron.schedule_in_database('task2', '0 10 * * *', 'select polar_alter_subpartition_to_oss_interval(''traj'', ''3 days''::interval);', 'db01');
     schedule_in_database 
    ----------------------
                        2
    
    -- Run the task every minute on the fourth day of every month.
    postgres=> SELECT cron.schedule_in_database('task3', '* * 4 * *', 'select polar_alter_subpartition_to_oss_interval(''traj'', ''3 days''::interval);', 'db01');
     schedule_in_database 
    ----------------------
                        3
    Note

    For more information about how to use pg_cron, see pg_cron.
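
    If you need to inspect or remove a scheduled task later, pg_cron provides the cron.job catalog table and the cron.unschedule function. The following sketch assumes the job IDs and names created in the examples above:

    ```sql
    -- List all scheduled tasks.
    postgres=> SELECT jobid, jobname, schedule, database FROM cron.job;

    -- Remove a task by its job ID (the value returned by cron.schedule_in_database) ...
    postgres=> SELECT cron.unschedule(1);

    -- ... or by its job name.
    postgres=> SELECT cron.unschedule('task2');
    ```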

  4. View the execution result of the task.

    -- The traj_202301 partition has been archived to OSS.
    db01=> \d+ traj_202301
                                                                     Table "public.traj_202301"
     Column  |              Type              | Collation | Nullable |               Default               | Storage | Compression | Stats target | Description 
    ---------+--------------------------------+-----------+----------+-------------------------------------+---------+-------------+--------------+-------------
     tr_id   | integer                        |           | not null | nextval('traj_tr_id_seq'::regclass) | plain   |             |              | 
     tr_lon  | double precision               |           |          |                                     | plain   |             |              | 
     tr_lat  | double precision               |           |          |                                     | plain   |             |              | 
     tr_time | timestamp(6) without time zone |           |          |                                     | plain   |             |              | 
    Partition of: traj FOR VALUES FROM ('2023-01-01 00:00:00') TO ('2023-02-01 00:00:00')
    Partition constraint: ((tr_time IS NOT NULL) AND (tr_time >= '2023-01-01 00:00:00'::timestamp(6) without time zone) AND (tr_time < '2023-02-01 00:00:00'::timestamp(6) without time zone))
    Replica Identity: FULL
    Tablespace: "oss"
    Access method: heap
    Note

    The Tablespace: "oss" line indicates that the partition's data has been archived to cold storage.
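
    Instead of running \d+ on each partition, you can check the archiving status of all partitions at once by querying the pg_tables system view. This convenience query is not part of the original procedure:

    ```sql
    -- Partitions whose tablespace is "oss" have been archived to cold storage;
    -- an empty tablespace means the partition still resides on the default (hot) storage.
    db01=> SELECT tablename, tablespace
           FROM pg_tables
           WHERE tablename LIKE 'traj_2023%'
           ORDER BY tablename;
    ```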

  5. View the execution history of scheduled tasks.

    postgres=> select * from cron.job_run_details ;
     jobid | runid | job_pid | database | username |                      command                       |  status   | return_message |          start_time           |           end_time            
    -------+-------+---------+----------+----------+----------------------------------------------------+-----------+----------------+-------------------------------+-------------------------------
         1 |     1 |  469075 | db01     | user1    | select polar_alter_subpartition_to_oss_interval('traj', '3 days'::interval); | succeeded | 1 row          | 2024-03-10 03:12:00.016068+00 | 2024-03-10 03:12:00.135428+00
         1 |     2 |  469910 | db01     | user1    | select polar_alter_subpartition_to_oss_interval('traj', '3 days'::interval); | succeeded | 1 row          | 2024-03-10 03:13:00.008358+00 | 2024-03-10 03:13:00.014189+00
         1 |     3 |  470746 | db01     | user1    | select polar_alter_subpartition_to_oss_interval('traj', '3 days'::interval); | succeeded | 1 row          | 2024-03-10 03:14:00.013165+00 | 2024-03-10 03:14:00.019002+00
         1 |     4 |  471593 | db01     | user1    | select polar_alter_subpartition_to_oss_interval('traj', '3 days'::interval); | succeeded | 1 row          | 2024-03-10 03:15:00.006494+00 | 2024-03-10 03:15:00.012056+00
    (4 rows)
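
    The cron.job_run_details table grows with every task run. The pg_cron documentation recommends purging old records periodically; for example, you can schedule a daily cleanup job in the postgres database (the task name cleanup-job-run-details and the 7-day retention period are arbitrary choices for illustration):

    ```sql
    -- Every day at 00:00 GMT, delete run records older than 7 days.
    postgres=> SELECT cron.schedule('cleanup-job-run-details', '0 0 * * *',
        $$DELETE FROM cron.job_run_details WHERE end_time < now() - interval '7 days'$$);
    ```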

With this configuration, expired table partitions are automatically archived to cold storage based on the rules that you define. The archived data is stored in OSS, so it no longer occupies disk space, which greatly reduces storage costs. In addition, create, read, update, and delete (CRUD) operations on the archived data remain transparent: applications can continue to run SQL statements against the table without modification.