All Products
Search
Document Center

Hologres:Tiered storage of hot data and cold data

Last Updated:Nov 16, 2023

Hologres V1.3.37 and later support tiered storage of hot data and cold data to help customers manage data in a scientific manner. This topic describes how to use tiered storage of hot data and cold data in Hologres.

Feature description

The data storage modes of Hologres include standard storage and Infrequent Access (IA) storage.

  • Standard storage is called solid state drive (SSD) hot storage. This is the default storage mode that Hologres uses, and this storage mode meets the requirements for low-latency and high-performance data access. Standard storage is the most efficient and cost-effective choice in most scenarios.

  • IA storage is called hard disk drive (HDD) cold storage. This storage mode meets the requirements for cost-effective storage of IA data. IA storage is the best choice for ultra-large datasets that are insensitive to latency or that are infrequently accessed.

  • If you store data in primary and secondary instances, we recommend that you upgrade your Hologres instances to V1.3.55 or later and use HDD cold storage.

Hologres also supports tiered storage of hot data and cold data in a table. You can use the dynamic partitioning feature to configure the storage mode for child tables of a table. For more information about the dynamic partitioning feature, see Dynamic partitioning.

Hologres V2.1 and later support SSD cache-based acceleration to accelerate access to data in cold storage. Currently, access to data only in the default cache space can be accelerated. Cache acceleration is enabled by default, and the access performance is expected to be improved by more than twice.

Prerequisites

  • Only Hologres V1.3.37 and later support tiered storage of hot data and cold data. If the version of your Hologres instance is earlier than V1.3.37, manually upgrade your Hologres instance in the Hologres console or join a Hologres DingTalk group to apply for an instance upgrade. For more information about how to manually upgrade a Hologres instance, see Instance upgrades. For more information about how to join a Hologres DingTalk group, see Obtain online support for Hologres.

  • For an instance that uses the subscription billing method, the default quota for IA storage of the instance is 0 GB after you upgrade the instance to V1.3.37.

Usage notes

  • A Hologres table consists of user data and metadata. In tiered storage of hot data and cold data, you can set the storage mode only for user data. Metadata must be stored in the hot storage tier to ensure performance. Even if the storage mode is set to cold storage for the table, metadata is stored in the hot storage tier. This incurs a small amount of hot storage fees.

  • A low priority is configured for hot data and cold data migration tasks to minimize the impact on user services. After you change the storage mode for user data, the configuration does not immediately take effect. Hologres migrates data at the backend in asynchronous mode. The time required to migrate the data increases with the data volume of the table.

  • When you write new data to the cold storage tier of the table, Hologres first writes the data to the hot storage tier, and then migrates the data to the cold storage tier by using an asynchronous task at the backend. This incurs a certain amount of hot storage fees.

  • Due to the limit of the disk seek time, we recommend that you do not set the storage mode to cold storage if point queries are performed on the tables. For example, we recommend that you do not apply cold storage to Hologres dimension tables in Realtime Compute for Apache Flink and to serving tables. The number of table rows that Hologres reads from the cold storage tier per second is 100 times lower than that Hologres reads from the hot storage tier per second.

  • We recommend that you do not set the storage mode to cold storage for row-oriented tables.

Configure tiered storage of hot data and cold data for a table

When you create a table, you can configure the storage_mode parameter in the SET_TABLE_PROPERTY statement to specify a storage mode for the table. For more information, see Overview.

Non-partitioned table

Specify a storage mode for a non-partitioned table

You can configure the storage_mode parameter in the SET_TABLE_PROPERTY statement to specify a storage mode for a non-partitioned table. Valid values of this parameter:

  • hot

  • cold

For example, you can create a table named tbl1 and set the storage mode of the table to cold storage. Sample statements:

-- Specify the storage mode as cold storage when you create the table.
BEGIN;
CREATE TABLE tbl1 (
 "id" int NOT NULL,
 "name" text NOT NULL
);
CALL set_table_property('tbl1', 'storage_mode', 'cold');
COMMIT;

Change the storage mode of the non-partitioned table

Hologres allows you to change the storage mode after you create the table. For example, you can change the storage mode of the tbl1 table to hot storage. The system asynchronously migrates data to the hot storage tier. Sample statements:

-- Change the storage mode to hot storage after you create the table.
CALL set_table_property('tbl1', 'storage_mode', 'hot');

Partitioned table

Specify a storage mode for a partitioned table

You can specify the SET_TABLE_PROPERTY parameter to configure the storage mode for a partitioned table. Child tables inherit the storage mode of their parent tables by default. You can also specify storage modes for the child tables separately. Valid values:

  • hot

  • cold

For example, you can set the storage mode of the parent table named tbl2 to cold storage. When you write data to the child table named tbl2_v1 of the parent table, the data is written to the cold storage tier. Sample statements:

-- Specify the storage mode as cold storage when you create the table.
BEGIN;
CREATE TABLE tbl2(
  c1 text not null,
  c2 text
)
PARTITION BY LIST(c2);
CALL set_table_property('tbl2', 'storage_mode', 'cold');
create table tbl2_v1 partition of tbl2 for values in('v1');
COMMIT;

Change the storage mode of the partitioned table

Hologres allows you to change the storage mode after you create the table. For example, you can change the storage mode of the tbl2 parent table to hot storage. The system asynchronously migrates data in all child tables of this parent table to the hot storage tier. Sample statements:

-- Change the storage mode to hot storage after you create the table.
CALL set_table_property('tbl2', 'storage_mode', 'hot');

For example, you can change the storage mode of a child table to cold storage by setting the storage_mode parameter to cold for the child table. Sample statements:

-- Create child tables named tbl2_v2 and tbl2_v3 for the tbl2 parent table.

create table tbl2_v2 partition of tbl2 for values in('v2');
create table tbl2_v3 partition of tbl2 for values in('v3');

-- Query the storage mode of the child tables. The child tables inherit the storage mode of their parent table by default.

SELECT * from hg_table_storage_status('public', 'tbl2');

-- Change the storage mode of the tbl2_v3 child table to cold storage.

CALL set_table_property('tbl2_v3', 'storage_mode', 'cold');
                

Dynamically manage partitioned tables

Hologres uses tiered storage of hot data and cold data to manage data in a partitioned table. This achieves a better balance between cost-effectiveness and performance. In most cases, Hologres retains the N partitions that are created most recently in the hot storage tier for frequent access, retains the M partitions that were generated before the N partitions in the cold storage tier to save costs, and deletes earlier partitions. The dynamic management feature in the preceding scenario can be implemented by using a combination of dynamic partitioning and tiered storage of hot data and cold data. For more information about the dynamic partitioning feature, see Dynamic partitioning.

Create a partitioned table and configure a storage mode for the table

Create a partitioned table. The default storage mode of the table is hot storage. Create a partition every day. Configure a storage rule to retain the most recent seven partitions in the hot storage tier, retain the other 23 partitions that were generated in the last 30 days in the cold storage tier, and delete earlier partitions. The following sample code shows the configuration of the partitioned table:

BEGIN;
CREATE TABLE tbl2(
  c1 text not null,
  c2 text
)
PARTITION BY LIST(c2);
CALL set_table_property('tbl2', 'auto_partitioning.enable', 'true');
CALL set_table_property('tbl2', 'auto_partitioning.time_unit', 'DAY');
CALL set_table_property('tbl2', 'auto_partitioning.num_precreate', '3');
CALL set_table_property('tbl2', 'auto_partitioning.num_hot', '7');
CALL set_table_property('tbl2', 'auto_partitioning.num_retention', '30');
COMMIT;

The following figure shows the configuration result.效果

Change the storage mode of the partitioned table

To change the number of partitions in the hot storage tier, change the value of the auto_partitioning.num_hot parameter. After the change is complete, child tables that are already stored in the cold storage tier are not immediately migrated to the hot storage tier. The following sample code shows the DDL statements if the current date is July 1, 2022:

BEGIN;
CREATE TABLE tbl_p(
  c1 text not null,
  c2 text
)
PARTITION BY LIST(c2);
CALL set_table_property('tbl_p', 'auto_partitioning.enable', 'true');
CALL set_table_property('tbl_p', 'auto_partitioning.time_unit', 'DAY');
CALL set_table_property('tbl_p', 'auto_partitioning.num_precreate', '3');
CALL set_table_property('tbl_p', 'auto_partitioning.num_hot', '3');
CALL set_table_property('tbl_p', 'auto_partitioning.num_retention', '10');
COMMIT;

Partition storage may be changed in the following scenarios:

  • Scenario 1: Increase the number of partitions in the hot storage tier

    To increase the number of partitions in the hot storage tier to 4, execute the following statement:

    CALL set_table_property('tbl_p', 'auto_partitioning.num_hot', '4');

    Child tables that are already stored in the cold storage tier are not immediately migrated to the hot storage tier. The following figure shows the configuration result.示例

  • Scenario 2: Decrease the number of partitions in the hot storage tier

    To decrease the number of partitions in the hot storage tier to 2, execute the following statement:

    CALL set_table_property('tbl_p', 'auto_partitioning.num_hot', '2');

    The data that needs to be migrated from the hot storage tier to the cold storage tier is immediately migrated. The following figure shows the configuration result.示例

Retain the storage mode of specific partitions

To retain the storage mode of specific partitions, set the keep_storage parameter to true for the partitions. In most cases, you can retain a partition in the hot storage tier or migrate a partition that is stored in the cold storage tier to the hot storage tier. For example, you can execute the following statements to retain the data generated on November 11, 2021 in a partition in the hot storage tier to ensure query performance:

call set_table_property('tbl1_20211111', 'keep_storage', 'true');
call set_table_property('tbl1_20211111', 'storage_mode', 'hot');

To restore dynamic management of the partition, execute the following statement:

call set_table_property('tbl1_20211111', 'keep_storage', 'false');

To query the partitions that are retained in the database, execute the following statement:

SELECT
  table_namespace as schema_name
  ,table_name
FROM hologres.hg_table_properties
WHERE property_key='keep_storage'
AND property_value='true';
Note
  • The keep_storage parameter must be configured prior to the storage_mode parameter.

  • If the auto_partitioning.num_retention parameter is configured for the parent table, the keep_storage parameter for a child table does not prevent the corresponding partition from being deleted. If you want to retain the partition, set the keep_alive parameter to true for the child table. For example, if you want to retain the data generated on November 11, 2021 in a partition in the hot storage tier, execute the following statements:

    BEGIN;
    CALL set_table_property('tbl1_20211111', 'keep_alive', 'true');
    CALL set_table_property('tbl1_20211111', 'keep_storage', 'true');
    CALL set_table_property('tbl1_20211111', 'storage_mode', 'hot');
    COMMIT;

Convert a partitioned table that uses the cold storage mode into a partitioned table that has dynamic partitions

In Hologres, you can convert a partitioned table that uses the cold storage mode into a partitioned table that contains dynamic partitions. You can also retain the data generated in the previous seven days in partitions in the hot storage tier.

Initial configuration

-- Create a partitioned table and set the storage mode of the table to cold storage.
BEGIN;
CREATE TABLE tbl2(	
  c1 text not null, 
  c2 text
)
PARTITION BY LIST(c2);
CALL set_table_property('tbl2', 'storage_mode', 'cold');
create table tbl2_20230808 partition of tbl2 for values in('20230808');
create table tbl2_20230809 partition of tbl2 for values in('20230809');
create table tbl2_20230810 partition of tbl2 for values in('20230810');
... ...
create table tbl2_20230817 partition of tbl2 for values in('20230817');
COMMIT;

Convert the table partitions into dynamic partitions and retain the data generated in the previous seven days in partitions in the hot storage tier

begin;
CALL set_table_property('tbl2', 'storage_mode', 'hot'); -- Set the storage mode of the parent table to hot storage.
CALL set_table_property('tbl2_20230810', 'storage_mode', 'cold'); -- Set the storage mode of partitions that do not require storage mode conversion to cold storage.
CALL set_table_property('tbl2_20230809', 'storage_mode', 'cold');
CALL set_table_property('tbl2_20230808', 'storage_mode', 'cold');
CALL set_table_property('tbl2', 'auto_partitioning.enable', 'true');
CALL set_table_property('tbl2', 'auto_partitioning.time_unit', 'DAY');
CALL set_table_property('tbl2', 'auto_partitioning.num_precreate', '3');
CALL set_table_property('tbl2', 'auto_partitioning.num_hot', '7');
CALL set_table_property('tbl2', 'auto_partitioning.num_retention', '10');
commit;

Query the storage status of data in a table

You can call the hg_table_storage_status function to query the storage status of data in a table. The returned result of the hg_table_storage_status function includes the storage size of user data, not the storage size of metadata. Sample statements:

SELECT * from hg_table_storage_status('<schema_name>', '<table_name>');

Parameter

Description

schema_name

The name of the schema to which the table belongs.

table_name

The name of the table.

The following table describes columns in the result returned by the function.

Column

Description

table_name

  • For a non-partitioned table, the name of the table is returned.

  • For a parent table, the names of its child tables are returned in separate lines. Each line represents a child table.

  • For a child table, the name of the child table is returned.

hot_size

The size of data in the hot storage tier. Unit: bytes.

cold_size

The size of data in the cold storage tier. Unit: bytes.

status

The storage status of data. Valid values:

  • hot: The data is stored in the hot storage tier.

  • cold: The data is stored in the cold storage tier.

  • transferring: The data is being migrated.

Sample statements:

-- Query the storage status of data in a non-partitioned table.
SELECT * from hg_table_storage_status('public', 'tbl1');-- The returned data size is in bytes.
 table_name | hot_size |   cold_size   | status
------------+----------+---------------+--------
 tbl1       |  145643  |      3685     | transferring

-- Query the storage status of data in a partitioned table.
SELECT * from hg_table_storage_status('public', 'tbl2');-- The returned data size is in bytes.
   table_name    | hot_size | cold_size | status
-----------------+----------+-----------+--------
 tbl2_2022062222 |        0 |         0 | hot
 tbl2_2022062221 |     1125 |         0 | hot
 tbl2_2022062220 |     1245 |         0 | hot
 tbl2_2022062219 |     1358 |         0 | hot
 tbl2_2022062218 |        0 |      1875 | cold
 tbl2_2022062217 |        0 |      1143 | cold
 tbl2_2022062216 |        0 |      1299 | cold

Query the access frequency of table data

Hologres V1.3.37 and later collect statistics on tables in your Hologres instance on a daily basis and store the statistics in the system table named hologres.hg_table_info. This way, you can query and analyze the statistics on your tables and take optimization measures based on the results of statistical analysis. For more information, see Query and analyze table statistics. You can query this system table to obtain information such as the volumes of hot data and cold data stored in the tables, the access frequency of table data, and the access frequency of partition data. Then, you can determine whether to migrate hot data and cold data based on the obtained information. The following sample code provides examples on how to query statistics on tables in an instance by using this system table.

Non-partitioned table

select a.table_name,
       (a.total_read_count - b.total_read_count) as read_count,
       (a.total_write_count - b.total_write_count) as write_count,
       a.hot_storage_size
from (Select * from hologres.hg_table_info
      where type='TABLE' and collect_time::DATE = CURRENT_DATE - interval '1 day') a
join
    (Select * from hologres.hg_table_info
    where type='TABLE' and collect_time::DATE = CURRENT_DATE - interval '${days} day') b
on a.table_name = b.table_name
order by hot_storage_size DESC;

Partitioned table

Query the access frequency of all partitions:

select parent_table_name,count(*) as partition_cnt,
sum(hot_storage_size)/1024/1024/1024 as hot_size_gb
from hologres.hg_table_info
where type = 'PARTITION' and collect_time::DATE = CURRENT_DATE - interval '1 day'
group by parent_table_name
order by hot_size_gb desc;

Query the access frequency of the specified partitions:

select a.table_name,
(a.total_read_count - b.total_read_count) as read_count,
(a.total_write_count - b.total_write_count) as write_count,
a.hot_storage_size
from (select *
      from hologres.hg_table_info
      where type = 'PARTITION'
      and parent_table_name = '${p_table_name}'
      and collect_time::DATE = CURRENT_DATE - interval '1 day') a
join
(select *
     from hologres.hg_table_info
     where type = 'PARTITION'
     and parent_table_name = '${p_table_name}'
     and collect_time::DATE = CURRENT_DATE - interval '${days} day') b
on a.table_name = b.table_name
order by table_name desc;