AnalyticDB for MySQL Edition V3.1.3.3 or later in elastic clusters can separate hot and cold data in tables or partitions.

Definitions of hot and cold data

Cold data is infrequently accessed. Cold data is stored in low-cost HDDs to meet storage requirements.

Hot data is frequently accessed. Hot data is stored in SSDs to meet requirements of high-performance access.

When you execute a CREATE TABLE statement, you can set the storage policy of a table to hot storage, cold storage, or mixed storage. When the hot storage policy is used, the data of all partitions is stored in SSDs. When the cold storage policy is used, the data of all partitions is stored in HDDs. When the mixed storage policy is used, the data of specified partitions is stored in SSDs and the data of the remaining partitions is stored in HDDs. For more information, see CREATE TABLE.

Definitions of hot and cold data

Migration of hot and cold data

Data is originally stored in a real-time engine. After data is built, a storage policy is applied to separately store hot data in SSDs and cold data in HDDs.

Partitions are sorted by size in descending order. The largest N partitions use the hot storage policy and are stored in SSDs. The remaining partitions use the cold storage policy and are stored in HDDs. You can set the value of N when you specify a storage policy. For more information, see CREATE TABLE.

  • When data is added, modified, or deleted, the distribution of hot and cold partitions is adjusted.
  • When the storage policy is changed, the distribution of hot and cold partitions is adjusted. You can change the storage policy of a table in an ALTER TABLE statement. For more information, see ALTER TABLE.
Migrate data from cold partitions to hot partitions
The number of current hot partitions is N. If you want to change the number of hot partitions to an integer M that is greater than N, the data of M-N partitions is migrated from cold partitions to hot partitions.

In the following example, the number of hot partitions is changed from five to six. As a result, the data of the largest partition 20201104 is migrated from cold partitions to hot partitions.

Migrate data from cold partitions to hot partitions
Migrate data from hot partitions to cold partitions
The number of current hot partitions is N. When data is inserted into new partitions, all partitions are sorted so that only N hot partitions exist. If you want to change the number of hot partitions to an integer M that is less than N, the data of N-M partitions is migrated from hot partitions to cold partitions.

In the following example, data is inserted into a new partition 20201110, and this partition is the largest one among all partitions. As a result, the data of the smallest partition 20201105 is migrated from hot partitions to cold partitions and the data of the largest partition 20201110 is migrated to hot partitions.

Migrate data from hot partitions to cold partitions

Diagnostics tables for hot and cold data storage

AnalyticDB for MySQL Edition V3.1.3.5 or later in elastic mode for Cluster Edition can separate hot and cold data. You can query the distribution of hot and cold data storage in a table by querying diagnostics tables. If you change the storage policy in an ALTER TABLE statement, you can also query the change progress of the storage policy by querying diagnostics tables.
Query the distribution of hot and cold data storage
The table_usage table can be used to query the distribution of hot and cold data storage.
  • Execute the following statement to query the distribution of hot and cold data storage for all tables:
    select * from information_schema.table_usage
  • Execute the following statement to query the distribution of hot and cold data storage for a single table:
    select * from information_schema.table_usage where table_schema='$schema_name' and table_name='$table_name'
The following table describes the fields involved in the table_usage table.
Field Description
table_schema The database name.
table_name The table name.
storage_policy The current storage policy. Valid values:
  • HOT
  • COLD
  • MIXED
hot_partition_count The number of current hot partitions.
cold_partition_count The number of current cold partitions.
rt_total_size The total volume of real-time data, which is the sum of rt_data_size and rt_index_size. Unit: byte.
rt_data_size The volume of real-time data. Unit: byte.
rt_index_size The volume of primary key data in real-time data. Unit: byte.
hot_total_size The total volume of data in hot partitions, which is the sum of hot_data_size and hot_index_size. Unit: byte.
hot_data_size The volume of data in hot partitions. Unit: byte.
hot_index_size The volume of primary key and index data in hot partitions. Unit: byte.
cold_total_size The total volume of data in cold partitions, which is the sum of cold_data_size and cold_index_size. Unit: byte.
cold_data_size The volume of data in cold partitions. Unit: byte.
cold_index_size The volume of primary key and index data in cold partitions. Unit: byte.
Note:
  • If the values of hot_total_size and cold_total_size are both 0 after data is loaded, data is still stored in the real-time engine, and the value of rt_total_size is the actual data volume. You can execute the following BUILD statement to convert real-time data to partitioned data and then query the values of hot_total_size and cold_total_size:

    build table $table_name
  • A user-defined hot_partition_count field indicates the number of hot partitions in a single shard after list partitioning, whereas the hot_partition_count field involved in the table_usage table indicates the number of hot partitions after the shards are unioned. If data partitions are differently distributed across shards, the value of the involved hot_partition_count field may be greater than that of the user-defined hot_partition_count field.
    For example, Table A contains Shard 1 and Shard 2, and the value of hot_partition_count is set to 2. The following figure shows the distribution of data partitions in Table A. shard

    Shard 1: P4 and P5 are hot partitions. P1, P2, and P3 are cold partitions.

    Shard 2: P3 and P4 are hot partitions. P1 and P2 are cold partitions.

    The actual number of hot partitions is calculated by using the following formula: (P4, P5) Union (P3, P4) = (P3, P4, P5). Therefore, the actual value of hot_partition_count is 3.

  • The table_usage table is updated in real time. Values of the rt_total_size, rt_data_size, rt_index_size, hot_total_size, hot_data_size, hot_index_size, cold_total_size, cold_data_size, and cold_index_size fields vary based on the execution of INSERT, UPDATE, DELETE, and BUILD statements.
Query the change progress of the storage policy
You can change the storage policy in an ALTER TABLE statement. For more information, see ALTER TABLE. The storage_policy_modify_progress table can be used to query the change progress of the storage policy.
  • Execute the following statement to query the change progress of the storage policy for all tables involved in the current cluster:
    select * from information_schema.storage_policy_modify_progress
  • Execute the following statement to query the change progress of the storage policy for a single table:
    select * from information_schema.storage_policy_modify_progress where table_schema='$schema_name' and table_name='$table_name'
The following table describes the fields involved in the storage_policy_modify_progress table.
Field Description
table_schema The database name.
table_name The table name.
task_id The ID of the storage policy change task.
source_storage_policy The original storage policy.
  • HOT
  • COLD
  • MIXED
source_hot_partition_count The original number of hot partitions.
dest_storage_policy The new storage policy.
  • HOT
  • COLD
  • MIXED
dest_hot_partition_count The new number of hot partitions.
hot_to_cold_partition_count The number of partitions that are changed from hot storage to cold storage.
cold_to_hot_partition_count The number of partitions that are changed from cold storage to hot storage.
hot_to_cold_data_size The volume of data that is changed from hot storage to cold storage. Unit: byte.
cold_to_hot_data_size The volume of data that is changed from cold storage to hot storage. Unit: byte.
hot_data_size_before_change The volume of hot data before the storage policy change. Unit: byte.
cold_data_size_before_change The volume of cold data before the storage policy change. Unit: byte.
hot_data_size_after_change The volume of hot data after the storage policy change. Unit: byte.
cold_data_size_after_change The volume of cold data after the storage policy change. Unit: byte.
start_time The start of the time range for the storage policy change.
update_time The end of the time range for the storage policy change.
progress The change progress of the storage policy. Unit: %.
status The state of the storage policy change. Valid values:
  • INIT: The storage policy change does not start.
  • RUNNING: The storage policy change is in progress.
  • FINISH: The storage policy change is complete.