AnalyticDB for MySQL V3.1.3.3 or later in Elastic mode for Cluster Edition can separate hot and cold data stored in tables or partitions.

Prerequisites

An AnalyticDB for MySQL cluster must meet the following requirements:
  • The edition of the cluster is Elastic mode for Cluster Edition.
  • The minor engine version of the cluster is 3.1.3.3 or later.
    Note
    • For information about how to view the engine version of an AnalyticDB for MySQL cluster, see How can I view the version of an AnalyticDB for MySQL cluster?
    • To update the engine version of a cluster, Submit a ticket with an available time range for the update. An engine version update may take about 20 minutes to complete. During the update, your cluster can still run normally but may be interrupted several times. We recommend that you update the engine version of your cluster during off-peak hours and make sure that your application can be automatically reconnected to your database system.

Storage policies

AnalyticDB for MySQL provides three policies for hot and cold data storage, which are cold storage, hot storage, and mixed storage.
  • Cold storage is a cost-effective storage policy. When this storage policy is used, all data is stored in HDDs.
  • Hot storage is a storage policy that can meet high access performance requirements. When this storage policy is used, all data is stored in SSDs.
  • Mixed storage is a storage policy that allows specified partitions to be stored in SSDs and other partitions to be stored in HDDs.
Definitions of hot and cold data

When you execute the CREATE TABLE statement to create a table, you can use the storage_policy parameter to specify a storage policy for hot and cold data of the table.

Principles of mixed storage

When the mixed storage policy is used, you must specify the number of hot partitions by using the hot_partition_count parameter. For more information about hot_partition_count, see CREATE TABLE.

Partitions are sorted by partition key value in descending order. If the hot_partition_count parameter is set to N, the first N partitions stored on SSDs are hot partitions and the others stored on HDDs are cold partitions.

For example, assume that the number of hot partitions is set to 4, and all partitions are sorted by partition key value in the following way: 20201110, 20201109, 20201108, 20201107, 20201106, 20201105, and 20201104. In this case, the first four partitions are specified as hot partitions and the others are cold partitions.

The distribution of hot and cold partitions may be changed in the following scenarios:

Impact of data changes on the distribution of hot and cold partitions

When a new partition is inserted, all partitions are sorted again so that only N hot partitions exist.

In the following example, a new partition 20201110 is inserted into a table. This partition has the largest partition key value among all partitions of the table. As a result, the data of partition 20201105 that has the smallest partition key value is migrated from hot partitions to cold partitions and the data of partition 20201110 is migrated to hot partitions. Data migration from hot partitions to cold partitions

Impact of hot partition quantity changes on the distribution of hot and cold partitions

Assume that the number of hot partitions is N and you want to change the number of hot partitions to M.
  • If M is greater than N, the data of M-N partitions is migrated from cold partitions to hot partitions.
    In the following example, the number of hot partitions is changed from five to six. As a result, the data of partition 20201104 that has the largest partition key value is migrated from cold partitions to hot partitions. Data migration from cold partitions to hot partitions
  • If M is less than N, the data of N-M partitions is migrated from hot partitions to cold partitions.

    For example, the number of hot partitions is changed from five to four. As a result, the data of the partition that has the smallest partition key value is migrated from hot partitions to cold partitions.

Query the distribution of hot and cold data storage

You can use the table_usage table to query the distribution of hot and cold data storage. Examples:
  • Query the distribution of hot and cold data storage in all tables.
    select * from information_schema.table_usage
  • Query the distribution of hot and cold data storage in a specific table.
    select * from information_schema.table_usage where table_schema='<schema_name>' and table_name='<table_name>'
The following table describes the fields involved in the table_usage table.
Field Description
table_schema The database name
table_name The table name.
storage_policy The storage policy. Valid values:
  • HOT
  • COLD
  • MIXED
hot_partition_count The number of hot partitions.
cold_partition_count The number of cold partitions.
rt_total_size The total volume of real-time data, which is the sum of rt_data_size and rt_index_size. Unit: bytes.
rt_data_size The volume of real-time data. Unit: bytes.
rt_index_size The volume of primary key and index data in real-time data. Unit: bytes.
hot_total_size The total volume of data in hot partitions, which is the sum of hot_data_size and hot_index_size. Unit: bytes.
hot_data_size The volume of data in hot partitions. Unit: bytes.
hot_index_size The volume of primary key and index data in hot partitions. Unit: bytes.
cold_total_size The total volume of data in cold partitions, which is the sum of cold_data_size and cold_index_size. Unit: bytes.
cold_data_size The volume of data in cold partitions. Unit: bytes.
cold_index_size The volume of primary key and index data in cold partitions. Unit: bytes.
Note:
  • The table_usage table is updated in real time. Values of the rt_total_size, rt_data_size, rt_index_size, hot_total_size, hot_data_size, hot_index_size, cold_total_size, cold_data_size, and cold_index_size fields vary based on the execution of INSERT, UPDATE, DELETE, and BUILD statements.
  • If the values of hot_total_size and cold_total_size are both 0 after data is loaded, data is still stored in the real-time engine and the value of rt_total_size indicates the volume of real-time data. You can execute the following BUILD statement to convert real-time data to partitioned data and then query the values of hot_total_size and cold_total_size:
    build table <table_name>
  • A user-defined hot_partition_count field indicates the number of hot partitions in a single shard after list partitioning, whereas the hot_partition_count field queried from the table_usage table indicates the number of hot partitions after shards are unioned. If data partitions are differently distributed across shards, the value of the hot_partition_count field queried from the table_usage table may be greater than that of the user-defined hot_partition_count field.
    For example, assume that Table A contains Shard 1 and Shard 2, and hot_partition_count is set to 2. The following figure shows the distribution of data partitions in Table A. shard

    Shard 1: P4 and P5 are hot partitions. P1, P2, and P3 are cold partitions.

    Shard 2: P3 and P4 are hot partitions. P1 and P2 are cold partitions.

    The actual number of hot partitions is calculated by using the following formula: (P4, P5) Union (P3, P4) = (P3, P4, P5). Therefore, the actual value of hot_partition_count is 3.

Query the change progress of the storage policy

You can change the storage policy in an ALTER TABLE statement. For more information, see ALTER TABLE. The storage_policy_modify_progress table can be used to query the change progress of the storage policy.
  • Query the change progress of the storage policy for all tables involved in the current cluster.
    select * from information_schema.storage_policy_modify_progress
  • Query the change progress of the storage policy for a specific table.
    select * from information_schema.storage_policy_modify_progress where table_schema='<schema_name>' and table_name='<table_name>'
The following table describes the fields involved in the storage_policy_modify_progress table.
Field Description
table_schema The database name
table_name The table name.
task_id The ID of the storage policy change task.
source_storage_policy The original storage policy. Valid values:
  • HOT
  • COLD
  • MIXED
source_hot_partition_count The number of original hot partitions.
dest_storage_policy The new storage policy. Valid values:
  • HOT
  • COLD
  • MIXED
dest_hot_partition_count The number of new hot partitions.
hot_to_cold_partition_count The number of partitions that are changed from hot storage to cold storage.
cold_to_hot_partition_count The number of partitions that are changed from cold storage to hot storage.
hot_to_cold_data_size The volume of data that is changed from hot storage to cold storage. Unit: bytes.
cold_to_hot_data_size The volume of data that is changed from cold storage to hot storage. Unit: bytes.
hot_data_size_before_change The volume of hot data before the storage policy is changed. Unit: bytes.
cold_data_size_before_change The volume of cold data before the storage policy is changed. Unit: bytes.
hot_data_size_after_change The volume of hot data after the storage policy is changed. Unit: bytes.
cold_data_size_after_change The volume of cold data after the storage policy is changed. Unit: bytes.
start_time The start of the time range within which the storage policy is changed.
update_time The end of the time range within which the storage policy is changed.
progress The change progress of the storage policy. Unit: %.
status The change state of the storage policy. Valid values:
  • INIT: No change starts.
  • RUNNING: The change is in progress.
  • FINISH: The change is complete.