All Products
Search
Document Center

AnalyticDB for MySQL:Separation of hot and cold data storage

Last Updated:Jan 23, 2024

AnalyticDB for MySQL V3.1.3.3 or later in elastic mode for Cluster Edition can separate hot and cold data stored in tables or partitions.

Prerequisites

An AnalyticDB for MySQL cluster must meet the following requirements:

  • The cluster is in elastic mode for Cluster Edition.

  • The minor engine version of the cluster is 3.1.3.3 or later.

    Note
    • For information about how to view the version of an AnalyticDB for MySQL cluster, see How can I view the version of an AnalyticDB for MySQL cluster?

    • To update the version of an AnalyticDB for MySQL cluster, submit a ticket and provide a time frame for the update. An engine version update may take about 20 minutes to complete. During the update, your cluster can still run normally but may be interrupted several times. We recommend that you update the engine version of your cluster during off-peak hours and make sure that your application can be automatically reconnected to your database system.

Billing rules

When you use a cluster, you are charged for the storage of hot and cold data based on the pay-as-you-go billing method. For more information, see Billable items of Data Warehouse Edition (V3.0) and Pricing for Data Warehouse Edition (V3.0).

Storage policies

AnalyticDB for MySQL provides three policies for hot and cold data storage, which are cold storage, hot storage, and mixed storage.

  • Cold storage is a cost-effective storage policy. When this storage policy is used, all data is stored in Object Storage Service (OSS).

  • Hot storage is a storage policy that can meet high access performance requirements. When this storage policy is used, all data is stored in SSDs.

  • Mixed storage is a storage policy that allows specified partitions to be stored in SSDs and other partitions to be stored in OSS.

image.png

Specify a storage policy for hot and cold data

When you execute the CREATE TABLE statement to create a table, you can use the storage_policy parameter to specify a storage policy for hot and cold data of the table.

To modify the storage policy for hot and cold data of an existing table, execute the ALTER TABLE table_name storage_policy; statement. For more information, see the "Change the hot and cold data storage policy of a table" section of the ALTER TABLE topic.

Principles of mixed storage

When the mixed storage policy is used, you must specify the number of hot partitions by using the hot_partition_count parameter. For more information about hot_partition_count, see CREATE TABLE.

Partitions are sorted by partition key value in descending order. If the hot_partition_count parameter is set to N, the first N partitions stored on SSDs are hot partitions and the others stored on OSS are cold partitions.

For example, assume that the number of hot partitions is set to 4, and all partitions are sorted by partition key value in the following way: 20201110, 20201109, 20201108, 20201107, 20201106, 20201105, and 20201104. In this case, the first four partitions are specified as hot partitions and the others are cold partitions.

The distribution of hot and cold partitions may be changed in the following scenarios:

Impact of data changes on the distribution of hot and cold partitions

When a new partition is inserted, all partitions are sorted again so that only N hot partitions exist.

In the following example, a new partition 20201110 is inserted into a table. This partition has the largest partition key value among all partitions of the table. As a result, the data of partition 20201105 that has the smallest partition key value is migrated from hot partitions to cold partitions and the data of partition 20201110 is migrated to hot partitions.热冷迁移

Impact of hot partition quantity changes on the distribution of hot and cold partitions

Assume that the number of hot partitions is N and you want to change the number of hot partitions to M.

  • If M is greater than N, the data of M-N partitions is migrated from cold partitions to hot partitions.

    In the following example, the number of hot partitions is changed from five to six. As a result, the data of partition 20201104 that has the largest partition key value is migrated from cold partitions to hot partitions.冷热迁移

  • If M is less than N, the data of N-M partitions is migrated from hot partitions to cold partitions.

    For example, the number of hot partitions is changed from five to four. As a result, the data of the partition that has the smallest partition key value is migrated from hot partitions to cold partitions.

Query the distribution of hot and cold data storage

You can use the table_usage table to query the distribution of hot and cold data storage. Examples:

  • Query the distribution of hot and cold data storage in all tables.

    select * from information_schema.table_usage;
  • Query the distribution of hot and cold data storage in a specific table.

    select * from information_schema.table_usage where table_schema='<schema_name>' and table_name='<table_name>';

The following table describes the fields involved in the table_usage table.

Field

Description

table_schema

The database name.

table_name

The table name.

storage_policy

The storage policy. Valid values:

  • HOT

  • COLD

  • MIXED

hot_partition_count

The number of hot partitions.

cold_partition_count

The number of cold partitions.

rt_total_size

The total volume of real-time data, which is the sum of rt_data_size and rt_index_size. Unit: bytes.

rt_data_size

The volume of real-time data. Unit: bytes.

rt_index_size

The volume of primary key and index data in real-time data. Unit: bytes.

hot_total_size

The total volume of data in hot partitions, which is the sum of hot_data_size and hot_index_size. Unit: bytes.

hot_data_size

The volume of data in hot partitions. Unit: bytes.

hot_index_size

The volume of primary key and index data in hot partitions. Unit: bytes.

cold_total_size

The total volume of data in cold partitions, which is the sum of cold_data_size and cold_index_size. Unit: bytes.

cold_data_size

The volume of data in cold partitions. Unit: bytes.

cold_index_size

The volume of primary key and index data in cold partitions. Unit: bytes.

Note:

  • The table_usage table is updated in real time. Values of the rt_total_size, rt_data_size, rt_index_size, hot_total_size, hot_data_size, hot_index_size, cold_total_size, cold_data_size, and cold_index_size fields vary based on the execution of INSERT, UPDATE, DELETE, and BUILD statements.

  • If the values of hot_total_size and cold_total_size are both 0 after data is loaded, data is still stored in the real-time engine and the value of rt_total_size indicates the volume of real-time data. You can execute the following BUILD statement to convert real-time data to partitioned data and then query the values of hot_total_size and cold_total_size:

    build table <table_name>;
  • A user-defined hot_partition_count field indicates the number of hot partitions in a single shard after list partitioning, whereas the hot_partition_count field queried from the table_usage table indicates the number of hot partitions after shards are unioned. If data partitions are differently distributed across shards, the value of the hot_partition_count field queried from the table_usage table may be greater than that of the user-defined hot_partition_count field.

    For example, assume that Table A contains Shard 1 and Shard 2, and hot_partition_count is set to 2. The following figure shows the distribution of data partitions in Table A.shard

    Shard 1: P4 and P5 are hot partitions. P1, P2, and P3 are cold partitions.

    Shard 2: P3 and P4 are hot partitions. P1 and P2 are cold partitions.

    The actual number of hot partitions is calculated by using the following formula: (P4, P5) Union (P3, P4) = (P3, P4, P5). Therefore, the actual value of hot_partition_count is 3.

Query the change progress of the storage policy

You can change the storage policy in an ALTER TABLE statement. For more information, see ALTER TABLE. The storage_policy_modify_progress table can be used to query the change progress of the storage policy.

  • Query the change progress of the storage policy for all tables involved in the current cluster.

    select * from information_schema.storage_policy_modify_progress;
  • Query the change progress of the storage policy for a specific table.

    select * from information_schema.storage_policy_modify_progress where table_schema='<schema_name>' and table_name='<table_name>';

The following table describes the fields involved in the storage_policy_modify_progress table.

Field

Description

table_schema

The database name.

table_name

The table name.

task_id

The ID of the storage policy change job.

source_storage_policy

The original storage policy. Valid values:

  • HOT

  • COLD

  • MIXED

source_hot_partition_count

The number of original hot partitions.

dest_storage_policy

The new storage policy. Valid values:

  • HOT

  • COLD

  • MIXED

dest_hot_partition_count

The number of new hot partitions.

hot_to_cold_partition_count

The number of partitions that are changed from hot storage to cold storage.

cold_to_hot_partition_count

The number of partitions that are changed from cold storage to hot storage.

hot_to_cold_data_size

The volume of data that is changed from hot storage to cold storage. Unit: bytes.

cold_to_hot_data_size

The volume of data that is changed from cold storage to hot storage. Unit: bytes.

hot_data_size_before_change

The volume of hot data before the storage policy is changed. Unit: bytes.

cold_data_size_before_change

The volume of cold data before the storage policy is changed. Unit: bytes.

hot_data_size_after_change

The volume of hot data after the storage policy is changed. Unit: bytes.

cold_data_size_after_change

The volume of cold data after the storage policy is changed. Unit: bytes.

start_time

The start of the time range within which the storage policy is changed.

update_time

The end of the time range within which the storage policy is changed.

progress

The change progress of the storage policy. Unit: %.

status

The change state of the storage policy. Valid values:

  • INIT: No change starts.

  • RUNNING: The change is in progress.

  • FINISH: The change is complete.