PolarDB-X 2.0 allows you to manage the cleanup jobs for expired data in time to live (TTL) tables. You can monitor the progress of and information about cleanup jobs and change the cleanup speed and cleanup range based on actual conditions to balance cleanup jobs and running business.
Prerequisites
A PolarDB-X instance that meets one of the following requirements is created:
The engine version of the instance is MySQL 5.7 and the instance version is polardb-2.4.0_5.4.19-20240927_xcluster5.4.19-20240920 or later.
The engine version of the instance is MySQL 8.0 and the instance version is polardb-2.4.0_5.4.19-20240927_xcluster8.4.19-20240924 or later.
NoteFor more information about version number, see Release notes.
For more information about how to check the instance version, see View and update the version of an instance.
A partitioned table that belongs to a database in AUTO mode and is not partitioned by using the
LOCAL PARTITION BYclause is used.
Usage notes
When a cleanup job is triggered for a TTL table, the rows whose time column value is NULL are deleted by default. To retain these rows, explicitly set the TTL_FILTER attribute.
Start a cleanup job
To start a cleanup job for expired data in TTL tables, you can use one of the following methods:
Scheduled start: Configure the TTL_JOB attribute to allow the system to automatically start a cleanup job as scheduled.
Manual start: Manually execute the
ALTER TABLE CLEANUP EXPIRED DATAstatement to start a cleanup job.
The system allows cleanup jobs to simultaneously run on up to two TTL tables. Other cleanup jobs must be queued for execution.
Usage
Scheduled start
Examples:
ALTER TABLE `my_ttl_tbl` MODIFY TTL SET TTL_ENABLE = 'ON';NoteThe scheduled cleanup job of the
my_ttl_tblTTL table is enabled. By default, the cleanup job is disabled.ALTER TABLE `my_ttl_tbl` MODIFY TTL SET TTL_JOB = CRON '0 0 2 */1 * ? *' TIMEZONE '+08:00';NoteThe schedule on which the cleanup job of the
my_ttl_tblTTL table runs is modified.The schedule specified by the TTL_JOB attribute is an expected time value. The actual execution time of a cleanup job varies based on whether preceding jobs are complete. If cleanup jobs of multiple TTL tables are running and queued, the cleanup job must wait for the running and queued cleanup jobs to complete.
Manual start
Example:
/*+TDDL:cmd_extra(ENABLE_ASYNC_DDL=true, PURE_ASYNC_DDL_MODE=true)*/ ALTER TABLE `my_ttl_tbl` CLEANUP EXPIRED DATA;Note/*+TDDL:cmd_extra(ENABLE_ASYNC_DDL=true, PURE_ASYNC_DDL_MODE=true)*/is used to configure the DDL statement to be asynchronously executed in the background. You do not need to wait for the result to return in the current session.The
ALTER TABLE CLEANUP EXPIRED DATAstatement is supported only for TTL tables.
Calculate the cleanup time range
A TTL table uses an earliest-to-most recent batch data cleanup algorithm to remove data. The cleanup process starts with the earliest data and removes data within a fixed time interval during each cycle.
Execute the following statement to modify the data retention period of the my_ttl_tbl table in its TTL definition:
ALTER TABLE `my_ttl_tbl`
MODIFY TTL
SET
TTL_EXPR = `time` EXPIRE AFTER 1 MONTH TIMEZONE '+08:00'The data retention period of the table is one month.
The cleanup time range for a cleanup job is three times the data retention period. In this example, the cleanup time range is three months.
For example, the current time is 2023-10-01 and an online table needs to retain only the data of the most recent month. The following figure shows the cleanup process.
Day 1:
In this example, the minimum time value of the TTL-defined time column is 2022-10-05(MinValue), and the data cleanup time interval (CleanupDataInterval) is 3 months. For more information about the data cleanup time interval, see Modify the data cleanup time interval. Based on the minimum time value and the data cleanup time interval, the first cleanup time range is from October 5, 2022 (inclusive) to January 1, 2023 (exclusive). The subsequent cleanup time ranges are from January 1, 2023 (inclusive) to April 1, 2023 (exclusive), from April 1, 2023 (inclusive) to July 1, 2023 (exclusive) and from July 1, 2023 (inclusive) to September 1, 2023 (exclusive).
Day 2:
Cleans up data in the time range of January 1, 2023 (inclusive) to April 1, 2023 (exclusive). April 1, 2023 is the upper boundary (CleanupUpperBound) of the cleanup job.
Day 3:
Cleans up data in the time range of April 1, 2023 (inclusive) to July 1, 2023 (exclusive). July 1, 2023 is the upper boundary (CleanupUpperBound) of the cleanup job.
Day 4:
Cleans up data in the time range of July 1, 2023 (inclusive) to September 1, 2023 (exclusive). The time span is 2 months. In this example, the current date is October 1, 2023, and the TTL table is configured to retain data for the most recent month (ExpiredDataInterval). Therefore, the cleanup job cleans up data earlier than September 1, 2023, which is upper boundary (CleanupUpperBound) of the cleanup job.
The upper boundary (CleanupUpperBound) of each cleanup job can be calculated by using the following formula: CleanupUpperBound = Min (MinValue + CleanupDataInterval, ExpiredDataInterval).
Query the progress of a cleanup job
Query definitions of TTL tables
Execute the following statement to query the definitions and enabling statuses of the cleanup jobs for all TTL tables in the current database from the INFORMATION_SCHEMA.TTL_INFO view:
SELECT *
FROM INFORMATION_SCHEMA.TTL_INFO
WHERE TABLE_SCHEMA='ttldb' AND TABLE_NAME = 'my_ttl_tbl';*************************** 1. row ***************************
TABLE_SCHEMA: ttldb
TABLE_NAME: my_ttl_tbl
TTL_ENABLE: OFF
TTL_COL: date_field
TTL_EXPR: `date_field` EXPIRE AFTER 2 MONTH TIMEZONE '+08:00'
TTL_CRON: 0 0 1 * * ? *
ARCHIVE_TYPE:
ARCHIVE_TABLE_SCHEMA: NULL
ARCHIVE_TABLE_NAME: NULL
ARCHIVE_TABLE_PRE_ALLOCATE: 12
ARCHIVE_TABLE_POST_ALLOCATE: 64
1 row in set (0.07 sec)The following table describes the fields.
Field | Description |
TABLE_SCHEMA | The logical name of the database to which the TTL table belongs. |
TABLE_NAME | The logical name of the TTL table. |
TTL_ENABLE | Indicates whether the scheduled cleanup job of the TTL table is enabled. |
TTL_COL | The time column. |
TTL_EXPR | The data retention period. |
TTL_CRON | The schedule on which the cleanup job of the TTL table runs. The value is a CRON expression of the Quartz framework. |
ARCHIVE_TYPE | The archive type. Default value: Columnar, which indicates that data is archived by row. |
ARCHIVE_TABLE_SCHEMA | The logical name of the database to which the archive table bound to the TTL table belongs. If the TTL table is not bound to an archive table, NULL is returned. |
ARCHIVE_TABLE_NAME | The logical name of the archive table bound to the TTL table. If the TTL table is not bound to an archive table, NULL is returned. |
ARCHIVE_TABLE_PRE_ALLOCATE | The number of partitions that are pre-created for the TTL table based on partitioning rules. |
ARCHIVE_TABLE_POST_ALLOCATE | The number of partitions that are created for the TTL table. |
Query information about the cleanup job of a TTL table
You can query the real-time information about the running cleanup job of a TTL table from the INFORMATION_SCHEMA.TTL_SCHEDULE view.
Execute the following statement to query information about the running cleanup job of the my_ttl_tbl TTL table:
// Manually start the cleanup job.
ALTER TABLE `my_ttl_tbl` CLEANUP EXPIRED DATA;
SELECT *
FROM INFORMATION_SCHEMA.TTL_SCHEDULE
WHERE TABLE_SCHEMA='ttldb' AND TABLE_NAME = 'my_ttl_tbl';+--------------+------------+--------------------------------------+---------------------+
| TABLE_SCHEMA | TABLE_NAME | METRIC_KEY | METRIC_VAL |
+--------------+------------+--------------------------------------+---------------------+
| ttldb | my_ttl_tbl | SCHEDULE_STATUS | DISABLED |
| ttldb | my_ttl_tbl | SCHEDULE_EXPR | 0 0 1 * * ? * |
| ttldb | my_ttl_tbl | SCHEDULE_COMMENT | at 01:00 |
| ttldb | my_ttl_tbl | SCHEDULE_TIMEZONE | +08:00 |
| ttldb | my_ttl_tbl | SCHEDULE_LAST_FIRE_TIME | 1970-01-01 08:00:00 |
| ttldb | my_ttl_tbl | SCHEDULE_NEXT_FIRE_TIME | 1970-01-01 08:00:00 |
| ttldb | my_ttl_tbl | TTL_CURR_TTL_COL_MIN_VAL | null |
| ttldb | my_ttl_tbl | TTL_CURR_CLEANUP_BOUND | 2024-05-01 00:00:00 |
| ttldb | my_ttl_tbl | TTL_CURR_CLEANUP_UPPER_BOUND | 2024-05-01 00:00:00 |
| ttldb | my_ttl_tbl | TTL_CURR_NEW_DATETIME_VAL | 2024-07-18 18:22:54 |
| ttldb | my_ttl_tbl | TTL_CURR_JOB_STAGE | Finished |
| ttldb | my_ttl_tbl | TTL_CURR_JOB_BEGIN_TS | 1721298173321 |
| ttldb | my_ttl_tbl | TTL_CURR_JOB_END_TS | 0 |
| ttldb | my_ttl_tbl | TTL_CURR_JOB_FROM_SCHEDULER | false |
| ttldb | my_ttl_tbl | TTL_CURR_JOB_STOP_BY_MAINTAIN_WINDOW | false |
| ttldb | my_ttl_tbl | TTL_CURR_DN_ROWS_SPEED_LIMIT | 0 |
| ttldb | my_ttl_tbl | TTL_CURR_CLEANED_PHY_PART_COUNT | 8 |
| ttldb | my_ttl_tbl | TTL_CURR_TOTAL_PHY_PART_COUNT | 8 |
| ttldb | my_ttl_tbl | TTL_CURR_JOB_PERCENT | 100 |
| ttldb | my_ttl_tbl | TTL_ACQUIRE_PERMITS_AVG_RT_NANO | 0 |
| ttldb | my_ttl_tbl | TTL_CURR_CLEANUP_TIMECOST | 6044 |
| ttldb | my_ttl_tbl | TTL_CURR_CLEANUP_ROWS | 0 |
| ttldb | my_ttl_tbl | TTL_CURR_CLEANUP_ROWS_SPEED | 0 |
| ttldb | my_ttl_tbl | TTL_CURR_CLEANUP_DATA_LENGTH | 0 |
| ttldb | my_ttl_tbl | TTL_CURR_CLEANUP_SPEED | 0 |
| ttldb | my_ttl_tbl | TTL_CURR_SELECT_SQL_AVG_RT | 844 |
| ttldb | my_ttl_tbl | TTL_CURR_DELETE_SQL_AVG_RT | 2857 |
| ttldb | my_ttl_tbl | TTL_CURR_OPTIMIZE_SQL_AVG_RT | 0 |
| ttldb | my_ttl_tbl | TTL_CURR_ADD_PART_AVG_RT | 0 |
| ttldb | my_ttl_tbl | TTL_CURR_DATA_FREE_PERCENT_LIMIT | -1 |
| ttldb | my_ttl_tbl | TTL_CURR_TTL_TBL_DATA_FREE_PERCENT | 0.00 |
| ttldb | my_ttl_tbl | TTL_CURR_OPTIMIZE_TTL_TBL_PROGRESS | 0 |
| ttldb | my_ttl_tbl | TTL_LAST_JOB_FROM_SCHEDULER | false |
| ttldb | my_ttl_tbl | TTL_LAST_TTL_COL_MIN_VAL | |
| ttldb | my_ttl_tbl | TTL_LAST_CLEANUP_BOUND | |
| ttldb | my_ttl_tbl | TTL_LAST_CLEANUP_UPPER_BOUND | |
| ttldb | my_ttl_tbl | TTL_LAST_JOB_BEGIN_TS | 0 |
| ttldb | my_ttl_tbl | TTL_LAST_JOB_END_TS | 0 |
| ttldb | my_ttl_tbl | TTL_LAST_SELECT_SQL_AVG_RT | 0 |
| ttldb | my_ttl_tbl | TTL_LAST_DELETE_SQL_AVG_RT | 0 |
| ttldb | my_ttl_tbl | TTL_LAST_OPTIMIZE_SQL_AVG_RT | 0 |
| ttldb | my_ttl_tbl | TTL_LAST_ADD_PARTS_SQL_AVG_RT | 0 |
| ttldb | my_ttl_tbl | TTL_LAST_CLEANUP_TIMECOST | 0 |
| ttldb | my_ttl_tbl | TTL_LAST_CLEANUP_ROWS | 0 |
| ttldb | my_ttl_tbl | TTL_LAST_CLEANUP_ROWS_SPEED | 0 |
| ttldb | my_ttl_tbl | TTL_LAST_CLEANUP_DATA_LENGTH | 0 |
| ttldb | my_ttl_tbl | TTL_LAST_CLEANUP_SPEED | 0 |
| ttldb | my_ttl_tbl | TTL_LAST_TTL_TBL_DATA_FREE_PERCENT | 0 |
+--------------+------------+--------------------------------------+---------------------+
48 rows in set (0.03 sec)The following table describes specific fields.
Field | Description |
SCHEDULE_STATUS | Indicates whether the scheduled cleanup job is enabled. Valid values:
|
SCHEDULE_EXPR | The schedule on which the cleanup job runs. You can modify the schedule. For more information, see the "Modify the execution schedule of the cleanup job" section of the Definition and creation of TTL tables topic. |
SCHEDULE_COMMENT | The description of the value of the SCHEDULE_EXPR field. |
SCHEDULE_TIMEZONE | The time zone for scheduling the cleanup job. |
SCHEDULE_LAST_FIRE_TIME | The time when automatic scheduling was last triggered. The initial value is 1970-01-01 08:00:00, which indicates that automatic scheduling is not triggered. |
SCHEDULE_NEXT_FIRE_TIME | The time when automatic scheduling is triggered next time. The initial value is 1970-01-01 08:00:00, which indicates that automatic scheduling is not triggered. |
TTL_CURR_TTL_COL_MIN_VAL | The minimum value in the time column specified in the TTL definition of the cleanup job. |
TTL_CURR_JOB_STAGE | The current phase of the cleanup job. |
TTL_CURR_CLEANUP_BOUND | The value in the time column that is being cleaned up by the cleanup job. |
TTL_CURR_CLEANUP_UPPER_BOUND | The upper bound of the cleanup job. The time values of historical data to be cleaned up are earlier than the upper bound. |
TTL_CURR_NEW_DATETIME_VAL | The current time. |
TTL_CURR_DN_ROWS_SPEED_LIMIT | The speed limit of the cleanup job, which is the speed at which the |
TTL_CURR_CLEANUP_ROWS | The total number of rows that are cleaned up by the cleanup job. |
TTL_CURR_CLEANUP_ROWS_SPEED | The real-time cleanup speed of the cleanup job. The value is the number of rows that are cleaned up per second. |
TTL_CURR_CLEANUP_DATA_LENGTH | The estimated total number of bytes that are cleaned up by the cleanup job. |
TTL_CURR_CLEANUP_SPEED | The real-time cleanup speed of the cleanup job. The value is the number of estimated bytes that are cleaned up per second. |
TTL_CURR_SELECT_SQL_AVG_RT | The average response time of the |
TTL_CURR_DELETE_SQL_AVG_RT | The average response time of the |
TTL_CURR_JOB_PERCENT | The completion percentage of the cleanup job. |
TTL_CURR_CLEANED_PHY_PART_COUNT | The number of physical partitions that are cleaned up by the cleanup job. |
TTL_CURR_TOTAL_PHY_PART_COUNT | The total number of physical partitions that need to be cleaned up by the cleanup job. |
Manage cleanup jobs
You can manage cleanup jobs to clean up data at an appropriate time and speed. This minimizes the impact of cleanup jobs on your business.
Modify the maintenance window
The default maintenance window for background cleanup jobs in PolarDB-X 2.0 is 02:00 to 05:00 every day. You can execute the following statements to change the maintenance window to 01:00 to 06:00 every day:
set global MAINTENANCE_TIME_START = '01:00';
set global MAINTENANCE_TIME_END = '06:00';Modify the cleanup speed limit
By default, the cleanup speed limit for a single storage node is 1,000 rows per second. The total cleanup speed of all cleanup jobs on the node does not exceed the limit. If you increase the number of storage nodes in an instance, the total cleanup speed is increased.
You can execute the following statement to change the cleanup speed limit of a single storage node to 10,000 rows per second:
set global TTL_ENABLE_CLEANUP_ROWS_SPEED_LIMIT=10000If you increase the cleanup speed limit, more CPU and IOPS resources are consumed. We recommend that you specify an appropriate cleanup speed limit based on the actual performance of storage nodes. This prevents cleanup jobs from consuming excessive CPU and IOPS resources on storage nodes and affecting your business.
When the CPU and IOPS resources of a storage node are at full capacity, increasing the cleanup speed limit by executing the preceding statement may not improve the cleanup speed.
Modify the cleanup time range
By default, the time range of data to be cleaned up by the cleanup job of a TTL table is mainly controlled by the TTL_CLEANUP_BOUND_INTERVAL_COUNT parameter.
You can execute the following statement to modify the time range of data to be cleaned up:
set global TTL_CLEANUP_BOUND_INTERVAL_COUNT = 6;The default value of the TTL_CLEANUP_BOUND_INTERVAL_COUNT parameter is 3, which specifies that the cleanup time range for the cleanup job is three times the data retention period. You can execute the preceding statement to set the cleanup time range to six times the data retention period. The default cleanup range varies based on the time unit that you specified in the TTL definition. Examples:
If you specify the time unit in the TTL definition as DAY by setting the TTL_EXPR attribute to
'time' EXPIRE AFTER NUM DAY TIMEZONE '+08:00', the default cleanup range is six days of data each time. Data whose time value meets the following condition is cleaned up: MinValue ≤ Time < (MinValue + 6 Days).If you specify the time unit in the TTL definition as MONTH by setting the TTL_EXPR attribute to
'time' EXPIRE AFTER NUM MONTH TIMEZONE '+08:00', the default cleanup range is six months of data each time. Data whose time value meets the following condition is cleaned up: MinValue ≤ Time < (MinValue + 6 Months).If you specify the time unit in the TTL definition as YEAR by setting the TTL_EXPR attribute to
'time' EXPIRE AFTER NUM YEAR TIMEZONE '+08:00', the default cleanup range is six years of data each time. Data whose time value meets the following condition is cleaned up: MinValue ≤ Time < (MinValue + 6 Years).
Pause a cleanup job
Execute the following
SHOW FULL DDLstatement to query thejob IDsof all cleanup jobs:show full ddl\G;*************************** 1. row *************************** JOB_ID: 1771694362409848832 OBJECT_SCHEMA: ttldb OBJECT_NAME: my_ttl_tbl ENGINE: DAG DDL_TYPE: ALTER_TABLE STATE: RUNNING TOTAL_BACKFILL_PROGRESS: -- CURRENT_PHY_DDL_PROGRESS: 0% PROGRESS: 33% FASTCHECKER_TASK_NUM: 0 FASTCHECKER_TASK_FINISHED: 0 START_TIME: 2024-09-14 15:55:12.991 END_TIME: 2024-09-14 15:55:16.959 ELAPSED_TIME(MS): 3968 PHY_PROCESS: CANCELABLE: true PARENT_JOB_ID: -- RESPONSE_NODE: 10.57.104.154:3067 EXECUTION_NODE: 10.57.104.154:3067 TRACE_ID: 189652a2bc805000 DDL_STMT: alter table my_ttl_tbl cleanup expired data REMARK: -- LEGACY_ENGINE_INFO: -- 1 row in set (0.01 sec)Obtain the
job IDof the TTL table that you want to manage from the results in Step 1. Use the obtained job ID to execute the following statement to pause the cleanup job of the TTL table:pause ddl 1771694362409848832;
A cleanup job can be paused or rolled back in real time. If you roll back a cleanup job, only the job is rolled back. The data that is cleaned up cannot be rolled back. For more information, see DDL management statements.
Restart a cleanup job
Execute the following statement to restart a paused cleanup job:
CONTINUE DDL 1771694362409848832;