BUILD jobs are used to rebuild partitions that have newly written data. This helps improve read performance. During the rebuild process, the system creates indexes, clears redundant data, and asynchronously executes DDL statements.
Overview
During a BUILD job, the system merges the data that is written in real time and the historical partitions that involve the data, creates indexes, and asynchronously executes DDL statements. After INSERT, UPDATE, or DELETE statements are executed on the data in a partition, a BUILD job can be executed to rebuild the partition. BUILD jobs do not rebuild the partitions that have no data changes.
BUILD jobs are scheduled at the table level. BUILD jobs of different tables can be executed in parallel. After a BUILD job starts on a table, the job is split into tasks at the shard level. All three replicas of a shard each execute a task. The BUILD job is complete when all tasks are complete.
Usage notes
During a BUILD job, you cannot execute the
INSERT OVERWRITE SELECT
statement to insert data into tables until the BUILD job is complete. If you want to insert data, you can execute theINSERT INTO
statement.After you submit a BUILD job, you cannot cancel the job.
By default, the number of concurrent BUILD jobs is the rounded-up result of
Number of cores/3
and cannot be modified.For example, if the node specifications of an AnalyticDB for MySQL cluster are 32 cores and 128 GB memory, the number of concurrent BUILD jobs is 11 (rounded-up result of
32/3
).BUILD jobs consume CPU, memory, and I/O resources. As a result, disk-related metrics, such as CPU utilization and disk I/O usage, may increase. After BUILD jobs are complete, the metrics restore to normal values. We recommend that you execute BUILD jobs during off-peak hours.
Automatic trigger of BUILD jobs
A BUILD job is automatically triggered when one of the following conditions is met:
The period of time that has elapsed since the most recent BUILD job reaches the minimum time interval between BUILD jobs and at least 50,000 rows of data are added to a shard of a table.
Time interval for Data Lakehouse Edition: 1.5 hours.
Time interval for Data Warehouse Edition in elastic mode: 1.5 hours.
Time interval for Data Warehouse Edition in reserved mode: 0.5 hours.
Twenty-four hours have elapsed since the most recent BUILD job and at least one row of data is modified.
Manually trigger a BUILD job
You can use one of the following methods to manually trigger a BUILD job:
Execute a BUILD statement on the partitions that have data changes
BUILD TABLE <table_name>;
This method triggers a BUILD job only on the partitions whose data is modified.
Execute a BUILD statement on specific partitions
BUILD TABLE test force partitions='partition1,partition2';
ImportantYou can execute a BUILD statement on specific partitions only for AnalyticDB for MySQL clusters of V3.1.6.0 or later. For information about how to view the minor version of an AnalyticDB for MySQL cluster, see How do I view the minor version of a cluster? To update the minor version of a cluster, contact technical support.
If a table contains large volumes of data, the
BUILD TABLE <table_name> force = true;
statement may require an extended period of time to execute. AnalyticDB for MySQL allows you to specify thepartitions
on which you want to execute a BUILD job. This helps reduce the usage of resources and improves the performance of BUILD jobs.Execute a BUILD statement on an entire table
BUILD TABLE <table_name> force = true;
This method triggers a BUILD job on and re-creates an index for all partitions of a table. By default, this feature is disabled.
ImportantThis method re-creates an index for all existing data of a table, which consumes an extended period of time. We recommend that you execute a BUILD statement on specific partitions. Before you execute a BUILD statement on an entire table, you must evaluate the effects and risks. To enable this feature, submit a ticket.
Configure automatic scheduling for BUILD jobs
Configuration method
SET ADB_CONFIG RC_CSTORE_BUILD_SCHEDULE_PERIOD=`<start>,<end>`;
The
start
andend
parameters specify a time range within which you want to schedule BUILD jobs. Set the parameters in the range of 0 to 24 and enclose the time range in backticks (` `). You can configure multiple time ranges. Separate multiple time ranges with semicolons (;).For example, you can set the
start
parameter to 0 and theend
parameter to 6 to schedule BUILD jobs within 0:00 to 6:59.ImportantThe specified time ranges are used for scheduling jobs, not for executing jobs. Therefore, after jobs are scheduled, the system may not complete executing the jobs when the time ranges end.
Example
Configure BUILD jobs to be scheduled within the following time ranges: 00:00 to 06:59 and 18:00 to 22:59.
SET ADB_CONFIG RC_CSTORE_BUILD_SCHEDULE_PERIOD=`0,6;18,22`;
After you execute the preceding statement, the system schedules BUILD jobs within 00:00 to 06:59 and 18:00 to 22:59.
Configure scheduling priorities for BUILD jobs
The order in which BUILD jobs are scheduled is based on the amount of data that is added to individual shards of tables. The larger the amount of data added to a shard of a table, the earlier the job is scheduled and executed. You can also use a hint or execute the SET ADB_CONFIG
statement to configure the scheduling priorities of BUILD jobs for tables.
You can configure the scheduling priorities of BUILD jobs only for AnalyticDB for MySQL clusters of V3.1.5.0 or later.
To query the minor version of an AnalyticDB for MySQL Data Lakehouse Edition cluster, execute the
SELECT adb_version();
statement. To update the minor version of a cluster, contact technical support.For information about how to view and update the minor version of an AnalyticDB for MySQL Data Warehouse Edition cluster, see Update the minor version of a cluster.
A hint supports only one table. The configured priority takes effect only on the current BUILD job.
The
SET ADB_CONFIG
statement supports one or more tables. The configured priorities are effective until you configure new scheduling priorities of BUILD jobs for the tables.If you use a hint and the
SET ADB_CONFIG
statement to configure different scheduling priorities of BUILD jobs for a table, the hint-based scheduling priority takes precedence in the current job.
The task_priority
parameter specifies the scheduling priority of BUILD jobs. The value must be an integer. The default value is 0. A greater value specifies a higher scheduling priority. After you configure priorities for BUILD jobs, the frontend nodes schedule BUILD jobs based on the priorities. If you set the task_priority
parameter to a value that is less than 0 for a table, automatic scheduling of BUILD jobs is disabled for the table and the frontend nodes do not schedule BUILD jobs for the table.
Configuration methods:
Use a hint
/*build_task_priority = <task_priority> */ BUILD TABLE <db_name>.<table_name>;
For example, set the scheduling priority of BUILD jobs to 30 for the
test
table in theadb_demo
database./*build_task_priority = 30 */ Build TABLE adb_demo.test;
Use the SET ADB_CONFIG statement
Configure scheduling priorities of BUILD jobs for multiple tables in different databases.
SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `<db1_name>.<table1_name>.<task_priority>;<db2_name>.<table2_name>.<task_priority>`;
NoteSeparate multiple tables with semicolons (;).
For example, set the scheduling priority of BUILD jobs to 30 for the
test1
table in theadb_demo1
database and 10 for thetest2
table in theadb_demo2
database.SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `adb_demo1.test1.30;adb_demo2.test2.10`;
Configure the same scheduling priority of BUILD jobs for all tables in a database.
SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `<db1_name>.*.<task_priority>`;
For example, set the scheduling priority of BUILD jobs to 30 for all tables in the
adb_demo1
database.SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `adb_demo1.*.30`;
Configure different scheduling priorities of BUILD jobs between a table and other tables in the same database.
SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `<db1_name>.*.<task_priority>;<db1_name>.<table_name>.<task_priority>`;
For example, set the scheduling priority of BUILD jobs to 30 for the
test1
table in theadb_demo1
database and 10 for all other tables in the database.SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `adb_demo1.*.10;adb_demo1.test1.30`;
You can execute the SHOW ADB_CONFIG statement to query the current scheduling priority of BUILD jobs.
Query the status of a BUILD job
After a BUILD job is automatically or manually triggered, you can execute the following SQL statement to query the status of the BUILD job within the last three days:
SELECT table_name, schema_name, status FROM INFORMATION_SCHEMA.KEPLER_META_BUILD_TASK ORDER BY create_time DESC LIMIT 10;
The following table describes the status of BUILD jobs based on the values of the status
parameter.
Value of | Description |
INIT | The job is being initialized. |
RUNNING | The job is running. |
FINISH | The job is complete. |
FAQ
Why does the automatic scheduling of BUILD jobs fail to take effect?
Cause: The time-related parameters are not enclosed in backticks (` `). As a result, the parameters fail to be parsed.
Solution: When you execute the SET ADB_CONFIG RC_CSTORE_BUILD_SCHEDULE_PERIOD statement to configure automatic scheduling, enclose the start and end parameters in backticks (` `). For example, SET ADB_CONFIG RC_CSTORE_BUILD_SCHEDULE_PERIOD=`0,6`;
specifies that the system schedules BUILD jobs within 0:00 to 6:59
.