Execute BUILD jobs - AnalyticDB - Alibaba Cloud Documentation Center

BUILD jobs are used to rebuild partitions that have newly written data. This helps improve read performance. During the rebuild process, the system creates indexes, clears redundant data, and asynchronously executes DDL statements.

Overview

During a BUILD job, the system merges the data that is written in real time and the historical partitions that involve the data, creates indexes, and asynchronously executes DDL statements. After INSERT, UPDATE, or DELETE statements are executed on the data in a partition, a BUILD job can be executed to rebuild the partition. BUILD jobs do not rebuild the partitions that have no data changes.

BUILD jobs are scheduled at the table level. BUILD jobs of different tables can be executed in parallel. After a BUILD job starts on a table, the job is split into tasks at the shard level. All three replicas of a shard each execute a task. The BUILD job is complete when all tasks are complete.

Usage notes

During a BUILD job, you cannot execute the INSERT OVERWRITE SELECT statement to insert data into tables until the BUILD job is complete. If you want to insert data, you can execute the INSERT INTO statement.
After you submit a BUILD job, you cannot cancel the job.
By default, the number of concurrent BUILD jobs is the rounded-up result of Number of cores/3 and cannot be modified.
For example, if the node specifications of an AnalyticDB for MySQL cluster are 32 cores and 128 GB memory, the number of concurrent BUILD jobs is 11 (rounded-up result of 32/3).
BUILD jobs consume CPU, memory, and I/O resources. As a result, disk-related metrics, such as CPU utilization and disk I/O usage, may increase. After BUILD jobs are complete, the metrics restore to normal values. We recommend that you execute BUILD jobs during off-peak hours.

Automatic trigger of BUILD jobs

A BUILD job is automatically triggered when one of the following conditions is met:

The period of time that has elapsed since the most recent BUILD job reaches the minimum time interval between BUILD jobs and at least 50,000 rows of data are added to a shard of a table.
- Time interval for Enterprise Edition, Basic Edition, or Data Lakehouse Edition: 1.5 hours.
- Time interval for Data Warehouse Edition in elastic mode: 1.5 hours.
- Time interval for Data Warehouse Edition in reserved mode: 0.5 hours.
Twenty-four hours have elapsed since the most recent BUILD job and at least one row of data is modified.

Manually trigger a BUILD job

Execute a BUILD statement on the partitions that have data changes

XUANWU table
```
BUILD TABLE <table_name>;
```
This method triggers a BUILD job only on the partitions whose data is modified.
- XUANWU_V2 table
```
BUILD TABLE <table_name> [BUILD_OPTION];
```
  BUILD_OPTION: supports only the ttl value, which is an optional parameter. If you leave this parameter empty, this method triggers a BUILD job only on the partitions whose data is modified. If you specify this parameter, this method immediately deletes expired partitions after the BUILD job is complete.
Note
For information about how to determine and specify a table engine, see the "Specify a table engine" section of the XUANWU_V2 engine topic.

Execute a BUILD statement on specific partitions

BUILD TABLE test force partitions='partition1,partition2';

Important

You can execute a BUILD statement on specific partitions only for AnalyticDB for MySQL clusters of V3.1.6.0 or later. For information about how to view the minor version of an AnalyticDB for MySQL cluster, see How do I view the minor version of a cluster? To update the minor version of a cluster, contact technical support.

If a table contains large volumes of data, the BUILD TABLE <table_name> force = true; statement may require an extended period of time to execute. AnalyticDB for MySQL allows you to specify the partitions on which you want to execute a BUILD job. This helps reduce the usage of resources and improves the performance of BUILD jobs.

Execute a BUILD statement on an entire table

BUILD TABLE <table_name> force = true;

This method triggers a BUILD job on and re-creates an index for all partitions of a table. By default, this feature is disabled.

Important

This method re-creates an index for all existing data of a table, which consumes an extended period of time. We recommend that you execute a BUILD statement on specific partitions. Before you execute a BUILD statement on an entire table, you must evaluate the effects and risks. To enable this feature, submit a ticket.

Configure automatic scheduling for BUILD jobs

Configuration method
```
SET ADB_CONFIG RC_CSTORE_BUILD_SCHEDULE_PERIOD=`<start>,<end>`;
```
The start and end parameters specify a time range within which you want to schedule BUILD jobs. Set the parameters in the range of 0 to 24 and enclose the time range in backticks (` `). You can configure multiple time ranges. Separate multiple time ranges with semicolons (;).
For example, you can set the start parameter to 0 and the end parameter to 6 to schedule BUILD jobs within 0:00 to 6:59.
Important
The specified time ranges are used for scheduling jobs, not for executing jobs. Therefore, after jobs are scheduled, the system may not complete executing the jobs when the time ranges end.
Example
Configure BUILD jobs to be scheduled within the following time ranges: 00:00 to 06:59 and 18:00 to 22:59.
```
SET ADB_CONFIG RC_CSTORE_BUILD_SCHEDULE_PERIOD=`0,6;18,22`;
```
After you execute the preceding statement, the system schedules BUILD jobs within 00:00 to 06:59 and 18:00 to 22:59.

Configure scheduling priorities for BUILD jobs

The order in which BUILD jobs are scheduled is based on the amount of data that is added to individual shards of tables. The larger the amount of data added to a shard of a table, the earlier the job is scheduled and executed. You can also use a hint or execute the SET ADB_CONFIG statement to configure the scheduling priorities of BUILD jobs for tables.

Important

You can configure the scheduling priorities of BUILD jobs only for AnalyticDB for MySQL clusters of V3.1.5.0 or later.
To view and update the minor version of an AnalyticDB for MySQL cluster, log on to the AnalyticDB for MySQL console and go to the Configuration Information section of the Cluster Information page.
A hint supports only one table. The configured priority takes effect only on the current BUILD job.
The SET ADB_CONFIG statement supports one or more tables. The configured priorities are effective until you configure new scheduling priorities of BUILD jobs for the tables.
If you use a hint and the SET ADB_CONFIG statement to configure different scheduling priorities of BUILD jobs for a table, the hint-based scheduling priority takes precedence in the current job.

The task_priority parameter specifies the scheduling priority of BUILD jobs. The value must be an integer. The default value is 0. A greater value specifies a higher scheduling priority. After you configure priorities for BUILD jobs, the frontend nodes schedule BUILD jobs based on the priorities. If you set the task_priority parameter to a value that is less than 0 for a table, automatic scheduling of BUILD jobs is disabled for the table and the frontend nodes do not schedule BUILD jobs for the table.

Configuration methods:

Use a hint

/*build_task_priority = <task_priority> */ BUILD TABLE <db_name>.<table_name>;

For example, set the scheduling priority of BUILD jobs to 30 for the test table in the adb_demo database.

/*build_task_priority = 30 */ Build TABLE adb_demo.test;

Use the SET ADB_CONFIG statement
- Configure scheduling priorities of BUILD jobs for multiple tables in different databases.
```
SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `<db1_name>.<table1_name>.<task_priority>;<db2_name>.<table2_name>.<task_priority>`;
```
  Note
  Separate multiple tables with semicolons (;).
  For example, set the scheduling priority of BUILD jobs to 30 for the test1 table in the adb_demo1 database and 10 for the test2 table in the adb_demo2 database.
```
SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `adb_demo1.test1.30;adb_demo2.test2.10`;
```
- Configure the same scheduling priority of BUILD jobs for all tables in a database.
```
SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `<db1_name>.*.<task_priority>`;
```
  For example, set the scheduling priority of BUILD jobs to 30 for all tables in the adb_demo1 database.
```
SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `adb_demo1.*.30`;
```
- Configure different scheduling priorities of BUILD jobs between a table and other tables in the same database.
```
SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `<db1_name>.*.<task_priority>;<db1_name>.<table_name>.<task_priority>`;
```
  For example, set the scheduling priority of BUILD jobs to 30 for the test1 table in the adb_demo1 database and 10 for all other tables in the database.
```
SET ADB_CONFIG RC_BUILD_TASK_PRIORITY_LIST = `adb_demo1.*.10;adb_demo1.test1.30`;
```

You can execute the SHOW ADB_CONFIG statement to query the current scheduling priority of BUILD jobs.

Query the status of a BUILD job

After a BUILD job is automatically or manually triggered, you can execute the following SQL statement to query the status of the BUILD job within the last three days:

SELECT table_name, schema_name, status FROM INFORMATION_SCHEMA.KEPLER_META_BUILD_TASK ORDER BY create_time DESC LIMIT 10;

The following table describes the status of BUILD jobs based on the values of the status parameter.

Value of `status`	Description
INIT	The job is being initialized.
RUNNING	The job is running.
FINISH	The job is complete.

FAQ

Why does the automatic scheduling of BUILD jobs fail to take effect?

Cause: The time-related parameters are not enclosed in backticks (` `). As a result, the parameters fail to be parsed.

Solution: When you execute the SET ADB_CONFIG RC_CSTORE_BUILD_SCHEDULE_PERIOD statement to configure automatic scheduling, enclose the start and end parameters in backticks (` `). For example, SET ADB_CONFIG RC_CSTORE_BUILD_SCHEDULE_PERIOD=`0,6`; specifies that the system schedules BUILD jobs within 0:00 to 6:59.