All Products
Search
Document Center

PolarDB:Parallel DDL

Last Updated:Jun 07, 2024

This topic describes how to control the degree of parallelism (DOP) for DDL statements in PolarDB-X.

Prerequisites

  • The kernel version of your PolarDB-X instance is V5.4.15-16715927 or later if you want to control DOP for physically executed DDL statements.

  • The kernel version of your PolarDB-X instance is V5.4.16-16825599 or later if you want to control DOP for logically executed DDL statements.

Usage notes

If you increase the resource usage of DDL statements by using the parameters described in this topic, your business traffic may be affected. Make sure that the DOP adjustment is required based on your business requirements and your cluster resource specifications meet the requirements before you adjust the DOP.

Overview

DDL statements in PolarDB-X can be physically or logically executed. For more information about the characteristics of the preceding two execution modes and the execution modes for different types of DDL statements, see Online DDL. For the DDL statements that are executed in different modes, PolarDB-X provides different parameters that you can use to control the resources occupied by the DDL statements and the throughput of data processing.

  • For physically executed DDL statements, compute nodes (CNs) forward physical DDL statement to be executed on physical tables. The main resource overhead is on data nodes (DNs). PolarDB-X provides the control parameters related to the DOP of forwarding physical DDL statements.

  • For logically executed DDL statements, CNs processes all the logic including data backfilling, multi-write, verification, and metadata management in the entire process. Data backfilling occupies most of the resources and time. PolarDB-X provides the control parameters related to the data backfilling DOP during the logical execution.

DOP control for physically executed DDL statements

For physically executed DDL statements, you can control the DOP of forwarding physical DDL statements by using the parameters described in the following table.

Parameter

Type

Level

Description

Default value

Recommended maximum configuration

MERGE_DDL_CONCURRENT

bool

Session or global

Specifies whether to send physical DDL statements in parallel.

false

We recommend that you set this parameter to true.

MERGE_CONCURRENT

bool

Session or global

Specifies whether to establish connections in parallel to database shards.

false

We recommend that you set this parameter to true.

PREFETCH_SHARDS

int

Session or global

The maximum DOP of sending physical DDL statements to all physical databases.

1 × Number of DNs

We recommend that you set the maximum value to the result produced by multiplying the number of CPU cores on a DN by the number of DNs.

By default, a single physical DDL statement can achieve high usage on a single CPU core. Therefore, if the value of the PREFETCH_SHARDS parameter exceeds the number of CPU cores on all DNs, the effect of increasing the value is not significant.

For example, an instance has two CNs and two DNs, each of which has 16 CPU cores and 64 GB of memory, and you want to add a local index to the bmsql_order_line table in the Transaction Processing Performance Council Benchmark C (TPC-C) 10000 warehouse. If the instance has sufficient resources, you can configure the parameters in the following way:

/*+TDDL:cmd_extra(MERGE_DDL_CONCURRENT=true,MERGE_CONCURRENT=true,PREFETCH_SHARDS=32)*/ 
ALTER TABLE `bmsql_order_line` ADD LOCAL INDEX `i_ol_d_id`(`ol_d_id`);

You can run the show physical processlist where info like "%i_ol_d_id%"; command to view the concurrent execution of physical DDL statements on DNs.

DOP control for logically executed DDL statements

For logically executed DDL statements, you can control the DOP for data backfilling and resource limit by using the parameters described in the following table. You can also run the show ddl status command to view the logical execution progress.

The following table describes the DOP control parameters for DDL statements that are executed to add GSIs, repartition tables, and perform OMC.

Parameter

Type

Level

Description

Default value

Recommended maximum configuration

BACKFILL_PARALLELISM

int

Global

The maximum number of concurrent backfilling threads that are globally allowed.

max(CPU_cores, 8) * 4

Generally, you do not need to adjust this parameter.

SLIDE_WINDOW_TIME_INTERVAL

int

Session or global

The delay in starting concurrent backfilling for large physical tables. Unit: seconds.

60

You can set the value to 0 if the resources are sufficient.

GSI_BACKFILL_PARALLELISM

int

Session or global

The maximum number of concurrent backfilling threads that are allowed in a single DDL task.

-1

Generally, you do not need to adjust this parameter.

PHYSICAL_TABLE_BACKFILL_PARALLELISM

int

Session or global

The maximum number of concurrent backfilling threads that are allowed for a single physical table.

4

We recommend that you set the maximum value to the result produced by multiplying the number of CPU cores on a CN by 2 based on the load.

GSI_BACKFILL_SPEED_LIMITATION

int

Session or global

The maximum backfilling rate that is allowed. Unit: row/s.

150000

We recommend that you set the maximum value to -1 based on the load.

The following table describes the DOP control parameters for the DDL statements that are executed to move partitions and scale tables.

Parameter

Type

Level

Description

Default value

Recommended maximum configuration

BACKFILL_PARALLELISM

int

Global

The maximum number of concurrent backfilling threads that are globally allowed.

max(CPU_cores, 8) * 4

Generally, you do not need to adjust this parameter.

SLIDE_WINDOW_TIME_INTERVAL

int

Session or global

The delay in starting concurrent backfilling for large physical tables. Unit: seconds.

60

You can set the value to 0 if the resources are sufficient.

SCALEOUT_BACKFILL_PARALLELISM

int

Session or global

The maximum number of concurrent backfilling threads that are allowed in a single DDL task.

-1

Generally, you do not need to adjust this parameter.

PHYSICAL_TABLE_BACKFILL_PARALLELISM

int

Session or global

The maximum number of concurrent backfilling threads that are allowed for a single physical table.

4

We recommend that you set the maximum value to the result produced by multiplying the number of CPU cores on a CN by 2 based on the load.

SCALEOUT_BACKFILL_SPEED_LIMITATION

int

Session or global

The maximum backfilling rate that is allowed. Unit: row/s.

300000

We recommend that you set the maximum value to -1 based on the load.

For example, an instance has two CNs and two DNs, each of which has 16 CPU cores and 64 GB of memory, and you want to add a global index to the bmsql_order_line table in the TPC-C 10000 warehouse. The following statement is executed if the default parameter settings are used:

/*+TDDL:cmd_extra(PURE_ASYNC_DDL_MODE=true)*/
ALTER TABLE `bmsql_order_line` ADD GLOBAL INDEX `g_i_ol_d_id`(`ol_d_id`) partition by hash(`ol_d_id`) partitions 32;

You can run the show ddl status command to view the execution status of the DDL statement.

show ddl status;
+----------------------------------+-----------+
| METRIC                           | VALUE     |
+----------------------------------+-----------+
| BACKFILL_PARALLELISM             |        32 |
| BACKFILL_ROWS_FINISHED           | 288679866 |
| BACKFILL_ROWS_SPEED              |    150186 |
| BACKFILL_TASK_FAILED             |         0 |
| BACKFILL_TASK_FINISHED           |         1 |
| BACKFILL_TASK_TOTAL              |         0 |
| BACKFILL_TIME_MILLIS             |      3450 |
| CHANGESET_APPLY_PARALLELISM      |         0 |
| CHANGESET_APPLY_ROWS_SPEED       |         0 |
| CHECKER_ROWS_FINISHED            |         0 |
| CHECKER_TIME_MILLIS              |         0 |
| DDL_EXECUTION_TIME_MILLIS        |    696412 |
| DDL_JOBS_FINISHED                |         5 |
| DDL_JOBS_TOTAL                   |         6 |
| DDL_TASK_FAILED                  |         4 |
| DDL_TASK_FINISHED                |       107 |
| DDL_TASK_TOTAL                   |        63 |
| FASTCHECKER_TASK_RUNNING         |         0 |
| FASTCHECKER_TASK_WAITING         |         0 |
| FASTCHECKER_THREAD_POOL_MAX_SIZE |         1 |
| FASTCHECKER_THREAD_POOL_NOW_SIZE |         0 |
| FASTCHECKER_THREAD_POOL_NUM      |         2 |
| THROTTLE_RATE                    |    150000 |
+----------------------------------+-----------+

Parameters that you need to pay attention to:

  • BACKFILL_PARALLESIM: the total number of backfilling threads that are active.

  • BACKFILL_ROWS_SPEED: the current backfilling rate by row.

  • THROTTLE_RATE: the rate limit.

In this case, the DDL backfilling rate is subject to the value specified by the THROTTLE_RATE parameter and is limited to about 150,000 rows/s. If you want to use the maximum DOP and cancel the throttling, you can send the DDL statement for logical execution in the following way:

/*+TDDL:cmd_extra(SLIDE_WINDOW_TIME_INTERVAL=0,PURE_ASYNC_DDL_MODE=true,PHYSICAL_TABLE_BACKFILL_PARALLELISM=8,GSI_BACKFILL_PARALLELISM=32,GSI_BACKFILL_SPEED_LIMITATION=-1)*/ 
ALTER TABLE bmsql_order_line ADD GLOBAL INDEX g_i_ol_d_id(`ol_d_id`) partition by hash(`ol_d_id`) partitions 32;

You can run the show ddl status command to view the execution status of the DDL statement:

show ddl status;
+----------------------------------+----------+
| METRIC                           | VALUE    |
+----------------------------------+----------+
| BACKFILL_PARALLELISM             |       32 |
| BACKFILL_ROWS_FINISHED           | 37272038 |
| BACKFILL_ROWS_SPEED              |   867669 |
| BACKFILL_TASK_FAILED             |        0 |
| BACKFILL_TASK_FINISHED           |        1 |
| BACKFILL_TASK_TOTAL              |        0 |
| BACKFILL_TIME_MILLIS             |     3450 |
| CHANGESET_APPLY_PARALLELISM      |        0 |
| CHANGESET_APPLY_ROWS_SPEED       |        0 |
| CHECKER_ROWS_FINISHED            |        0 |
| CHECKER_TIME_MILLIS              |        0 |
| DDL_EXECUTION_TIME_MILLIS        |   112397 |
| DDL_JOBS_FINISHED                |        3 |
| DDL_JOBS_TOTAL                   |        4 |
| DDL_TASK_FAILED                  |        2 |
| DDL_TASK_FINISHED                |       75 |
| DDL_TASK_TOTAL                   |       42 |
| FASTCHECKER_TASK_RUNNING         |        0 |
| FASTCHECKER_TASK_WAITING         |        0 |
| FASTCHECKER_THREAD_POOL_MAX_SIZE |        1 |
| FASTCHECKER_THREAD_POOL_NOW_SIZE |        0 |
| FASTCHECKER_THREAD_POOL_NUM      |        2 |
| THROTTLE_RATE                    |       -1 |
+----------------------------------+----------+
23 rows in set (0.02 sec)

In this case, the DDL backfilling rate is not limited. If all the 32 threads are running, the backfilling rate reaches 860,000 rows/s.

Performance testing

  • Specifications of the PolarDB-X instance for testing: two CNs and two DNs, each of which has 16 CPU cores and 64 GB of memory.

  • Table schema: the bmsql_order_line table in a database in AUTO mode from TPC-C.

    CREATE TABLE `bmsql_order_line` (
            `ol_w_id` int(11) NOT NULL,
            `ol_d_id` tinyint(4) NOT NULL,
            `ol_o_id` int(11) NOT NULL,
            `ol_number` tinyint(4) NOT NULL,
            `ol_i_id` int(11) NOT NULL,
            `ol_delivery_d` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
            `ol_amount` decimal(6, 2) DEFAULT NULL,
            `ol_supply_w_id` int(11) DEFAULT NULL,
            `ol_quantity` int(11) DEFAULT NULL,
            `ol_dist_info` char(24) COLLATE latin1_bin DEFAULT NULL,
            PRIMARY KEY (`ol_w_id`, `ol_d_id`, `ol_o_id`, `ol_number`)
    ) ENGINE = InnoDB DEFAULT CHARSET = latin1 DEFAULT COLLATE = latin1_bin
    PARTITION BY KEY(`ol_w_id`)
    PARTITIONS 32
  • Test data: You can import the data of TPC-C 10000 warehouse by using Benchmark Boot. A total of about three billion rows of data are used, whose size is 230 GB.

Test 1: Add a local index

Configure DOP parameters and execute the following statement. Run the show ddl status command to view the execution status of the DDL statement.

/*+TDDL:cmd_extra(PURE_ASYNC_DDL_MODE=true,ENABLE_TWO_PHASE_DDL=false,ENBALE_DRDS_MULTI_PHASE_DDL=false,MERGE_DDL_CONCURRENT=true,MERGE_CONCURRENT=true,PREFETCH_SHARDS=xxx)*/ 

ALTER TABLE `bmsql_order_line` ADD LOCAL INDEX i_ol_o_id(ol_o_id);

The following table describes the execution time and resource usage for different parameter settings.

Parameter setting

Execution time (seconds)

Concurrency speedup

Average DN CPU usage (≤ number of CPU cores)

Average DN IOPS usage (≤ 1)

PREFETCH_SHARDS=2 (default setting, DOP per DN: 1)

6,880

1

135%

55%

PREFETCH_SHARDS=4 (DOP per DN: 2)

3,503

1.96

240%

95%

PREFETCH_SHARDS=8 (DOP per DN: 4)

2,514

2.69

470%

100%

PREFETCH_SHARDS=16 (DOP per DN: 8)

2,152

3.19

896%

100%

PREFETCH_SHARDS=32 (DOP per DN: 16)

1,954

3.52

1,590%

100%

Test 2: Add a global index

Configure DOP parameters and execute the following statement. Run the show ddl status command to view the execution status of the DDL statement.

/*+TDDL:cmd_extra(PURE_ASYNC_DDL_MODE=true,SLIDE_WINDOW_TIME_INTERVAL=0,PHYSICAL_TABLE_BACKFILL_PARALLELISM=xxx,GSI_BACKFILL_SPEED_LIMITATION=xxx)*/ ALTER TABLE `bmsql_order_line` ADD GLOBAL INDEX `g_i_ol_o_id`(`ol_o_id`) PARTITION BY HASH(`ol_o_id`) PARTITIONS 32;

Parameter setting

Execution time (seconds)

Concurrency speedup

Average DN CPU usage (≤ number of CPU cores)

Average DN IOPS usage (≤ 1)

Average CN CPU usage (≤ number of CPU cores)

PHYSICAL_TABLE_BACKFILL_PARALLELISM=1, GSI_BACKFILL_SPEED_LIMITATION=150000 (no concurrency per physical table, automatic throttling)

17,441

1

190%

8%

357%

SLIDE_WINDOW_TIME_INTERVAL=0, PHYSICAL_TABLE_BACKFILL_PARALLELISM=4, GSI_BACKFILL_SPEED_LIMITATION=-1 (default settings, no throttling)

7,124

2.45

594%

23%

723%

SLIDE_WINDOW_TIME_INTERVAL=0, PHYSICAL_TABLE_BACKFILL_PARALLELISM=8, GSI_BACKFILL_SPEED_LIMITATION=-1 (eight concurrent threads per physical table, no throttling)

4,020

4.34

1,175%

54%

1,476%