All Products
Search
Document Center

ApsaraDB for SelectDB:Pipeline execution engine

Last Updated:May 31, 2024

ApsaraDB for SelectDB supports the Pipeline execution engine in push mode. The Pipeline execution engine reduces the execution overheads that are caused by thread switching and thread blocking, improves CPU utilization, and reduces the resource congestion of large queries on small queries. This improves the efficiency of CPU execution on mixed-load SQL queries and the performance of SQL queries.

Overview

The Pipeline execution engine is a new core feature of ApsaraDB for SelectDB of the new version. The Pipeline execution engine is designed to replace the old execution engine of ApsaraDB for SelectDB, make full use of the computing power of multi-core CPUs, and limit the number of query threads in ApsaraDB for SelectDB to solve the problem of execution thread bloat in ApsaraDB for SelectDB. For more information about the design, implementation, and effects of the Pipeline execution engine, see Support Pipeline Exec Engine.

How it works

The old SQL execution engine of ApsaraDB for SelectDB is designed based on the traditional volcano model, which has the following issues in a single-machine multi-core scenario:

  • Query performance cannot be improved by exploiting the computing power of multi-core CPUs. In most cases, you need to manually set the degree of parallelism to optimize query performance. However, it is difficult to set the degree of parallelism in a production environment.

  • Each instance of a standalone query corresponds to one thread of the thread pool, which causes the following issues:

    • If the thread pool reaches the maximum capacity, the query engine of SelectDB enters a pseudo-deadlock and cannot respond to subsequent queries. At the same time, a logical deadlock may occur. For example, all threads are executing a probe task on an instance.

    • Blocking operators occupy thread resources. As a result, thread resources cannot be allocated to schedulable instances and the overall resource utilization is low.

  • Blocking operators rely on the thread scheduling mechanism of the operating system, and high overheads are caused by thread switching, especially in system mixing scenarios.

To solve the preceding issues, ApsaraDB for SelectDB provides an execution engine that fits the architecture of modern multi-core CPUs.

Based on the features of multi-core CPUs, the Pipeline execution engine is redesigned and becomes a data-driven execution engine. This improves the efficiency of CPU execution on mixed-load SQL queries and the performance of SQL queries. The following figure shows the principle of the Pipeline execution engine.

pipeline-execution-engine-2067b3a862de84a561eabefd41d48a56.png

  1. The traditional pull execution engine is transformed into a push execution engine.

  2. Blocking operations become asynchronous operations, which reduces the execution overhead caused by thread switching and thread blocking and improves CPU utilization.

  3. The maximum number of threads to be executed is limited. The resource congestion of large queries on small queries is reduced in mixed-load query scenarios by controlling time slice switching.

How to use the Pipeline execution engine

Configure the session variables

  • enable_pipeline_engine

    If you set the enable_pipeline_engine session variable to true, the backend (BE) node of a cluster uses the Pipeline execution engine during query execution. By default, this session variable is set to true.

    set enable_pipeline_engine = true;
  • parallel_pipeline_task_num

    The parallel_pipeline_task_num session variable specifies the number of Pipeline tasks that are concurrently executed for a SQL query. By default, this session variable is set to 0. In this case, ApsaraDB for SelectDB automatically sets the number of Pipeline tasks to half the number of CPU cores. You can also modify the session variable based on your business requirements.

    set parallel_pipeline_task_num = 0;

    You can set the max_instance_num session variable to specify the maximum concurrency that is automatically set. By default, this session variable is set to 64.