
ApsaraDB for SelectDB: Query acceleration

Last Updated: Apr 10, 2025

ApsaraDB for SelectDB uses the Nereids optimizer and the Pipeline execution engine to automatically optimize queries. If you have higher performance requirements, you can also manually optimize queries by using, for example, index-based acceleration, high-concurrency point queries, materialized views, and join optimization.

Automatic query optimization planning

In SelectDB, the Nereids optimizer and the Pipeline execution engine are the core query processing technologies that perform deep optimization in the query optimization and execution phases. These technologies significantly improve the performance and resource utilization of complex queries. The following figure shows how they optimize an SQL statement in SelectDB.

[Figure: How the Nereids optimizer and the Pipeline execution engine optimize an SQL statement in SelectDB]

The following list describes the main features of each technology. For more information, see the relevant documentation.

Nereids optimizer

  • Supports complex queries such as multi-level nested subqueries and multi-table join queries.

  • Reduces the possibility of logic errors in optimization rules.

Pipeline execution engine

  • Improves CPU utilization.

  • Mitigates resource contention between large and small queries.
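
In recent SelectDB versions, both technologies are enabled by default. As a minimal sketch, you can verify or enable them for the current session by using session variables from the Apache Doris ecosystem that SelectDB is compatible with. The variable names enable_nereids_planner and enable_pipeline_engine are an assumption and may differ by version.

  -- Check whether the Nereids optimizer and the Pipeline execution engine are enabled.
  SHOW VARIABLES LIKE '%nereids%';
  SHOW VARIABLES LIKE '%pipeline%';
  -- Enable both for the current session if they are not already on.
  -- Variable names follow the Apache Doris convention and may differ by version.
  SET enable_nereids_planner = true;
  SET enable_pipeline_engine = true;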

Manual query optimization

If the Nereids optimizer and the Pipeline execution engine of SelectDB do not meet your query requirements, you can use statistics to analyze the query data and select an appropriate policy to optimize queries.

The following sections describe the scenarios and limits of each optimization policy.

High-concurrency point query

Scenario: You want to optimize high-concurrency point queries. A setup sketch follows the limits.

Note
A point query retrieves a small amount of data that meets specific conditions from a database. In most cases, primary keys or high-cardinality columns are used to perform the retrieval.

Limits:

  • Enabling row storage mode consumes additional storage.

  • PreparedStatement supports only primary key-based point queries.
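
The following sketch shows a typical setup in the Apache Doris ecosystem that SelectDB is compatible with: a Unique Key table with row storage enabled, queried by its primary key. The table name, columns, and property values are illustrative and may need adjustment for your version.

  -- Illustrative Unique Key table with row storage enabled for point queries.
  CREATE TABLE example_orders (
      order_id BIGINT,
      status   VARCHAR(32)
  )
  UNIQUE KEY(order_id)
  DISTRIBUTED BY HASH(order_id) BUCKETS 16
  PROPERTIES (
      "enable_unique_key_merge_on_write" = "true",
      "store_row_column" = "true"
  );

  -- A point query on the primary key. In production, run it as a JDBC
  -- PreparedStatement so that the statement is parsed only once.
  SELECT order_id, status FROM example_orders WHERE order_id = 10001;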

Materialized views

Scenario: You want to optimize complex queries that are repetitive and time-consuming. An example follows the limits.

Limits:

  • In a table that uses the Unique Key model, materialized views can only change the column order and cannot contain aggregate functions. Therefore, you cannot create materialized views on tables that use the Unique Key model to perform coarse-grained aggregate operations.

  • If an excessively large number of materialized views are created in a table, data import efficiency is affected.

For more information, see Limits.
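
As an illustration, the following sketch creates a synchronous materialized view that pre-aggregates a repetitive, time-consuming query. The sales table and its columns are hypothetical.

  -- Pre-aggregate the total amount per day so that repeated aggregate
  -- queries can be rewritten to hit the materialized view.
  CREATE MATERIALIZED VIEW sales_per_day AS
  SELECT sale_date, SUM(amount)
  FROM sales
  GROUP BY sale_date;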

Index-based acceleration

Scenario: You want to quickly query or locate data in any scenario.

Limits: The limits vary based on the index type. For more information, see Index-based acceleration.
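
For example, the following sketch adds an inverted index, one of the supported index types, to accelerate lookups on a text column. The table and column names are hypothetical, and the syntax assumes a version that supports inverted indexes.

  -- Add an inverted index on an existing column to accelerate lookups.
  CREATE INDEX idx_message ON user_events (message) USING INVERTED;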

Bucket Shuffle Join

Scenario: You want to optimize join queries.

Limits: The join condition must contain the distribution column of the left table, and only a single partition of the left table can be used during execution.
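
Bucket Shuffle Join is selected automatically when these conditions are met. One way to confirm that it takes effect is to inspect the query plan, as in the following sketch. The table and column names are hypothetical, and dist_key is assumed to be the distribution column of the left table.

  -- If the join condition contains the distribution column of the left
  -- table, the plan is expected to show a bucket shuffle join.
  EXPLAIN
  SELECT l.dist_key, s.v
  FROM large_t l
  JOIN small_t s ON l.dist_key = s.dist_key;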

Colocation join

Scenario: You want to optimize join queries.

Limits: The join condition must contain the distribution column of the left table, and the left and right tables must belong to the same Colocate Group.
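
A Colocate Group is declared at table creation time. The following sketch, with hypothetical table and group names, places two tables in the same group by using the colocate_with property so that matching buckets are stored on the same nodes and the join avoids network shuffles.

  -- Both tables declare the same Colocate Group and use the same
  -- distribution column type and bucket count.
  CREATE TABLE t1 (k BIGINT, v1 INT)
  DISTRIBUTED BY HASH(k) BUCKETS 8
  PROPERTIES ("colocate_with" = "group1");

  CREATE TABLE t2 (k BIGINT, v2 INT)
  DISTRIBUTED BY HASH(k) BUCKETS 8
  PROPERTIES ("colocate_with" = "group1");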

Runtime filter

Scenario: You want to optimize joins between a large table and a small table. A tuning sketch follows the requirements.

Limits: When you use a runtime filter, the JOIN statement must meet the following requirements:

  • The left table is large, and the right table is small, because building a runtime filter incurs computing costs, including memory overhead.

  • The join result set is small, which indicates that the JOIN statement filters out most of the data from the left table.
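
Runtime filters are generated automatically during hash joins, and their behavior can be tuned through session variables, as in the following sketch. The variable names follow the Apache Doris convention and may differ by version.

  -- Choose which runtime filter implementations the optimizer may generate.
  SET runtime_filter_type = "IN_OR_BLOOM_FILTER";
  -- Limit how long a scan waits for a runtime filter before proceeding.
  SET runtime_filter_wait_time_ms = 1000;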

Bitmap-based accurate deduplication

Scenarios:

  • Deduplication results must be accurate and millions of data records are involved.

  • Storage resources are sufficient.

Limits:

  • Limits on data types:

    • Integers of the TINYINT, SMALLINT, INT, and BIGINT types can be directly used.

    • Non-integer values, such as strings, can be used only after they are mapped to integers by using a global dictionary. This increases maintenance costs.

  • Limits on memory: If the data cardinality is high, such as billions of data records, Bitmap needs to store a large bit array. This significantly increases memory usage and may lead to insufficient memory resources.
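
A typical pattern is an Aggregate Key table with a BITMAP column that accumulates integer IDs, as in the following sketch. The table names and columns are hypothetical, and events is assumed to contain a BIGINT user_id column.

  -- Aggregate table that accumulates user IDs into a bitmap per day.
  CREATE TABLE uv_per_day (
      dt       DATE,
      user_set BITMAP BITMAP_UNION
  )
  AGGREGATE KEY(dt)
  DISTRIBUTED BY HASH(dt) BUCKETS 8;

  -- Convert integer user IDs to bitmaps at import time, then count
  -- exact distinct users per day.
  INSERT INTO uv_per_day SELECT dt, TO_BITMAP(user_id) FROM events;
  SELECT dt, BITMAP_UNION_COUNT(user_set) AS uv FROM uv_per_day GROUP BY dt;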

Approximate deduplication by using the HLL feature

Scenarios:

  • Massive amounts of data, such as billions of data records, need to be processed, and a specific margin of error, such as an inaccurate analytical metric value, is allowed in the result.

  • Efficient result merging is required in distributed systems.

Limits:

  • Accuracy loss: You cannot obtain accurate results. The error rate decreases as the data cardinality increases.

  • Parameter configuration: The HyperLogLog (HLL) accuracy is determined by parameters such as the numbers of registers and hash bits. Improper parameter settings may affect the result accuracy.
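
The HLL workflow mirrors the bitmap one but stores fixed-size sketches instead of exact bit arrays, as in the following sketch with hypothetical table and column names.

  -- Aggregate table that keeps an HLL sketch per day.
  CREATE TABLE uv_hll_per_day (
      dt       DATE,
      user_hll HLL HLL_UNION
  )
  AGGREGATE KEY(dt)
  DISTRIBUTED BY HASH(dt) BUCKETS 8;

  -- Hash raw values into HLL sketches at import time, then estimate
  -- the number of distinct users per day.
  INSERT INTO uv_hll_per_day SELECT dt, HLL_HASH(user_id) FROM events;
  SELECT dt, HLL_UNION_AGG(user_hll) AS uv_estimate FROM uv_hll_per_day GROUP BY dt;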