
Use the Elastic Parallel Query feature to run analytical queries

Last Updated: Nov 27, 2023

PolarDB for PostgreSQL allows you to use the Elastic Parallel Query feature to run analytical queries so that you can use some hybrid transaction/analytical processing (HTAP) capabilities. This topic describes how to use the feature to improve the performance of analytical queries.

How the Elastic Parallel Query feature works

In Elastic Parallel Query, a query is run on multiple nodes in parallel. The node that coordinates how the query is run is called the query coordinator (QC) node, and the other nodes are called parallel execution (PX) nodes. After a query request is initiated, the QC node divides the execution plan of the query into shards and routes the shards to the PX nodes. Each PX node runs the assigned part of the execution plan and sends the query result to the QC node. Then, the QC node aggregates the query results.

Figure: How the cross-node parallel execution feature works

In the preceding figure, RO1 is the QC node and RO2, RO3, and RO4 are the PX nodes. After RO1 receives a query request, RO1 divides the execution plan of the query into three shards and routes the shards to RO2, RO3, and RO4. Each PX node runs its assigned part of the execution plan, obtains the required data blocks from the shared storage system Polar File System (PolarFS), and then sends the query results to the QC node. Then, the QC node combines the query results and returns the final result.

Precautions

Because multiple read-only nodes must be used to run each query in parallel, Elastic Parallel Query is suitable only for analytical queries that are run infrequently.

Parameters

By default, the Elastic Parallel Query feature is disabled in PolarDB for PostgreSQL. To use this feature, configure the parameters described in the following table.

Parameter

Description

polar_cluster_map

Queries the names of all the read-only nodes in the PolarDB for PostgreSQL cluster. You cannot configure this parameter. Each time you add a read-only node, the list of node names returned by the parameter is automatically updated.

Note

This parameter is available only for clusters that use the minor kernel version 1.1.20 (released in January 2022) or earlier.
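
A minimal sketch of viewing the node list in a session. The output format in the comments is only illustrative and the node names are placeholders; the Examples section below shows actual output.

SHOW polar_cluster_map;
-- Expected form of the output (placeholder node names):
--  polar_cluster_map
-- -------------------
--  node1,node2,node3
-- (1 row)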

polar_px_nodes

Specifies the read-only nodes on which your queries are run in parallel. This parameter is empty by default, which indicates that all read-only nodes are used for parallel execution. If you want to use specific nodes for parallel execution, set the parameter to the names of those nodes and separate the node names with commas (,). Example:

SHOW polar_px_nodes;
 polar_px_nodes
----------------

(1 row)
SET polar_px_nodes='node1,node2';
SHOW polar_px_nodes;
 polar_px_nodes
----------------
 node1,node2
(1 row)

polar_px_enable_replay_wait

Specifies whether to enable strong consistency. Valid values: on and off. A delay exists between the primary node and read-only nodes in your PolarDB for PostgreSQL cluster. For example, after the primary node executes a CREATE TABLE statement, this DDL statement is replayed on the read-only nodes so that the read-only nodes are synchronized with the primary node. This replay causes the delay. If you set the polar_px_enable_replay_wait parameter to on, strong consistency is enabled. After a request is initiated to run a query on multiple read-only nodes in parallel, the request is routed to the read-only nodes. To ensure strong consistency, the read-only nodes replay the latest log record generated before the query request is initiated and all the previous log records. Then, the read-only nodes start to run the query.

The default value is off, which indicates that strong consistency is disabled. If the parameter is set to off and a long replay delay exists between the primary node and the read-only nodes, the read-only nodes may fail to read the latest data, such as the results of recent DDL operations. If you set the polar_px_enable_replay_wait parameter to on, queries return consistent results, but the performance of Elastic Parallel Query is reduced because each query must wait for log replay to complete.

You can also set this parameter to on for a specified database role.
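
A minimal sketch of enabling strong consistency for the current session and for a database role, using standard PostgreSQL syntax. The role name reporting_user is a placeholder.

-- Enable strong consistency for the current session only.
SET polar_px_enable_replay_wait = on;

-- Enable strong consistency for every session of a specific role (placeholder name).
ALTER ROLE reporting_user SET polar_px_enable_replay_wait = on;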

polar_px_max_workers_number

Specifies the maximum number of PX worker processes that can run on a node for Elastic Parallel Query. The default value is 30. This parameter limits the degree of parallelism on each node: the total number of PX worker processes on a node across all sessions cannot exceed the value of this parameter.
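
A minimal sketch of checking the limit. The arithmetic in the comment only illustrates how this limit and the per-session degree of parallelism relate; the numbers are examples.

SHOW polar_px_max_workers_number;
-- With the default limit of 30 and polar_px_dop_per_node set to 6, at most
-- 30 / 6 = 5 concurrent PX sessions on a node stay within the limit.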

polar_enable_px

Specifies whether to enable the Elastic Parallel Query feature. The default value is off, which indicates that the feature is disabled.

polar_px_dop_per_node

Specifies the degree of parallelism for Elastic Parallel Query in the current session. The default value is 1. We recommend that you set the value to the number of CPU cores in your operating system. If you set polar_px_dop_per_node to N, each node uses N PX worker processes for the session to run your query in parallel.
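
A minimal sketch of raising the degree of parallelism for the current session. The value 4 is only an example and should stay within polar_px_max_workers_number.

-- Use 4 PX worker processes per node for queries in this session.
SET polar_px_dop_per_node = 4;
SHOW polar_px_dop_per_node;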

px_workers

Specifies whether the Elastic Parallel Query feature is applied to a specific table. By default, the feature is not applied to any table. Because the feature consumes computing resources, you can configure this parameter to limit the feature to the tables that need it. The following code provides an example of how to configure this parameter:

--Enable the feature for the table named t1.
ALTER TABLE t1 SET(px_workers=1);

--Disable the feature for the table named t1.
ALTER TABLE t1 SET(px_workers=-1);

--Do not apply the feature to the table named t1. This is the default setting.
ALTER TABLE t1 SET(px_workers=0);

synchronous_commit

Specifies whether a transaction must wait for WAL logs to be written to the disk before it returns a success message to the client when the database commits the transaction. Default value: on. Valid values:

  • off: The transaction does not wait for WAL logs to be written to the disk before it returns a success message to the client. 0, false, or no can also be used to indicate off.

  • on: The transaction waits for WAL logs to be flushed to the disk before it returns a success message to the client. 1, true, or yes can also be used to indicate on.

  • local: The transaction waits only for WAL logs to be flushed to the local disk, regardless of any remote hosts.

  • remote_write: The transaction waits until the remote host has received the WAL logs and written them to its operating system, but not necessarily flushed them to its disk.

  • remote_apply: The transaction waits until the remote host has replayed the WAL logs, so that the committed data is visible on the remote host.

Note

The parameter is set to on in PX mode.
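
A minimal sketch of viewing and changing the setting at the session level, using standard PostgreSQL syntax. As noted above, keep the parameter set to on when you use PX mode.

SHOW synchronous_commit;
-- Relax durability for this session only: a commit returns without waiting
-- for WAL logs to be flushed to the disk.
SET synchronous_commit = off;
-- Restore the default behavior.
SET synchronous_commit = on;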

Examples

The following section describes the effect of the Elastic Parallel Query feature. In the example, a single-table query is performed.

Background information

Execute the following statements to create a table named test, insert data into the table, and query the execution plan:

CREATE TABLE test(id int);
INSERT INTO test SELECT generate_series(1,1000000);
EXPLAIN SELECT * FROM test;

By default, the Elastic Parallel Query feature is disabled. The native execution plan for a single-table query is to perform a sequential scan (Seq Scan), as shown in the following code:

                       QUERY PLAN
--------------------------------------------------------
 Seq Scan on test  (cost=0.00..35.50 rows=2550 width=4)
(1 row)

To enable the Elastic Parallel Query feature, perform the following steps.

  1. Enable the feature for the table named test, enable Elastic Parallel Query for the session, and then view the execution plan:

    ALTER TABLE test SET (px_workers=1);
    SET polar_enable_px=on;
    EXPLAIN SELECT * FROM test;

    The following query result is returned:

                                      QUERY PLAN
    -------------------------------------------------------------------------------
     PX Coordinator 2:1  (slice1; segments: 2)  (cost=0.00..431.00 rows=1 width=4)
       ->  Seq Scan on test (scan partial)  (cost=0.00..431.00 rows=1 width=4)
     Optimizer: PolarDB PX Optimizer
    (3 rows)
  2. Query the names of all the read-only nodes.

    Execute the following statement:

    SHOW polar_cluster_map;

    The following query result is returned:

     polar_cluster_map
    -------------------
     node1,node2,node3
    (1 row)

    In this example, the cluster has three read-only nodes: node1, node2, and node3.

  3. Specify node1 and node2 for Elastic Parallel Query.

    Execute the following statement:

    SET polar_px_nodes='node1,node2';

    Query the nodes that are specified for Elastic Parallel Query:

    SHOW polar_px_nodes;

    The following query result is returned:

     polar_px_nodes
    ----------------
     node1,node2
    (1 row)

Performance data

The following performance data is obtained from a test in which five read-only nodes are used for parallel execution.

  • When SELECT COUNT(*) is executed for a full table scan, Elastic Parallel Query is 60 times faster than single-node parallel execution.

  • In TPC-H queries, Elastic Parallel Query is 30 times faster than single-node parallel execution.

    Note In this example, a test based on the TPC-H benchmark is implemented, but it does not meet all the requirements of a TPC-H benchmark test. Therefore, the test results are not comparable with the published results of the TPC-H benchmark.