PolarDB O Edition allows you to use the cross-node parallel execution feature to run analytical queries so that you can use some hybrid transaction/analytical processing (HTAP) capabilities. This topic describes how to use the feature to improve the performance of analytical queries.

How it works

For cross-node parallel execution, a query is run on multiple nodes in parallel. The node that coordinates how the query is run is called the query coordinator (QC) node and the other nodes are called parallel execution (PX) nodes. After a query request is initiated, the QC node divides the execution plan of the query into shards and routes the shards to the PX nodes. Each PX node runs the assigned part of the execution plan and sends the query result to the QC node. Then, the QC node aggregates the query results.

How it works

In the preceding figure, RO1 is the QC node and RO2, RO3, and RO4 are the PX nodes. After RO1 receives a query request, RO1 divides the execution plan of the query into three shards and routes the shards to RO2, RO3, and RO4. Each PX node runs its assigned part of the execution plan, obtains the required data blocks from the shared storage system Polar File System (PolarFS), and then sends the query result to the QC node. Then, the QC node aggregates the query results and returns the final result.

Note

Cross-node parallel execution is suitable only for analytical queries that are run at a low frequency because multiple read-only nodes must be used to run each query in parallel.

Use the cross-node parallel execution feature to perform analytical queries at different levels

You can use the cross-node parallel execution feature to perform analytical queries at the following three granularity levels:
  • System level: Configure the relevant parameters to specify whether to enable the cross-node parallel execution feature for all queries and all sessions.
  • Session level: Execute the ALTER SESSION statement or configure the session-level GUC parameters to specify whether to enable the cross-node parallel execution feature for the current session.
  • Query level: Use a hint to specify whether the cross-node parallel execution feature is enabled for a specific query.

Parameters

By default, the cross-node parallel execution feature is disabled in PolarDB O Edition. To use this feature, configure the following parameters.

Parameter Description
polar_cluster_map Queries the names of all the read-only nodes in PolarDB O Edition. You cannot configure this parameter. Each time you add a read-only node, the list of node names returned for the parameter is automatically updated.
polar_px_nodes Specifies the read-only nodes in which your query is run in parallel. This parameter is empty by default. This specifies that all read-only nodes are used for parallel execution. If you want to use specific nodes for parallel execution, set the parameter to the names of the nodes that you want to use. Use commas (,) to separate the node names. The following code provides an example:
SHOW polar_px_nodes ;
 polar_px_nodes
----------------

(1 row)
SET polar_px_nodes='node1,node2';
SHOW polar_px_nodes ;
 polar_px_nodes
----------------
 node1,node2
(1 row)
polar_px_enable_replay_wait Specifies whether to enable strong consistency. Valid values: on and off. A delay exists between the primary node and read-only nodes in your PolarDB O Edition cluster. For example, after the primary node executes a CREATE TABLE statement, this DDL statement is replayed on the read-only nodes so that the read-only nodes are synchronized with the primary node. The replay takes time that causes the delay. If you set the polar_px_enable_replay_wait parameter to on, strong consistency is enabled. After a request is initiated to run a query on multiple read-only nodes in parallel, the request is routed to the read-only nodes. To ensure strong consistency, the read-only nodes replay the latest log record generated before the query request is initiated and all the previous log records. Then, the read-only nodes start to run the query.

The default value is off. This value specifies that strong consistency is disabled. If you set the parameter to off, a long delay exists between the primary node and read-only nodes due to log replay and the read-only nodes may fail to read the latest log records of DDL operations. If you set the polar_px_enable_replay_wait parameter to on, the performance of cross-node parallel execution is reduced.

polar_px_max_workers_number Specifies the maximum number of workers processes that can be run on a node for cross-node parallel execution. The default value is 30. This parameter is used to limit the degree of parallelism on each node. The number of workers processes in all sessions for cross-node parallel execution on each node cannot exceed the value of this parameter.
polar_enable_px Specifies whether to enable the cross-node parallel execution feature. The default value is off. This value specifies that the feature is disabled.
polar_px_dop_per_node Specifies the degree of parallelism for cross-node parallel execution on the current session. The default value is 1. We recommend that you set the value to the number of cores in your system. If you set polar_px_dop_per_node to N, each node uses N px workers processes for a session to run your query in parallel.
px_workers Specifies whether the feature is applied to the specified tables. By default, the feature is applied to no tables. The feature is computing resource-consuming. To reduce the consumed computing resources, you can configure this parameter to specify the tables to which the feature applies. The following code provides an example to describe how to configure this parameter:
--Enable the feature for the table named t1.
ALTER TABLE t1 SET(px_workers=1);

--Disable the feature for the table named t1.
ALTER TABLE t1 SET(px_workers=-1);

--Do not apply the feature to the table named t1. This is the default setting.
ALTER TABLE t1 SET(px_workers=0);

Examples

The following section provides an example to describe the effect of the cross-node parallel execution feature. A single-table query is performed in the example.

Background information

Execute the following statements to create a table named test and insert data into the table:

CREATE TABLE test(id int);
INSERT INTO test SELECT generate_series(1,1000000);
EXPLAIN SELECT * FROM test;

By default, the cross-node parallel execution feature is disabled. The native execution plan for a single-table query is to perform a sequential scan (Seq Scan), as shown in the following code:

                       QUERY PLAN
--------------------------------------------------------
 Seq Scan on test  (cost=0.00..35.50 rows=2550 width=4)
(1 row)

To enable the cross-node parallel execution feature, perform the following steps:

  1. Enable this feature for the table named test:
    ALTER TABLE test SET (px_workers=1);
    SET polar_enable_px=on;
    EXPLAIN SELECT * FROM test;

    The following query result is returned:

                                      QUERY PLAN
    -------------------------------------------------------------------------------
     PX Coordinator 2:1  (slice1; segments: 2)  (cost=0.00..431.00 rows=1 width=4)
       ->  Seq Scan on test (scan partial)  (cost=0.00..431.00 rows=1 width=4)
     Optimizer: PolarDB PX Optimizer
    (3 rows)
  2. Query the names of all the read-only nodes.

    Execute the following statement:

    SHOW polar_cluster_map;

    The following query result is returned:

     polar_cluster_map
    -------------------
     node1,node2,node3
    (1 row)

    In this example, the cluster has three read-only nodes: node1, node2, and node3.

  3. Specify node1 and node2 for cross-node parallel execution.

    Execute the following statement:

    SET polar_px_nodes='node1,node2';

    Query the nodes that are specified for cross-node parallel execution:

    SHOW polar_px_nodes ;

    The following query result is returned:

     polar_px_nodes
    ----------------
     node1,node2
    (1 row)

Performance data

The following performance data is obtained from a test in which five read-only nodes are used for parallel execution.

  • When SELECT COUNT(*) is executed for a full table scan, cross-node parallel execution is 60 times faster than single-node parallel execution.
  • In TPC-H queries, cross-node parallel execution is 30 times faster than single-node parallel execution.
    Note This implementation of TPC-H is derived from the TPC-H Benchmark and is not comparable to published TPC-H Benchmark results, as this implementation does not comply with all the requirements of the TPC-H Benchmark.

Specify a level for cross-node parallel execution

This section describes how to use the cross-node parallel execution feature to perform analytical queries at different granularity levels:

  • System level

    If you want to use the cross-node parallel execution feature at the system level, configure global Grand Unified Configuration (GUC) parameters to enable the feature and to specify the degree of parallelism.

    Examples
    postgres=# alter system set polar_enable_px=1;
    ALTER SYSTEM
    postgres=# alter system set polar_px_dop_per_node=1;
    ALTER SYSTEM
    postgres=# select pg_reload_conf();
     pg_reload_conf
    ----------------
     t
    (1 row)
    postgres=# \c postgres
    You are now connected to database "postgres" as user "postgres".
    
    postgres=# drop table if exists t1;
    DROP TABLE
    postgres=# select id into t1 from generate_series(1, 1000) as id order by id desc;
    SELECT 1000
    postgres=# alter table t1 set (px_workers=1);
    ALTER TABLE
    postgres=# explain (verbose, costs off)   select * from t1 where id < 10;
                    QUERY PLAN
    -------------------------------------------
     PX Coordinator 2:1  (slice1; segments: 2)
       Output: id
       ->  Partial Seq Scan on public.t1
             Output: id
             Filter: (t1.id < 10)
     Optimizer: PolarDB PX Optimizer
    (6 rows)
  • Session level
    If you want to use the cross-node parallel execution feature at the session level, execute the ALTER SESSION statement or configure session-level GUC parameters.
    • ALTER SESSION syntax
      ALTER SESSION ENABLE PARALLEL QUERY
      ALTER SESSION DISABLE PARALLEL QUERY
      ALTER SESSION FORCE PARALLEL QUERY [PARALLER integer]
      Note
      • ALTER SESSION ENABLE PARALLEL QUERY specifies that you can use a hint or the corresponding syntax to enable cross-node parallel execution for the current session.
      • ALTER SESSION DISABLE PARALLEL QUERY specifies that the current session does not support cross-node parallel execution even if a hint or the corresponding syntax is specified.
      • ALTER SESSION FORCE PARALLEL QUERY <PARALLEL integer> specifies that the cross-node parallel execution feature is forcibly enabled for the current session. <PARALLEL integer> specifies the degree of parallelism. If you do not specify <PARALLEL integer>, the default value of your database is used as the degree of parallelism. The default value is the value specified by the polar_px_dop_per_node parameter.

        You can use different methods to specify degrees of parallelism. The degrees of parallelism have different priorities based on the method. The degree of parallelism specified in a hint has the highest priority, the degree of parallelism specified in the FORCE PARALLEL statement has the medium priority, and the degree of parallelism specified by the polar_px_dop_per_node parameter has the lowest priority.

      If you execute the ALTER SESSION statement, the configuration of cross-node parallel execution for the current session is modified. After the current session is reconnected, cross-node parallel execution is enabled for the session by default.

      Examples
      --enable
      postgres=# set polar_enable_px = false;
      SET
      postgres=# set polar_px_enable_hint = true;
      SET
      postgres=# alter session enable parallel query;
      ALTER SESSION
      
      postgres=# explain (verbose, costs off)   select /*+ PARALLEL(4)*/ * from t1 where id < 10;
      INFO:  [HINTS] PX PARALLEL(4) accepted.
                      QUERY PLAN
      -------------------------------------------
       PX Coordinator 8:1  (slice1; segments: 8)
         Output: id
         ->  Partial Seq Scan on public.t1
               Output: id
               Filter: (t1.id < 10)
       Optimizer: PolarDB PX Optimizer
      (6 rows)
      --disable
      postgres=# set polar_enable_px = false;
      SET
      postgres=# set polar_px_enable_hint = true;
      SET
      postgres=# alter session disable parallel query;
      ALTER SESSION
      
      postgres=# explain (verbose, costs off)   select /*+ PARALLEL(4)*/ * from t1 where id < 10;
             QUERY PLAN
      ------------------------
       Seq Scan on public.t1
         Output: id
         Filter: (t1.id < 10)
      (3 rows)
      --force
      postgres=# set polar_enable_px = false;
      SET
      postgres=# set polar_px_enable_hint = false;
      SET
      postgres=# alter session force parallel query;
      ALTER SESSION
      
      postgres=# explain (verbose, costs off)   select * from t1 where id < 10;
                 QUERY PLAN
      -------------------------------------------
       PX Coordinator 2:1  (slice1; segments: 2)
         Output: id
         ->  Partial Seq Scan on public.t1
               Output: id
               Filter: (t1.id < 10)
       Optimizer: PolarDB PX Optimizer
      (6 rows)
      
      postgres=# alter session force parallel query parallel 2;
      ALTER SESSION
      postgres=# explain (verbose, costs off)   select * from t1 where id < 10;
                      QUERY PLAN
      -------------------------------------------
       PX Coordinator 4:1  (slice1; segments: 4)
         Output: id
         ->  Partial Seq Scan on public.t1
               Output: id
               Filter: (t1.id < 10)
       Optimizer: PolarDB PX Optimizer
      (6 rows)
    • Configure GUC parameters

      GUC parameters can be configured at the system or session level. You can configure the GUC parameters to specify whether to enable the cross-node parallel execution feature for the current session.

      Examples
      postgres=# set polar_enable_px = true;
      SET
      postgres=# set polar_px_dop_per_node = 1;
      SET
      postgres=# explain (verbose, costs off)   select * from t1 where id < 10;
                      QUERY PLAN
      -------------------------------------------
       PX Coordinator 2:1  (slice1; segments: 2)
         Output: id
         ->  Partial Seq Scan on public.t1
               Output: id
               Filter: (t1.id < 10)
       Optimizer: PolarDB PX Optimizer
      (6 rows)
  • Query level
    If you want to use the cross-node parallel execution feature at the query level, use SQL hints to enable the feature and specify the degree of parallelism. Use the following hint syntax:
    /*+ PARALLEL(DEFAULT) */
    /*+ PARALLEL(integer) */
    /*+ NO_PARALLEL(tablename) */
    Note
    • PARALLEL(DEFAULT) specifies that cross-node parallel execution is enabled and that the default degree of parallelism is used. The default degree of parallelism is specified by the polar_px_dop_per_node parameter.
    • PARALLEL(integer) specifies that the cross-node parallel execution feature is enabled and the degree of parallelism specified by the integer parameter is used.
    • NO_PARALLEL(tablename) specifies that cross-node parallel execution is disabled for the specified table. For queries that involve this table, cross-node parallel execution is disabled.
    • These hints are compatible with the hints provided by Oracle. If more than one PARALLEL hint is used, take note of the following items:
      • If multiple hint blocks such as/*+.A.*/ /*+.B.*/ /*+.C.*/ are used together, only the first hint block takes effect.
      • If multiple PARALLEL hints exist in a single hint block, such as /*+ parallel(A) parallel(B)*/, the execution result varies based on the degrees of parallelism specified in these hints. If the degrees of parallelism specified in these hints are the same, one of these hints takes effect. Otherwise, none of the hints take effect.
      • If a PARALLEL hint and a NO_PARALLEL hint exist in the same hint block, such as /*+ parallel(A) no_parallel(t1)*/, the NO_PARALLEL hint does not take effect.
    • Cross-node parallel execution supports only PARALLEL and NO_PARALLEL hints.
    • To specify whether cross-node parallel execution is enabled at the query level, configure the polar_px_enable_hint GUC parameter. The default value of this parameter is false.
    Examples
    postgres=# set polar_enable_px = false;
    SET
    postgres=# set polar_px_dop_per_node = 1;
    SET
    postgres=# set polar_px_enable_hint = true;
    SET
    postgres=# explain (verbose, costs off)   select * from t1 where id < 10;
           QUERY PLAN
    ------------------------
     Seq Scan on public.t1
       Output: id
       Filter: (t1.id < 10)
    (3 rows)
    
    postgres=# explain (verbose, costs off)   select /*+ PARALLEL(DEFAULT) */ * from t1 where id < 10;
                    QUERY PLAN
    -------------------------------------------
     PX Coordinator 2:1  (slice1; segments: 2)
       Output: id
       ->  Partial Seq Scan on public.t1
             Output: id
             Filter: (t1.id < 10)
     Optimizer: PolarDB PX Optimizer
    (6 rows)
    
    postgres=# explain (verbose, costs off)   select /*+ PARALLEL(4) */ * from t1 where id < 10;
                    QUERY PLAN
    -------------------------------------------
     PX Coordinator 8:1  (slice1; segments: 8)
       Output: id
       ->  Partial Seq Scan on public.t1
             Output: id
             Filter: (t1.id < 10)
     Optimizer: PolarDB PX Optimizer
    (6 rows)
    
    postgres=# explain (verbose, costs off)   select /*+ PARALLEL(0) */ * from t1 where id < 10;
           QUERY PLAN
    ------------------------
     Seq Scan on public.t1
       Output: id
       Filter: (t1.id < 10)
    (3 rows)
    
    postgres=# explain (verbose, costs off)   select /*+ NO_PARALLEL(t1) */ * from t1 where id < 10;
           QUERY PLAN
    ------------------------
     Seq Scan on public.t1
       Output: id
       Filter: (t1.id < 10)
    (3 rows)
  • Results of specifying different settings for the three levels of cross-node parallel execution
    The following table describes the rules that are applied to the results of your queries when you specify different settings for the three levels of cross-node parallel execution.
    System level Session level Query level Result
    polar_enable_px=on polar_px_dop_per_node=X enable No hint specified Parallel execution with a degree of parallelism X
    polar_enable_px=on polar_px_dop_per_node=X enable PARALLEL(Y) Parallel execution with a degree of parallelism Y
    polar_enable_px=on polar_px_dop_per_node=X enable NO_PARALLEL Serial execution
    polar_enable_px=on polar_px_dop_per_node=X disable No hint specified Serial execution
    polar_enable_px=on polar_px_dop_per_node=X disable PARALLEL(Y) Serial execution
    polar_enable_px=on polar_px_dop_per_node=X disable NO_PARALLEL Serial execution
    polar_enable_px=on polar_px_dop_per_node=X FORCE PARALLER Z No hint specified Parallel execution with a degree of parallelism Z
    polar_enable_px=on polar_px_dop_per_node=X FORCE PARALLER Z PARALLEL(Y) Parallel execution with a degree of parallelism Y
    polar_enable_px=on polar_px_dop_per_node=X FORCE PARALLER Z NO_PARALLEL Serial execution
    polar_enable_px=off polar_px_dop_per_node=X enable No hint specified Serial execution
    polar_enable_px=off polar_px_dop_per_node=X enable PARALLEL(Y) Parallel execution with a degree of parallelism Y
    polar_enable_px=off polar_px_dop_per_node=X enable NO_PARALLEL Serial execution
    polar_enable_px=off polar_px_dop_per_node=X disable No hint specified Serial execution
    polar_enable_px=off polar_px_dop_per_node=X disable PARALLEL(Y) Serial execution
    polar_enable_px=off polar_px_dop_per_node=X disable NO_PARALLEL Serial execution
    polar_enable_px=off polar_px_dop_per_node=X FORCE PARALLER Z No hint specified Parallel execution with a degree of parallelism Z
    polar_enable_px=off polar_px_dop_per_node=X FORCE PARALLER Z PARALLEL(Y) Parallel execution with a degree of parallelism Y
    polar_enable_px=off polar_px_dop_per_node=X FORCE PARALLER Z NO_PARALLEL Serial execution