This topic describes how to use ePQ parallel scans and batch writes for the statement to create materialized views.
Prerequisites
Your PolarDB for PostgreSQL cluster runs one of the following database engine versions:
PolarDB for PostgreSQL 14 with revision version 2.0.14.6.6.0 or later
PolarDB for PostgreSQL 11 with revision version 2.0.11.9.22.0 or later
You can check the revision version of your cluster in the PolarDB console or by executing the SHOW polardb_version; statement. If the revision version does not meet the requirements,update it.
Background information
Materialized views are database objects that contain query results. Unlike normal views, materialized views store both view definitions and data copies when they are created. If the data in the materialized view is inconsistent with that in the view definition, you can refresh the materialized view to keep the data consistent. Materialized views essentially pre-compute queries in view definitions so that they can be reused during queries.
The CREATE TABLE AS statement is used to create a table which uses the same structure as the output columns of a query.
The SELECT INTO statement is used to create a new table and write the queried data to the table instead of returning the queried data to the client. The new table uses exactly the same structure as the output columns of the query.
How it works
The PostgreSQL kernel uses the same set of code logics to handle the statements to create and refresh materialized views and the CREATE TABLE AS and SELECT INTO statements, because these statements perform similar steps at the database level. The following steps are usually included:
Data scans: Execute the query specified in the view definition or in the
CREATE TABLE ASorSELECT INTOstatement to scan the data that meets the conditions.Data writes: Write the data scanned in the preceding step to a new materialized view or a new table.
To optimize the preceding steps, PolarDB for PostgreSQL introduces ePQ parallel scans and batch writes. When a large amount of data are to be scanned or written, ePQ parallel scans and batch writes can improve the performance of these DDL statements and shorten the execution time.
ePQ parallel scans: The ePQ feature allows you to use the I/O bandwidth and computing resources of multiple compute nodes to execute queries in view definitions in parallel. This improves the utilization of computing resources and bandwidth.
Batch writes: A certain number of tuples that are accumulated in the memory are written to the table or materialized view at a time. This reduces the overheads for recording WAL logs and the frequency of locking pages.
Usage
ePQ parallel scans
You can use the
polar_px_enable_create_table_asparameter to specify whether to enable ePQ parallel scans to speed up queries for theCREATE TABLE ASstatement. Default value: ON. Valid values:ON: enables ePQ parallel scans to speed up queries for the
CREATE TABLE ASstatement.OFF: disables ePQ parallel scans to speed up queries for the
CREATE TABLE ASstatement.
SET polar_px_enable_create_table_as = ON;NoteThis parameter is valid only if the ePQ feature is enabled (the
polar_enable_pxparameter is set to ON).Due to limits of the ePQ feature, you cannot enable ePQ parallel scans to speed up queries for the
CREATE TABLE AS ... WITH OIDSstatement. The processing for the CREATE TABLE AS ... WITH OIDS statement: use the built-in optimizer of PostgreSQL to generate an execution plan for the query in the DDL definition and use the single-node executor of PostgreSQL to perform the query.
Batch writes
You can use the
polar_enable_create_table_as_bulk_insertparameter to specify whether to enable batch writes for theSELECT INTOstatement. Default value: ON. Valid values:ON: enables batch writes for the
SELECT INTOstatement.OFF: disables batch writes for the
SELECT INTOstatement.
SET polar_enable_create_table_as_bulk_insert = ON;