Statistical analysis of specified columns in wide table mode - PolarDB

A wide table can have tens or even hundreds of columns. Queries often need to analyze only a few of these columns. You can use a columnstore index to speed up these queries.

Background information

In many SaaS business systems, a table may contain dozens or even hundreds of columns, which brings technical challenges for queries.

Only a few columns need to be analyzed in some queries, but many irrelevant columns are read when the row-oriented storage engine is used. This increases the I/O burden on the system.
The query mode is not fixed. Multiple filter conditions can be used when you query hundreds of columns. When you build a combined index, you must match all query scenarios in advance. If the query conditions change, the combined index becomes invalid.

Columnstore indexes are well-suited for these two scenarios. Because columnstore indexes are column-based, reading one column does not affect others. The columns are also independent, so the order of multiple filter conditions does not impact the effectiveness of the columnstore index.

Results

With a dataset of 100 million rows and a degree of parallelism of 4, queries that use a columnstore index perform 30 times faster than native parallel execution in PostgreSQL.

Query	Parallel execution in the native PostgreSQL	Columnstore index
Q1	243 s	7.9 s

Procedure

Step 1: Prepare the environment

Ensure that your cluster version and configuration meet the following requirements:
- Cluster versions:
  - PostgreSQL 16 (minor engine version 2.0.16.8.3.0 or later)
  - PostgreSQL 14 (minor engine version 2.0.14.10.20.0 or later)
  Note
  You can view the minor engine version number in the console or by running the SHOW polardb_version; statement. If the minor engine version does not meet the requirement, you must upgrade the minor engine version.
- The source table must have a primary key. The primary key column must be included when you create the columnstore index.
- The wal_level parameter must be set to logical. This adds the information required to support logical replication to the write-ahead logging (WAL).
  Note
  You can set the wal_level parameter in the console. Modifying this parameter restarts the cluster. Plan your business operations accordingly and proceed with caution.

Enable the columnstore index feature.

The method for enabling the columnstore index feature varies depending on the minor engine version of your PolarDB for PostgreSQL cluster:

PostgreSQL 16 (2.0.16.9.8.0 or later) or PostgreSQL 14 (2.0.14.17.35.0 or later)

For PolarDB for PostgreSQL clusters with these versions, two methods are available. The differences are outlined in the following table. You can choose the method that best fits your needs.

Comparison item	[Recommended] Add a columnstore index read-only node	Directly use the pre-installed columnstore index extension
Method	You can add a columnstore index node manually through a visual interface in the console.	No action is required. You can use the extension directly.
Resource allocation	The columnstore engine uses all resources exclusively and can fully utilize the available memory.	The columnstore engine can use only 25% of the memory. The remaining memory is allocated to the row store engine.
Business impact	Transactional processing (TP) and analytical processing (AP) workloads are isolated on different nodes and do not affect each other.	TP and AP workloads run on the same node and affect each other.
Costs	In-Memory Column Index (IMCI) read-only nodes incur additional charges and are billed at the same rate as regular compute nodes.	No additional cost.

Add a columnstore index read-only node

You can add a columnstore index read-only node in one of the following two ways:

Note

The cluster must contain at least one read-only node. You cannot add a columnstore index read-only node to a single-node cluster.

Add in the console

Log on to the PolarDB console and select the region where the cluster is located. You can open the Add/Remove Node wizard in one of the following ways:
- On the Clusters page, click Add/Remove Node in the Actions column.
- On the Basic Information page of the target cluster, click Add/Remove Node in the Database Nodes section.
Select Add Read-only IMCI Node and click OK.
On the cluster upgrade/downgrade page, add the columnstore index read-only node and complete the payment.
1. Click Add an IMCI Node and select the node specifications.
2. Select a switchover time.
3. (Optional) Review the Product Terms of Service and Service Level Agreement.
4. Click Buy Now.
After the payment is complete, return to the cluster details page and wait for the columnstore index read-only node to be added. The node is added when its status changes to Running.

Add during purchase

On the PolarDB purchase page, select the number of Nodes for the IMCI Read-Only Nodes parameter.

PostgreSQL 16 (2.0.16.8.3.0 to 2.0.16.9.8.0) or PostgreSQL 14 (2.0.14.10.20.0 to 2.0.14.17.35.0)

For PolarDB for PostgreSQL clusters with these versions, the columnstore index is deployed as the polar_csi extension in the database cluster. Before you can use the columnstore index, you must create the extension in the specified database.

Note

The polar_csi extension is scoped to the database level. To use the columnstore index in multiple databases within a cluster, you must create the polar_csi extension for each database.
The database account used to install the extension must be a privileged account.

You can install the polar_csi extension in one of the following two ways.

Install from the console

Log on to the PolarDB console. In the navigation pane on the left, click Clusters. Select the region where your cluster is located, and then click the cluster ID to go to the cluster details page.
In the navigation pane on the left, choose Settings and Management > Extension Management. On the Extension Management tab, select Uninstalled Extensions.
In the upper-right corner of the page, select the target database. In the row for the polar_csi extension, click Install in the Actions column. In the Install Extension dialog box that appears, select the target Database Account and click OK to install the extension in the target database.

Install from the command line

Connect to the database cluster and execute the following statement in a target database where you have the required permissions to create the polar_csi extension.

CREATE EXTENSION polar_csi;

Step 2: Prepare the data

This example uses a table named widecolumntable that contains 24 columns with data types such as BIGINT, DECIMAL, TEXT, JSONB, and TEXT[]. This example analyzes five columns (id_1, domain, consumption, start_time, and end_time) to calculate the total amount spent by each customer across multiple domains in the past year.

Create a table named widecolumntable with the following table schema. Then, insert test data as needed.

CREATE TABLE widecolumntable (
  id_1 BIGINT NOT NULL PRIMARY KEY,
  id_2 BIGINT,
  id_3 BIGINT,
  id_4 BIGINT,
  id_5 BIGINT,
  id_6 BIGINT,
  version INT,
  domain TEXT,
  consumption DECIMAL(18,3),
  c_level CHARACTER varying(1) NOT NULL,
  priority BIGINT,
  operator TEXT,
  notify_policy TEXT,
  call_id UUID NOT NULL,
  provider_id BIGINT NOT NULL,
  name_1 TEXT NOT NULL,
  name_2 TEXT NOT NULL,
  name_3 TEXT,
  start_time TIMESTAMP WITH TIME ZONE NOT NULL,
  end_time TIMESTAMP WITH TIME ZONE NOT NULL,
  comment JSONB NOT NULL,
  description TEXT[] NOT NULL,
  created_at TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL,
  updated_at TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL
);

Add the id_1, domain, consumption, start_time, and end_time columns to the columnstore index.

CREATE INDEX idx_wide_csi ON widecolumntable USING CSI(id_1, domain, consumption,  start_time, end_time);

Step 3: Run the query

Use different execution engines to run a query that calculates the total amount spent by each customer in different domains over the past year and sorts the results by the total amount.

Use the columnstore index.

--- Enable the columnstore index and set the degree of parallelism to 4.
SET polar_csi.enable_query to on;
SET polar_csi.exec_parallel to 4;

---Q1
EXPLAIN ANALYZE SELECT id_1, domain,SUM(consumption) FROM widecolumntable WHERE start_time > '20230101' and end_time < '20240101' GROUP BY id_1, domain Order By SUM(consumption);

Disable the columnstore index and use the row store engine.

--- Disable the columnstore index, use the row-store engine, and set the degree of parallelism to 4.
SET polar_csi.enable_query to off;
SET max_parallel_workers_per_gather to 4;

---Q1
EXPLAIN ANALYZE SELECT id_1, domain,SUM(consumption) FROM widecolumntable WHERE start_time > '20230101' and end_time < '20240101' GROUP BY id_1, domain Order By SUM(consumption);