A wide table can have tens or even hundreds of columns. Queries often need to analyze only a few of these columns. You can use a columnstore index to speed up these queries.
Background information
In many SaaS business systems, a table may contain dozens or even hundreds of columns, which brings technical challenges for queries.
Only a few columns need to be analyzed in some queries, but many irrelevant columns are read when the row-oriented storage engine is used. This increases the I/O burden on the system.
The query mode is not fixed. Multiple filter conditions can be used when you query hundreds of columns. When you build a combined index, you must match all query scenarios in advance. If the query conditions change, the combined index becomes invalid.
Columnstore indexes are well-suited for these two scenarios. Because columnstore indexes are column-based, reading one column does not affect others. The columns are also independent, so the order of multiple filter conditions does not impact the effectiveness of the columnstore index.
Results
With a dataset of 100 million rows and a degree of parallelism of 4, queries that use a columnstore index perform 30 times faster than native parallel execution in PostgreSQL.
Query | Parallel execution in the native PostgreSQL | Columnstore index |
Q1 | 243 s | 7.9 s |
Procedure
Step 1: Prepare the environment
Ensure that your cluster version and configuration meet the following requirements:
Cluster versions:
PostgreSQL 16 (minor engine version 2.0.16.8.3.0 or later)
PostgreSQL 14 (minor engine version 2.0.14.10.20.0 or later)
NoteYou can view the minor engine version number in the console or by running the
SHOW polardb_version;statement. If the minor engine version does not meet the requirement, you must upgrade the minor engine version.The source table must have a primary key. The primary key column must be included when you create the columnstore index.
The
wal_levelparameter must be set tological. This adds the information required to support logical replication to the write-ahead logging (WAL).NoteYou can set the wal_level parameter in the console. Modifying this parameter restarts the cluster. Plan your business operations accordingly and proceed with caution.
Enable the columnstore index feature.
The method for enabling the columnstore index feature varies depending on the minor engine version of your PolarDB for PostgreSQL cluster:
Step 2: Prepare the data
This example uses a table named widecolumntable that contains 24 columns with data types such as BIGINT, DECIMAL, TEXT, JSONB, and TEXT[]. This example analyzes five columns (id_1, domain, consumption, start_time, and end_time) to calculate the total amount spent by each customer across multiple domains in the past year.
Create a table named
widecolumntablewith the following table schema. Then, insert test data as needed.CREATE TABLE widecolumntable ( id_1 BIGINT NOT NULL PRIMARY KEY, id_2 BIGINT, id_3 BIGINT, id_4 BIGINT, id_5 BIGINT, id_6 BIGINT, version INT, domain TEXT, consumption DECIMAL(18,3), c_level CHARACTER varying(1) NOT NULL, priority BIGINT, operator TEXT, notify_policy TEXT, call_id UUID NOT NULL, provider_id BIGINT NOT NULL, name_1 TEXT NOT NULL, name_2 TEXT NOT NULL, name_3 TEXT, start_time TIMESTAMP WITH TIME ZONE NOT NULL, end_time TIMESTAMP WITH TIME ZONE NOT NULL, comment JSONB NOT NULL, description TEXT[] NOT NULL, created_at TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL, updated_at TIMESTAMP WITH TIME ZONE DEFAULT now() NOT NULL );Add the
id_1,domain,consumption,start_time, andend_timecolumns to the columnstore index.CREATE INDEX idx_wide_csi ON widecolumntable USING CSI(id_1, domain, consumption, start_time, end_time);
Step 3: Run the query
Use different execution engines to run a query that calculates the total amount spent by each customer in different domains over the past year and sorts the results by the total amount.
Use the columnstore index.
--- Enable the columnstore index and set the degree of parallelism to 4. SET polar_csi.enable_query to on; SET polar_csi.exec_parallel to 4; ---Q1 EXPLAIN ANALYZE SELECT id_1, domain,SUM(consumption) FROM widecolumntable WHERE start_time > '20230101' and end_time < '20240101' GROUP BY id_1, domain Order By SUM(consumption);Disable the columnstore index and use the row store engine.
--- Disable the columnstore index, use the row-store engine, and set the degree of parallelism to 4. SET polar_csi.enable_query to off; SET max_parallel_workers_per_gather to 4; ---Q1 EXPLAIN ANALYZE SELECT id_1, domain,SUM(consumption) FROM widecolumntable WHERE start_time > '20230101' and end_time < '20240101' GROUP BY id_1, domain Order By SUM(consumption);





