PolarDB for PostgreSQL supports two methods for enabling and using columnstore indexes: adding a columnstore index read-only node in the console or directly using the columnstore index extension. You can choose the method that best suits your business scenario.
Applicability
Cluster versions:
PostgreSQL 16 (minor engine version 2.0.16.8.3.0 or later)
PostgreSQL 14 (minor engine version 2.0.14.10.20.0 or later)
NoteYou can view the minor engine version number in the console or by running the
SHOW polardb_version;statement. If the minor engine version does not meet the requirement, you must upgrade the minor engine version.The source table must have a primary key. The primary key column must be included when you create the columnstore index.
The
wal_levelparameter must be set tological. This adds the information required to support logical replication to the write-ahead logging (WAL).NoteYou can set the wal_level parameter in the console. Modifying this parameter restarts the cluster. Plan your business operations accordingly and proceed with caution.
Limits
You can create only one columnstore index for each table.
Modifying a columnstore index is not supported. To add a column to a columnstore index, you must recreate the index.
Enable the columnstore index feature
The method for enabling the columnstore index feature varies depending on the minor engine version of your PolarDB for PostgreSQL cluster:
Create a columnstore index
1. Create a columnstore index for specific columns or all columns
Syntax
-- Create an index for specific columns
CREATE INDEX [ CONCURRENTLY ] csi_index_name ON table_name USING CSI(column1,column2,...);
-- Create an index for all columns
CREATE INDEX [ CONCURRENTLY ] csi_index_name ON table_name USING CSI;Parameters
Parameter | Description |
CONCURRENTLY | Optional. Creates the index in concurrent mode. This does not block write operations on the source table and is suitable for creating indexes on existing online tables. Note
|
csi_index_name | The custom name of the columnstore index. |
table_name | The name of the table for which to create the columnstore index. |
column1,column2,... | The specific columns of the table to include in the columnstore index. If you do not specify columns, an index is created for all columns. |
Example
Prepare a basic test table named
sales.CREATE TABLE sales (sale_id int primary key, name CHAR(10), amount int);Create the columnstore index. Because you can create only one columnstore index per table, run one of the following statements to create a columnstore index for the specified table.
Create a columnstore index for specific columns
CREATE INDEX idx_csi_sales ON sales USING CSI(sale_id, amount);Create a columnstore index for all columns
CREATE INDEX idx_csi_sales ON sales USING CSI;Create a columnstore index for specific columns in concurrent mode
Creating a columnstore index in concurrent mode does not block write operations on the source table. This method is suitable for creating an index on an existing online table.
CREATE INDEX CONCURRENTLY idx_csi_sales ON sales USING CSI(sale_id, amount);After the index is created, you can run the following statement to view the index information:
SELECT * FROM pg_indexes WHERE tablename = 'sales';
2. View the index creation progress
Creating a columnstore index for a large table can be time-consuming. You can check the creation progress using the following method. You can also adjust relevant parameters to improve the creation speed.
SELECT * FROM pg_stat_progress_create_index;3. (Optional) Cancel index creation
If creating an index takes a long time and the process affects your existing services, you can use pg_cancel_backend or pg_terminate_backend to cancel the index creation. These functions require a pid, which you can obtain by following the instructions in the 2. View the index creation progress section.
SELECT pg_cancel_backend(pid);
SELECT pg_terminate_backend(pid);4. View the index size
Use the pg_relation_size function to view the index size. You can combine it with the pg_size_pretty function to display the size in a more readable format.
SELECT * FROM pg_size_pretty(pg_relation_size('csi_index_name'));For example, you can view the storage size of the previously created columnstore index.
SELECT * FROM pg_size_pretty(pg_relation_size('idx_csi_sales'));5. (Optional) Configure real-time data for the columnstore index
PolarDB for PostgreSQL clusters support creating columnstore indexes on historical data and automatically synchronize new data from the row-oriented table to the columnstore index. After you create a columnstore index for a table using the CREATE INDEX statement, new data is written to the columnstore index in real time. No extra operations are required for this process.
To meet the real-time data requirements of certain business scenarios, you can adjust the polar_csi.update_interval and polar_csi.update_batch_count parameters to improve the real-time performance of the columnstore index.
Use a columnstore index
1. Configure parameters
After you create a columnstore index, you can use the following parameters to control whether SQL queries use the columnstore index.
Parameter | Description |
| Specifies whether to allow queries to use the columnstore index. The valid values are:
|
| The query cost threshold for using the columnstore index. If the query cost is less than the current threshold, the row-store engine is used. Otherwise, the columnstore engine is used. Note If you set this parameter to 0, all queries prioritize using the columnstore index. We do not recommend setting the value to 0 in a production environment. Adjust this value dynamically based on the load of the row-store and columnstore engines. |
2. Configure the scope of the parameters
Global scope
You can use the console to set the polar_csi.enable_query parameter to on. This allows all search statements in all databases to attempt to use the columnstore index.
Session level
All queries within the session attempt to use the columnstore index.
SET polar_csi.enable_query = ON;SQL statement level
A single SQL query within a session can use the columnstore index using a hint.
This feature depends on the pg_hint_plan extension. You must run CREATE EXTENSION pg_hint_plan; in the target database to install this extension.
Set only
polar_csi.enable_query.-- Allow the query to use the columnstore index /*+ SET (polar_csi.enable_query on) */ SELECT COUNT(*) FROM sales;Set
polar_csi.enable_queryandpolar_csi.cost_threshold.-- Allow the query to use the columnstore index and set the query cost threshold to 0 /*+SET (polar_csi.enable_query on) SET(polar_csi.cost_threshold 0)*/ SELECT COUNT(*) FROM sales;
3. Check if a SQL statement uses the columnstore index
You can use tools such as EXPLAIN or EXPLAIN ANALYZE to print the query plan. If the plan contains the CSI Executor keyword, the query is using the columnstore index.
EXPLAIN SELECT COUNT(*) FROM sales;The following result is returned:
QUERY PLAN
--------------
CSI Executor
(1 row)4. (Optional) Configure the query consistency level for the columnstore index
The columnstore index in PolarDB for PostgreSQL supports two query consistency levels to meet different business needs.
Eventual consistency (default): Suitable for queries with high write loads but low real-time data requirements.
Strong consistency: Returns query results only after the data in the column store is fully consistent with the data in the row store.
The real-time performance of the columnstore index in PolarDB for PostgreSQL can reach the sub-second level. However, when the write load is high, the data latency for the columnstore index increases. The default consistency level is eventual consistency. Therefore, if the write load is high and a SQL query uses the columnstore index, the query might not retrieve the latest data.
For scenarios that require strong consistency between the row store and column store, you can set the polar_csi.forward_replay_wait parameter to strong consistency read (on). When a query is run, the system executes the query only after the data in the columnstore index is consistent with the data in the row store.
Delete and rebuild a columnstore index
Because PolarDB for PostgreSQL clusters do not support modifying columnstore indexes, you must delete and recreate the index or rebuild the index to add other columns to an existing columnstore index.
Delete a columnstore index
Syntax
DROP INDEX csi_index_name;Parameters
Parameter | Description |
csi_index_name | The name of the columnstore index to delete. |
Example
Delete the previously created columnstore index.
DROP INDEX idx_csi_sales; Rebuild a columnstore index
Syntax
REINDEX INDEX csi_index_name;Parameters
Parameter | Description |
csi_index_name | The name of the columnstore index to rebuild. |
Example
Rebuild the previously created columnstore index.
REINDEX INDEX idx_csi_sales; Tune columnstore index parameters to improve performance
Improve index creation speed
The index creation speed is related to several parameters, as shown in the following table:
Parameter name | Value range | Default value | Description |
polar_csi.memory_limit | 1 to 1048576 | 1024 | The amount of memory that the columnstore index can use, in MB. A larger memory threshold results in faster index creation. Adjust this parameter based on your cluster specifications. We recommend that you do not set this value to more than 20% of the cluster memory to avoid the risk of out-of-memory (OOM) errors. |
polar_csi.flush_count | 2048 to 20480000 | 204800 | The number of rows in a batch commit. Increasing the number of rows can improve creation efficiency but also requires more memory. |
Improve the real-time performance of the columnstore index
PolarDB added the following parameters in the specified versions. By adjusting these parameters, you can increase the speed of row-to-column transformation and improve the real-time performance of columnstore index data.
PostgreSQL 14: minor engine version 2.0.14.13.28.0 or later.
PostgreSQL 16: No related parameters.
Parameter name | Value range | Default value | Function |
polar_csi.update_interval | 0 to 3600 | 3 | The interval for periodic updates, in seconds. Increasing the update interval can merge small transactions of the same type, which improves data update efficiency when there are many transactions. |
polar_csi.update_batch_count | 1024 to 4294967295 | 100000 | The threshold for the number of rows in a batch update. This is the maximum number of rows in an update transaction. Increasing this threshold can improve data update efficiency. |
Improve query speed
Query performance is mainly related to the following parameters. You can adjust their values to improve query performance:
Parameter name | Value range | Default value | Function |
polar_csi.exec_parallel | 1 to 512 | 2 | The degree of parallelism for a single query executed by the columnstore index. In general, a higher degree of parallelism results in better performance. This is related to the cluster resource specifications. Adjust this parameter based on your cluster specifications. We recommend that you do not set this value to be higher than the number of CPU cores of the compute node. |
polar_csi.memory_limit | 1 to 1048576 | 1024 | The amount of memory that the columnstore index can use, in MB. Adjust this parameter based on your cluster specifications. We recommend that you do not set this value to more than 20% of the cluster memory to avoid the risk of OOM errors. |
polar_csi.cost_threshold | 1 to 1000000000 | 50000 | The query cost threshold for using the columnstore index. If the query cost is less than the current threshold, the row-store engine is used. Otherwise, the columnstore engine is used. If this parameter is set to a large value, time-consuming SQL statements may not use the columnstore index, resulting in slower queries. If this parameter is set to a small value, simple SQL statements may use the columnstore index, reducing the system's concurrency. Adjust this parameter based on your actual business load. Note
|
polar_csi.forward_replay_wait | on|off | off | IMCI query consistency level. The valid values are as follows:
|





