Quick Starts

Introduction to Hologres indexes

Configure indexes

1. Primary key

On the Metadata Management page of the HoloWeb console, find your instance in the left-side navigation pane. Then, right-click Tables in your instance, and select New table from the shortcut menu. On the page that appears, specify Table name. On the Field tab, configure the fields, and select the fields that you want to configure as a primary key. Then, click Submit in the upper-right corner.

image

image

If you plan to configure only a primary key for the table, we recommend that you set Storage Format to Row storage on the Property tab. Row-oriented storage is conducive to accelerating queries.

image

After the table is created, you can view the primary key configuration on the DDL statement tab.

image

2. Distribution key

On the Metadata Management page of the HoloWeb console, find your instance in the left-side navigation pane. Then, right-click Tables in your instance, and select New table from the shortcut menu. On the page that appears, specify Table name. On the Field tab, configure the fields.

image

You can select the fields that you want to configure as a primary key.

image

On the Property tab, select the fields that you want to configure as a distribution key, and click Submit in the upper-right corner.

image

After the table is created, you can view the distribution key configuration on the DDL statement tab.

image

3. Clustering key

On the Metadata Management page of the HoloWeb console, find your instance in the left-side navigation pane. Then, right-click Tables in your instance, and select New table from the shortcut menu. On the page that appears, specify Table name. On the Field tab, configure the fields.

image

You can select the fields that you want to configure as a primary key.

image

On the Property tab, select the fields that you want to configure as a clustering key. Then, click Submit in the upper-right corner.

image

After the table is created, you can view the clustering key configuration on the DDL statement tab.

image

4. Bitmap column

On the Metadata Management page of the HoloWeb console, find your instance in the left-side navigation pane. Then, right-click Tables in your instance, and select New table from the shortcut menu. On the page that appears, specify Table name. On the Field tab, configure the fields.

image

On the Property tab, select on in Bitmap Column of the fields for which you want to build bitmap indexes. Then, click Submit in the upper-right corner.

image

After the table is created, you can view the bitmap column configuration on the DDL statement tab.

image

5. Segment key

On the Metadata Management page of the HoloWeb console, find your instance in the left-side navigation pane. Then, right-click Tables in your instance, and select New table from the shortcut menu. On the page that appears, specify Table name. On the Field tab, configure the fields.

image

On the Property tab, select the fields that you want to configure as a segment key. Then, click Submit in the upper-right corner.

image

After the table is created, you can view the segment key configuration on the DDL statement tab.

image

6. Dictionary encoding

On the Metadata Management page of the HoloWeb console, find your instance in the left-side navigation pane. Then, right-click Tables in your instance, and select New table from the shortcut menu. On the page that appears, specify Table name. On the Field tab, configure the fields.

image

On the Property tab, select on in Dictionary Encoding of the fields for which you want to build dictionary mappings. Then, click Submit in the upper-right corner.

image

After the table is created, you can view the dictionary encoding configuration on the DDL statement tab. image

Index comparison

Index type Declared during table creation Description Usage note Applicable query Limits
Primary key Yes
You can quickly locate the row identifier (RID) and clustering key based on the primary key in the primary key index file. Then, you can locate the file where the data you want to query resides based on the RID and clustering key.
● We recommend that you select columns with significant meaning to your business as primary keys. Columns of the SERIAL data type are not recommended. ● If you configure only the primary key for a table, we recommend that you store data in the table in row-oriented storage mode to achieve better query performance.
select sum(a) from tb1 where a > 100 and a < 200; select sum(a) from tb1 where a = 100;
● The column or columns that constitute a primary key must be unique and cannot contain null values. You can configure multiple columns to constitute a primary key in only one CREATE TABLE statement. ● The primary key does not support columns of the following data types: FLOAT, DOUBLE, NUMERIC, ARRAY, JSON, DATE, and other complex data types. The primary key of a table cannot be changed. If you want to change the primary key, you need to create another table. In Hologres V1.3.22 and later, you can configure columns of the DATE data type to constitute a primary key. ● If you specify a primary key for a partitioned table, you need to include the partition key in the primary key. ● You must specify a primary key for a row-oriented table or a row-column hybrid table. A primary key is optional for a column-oriented table. ● The primary key configured for a table cannot be changed. If you want to change the primary key, you need to create a table and import data from the original table.
Distribution key Yes
After you specify a distribution key for a table, data in the table is distributed to shards based on the distribution key by using the Hash(distribution_key)%shard_count algorithm. Data entries with the same distribution key are distributed to the same shard.
● Select the columns in which data is evenly distributed to constitute a distribution key. Otherwise, computing resources cannot be evenly allocated due to data skew. This reduces query efficiency. ● Select the columns that are frequently used in the GROUP BY clause to constitute a distribution key. ● In scenarios where multiple tables are joined, select the columns based on which you want to perform the join operation to constitute a distribution key. This implements a local join and prevents data shuffling. The tables that you want to join must belong to the same table group. ● We recommend that you configure only one distribution key that consists of one or two columns for a table. If you configure a distribution key that consists of more than two columns, and one of the columns is not hit during queries, data shuffling may occur.
select sum(a) from tb1 where a > 100 and a < 200; select sum(a) from tb1 where a = 100 ; select a,sum(b) from agg_tbl group by a; select * from tbl1 join tbl2 on tbl1.a=tbl2.c; SELECT * FROM join_tbl1 INNER JOIN join_tbl2 ON join_tbl2.a = join_tbl1.a INNER JOIN join_tbl3 ON join_tbl2.a = join_tbl3.a;
● You can configure a distribution key when you create a table. If you want to change the distribution key, you need to create a table and import data to the new table. ● You can select one or multiple columns to constitute a distribution key. However, the distribution key cannot contain spaces. If you select multiple columns to constitute a distribution key, separate the column names with commas (,). The order of the columns does not affect data layout and query performance. ● The distribution key does not support complex data types, such as FLOAT, DOUBLE, NUMERIC, ARRAY, and JSON. ● If a primary key is configured for a table, the distribution key of the table must be the primary key or part of the primary key and cannot be null. A null value indicates that no column is specified as the distribution key. This is because one data entry must be distributed to only one shard. The distribution key is set to the primary key by default. If no primary key is configured for a table, no limits are imposed on the distribution key, and it can be null. A null value indicates that no column is specified as the distribution key. In this case, data is randomly distributed to different shards. In Hologres V1.3.28, the columns that are specified to constitute a distribution key cannot be null. ● If a column that constitutes a distribution key contains a null value, the null value is regarded as an empty string and is displayed as quotation marks (").
Clustering key Yes
Clustering keys are used in files. Data in a file is sorted by the clustering key. For some range queries, Hologres can filter the data that is ordered based on the clustering key.
● A clustering key is suitable for point queries and range queries. It can optimize the performance of filter operations. You can configure a column as a clustering key and create a bitmap index for the column at the same time. This delivers the best point query performance. ● Clustering key-based queries follow the leftmost matching principle. We recommend that you specify no more than two columns to constitute a clustering key. Otherwise, clustering key-based queries cannot be performed in some scenarios. A clustering key is used for sorting. In the columns that constitute a clustering key, columns to the left of the table have higher priorities than columns to the right of the table. ● By default, the values in a column that constitutes a clustering key are sorted in ascending order (asc). You cannot set the sorting order to descending (desc) for a column when you create a table. ● By default, the primary key of a row-oriented table is a clustering key. In versions earlier than Hologres V0.9, no clustering key is specified for a table by default. If the clustering key of a table is not the primary key, Hologres generates two sets of sorted data for this table. One set of data is sorted based on the primary key, and the other set of data is sorted based on the clustering key. This results in data redundancy.
select sum(a) from tb1 where a > 100 and a < 200; select sum(a) from tb1 where a = 100;
● If you want to change the clustering key, you need to create another table and import data from the original table. ● The columns that constitute a clustering key cannot contain null values. In Hologres V1.3.20 to V1.3.27, the columns that constitute a clustering key can contain null values. In Hologres V1.3.28 and later, the columns that constitute a clustering key cannot contain null values. If columns that constitute a clustering key contain null values, data correctness may be affected. If null values are required for a clustering key due to business requirements, you can execute the following statement before you execute a SQL statement to configure the clustering key: set hg_experimental_enable_nullable_clustering_key = true ● Columns of the following data types cannot be specified to constitute a clustering key: DECIMAL, NUMERIC, FLOAT, DOUBLE, ARRAY, JSON, JSONB, TIMESTAMP, and other complex data types. ● When you specify columns to constitute a clustering key, you can append :asc, which is the default value, to the column names to specify the ascending order for the indexes to be built. The descending order (desc) is not supported. If you sort your query in descending order, the clustering key is not hit, reducing the query performance. ● By default, no clustering key is specified for a column-oriented table. You need to configure a clustering key based on your business requirements. ● In Hologres, each table can have only one clustering key. You can specify columns to constitute a clustering key for a table only when you create the table.
Segment key (event_time_column) Yes
Segment keys are used in files. Data is written to files in append-only mode. Small files are merged based on the segment key. The segment key identifies the boundary of a file. You can use the segment key to locate the required file. The segment key is designed for ordered and range-specific data such as timestamps and dates. Therefore, the segment key must be strongly correlated with the data write time.
● We recommend that you configure the segment key for the columns where data monotonically increases or decreases, such as columns of the TIMESTAMP type. The segment key is also applicable to columns that are strongly correlated with time, such as log and traffic columns. Appropriate configuration of the segment key can greatly improve performance. If data in the column that constitutes a segment key is unordered, data in each file has no distinguishing features and remains unordered after the files are merged. As a result, the effect of file data filtering cannot be achieved. ● If a table does not contain columns where data monotonically increases or decreases, you can add an update_time column. Each time you execute the UPSERT statement, the current time is written to the update_time column. ● Segment key-based queries follow the leftmost matching principle. We recommend that you do not specify multiple columns to constitute a segment key. Otherwise, data queries based on the segment key cannot be accelerated in some scenarios. In most cases, we recommend that you specify one or two columns to constitute a segment key.
select sum(a) from tb1 where ts > '2020-01-01' and a < '2020-03-02';
● The columns that constitute a segment key cannot contain null values and cannot contain values of the FLOAT, DOUBLE, ARRAY, JSON, or other complex data types. In Hologres V1.3.20 to V1.3.27, the columns that constitute a segment key can contain null values. In Hologres V1.3.28 and later, the columns that constitute a segment key cannot contain null values. If columns that constitute a segment key contain null values, data correctness may be affected. If null values are required for a segment key due to business requirements, you can execute the following statement before you execute a SQL statement to configure the segment key: set hg_experimental_enable_nullable_segment_key = true ● If you want to change the segment key, you need to create a table. ● You cannot configure a segment key for row-oriented table. ● By default, the first non-null column of the TIMESTAMP or TIMESTAMPTZ data type in a column-oriented table is used as the segment key. If such a column does not exist, the first non-null column of the DATE data type is used as the segment key by default. In versions earlier than Hologres V0.9, no segment key is configured for a table by default. ● The segment key is renamed event_time_column in Hologres V0.9. However, segment key can still be used in Hologres V0.9 and later.
Bitmap column No
Bitmap columns are used in files. Bitmaps are created for data in a file based on the bitmap column. For equivalent queries, Hologres can encode data of each row by value, and perform bit operations to find the rows where data resides. The time complexity is O(1).
● We recommend that you build bitmap mappings for an equivalent query column. This way, Hologres can quickly locate the number of the row where the data that matches filter conditions resides. Bitmap mappings occupy a large amount of memory resources. We recommend that you do not build bitmap mappings for all columns especially in tables with many columns. ● We recommend that you do not build bitmap mappings for columns of the NUMERIC data type. Values of the NUMERIC data type need to be converted into binary data for storage, and extra processing overhead is generated.
select * from tb1 where a ='aaabbbcccddd';
● Only column-oriented tables and row-column hybrid tables support bitmap columns. Row-oriented tables do not support bitmap columns. ● The columns for which you want to build bitmap mappings can be null. ● By default, Hologres in the current version implicitly builds bitmap mappings for all columns of the TEXT data type. ● You can change bitmap columns separately from the transaction of CREATE TABLE. The change does not immediately take effect. Bitmap mappings are asynchronously deleted and built in the background.
Dictionary encoding column No
Dictionary encoding is a compression-based storage technology that enables Hologres to encode and store raw data as data of the NUMERIC data type and maintain the related encoding schema. During data reads, Hologres decodes data based on the encoding schema. Therefore, dictionary encoding can accelerate queries with clauses such as GROUP BY and FILTER on columns with string comparisons and especially on columns with a high recurrence rate of values.
● We recommend that you build dictionary mappings for columns with string comparisons and a high recurrence rate of values. ● We recommend that you do not build dictionary mappings for all columns. Otherwise, extra processing overheads for encoding and decoding are generated. ● You can build dictionary mappings for columns by modifying the dictionary_encoding_columns property after a table is created. The modification does not immediately take effect. Dictionary mappings are asynchronously built and deleted at the backend. For more information, see ALTER TABLE.
select * from tb1 where a='aaabbbcccddd'; select a,sum(b) from agg_tbl group by a;
● Only column-oriented tables and row-column hybrid tables support dictionary encoding. ● The columns for which you want to build dictionary mappings can be null. ● We recommend that you build dictionary mappings for columns with a few values. Data in these columns allows compression-based storage. ● By default, Hologres V0.8 or earlier versions implicitly build dictionary mappings for all columns of the TEXT data type. In V0.9 and later, Hologres automatically determines whether to build dictionary mappings for a column based on the characteristics of data in the column.

Check whether indexes are built

In this example, the following statements are executed to create a table.

image

You can add the EXPLAIN keyword before an SQL statement to query its execution plans. You can view the Index Scan section to check whether an index is built for a column as expected.

image

Causes of index-related issues

  1. An error is reported when I insert data to a table

Causes:

The columns that constitute a primary key, a segment key, or a clustering key cannot be empty.

  1. Table creation fails

Causes:

  • Row-oriented tables do not support bitmap columns. You can configure bitmap columns only for column-oriented tables and row-column hybrid tables.

  • Columns of complex data types cannot be configured as primary keys, distribution keys, segment keys, or clustering keys.

  • If you specify a primary key for a partitioned table, you need to include the partition key in the primary key.

  • You must configure a primary key for a row-oriented table or a row-column hybrid table. A primary key is not required for a column-oriented table.

  • You cannot configure a segment key for a row-oriented table.

  • You can configure only one clustering key for a table.

3. Table modification fails

Causes:

You cannot change the primary key, distribution key, segment key, or clustering key for an existing table. If you want to change it, you need to create a table.

4. Indexes are not used during queries

Causes:

  • Indexes are built for multiple columns.

Key(a,b)

The indexes are invalid if the where b=1 clause is used.

The indexes are valid if the where a=1 and where a=1 and b=1 clauses are used.

Sorting logic: Data is sorted by column a. For data entries with the same value of column a, data is sorted by column b.

  • Columns that constitute a distribution key cannot contain spaces.

  • A clustering key supports only the ascending order (asc).

  • Changes of bitmap columns and dictionary encoding columns do not immediately take effect. They are asynchronously built and deleted in the background.

5. Indexes do not accelerate queries

Causes:

  • Data in the columns that constitute a distribution key is not evenly distributed. As a result, computing resources cannot be evenly allocated due to data skew.

  • The columns for which you build dictionary mappings have a large number of distinct values. This generates extra overhead when the data is queried.

Summary

This topic describes how to configure different types of indexes in Hologres and provides causes of index-related issues. This topic also provides details of principles, scenarios, limits, and usage notes of indexes in Hologres.

Was this helpful?

open