Specify a storage mode for a Hologres table - Hologres - Alibaba Cloud Documentation Center

Hologres supports row-oriented storage, column-oriented storage, and row-column hybrid storage. Different storage modes apply to different scenarios. You can specify a storage mode for a table based on the scenarios of the table. An appropriate storage mode helps increase the data processing and query speed and save the storage space.

Syntax

Hologres supports row-oriented storage, column-oriented storage, and row-column hybrid storage. When you create a table, you can specify a storage mode for the table by configuring the orientation property. Syntax:

-- Syntax supported in Hologres V2.1 and later:
CREATE TABLE <table_name> (...) WITH (orientation = '[column | row | row,column]');

-- Syntax supported in all Hologres versions:
BEGIN;
CREATE TABLE <table_name> (...);
call set_table_property('<table_name>', 'orientation', '[column | row | row,column]');
COMMIT;

table_name: the name of the table.

Note

orientation: specifies the storage mode of a table in Hologres. Row-column hybrid storage is supported in Hologres V1.1 and later.
By default, the column-oriented storage mode is used when a table is created. If you want to use the row-oriented or row-column hybrid storage mode, you must explicitly specify it when you create a table. You cannot modify the storage mode of an existing table. To modify the storage mode of the table, you must create a new table.

Suggestions

The following table provides suggestions on different storage modes.

Storage mode	Scenario	Recommended maximum number of columns	Description
Column-oriented storage	The column-oriented storage mode is suitable for online analytical processing (OLAP) scenarios and supports complex queries, data association, scanning, filtering, and statistics collection.	300	By default, more indexes are created for column-oriented tables. For example, bitmap indexes are created for columns of the STRING type. These indexes can significantly accelerate data filtering and statistics collection in queries.
Row-oriented storage	The row-oriented storage mode is suitable for point queries based on primary keys. Sample query statement: `select * from <tablename> where pk =xxx;`	3,000	By default, indexes are created only for primary key columns of row-oriented tables. In this case, only queries based on primary keys can be accelerated.
Row-column hybrid storage	The row-column hybrid storage mode is suitable for all scenarios in which the column-oriented and row-oriented storage modes can be used and point queries that are not based on primary keys.	1,000	The row-column hybrid storage mode is suitable for a wider range of scenarios. However, more overheads are required in this mode, such as the overheads of storage and internal data status synchronization.

How it works

Column-oriented storage

If a table uses the column-oriented storage mode, data is stored in columns. By default, the column-oriented table follows the ORC format and uses various encoding algorithms to encode data, such as run length encoding (RLE) and dictionary encoding. Then, the column-oriented table uses mainstream compression algorithms to compress the encoded data, such as SNAPPY, ZLIB, ZSTD, and LZ4. When the column-oriented table uses bitmap indexes and deferred materialization, the efficiency of data storage and query can be improved.

The system stores a primary key index file at the underlying layer for each table. For more information, see Primary key. If a primary key is configured for a column-oriented table, the system automatically generates a row identifier (RID) to quickly locate an entire row of data. If appropriate indexes are configured for the columns that you want to query, such as the distribution key and clustering key, the shard and file where data resides can be quickly located by using the indexes. This improves the query performance. Therefore, the column-oriented storage mode is suitable for a wide range of scenarios and is usually used in OLAP scenarios. Sample code:

Table creation syntax supported in Hologres V2.1 and later:

CREATE TABLE public.tbl_col (
    id text NOT NULL,
    name text NOT NULL,
    class text NOT NULL,
    in_time timestamptz NOT NULL,
    PRIMARY KEY (id)
)
WITH (
    orientation = 'column',
    clustering_key = 'class',
    bitmap_columns = 'name',
  	event_time_column = 'in_time'
);
select * from public.tbl_col where id ='3333';
select id, class,name from public.tbl_col where id < '3333' order by id;

Table creation syntax supported in all Hologres versions:

begin;
create table public.tbl_col (
id text NOT NULL,
name text NOT NULL,
class text NOT NULL,
in_time TIMESTAMPTZ NOT NULL,
PRIMARY KEY (id)
);
call set_table_property('public.tbl_col', 'orientation', 'column');
call set_table_property('public.tbl_col', 'clustering_key', 'class');
call set_table_property('public.tbl_col', 'bitmap_columns', 'name');
call set_table_property('public.tbl_col', 'event_time_column', 'in_time');
commit;
select * from public.tbl_col where id ='3333';
select id, class,name from public.tbl_col where id < '3333' order by id;

The following figure shows a storage model.

Row-oriented storage

If a table uses the row-oriented storage mode, data is stored in rows. By default, the row-oriented table follows the SST format. Data is compressed and stored in blocks based on keys in order and files are sorted by indexes, such as block indexes, bloom filter indexes, and the background compaction mechanism. This improves the efficiency of point queries.

Recommended. Configure a primary key.

The system stores a primary key index file at the underlying layer for each table. For more information, see Primary key. If a primary key is configured for a row-oriented table, the system automatically generates an RID to quickly locate an entire row of data. In addition, the system configures the primary key as the distribution key and clustering key. This way, the shard and file where data resides can be quickly located. If the query is performed based on the primary key, Hologres can locate the entire row of data in all columns by scanning only one primary key. This improves the query efficiency. Sample code:

Table creation syntax supported in Hologres V2.1 and later:

CREATE TABLE public.tbl_row (
    id text NOT NULL,
    name text NOT NULL,
    class text,
    PRIMARY KEY (id)
)
WITH (
    orientation = 'row',
    clustering_key = 'id',
    distribution_key = 'id'
);
-- Perform a point query based on a primary key.
select * from public.tbl_row where id ='1111';

-- Query data based on multiple keys.
select * from public.tbl_row where id in ('1111','2222','3333');

Table creation syntax supported in all Hologres versions:

begin;
create table public.tbl_row (
    id text NOT NULL,
    name text NOT NULL,
    class text ,
PRIMARY KEY (id)
);
call set_table_property('public.tbl_row', 'orientation', 'row');
call set_table_property('public.tbl_row', 'clustering_key', 'id');
call set_table_property('public.tbl_row', 'distribution_key', 'id');
commit;

-- Perform a point query based on a primary key.
select * from public.tbl_row where id ='1111';

-- Query data based on multiple keys.
select * from public.tbl_row where id in ('1111','2222','3333');

The following figure shows a storage model. 行存示意图

(Not recommended) Set the primary key and clustering key to different columns.
When you create a table, if you specify the table as a row-oriented table and set the primary key and clustering key to different columns, the system locates the clustering key and RID based on the primary key and then locates the entire row of data by using the clustering key and RID during data query. This way, the table is scanned twice and the performance is compromised. Sample code:
- Syntax of creating a row-oriented table whose primary key and clustering key are different for Hologres V2.1 and later:
```
CREATE TABLE public.tbl_row (
    id text NOT NULL,
    name text NOT NULL,
    class text,
    PRIMARY KEY (id)
)
WITH (
    orientation = 'row',
    clustering_key = 'name',
    distribution_key = 'id'
);
```
- Syntax of creating a row-oriented table whose primary key and clustering key are different for all Hologres versions:
```
begin;
create table public.tbl_row (
    id text NOT NULL,
    name text NOT NULL,
    class text ,
PRIMARY KEY (id)
);
call set_table_property('public.tbl_row', 'orientation', 'row');
call set_table_property('public.tbl_row', 'clustering_key', 'name');
call set_table_property('public.tbl_row', 'distribution_key', 'id');
commit;
```
The following figure shows a storage model.

In summary, row-oriented tables are suitable for point queries based on primary keys. Point queries can be achieved with a high queries per second (QPS). We recommend that you only configure the primary key when you create a table. After you configure the primary key, the system automatically configure the primary key as the distribution key and clustering key to improve query performance. We recommend that you do not set the primary key and clustering key to different columns. Otherwise, the performance may be compromised.

Row-column hybrid storage

In actual use, a table may be used for both point queries based on primary keys and OLAP queries. Therefore, Hologres V1.1 and later support row-column hybrid storage. The row-column hybrid storage mode combines the benefits of the row-oriented and column-oriented storage modes. The row-column hybrid storage mode also supports both high-performance point queries based on primary keys and OLAP. When data is stored in the underlying layer, the data is stored in duplicate, one in rows and the other in columns. This increases storage overheads.

When you write data, data is written in both rows and columns at the same time. A success message is returned only when the data is written in duplicate. This ensures the data atomicity.
When you query data, the optimizer generates an execution plan based on the SQL statement you execute. The query engine determines whether to use row-oriented storage or column-oriented storage based on the query efficiency. A primary key must be specified for a table that uses row-column hybrid storage.
- If you perform point queries based on a primary key by executing statements such as select * from tbl where pk=xxx and accelerate the execution of SQL statements by using a fixed plan, the execution plan that is generated by the optimizer follows the row-oriented storage mode by default. For more information about fixed plans, see Accelerate the execution of SQL statements by using fixed plans.
- If you perform point queries based on common columns rather than a primary key by executing statements such as select * from tbl where col1=xx and col2=yyy, the optimizer generates the following execution plan: the system first reads data in the column-oriented storage mode to obtain key values and then queries data in the row-oriented storage mode based on the key values. This prevents full table scans and improves the query performance. This execution plan is suitable especially when the table contains many columns, and many columns need to be displayed in the query result. In this case, the benefits of row-column hybrid storage can be fully used to improve the performance of fast data retrieval.
- In other query scenarios, the column-oriented storage mode is used by default.

Row-column hybrid storage delivers high efficiency in general queries, especially in point queries that are not based on primary keys. Sample code:

Table creation syntax supported in Hologres V2.1 and later:

CREATE TABLE public.tbl_row_col (
    id text NOT NULL,
    name text NOT NULL,
    class text NOT NULL,
    PRIMARY KEY (id)
)
WITH (
    orientation = 'row,column',
    distribution_key = 'id',
    clustering_key = 'class',
    bitmap_columns = 'name'
);

SELECT * FROM public.tbl_row_col where id ='2222'; -- Point query based on a primary key.
SELECT * FROM public.tbl_row_col where class='Class 2'; -- Point query that is not based on a primary key.
SELECT * FROM public.tbl_row_col where id ='2222' and class='Class 2'; -- General OLAP query.

Table creation syntax supported in all Hologres versions:

begin;
create table public.tbl_row_col (
id text NOT NULL,
name text NOT NULL,
class text ,
PRIMARY KEY (id)
);
call set_table_property('public.tbl_row_col', 'orientation','row,column');
call set_table_property('public.tbl_row_col', 'distribution_key','id');
call set_table_property('public.tbl_row_col', 'clustering_key','class');
call set_table_property('public.tbl_row_col', 'bitmap_columns','name');
commit;

SELECT * FROM public.tbl_row_col where id ='2222'; -- Point query based on a primary key.
SELECT * FROM public.tbl_row_col where class='Class 2'; -- Point query that is not based on a primary key.
SELECT * FROM public.tbl_row_col where id ='2222' and class='Class 2 '; -- General OLAP query.

The following figure shows a storage model. 行列共存

Examples

The following statements provide examples on how to create tables that use different storage modes.

Table creation syntax supported in Hologres V2.1 and later:

-- Create a row-oriented table.
CREATE TABLE public.tbl_row (
    a integer NOT NULL,
    b text NOT NULL,
    PRIMARY KEY (a)
)
WITH (
    orientation = 'row'
);

-- Create a column-oriented table.
CREATE TABLE tbl_col (
    a int NOT NULL,
    b text NOT NULL
)
WITH (
    orientation = 'column'
);

-- Create a row-column hybrid table.
CREATE TABLE tbl_col_row (
    pk text NOT NULL,
    col1 text,
    col2 text,
    col3 text,
    PRIMARY KEY (pk)
)
WITH (
    orientation = 'row,column'
);

Table creation syntax supported in all Hologres versions:

-- Create a row-oriented table.
begin;
create table public.tbl_row (
    a integer NOT NULL,
    b text NOT NULL
    ,PRIMARY KEY (a)
);
call set_table_property('public.tbl_row', 'orientation', 'row');
commit;


-- Create a column-oriented table.
begin;
create table tbl_col (
  a int not null, 
  b text not null);
call set_table_property('tbl_col', 'orientation', 'column');
commit;

-- Create a row-column hybrid table.
begin;
create table tbl_col_row (
	pk  text  not null, 
	col1 text, 
	col2 text, 
	col3 text, 
	PRIMARY KEY (pk)); 
call set_table_property('tbl_col_row', 'orientation', 'row,column');
commit;

References

For more information about how to configure table properties based on business query scenarios, see Guide on scenario-specific table creation and tuning.