All Products
Search
Document Center

Hologres:Column-oriented storage for JSONB-formatted data

Last Updated:Oct 18, 2023

Hologres V1.3 and later support column-oriented storage for JSONB-formatted data in column-oriented tables. This reduces the storage resources that are required to store JSONB-formatted data and accelerates queries of JSONB-formatted data. This topic describes how to implement column-oriented storage for JSONB-formatted data in Hologres.

How it works

The following figure shows the implementation of column-oriented storage for JSONB-formatted data in Hologres. After column-oriented storage is enabled for the JSONB data type, the system converts the column of the JSONB data type into multiple columns with strong schemas at the underlying layer. This way, the column in which the value that you want to query resides can be directly located to improve query performance. Column-oriented storage enables JSONB-formatted data to be stored and compressed with the same efficiency as structured data at the storage layer because the values of different keys are stored in different columns. This helps improve efficiency and reduces costs.

Note

Column-oriented storage is not suitable for JSON-formatted data. We recommended that you do not enable this feature for JSON-formatted data.

image

Limits

  • Only Hologres V1.3 and later support column-oriented storage for the JSONB data type. If you want to use this feature, we recommend that you upgrade your Hologres instance to V1.3.37 or later for better performance. You can manually upgrade your Hologres instance in the Hologres console or join a Hologres DingTalk group to apply for an instance upgrade. For more information about how to manually upgrade a Hologres instance, see Instance upgrades. For more information about how to join a Hologres DingTalk group, see Obtain online support for Hologres.

  • Column-oriented storage for the JSONB data type can be used only for column-oriented tables but not row-oriented tables. Column-oriented storage for the JSONB data type can be triggered only if a column-oriented table contains 1,000 or more data records.

  • The following table describes operators that are supported by column-oriented storage for the JSONB data type. If you use unsupported operators in queries, the query performance may deteriorate.

    Operator

    Right operand type

    Description

    Operation and result

    ->

    text

    Obtains a JSON object field based on a key.

    • Sample operation:

      select '{"a": {"b":"foo"}}'::json->'a'

    • Returned result:

      {"b":"foo"}

    ->>

    text

    Obtains a JSON object field as text.

    • Sample operation:

      select '{"a":1,"b":2}'::json->>'b'
    • Returned result:

      2

Use column-oriented storage for the JSONB data type

Enable column-oriented storage for the JSONB data type

Execute the following statement to enable column-oriented storage for a column of the JSONB data type in a table.

-- Enable column-oriented storage for a column of the JSONB data type in a table.
ALTER TABLE <table_name> ALTER COLUMN <column_name> SET (enable_columnar_type = ON);

The table_name parameter specifies the name of the table, and the column_name parameter specifies the name of the column.

Important
  • After column-oriented storage is enabled for a column of the JSONB data type, the system converts the storage mode of all historical data in the column into the column-oriented storage mode during compaction.

  • Compaction consumes system resources, such as memory resources. We recommend that you perform compaction during off-peak hours. You can run the vacuum table_name; command to forcefully trigger compaction. The compaction operation is complete when the vacuum command finishes running.

  • After compaction is complete, newly written data is stored in column-oriented storage mode.

Enable DECIMAL type inference

Important

Before you enable DECIMAL type inference, you must make sure that column-oriented storage is enabled for the JSONB data type.

Hologres V2.0.11 and later support column-oriented storage of data of the DECIMAL type. Sample JSON data:

{
  "name":"Mike",
  "statistical_period":"2023-01-01 00:00:00+08",
  "balance":123.45
}

After you enable DECIMAL type inference for data of balance, column-oriented storage is supported for the data. Sample statement:

-- Enable column-oriented storage for a column of the DECIMAL data type in a table.
ALTER TABLE <table_name> ALTER COLUMN <column_name> SET (enable_decimal = ON);

The table_name parameter specifies the name of the table, and the column_name parameter specifies the name of the column.

Check the storage mode of the JSONB data type

The statements that are used to check the storage mode of the JSONB data type differ based on Hologres versions.

  • For Hologres V1.3.37 and later, execute the following statement:

    Note

    In Hologres V2.0.17 and earlier, the statement can be used to check the storage modes of only tables in the public schema. In Hologres V2.0.18 and later, the statement can be used to check the storage modes of tables in all schemas.

    -- In Hologres V2.0.17 and earlier, the statement can be used to check the storage modes of only tables in the public schema. In Hologres V2.0.18 and later, the statement can be used to check the storage modes of tables in all schemas.
    select * from hologres.hg_column_options where table_name = '<schema.table_name>';

    The table_name parameter specifies the name of the table in which the JSONB-formatted data is stored.

  • For Hologres V1.3.10 to V1.1.36, execute the following statements:

    SELECT DISTINCT
        a.attnum as num,
        a.attname as name,
        format_type(a.atttypid, a.atttypmod) as type,
        a.attnotnull as notnull, 
        com.description as comment,
        coalesce(i.indisprimary,false) as primary_key,
        def.adsrc as default,
        a.attoptions
    FROM pg_attribute a 
    JOIN pg_class pgc ON pgc.oid = a.attrelid
    LEFT JOIN pg_index i ON 
        (pgc.oid = i.indrelid AND i.indkey[0] = a.attnum)
    LEFT JOIN pg_description com on 
        (pgc.oid = com.objoid AND a.attnum = com.objsubid)
    LEFT JOIN pg_attrdef def ON 
        (a.attrelid = def.adrelid AND a.attnum = def.adnum)
    WHERE a.attnum > 0 AND pgc.oid = a.attrelid
    AND pg_table_is_visible(pgc.oid)
    AND NOT a.attisdropped
    AND pgc.relname = '<table_name>' 
    ORDER BY a.attnum;

    The table_name parameter specifies the name of the table in which the JSONB-formatted data is stored.

  • Sample result

    In the following figure, the attoptions or option property of a column is enable_columnar_type = ON, which indicates that column-oriented storage is enabled for this column.

    image

Disable column-oriented storage for the JSONB data type

Execute the following statement to disable column-oriented storage for a column of the JSONB data type in a table.

-- Disable column-oriented storage for a column of the JSONB data type in a table.
ALTER TABLE <table_name> ALTER COLUMN <column_name> SET (enable_columnar_type = OFF);

The table_name parameter specifies the name of the table, and the column_name parameter specifies the name of the column.

Important
  • After column-oriented storage is disabled for a column of the JSONB data type, the system converts the storage mode of all historical data in the column into the standard JSONB data storage mode during compaction. After compaction is complete, the conversion is complete.

  • Compaction consumes system resources, such as memory resources. We recommend that you perform compaction during off-peak hours. You can run the vacuum table_name; command to forcefully trigger compaction. The compaction operation is complete when the vacuum command finishes running.

  • After compaction is complete, newly written data is stored in the standard JSONB data storage mode.

Disable DECIMAL type inference

To disable DECIMAL type inference for a column of a table, execute the following statement:

-- Disable column-oriented storage for a column of the DECIMAL type in a table.
ALTER TABLE <table_name> ALTER COLUMN <column_name> SET (enable_decimal = OFF);

The table_name parameter specifies the name of the table, and the column_name parameter specifies the name of the column.

Note

After you disable DECIMAL type inference, compaction is immediately triggered to convert the storage mode of data of the DECIMAL type from column-oriented storage mode into the original mode.

Create bitmap indexes

In Hologres, the bitmap_columns property specifies a bitmap index. This property uses an index schema that is independent of data storage. You can use the bitmap_columns property to accelerate equivalent comparisons based on a bitmap vector structure. Bitmap indexes can help filter data that equals a specified value in a stored file. Therefore, the bitmap_columns property is applicable to point queries. In Hologres V2.0 and later, you can create bitmap indexes for columns with JSONB-formatted data that is stored in column-oriented storage mode. After column-oriented storage is enabled for the JSONB data type, the system can parse data of the following data types: INT, INT[], BIGINT, BIGINT[], TEXT, TEXT[], and JSONB. After the bitmap index feature is enabled, the system creates bitmap indexes for columns with parsed data of the INT, INT[], BIGINT, BIGINT[], TEXT, and TEXT[] data types.

Syntax:

call set_table_property('<table_name>', 'bitmap_columns', '[<columnName>{:[on|off]}[,...]]');

The following table describes the parameters in the preceding syntax.

Parameter

Description

table_name

The name of the table.

columnName

The name of the column.

on

Indicates that a bitmap index is created for the current column.

Important

You can create a bitmap index only for a column with JSONB-formatted data that is stored in column-oriented storage mode.

off

Indicates that no bitmap index is created for the current column.

Example

  1. Create a table.

    DROP TABLE IF EXISTS user_tags;
    
    -- Create a table.
    BEGIN;
    CREATE TABLE IF NOT EXISTS user_tags (
        ds timestamptz,
        tags jsonb
    );
    COMMIT;
  2. Enable column-oriented storage for the tags column of the JSONB data type.

    ALTER TABLE user_tags ALTER COLUMN tags SET (enable_columnar_type = ON);
  3. View the storage mode of the tags column.

    select * from hologres.hg_column_options where table_name = 'user_tags';

    In the following returned result, the options property of the tags column is enable_columnar_type = on, which indicates that column-oriented storage is enabled.

     schema_name | table_name | column_id | column_name |       column_type        | notnull | comment | default |          options
    -------------+------------+-----------+-------------+--------------------------+---------+---------+---------+---------------------------
     public      | user_tags  |         1 | ds          | timestamp with time zone | f       |         |         |
     public      | user_tags  |         2 | tags        | jsonb                    | f       |         |         | {enable_columnar_type=on}
    (2 rows)
  4. Import data.

    INSERT INTO user_tags (ds, tags)
    SELECT
        '2022-01-01 00:00:00+08'
        , ('{"id":' || i || ',"first_name" :"Sig",  "gender" :"Male"}')::jsonb
    FROM
        generate_series(1, 10001) i;
  5. Optional. Forcefully write data to disks.

    When data is written to disks, Hologres enables column-oriented storage for JSONB-formatted data. To view the storage effect at the earliest opportunity, you can execute the following statement to forcefully write data to disks:

    VACUUM user_tags;
  6. Query data from the table.

    Execute the following statement to query data in the first_name column whose value in the id column is 10:

    SELECT
        (tags -> 'first_name')::text AS first_name
    FROM
        user_tags
    WHERE (tags -> 'id')::int = 10;
  7. Check whether column-oriented storage is enabled for the JSONB data type based on the execution plan.

    -- Query detailed statistics.
    SET hg_experimental_show_execution_statistics_in_explain = ON;
    -- Query the execution plan.
    EXPLAIN ANALYZE
    SELECT
        (tags -> 'first_name')::text AS first_name
    FROM
        user_tags
    WHERE (tags -> 'id')::int = 10;

    In the following figure, columnar_access_used is returned, which indicates that column-oriented storage is enabled for the JSONB data type.

    image
  8. For the query in Step 6, you can create a bitmap index for the tags column to improve the efficiency of a key-based point query. Sample statement:

    call set_table_property('user_tags', 'bitmap_columns', 'tags');
  9. View the execution plan to check whether the bitmap index is used.

    -- View the execution plan.
    EXPLAIN ANALYZE
    SELECT
        (tags -> 'first_name')::text AS first_name
    FROM
        user_tags
    WHERE (tags -> 'id')::int = 10;

    The following figure shows the returned result.image..png

    The returned result contains bitmap_used, which indicates that the bitmap index is used.

Scenarios in which column-oriented storage is not recommended for the JSONB data type

Column-oriented storage for the JSONB data type helps save the required storage resources for JSONB-formatted data and significantly improves query performance. However, this feature is not suitable in the following scenarios.

Queries that require complete columns of the JSONB data type

Column-oriented storage for the JSONB data type provided by Hologres delivers better performance in most scenarios. However, for queries that require complete columns of the JSONB data type, the original storage mode of JSONB-formatted data delivers better performance. Sample SQL statements:

-- Execute the following data definition language (DDL) statement to create a table:
CREATE TABLE TBL(key int, json_data jsonb); 
SELECT json_data FROM TBL WHERE key = 123;
SELECT * FROM TBL limit 10;

Column-oriented storage for the JSONB data type is implemented at the underlying layer. If you want to obtain complete columns of the JSONB data type, the system needs to convert data in column-oriented storage into data in the original JSONB format.

image

The conversion process generates a large number of I/O operations and incurs high overheads. The conversion may cause performance to deteriorate if a large number of columns and a large amount of data are involved. We recommend that you do not enable column-oriented storage for the JSONB data type in this scenario.

Storage of extremely sparse data of the JSONB data type

If the JSONB-formatted data that you want to store in column-oriented storage contains sparse fields, Hologres places the sparse fields in a column named holo.remaining to prevent an excessive number of columns from being generated. If all the JSONB-formatted fields are sparse, for example, each field appears only once, column-oriented storage does not take effect. This is because all fields are placed in the column holo.remaining. In this case, column-oriented storage for the JSONB data type does not improve query performance.

Storage of JSONB-formatted data with complex nested structures

In this example, the root node of the JSONB-formatted data is an array, which contains JSONB-formatted data of different structures. Hologres stores data with the complex nested structures in one column. In this case, column-oriented storage for the JSONB data type does not improve query performance.

'[
  {"key1": "value1"}, 
  {"key2": 123},
  {"key3": 123.01}
]'

Best practices of column-oriented storage for the JSONB data type

Diagnostics of slow queries

If the query performance deteriorates after you enable column-oriented storage for the JSONB data type, check whether the query requires complete columns of the JSONB data type. For complex SQL statements, you can execute the Explain Analyze statement to check whether the query requires complete columns of the JSONB data type. Sample statements:

CREATE TABLE TBL(key int, json_data json); -- Create a table.
ALTER TABLE TBL ALTER COLUMN json_data SET (enable_columnar_type = on);
Explain Analyze SELECT json_data FROM TBL WHERE key = 123;

The return value of the Explain Analyze statement contains the Hint keyword. If the Hint keyword contains the following information, the query requires complete columns of the JSONB data type. As a result, the query performance deteriorates.

Column 'json_data' has enabled columnar jsonb, but the query scanned the entire Jsonb value

SQL syntax with better performance

  • JSONB-formatted data can be converted into the TEXT format by using different syntaxes. In this example, the ->> operator provides better performance if you want to obtain the name attribute of the json_data column.

    -- Better performance
    SELECT json_data->>'name' FROM tbl; 
    -- Ordinary performance
    SELECT (json_data->'name')::text FROM tbl;
  • If a JSONB-formatted field contains a TEXT array and you want to check whether the TEXT array contains the specified value, we recommend that you use the following syntax:

    SELECT key FROM tbl WHERE jsonb_to_textarray(json_data->'phones') && ARRAY['123456'];

FAQ

Why does the storage usage increase after I enable column-oriented storage for the JSONB data type?

After column-oriented storage is enabled for the JSONB data type, keys in the original JSONB-formatted data are no longer stored. Only the values that correspond to the keys are stored, and the data type of each column is the same. Column-oriented storage for the JSONB data type provides a higher compression ratio. Theoretically, the storage usage is significantly reduced.

However, if the fields in the JSONB-formatted data are sparse, the number of columns significantly increases after column-oriented storage. Each column incurs additional storage overhead due to operations such as column statistics collection and index creation. If the data type of each column after column-oriented storage is TEXT, the compression performance is not good. The actual compression efficiency differs based on data features, such as data sparsity. Column-oriented storage for the JSONB data type might not deliver better compression performance in some scenarios.