All Products
Search
Document Center

Hologres:Hologres dimension tables

Last Updated:Mar 20, 2024

This topic describes how to create a Hologres dimension table by using Blink in exclusive mode.

Limits

  • We recommend that you use row-oriented tables as Hologres dimension tables. The performance overheads of column-oriented tables are higher than those of row-oriented tables in point queries.

    When you create a row-oriented table, you must configure a primary key and configure the primary key as the clustering key of the table. For example, you can execute the following statements to create a table in Hologres:

    begin;
    create table test(a int primary key, b text, c text, d float8, e int8);
    call set_table_property('test', 'orientation', 'row');
    call set_table_property('test', 'clustering_key', 'a');
    commit;
  • When you join a dimension table with another table, you must specify all the fields in the primary key of the dimension table in the ON clause.

  • Hologres does not support one-to-many mappings between input and output when you use a Hologres Blink connector to join a dimension table with another table.

  • You cannot use Blink to write data of an array type.

DDL syntax

To create a Hologres dimension table, execute a data definition language (DDL) statement that uses the following syntax:

CREATE TABLE rds_dim_table(
 id INT,
 len INT,
 content VARCHAR,
 PRIMARY KEY (id),
 PERIOD FOR SYSTEM_TIME -- Define the change period of the dimension table. This indicates that the dimension table is continuously updated. 
) with (
  type='hologres',
  'dbname'='<yourDbname>',  -- The name of the Hologres database. 
  'tablename'='<yourTablename>', -- The name of the Hologres table to which you want to write data. 
  'username'='<yourUsername>', -- The AccessKey ID of your Alibaba Cloud account. 
  'password'='<yourPassword>', -- The AccessKey secret of your Alibaba Cloud account. 
  'endpoint'='<yourEndpoint>' -- The endpoint of the virtual private cloud (VPC) in which the Hologres instance resides. 
);

The following table describes the parameters in the with clause.

Parameter

Description

Required

type

The type of the dimension table.

Set the value to hologres.

Yes

dbname

The name of the Hologres database.

Yes

tablename

The name of the Hologres table to which you want to write data.

Yes

username

The AccessKey ID of the Alibaba Cloud account.

You can obtain the AccessKey ID from the AccessKey Pair page.

Yes

password

The AccessKey secret of the Alibaba Cloud account.

You can obtain the AccessKey secret from the AccessKey Pair page.

Yes

endpoint

The endpoint of the VPC in which the Hologres instance resides.

You can view the endpoint in the Network Information section of the instance details page in the Hologres console. The VPC endpoint must contain a port number and follow the format of IP address:Port number.

Yes

Cache parameters

The SQL statement that is used to create a Hologres dimension table may contain cache parameters. The following table describes the parameters.

Parameter

Description

Required

Example

cache

The policy that is used to cache data.

No

Hologres supports the following cache policies for dimension tables:

  • None: Data is not cached. This is the default value.

  • LRU: Only specific data in the dimension table is cached. The system searches the cache each time it receives a data record from the source table. If the system does not find the record in the cache, the system searches for the data record in the physical dimension table.

    If you use this cache policy, you must configure the cachesize and cachettlms parameters.

  • ALL: All the data in the dimension table is cached. Before a job runs, the system loads all data in the dimension table to the cache. This way, the cache is searched for all subsequent queries in the dimension table. If the corresponding keys do not exist in your cache, a cache miss occurs. In this case, all the data is cached again after the cache expires.

    If the remote table stores only a small amount of data and a large number of keys are missing, we recommend that you set this parameter to ALL. If you use the ON clause to join the dimension table with a source table and the dimension table does not contain the keys that are specified in the join conditions, these keys are missing.

    If you use this cache policy, you must configure the cachettmls and cachereloadtimeblackList parameters.

Note
  • If you set the cache parameter to ALL, you must increase the memory of the node for joining tables because the system asynchronously loads data from the dimension table. The increased memory size is twice the data amount in the remote table.

  • If a dimension table stores a large amount of data and the cache parameter is set to ALL, an out of memory (OOM) error may occur or a full garbage collection (GC) may be time-consuming. You can use one of the following methods to resolve this issue:

    • If all the data in the dimension table can be cached, enable the partitionedjoin feature for the table.

      • For Blink versions earlier than V3.6.0, the full data of the dimension table is loaded for each concurrent job.

      • For Blink V3.6.0 and later, the partitionedjoin feature is supported if you set the cache parameter to ALL.

      After you enable the partitionedjoin feature, only the required data is cached for each concurrent job.

cachesize

The maximum number of rows of data entries that can be cached.

No

This parameter is valid only if you set the cache parameter to LRU. Default value: 10000.

cachettlms

The timeout period of the cache. Unit: milliseconds.

No

  • If you set the cache parameter to LRU, this parameter specifies the timeout period of the cache. By default, cached data does not expire.

  • If you set the cache parameter to ALL, this parameter specifies the interval at which the system refreshes the cache. By default, the cache is not refreshed.

partitionedjoin

Specifies whether to query data from partitions.

No

Valid values:

  • false: This value is the default value.

  • true

Note

The partitions refer to the partitions of keys in a dimension table that are segmented based on concurrency.

async

Specifies whether to read data in the asynchronous mode.

In the asynchronous mode, multiple requests and responses are concurrently processed. Therefore, consecutive requests do not block each other, and the query throughput is improved. However, requests are not processed in an absolute order in the asynchronous mode.

No

Valid values:

  • false: This value is the default value.

  • true

Note

cacheempty

Specifies whether to cache the data of JOIN queries whose returned results are empty.

No

Valid values:

  • true: The data of JOIN queries whose returned results are empty is cached. This is the default value.

  • false: The data of JOIN queries whose returned results are empty is not cached.

Examples

Execute the following statements to create a Hologres dimension table and import data.

create table randomSource (a int, b VARCHAR, c VARCHAR) with (type = 'random');

create table dim_test (
      a int,
    b VARCHAR,
    c VARCHAR,
    PRIMARY KEY (a, b), 
    PERIOD FOR SYSTEM_TIME
) with (
  type = 'hologres',
  'dbname'='<yourDbname>', --The name of the Hologres database. 
  'tablename'='<yourTablename>', -- The name of the Hologres table to which you want to write data. 
  'username'='<yourUsername>', -- The AccessKey ID of your Alibaba Cloud account. 
  'password'='<yourPassword>', -- The AccessKey secret of your Alibaba Cloud account. 
  'endpoint'='<yourEndpoint>' --The endpoint of the VPC in which the Hologres instance resides. 
);

create table print_sink (
  a int,
  b VARCHAR
) with (type = 'print', `ignoreWrite` = 'false');

insert into print_sink
select randomSource.a, dim_test.b from randomSource
LEFT JOIN dim_test FOR SYSTEM_TIME AS OF PROCTIME()
on randomSource.a = dim_test.a and randomSource.b = dim_test.b;

Data type mappings

For more information about the mappings between the data types in Blink in exclusive mode and Hologres, see Data types.