This topic describes how to create a Hologres dimension table by using Blink in exclusive mode.

Usage notes

  • We recommend that you use row-oriented tables as Hologres dimension tables. In point query scenarios, column-oriented tables cost more than row-oriented tables do.
    When you create a row-oriented table, you must set a primary key and set it as the clustering key of the table. For example, you can execute the following statements to create a Hologres dimension table:
    begin;
    create table test(a int primary key, b text, c text, d float8, e int8);
    call set_table_property('test', 'orientation', 'row');
    call set_table_property('test', 'clustering_key', 'a');
    commit;
  • When you join a dimension table with another table, you must specify all fields in the primary key of the dimension table in the ON clause.
  • Hologres does not support a one-to-many mapping between input and output when you use Holo-blink connectors to join dimension tables.
  • You cannot use Blink to write data of an array type.

DDL syntax

To create a Hologres dimension table, execute DDL statements that use the following syntax:
CREATE TABLE rds_dim_table(
 id INT,
 len INT,
 content VARCHAR,
 PRIMARY KEY (id),
 PERIOD FOR SYSTEM_TIME -- The change period of the dimension table. This indicates that the dimension table is continuously updated. 
) with (
  type='hologres',
  'dbname'='<yourDbname>',  -- The name of the Hologres database to which you want to connect. 
  'tablename'='<yourTablename>', -- The name of the Hologres table to which you want to write data. 
  'username'='<yourUsername>', -- The AccessKey ID of your Alibaba Cloud account. 
  'password'='<yourPassword>', -- The AccessKey secret of your Alibaba Cloud account. 
  'endpoint'='<yourEndpoint>' -- The VPC endpoint of your Hologres instance. 
);
The following table describes the parameters in the with object.
Parameter Description Required
type The type of the dimension table.

Set the value to hologres.

Yes
dbname The name of the Hologres database to which you want to connect. Yes
tablename The name of the Hologres table to which you want to write data. Yes
username The AccessKey ID of your Alibaba Cloud account.

You can obtain the AccessKey ID from the AccessKey Pair page in the Resource Access Management (RAM) console.

Yes
password The AccessKey secret of your Alibaba Cloud account.

You can obtain the AccessKey secret from the AccessKey Pair page in the RAM console.

Yes
endpoint The Virtual Private Cloud (VPC) endpoint of your Hologres instance.

You can view the endpoint of the Hologres instance in the Network Information section of the instance details page in the Hologres console. The endpoint must contain a port number and must be in the format of Domain name:Port number.

Yes

Parameters in the cache object

The SQL statement that is used to create a Hologres dimension table may contain a cache object. The following table describes the parameters in the cache object.
Parameter Description Required Example
cache The policy that is used to cache data. No Hologres supports the following cache policies for dimension tables:
  • None: Data is not cached. This is the default cache policy.
  • LRU: Only the specified data in the dimension table is cached. The system searches the cache each time it receives a data entry from the source table. If the system does not find the entry in the cache, the system searches for the data entry in the physical dimension table.

    If you use this cache policy, you must set the cachesize and cachettmlss parameters.

  • ALL: All data in the dimension table is cached. Before the system runs a job, the system loads all data in the dimension table to the cache. This way, the cache is searched for all subsequent queries in the dimension table. If the corresponding keys do not exist in your cache, a cache miss occurs. In this case, all the data is cached again after the cache expires.

    If the remote table stores only a small volume of data and a large number of keys are missing, we recommend that you set this parameter to ALL. If you use the ON clause to join the dimension table with a source table and the dimension table does not contain the keys that are specified in the join conditions, these keys are missing.

    If you use this cache policy, you must set the cachettmlss and cachereloadtimeblackList parameters.

Note
  • If you set the cache parameter to ALL, you must increase the memory of the node for joining tables because the system asynchronously loads data from the dimension table. The increased memory size is twice the memory size of the remote table.
  • If a dimension table stores a large volume of data and the cache parameter is set to ALL, an out of memory (OOM) error may occur or a full garbage collection (GC) may be time-consuming. You can use one of the following methods to resolve this issue:
    • If all the data in the dimension table can be cached, enable the partitionedjoin feature for the table.
      • For Blink versions earlier than V3.6.0, the full data of the dimension table is loaded for each concurrent job.
      • For Blink V3.6.0 and later, the partitionedjoin feature is supported if you set the cache parameter to ALL.
      After you enable the partitionedjoin feature, only the required data is cached for each concurrent job.
cachesize The maximum number of rows of data entries that can be cached. No This parameter is used only if you set the cache parameter to LRU. Default value: 10000.
cachettmlss The timeout period of the cache. Unit: milliseconds. No
  • If you set the cache parameter to LRU, the cachettmlss parameter specifies the timeout period of the cache. By default, cached data does not expire.
  • If you set the cache parameter to ALL, the cachettmlss parameter specifies the interval at which the system refreshes the cache. By default, the cache is not refreshed.
partitionedjoin Specifies whether to query data from partitions. No Default value: false. Valid values:
  • false
  • true
Note The partitions refer to the partitions of keys in a dimension table that are segmented based on concurrency.
async Specifies whether to read data in asynchronous mode.

In asynchronous mode, multiple requests and responses are processed concurrently, so that consecutive requests do not block each other and the query throughput is improved. However, requests are not processed in an absolute order in asynchronous mode.

No Default value: false. Valid values:
  • false
  • true
Note
cacheempty Specifies whether to cache the JOIN queries whose return results are empty. No Default value: true. Valid values:
  • true: caches the JOIN queries whose return results are empty.
  • false: does not cache the JOIN queries whose return results are empty.

Examples

To create a Hologres dimension table and import data from Blink, execute the following statements:
create table randomSource (a int, b VARCHAR, c VARCHAR) with (type = 'random');

create table dim_test (
      a int,
    b VARCHAR,
    c VARCHAR,
    PRIMARY KEY (a, b), 
    PERIOD FOR SYSTEM_TIME
) with (
  type = 'hologres',
  'dbname'='<yourDbname>', -- The name of the Hologres database to which you want to connect. 
  'tablename'='<yourTablename>', -- The name of the Hologres table to which you want to write data. 
  'username'='<yourUsername>', -- The AccessKey ID of your Alibaba Cloud account. 
  'password'='<yourPassword>', -- The AccessKey secret of your Alibaba Cloud account. 
  'endpoint'='<yourEndpoint>' -- The VPC endpoint of your Hologres instance. 
);

create table print_sink (
  a int,
  b VARCHAR
) with (type = 'print', `ignoreWrite` = 'false');

insert into print_sink
select randomSource.a, dim_test.b from randomSource
LEFT JOIN dim_test FOR SYSTEM_TIME AS OF PROCTIME()
on randomSource.a = dim_test.a and randomSource.b = dim_test.b;

Data type mappings

For information about the mappings between the data types in Blink in exclusive mode and Hologres, see Data types.