How to create a Hologres dimension table in Flink exclusive mode - Hologres

This topic describes how to create a Hologres dimension table in Flink exclusive mode.

Limits

Use Hologres row-oriented tables for point query scenarios because column-oriented tables incur high performance overhead.
When you create a row-oriented table, you must set a primary key and configure it as a clustering key. The following is a sample statement to create a Hologres table.
```
begin;
create table test(a int primary key, b text, c text, d float8, e int8);
call set_table_property('test', 'orientation', 'row');
call set_table_property('test', 'clustering_key', 'a');
commit;
```
The fields specified in the Flink `JOIN ON` clause must be the complete primary key of the table.
The dimension table feature of the Hologres Flink connector does not support one-to-many outputs.
Blink write operations do not support array types.

DDL semantics

The following is the Data Definition Language (DDL) statement to create a Hologres dimension table.

CREATE TABLE rds_dim_table(
 id INT,
 len INT,
 content VARCHAR,
 PRIMARY KEY (id),
 PERIOD FOR SYSTEM_TIME -- Defines the change cycle of the dimension table, indicating that this is a temporal table.
) with (
  type='hologres',
  'dbname'='<yourDbname>',  -- The name of the Hologres database.
  'tablename'='<yourTablename>', -- The name of the table in Hologres that receives data.
  'username'='<yourUsername>', -- The AccessKey ID of your Alibaba Cloud account.
  'password'='<yourPassword>', -- The AccessKey secret of your Alibaba Cloud account.
  'endpoint'='<yourEndpoint>' -- The Endpoint of the Hologres instance in the VPC.
);

The following table describes the `WITH` parameters.

Parameter	Description	Required
type	The type of the dimension table. Set the value to hologres.	Yes
dbname	The name of the Hologres database.	Yes
tablename	The name of the table in Hologres that receives data.	Yes
username	The AccessKey ID of your Alibaba Cloud account. Log on to the AccessKey Management page to obtain the AccessKey ID.	Yes
password	The AccessKey secret of your Alibaba Cloud account. Log on to the AccessKey Management page to obtain the AccessKey secret.	Yes
endpoint	The VPC network address of the Hologres instance. Log on to the Hologres console, go to the product page of the destination instance, and obtain the Endpoint from the Network Information area. The Endpoint must include the port number. The format is `ip:port`.	Yes

Cache parameters

If you use cache parameters for a Hologres dimension table, refer to the descriptions in the following table.

Parameter	Description	Required	Example
cache	The cache policy.	No	Hologres dimension tables support the following three cache policies: None (default): No cache is used. LRU: Caches a portion of the dimension table data. For each row in the source table, the system first looks up data in the cache. If the data is not found, the system looks it up in the physical dimension table. You must configure the cache size (cachesize) and cache time-to-live (TTL) (cachettlms). ALL: Caches all data from the dimension table. Before the job runs, the system loads all data from the dimension table into the cache. All subsequent lookups are performed against the cache. If data for a key is not found in the cache, the key does not exist. The cache is fully reloaded after it expires. This policy is suitable for scenarios where the remote table has a small data volume and a high number of key misses. A key miss occurs when the ON condition cannot associate data from the source table with the dimension table during a JOIN operation. You must configure the cache TTL (cachettlms) and the disallowed cache refresh time (cachereloadtimeblackList). Note The system loads dimension table data asynchronously. When you use the ALL cache policy, increase the memory of the dimension table join node. The increased memory size must be twice the data volume of the remote table. For dimension tables with a large data volume, using the ALL cache policy may cause out-of-memory (OOM) errors or long full garbage collection (GC) times. To resolve this issue, use one of the following solutions: For dimension tables that support the ALL cache policy, enable partitioned join optimization. In versions earlier than 3.6.0, each concurrent task loads the full data of the dimension table by default. In versions 3.6.0 and later, the ALL cache policy supports partitioned join optimization. After you enable partitioned join optimization, each concurrent task caches only the data it requires.
cachesize	The cache size.	No	If you set the cache policy to LRU, you can set the cache size. The default value is 10,000 rows.
cachettlms	The cache expiration time in milliseconds.	No	If you set the cache policy to LRU, you can set the cache expiration time. By default, the cache does not expire. If you set the cache policy to ALL, this parameter specifies the interval at which the cache is reloaded. By default, the cache is not reloaded.
partitionedjoin	Looks up data by partition.	No	The partitionedjoin parameter supports the following values: false (default) true Note Partitioning here refers to partitioning the keys of the dimension table in the cache based on concurrency. It does not mean that Hologres partitioned tables are supported as dimension tables.
async	Specifies whether to read data asynchronously. Asynchronous mode can process multiple requests and responses concurrently. This improves query throughput because consecutive requests do not need to block and wait. However, the absolute order of requests cannot be guaranteed in asynchronous mode.	No	The async parameter supports the following values: false (default) true
cacheempty	Specifies whether to cache data for which the join result is empty.	No	The cacheempty parameter supports the following values: true (default): Caches data for which the join result is empty. false: Does not cache data for which the join result is empty.

Usage example

The following is a sample statement to create a Hologres dimension table and import data.

create table randomSource (a int, b VARCHAR, c VARCHAR) with (type = 'random');

create table dim_test (
      a int,
    b VARCHAR,
    c VARCHAR,
    PRIMARY KEY (a, b), 
    PERIOD FOR SYSTEM_TIME
) with (
  type = 'hologres',
  'dbname'='<yourDbname>', -- The name of the Hologres database.
  'tablename'='<yourTablename>', -- The name of the table in Hologres that receives data.
  'username'='<yourUsername>', -- The AccessKey ID of your Alibaba Cloud account.
  'password'='<yourPassword>', -- The AccessKey secret of your Alibaba Cloud account.
  'endpoint'='<yourEndpoint>' -- The Endpoint of the Hologres instance in the VPC.
);

create table print_sink (
  a int,
  b VARCHAR
) with (type = 'print', `ignoreWrite` = 'false');

insert into print_sink
select randomSource.a, dim_test.b from randomSource
LEFT JOIN dim_test FOR SYSTEM_TIME AS OF PROCTIME()
on randomSource.a = dim_test.a and randomSource.b = dim_test.b;

Data type mapping

For more information about the data type mapping between Flink and Hologres in exclusive mode, see Data type summary.