This topic describes how to create a Hologres dimension table. This topic also describes the parameters in the WITH clause, cache parameters, and data type mappings used when you create a Hologres dimension table.

Notice
  • This topic applies only to Blink 3.6.0 and later. If your Blink version is earlier than 3.6.0, you can submit a ticket to obtain the required Java Archive (JAR) files for the upgrade.
  • We recommend that you use Hologres 0.7 or later.

What is Hologres?

Hologres is compatible with the PostgreSQL protocol and closely connected to the big data ecosystem. Hologres supports real-time analysis and processing of petabytes of data with high concurrency and low latency. This allows you to use existing Business Intelligence (BI) tools to perform multidimensional analysis and business exploration.

Limits

  • We recommend that you use row-oriented storage to create a Hologres dimension table. Column-oriented storage consumes a large number of performance overheads for point queries.
    When you use row-oriented storage to create a Hologres dimension table, you must set the primary key to clustering key. The following statements show an example:
    begin;
    create table test(a int primary key, b text, c text, d float8, e int8);
    call set_table_property('test', 'orientation', 'row');
    call set_table_property('test', 'clustering_key', 'a');
    commit;
  • When you join a Hologres dimension table with another table, you must specify all the fields in the primary key of the dimension table in the ON clause.
  • Hologres does not support a one-to-many mapping between input and output when you use Hologres connectors to join dimension tables.
  • You cannot read data from Hologres partitioned tables.

DDL syntax

In Realtime Compute for Apache Flink, you can use a Hologres table as a dimension table. The following code shows an example.
CREATE TABLE hologres_dim_table(
  id INT,
  len INT,
  content VARCHAR,
  PRIMARY KEY (id),
  PERIOD FOR SYSTEM_TIME  --Define the change period of the dimension table. 
) WITH (
  type='hologres',
  endpoint='...',
  dbname='...',
  tablename='...',
  username='...',
  password='...'
);

Parameters in the WITH clause

Parameter Description Required Remarks
type The type of the database. Yes Set the value to hologres.
endpoint The endpoint of the Hologres instance. Yes N/A.
tablename The name of the table.
Note If the public schema is not used, you must set tableName to schema.tableName.
Yes N/A.
dbname The name of the database. Yes N/A.
username The username that is used to access the database. Yes N/A.
password The password that is used to access the database. Yes N/A.

Cache parameters

Parameter Description Required Remarks
cache The cache policy. No The following cache policies are supported:
  • None: indicates that data is not cached. This is the default value.
  • LRU: indicates that only the specified data in the dimension table is cached. Each time the system receives a data record, the system searches the cache. If the system does not find the record in the cache, the system searches for the data record in the physical dimension table.

    If this cache policy is used, you must configure the cacheSize and cacheTTLMs parameters.

cacheSize The maximum number of rows of data records that can be cached. No This parameter is available when you set the cache parameter to LRU. Default value: 10000.
cacheTTLMs The timeout period of the cache. Unit: milliseconds. No You can set this parameter when you set the cache parameter to LRU. By default, the cached data does not expire.
partitionedJoin Specifies whether to use the joinKey for partitioning. No Valid values:
  • false: The joinKey is not used for partitioning.
  • true: The joinKey is used for partitioning. Data is distributed to JOIN nodes to improve the cache hit rate.
async Specifies whether to read data in asynchronous mode. No Valid values:
  • false: Data is read synchronously. This is the default value.
  • true: Data is read asynchronously.

Data type mappings

Data type of Hologres Data type of Realtime Compute for Apache Flink
INT INT
INT[] ARRAY<INT>
BIGINT BIGINT
BIGINT[] ARRAY<BIGINT>
REAL FLOAT
REAL[] ARRAY<FLOAT>
DOUBLE PRECISION DOUBLE
DOUBLE PRECISION[] ARRAY<DOUBLE>
BOOLEAN BOOLEAN
BOOLEAN[] ARRAY<BOOLEAN>
TEXT VARCHAR
TEXT[] ARRAY<VARCHAR>
NUMERIC DECIMAL
DATE DATE
TIMESTAMP WITH TIMEZONE TIMESTAMP

Sample code

In Realtime Compute for Apache Flink, you can use a Hologres table as a dimension table. The following code shows an example.
create table randomSource (a int, b VARCHAR, c VARCHAR) with (type = 'random');

create table test (
  a int, 
  b VARCHAR, 
  c VARCHAR, 
  PRIMARY KEY (a, b), PERIOD FOR SYSTEM_TIME
) with (
  type = 'hologres',
  ...
);

create table print_sink (
  a int, 
  b VARCHAR
) with (
  type = 'print', 
  `ignoreWrite` = 'false'
);

insert into print_sink
select randomSource.a, test.b from randomSource 
LEFT JOIN test FOR SYSTEM_TIME AS OF PROCTIME() 
on randomSource.a = test.a and randomSource.b = test.b;