This topic shows you how to create a Hologres dimension table by using Blink in exclusive mode.

Limits

  1. We recommend that you use row-oriented tables as Hologres dimension tables. In point query scenarios, column-oriented tables cost more than row-oriented tables do.
    When you create a row-oriented table, you must set a primary key and set it as the clustering key of the table. For example, you can execute the following statements to create a Hologres dimension table:
    begin;
    create table test(a int primary key, b text, c text[], d float8, e int8);
    call set_table_property('test', 'orientation', 'row');
    call set_table_property('test', 'clustering_key', 'a');
    commit;
  2. When you join a dimension table with another table, you must specify all the fields in the primary key of the dimension table in the ON clause.
  3. Hologres does not support a one-to-many mapping between input and output when you use Holo-blink connectors to join dimension tables.

DDL syntax

Execute the following sample data definition language (DDL) statement to create a Hologres dimension table:
CREATE TABLE rds_dim_table(
 id INT,
 len INT,
 content VARCHAR,
 PRIMARY KEY (id),
 PERIOD FOR SYSTEM_TIME --Define the change period of the dimension table. This indicates that the dimension table is continuously updated.
) with (
  type='hologres',
  'dbname'='<yourDbname>',  --The name of the Hologres database to be connected to.
  'tablename'='<yourTablename>', --The name of the Hologres table to which you want to write data.
  'username'='<yourUsername>', --The AccessKey ID of your Alibaba Cloud account.
  'password'='<yourPassword>', --The AccessKey secret of your Alibaba Cloud account.
  'endpoint'='<yourEndpoint>' --The virtual private cloud (VPC) endpoint of your Hologres instance.
);
The following table describes the parameters in the with object.
Parameter Description Required
type The type of the dimension table.

Set the value to hologres.

Yes
dbname The name of the Hologres database to be connected to. Yes
tablename The name of the Hologres table to which you want to write data. Yes
username The AccessKey ID of your Alibaba Cloud account.

You can obtain the AccessKey ID from the AccessKey Management page.

Yes
password The AccessKey secret of your Alibaba Cloud account.

You can obtain the AccessKey secret from the AccessKey Management page.

Yes
endpoint The VPC endpoint of your Hologres instance.

You can log on to the Hologres console, go to the details page of the instance that you want to view, and then obtain the endpoint from the Configurations tab. The endpoint must contain a port number, in the format of IP address:Port number.

Yes

Parameters in the cache object

The SQL statement that is used to create a Hologres dimension table may contain a cache object. The following table describes the parameters in the cache object.
Parameter Description Required Example
cache The policy that you want to use to cache data. No Hologres supports the following cache policies for dimension tables:
  • None: Do not cache data. This is the default cache policy.
  • LRU: Cache only the specified data in the dimension table. The system searches among the cache each time it receives a data record from the source table. If the system does not find the record in the cache, the system searches for the data record in the physical dimension table.

    If you use this cache policy, you must set the cachesize and cachettmlss parameters.

  • ALL: Cache all the data in the dimension table. Before the system runs a job, the system loads all the data in the dimension table to the cache. If you need to retrieve data from the dimension table, the system searches among the cache. If a cache miss occurs, the corresponding keys do not exist in your cache. In this case, all the data is cached again after the cache expires.

    If the remote table stores only a small volume of data and a large number of missing keys exist, we recommend that you set this parameter to ALL. If you use the ON clause to join the dimension table with a source table and the dimension table does not include the keys that are specified in the join conditions, missing keys occur.

    If you use this cache policy, you must set the cachettmlss and cachereloadtimeblackList parameters.

Note
  • If you set the cache parameter to ALL, you must increase the memory space of the node where the dimension table is joined with another table. This is because the system asynchronously loads the data from the dimension table. The increased memory size is twice the size of the data in the remote table.
  • If a dimension table stores a large volume of data and the cache parameter is set to ALL, an out of memory (OOM) error may occur or a full garbage collection (GC) may be time-consuming. To resolve this issue, you can use one of the following methods:
    • If all the data in the dimension table can be cached, enable the partitionedjoin feature for the table.
      • For Blink versions earlier than V3.6.0, the full data of the dimension table is loaded for each concurrent job by default.
      • For Blink V3.6.0 and later, the partitionedjoin feature is supported if you set the cache parameter to ALL.
      After you enable the partitionedjoin feature, only the required data is cached for each concurrent job.
    • Use an HBase or ApsaraDB RDS dimension table that uses key-value pairs to store data.
cachesize The maximum number of data records that can be cached. No This parameter is used only if you set the cache parameter to LRU. Default value: 10000.
cachettmlss The timeout period of the cache. Unit: milliseconds. No
  • If you set the cache parameter to LRU, the cachettmlss parameter specifies the timeout period of the cache. By default, the cached data does not expire.
  • If you set the cache parameter to ALL, the cachettmlss parameter specifies the interval at which the system refreshes the cache. By default, the cache is not refreshed.
partitionedjoin Specifies whether to query data from partitions. No Valid values:
  • false: This value is the default value.
  • true
async Specifies whether to asynchronously read data. No Valid values:
  • false: This value is the default value.
  • true
cacheempty Specifies whether to cache JOIN queries whose return results are empty. No Valid values:
  • true: Cache JOIN queries whose return results are empty. This value is the default value.
  • false: Do not cache JOIN queries whose return results are empty.

Examples

Execute the following sample statements to create a Hologres dimension table and import data:
create table randomSource (a int, b VARCHAR, c VARCHAR) with (type = 'random');

create table test (
      a int,
    b VARCHAR,
    c VARCHAR,
    PRIMARY KEY (a, b), PERIOD FOR SYSTEM_TIME
) with (
  type = 'hologres',
  'dbname'='<yourDbname>', --The name of the Hologres database to be connected to.
  'tablename'='<yourTablename>', --The name of the Hologres table to which you want to write data.
  'username'='<yourUsername>', --The AccessKey ID of your Alibaba Cloud account.
  'password'='<yourPassword>', --The AccessKey secret of your Alibaba Cloud account.
  'endpoint'='<yourEndpoint>' --The VPC endpoint of your Hologres instance.
);

create table print_sink (
  a int,
  b VARCHAR
) with (type = 'print', `ignoreWrite` = 'false');

insert into print_sink
select randomSource.a, test.b from randomSource
LEFT JOIN test FOR SYSTEM_TIME AS OF PROCTIME()
on randomSource.a = test.a and randomSource.b = test.b;

Mappings between data types

For information about the mappings between the data types supported by Blink in exclusive mode and those supported by Hologres, see Data types.