This topic describes how to create an AnalyticDB for MySQL V3.0 dimension table. It also describes the parameters in the WITH clause and the cache parameters that you can use when you create the dimension table.
DDL syntax
CREATE TABLE dim_ads (
  `name` VARCHAR,
  id VARCHAR,
  PRIMARY KEY (`name`),
  PERIOD FOR SYSTEM_TIME
) WITH (
  type='ADB30',
  url='jdbc:mysql://<Internal endpoint>/<databaseName>',
  tableName='xxx',
  userName='xxx',
  password='xxx'
);
- You must specify a primary key when you declare a dimension table.
- When you join a dimension table with another table, the ON condition must contain an equality condition for every primary key column.
- The primary key that you declare for the dimension table can be the primary key or a unique index column of the AnalyticDB for MySQL table.
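As a rough illustration of the ON condition rule, the following sketch declares a dimension table with a two-column primary key and joins on both columns. The table and column names (dim_orders, src_orders, region, order_id, amount) are hypothetical, src_orders is assumed to be a source table declared elsewhere, and the WITH values are the same placeholders used in the syntax above.

```sql
-- Hypothetical dimension table with a composite primary key.
CREATE TABLE dim_orders (
  region VARCHAR,
  order_id VARCHAR,
  amount BIGINT,
  PRIMARY KEY (region, order_id),
  PERIOD FOR SYSTEM_TIME
) WITH (
  type='ADB30',
  url='jdbc:mysql://<Internal endpoint>/<databaseName>',
  tableName='xxx',
  userName='xxx',
  password='xxx'
);

-- The ON condition has one equality condition per primary key column.
-- src_orders is an assumed source table that is not defined in this sketch.
SELECT s.order_id, d.amount
FROM src_orders AS s
JOIN dim_orders FOR SYSTEM_TIME AS OF PROCTIME() AS d
ON s.region = d.region AND s.order_id = d.order_id;
```

In a real job, the SELECT statement would typically feed an INSERT INTO statement for a result table, as in the sample code at the end of this topic.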
Parameters in the WITH clause
Parameter | Description | Required | Remarks |
---|---|---|---|
type | The type of the dimension table. | Yes | Set the value to ADB30. |
url | The JDBC URL of the AnalyticDB for MySQL database. | Yes | Example: url='jdbc:mysql://databaseName****-cn-shenzhen-a.ads.aliyuncs.com:10014/databaseName'. |
tableName | The name of the table. | Yes | N/A. |
userName | The username that is used to access the AnalyticDB for MySQL database. | Yes | N/A. |
password | The password that is used to access the AnalyticDB for MySQL database. | Yes | N/A. |
maxRetryTimes | The maximum number of retries for writing data to the table. | No | Default value: 3. |
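The following sketch shows how the parameters in this table might appear together in one WITH clause. It reuses the dim_ads declaration and placeholders from the DDL syntax section. The maxRetryTimes value of 5 is an arbitrary illustration, and writing it as a quoted string is an assumption about how numeric values are passed.

```sql
CREATE TABLE dim_ads (
  `name` VARCHAR,
  id VARCHAR,
  PRIMARY KEY (`name`),
  PERIOD FOR SYSTEM_TIME
) WITH (
  -- Required parameters.
  type='ADB30',
  url='jdbc:mysql://<Internal endpoint>/<databaseName>',
  tableName='xxx',
  userName='xxx',
  password='xxx',
  -- Optional parameter. Defaults to 3 if omitted.
  -- The quoted-string form of the value is an assumption.
  maxRetryTimes='5'
);
```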
Cache parameters
Parameter | Description | Required | Remarks |
---|---|---|---|
cache | The policy that is used to cache data. | No | Valid values include LRU and ALL. |
cacheSize | The maximum number of rows of data records that can be cached. | No | This parameter is available only if you set the cache parameter to LRU. Default value: 10000. |
cacheTTLMs | The interval at which the system refreshes the cache. The system reloads the latest data in the dimension table based on the value of this parameter. This ensures that the data in the source table is associated with the latest data in the dimension table. | No | Unit: milliseconds. This parameter is empty by default. This indicates that the updates in the dimension table are not reloaded. |
cacheReloadTimeBlackList | The time periods during which the cache is not refreshed. This parameter takes effect when the cache parameter is set to ALL. It is useful for large-scale online promotional events such as Double 11. | No | This parameter is empty by default. Example: '2017-10-24 14:00 -> 2017-10-24 15:00, 2017-11-10 23:30 -> 2017-11-11 08:00'. As in the example, separate multiple time periods with commas (,) and separate the start time and end time of each period with an arrow (->). |
partitionedJoin | Specifies whether to enable the partitionedJoin feature. If the feature is enabled, data is shuffled based on join keys before the primary table is joined with the dimension table. | No | Default value: false, which indicates that the partitionedJoin feature is disabled. To enable the feature, set partitionedJoin to true. |
maxJoinRows | The maximum number of results that can be returned when a data record in the primary table is matched with data records in the dimension table. | No | Default value: 1024. If you can estimate that a data record in the primary table corresponds to at most n data records in the dimension table, set maxJoinRows to n to ensure efficient matching in Realtime Compute for Apache Flink. |
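To make the cache parameters more concrete, the following sketch shows two possible configurations: one that uses the LRU policy and one that uses the ALL policy. The table names dim_ads_lru and dim_ads_all are hypothetical. The numeric values are arbitrary illustrations, and writing all values as quoted strings is an assumption. The blacklist value is the example from the table above.

```sql
-- Sketch 1: LRU cache with a bounded number of rows.
CREATE TABLE dim_ads_lru (
  `name` VARCHAR,
  id VARCHAR,
  PRIMARY KEY (`name`),
  PERIOD FOR SYSTEM_TIME
) WITH (
  type='ADB30',
  url='jdbc:mysql://<Internal endpoint>/<databaseName>',
  tableName='xxx',
  userName='xxx',
  password='xxx',
  cache='LRU',
  cacheSize='20000',       -- available only when cache is LRU; default is 10000
  cacheTTLMs='60000',      -- unit: milliseconds
  partitionedJoin='true',  -- shuffle by join key before the join; default is false
  maxJoinRows='10'         -- assumes at most 10 matches per primary-table record
);

-- Sketch 2: ALL cache that is not refreshed during the listed time periods.
CREATE TABLE dim_ads_all (
  `name` VARCHAR,
  id VARCHAR,
  PRIMARY KEY (`name`),
  PERIOD FOR SYSTEM_TIME
) WITH (
  type='ADB30',
  url='jdbc:mysql://<Internal endpoint>/<databaseName>',
  tableName='xxx',
  userName='xxx',
  password='xxx',
  cache='ALL',
  cacheTTLMs='3600000',    -- unit: milliseconds
  cacheReloadTimeBlackList='2017-10-24 14:00 -> 2017-10-24 15:00, 2017-11-10 23:30 -> 2017-11-11 08:00'
);
```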
Sample code
CREATE TABLE datahub_input1 (
  id BIGINT,
  name VARCHAR,
  age BIGINT
) WITH (
  type='datahub'
);

CREATE TABLE phoneNumber (
  name VARCHAR,
  phoneNumber BIGINT,
  PRIMARY KEY (name),
  PERIOD FOR SYSTEM_TIME -- The identifier of a dimension table.
) WITH (
  type='ADB30' -- The url, tableName, userName, and password parameters are omitted here.
);

CREATE TABLE result_infor (
  id BIGINT,
  phoneNumber BIGINT,
  name VARCHAR
) WITH (
  type='rds'
);

INSERT INTO result_infor
SELECT
  t.id,
  w.phoneNumber,
  t.name
FROM datahub_input1 AS t
JOIN phoneNumber FOR SYSTEM_TIME AS OF PROCTIME() AS w -- You must include this clause when you perform a JOIN operation on the dimension table.
ON t.name = w.name;