This topic describes how to use the standard CREATE TABLE statement to create a dimension table in Realtime Compute for Apache Flink. To use the standard data definition language (DDL) statement to create a dimension table, add PERIOD FOR SYSTEM_TIME to the statement to define the change period of the dimension table.
Example
CREATE TABLE white_list (
id varchar,
name varchar,
age int,
PRIMARY KEY (id),
PERIOD FOR SYSTEM_TIME --Define the change period of the dimension table. In Realtime Compute for Apache Flink V3.X and later, you do not need to declare PERIOD FOR SYSTEM_TIME. You need to declare only FOR SYSTEM_TIME AS OF PROCTIME() when you join a dimension table with another table.
) with (
type = 'RDS',
...
);
Note
- You must specify a primary key for each dimension table. When you join a dimension table with another table, the ON clause must contain the equivalent (=) conditions for all the primary key fields.
- You can execute only the
INNER JOIN
or theLEFT JOIN
statement to join a source table and a dimension table. - The unique key of the dimension table must be the same as that of the database table.
If they are not the same, you may encounter the following issues:
- Reading data from the dimension table slows down.
- When you join the dimension table with another table, the join operation starts from the first data record. In the Realtime Compute for Apache Flink job, multiple data records that have the same key are sequentially updated in the database. This may cause errors in the join result.
INDEX syntax
Note We recommend that you use the
INDEX
syntax in Realtime Compute for Apache Flink V2.2.7 and later.
In Realtime Compute for Apache Flink versions that are earlier than V2.2, you must
declare the
PRIMARY KEY
when you create a dimension table. In this case, you can perform only one-to-one
table joins. The INDEX
syntax is introduced to meet the requirements for one-to-many table joins. For dimension
tables that do not support the ALL
cache policy, you can use INDEX LOOKUP
to meet the requirements for one-to-many table joins. CREATE TABLE Persons (
ID bigint,
LastName varchar,
FirstName varchar,
Nick varchar,
Age int,
[UNIQUE] INDEX(LastName,FirstName,Nick), --Define the index. You do not need to specify the index type, such as fulltext or clustered.
PERIOD FOR SYSTEM_TIME
) with (
type='RDS',
...
);
UNIQUE INDEX
represents a one-to-one table join. INDEX
represents a one-to-many table join.
Note
UNIQUE CONSTRAINT
(UNIQUE KEY
) is supported in Realtime Compute for Apache Flink V2.2.7 and later. In Realtime Compute for Apache Flink versions that are earlier than V2.2.7, you can define the index by usingPRIMARY KEY
.- The engine preferentially uses
UNIQUE INDEX
when it generates an execution plan. If INDEX is used in the DDL statement and the JOIN equivalent (=) conditions include bothUNIQUE
andNON-UNIQUE INDEX
, the system preferentially usesUNIQUE INDEX
to search for data in the right table. - The dimension table types, such as ApsaraDB RDS dimension table and MaxCompute dimension table, supports one-to-many table joins.
- In one-to-many table joins, you can use the
maxJoinRows
parameter to specify the maximum number of associated rows in the right table for each row in the left table. The default value is 1024. If one row is associated with an excessively large number of rows, the performance of stream processing tasks may be compromised. If this occurs, you can increase the cache size. You can use thecacheSize
parameter to limit the number of keys in the left table. - The INDEX syntax cannot be used to perform one-to-many table joins on Tablestore and Hologres dimension tables.
Differences among dimension tables, source tables, and result tables
Item | Source Table | Result table | Dimension table |
---|---|---|---|
Trigger computing | Supported | Not supported | Not supported |
Read data | Supported (You can directly read data from source tables.) | Not supported | Supported (You can read data from dimension tables only by joining a dimension table with a source table.) |
Write data | Not supported | Supported | Not supported |
Cache data | Not supported | Not supported | Supported |