Meta tables, managed through Data Management, allow for the creation and management of input, output, and dimension tables used during development. This topic describes the process for creating and managing meta tables.
Advantages
Meta tables offer several benefits:
Security and reliability: Meta tables help prevent the leakage of sensitive information that could occur when writing native Flink DDL statements directly.
Efficiency and user experience: Creating a table once allows for multiple references, eliminating the need to rewrite DDL statements or perform complex mappings. This streamlines development and enhances both efficiency and user experience.
Asset lineage: Meta tables maintain information on upstream and downstream asset lineage.
Functions of meta tables
Meta tables enable the following:
Platformization: Centralized maintenance of all real-time meta tables and associated schema information.
Assetization: Unified configuration and management of tables for real-time development.
Meta table page introduction
| Area | Description |
| --- | --- |
| ① Action bar | Supports save, submit, unpublish, refresh, edit lock, and locate operations. |
| ② Basic information of meta table | Basic information of the meta table, including the meta table name, data source type, data source name, source table name, and connector name. Note: When the data source type of the meta table is Hive and the source table is a Hudi table, the connector is dp-hudi. |
| ③ Operate meta table structure | Supports searching table fields, adding fields, exporting Flink DDL, sorting, and parsing operations. Fields can be added by SQL import, batch import, or single row addition, as described in Step 2. |
| ④ Meta table field list | Displays the fields of the meta table parsed by the system, including ordinal number, field name, whether the field is metadata, Flink field type, original field type, and description. Editing and deleting operations are supported. |
| ⑤ Configure meta table | Supports configuring the properties of the meta table and viewing its historical versions. |
Procedure
Step 1: create a meta table
On the Dataphin home page, select Development > Data Development from the top menu bar.
Select Project from the top menu bar, then choose Data Processing > Tables from the left-side navigation pane.
In the Tables list, click the new icon to open the New Table dialog box.
In the New Table dialog box, configure the parameters.
| Parameter | Description |
| --- | --- |
| Table type | Select meta table. |
| Meta table name | Enter the name of the meta table. The naming conventions are as follows: only uppercase and lowercase English letters, numbers, and underscores (_) are supported; the name cannot start with a number; and the name cannot exceed 64 characters. |
| Data source | For details about the real-time data sources supported by Dataphin and the types of tables that can be created, see Real-time data sources supported by Dataphin. You can also create a custom real-time data source type; for specific operations, see Create a custom data source type. After you select the data source, configure the corresponding information based on the data source type. For configuration instructions, see Appendix: Meta table data source configuration parameters. |
| Source table | Enter or select the source table. Note: When the data source is Hive, you can also select a Hudi table; in the source table drop-down list, Hudi tables are marked with an icon. When the data source is Log Service, DataHub, Kafka, Elasticsearch, Redis, or RabbitMQ, source table configuration is not supported. |
| Select directory | The default directory is Table Management. You can also create a target folder on the Table Management page and select it as the directory for the meta table: click the new folder icon above the table management list on the left side of the page to open the New Folder dialog box, enter the folder Name, select the Directory location as needed, and then click OK. |
| Description | Enter a brief description, within 1000 characters. |
Click OK to complete the creation of the meta table.
Step 2: add fields
Dataphin meta tables support three methods for adding fields:
Add fields by SQL import
On the real-time meta table page, click + Add Field and select SQL Import.
In the SQL Import dialog box, enter SQL code.
Note: Dataphin provides reference examples based on your data source type. You can view the corresponding code example by clicking the reference example icon in the window. For an additional illustrative Kafka-style example, see the sketch after this procedure.
After entering the code, you can click the format icon to adjust the code format with one click.
If you select Import Parameter Values In With Parameters At The Same Time, the values in the WITH parameters are imported as well.
Example code for a MySQL data source is as follows:

```sql
CREATE TABLE import_table (
  retailer_code INT COMMENT '',
  qty_order     VARCHAR COMMENT '',
  cig_bar_code  INT COMMENT '',
  org_code      INT COMMENT '',
  sale_reg_code INT COMMENT '',
  order_date    TIMESTAMP COMMENT '',
  PRIMARY KEY (retailer_code)
) WITH (
  'connector' = 'mysql',
  'url' = 'jdbc',
  'table-name' = 'ads',
  'username' = 'dataphin'
);
```
Click OK to finish adding the field.
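The reference example shown in the dialog box depends on the selected data source. As an additional illustration, the following is a hedged sketch of what an SQL import for a Kafka source might look like; it assumes the open-source Flink Kafka connector options, and the topic, broker address, and field list are placeholders rather than values from this documentation.

```sql
-- Hypothetical sketch for a Kafka source, assuming open-source Flink Kafka
-- connector options; the topic, broker address, and fields are placeholders.
CREATE TABLE kafka_import_table (
  order_id   BIGINT COMMENT '',
  order_amt  DECIMAL(10, 2) COMMENT '',
  order_time TIMESTAMP(3) COMMENT ''
) WITH (
  'connector' = 'kafka',
  'topic' = 'demo_topic',                             -- placeholder topic name
  'properties.bootstrap.servers' = 'localhost:9092',  -- placeholder broker address
  'format' = 'json',                                  -- message format
  'scan.startup.mode' = 'latest-offset'               -- read from the latest offset
);
```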
Add fields by batch import
On the real-time meta table page, click + Add Field and select Batch Import.
In the Batch Import dialog box, enter SQL code according to the batch import format.
Batch import format:

```
Field name||Field type||Description||Is primary key||Is metadata
```

Example:

```
ID||INT||Description||false||false
name||INT||Description||false||false
```

For an additional example that uses the primary key and metadata columns, see the sketch after this procedure.
Click OK to finish adding the field.
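To make the Is primary key and Is metadata columns concrete, here is an additional hedged example in the same batch import format; the field names and types are hypothetical, and the metadata row assumes a source, such as Kafka, that exposes the record timestamp as metadata.

```
order_id||BIGINT||Order ID||true||false
order_amt||DECIMAL(10,2)||Order amount||false||false
event_time||TIMESTAMP(3)||Record timestamp exposed as metadata||false||true
```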
Add fields by single row addition
On the real-time meta table page, click + Add Field and select Single Row Addition.
In the Single Row Addition dialog box, configure the parameters.
| Parameter | Description |
| --- | --- |
| Is metadata | The default is No. If you select Yes, you do not need to specify whether the field is a primary key or its original field type; you only need to select the Flink SQL field type. |
| Field name | Enter a field name. Only uppercase and lowercase English letters, numbers, underscores (_), and half-width periods (.) are supported, and the name cannot start with a number. |
| Is primary key | Select whether the field is a primary key based on business needs. Note: If your data source is Kafka and the connector is Kafka, this option determines whether the field is a message key. If your data source is HBase, this option determines whether the field is the RowKey. |
| Field type and original field type | HBase does not have an original field type, so you need to select the Flink SQL field type; in addition, if the field is not the RowKey, you must fill in the column family (see the sketch after this procedure). If the Flink SQL field type and the original field type are in a many-to-one mapping, select the Flink SQL field type; the original field type is mapped from it, is shown for display only, and cannot be edited (for example, Kafka). If the mapping is one-to-many, select the original field type first; after it is selected, the field type can be edited and precision can be added manually (for example, MySQL, Oracle, PostgreSQL, Microsoft SQL Server, and Hive). |
Click OK to finish adding the field.
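For reference, in open-source Flink SQL an HBase table is declared with the RowKey as a top-level field and each column family as a ROW type, which is roughly what the RowKey and column family settings above correspond to. The sketch below is a hypothetical illustration, assuming the open-source Flink HBase connector; the table name, ZooKeeper address, and field names are placeholders.

```sql
-- Hypothetical HBase layout, assuming the open-source Flink HBase connector:
-- the RowKey is a top-level field and each column family is a ROW type.
CREATE TABLE hbase_meta_table (
  rowkey STRING,                          -- the RowKey field
  cf1 ROW<order_amt DECIMAL(10, 2),       -- column family cf1
          order_status STRING>,
  PRIMARY KEY (rowkey) NOT ENFORCED
) WITH (
  'connector' = 'hbase-2.2',
  'table-name' = 'ods_orders',            -- placeholder HBase table name
  'zookeeper.quorum' = 'localhost:2181'   -- placeholder ZooKeeper address
);
```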
Step 3: configure meta table properties
After creating the meta table, click the Properties button on the right to configure the Basic Information, Meta Table Parameters, and Reference sections, and to modify the Test data table.
| Section | Parameter | Description |
| --- | --- | --- |
| Basic Information | Meta Table Name | Defaults to the name of the created meta table and cannot be modified. |
| Basic Information | Datasource | Defaults to the type of data source selected at creation. |
| Basic Information | Data Source Parameters | Different compute engines support different data sources, and different data sources require different configuration parameters. For more information, see Appendix: Meta table data source configuration parameters. |
| Basic Information | Description | Enter a description of the meta table, within 1000 characters. |
| Meta Table Parameters | Parameter Name | Dataphin provides different meta table parameters based on the data source type. You can select the parameters supported by the data source, together with their descriptions, from the drop-down list, or enter them manually. To add parameters, click Add Parameter. The number of parameters cannot exceed 50. A parameter name can contain only numbers, uppercase and lowercase English letters, underscores (_), hyphens (-), half-width periods (.), half-width colons (:), and forward slashes (/). |
| Meta Table Parameters | Parameter Value | Parameter values provide options based on the parameter type. If there are no options, enter the value manually. Single quotes are not supported. For example: Parameter Name: address, Parameter Value: Ningbo. For an illustration of how such parameters typically surface in Flink DDL, see the sketch after this step. |
| Meta Table Parameters | Actions | Click the delete icon to delete the corresponding parameter. |
| Reference | Flink Task Name | Displays the names of the Flink tasks that reference this meta table. Note: Draft tasks are not included in the reference information. |
| Test data table | Default Read During Task Debugging | Set the default data table to read during task debugging. You can choose the production table or the development table. If you choose the production table, production data can be read during debugging, which poses a risk of data leakage; proceed with caution. If the production table is set as the default for debugging, you must apply for the development and production data source permissions for your personal account. For how to apply for data source permissions, see Apply for data source permissions. Note: Hive tables and Paimon tables do not support debugging. |
| Test data table | Read During Development Environment Testing | Set the default data table to read during testing in the development environment. You can choose the production table or the development table. If you choose the production table, production data can be read during testing, which poses a risk of data leakage; proceed with caution. If the production table is set as the default for development environment testing, you must apply for the development and production data source permissions for your personal account. For how to apply for data source permissions, see Apply for data source permissions. |
| Test data table | Write During Development Environment Testing | Supports selecting the current source table or another test table. If you select another test table, you must select the corresponding table. |
Click OK.
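For context, meta table parameters of the Parameter Name / Parameter Value kind generally correspond to key-value options in the WITH clause of a Flink CREATE TABLE statement. The following hypothetical sketch is not Dataphin-generated output; it assumes an Elasticsearch meta table, the open-source Flink Elasticsearch connector, and placeholder host, index, and field names, with sink.bulk-flush.max-actions standing in as an example parameter.

```sql
-- Hypothetical sketch, assuming the open-source Flink Elasticsearch connector;
-- each Parameter Name / Parameter Value pair appears as a WITH option.
CREATE TABLE es_meta_table (
  user_id STRING,
  order_cnt BIGINT,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector' = 'elasticsearch-7',
  'hosts' = 'http://localhost:9200',       -- placeholder host
  'index' = 'user_order_cnt',              -- placeholder index name
  'sink.bulk-flush.max-actions' = '1000'   -- example Parameter Name / Parameter Value pair
);
```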
Step 4: submit or publish the meta table
Click Submit in the top-left menu bar of the meta table page.
In the Submission Remarks dialog box, enter remarks.
Click OK And Submit.
If the project mode is Dev-Prod, you must publish the meta table to the production environment. For detailed instructions, see Manage publishing tasks.
Appendix: Meta table data source configuration parameters
| Data source | Configuration | Description |
| --- | --- | --- |
| MaxCompute | Source table, blinkType | Source table: The source table of the data. blinkType: Supports selecting odps or continuous-odps. |
|  | Source table | Source table: The source table of the data. |
| SAP HANA | Source table, Update time field | Source table: The source table of the data. Update time field: Select the field in the SAP HANA table that serves as the update time (timestamp type) from the drop-down options, or enter a HANA SQL time string expression. |
|  | Source topic | Source topic: The source topic of the data. |
| Kafka |  |  |
| Hudi |  |  |
| Elasticsearch |  |  |
| Redis | None |  |
| RabbitMQ |  |  |
What to do next
After creating the meta table, you can develop real-time tasks based on it. For more information, see the topics on real-time task development.