
Dataphin: Create and manage meta tables

Last Updated: Mar 05, 2025

Meta tables, managed through Data Management, allow for the creation and management of input, output, and dimension tables used during development. This topic describes the process for creating and managing meta tables.

Advantages

Meta tables offer several benefits:

  • Secure and reliable: Meta tables help prevent the leakage of sensitive information that could occur when writing native Flink DDL statements directly.

  • Efficiency and user experience: Creating a table once allows it to be referenced multiple times, eliminating the need to rewrite DDL statements or perform complex mappings (see the sketch after this list). This streamlines development and improves both efficiency and user experience.

  • Asset lineage: Meta tables maintain information on upstream and downstream asset lineage.
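
For example, once published, a meta table can be referenced by name in any number of Flink SQL tasks, with no DDL in the task code. A minimal sketch, assuming hypothetical meta tables named ods_order_source and dws_order_sink:

  -- Both tables are meta tables referenced by name;
  -- no CREATE TABLE statements are needed in the task.
  INSERT INTO dws_order_sink
  SELECT order_id, amount, order_time
  FROM ods_order_source
  WHERE amount > 0;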

Functions of meta tables

Meta tables enable the following:

  • Platformization: Centralized maintenance of all real-time meta tables and associated schema information.

  • Assetization: Unified configuration and management of tables for real-time development.

Meta table page introduction

The meta table page consists of the following areas:

  • Action bar: Supports save, submit, unpublish, refresh, edit lock, and locate operations.

  • Basic information of the meta table: Includes the name of the meta table, data source type, data source name, source table name, and connector name.

    Note

    When the data source type of the meta table is Hive and the source table is a Hudi table, the connector is dp-hudi.

  • Meta table structure operations: Supports searching table fields, adding fields, exporting Flink DDL, sorting, and parsing. For the supported methods of adding fields, see Step 2: add fields.

  • Meta table field list: Displays the fields of the meta table parsed by the system, including the ordinal number, field name, whether the field is metadata, Flink field type, original field type, and description. Fields can be edited and deleted.

  • Meta table configuration: Supports configuring the properties of the meta table and viewing its historical versions.

Procedure

Step 1: create a meta table

  1. On the Dataphin home page, select Development > Data Development from the top menu bar.

  2. In the top menu bar, select the target project. Then, in the left-side navigation pane, choose Data Processing > Tables.

  3. Click the new table icon above the Tables list to open the New Table dialog box.

  4. In the New Table dialog box, configure the parameters.

    • Table type: Select meta table.

    • Meta table name: Enter the name of the meta table. The naming conventions are as follows:

      • Only uppercase and lowercase English letters, numbers, and underscores (_) are supported, and the name cannot start with a number.

      • The name cannot exceed 64 characters.

    • Data source: Select the data source of the meta table.

    • Source table: Enter or select the source table.

      Note

      • When the data source is Hive, you can also select a Hudi table. In the source table drop-down list, Hudi tables are marked with a Hudi icon.

      • When the data source is Log Service, DataHub, Kafka, Elasticsearch, Redis, or RabbitMQ, source table configuration is not supported.

    • Select directory: The default directory is Table Management. You can also create a target folder on the Table Management page and select it as the directory for the meta table. The procedure is as follows:

      1. Click the new folder icon above the Table Management list on the left side of the page to open the New Folder dialog box.

      2. In the New Folder dialog box, enter the folder Name and select the Directory location as needed.

      3. Click OK.

    • Description: Enter a brief description within 1,000 characters.

  5. Click OK to complete the creation of the meta table.

Step 2: add fields

Dataphin meta tables support three methods for adding fields:

Add fields by SQL import

  1. On the real-time meta table page, click + Add Field and select SQL Import.

  2. In the SQL Import dialog box, enter SQL code.

    Note
    • Dataphin provides reference examples based on your data source type. You can view the corresponding code example by clicking Reference Example in the window.

    • After entering the code, click Format to adjust the code format with one click.

    • If you select Import Parameter Values In With Parameters At The Same Time, the values in the with clause are imported as well.

    Example code for a MySQL data source is as follows:

    create table import_table (
      retailer_code INT comment ''
      ,qty_order VARCHAR comment ''
      ,cig_bar_code INT comment ''
      ,org_code INT comment ''
      ,sale_reg_code INT comment ''
      ,order_date TIMESTAMP comment ''
      ,PRIMARY KEY(retailer_code)
    ) with (
      'connector' = 'mysql'
      ,'url' = 'jdbc'
      ,'table-name' = 'ads'
      ,'username' = 'dataphin'
    );
  3. Click OK to finish adding the fields.

Add fields by batch import

  1. On the real-time meta table page, click + Add Field and select Batch Import.

  2. In the Batch Import dialog box, enter SQL code according to the batch import format.

    • Batch import format:

      Field name||Field type||Description||Is primary key||Is metadata
    • Example:

      ID||INT||Description||false||false
      name||INT||Description||false||false
  3. Click OK to finish adding the fields.

Add fields by single row addition

  1. On the real-time meta table page, click + Add Field and select Single Row Addition.

  2. In the Single Row Addition dialog box, configure the parameters.

    • Is metadata: The default is No. If you select Yes, you do not need to specify whether the field is a primary key or its original field type; you only need to select the Flink SQL field type.

    • Field name: Enter a field name. Only uppercase and lowercase English letters, numbers, underscores (_), and half-width periods (.) are supported, and the name cannot start with a number.

    • Is primary key: Select whether the field is a primary key based on business needs.

      Note
      • If your data source is Kafka and the connector is kafka, this option selects whether the field is a message key.

      • If your data source is HBase, this option selects whether the field is the RowKey.

    • Field type and original field type:

      • HBase has no original field type; you only need to select the Flink SQL field type. In addition, if the field is not a RowKey, you must specify the column family.

      • If the mapping between the Flink SQL field types and the original field types of the data source is many-to-one (for example, Kafka), select the Flink SQL field type. The original field type is mapped from it, is display-only, and cannot be edited.

      • If the mapping is one-to-many (for example, MySQL, Oracle, PostgreSQL, Microsoft SQL Server, and Hive), select the original field type first. After you select it, the type can be edited and precision can be added manually.

  3. Click OK to finish adding the field.

Step 3: configure meta table properties

  1. After creating the meta table, click Properties on the right side of the page to configure the Basic Information, Meta Table Parameters, and Reference settings, and to modify the Test data table.

    Basic Information

    • Meta Table Name: Defaults to the name of the created meta table and cannot be modified.

    • Datasource: Defaults to the type of the data source selected at creation.

    • Data Source Parameters: Different compute engines support different data sources, and different data sources require different configuration parameters. For more information, see Appendix: Meta table data source configuration parameters.

    • Description: Enter a description of the meta table within 1,000 characters.

    Meta Table Parameters

    • Parameter Name: Dataphin provides different meta table parameters based on the data source type. You can select a parameter supported by the data source, with its corresponding description, from the drop-down list, or enter one manually. To add parameters, click Add Parameter. A maximum of 50 parameters is supported, and a parameter name can contain only numbers, uppercase and lowercase English letters, underscores (_), hyphens (-), half-width periods (.), half-width colons (:), and forward slashes (/).

    • Parameter Value: Options are provided based on the parameter type. If no options are available, enter the value manually. Single quotation marks are not supported. For example: Parameter Name: address, Parameter Value: Ningbo.

    • Actions: Click the delete icon to delete the corresponding parameter.

    Reference

    • Flink Task Name: Displays the names of the Flink tasks that reference this meta table.

      Note

      Draft tasks are not included in the reference information.

    Test data table

    • Default Read During Task Debugging: Sets the default data table to read during task debugging. You can choose the production table or the development table. If you choose the production table, its data can be read during debugging, which poses a risk of data leakage; proceed with caution. If the production table is set as the default for debugging, your personal account needs the development and production data source permissions. For how to apply for data source permissions, see Apply for data source permissions.

      Note

      Hive tables and Paimon tables do not support debugging.

    • Read During Development Environment Testing: Sets the default data table to read during task testing in the development environment. You can choose the production table or the development table. If you choose the production table, its data can be read during testing, which poses a risk of data leakage; proceed with caution. If the production table is set as the default for testing, your personal account needs the development and production data source permissions. For how to apply for data source permissions, see Apply for data source permissions.

    • Write During Development Environment Testing: Supports selecting the current source table or another test table. If you select another test table, you must select the corresponding table.

  2. Click OK.

Step 4: submit or publish the meta table

  1. Click Submit in the action bar at the top of the meta table page.

  2. In the Submission Remarks dialog box, enter remarks.

  3. Click OK And Submit.


If the project mode is Dev-Prod, you must publish the meta table to the production environment. For detailed instructions, see Manage publishing tasks.

Appendix: Meta table data source configuration parameters

The following list describes the configuration items for each supported data source.

MaxCompute

  • Source table: The source table of the data.

  • blinkType: Supports selecting odps or continuous-odps (see the sketch after this entry).

    • odps: Reads the source table in full; the table can also be used as a sink table.

    • continuous-odps: Reads the source table incrementally; the table cannot be used as a sink table.
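
To illustrate the difference, a hedged sketch in Flink DDL terms: the connector names odps and continuous-odps come from the blinkType values above, while the table and field names are illustrative and the connection options that Dataphin derives from the data source configuration are omitted.

  -- blinkType = odps: full read; the table can also be used as a sink.
  CREATE TABLE odps_full (
    id BIGINT,
    name STRING
  ) WITH (
    'connector' = 'odps'
    -- project, table, and credential options omitted; Dataphin
    -- fills them in from the data source configuration
  );

  -- blinkType = continuous-odps: incremental streaming read; source only.
  CREATE TABLE odps_incremental (
    id BIGINT,
    name STRING
  ) WITH (
    'connector' = 'continuous-odps'
    -- connection options omitted as above
  );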

Tablestore, StarRocks, Lindorm (wide table), Hologres, ClickHouse, AnalyticDB for PostgreSQL, AnalyticDB for MySQL 3.0, Doris, Hive, Paimon, RocketMQ, PolarDB-X (formerly DRDS), and Aliyun HBase

  • Source table: The source table of the data.

SAP HANA

  • Source table: The source table of the data.

  • Update time field: Select the field of the SAP HANA table that holds the update time (timestamp type) from the drop-down options, or enter an SAP HANA SQL time string expression, such as concat(column_date,column_time).

Log Service and DataHub

  • Source topic: The source topic of the data.

MySQL, PostgreSQL, Oracle, TiDB, OceanBase, MongoDB, and Microsoft SQL Server

  • Source table: The source table of the data.

  • Read method: Supports selecting JDBC read/write table or CDC source table (see the sketch after this entry).

    • JDBC read/write table: Uses JDBC for table queries and writes; used for stream writes and batch reads and writes.

    • CDC source table: Incremental streaming read.
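
The two read methods roughly correspond to different Flink connectors. A hedged sketch for MySQL using the open-source jdbc and mysql-cdc connectors; Dataphin assembles this configuration for you, and its actual connector names and options may differ:

  -- JDBC read/write table: table queries and writes over JDBC,
  -- used for stream writes and batch reads and writes.
  CREATE TABLE orders_jdbc (
    order_id INT,
    amount DECIMAL(10, 2),
    PRIMARY KEY (order_id) NOT ENFORCED
  ) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:mysql://localhost:3306/ads',
    'table-name' = 'orders',
    'username' = 'dataphin',
    'password' = '***'
  );

  -- CDC source table: incremental streaming read from the binlog.
  CREATE TABLE orders_cdc (
    order_id INT,
    amount DECIMAL(10, 2),
    PRIMARY KEY (order_id) NOT ENFORCED
  ) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'localhost',
    'port' = '3306',
    'database-name' = 'ads',
    'table-name' = 'orders',
    'username' = 'dataphin',
    'password' = '***'
  );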

Kafka

  • Source topic: The source topic of the data.

  • connector: Supports selecting kafka or upsert-kafka (see the sketch after this entry).

  • Message format: Supports csv, json, avro, avro-confluent, debezium-json, canal-json, canal-json-insert, maxwell-json, ogg-json, dataphin-canal-json, raw, or custom input.

  • Primary key message format: Required when the connector is upsert-kafka. Supports csv, json, avro, avro-confluent, debezium-json, or custom input.

    Important
    • dataphin-canal-json is used to process data that is integrated into Kafka in real time.

    • When using custom input, make sure that the custom data format has been packaged into a JAR and uploaded to the Flink task as an additional dependency file. Otherwise, the code may fail to reference it or to run normally.
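
For reference, a hedged sketch of the two connectors in open-source Flink DDL terms; topic, broker, and field names are illustrative, and Dataphin assembles the actual configuration from the meta table:

  -- connector = kafka: append-only reads and writes; one message
  -- format applies to the whole record.
  CREATE TABLE user_events (
    user_id BIGINT,
    event STRING
  ) WITH (
    'connector' = 'kafka',
    'topic' = 'user_events',
    'properties.bootstrap.servers' = 'broker:9092',
    'format' = 'json',
    'scan.startup.mode' = 'earliest-offset'
  );

  -- connector = upsert-kafka: requires a primary key plus separate key
  -- and value formats (the primary key message format described above).
  CREATE TABLE user_state (
    user_id BIGINT,
    event STRING,
    PRIMARY KEY (user_id) NOT ENFORCED
  ) WITH (
    'connector' = 'upsert-kafka',
    'topic' = 'user_state',
    'properties.bootstrap.servers' = 'broker:9092',
    'key.format' = 'json',
    'value.format' = 'json'
  );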

Hudi

  • Source table: The source table of the data.

  • Hudi table type: Supports selecting MERGE_ON_READ or COPY_ON_WRITE (see the sketch after this entry).

    • MERGE_ON_READ: Low latency for writes and updates, high latency for reads.

    • COPY_ON_WRITE: High latency for writes and updates, low latency for reads.
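
In open-source Flink DDL terms, the Hudi table type corresponds to the table.type option. A hedged sketch; the path and fields are illustrative, and Dataphin's generated configuration may differ:

  CREATE TABLE hudi_orders (
    order_id INT,
    amount DECIMAL(10, 2),
    PRIMARY KEY (order_id) NOT ENFORCED
  ) WITH (
    'connector' = 'hudi',
    'path' = 'hdfs:///warehouse/hudi_orders',
    -- MERGE_ON_READ favors write and update latency;
    -- COPY_ON_WRITE favors read latency.
    'table.type' = 'MERGE_ON_READ'
  );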

Elasticsearch

  • connector: Supports selecting Elasticsearch, Elasticsearch-6, or Elasticsearch-7 (see the sketch after this entry).

    • When used as a source table or dimension table, select Elasticsearch.

      Note

      Only Alibaba Cloud Elasticsearch is supported.

    • When used as a sink table, select Elasticsearch-6 or Elasticsearch-7.

  • index: Enter or select the source index.

  • typeName: Enter or select the typeName.
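
For example, a sink meta table on Elasticsearch 7 corresponds roughly to the following open-source Flink DDL. A hedged sketch with illustrative host and index names:

  CREATE TABLE es_sink (
    user_id BIGINT,
    score DOUBLE,
    -- the primary key becomes the Elasticsearch document ID
    PRIMARY KEY (user_id) NOT ENFORCED
  ) WITH (
    'connector' = 'elasticsearch-7',
    'hosts' = 'http://localhost:9200',
    'index' = 'user_scores'
  );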

Redis

No configuration items are required.

RabbitMQ

  • Exchange: Enter or select the exchange.

  • Queue: Enter or select the queue.

  • Routing tag (optional): Enter the routingKey.

What to do next

After creating the meta table, you can develop real-time tasks based on it. For more information, see the real-time task development documentation.