Metadata supports various collection source types, including traditional databases such as MySQL and Oracle, big data storage systems such as Hive and Hologres, and application systems. For each data source type or application system, you can view the number of collection tasks created and the supported collection object types.
Prerequisites
To use an application system as a collection source, you must first create it in Management Hub > Datasource Management > Application System.
Limits
By default, metadata collection for relational databases is supported. To collect metadata from other data source types, you need to purchase the corresponding features.
In versions earlier than 5.3, collecting metadata from some data sources, such as AnalyticDB for MySQL 3.0, PolarDB-X (formerly DRDS), SAP HANA, and Hologres, required you to initialize the Metadata Center in the metadata warehouse tenant. In version 5.3 and later, this initialization is not required, and you can configure collection tasks directly.
Metadata collection workflow description
If the network of the data source is not connected to the network where the Dataphin cluster is deployed, you must use the registered scheduling cluster feature. In this case, the collected data is first written to the object storage system that the Dataphin deployment depends on (such as OSS), which serves as transit storage, and is then written to Dataphin. This incurs additional storage costs.
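The following Python sketch only illustrates the two collection paths described above. It is not a Dataphin API. The function names and hop labels are hypothetical, and the direct path used when the networks are connected is an assumption inferred from the text.

```python
from typing import List

TRANSIT_HOP = "object storage (transit, such as OSS)"


def collection_path(source_reachable_from_cluster: bool) -> List[str]:
    """Return the hops that collected metadata passes through (illustrative only)."""
    if source_reachable_from_cluster:
        # Assumption: with direct connectivity, no transit storage is involved.
        return ["data source", "Dataphin"]
    # Without connectivity, a registered scheduling cluster collects the data,
    # which is staged in object storage before being written to Dataphin.
    return ["data source", "registered scheduling cluster", TRANSIT_HOP, "Dataphin"]


def incurs_transit_storage_cost(path: List[str]) -> bool:
    """Extra storage cost applies only when the transit hop is part of the path."""
    return TRANSIT_HOP in path


isolated_path = collection_path(source_reachable_from_cluster=False)
print(" -> ".join(isolated_path))
print("extra storage cost:", incurs_transit_storage_cost(isolated_path))
```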
Procedure
In the top menu bar of the Dataphin homepage, select Administration > Metadata.
In the navigation pane on the left, select Metadata Collection > Collection Overview.
On the Welcome To Metadata Collection And Management page, Dataphin displays a card for each data source or application system. Each card shows information such as the number of configured collection tasks and the supported collection object types.
Data Source: Supports various data source types, such as relational databases and big data storage databases. For more information, see Data sources supported by Dataphin.
The supported versions of MySQL, Oracle, and Hive (MySQL metadatabase, HMS metadata) are as follows:
MySQL: MySQL 5.1.43, MySQL 5.6/5.7, MySQL 8, and RDS MySQL.
Oracle: Oracle 11g, Oracle 12c, Oracle 18c, Oracle 19c, Oracle 21c, and Oracle 23c.
Hive (MySQL metadatabase, HMS metadata): CDH 5.x Hive 1.1.0, EMR 3.x Hive 2.3.5, EMR 5.x Hive 3.1.x, CDH 6.x Hive 2.1.1, FusionInsight 8.x Hive 3.1.0, CDP 7.x Hive 3.1.3, and AsiaInfo DP 5.x Hive 3.1.0.
Application System: Supports Quick BI.
You can quickly create collection tasks for the target data source or application system.
Create Collection Task: Hover over a card to quickly create a collection task. For more information, see Create and manage metadata collection tasks.
Note: Only one collection task can be configured for a data source. If a data source has separate development and production environment sources, a separate collection task can be configured for each environment.
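The rule in this note amounts to a uniqueness constraint on the pair of data source and environment. The following Python sketch models it in memory for illustration only. The class, method, data source, and task names are hypothetical and do not correspond to any Dataphin interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class CollectionTaskRegistry:
    """Hypothetical model: at most one task per (data source, environment)."""

    tasks: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def create_task(self, data_source: str, environment: str, task_name: str) -> None:
        key = (data_source, environment)
        if key in self.tasks:
            # A second task for the same source and environment is rejected.
            raise ValueError(
                f"{data_source} ({environment}) already has task {self.tasks[key]!r}"
            )
        self.tasks[key] = task_name


registry = CollectionTaskRegistry()
# The same data source can have one task per environment.
registry.create_task("orders_mysql", "development", "collect_orders_dev")
registry.create_task("orders_mysql", "production", "collect_orders_prod")

try:
    registry.create_task("orders_mysql", "production", "collect_orders_again")
except ValueError as err:
    print(f"rejected: {err}")
```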