The Metadata Center, operated within the metadata warehouse tenant, executes all metadata acquisition tasks. To utilize the Metadata Center feature, initialize the Metadata Center settings in the metadata warehouse tenant and define the compute source information for metadata acquisition task execution. This topic guides you through the setup process for the Metadata Center.
Limits
The compute engine selected for the Metadata Center must match the engine type specified in the metadata warehouse.
The Metadata Center feature is compatible with several compute engines, including MaxCompute, E-MapReduce5.x Hadoop, E-MapReduce3.x Hadoop, CDH5.x Hadoop, CDH6.x Hadoop, Cloudera Data Platform 7.x, Huawei FusionInsight 8.x Hadoop, and AsiaInfo DP5.3 Hadoop.
Once the Metadata Center is initialized, reinitialization is not possible.
Permission description
Only the super administrator or system administrator of the metadata warehouse tenant can perform the Metadata Center initialization configuration.
Glossary
Metadata: Information about data, encompassing technical, business, and management aspects. It details data attributes, origins, formats, and relationships to aid in data retrieval, utilization, and maintenance.
Metadata Center: A system dedicated to extracting, processing, storing, and managing metadata from various business systems, supporting data governance and improving data organization, retrieval, and analysis within the organization.
Metadata center initialization configuration
Sign in to the metadata warehouse tenant using the super administrator or system administrator account.
Navigate to Management Center > System Settings from the top menu bar on the Dataphin home page.
In the left-side navigation pane, click System Operations And Maintenance, then select Metadata Center Settings to go to the Metadata Center Init Configuration page.
Choose the compute source type for Metadata Center initialization based on the compute engine configured in the metadata warehouse, with support for MaxCompute and Hadoop engines.
MaxCompute
Parameter
Description
Compute Source Type
Select the MaxCompute compute engine.
Endpoint
Configure the endpoint for the MaxCompute region where the Dataphin instance is located. For details on MaxCompute endpoints across different regions and network types, refer to MaxCompute Endpoints.
Project Name
This refers to the name of the MaxCompute project, not the DataWorks workspace name.
To view the specific MaxCompute project name, log on to the MaxCompute console, switch the region in the upper left corner, and navigate to the project management tab.
AccessKey ID, Access Key Secret
Enter the AccessKey ID and AccessKey Secret for the account with access to the MaxCompute project.
The AccessKey ID and AccessKey Secret can be obtained from the User Information Management page.
To maintain a normal connection between the Dataphin project space and the MaxCompute project, it is recommended to use the AccessKey of the MaxCompute project administrator.
To ensure uninterrupted metadata acquisition, avoid modifying the AccessKey of the MaxCompute project.
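Before entering the MaxCompute parameters above, it can help to sanity-check them locally. The sketch below validates the shape of the endpoint and project name with only the standard library; the function name and the example endpoint are illustrative, not part of Dataphin or MaxCompute.

```python
from urllib.parse import urlparse

def check_maxcompute_settings(endpoint: str, project: str) -> list:
    """Return a list of problems found in the settings (empty if none).

    Illustrative pre-flight check only; the endpoint used below is a
    made-up example of the documented http(s)://host/api shape.
    """
    problems = []
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https"):
        problems.append("endpoint must start with http:// or https://")
    if not parsed.netloc:
        problems.append("endpoint is missing a hostname")
    if not project:
        problems.append("project name must not be empty")
    return problems

# A well-formed endpoint/project pair yields no problems.
print(check_maxcompute_settings(
    "http://service.example-region.maxcompute.example.com/api", "my_project"))
# prints [] (no problems found)
```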
Hadoop
Compute Source Type:
HDFS Cluster Storage: Supports the E-MapReduce5.x Hadoop, E-MapReduce3.x Hadoop, CDH5.x Hadoop, CDH6.x Hadoop, Cloudera Data Platform 7.x, Huawei FusionInsight 8.x Hadoop, and AsiaInfo DP5.3 Hadoop compute engines.
OSS-HDFS Cluster Storage: Supports only the E-MapReduce5.x Hadoop compute engine.
Cluster Configuration
HDFS Cluster Storage
Parameter
Description
NameNode
The NameNode manages the file system namespace and client access privileges in HDFS.
Click Add.
In the Add NameNode dialog box, enter the NameNode's hostname and port numbers, then click OK.
After you fill in the information, the corresponding entry, such as
host=hostname,webUiPort=50070,ipcPort=8020, is automatically generated.
Configuration File
Upload cluster configuration files to set cluster parameters. The system supports uploading core-site.xml, hdfs-site.xml, and other configuration files.
To use the HMS method for metadata retrieval, you must upload hdfs-site.xml, hive-site.xml, and core-site.xml. For the FusionInsight 8.X and E-MapReduce5.x Hadoop compute engines, the hivemetastore-site.xml file is also required.
History Log
Set the log path for the cluster, such as
tmp/hadoop-yarn/staging/history/done.
Authentication Type
Supports No Authentication and Kerberos authentication methods. Kerberos, a symmetric key-based identity authentication protocol, is commonly used for cluster component authentication and enhances security when enabled.
If Kerberos authentication is enabled, configure the following parameters:
Kerberos Configuration Method
KDC Server: Enter the KDC's unified service address to facilitate Kerberos authentication.
Krb5 File Configuration: Upload the Krb5 file required for Kerberos authentication.
HDFS Configuration
HDFS Keytab File: Upload the HDFS Keytab file.
HDFS Principal: Enter the Principal name for Kerberos authentication, such as
XXXX/hadoopclient@xxx.xxx.
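The NameNode entry that is auto-generated for the table above follows the host=…,webUiPort=…,ipcPort=… shape. A minimal sketch of composing that string (the hostname is illustrative; the ports are the defaults from the documented example):

```python
def namenode_entry(host: str, web_ui_port: int = 50070, ipc_port: int = 8020) -> str:
    """Compose a NameNode entry in the documented host=...,webUiPort=...,ipcPort=... format."""
    return f"host={host},webUiPort={web_ui_port},ipcPort={ipc_port}"

print(namenode_entry("nn01.example.com"))
# host=nn01.example.com,webUiPort=50070,ipcPort=8020
```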
OSS-HDFS Cluster Storage
Parameter
Description
Cluster Storage
Determine the cluster storage type using the following methods:
Before Cluster Creation: The cluster storage type can be viewed on the E-MapReduce5.x Hadoop cluster creation page.
After Cluster Creation: The cluster storage type can be found on the details page of the E-MapReduce5.x Hadoop cluster.
Cluster Storage Root Directory
Enter the root directory for the cluster storage, which can be obtained from the E-MapReduce5.x Hadoop cluster information.
Important: If the entered path includes an endpoint, Dataphin will default to using that endpoint. If not, the Bucket-level endpoint configured in core-site.xml will be used. If the Bucket-level endpoint is not configured, the global endpoint in core-site.xml will be used. For more details, see Alibaba Cloud OSS-HDFS Service (JindoFS Service) Endpoint Configuration.
Configuration File
Upload cluster configuration files to set cluster parameters. The system supports uploading core-site.xml, hive-site.xml, and other configuration files. To use the HMS method for metadata retrieval, the hive-site.xml, core-site.xml, and hivemetastore-site.xml files must be uploaded.
History Log
Set the log path for the cluster, for example,
tmp/hadoop-yarn/staging/history/done.
AccessKey ID, AccessKey Secret
Enter the AccessKey ID and AccessKey Secret for accessing the OSS cluster. For information about AccessKey, refer to View AccessKey.
Important: The AccessKey configuration here takes precedence over the AccessKey set in core-site.xml.
Authentication Type
Supports No Authentication and Kerberos authentication methods. Kerberos, a symmetric key-based identity authentication protocol, is commonly used for cluster component authentication and enhances security when enabled. If Kerberos authentication is chosen, the Krb5 file for Kerberos authentication must be uploaded.
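The endpoint precedence described in the Important note above is a simple fallback chain, sketched below. The function and argument names are illustrative, not Dataphin internals; the logic just restates the documented order.

```python
from typing import Optional

def resolve_endpoint(path_endpoint: Optional[str],
                     bucket_endpoint: Optional[str],
                     global_endpoint: Optional[str]) -> Optional[str]:
    """Pick the endpoint in the documented order: the endpoint embedded in the
    entered root directory wins, then the Bucket-level endpoint from
    core-site.xml, then the global endpoint from core-site.xml."""
    for candidate in (path_endpoint, bucket_endpoint, global_endpoint):
        if candidate:
            return candidate
    return None

# No endpoint in the entered path, so the Bucket-level endpoint is used.
print(resolve_endpoint(None, "oss-bucket.example.com", "oss-global.example.com"))
# oss-bucket.example.com
```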
Hive Configuration
Parameter
Description
JDBC URL
Provide the JDBC URL for Hive connectivity.
Authentication Type
For clusters without authentication, Hive supports No Authentication and LDAP as authentication methods.
For clusters with Kerberos authentication, Hive supports No Authentication, LDAP, and Kerberos.
Note: When the compute engine is E-MapReduce3.x, E-MapReduce5.x, Cloudera Data Platform 7.x, AsiaInfo DP5.3, or Huawei FusionInsight 8.X, the authentication method can be configured.
Username, Password
Enter the username and password for Hive access.
No Authentication: Only the username is required.
LDAP Authentication: Both the username and password are required.
Kerberos Authentication: No credentials are necessary.
Hive Keytab File
This parameter is required when Kerberos authentication is enabled. Upload the keytab file, which can be obtained from the Hive Server.
Hive Principal
Configure this parameter once Kerberos authentication is enabled.
Enter the Principal name that corresponds to the Hive Keytab File used for Kerberos authentication. For instance,
XXXX/hadoopclient@xxx.xxx.
Execution Engine
Choose the appropriate execution engine based on the compute engine in use. Supported execution engines vary by compute engine as follows:
E-MapReduce 3.X: Supports MapReduce and Spark.
E-MapReduce 5.X: Supports MapReduce and Tez.
CDH 5.X: Supports MapReduce.
CDH 6.X: Supports MapReduce, Spark, and Tez.
FusionInsight 8.X: Supports MapReduce.
AsiaInfo DP 5.3 Hadoop: Supports MapReduce.
Cloudera Data Platform 7.x: Supports Tez.
Note: After setting the execution engine, the compute settings, compute source, tasks, and other elements of the metadata warehouse tenant will use the specified Hive execution engine. Reinitialization will reset these elements to the newly set execution engine.
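The engine-support matrix above can be expressed as a lookup table for validating a chosen execution engine before submitting the form. This is a sketch; the mapping simply restates the list above, and the helper name is illustrative.

```python
# Supported Hive execution engines per compute engine, as listed above.
SUPPORTED_ENGINES = {
    "E-MapReduce 3.X": {"MapReduce", "Spark"},
    "E-MapReduce 5.X": {"MapReduce", "Tez"},
    "CDH 5.X": {"MapReduce"},
    "CDH 6.X": {"MapReduce", "Spark", "Tez"},
    "FusionInsight 8.X": {"MapReduce"},
    "AsiaInfo DP 5.3 Hadoop": {"MapReduce"},
    "Cloudera Data Platform 7.x": {"Tez"},
}

def engine_supported(compute_engine: str, execution_engine: str) -> bool:
    """Check whether the execution engine is valid for the compute engine."""
    return execution_engine in SUPPORTED_ENGINES.get(compute_engine, set())

print(engine_supported("CDH 6.X", "Tez"))   # True
print(engine_supported("CDH 5.X", "Tez"))   # False
```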
Metadata Retrieval Method
Metadata can be retrieved using either the Metadatabase or HMS (Hive Metastore Service) method. The configuration details for each method are as follows:
Metadatabase Retrieval Method
Parameter
Description
Database Type
The Hive metadatabase currently supports only MySQL as the database type.
Compatible MySQL versions include: MySQL 5.1.43, MySQL 5.6/5.7, and MySQL 8.
JDBC URL
Enter the JDBC URL for the target database. For instance, the format for the connection address is
jdbc:mysql://host:port/dbname.
Username, Password
Provide the username and password for the target database.
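A small sketch of assembling a JDBC URL in the documented jdbc:mysql://host:port/dbname shape (the host and database name below are made-up placeholders):

```python
def mysql_jdbc_url(host: str, port: int, dbname: str) -> str:
    """Build a JDBC URL in the documented jdbc:mysql://host:port/dbname format."""
    return f"jdbc:mysql://{host}:{port}/{dbname}"

print(mysql_jdbc_url("meta-db.example.internal", 3306, "hivemeta"))
# jdbc:mysql://meta-db.example.internal:3306/hivemeta
```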
HMS Retrieval Method
For metadata retrieval using the HMS method, after enabling Kerberos, you must upload the Keytab File and specify the Principal.
Parameter
Description
Keytab File
Upload the Keytab file necessary for the Kerberos authentication of the Hive metastore.
Principal
Enter the Principal for the Kerberos authentication of the Hive metastore.
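The Principal values used throughout this topic (for example, XXXX/hadoopclient@xxx.xxx) follow the Kerberos primary/instance@realm convention. A sketch of a sanity check for that shape (the regex is a deliberate simplification: real Kerberos also allows principals without an instance part, which this check rejects):

```python
import re

# Matches the documented primary/instance@realm shape, e.g. XXXX/hadoopclient@xxx.xxx.
PRINCIPAL_RE = re.compile(r"^[^/@\s]+/[^/@\s]+@[^/@\s]+$")

def looks_like_principal(principal: str) -> bool:
    """Loose check that a string has the primary/instance@realm shape."""
    return PRINCIPAL_RE.match(principal) is not None

print(looks_like_principal("hive/hadoopclient@EXAMPLE.COM"))  # True
print(looks_like_principal("not-a-principal"))                # False
```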
After entering the required information, click Connection Test to verify connectivity with Dataphin.
Once the connection test is successful, click Confirm And Start Initialization to check permissions and the metadata warehouse initialization configuration.
Permissions: Confirm that the user performing this operation holds either super administrator or system administrator roles within the metadata warehouse tenant.
Metadata Warehouse Initialization Configuration: Ensure the metadata warehouse initialization has been configured successfully.
After successful verification, the initialization process begins, creating compute sources, projects, data sources, and initializing DDL statements. Once complete, the Metadata Center initialization settings are finalized.
References
Upon completing the Metadata Center initialization settings, you can begin collecting metadata from databases into Dataphin for analysis and management. For more information, see Create and manage metadata acquisition tasks.