
Dataphin: Metadata Center settings

Last Updated: Feb 06, 2025

The Metadata Center runs in the metadata warehouse tenant and executes all metadata acquisition tasks. To use the Metadata Center, you must initialize the Metadata Center settings in the metadata warehouse tenant and define the compute source information used to run metadata acquisition tasks. This topic describes how to set up the Metadata Center.

Limits

  • The compute engine selected for the Metadata Center must match the engine type specified in the metadata warehouse.

  • The Metadata Center feature is compatible with several compute engines, including MaxCompute, E-MapReduce5.x Hadoop, E-MapReduce3.x Hadoop, CDH5.x Hadoop, CDH6.x Hadoop, Cloudera Data Platform 7.x, Huawei FusionInsight 8.x Hadoop, and AsiaInfo DP5.3 Hadoop.

  • Once the Metadata Center is initialized, reinitialization is not possible.

Permission description

Only the super administrator or system administrator of the metadata warehouse tenant can perform the Metadata Center initialization configuration.

Glossary

  • Metadata: Information about data, encompassing technical, business, and management aspects. It details data attributes, origins, formats, and relationships to aid in data retrieval, utilization, and maintenance.

  • Metadata Center: A system dedicated to extracting, processing, storing, and managing metadata from various business systems, supporting data governance and improving data organization, retrieval, and analysis within the organization.

Metadata center initialization configuration

  1. Sign in to the metadata warehouse tenant using the super administrator or system administrator account.

  2. Navigate to Management Center > System Settings from the top menu bar on the Dataphin home page.

  3. In the left-side navigation pane, click System Operations And Maintenance, then select Metadata Center Settings to go to the Metadata Center Init Configuration page.

  4. Choose the compute source type for Metadata Center initialization based on the compute engine configured in the metadata warehouse, with support for MaxCompute and Hadoop engines.

    MaxCompute

    • Compute Source Type: Select the MaxCompute compute engine.

    • Endpoint: Configure the endpoint for the MaxCompute region where the Dataphin instance is located. For details on MaxCompute endpoints across different regions and network types, see MaxCompute Endpoints.

    • Project Name: Enter the name of the MaxCompute project, not the DataWorks workspace name. To view the MaxCompute project name, log on to the MaxCompute console, switch the region in the upper-left corner, and navigate to the project management tab.

    • AccessKey ID, AccessKey Secret: Enter the AccessKey ID and AccessKey Secret of the account that has access to the MaxCompute project. You can obtain them from the User Information Management page.

      • To keep the connection between the Dataphin project space and the MaxCompute project working, we recommend that you use the AccessKey of the MaxCompute project administrator.

      • To ensure uninterrupted metadata acquisition, do not modify the AccessKey of the MaxCompute project.

    Hadoop

    • Compute Source Type:

      • HDFS Cluster Storage: Supports the E-MapReduce5.x Hadoop, E-MapReduce3.x Hadoop, CDH5.x Hadoop, CDH6.x Hadoop, Cloudera Data Platform 7.x, Huawei FusionInsight 8.x Hadoop, and AsiaInfo DP5.3 Hadoop compute engines.

      • OSS-HDFS Cluster Storage: Supports only the E-MapReduce5.x Hadoop compute engine.

    • Cluster Configuration

      HDFS Cluster Storage

      • NameNode: The NameNode manages the file system namespace and client access in HDFS.

        1. Click Add.

        2. In the Add NameNode dialog box, enter the NameNode's hostname and port numbers, then click OK.

           After you fill in the required information, the corresponding connection string, such as host=hostname,webUiPort=50070,ipcPort=8020, is generated automatically.

      • Configuration File:

        • Upload cluster configuration files to set cluster parameters. The system supports uploading core-site.xml, hdfs-site.xml, and other configuration files.

        • To use the HMS method for metadata retrieval, you must upload hdfs-site.xml, hive-site.xml, and core-site.xml. For the Huawei FusionInsight 8.X and E-MapReduce5.x Hadoop compute engines, the hivemetastore-site.xml file is also required.

      • History Log: Set the log path for the cluster, such as tmp/hadoop-yarn/staging/history/done.

      • Authentication Type: Supports the No Authentication and Kerberos authentication methods. Kerberos is a symmetric-key identity authentication protocol that is commonly used for cluster component authentication; enabling it enhances security.

        If Kerberos authentication is enabled, configure the following parameters:

        • Kerberos Configuration Method

          • KDC Server: Enter the KDC's unified service address used for Kerberos authentication.

          • Krb5 File Configuration: Upload the krb5 file required for Kerberos authentication.

        • HDFS Configuration

          • HDFS Keytab File: Upload the HDFS keytab file.

          • HDFS Principal: Enter the principal name for Kerberos authentication, such as XXXX/hadoopclient@xxx.xxx.
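      For reference, the krb5 file uploaded above follows the standard MIT krb5.conf layout. A minimal sketch with placeholder realm and KDC host values; substitute the values supplied by your cluster's Kerberos administrator:

      ```ini
      [libdefaults]
          default_realm = EXAMPLE.COM

      [realms]
          EXAMPLE.COM = {
              kdc = kdc.example.com
              admin_server = kdc.example.com
          }

      [domain_realm]
          .example.com = EXAMPLE.COM
      ```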

      OSS-HDFS Cluster Storage

      • Cluster Storage: Determine the cluster storage type in one of the following ways:

        • Before cluster creation: View the cluster storage type on the E-MapReduce5.x Hadoop cluster creation page.

        • After cluster creation: View the cluster storage type on the details page of the E-MapReduce5.x Hadoop cluster.

      • Cluster Storage Root Directory: Enter the root directory of the cluster storage, which can be obtained from the E-MapReduce5.x Hadoop cluster information.

        Important

        If the entered path includes an endpoint, Dataphin uses that endpoint by default. Otherwise, the bucket-level endpoint configured in core-site.xml is used; if no bucket-level endpoint is configured, the global endpoint in core-site.xml is used. For more details, see Alibaba Cloud OSS-HDFS Service (JindoFS Service) Endpoint Configuration.

      • Configuration File: Upload cluster configuration files to set cluster parameters. The system supports uploading core-site.xml, hive-site.xml, and other configuration files. To use the HMS method for metadata retrieval, you must upload hive-site.xml, core-site.xml, and hivemetastore-site.xml.

      • History Log: Set the log path for the cluster, for example, tmp/hadoop-yarn/staging/history/done.

      • AccessKey ID, AccessKey Secret: Enter the AccessKey ID and AccessKey Secret for accessing the OSS cluster. For information about AccessKey pairs, see View AccessKey.

        Important

        The AccessKey configured here takes precedence over the AccessKey set in core-site.xml.

      • Authentication Type: Supports the No Authentication and Kerberos authentication methods. Kerberos is a symmetric-key identity authentication protocol that is commonly used for cluster component authentication; enabling it enhances security. If Kerberos authentication is selected, you must upload the krb5 file required for Kerberos authentication.
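      The endpoint precedence described in the Cluster Storage Root Directory note (the endpoint in the path first, then the bucket-level endpoint, then the global endpoint from core-site.xml) amounts to a simple fallback chain. A minimal sketch in Python; the function and parameter names are illustrative, not Dataphin internals:

      ```python
      from typing import Optional

      def resolve_endpoint(path_endpoint: Optional[str],
                           bucket_endpoint: Optional[str],
                           global_endpoint: Optional[str]) -> Optional[str]:
          """Return the first configured endpoint: path > bucket-level > global."""
          for endpoint in (path_endpoint, bucket_endpoint, global_endpoint):
              if endpoint:
                  return endpoint
          return None

      # An endpoint embedded in the root directory path always wins.
      assert resolve_endpoint("path-ep", "bucket-ep", "global-ep") == "path-ep"
      # Otherwise fall back to the bucket-level endpoint from core-site.xml.
      assert resolve_endpoint(None, "bucket-ep", "global-ep") == "bucket-ep"
      ```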

    • Hive Configuration

      • JDBC URL: Enter the JDBC URL for connecting to Hive.

      • Authentication Type: For clusters without authentication, Hive supports the No Authentication and LDAP authentication methods. For clusters with Kerberos authentication, Hive supports No Authentication, LDAP, and Kerberos.

        Note

        The authentication method can be configured when the compute engine is E-MapReduce3.x, E-MapReduce5.x, Cloudera Data Platform 7.x, AsiaInfo DP5.3, or Huawei FusionInsight 8.X.

      • Username, Password: Enter the username and password for Hive access.

        • No Authentication: Only the username is required.

        • LDAP Authentication: Both the username and password are required.

        • Kerberos Authentication: No credentials are necessary.

      • Hive Keytab File: Required when Kerberos authentication is enabled. Upload the keytab file, which can be obtained from the Hive server.

      • Hive Principal: Required when Kerberos authentication is enabled. Enter the principal name that corresponds to the Hive keytab file used for Kerberos authentication, for example, XXXX/hadoopclient@xxx.xxx.

      • Execution Engine: Choose the execution engine based on the compute engine in use. Supported execution engines vary by compute engine as follows:

        • E-MapReduce 3.X: MapReduce and Spark.

        • E-MapReduce 5.X: MapReduce and Tez.

        • CDH 5.X: MapReduce.

        • CDH 6.X: MapReduce, Spark, and Tez.

        • FusionInsight 8.X: MapReduce.

        • AsiaInfo DP 5.3 Hadoop: MapReduce.

        • Cloudera Data Platform 7.x: Tez.

        Note

        After the execution engine is set, the compute settings, compute sources, tasks, and other elements of the metadata warehouse tenant use the specified Hive execution engine. Reinitialization resets these elements to the newly set execution engine.
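      The engine-to-execution-engine matrix above can be captured as a lookup table, which is handy for validating a configuration before submitting it. A sketch for illustration only; the dictionary and function are not part of any Dataphin API:

      ```python
      # Supported Hive execution engines per compute engine (from the list above).
      SUPPORTED_EXECUTION_ENGINES = {
          "E-MapReduce 3.X": {"MapReduce", "Spark"},
          "E-MapReduce 5.X": {"MapReduce", "Tez"},
          "CDH 5.X": {"MapReduce"},
          "CDH 6.X": {"MapReduce", "Spark", "Tez"},
          "FusionInsight 8.X": {"MapReduce"},
          "AsiaInfo DP 5.3 Hadoop": {"MapReduce"},
          "Cloudera Data Platform 7.x": {"Tez"},
      }

      def is_supported(compute_engine: str, execution_engine: str) -> bool:
          """Check whether an execution engine is valid for the given compute engine."""
          return execution_engine in SUPPORTED_EXECUTION_ENGINES.get(compute_engine, set())

      assert is_supported("CDH 6.X", "Tez")
      assert not is_supported("CDH 5.X", "Spark")
      ```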

    • Metadata Retrieval Method

      Metadata can be retrieved using either the Metadatabase or HMS (Hive Metastore Service) method. The configuration details for each method are as follows:

      • Metadatabase Retrieval Method

        • Database Type: The Hive metadatabase currently supports only MySQL. Compatible MySQL versions include MySQL 5.1.43, MySQL 5.6/5.7, and MySQL 8.

        • JDBC URL: Enter the JDBC URL of the target database, in the format jdbc:mysql://host:port/dbname.

        • Username, Password: Enter the username and password for the target database.
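        The expected JDBC URL shape (jdbc:mysql://host:port/dbname) can be checked up front with a small parser. A sketch for illustration; the helper name is hypothetical and not part of Dataphin:

        ```python
        import re

        _JDBC_MYSQL = re.compile(r"^jdbc:mysql://(?P<host>[^:/]+):(?P<port>\d+)/(?P<dbname>[^/?]+)$")

        def parse_mysql_jdbc_url(url: str) -> dict:
            """Split a jdbc:mysql URL into host, port, and database name."""
            match = _JDBC_MYSQL.match(url)
            if not match:
                raise ValueError(f"not a jdbc:mysql URL: {url!r}")
            parts = match.groupdict()
            parts["port"] = int(parts["port"])
            return parts

        assert parse_mysql_jdbc_url("jdbc:mysql://10.0.0.1:3306/hive_meta") == {
            "host": "10.0.0.1", "port": 3306, "dbname": "hive_meta"
        }
        ```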

      • HMS Retrieval Method

        To retrieve metadata by using the HMS method with Kerberos enabled, you must upload the keytab file and specify the principal.

        • Keytab File: Upload the keytab file required for Kerberos authentication of the Hive Metastore.

        • Principal: Enter the principal for Kerberos authentication of the Hive Metastore.

  5. After entering the required information, click Connection Test to verify connectivity with Dataphin.

  6. Once the connection test is successful, click Confirm And Start Initialization to check permissions and the metadata warehouse initialization configuration.

    • Permissions: Confirm that the user performing this operation has the super administrator or system administrator role in the metadata warehouse tenant.

    • Metadata Warehouse Initialization Configuration: Ensure that the metadata warehouse initialization has been configured successfully.

  7. After successful verification, the initialization process begins, creating compute sources, projects, data sources, and initializing DDL statements. Once complete, the Metadata Center initialization settings are finalized.

References

Upon completing the Metadata Center initialization settings, you can begin collecting metadata from databases into Dataphin for analysis and management. For more information, see Create and manage metadata acquisition tasks.