Dataphin: Create an HBase data source

Last Updated: May 28, 2025

By creating an HBase data source, you can read business data from HBase or write data to HBase in Dataphin. This topic describes how to create an HBase data source.

Prerequisites

To configure active/standby links for a data source, you must first purchase and activate the high availability feature of DataService Studio or Tag Service.

Background information

HBase is a distributed, column-oriented NoSQL database that runs on top of the Hadoop Distributed File System (HDFS) and is used to store and access large amounts of data. To read data from HBase or write Dataphin data to HBase, you must first create an HBase data source.

Permission requirements

Only the following roles can create data sources: super administrator, data source administrator, domain architect, project administrator, and custom global roles that have the Create Data Source permission.

Limits

  • You can create only HBase data sources of versions 0.9.4, 1.1.x, and 2.x.

  • During the connectivity test, the system first checks the connectivity of the active link. After the active link passes the test, the system checks the connectivity of the standby link. If the active link fails the connectivity test, the system does not check the connectivity of the standby link.
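
The test order described above can be sketched as follows. This only illustrates the documented sequence and is not Dataphin's implementation; `check_links` and the two probe callables are hypothetical names introduced for the example.

```python
from typing import Callable, Optional


def check_links(
    test_active: Callable[[], bool],
    test_standby: Optional[Callable[[], bool]] = None,
) -> dict:
    """Run connectivity checks in the documented order.

    test_active and test_standby are hypothetical probes that return True
    when the corresponding link is reachable.
    """
    result = {"active": test_active(), "standby": None}
    # The standby link is checked only after the active link passes.
    if result["active"] and test_standby is not None:
        result["standby"] = test_standby()
    return result


if __name__ == "__main__":
    # Active link fails, so the standby probe is never called.
    print(check_links(lambda: False, lambda: True))
    # Active link passes, so the standby link is checked as well.
    print(check_links(lambda: True, lambda: False))
```

A standby result of `None` means the standby link was not checked, either because the active link failed or because no standby link is configured.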

Procedure

  1. On the Dataphin homepage, click Management Center > Datasource Management in the top navigation bar.

  2. On the Datasource page, click +Create Data Source.

  3. On the Create Data Source page, select HBase in the NoSQL section.

    If you have recently used HBase, you can select it in the Recently Used section. You can also enter keywords in the search box to find HBase quickly.

  4. On the Create HBase Data Source page, configure the parameters for connecting to the data source.

    1. Configure the basic information of the data source.

      Parameter

      Description

      Datasource Name

      Enter a name for the data source. The name must meet the following requirements:

      • It can contain only Chinese characters, letters, digits, underscores (_), and hyphens (-).

      • It cannot exceed 64 characters in length.

      Datasource Code

      After you configure the data source code, you can access tables of the data source in Flink SQL tasks or through the Dataphin JDBC client by using the format Data source code.Table name or Data source code.Schema.Table name. To switch data sources automatically based on the task execution environment, use the variable format ${Data source code}.table or ${Data source code}.schema.table. For more information, see Development method for Dataphin data source tables.

      Important
      • The data source code cannot be modified after it is configured successfully.

      • After the data source code is configured successfully, you can preview data on the object details page in the asset directory and asset inventory.

      • In Flink SQL, only MySQL, Hologres, MaxCompute, Oracle, StarRocks, Hive, and SelectDB data sources are currently supported.

      Version

      HBase 2.x data sources support the following versions:

      • CDH5: 1.2.0.

      • CDP7.1.3: 2.2.3.

      • AsiaInfo DP5.x HBase 2.x.

      • EMR HBase 2.x.

        Note

        HBase 0.9.4 and HBase 1.1.x do not support version configuration.

      Data Source Description

      Enter a brief description of the data source. The description cannot exceed 128 characters.

      Data Source Configuration

      Select the data source that you want to configure:

      • If your business data source distinguishes between production and development data sources, select Production + Development Data Source.

      • If your business data source does not distinguish between production and development data sources, select Production Data Source.

      Active/Standby Links

      This parameter can be configured only when Service high availability or Tag Service high availability is enabled. Options include Active Link Only and Active/Standby Links.

      • Active Link Only: You need to configure only one HBase service. All read and write operations of the data source are performed through the active link. In this mode, if the active link fails, high availability is not supported.

      • Active/Standby Links: You need to configure two HBase services. If the active link fails, the standby link can take over and continue to run normally. This mode supports high availability.

      Tag

      You can categorize and tag data sources based on tags. For information about how to create tags, see Manage data source tags.
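
The Datasource Name rules and the Datasource Code reference formats in the table above can be checked locally before you fill in the form. The following is a minimal sketch, not a Dataphin API: the function names are hypothetical, the Unicode range used for Chinese characters (CJK Unified Ideographs) is an assumption, and so is the requirement that the name be non-empty.

```python
import re

# Chinese characters (assumed to mean the CJK Unified Ideographs block),
# letters, digits, underscores, and hyphens; 1 to 64 characters in total.
NAME_PATTERN = re.compile(r"[\u4e00-\u9fffA-Za-z0-9_-]{1,64}")


def is_valid_datasource_name(name: str) -> bool:
    """Return True if name satisfies the documented naming rules."""
    return NAME_PATTERN.fullmatch(name) is not None


def table_reference(code: str, table: str, schema: str = "",
                    variable: bool = False) -> str:
    """Build a table reference from a data source code.

    variable=True produces the ${code} form, which the documentation says
    switches data sources based on the task execution environment.
    """
    prefix = "${" + code + "}" if variable else code
    parts = [prefix, schema, table] if schema else [prefix, table]
    return ".".join(parts)


if __name__ == "__main__":
    print(is_valid_datasource_name("hbase_prod-01"))
    print(table_reference("my_ds", "orders"))
    print(table_reference("my_ds", "orders", schema="sales", variable=True))
```

Here `my_ds`, `orders`, and `sales` are placeholder values. Whether a given data source type can be referenced this way depends on the support list in the Important note above.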

    2. Configure the connection parameters between the data source and Dataphin.

      If you select Production + Development Data Source for Data Source Configuration, you need to configure the connection information for both the production data source and the development data source. If you select Production Data Source, you need to configure the connection information only for the production data source.

      Note

      Typically, production and development data sources should be configured as separate data sources to achieve environment isolation and reduce the impact of development operations on production. However, Dataphin also supports configuring them as the same data source with identical parameter values.

      Parameter

      Description

      Active Link

      Connection Url

      Enter the connection address of the active link. The format is hb-proxy-{host}-{port}.hbase.rds.aliyuncs.com.

      Namespace

      Optional. Enter the namespace of the active link HBase.

      Configuration File

      Upload the hbase-site.xml configuration file for the active link HBase.

      Connection Parameters

      Configure the parameters for connecting to the active link HBase in JSON format.

      Kerberos

      Kerberos is an identity authentication protocol based on symmetric-key cryptography. It provides authentication for other services and supports single sign-on (SSO): after a client is authenticated, it can access multiple services, such as HBase and HDFS. Configure Kerberos based on your cluster:

      • If the Hadoop cluster has Kerberos authentication, you need to enable Kerberos.

        • After enabling Kerberos, you need to configure the following parameters:

          • Kerberos configuration method: Supports KDC Server and Krb5 File Configuration methods.

            • KDC Server Address: Configure the KDC server address to assist with Kerberos authentication.

              Note

              You can configure multiple KDC Server addresses, separated by commas (,).

            • Krb5 File Configuration: You need to upload the Krb5 file containing the Kerberos authentication domain name.

          • Keytab File: Upload the Keytab file for Kerberos authentication.

          • Principal: Configure the Principal name for Kerberos authentication.

      • If the Hadoop cluster does not have Kerberos authentication, you do not need to enable Kerberos.
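
To illustrate the two Kerberos configuration methods, the sketch below splits a comma-separated KDC address string and renders a minimal krb5.conf that names the realm and its KDCs. It is an illustration under stated assumptions: the function names, the realm `EXAMPLE.COM`, and the host names are hypothetical, and a real cluster's Krb5 file usually carries more settings than shown here.

```python
from typing import List


def parse_kdc_addresses(raw: str) -> List[str]:
    """Split a comma-separated KDC address string, dropping blank entries."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def render_krb5_conf(realm: str, kdcs: List[str]) -> str:
    """Render a minimal krb5.conf with one realm and its KDC entries."""
    lines = [
        "[libdefaults]",
        f"    default_realm = {realm}",
        "",
        "[realms]",
        f"    {realm} = {{",
    ]
    lines += [f"        kdc = {kdc}" for kdc in kdcs]
    lines.append("    }")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical addresses, as they might be typed into KDC Server Address.
    kdcs = parse_kdc_addresses("kdc1.example.com:88, kdc2.example.com:88")
    print(render_krb5_conf("EXAMPLE.COM", kdcs))
```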

      Standby Link

      The standby link is available for the production data source when Active/Standby Links is selected. After you configure the active link, you can click Copy Active Link Configuration to fill in the standby link quickly.

      Connection Address

      Enter the connection address of the standby link. The format is hb-proxy-{host}-{port}.hbase.rds.aliyuncs.com.

      Namespace

      Optional. Enter the namespace of the standby link HBase.

      Configuration File

      Upload the hbase-site.xml configuration file for the standby link HBase.

      Connection Parameters

      Configure the parameters for connecting to the standby link HBase in JSON format.

      Kerberos

      By default, the standby link uses the same Kerberos setting as the active link, and this setting cannot be changed. If Kerberos is enabled for the active link, you can specify a separate Keytab File and Principal for the standby link.
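
The Connection Parameters fields expect JSON. The sketch below checks that a value parses as a JSON object before you paste it into the form. The property names are standard HBase client settings, but whether Dataphin applies these particular keys is an assumption, and `validate_connection_parameters` is a hypothetical helper.

```python
import json


def validate_connection_parameters(text: str) -> dict:
    """Parse the Connection Parameters text and require a JSON object."""
    params = json.loads(text)  # raises json.JSONDecodeError on malformed JSON
    if not isinstance(params, dict):
        raise ValueError("connection parameters must be a JSON object")
    return params


# Assumed example: HBase client properties written as a flat JSON object.
EXAMPLE = json.dumps(
    {
        "hbase.rpc.timeout": "60000",
        "hbase.client.retries.number": "3",
    },
    indent=2,
)

if __name__ == "__main__":
    print(EXAMPLE)
    print(validate_connection_parameters(EXAMPLE))
```

Because `json.JSONDecodeError` is a subclass of `ValueError`, a caller can catch `ValueError` to handle both malformed JSON and a non-object value.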

  5. Select a Default Resource Group. This resource group runs the tasks related to the current data source, such as database SQL, offline database migration, and data preview.

  6. Click Test Connection, or click OK to save the configuration and finish creating the HBase data source.

    Test Connection checks whether the data source can connect to Dataphin. If you click OK without testing, the system automatically tests the connection for all selected clusters. The data source is still created even if every selected cluster fails the connection test.