DataWorks provides HBase Reader and HBase Writer for you to read data from and write data to HBase data sources. You can use the codeless user interface (UI) or code editor to configure synchronization nodes for HBase data sources.

Limits

  • Kerberos authentication is the only special authentication method that HBase data sources support. Other authentication methods will be available in the future.
  • The following table describes the network connection methods supported by HBase of different editions and versions.
    Edition and version            | Internet connection | VPC connection
    Standard Edition (1.1 and 2.0) | Not supported       | Supported
    Performance-enhanced Edition   | Supported           | Supported

Procedure

  1. Go to the Data Source page.
    1. Log on to the DataWorks console.
    2. In the left-side navigation pane, click Workspaces.
    3. Select the region where the workspace resides, find the workspace, and click Data Integration in the Actions column.
    4. In the left-side navigation pane of the Data Integration page, choose Data Source > Data Sources to go to the Data Source page.
  2. On the Data Source page, click Add data source in the upper-right corner.
  3. In the Add data source dialog box, click HBase in the Big Data Storage section.
  4. In the Add HBase data source dialog box, configure the parameters.
    HBase
    Parameter | Description
    Data Source Name | The name of the data source. The name can contain letters, digits, and underscores (_) and must start with a letter.
    Data source description | The description of the data source. The description can be a maximum of 80 characters in length.
    Environment | The environment in which the data source is used. Valid values: Development and Production.
    Note: This parameter is displayed only when the workspace is in standard mode.
    Configuration information | The configuration information of the HBase cluster.

    You can convert the hbase-site.xml file to JSON format and then add HBase client properties, such as the cache and batch settings for scans, to optimize the interaction between the cluster and the client.

    You must configure different information based on the edition of HBase in use.
    Note Standard Edition (1.1 and 2.0) and Performance-enhanced Edition are supported. For more information about the editions, see ApsaraDB for HBase editions.
    • If you are using the Standard Edition, the default configuration is used. You only need to enter the related ZooKeeper information.
      {
          "hbase.rootdir": "hdfs://localhost:9000/hbase",
          "hbase.zookeeper.quorum": "localhost"
      }
    • If you are using the Performance-enhanced Edition, the endpoint parameter specific to this edition is required, and the zookeeper.quorum parameter is not required.
      To add an HBase data source of the Performance-enhanced Edition (Lindorm), enter a configuration similar to the following example in the Configuration information field:
      {
          "hbase.client.connection.impl": "com.alibaba.hbase.client.AliHBaseUEConnection",
          "hbase.client.endpoint": "host:30020",
          "hbase.client.username": "root",
          "hbase.client.password": "root"
      }
    Special Authentication Method

    Specifies whether identity authentication is required. Default value: None. You can also set this parameter to Kerberos Authentication. For more information about Kerberos authentication, see Configure Kerberos authentication.

    Keytab File

    If you set Special Authentication Method to Kerberos Authentication, you must select the desired keytab file from the Keytab File drop-down list.

    If no keytab file is available, you can click Add Authentication File to upload a keytab file.

    CONF File

    If you set Special Authentication Method to Kerberos Authentication, you must select the desired CONF file from the CONF File drop-down list.

    If no CONF file is available, you can click Add Authentication File to upload a CONF file.

    principal

    The Kerberos principal. Specify this parameter in the format Principal name/Instance name@Domain name, such as ****/hadoopclient@**.***.

  5. Set Resource Group connectivity to Data Integration.
  6. Find the desired resource group in the resource group list in the lower part of the dialog box and click Test connectivity in the Actions column.
    A synchronization node can use only one type of resource group. To ensure that your synchronization nodes run as expected, you must test the connectivity of every resource group for Data Integration on which the nodes will run. For more information, see Select a network connectivity solution.
    Note
    • Connectivity tests can be performed only for exclusive resource groups for Data Integration. For more information, see Create and use an exclusive resource group for Data Integration.
    • If you are using the Performance-enhanced Edition (Lindorm), an error message appears indicating that the AliHBase class cannot be found. You can ignore this error message.
  7. After the data source passes the connectivity test, click Complete.
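
The conversion of the hbase-site.xml file to JSON that is mentioned in step 4 can be scripted. The following Python sketch shows one possible approach and is not an official DataWorks tool. It assumes the usual Hadoop-style layout of hbase-site.xml, in which a configuration element contains property elements that each hold a name and a value. The function name hbase_site_to_json and the sample values are hypothetical, and the hbase.client.scanner.caching entry is included only to illustrate how an extra client property could be merged in.

```python
import json
import xml.etree.ElementTree as ET


def hbase_site_to_json(xml_text, extra_properties=None):
    """Convert the text of an hbase-site.xml file to a JSON string.

    hbase_site_to_json is a hypothetical helper written for this sketch.
    """
    root = ET.fromstring(xml_text)
    config = {}
    # Each <property> element is expected to hold one <name>/<value> pair.
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is None or value is None:
            continue  # skip incomplete entries
        config[name.strip()] = value.strip()
    # Merge optional client-side properties, for example scan settings.
    if extra_properties:
        config.update(extra_properties)
    return json.dumps(config, indent=4)


# Sample input that mirrors the Standard Edition example in step 4.
sample_xml = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
</configuration>"""

# The extra property and its value are illustrative assumptions.
print(hbase_site_to_json(sample_xml, {"hbase.client.scanner.caching": "100"}))
```

The printed JSON object can then be reviewed and pasted into the Configuration information field.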

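The edition-specific rules for the Configuration information field in step 4 can be checked before you paste the JSON into the dialog box. The following Python sketch encodes only the two rules stated in that step: the Standard Edition needs the ZooKeeper information, and the Performance-enhanced Edition needs the endpoint but not the ZooKeeper quorum. The function name check_hbase_config and the edition labels "standard" and "performance-enhanced" are hypothetical names chosen for this example and are not part of any DataWorks API.

```python
import json


def check_hbase_config(config_text, edition):
    """Return a list of problems found in a Configuration information JSON string.

    The edition labels "standard" and "performance-enhanced" are
    hypothetical names used only in this sketch.
    """
    config = json.loads(config_text)  # raises ValueError if the text is not valid JSON
    problems = []
    if edition == "standard":
        # Standard Edition: the ZooKeeper information is required.
        if "hbase.zookeeper.quorum" not in config:
            problems.append("missing hbase.zookeeper.quorum")
    elif edition == "performance-enhanced":
        # Performance-enhanced Edition: the endpoint is required,
        # and the ZooKeeper quorum is not needed.
        if "hbase.client.endpoint" not in config:
            problems.append("missing hbase.client.endpoint")
        if "hbase.zookeeper.quorum" in config:
            problems.append("hbase.zookeeper.quorum is not required")
    else:
        raise ValueError("unknown edition: " + edition)
    return problems


# The values mirror the Performance-enhanced Edition example in step 4.
lindorm_config = """{
    "hbase.client.connection.impl": "com.alibaba.hbase.client.AliHBaseUEConnection",
    "hbase.client.endpoint": "host:30020",
    "hbase.client.username": "root",
    "hbase.client.password": "root"
}"""

print(check_hbase_config(lindorm_config, "performance-enhanced"))  # prints []
```

An empty list means that the configuration satisfies the rules checked by this sketch. It does not replace the connectivity test in step 6.
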
What to do next

You have learned how to add an HBase data source. In subsequent tutorials, you will learn how to configure HBase Reader and HBase Writer. For more information, see HBase Reader and HBase Writer.