This topic describes the data types and parameters that HBase Writer supports and how to configure it by using the code editor.

HBase Writer allows you to write data to HBase data stores. Specifically, HBase Writer connects to a remote HBase data store by using the Java client of HBase. Then, HBase Writer uses the PUT method to write data to the HBase data store.
Notice HBase Writer supports only exclusive resource groups for Data Integration, but not the default resource group or custom resource groups. For more information, see Use exclusive resource groups for data integration and Add a custom resource group.

Features

  • HBase 0.94.x, HBase 1.1.x, and HBase 2.x are supported.
    • If you use HBase 0.94.x, set the hbaseVersion parameter to 094x for HBase Writer.
      "writer": {
          "hbaseVersion": "094x"
      }
    • If you use HBase 1.1.x or HBase 2.x, set the hbaseVersion parameter to 11x for HBase Writer.
      "writer": {
          "hbaseVersion": "11x"
      }
      Note HBase Writer for HBase 1.1.x is compatible with HBase 2.0. If you encounter issues when you use HBase Writer with HBase 2.0, submit a ticket.
  • You can use concatenated fields as a rowkey.

    HBase Writer can concatenate multiple fields to generate the rowkey of an HBase table.

  • You can set the version of each HBase cell.
    The information that can be used as the version of an HBase cell includes:
    • Current time
    • Specified source column
    • Specified time

Data types

The following table describes the data types that HBase Writer supports.
Note
  • The data types of the specified columns must be the same as those in the HBase table.
  • HBase Writer supports only the data types that are described in the following table.
Category          HBase data type
Integer           INT, LONG, and SHORT
Floating point    FLOAT and DOUBLE
Boolean           BOOLEAN
String            STRING
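
As an illustration of how these types might appear in a task, the following fragment sketches a column list that writes one field from each category. The lowercase spellings long, double, and boolean, the column family cf, and the qualifier names are assumptions made for this sketch. Only the string spelling appears in the configuration example in this topic, so verify the other type names in your environment before you use them.

```json
"column":[
    {
        "index":0,                 // first field of the source record
        "name":"cf:id",            // hypothetical columnFamily:column name
        "type":"long"              // assumed spelling of the LONG type
    },
    {
        "index":1,
        "name":"cf:score",         // hypothetical
        "type":"double"            // assumed spelling of the DOUBLE type
    },
    {
        "index":2,
        "name":"cf:active",        // hypothetical
        "type":"boolean"           // assumed spelling of the BOOLEAN type
    },
    {
        "index":3,
        "name":"cf:comment",       // hypothetical
        "type":"string"            // spelling used elsewhere in this topic
    }
]
```

Each type must match the type that is already used for that column in the HBase table, as stated in the preceding note.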

Parameters

Parameter Description Required Default value
haveKerberos Specifies whether Kerberos authentication is required. A value of true indicates that Kerberos authentication is required.
Note
  • If the value is true, you must specify the following five Kerberos-related parameters:
    • kerberosKeytabFilePath
    • kerberosPrincipal
    • hbaseMasterKerberosPrincipal
    • hbaseRegionserverKerberosPrincipal
    • hbaseRpcProtection
  • If the value is false, Kerberos authentication is not required and you do not need to specify the preceding parameters.
No false
hbaseConfig The properties of the HBase cluster, in JSON format. The hbase.zookeeper.quorum parameter is required. It specifies the ZooKeeper ensemble servers. You can also configure HBase client properties such as cache and batch for scan operations.
Note You must use the internal endpoint to connect to ApsaraDB for HBase.
Yes N/A
mode The mode in which HBase Writer writes data to the HBase data store. Only the normal mode is supported. The dynamic column selection mode is coming soon. Yes N/A
table The name of the HBase table to which HBase Writer writes data. The name is case-sensitive. Yes N/A
encoding The encoding format that is used to convert strings to byte[]. Valid values: UTF-8 and GBK. No utf-8
column The columns in the HBase table to which HBase Writer writes data.
  • index: the ID of the column in the source table, starting from 0.
  • name: the name of the column in the HBase table, in the columnFamily:column format.
  • type: the data type of the value to write, which determines how the value is converted to byte[].
Yes N/A
rowkeyColumn The fields that make up the rowkey of each row. HBase Writer concatenates all entries specified in this parameter, in the order in which they are listed, into one string and uses the string as the rowkey. The entries cannot all be constants.
  • index: the ID of the column in the source table, starting from 0. If the entry is a constant, set the value to -1.
  • type: the data type of the value to write, which determines how the value is converted to byte[].
  • value: a constant, which is usually used as the delimiter between fields.
Example:
"rowkeyColumn": [
          {
            "index":0,
            "type":"string"
          },
          {
            "index":-1,
            "type":"string",
            "value":"_"
          }
      ]
Yes N/A
versionColumn The version of each HBase cell. You can use the current time, a specified source column, or a specified time point as the version. If you do not specify this parameter, the current time is used.
  • index: the ID of the column in the source table, starting from 0. Make sure that the value can be converted to the LONG type. If you want to use a specified time as the version, set the value to -1.
  • type: the data type. If the data type is DATE, HBase Writer parses the date in the yyyy-MM-dd HH:mm:ss or yyyy-MM-dd HH:mm:ss SSS format.
  • value: the specified time of the LONG type.
Example:
  • "versionColumn":{
    "index":1
    }
  • "versionColumn":{
    "index":-1,
    "value":123456789
    }
No N/A
nullMode The method of processing null values. Valid values:
  • skip: HBase Writer does not write null values to the HBase data store.
  • empty: HBase Writer writes HConstants.EMPTY_BYTE_ARRAY (new byte[0]) to the HBase data store instead of null values.
No skip
walFlag Specifies whether to enable write-ahead logging (WAL) for HBase. If the value is true, each edit that an HBase client submits to a region on a RegionServer is first recorded in the WAL (that is, the HLog). After the edit is recorded in the WAL, it is applied to the MemStore and a success notification is sent to the HBase client.

If an edit fails to be recorded in the WAL, a failure notification is sent to the HBase client and the edit is not applied to the MemStore. If the value is false, WAL is disabled. This improves write performance, but edits that are not yet flushed to disk can be lost if a RegionServer fails.

No false
writeBufferSize The write buffer size, in bytes, of the HBase client. If you specify this parameter, you must also specify the autoflush parameter. By default, the value of the autoflush parameter is false.

autoflush:

  • If the value is true, the HBase client sends a PUT request each time it receives an edit.
  • If the value is false, the HBase client sends a PUT request only when its write buffer is full.
No 8 MB
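
For example, a writer that buffers about 8 MB of edits before each flush might be configured as sketched below. The placement of both keys directly in the parameter object of the writer step and the numeric and Boolean value formats are assumptions. The value 8388608 is 8 × 1024 × 1024 bytes.

```json
"parameter":{
    "writeBufferSize":8388608, // 8 MB expressed in bytes (assumed numeric format)
    "autoflush":false          // send PUT requests only when the write buffer is full
}
```

A larger buffer generally means fewer, larger requests to the RegionServers at the cost of more client memory. Setting autoflush to true trades that batching for a request per edit.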

Configure HBase Writer by using the codeless UI

The codeless user interface (UI) is not supported for HBase Writer.

Configure HBase Writer by using the code editor

The following example shows how to configure a sync node to write data to an HBase 1.1.x data store. For more information, see Create a sync node by using the code editor.
{
    "type":"job",
    "version":"2.0", // The version number.
    "steps":[
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"hbase",// The writer type.
            "parameter":{
                "mode":"normal",// The mode in which HBase Writer writes data to the HBase data store.
                "walFlag":"false",// WAL is disabled for HBase.
                "hbaseVersion":"11x",// The HBase version. Use 094x for HBase 0.94.x.
                "rowkeyColumn":[// The rowkey of each HBase cell.
                    {
                        "index":"0",// The ID of the column in the source table.
                        "type":"string" // The data type.
                    },
                    {
                        "index":"-1",
                        "type":"string",
                        "value":"_"
                    }
                ],
                "nullMode":"skip",// The method of processing null values.
                "column":[// The columns in the HBase table to which HBase Writer writes data.
                    {
                        "name":"columnFamilyName1:columnName1",// The name of the column in the HBase table.
                        "index":"0",// The ID of the column in the source table.
                        "type":"string" // The data type.
                    },
                    {
                        "name":"columnFamilyName2:columnName2",
                        "index":"1",
                        "type":"string"
                    },
                    {
                        "name":"columnFamilyName3:columnName3",
                        "index":"2",
                        "type":"string"
                    }
                ],
                "writeMode":"api",// The write mode.
                "encoding":"utf-8",// The encoding format.
                "table":"",// The name of the destination table.
                "hbaseConfig":{// The properties of the HBase cluster, in JSON format.
                    "hbase.zookeeper.quorum":"hostname",
                    "hbase.rootdir":"hdfs://ip:port/database",
                    "hbase.cluster.distributed":"true"
                }
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed.
        },
        "speed":{
            "throttle":false,// Specifies whether to enable bandwidth throttling. A value of false indicates that the bandwidth is not throttled. A value of true indicates that the bandwidth is throttled. The maximum transmission rate takes effect only if you set this parameter to true.
            "concurrent":1 // The maximum number of concurrent threads.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
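
If the HBase cluster requires Kerberos authentication, the parameter object of the writer step in the preceding example would also carry the haveKerberos switch and the five Kerberos-related parameters that are listed in the Parameters section. The following fragment is a minimal sketch. Every value in it is a hypothetical placeholder, and the placement of these keys at the top level of the parameter object is an assumption. The value authentication for hbaseRpcProtection is assumed to mirror the hbase.rpc.protection setting of the cluster.

```json
"parameter":{
    "haveKerberos":true,                                            // enables Kerberos authentication
    "kerberosKeytabFilePath":"/path/to/user.keytab",                // hypothetical keytab path
    "kerberosPrincipal":"user@EXAMPLE.COM",                         // hypothetical client principal
    "hbaseMasterKerberosPrincipal":"hbase/_HOST@EXAMPLE.COM",       // hypothetical
    "hbaseRegionserverKerberosPrincipal":"hbase/_HOST@EXAMPLE.COM", // hypothetical
    "hbaseRpcProtection":"authentication"                           // assumed to match the cluster setting
}
```

These keys are added alongside the existing parameters such as mode, table, and hbaseConfig. If haveKerberos is false or omitted, none of them is needed.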