This topic describes the data types and parameters that are supported by HBase Writer and how to configure HBase Writer by using the codeless user interface (UI) and code editor.

HBase Writer writes data to HBase databases. It connects to a remote HBase database by using the HBase Java client and writes data with PUT operations.

Limits

Supported features

  • HBase Writer can write data to HBase 0.94.X, HBase 1.1.X, and HBase 2.X.
    • If you use HBase 0.94.X, set the hbaseVersion parameter to 094x.
      "writer": {
          "hbaseVersion": "094x"
      }
    • If you use HBase 1.1.X or HBase 2.X, set the hbaseVersion parameter to 11x.
      "writer": {
          "hbaseVersion": "11x"
      }
      Note The HBase 1.1.X writer is compatible with HBase 2.0. If you have questions about using HBase Writer, submit a ticket.
  • You can use concatenated fields as a rowkey.

    HBase Writer can concatenate multiple fields to generate the rowkey of an HBase table.

  • You can specify the version of each HBase cell.
    The version of a cell can be one of the following:
    • Current time
    • Specific source column
    • Specific time
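
The mapping from the HBase release to the hbaseVersion value, described in the first item of the preceding list, can be sketched in Python. This is a minimal illustration, not part of HBase Writer: the function name hbase_version_param is hypothetical, and releases that the list does not mention are rejected instead of guessed.

```python
def hbase_version_param(cluster_version: str) -> str:
    """Return the hbaseVersion value for a release string such as "1.1.13".

    Hypothetical helper: it only encodes the mapping stated in this topic.
    """
    parts = cluster_version.strip().split(".")
    major = parts[0]
    minor = parts[1] if len(parts) > 1 else ""
    if (major, minor) == ("0", "94"):
        return "094x"  # HBase 0.94.X
    if (major, minor) == ("1", "1") or major == "2":
        return "11x"   # HBase 1.1.X and HBase 2.X share the same value
    raise ValueError(f"HBase {cluster_version} is not listed as supported")


print(hbase_version_param("0.94.27"))  # 094x
print(hbase_version_param("2.4.9"))    # 11x
```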

Data types

The following table lists the data types that are supported by HBase Writer.
Note
  • The data types of specified columns must be the same as those in an HBase table.
  • Data types that are not listed in the following table are not supported.
Category          HBase data type
Integer           INT, LONG, and SHORT
Floating point    FLOAT and DOUBLE
Boolean           BOOLEAN
String            STRING
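
To make the byte-level meaning of these types concrete, the following Python sketch converts a value to bytes for each supported type. It is an assumption that HBase Writer follows the conventions of the HBase Bytes.toBytes utility (big-endian numbers, a single 0xFF byte for true). The function name to_hbase_bytes is hypothetical and is used only for illustration.

```python
import struct

# struct format codes for big-endian encodings of the numeric types
_PACK_FORMATS = {"short": ">h", "int": ">i", "long": ">q", "float": ">f", "double": ">d"}


def to_hbase_bytes(value, type_name, encoding="utf-8"):
    """Encode a value the way HBase's Bytes.toBytes is assumed to encode it."""
    kind = type_name.lower()
    if kind == "string":
        return str(value).encode(encoding)      # utf-8 or gbk
    if kind == "boolean":
        return b"\xff" if value else b"\x00"    # (byte) -1 for true, 0 for false
    if kind in _PACK_FORMATS:
        return struct.pack(_PACK_FORMATS[kind], value)
    raise ValueError(f"unsupported data type: {type_name}")


print(to_hbase_bytes(1, "int"))       # b'\x00\x00\x00\x01'
print(to_hbase_bytes(True, "boolean"))  # b'\xff'
```

A value that is written with one type must be read back with the same type, which is why the configured types must match those used in the HBase table.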

Parameters

Parameter Description Required Default value
haveKerberos Specifies whether Kerberos authentication is required. Valid values: true and false.
Note
  • If you set this parameter to true, Kerberos authentication is required, and you must configure the following parameters that are related to Kerberos authentication:
    • kerberosKeytabFilePath
    • kerberosPrincipal
    • hbaseMasterKerberosPrincipal
    • hbaseRegionserverKerberosPrincipal
    • hbaseRpcProtection
  • If you set this parameter to false, Kerberos authentication is not required, and you do not need to configure the preceding parameters.
No false
hbaseConfig The properties of the HBase cluster, in the JSON format. The hbase.zookeeper.quorum parameter is required. It specifies the ZooKeeper address of the HBase cluster. You can also configure other properties, such as those related to the cache and batch for scan operations.
Note You must use an internal endpoint to access an ApsaraDB for HBase database.
Yes No default value
mode The write mode. Only the normal mode is supported. The dynamic column mode will be available in the future. Yes No default value
table The name of the HBase table to which you want to write data. The name is case-sensitive. Yes No default value
encoding The encoding format that is used to convert a string to the HBase byte[] format. Valid values: utf-8 and gbk. No utf-8
column The columns to which you want to write data. Each entry contains the following fields:
  • index: the index of a column in the source table, starting from 0.
  • name: the name of a column in the HBase table, in the format of Column family:Column name.
  • type: the data type of the column. HBase Writer uses this type to convert the value to the HBase byte[] format.
Yes No default value
rowkeyColumn The columns that are used to generate the rowkey of each row in the HBase table. Each entry contains the following fields:
  • index: the index of a column in the source table, starting from 0. If the entry is a constant, set this field to -1.
  • type: the data type. HBase Writer uses this type to convert the value to the HBase byte[] format.
  • value: a constant, which is usually used as a delimiter between fields. Specify this field when index is set to -1.
HBase Writer concatenates all entries into a string in the order in which they are specified and uses the string as the rowkey. The entries cannot all be constants.
The following code provides a configuration example:
"rowkeyColumn": [
          {
            "index":0,
            "type":"string"
          },
          {
            "index":-1,
            "type":"string",
            "value":"_"
          }
      ]
Yes No default value
versionColumn The version of each HBase cell. You can use the current time, a specific time, or a specific source column as the version. If you do not specify this parameter, the current time is used.
  • index: the index of a column in the source table, starting from 0. Make sure that the value of the column can be converted to the LONG data type. To use a specific time as the version, set index to -1.
  • type: the data type. If the data type is DATE, HBase Writer attempts to parse the value in the yyyy-MM-dd HH:mm:ss or yyyy-MM-dd HH:mm:ss SSS format.
  • value: the specific time, as a value of the LONG data type. Use this field when index is set to -1.
The following code provides configuration examples:
  • Use a source column as the version:
    "versionColumn":{
        "index":1
    }
  • Use a specific time as the version:
    "versionColumn":{
        "index":-1,
        "value":123456789
    }
No No default value
nullMode The method used to process null values. Valid values:
  • skip: HBase Writer does not write null values to HBase.
  • empty: HBase Writer writes HConstants.EMPTY_BYTE_ARRAY (new byte[0]) to HBase instead of null values.
No skip
walFlag Specifies whether to enable write-ahead logging (WAL) for HBase.
  • true: WAL is enabled. Each edit that an HBase client submits to a region on a RegionServer is first recorded in the WAL file (HLog). After the edit is recorded, it is applied to the MemStore, and a success notification is sent to the HBase client. If the edit fails to be recorded in the WAL file, a failure notification is sent to the HBase client, and the edit is not applied to the MemStore.
  • false: WAL is disabled. HBase Writer can write data more efficiently, but edits that have not yet been flushed from the MemStore can be lost if a RegionServer fails.

No false
writeBufferSize The size of the write buffer of the HBase client, in bytes. If you specify this parameter, you must also specify the autoflush parameter. The default value of autoflush is false.
  • autoflush set to true: The HBase client sends a PUT request each time it receives an edit.
  • autoflush set to false: The HBase client sends a PUT request only when its write buffer is full.
No 8 MB
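
The rowkeyColumn, versionColumn, and nullMode rules in the preceding table can be pictured with a small Python model. This is a minimal sketch under stated assumptions, not the implementation of HBase Writer: the function names build_rowkey, resolve_version, and plan_cells are hypothetical, a source record is modeled as a plain list, every value is encoded as a UTF-8 string, the current time is assumed to be in milliseconds, and DATE parsing for versionColumn is omitted.

```python
import time

EMPTY_BYTE_ARRAY = b""  # stand-in for HConstants.EMPTY_BYTE_ARRAY


def build_rowkey(record, rowkey_columns):
    """Concatenate source fields and constants in the configured order."""
    if all(int(col["index"]) == -1 for col in rowkey_columns):
        raise ValueError("rowkeyColumn cannot consist only of constants")
    parts = []
    for col in rowkey_columns:
        index = int(col["index"])
        # index -1 marks a constant such as a delimiter; otherwise read the field
        parts.append(str(col["value"]) if index == -1 else str(record[index]))
    return "".join(parts)


def resolve_version(record, version_column=None, now_ms=None):
    """Pick the cell version: current time, a specific time, or a source column."""
    if version_column is None:
        return int(time.time() * 1000) if now_ms is None else now_ms
    index = int(version_column["index"])
    if index == -1:
        return int(version_column["value"])  # specific time
    return int(record[index])                # must be convertible to a long


def plan_cells(record, columns, null_mode="skip"):
    """Map 'Column family:Column name' to the bytes that would be written."""
    if null_mode not in ("skip", "empty"):
        raise ValueError(f"unknown nullMode: {null_mode}")
    cells = {}
    for col in columns:
        value = record[int(col["index"])]
        if value is None:
            if null_mode == "empty":
                cells[col["name"]] = EMPTY_BYTE_ARRAY
            continue  # skip: nothing is written for a null value
        cells[col["name"]] = str(value).encode("utf-8")
    return cells


record = ["user42", 1700000000000, None]
rowkey_cfg = [{"index": 0, "type": "string"},
              {"index": -1, "type": "string", "value": "_"}]
print(build_rowkey(record, rowkey_cfg))          # user42_
print(resolve_version(record, {"index": 1}))     # 1700000000000
```

With nullMode set to skip, the third field of the sample record produces no cell. With empty, it produces a cell that holds a zero-length byte array.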

Configure HBase Writer by using the codeless UI

This method is not supported.

Configure HBase Writer by using the code editor

In the following code, a synchronization node is configured to write data to HBase 1.1.X. For more information about how to configure a synchronization node by using the code editor, see Create a sync node by using the code editor.
{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"hbase",// The writer type. 
            "parameter":{
                "mode":"normal",// The write mode. 
                "walFlag":"false",// WAL is disabled for HBase. 
                "hbaseVersion":"11x",// The HBase version. Use 11x for HBase 1.1.X and HBase 2.X. 
                "rowkeyColumn":[// The rowkey column of each row in the HBase table. 
                    {
                        "index":"0",// The ID of a column in the source table. 
                        "type":"string"// The data type. 
                    },
                    {
                        "index":"-1",
                        "type":"string",
                        "value":"_"
                    }
                ],
                "nullMode":"skip",// The method used to process null values. 
                "column":[// The names of the columns to which you want to write data. 
                    {
                        "name":"columnFamilyName1:columnName1",// The name of a column in the HBase table. 
                        "index":"0",// The ID of a column in the source table. 
                        "type":"string"// The data type. 
                    },
                    {
                        "name":"columnFamilyName2:columnName2",
                        "index":"1",
                        "type":"string"
                    },
                    {
                        "name":"columnFamilyName3:columnName3",
                        "index":"2",
                        "type":"string"
                    }
                ],
                "encoding":"utf-8",// The encoding format. 
                "table":"",// The name of the table to which you want to write data. 
                "hbaseConfig":{// The properties of the HBase cluster, in the JSON format. 
                    "hbase.zookeeper.quorum":"hostname",
                    "hbase.rootdir":"hdfs://ip:port/database",
                    "hbase.cluster.distributed":"true"
                }
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed. 
        },
        "speed":{
            "throttle":true,// Specifies whether to enable bandwidth throttling. The value false indicates that bandwidth throttling is disabled, and the value true indicates that bandwidth throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent":1, // The maximum number of parallel threads. 
            "mbps":"12"// The maximum transmission rate.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
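
Before you submit a node, it can help to check the writer parameters against the Required column of the parameter table. The following Python sketch shows one way to do this. The function check_writer_parameter is hypothetical and covers only the rules stated in this topic: the required parameters, the hbase.zookeeper.quorum property, the normal mode, and the Kerberos parameters that are needed when haveKerberos is true. It expects the parameter object as a Python dict with the // comments already removed.

```python
REQUIRED_PARAMS = ("hbaseConfig", "mode", "table", "column", "rowkeyColumn")
KERBEROS_PARAMS = (
    "kerberosKeytabFilePath",
    "kerberosPrincipal",
    "hbaseMasterKerberosPrincipal",
    "hbaseRegionserverKerberosPrincipal",
    "hbaseRpcProtection",
)


def check_writer_parameter(parameter):
    """Return a list of problems found in the writer's "parameter" object."""
    problems = [f"missing required parameter: {key}"
                for key in REQUIRED_PARAMS if key not in parameter]
    hbase_config = parameter.get("hbaseConfig")
    if hbase_config is not None and "hbase.zookeeper.quorum" not in hbase_config:
        problems.append("hbaseConfig must contain hbase.zookeeper.quorum")
    if parameter.get("mode", "normal") != "normal":
        problems.append("only the normal mode is supported")
    # Accept JSON booleans as well as quoted "true"/"false" strings
    if str(parameter.get("haveKerberos", "false")).lower() == "true":
        problems.extend(f"haveKerberos is true but {key} is missing"
                        for key in KERBEROS_PARAMS if key not in parameter)
    return problems


writer_parameter = {
    "mode": "normal",
    "table": "my_table",  # hypothetical table name
    "hbaseConfig": {"hbase.zookeeper.quorum": "hostname"},
    "rowkeyColumn": [{"index": "0", "type": "string"}],
    "column": [{"name": "columnFamilyName1:columnName1", "index": "0", "type": "string"}],
}
print(check_writer_parameter(writer_parameter))  # []
```

The check reports only missing keys. It does not contact the HBase cluster or verify that the table and column families exist.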