This topic describes the data types and parameters that HBase Writer supports and how to configure it by using the code editor.
HBase Writer allows you to write data to HBase data stores. Specifically, HBase Writer
connects to a remote HBase data store by using the Java client of HBase. Then, HBase
Writer uses the PUT method to write data to the HBase data store.
Notice: HBase Writer supports only exclusive resource groups for Data Integration. The default resource group and custom resource groups are not supported. For more information, see Use exclusive resource groups for data integration and Add a custom resource group.
Features
- HBase 0.94.x, HBase 1.1.x, and HBase 2.x are supported.
- If you use HBase 0.94.x, set the hbaseVersion parameter to 094x for HBase Writer.
"writer": { "hbaseVersion": "094x" }
- If you use HBase 1.1.x or HBase 2.x, set the hbaseVersion parameter to 11x for HBase Writer.
"writer": { "hbaseVersion": "11x" }
Note: The HBase 1.1.x version of HBase Writer is compatible with HBase 2.0. If you encounter issues when you use HBase Writer with HBase 2.0, submit a ticket.
- You can use concatenated fields as a rowkey.
HBase Writer can concatenate multiple fields to generate the rowkey of an HBase table.
- You can set the version of each HBase cell.
The information that can be used as the version of an HBase cell includes:
- Current time
- Specified source column
- Specified time
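The two features above can be pictured with a short Python sketch. It is not HBase Writer's implementation. The function names `build_rowkey` and `pick_version` are hypothetical, the convention that an index of -1 marks a constant taken from `value` is assumed from the rowkeyColumn entries in the configuration example in this topic, and treating the version as an integer timestamp in milliseconds is also an assumption.

```python
import time


def build_rowkey(record, rowkey_spec, encoding="utf-8"):
    """Concatenate source fields and constants into one rowkey (bytes)."""
    parts = []
    for item in rowkey_spec:
        index = int(item["index"])
        if index == -1:
            # Assumed convention: index -1 means a constant taken from "value".
            parts.append(str(item["value"]))
        else:
            # Otherwise the index points at a field of the source record.
            parts.append(str(record[index]))
    return "".join(parts).encode(encoding)


def pick_version(record, version_spec=None, now_ms=None):
    """Return the cell version as an integer timestamp (assumed milliseconds)."""
    if version_spec is None:
        # No version configured: fall back to the current time.
        return now_ms if now_ms is not None else int(time.time() * 1000)
    index = int(version_spec["index"])
    if index == -1:
        # Specified time: a fixed value supplied in the configuration.
        return int(version_spec["value"])
    # Specified source column: read the version from the record.
    return int(record[index])


record = ["user42", "1700000000000", "click"]
spec = [
    {"index": "0", "type": "string"},
    {"index": "-1", "type": "string", "value": "_"},
]
rowkey = build_rowkey(record, spec)
```

With this record and spec, the first field and the constant `_` are joined into a single rowkey, and `pick_version(record, {"index": "1"})` would read the version from the second field.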
Data types
The following table describes the data types that HBase Writer supports.
Note
- The data types of the specified columns must be the same as those in the HBase table.
- HBase Writer supports only the data types that are described in the following table.
Category | HBase data type |
---|---|
Integer | INT, LONG, and SHORT |
Floating point | FLOAT and DOUBLE |
Boolean | BOOLEAN |
String | STRING |
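To see why the configured data type must match the type used in the HBase table, it helps to look at how each type could be turned into bytes. The following Python sketch assumes that values are serialized the way the HBase Java client's `Bytes.toBytes` methods do it (big-endian fixed-width numbers, one byte for a boolean). That is an assumption about HBase Writer's behavior, and the function name `to_cell_bytes` is hypothetical.

```python
import struct

# struct format codes for the fixed-width types, big-endian (assumed to
# match the byte order used by the HBase Java client).
_FORMATS = {"short": ">h", "int": ">i", "long": ">q", "float": ">f", "double": ">d"}


def to_cell_bytes(value, type_name, encoding="utf-8"):
    """Serialize one value according to its declared type."""
    type_name = type_name.lower()
    if type_name in _FORMATS:
        return struct.pack(_FORMATS[type_name], value)
    if type_name == "boolean":
        # Assumed: a single byte, 0xFF for true and 0x00 for false.
        return b"\xff" if value else b"\x00"
    if type_name == "string":
        return str(value).encode(encoding)
    raise ValueError(f"unsupported type: {type_name}")


as_int = to_cell_bytes(1, "int")    # 4 bytes
as_long = to_cell_bytes(1, "long")  # 8 bytes
```

Under these assumptions the same number occupies 4 bytes as an INT and 8 bytes as a LONG, so a reader that expects one width cannot correctly decode a cell written with the other.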
Parameters
Parameter | Description | Required | Default value |
---|---|---|---|
haveKerberos | Specifies whether Kerberos authentication is required. A value of true indicates that Kerberos authentication is required. | No | false |
hbaseConfig | The properties of the HBase cluster, in JSON format. The hbase.zookeeper.quorum parameter is required and specifies the ZooKeeper ensemble servers. You can also configure HBase client properties such as cache and batch for scan operations. Note: You must use the internal endpoint to connect to ApsaraDB for HBase. | Yes | N/A |
mode | The mode in which HBase Writer writes data to the HBase data store. Only the normal mode is supported. The dynamic column selection mode is coming soon. | Yes | N/A |
table | The name of the HBase table to which HBase Writer writes data. The name is case-sensitive. | Yes | N/A |
encoding | The encoding that is used to convert strings to byte[] values. UTF-8 and GBK are supported. | No | utf-8 |
column | The columns in the HBase table to which HBase Writer writes data. | Yes | N/A |
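rowkeyColumn | The rowkey of each HBase cell. Multiple source fields can be concatenated to form the rowkey. For an example, see the rowkeyColumn setting in the code editor configuration in this topic. | Yes | N/A |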
versionColumn | The version of each HBase cell. You can use the current time, a specified source column, or a specified point in time as the version. If you do not specify this parameter, the current time is used. | No | N/A |
nullMode | The method of processing null values. | No | skip |
walFlag | Specifies whether to enable write-ahead logging (WAL) for HBase. If the value is true, all edits that an HBase client requests for the regions hosted on a RegionServer are first recorded in the WAL (that is, the HLog). After the edits are recorded in the WAL, they are applied to the MemStore and a success notification is sent to the HBase client. If the edits fail to be recorded in the WAL, a failure notification is sent to the HBase client and the edits are not applied to the MemStore. If the value is false, WAL is disabled, which allows HBase Writer to write data more efficiently. | No | false |
writeBufferSize | The write buffer size, in bytes, of the HBase client. If you specify this parameter, you must also specify the autoflush parameter. By default, the value of the autoflush parameter is false. | No | 8M |
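The rules in the parameter table can be checked before a job is submitted. The following Python sketch is one possible pre-flight check, not part of HBase Writer or Data Integration. The function name `check_writer_parameter` is hypothetical, and the required parameters, defaults, and constraints are copied from the table above.

```python
REQUIRED = ("hbaseConfig", "mode", "table", "column", "rowkeyColumn")
DEFAULTS = {
    "haveKerberos": False,
    "encoding": "utf-8",
    "nullMode": "skip",
    "walFlag": False,
}


def check_writer_parameter(parameter):
    """Return a copy with table defaults applied; raise ValueError on problems."""
    problems = []
    for key in REQUIRED:
        # Missing keys and empty values ("" or [] or {}) are both rejected.
        if not parameter.get(key):
            problems.append(f"missing required parameter: {key}")
    hbase_config = parameter.get("hbaseConfig") or {}
    if "hbase.zookeeper.quorum" not in hbase_config:
        problems.append("hbaseConfig must contain hbase.zookeeper.quorum")
    if parameter.get("mode") not in (None, "normal"):
        problems.append("only the normal mode is supported")
    if str(parameter.get("encoding", "utf-8")).lower() not in ("utf-8", "gbk"):
        problems.append("encoding must be utf-8 or gbk")
    if "writeBufferSize" in parameter and "autoflush" not in parameter:
        problems.append("writeBufferSize requires autoflush to be specified")
    if problems:
        raise ValueError("; ".join(problems))
    # Explicit settings override the defaults from the table.
    return {**DEFAULTS, **parameter}


good = {
    "hbaseConfig": {"hbase.zookeeper.quorum": "hostname"},
    "mode": "normal",
    "table": "my_table",  # hypothetical table name
    "column": [{"name": "cf:col", "index": "0", "type": "string"}],
    "rowkeyColumn": [{"index": "0", "type": "string"}],
}
checked = check_writer_parameter(good)
```

A configuration that leaves `table` empty, or that sets `writeBufferSize` without `autoflush`, would raise a `ValueError` that lists each problem.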
Configure HBase Writer by using the codeless UI
The codeless user interface (UI) is not supported for HBase Writer.
Configure HBase Writer by using the code editor
The following example shows how to configure a sync node to write data to an HBase
1.1.x data store. For more information, see Create a sync node by using the code editor.
{
    "type":"job",
    "version":"2.0", // The version number.
    "steps":[
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"hbase", // The writer type.
            "parameter":{
                "mode":"normal", // The mode in which HBase Writer writes data to the HBase data store.
                "walFlag":"false", // WAL is disabled for HBase.
                "hbaseVersion":"11x", // The HBase version. Use 11x for HBase 1.1.x and HBase 2.x.
                "rowkeyColumn":[ // The rowkey of each HBase cell.
                    {
                        "index":"0", // The index of the column in the source table.
                        "type":"string" // The data type.
                    },
                    {
                        "index":"-1", // An index of -1 indicates a constant instead of a source column.
                        "type":"string",
                        "value":"_" // The constant that is appended to the rowkey.
                    }
                ],
                "nullMode":"skip", // The method of processing null values.
                "column":[ // The columns in the HBase table to which HBase Writer writes data.
                    {
                        "name":"columnFamilyName1:columnName1", // The name of the column in the HBase table.
                        "index":"0", // The index of the column in the source table.
                        "type":"string" // The data type.
                    },
                    {
                        "name":"columnFamilyName2:columnName2",
                        "index":"1",
                        "type":"string"
                    },
                    {
                        "name":"columnFamilyName3:columnName3",
                        "index":"2",
                        "type":"string"
                    }
                ],
                "writeMode":"api", // The write mode.
                "encoding":"utf-8", // The encoding format.
                "table":"", // The name of the destination table.
                "hbaseConfig":{ // The properties of the HBase cluster, in JSON format.
                    "hbase.zookeeper.quorum":"hostname",
                    "hbase.rootdir":"hdfs://ip:port/database",
                    "hbase.cluster.distributed":"true"
                }
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0" // The maximum number of dirty data records allowed.
        },
        "speed":{
            "throttle":false, // Specifies whether to enable bandwidth throttling. A value of false indicates that the bandwidth is not throttled. A value of true indicates that the bandwidth is throttled. The maximum transmission rate takes effect only if you set this parameter to true.
            "concurrent":1 // The maximum number of concurrent threads.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
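The hbaseVersion value in the writer configuration must match the cluster version, as described in the Features section: 094x for HBase 0.94.x, and 11x for HBase 1.1.x and HBase 2.x. The following Python sketch derives the value from a release string when a job configuration is generated by a script. The function name `hbase_version_param` is hypothetical, and rejecting every version that this topic does not list is a choice made for this sketch, not documented behavior.

```python
def hbase_version_param(cluster_version):
    """Map an HBase release string such as "1.1.3" to the hbaseVersion value."""
    parts = cluster_version.strip().split(".")
    major = int(parts[0])
    # The minor version only matters for the 0.94.x and 1.1.x lines.
    minor = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else None
    if major == 0 and minor == 94:
        return "094x"
    if (major == 1 and minor == 1) or major == 2:
        return "11x"
    raise ValueError(
        f"HBase {cluster_version} is not listed as supported in this topic"
    )


# Fragment of a writer parameter block for an HBase 1.1.x data store.
writer_parameter = {
    "mode": "normal",
    "hbaseVersion": hbase_version_param("1.1.3"),
}
```

Deriving the value this way keeps the setting consistent with the cluster that the job targets.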