HBase Reader reads data from HBase. This topic describes the data types and parameters that are supported by HBase Reader and how to configure HBase Reader by using the codeless user interface (UI) and code editor.
HBase Reader connects to a remote HBase database by using a Java client of HBase, scans and reads data based on a specific rowkey range, assembles the data into abstract datasets of the data types supported by Data Integration, and then sends the datasets to a writer.
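The scan step can be pictured with a small in-memory sketch. It assumes the usual HBase scan semantics: rowkeys are compared as bytes, the start rowkey is inclusive, the end rowkey is exclusive, and an empty rowkey means no bound on that side. The `scan_range` helper and the sample rows are hypothetical and do not call the HBase client.

```python
from typing import Dict, Iterator, Tuple

# Hypothetical in-memory stand-in for an HBase table: rowkey bytes -> row data.
Table = Dict[bytes, Dict[str, str]]


def scan_range(
    table: Table, start: bytes = b"", end: bytes = b""
) -> Iterator[Tuple[bytes, Dict[str, str]]]:
    """Yield rows in byte order with start <= rowkey < end.

    An empty start or end is treated as unbounded on that side.
    """
    for rowkey in sorted(table):
        if start and rowkey < start:
            continue  # before the start of the range
        if end and rowkey >= end:
            break  # the end rowkey itself is excluded
        yield rowkey, table[rowkey]


table = {
    b"xiaoming": {"info:age": "29"},
    b"lisi": {"info:age": "27"},
}

# Only "lisi" falls inside [lisi, xiaoming).
for rowkey, row in scan_range(table, start=b"lisi", end=b"xiaoming"):
    print(rowkey, row)
```

Because the end rowkey is excluded, a range whose end equals the last rowkey of interest skips that row; leave the end rowkey empty to read through to the end of the table.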
Limits
- HBase Reader cannot read data that is written by Phoenix, because Phoenix applies special processing to the data that it stores in HBase.
- HBase Reader supports only exclusive resource groups for Data Integration, but not the shared resource group or custom resource groups for Data Integration. For more information, see Create and use an exclusive resource group for Data Integration and Create a custom resource group for Data Integration.
Supported features
- HBase Reader can read data from HBase 0.94.X, HBase 1.1.X, and HBase 2.X.
- If you use HBase 0.94.X, set the plugin parameter to 094x.
"reader": { "plugin": "094x" }
- If you use HBase 1.1.X or HBase 2.X, set the plugin parameter to 11x.
"reader": { "plugin": "11x" }
Note: The reader plugin for HBase 1.1.X is compatible with HBase 2.0. If you have questions when you use HBase Reader, submit a ticket.
- HBase Reader supports normal and multiVersionFixedColumn modes.
- In normal mode, HBase Reader reads only the latest version of data from an HBase table and converts the data to a two-dimensional table (wide table).
    hbase(main):017:0> scan 'users'
    ROW        COLUMN+CELL
     lisi      column=address:city, timestamp=1457101972764, value=beijing
     lisi      column=address:country, timestamp=1457102773908, value=china
     lisi      column=address:province, timestamp=1457101972736, value=beijing
     lisi      column=info:age, timestamp=1457101972548, value=27
     lisi      column=info:birthday, timestamp=1457101972604, value=1987-06-17
     lisi      column=info:company, timestamp=1457101972653, value=baidu
     xiaoming  column=address:city, timestamp=1457082196082, value=hangzhou
     xiaoming  column=address:country, timestamp=1457082195729, value=china
     xiaoming  column=address:province, timestamp=1457082195773, value=zhejiang
     xiaoming  column=info:age, timestamp=1457082218735, value=29
     xiaoming  column=info:birthday, timestamp=1457082186830, value=1987-06-17
     xiaoming  column=info:company, timestamp=1457082189826, value=alibaba
    2 row(s) in 0.0580 seconds
HBase Reader converts the data that is read from the HBase table to the following table.
rowKey | address:city | address:country | address:province | info:age | info:birthday | info:company |
---|---|---|---|---|---|---|
lisi | beijing | china | beijing | 27 | 1987-06-17 | baidu |
xiaoming | hangzhou | china | zhejiang | 29 | 1987-06-17 | alibaba |
- In multiVersionFixedColumn mode, HBase Reader reads data from an HBase table and converts the data to a narrow table. The narrow table contains four columns: rowKey, family:qualifier, timestamp, and value. Before you use HBase Reader to read data, you must specify the columns from which you want to read data. When HBase Reader reads data, it converts each cell in each version of the table to a data record.
    hbase(main):018:0> scan 'users',{VERSIONS=>5}
    ROW        COLUMN+CELL
     lisi      column=address:city, timestamp=1457101972764, value=beijing
     lisi      column=address:country, timestamp=1457102773908, value=china
     lisi      column=address:province, timestamp=1457101972736, value=beijing
     lisi      column=info:age, timestamp=1457101972548, value=27
     lisi      column=info:birthday, timestamp=1457101972604, value=1987-06-17
     lisi      column=info:company, timestamp=1457101972653, value=baidu
     xiaoming  column=address:city, timestamp=1457082196082, value=hangzhou
     xiaoming  column=address:country, timestamp=1457082195729, value=china
     xiaoming  column=address:province, timestamp=1457082195773, value=zhejiang
     xiaoming  column=info:age, timestamp=1457082218735, value=29
     xiaoming  column=info:age, timestamp=1457082178630, value=24
     xiaoming  column=info:birthday, timestamp=1457082186830, value=1987-06-17
     xiaoming  column=info:company, timestamp=1457082189826, value=alibaba
    2 row(s) in 0.0260 seconds
HBase Reader converts the data that is read from the HBase table to the following table.
rowKey | family:qualifier | timestamp | value |
---|---|---|---|
lisi | address:city | 1457101972764 | beijing |
lisi | address:country | 1457102773908 | china |
lisi | address:province | 1457101972736 | beijing |
lisi | info:age | 1457101972548 | 27 |
lisi | info:birthday | 1457101972604 | 1987-06-17 |
lisi | info:company | 1457101972653 | baidu |
xiaoming | address:city | 1457082196082 | hangzhou |
xiaoming | address:country | 1457082195729 | china |
xiaoming | address:province | 1457082195773 | zhejiang |
xiaoming | info:age | 1457082218735 | 29 |
xiaoming | info:age | 1457082178630 | 24 |
xiaoming | info:birthday | 1457082186830 | 1987-06-17 |
xiaoming | info:company | 1457082189826 | alibaba |
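The difference between the two modes can be sketched in a few lines of Python. This is an illustration of the conversions described above, not HBase Reader's implementation: the `to_wide` and `to_narrow` helpers are hypothetical, the cells are a subset of the `users` example, and returning `None` for a column that a row lacks is an assumption.

```python
from typing import List, Optional, Tuple

# One HBase cell: (rowkey, "family:qualifier", timestamp, value).
Cell = Tuple[str, str, int, str]

CELLS: List[Cell] = [
    ("xiaoming", "info:age", 1457082218735, "29"),
    ("xiaoming", "info:age", 1457082178630, "24"),
    ("xiaoming", "info:company", 1457082189826, "alibaba"),
    ("lisi", "info:age", 1457101972548, "27"),
    ("lisi", "info:company", 1457101972653, "baidu"),
]


def to_wide(cells: List[Cell], columns: List[str]) -> List[List[Optional[str]]]:
    """normal mode: one record per rowkey, latest version of each column."""
    latest = {}
    for rowkey, column, ts, value in cells:
        row = latest.setdefault(rowkey, {})
        if column not in row or ts > row[column][0]:
            row[column] = (ts, value)
    return [
        [rowkey] + [row[c][1] if c in row else None for c in columns]
        for rowkey, row in sorted(latest.items())
    ]


def to_narrow(cells: List[Cell], columns: List[str], max_version: int = -1) -> List[Cell]:
    """multiVersionFixedColumn mode: one record per cell version."""
    records, seen = [], {}
    # Visit the newest version first within each (rowkey, column) pair.
    for cell in sorted(cells, key=lambda c: (c[0], c[1], -c[2])):
        key = (cell[0], cell[1])
        if cell[1] not in columns:
            continue
        if max_version != -1 and seen.get(key, 0) >= max_version:
            continue  # already emitted max_version versions of this column
        seen[key] = seen.get(key, 0) + 1
        records.append(cell)
    return records


columns = ["info:age", "info:company"]
print(to_wide(CELLS, columns))    # two records, one per rowkey
print(to_narrow(CELLS, columns))  # five records, one per cell version
```

In the wide result, the older `info:age` value 24 for `xiaoming` disappears; in the narrow result, it is kept as its own record with its timestamp.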
Data types
Category | Data Integration data type | HBase data type |
---|---|---|
Integer | LONG | SHORT, INT, and LONG |
Floating point | DOUBLE | FLOAT and DOUBLE |
String | STRING | BINARY_STRING and STRING |
Date and time | DATE | DATE |
Byte | BYTES | BYTES |
Boolean | BOOLEAN | BOOLEAN |
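As a rough illustration of the mapping in the table, the following Python sketch decodes raw cell bytes. It assumes that numeric values were written with the HBase `Bytes` utility, which stores them in big-endian order; the `decode_cell` helper is hypothetical, and this sketch leaves `date`, `bytes`, and `binary_string` values undecoded.

```python
import struct

# HBase type (lowercase) -> Data Integration type, copied from the table.
TYPE_MAPPING = {
    "short": "LONG", "int": "LONG", "long": "LONG",
    "float": "DOUBLE", "double": "DOUBLE",
    "binary_string": "STRING", "string": "STRING",
    "date": "DATE",
    "bytes": "BYTES",
    "boolean": "BOOLEAN",
}

# Assumption: big-endian layouts, as produced by the HBase Bytes utility.
_NUMERIC_FORMATS = {"short": ">h", "int": ">i", "long": ">q", "float": ">f", "double": ">d"}


def decode_cell(raw: bytes, hbase_type: str, encoding: str = "utf-8"):
    """Decode one cell value according to its configured HBase type."""
    hbase_type = hbase_type.lower()
    if hbase_type not in TYPE_MAPPING:
        raise ValueError(f"unsupported HBase type: {hbase_type}")
    if hbase_type in _NUMERIC_FORMATS:
        return struct.unpack(_NUMERIC_FORMATS[hbase_type], raw)[0]
    if hbase_type == "string":
        return raw.decode(encoding)
    if hbase_type == "boolean":
        return raw[0] != 0
    return raw  # date, bytes, and binary_string are left as raw bytes here


print(decode_cell(struct.pack(">q", 27), "long"))
print(decode_cell("beijing".encode("utf-8"), "string"))
```

If a column was written as text (for example, the string `27` rather than an 8-byte long), configure it as `string`; decoding it as `long` fails because the byte length does not match.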
Parameters
Parameter | Description | Required | Default value |
---|---|---|---|
haveKerberos | Specifies whether Kerberos authentication is required. Valid values: true and false. | No | false |
hbaseConfig | The properties of the HBase cluster, in the JSON format. The hbase.zookeeper.quorum parameter is required. It specifies the ZooKeeper address of the HBase cluster. You can also configure other properties, such as those related to the cache and batch for scan operations. Note: You must use an internal endpoint to access an ApsaraDB for HBase database. | Yes | No default value |
mode | The mode in which HBase Reader reads data from HBase. Valid values: normal and multiVersionFixedColumn. | Yes | No default value |
table | The name of the HBase table from which you want to read data. The name is case-sensitive. | Yes | No default value |
encoding | The encoding format that is used to convert binary data in the HBase byte[] format to strings. Valid values: utf-8 and gbk. | No | utf-8 |
column | The names of the columns from which you want to read data. | Yes | No default value |
maxVersion | The number of versions that are read by HBase Reader when multiple versions are available. Valid values: -1 and integers greater than 1. The value -1 indicates that all versions are read. | Required in multiVersionFixedColumn mode | No default value |
range | The rowkey range based on which HBase Reader reads data. | No | No default value |
scanCacheSize | The number of rows that HBase Reader reads from the HBase table each time. | No | 256 |
scanBatchSize | The number of columns that HBase Reader reads from the HBase table each time. | No | 100 |
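The rules in the table can be checked before a job is submitted. The following Python sketch is a hypothetical pre-flight check, not part of HBase Reader; it covers only the constraints stated in the table, and comparing the encoding case-insensitively is an assumption.

```python
VALID_MODES = {"normal", "multiVersionFixedColumn"}
VALID_ENCODINGS = {"utf-8", "gbk"}
REQUIRED = ("hbaseConfig", "mode", "table", "column")


def validate_parameter(parameter: dict) -> list:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    for key in REQUIRED:
        if not parameter.get(key):
            problems.append(f"missing required parameter: {key}")

    hbase_config = parameter.get("hbaseConfig") or {}
    if hbase_config and not hbase_config.get("hbase.zookeeper.quorum"):
        problems.append("hbaseConfig must contain hbase.zookeeper.quorum")

    mode = parameter.get("mode")
    if mode and mode not in VALID_MODES:
        problems.append(f"invalid mode: {mode}")

    # Assumption: the encoding name is compared case-insensitively.
    encoding = str(parameter.get("encoding", "utf-8")).lower()
    if encoding not in VALID_ENCODINGS:
        problems.append(f"invalid encoding: {encoding}")

    if mode == "multiVersionFixedColumn":
        try:
            max_version = int(parameter.get("maxVersion", ""))
        except (TypeError, ValueError):
            problems.append("maxVersion is required in multiVersionFixedColumn mode")
        else:
            if max_version != -1 and max_version <= 1:
                problems.append("maxVersion must be -1 or an integer greater than 1")
    return problems


parameter = {
    "hbaseConfig": {"hbase.zookeeper.quorum": "zk-host"},  # placeholder address
    "mode": "multiVersionFixedColumn",
    "table": "users",
    "column": [{"name": "rowkey", "type": "string"}],
}
print(validate_parameter(parameter))  # reports the missing maxVersion
```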
Configure HBase Reader by using the codeless UI
The codeless UI is not supported for HBase Reader. Use the code editor instead.
Configure HBase Reader by using the code editor
{
    "type":"job",
    "version":"2.0", // The version number.
    "steps":[
        {
            "stepType":"hbase", // The reader type.
            "parameter":{
                "mode":"normal", // The mode in which HBase Reader reads data. Valid values: normal and multiVersionFixedColumn.
                "scanCacheSize":"256", // The number of rows that HBase Reader reads from the HBase table each time.
                "scanBatchSize":"100", // The number of columns that HBase Reader reads from the HBase table each time.
                "hbaseVersion":"094x/11x", // The HBase version.
                "column":[ // The columns from which you want to read data.
                    {
                        "name":"rowkey", // The name of a column.
                        "type":"string" // The data type.
                    },
                    {
                        "name":"columnFamilyName1:columnName1",
                        "type":"string"
                    },
                    {
                        "name":"columnFamilyName2:columnName2",
                        "format":"yyyy-MM-dd",
                        "type":"date"
                    },
                    {
                        "name":"columnFamilyName3:columnName3",
                        "type":"long"
                    }
                ],
                "range":{ // The rowkey range based on which HBase Reader reads data.
                    "endRowkey":"", // The end rowkey.
                    "isBinaryRowkey":true, // The method that is used to convert the specified start and end rowkeys to the byte[] format. Default value: false.
                    "startRowkey":"" // The start rowkey.
                },
                "maxVersion":"", // The number of versions that are read by HBase Reader when multiple versions are available.
                "encoding":"UTF-8", // The encoding format.
                "table":"", // The name of the table from which you want to read data.
                "hbaseConfig":{ // The properties of the HBase cluster, in the JSON format.
                    "hbase.zookeeper.quorum":"hostname",
                    "hbase.rootdir":"hdfs://ip:port/database",
                    "hbase.cluster.distributed":"true"
                }
            },
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0" // The maximum number of dirty data records allowed.
        },
        "speed":{
            "throttle":true, // Specifies whether to enable bandwidth throttling. The value false indicates that bandwidth throttling is disabled, and the value true indicates that bandwidth throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
            "concurrent":1, // The maximum number of parallel threads.
            "mbps":"12" // The maximum transmission rate.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}
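The sample script contains `//` comments and placeholder values, so it is not strict JSON as written. If you generate scripts programmatically, a builder along the following lines may help. This is a sketch only: the `build_job` helper, the table name `users`, and the ZooKeeper address `zk-host` are hypothetical, and the structure simply mirrors the sample script for a job in multiVersionFixedColumn mode.

```python
import json


def build_job(table, zookeeper_quorum, columns, mode="normal",
              max_version=None, hbase_version="11x"):
    """Build a job dict shaped like the sample script, without comments."""
    parameter = {
        "mode": mode,
        "hbaseVersion": hbase_version,
        "table": table,
        "encoding": "UTF-8",
        "column": [{"name": name, "type": type_} for name, type_ in columns],
        # Empty start and end rowkeys, as in the sample script.
        "range": {"startRowkey": "", "endRowkey": "", "isBinaryRowkey": False},
        "hbaseConfig": {"hbase.zookeeper.quorum": zookeeper_quorum},
    }
    if mode == "multiVersionFixedColumn":
        # maxVersion is required only in multiVersionFixedColumn mode.
        parameter["maxVersion"] = max_version
    return {
        "type": "job",
        "version": "2.0",
        "steps": [
            {"stepType": "hbase", "parameter": parameter,
             "name": "Reader", "category": "reader"},
            {"stepType": "stream", "parameter": {},
             "name": "Writer", "category": "writer"},
        ],
        "setting": {
            "errorLimit": {"record": "0"},
            "speed": {"throttle": True, "concurrent": 1, "mbps": "12"},
        },
        "order": {"hops": [{"from": "Reader", "to": "Writer"}]},
    }


job = build_job(
    table="users",                 # hypothetical table name
    zookeeper_quorum="zk-host",    # hypothetical ZooKeeper address
    columns=[("rowkey", "string"), ("info:age", "string")],
    mode="multiVersionFixedColumn",
    max_version=-1,                # -1 reads all versions
)
print(json.dumps(job, indent=4))
```

Building the script from a dictionary and serializing it with `json.dumps` avoids hand-editing mistakes such as trailing commas or stray comments.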