
Configure HBase writer

Last Updated: Mar 21, 2018

The HBase Writer plug‑in writes data into HBase. At the underlying implementation level, HBase Writer connects to a remote HBase service through the HBase Java client and writes data into HBase by means of Put operations.

Supported features

HBase 0.94.x and HBase 1.1.x are supported.

  • If your HBase version is HBase 0.94.x, select hbase094x for the Writer plug‑in:

    "writer": {
        "plugin": "hbase094x"
    }
  • If your HBase version is HBase 1.1.x, select hbase11x for the Writer plug‑in:

    "writer": {
        "plugin": "hbase11x"
    }

Multiple source fields can be concatenated into a rowkey

Currently, HBase Writer can concatenate multiple source fields into the rowkey of an HBase table. For more information, see the rowkeyColumn configuration.

Versioning of data written into HBase

The following timestamps (versions) are supported for data written into HBase:

  • Current time
  • Specified source column
  • Specified time

Configure HBase client

A required configuration item in HBase Writer is hbaseConfig. Contact your HBase PE to extract the configuration items related to the HBase connection from hbase-site.xml, and specify these items in JSON format. In addition, you can add more HBase client settings to optimize the interaction with the servers; for example, you can configure the scan cache (hbase.client.scanner.caching) and batch size.

Note: Currently, HBase client settings are specified through the hbaseConfig configuration item.

For example, hbase-site.xml is configured as follows.

  <configuration>
    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://10.101.85.161:9000/hbase</value>
    </property>
    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>v101085161.sqa.zmf</value>
    </property>
  </configuration>

The converted JSON value is as follows.

  "hbaseConfig": {
      "hbase.rootdir": "hdfs://10.101.85.161:9000/hbase",
      "hbase.cluster.distributed": "true",
      "hbase.zookeeper.quorum": "v101085161.sqa.zmf"
  }
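The hbase-site.xml to hbaseConfig conversion shown above can also be scripted. The following is a minimal sketch (not part of the product) that parses the property entries with Python's standard library and prints the resulting hbaseConfig object:

```python
import json
import xml.etree.ElementTree as ET

def hbase_site_to_json(xml_text):
    """Convert the <property> name/value pairs from hbase-site.xml
    into the flat dict expected by the hbaseConfig item."""
    root = ET.fromstring(xml_text)
    config = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            config[name] = value
    return config

xml_text = """
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://10.101.85.161:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>v101085161.sqa.zmf</value>
  </property>
</configuration>
"""

# Print the JSON fragment to paste into the job configuration.
print(json.dumps({"hbaseConfig": hbase_site_to_json(xml_text)}, indent=2))
```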

HBase Writer supports the following HBase data types and converts them as follows.

  Internal data integration type    HBase data type
  Long                              int, short, long
  Double                            float, double
  String                            string
  Boolean                           boolean

Note: Apart from the field types listed here, other types are not supported.
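To make the conversion concrete, the sketch below mimics how values of the supported types could be serialized to HBase byte[], in the style of HBase's org.apache.hadoop.hbase.util.Bytes utility (big-endian numeric encodings). This is an illustrative helper, not the plug‑in's actual code:

```python
import struct

def to_hbase_bytes(value, hbase_type, encoding="utf-8"):
    """Serialize a value of a supported type to bytes,
    mirroring HBase's Bytes.toBytes conventions."""
    if hbase_type in ("int", "short", "long"):
        return struct.pack(">q", int(value))        # Long -> 8-byte big-endian
    if hbase_type in ("float", "double"):
        return struct.pack(">d", float(value))      # Double -> 8-byte IEEE 754
    if hbase_type == "string":
        return str(value).encode(encoding)          # uses the job's encoding
    if hbase_type == "boolean":
        return b"\xff" if value else b"\x00"        # like Bytes.toBytes(boolean)
    raise ValueError("unsupported HBase type: %s" % hbase_type)
```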

Parameter description

  • haveKerberos

    • Description: If haveKerberos is true, the HBase cluster must be authenticated using Kerberos.

      Note: If this value is set to true, the following five Kerberos-related parameters must also be configured.

      • kerberosKeytabFilePath
      • kerberosPrincipal
      • hbaseMasterKerberosPrincipal
      • hbaseRegionserverKerberosPrincipal
      • hbaseRpcProtection

      If the HBase cluster is not authenticated by using Kerberos, these five parameters are not required.

    • Required: No

    • Default value: False

  • hbaseConfig

    • Description: The connection configuration that each HBase cluster provides to the Data Integration client is stored in hbase-site.xml. Contact your HBase PE for this configuration and convert it into JSON format. In addition, you can add more HBase client settings, for example the scan cache and batch size, to optimize the interaction with the servers.

    • Required: Yes

    • Default value: None

  • mode

    • Description: The mode in which data is written into HBase. Currently, only the normal mode is supported. The dynamic column mode will be available later.

    • Required: Yes

    • Default value: None

  • table

    • Description: Name of the HBase table to be written (case‑sensitive).

    • Required: Yes

    • Default value: None

  • encoding

    • Description: The encoding (UTF-8 or GBK) used when converting string data to HBase byte[].

    • Required: No

    • Default value: utf-8

  • column

    • Description: The HBase field to be written.

      • index: The index of the Reader column that this column maps to, starting from 0.
      • name: The column in the HBase table, which must be in the column family:column name format.
      • type: The type of the data to be written, used when converting the data to HBase byte[].

      The configuration format is as follows.

      1. "column": [
      2. {
      3. "index":1,
      4. "name": "cf1:q1",
      5. "type": "string"
      6. },
      7. {
      8. "index":2,
      9. "name": "cf1:q2",
      10. "type": "string"
      11. }
    • Required: Yes

    • Default value: None

  • rowkeyColumn

    • Description: The HBase rowkey column to be written.

      • index: The index of the Reader column that this column maps to, starting from 0. The index is -1 for a constant.
      • type: The type of the data to be written, used when converting the data to HBase byte[].
      • value: A constant value, usually used as the separator when concatenating multiple fields. HBase Writer concatenates all rowkeyColumn entries into a rowkey in the configured order and writes the data into HBase. The rowkey cannot consist of constants only.

      The configuration format is as follows.

      1. "rowkeyColumn": [
      2. {
      3. "index":0,
      4. "type":"string"
      5. },
      6. {
      7. "index":-1,
      8. "type":"string",
      9. "value":"_"
      10. }
      11. ]
    • Required: Yes

    • Default value: None
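The concatenation behavior described above can be sketched in a few lines (a hypothetical helper, not the plug‑in's actual code): each rowkeyColumn entry contributes either the source column at its index or, when the index is -1, its constant value, and the pieces are joined in configuration order:

```python
def build_rowkey(row, rowkey_columns):
    """Concatenate source fields and constants into a rowkey,
    following the rowkeyColumn configuration order."""
    parts = []
    for col in rowkey_columns:
        if col["index"] == -1:
            parts.append(str(col["value"]))       # constant, e.g. a separator
        else:
            parts.append(str(row[col["index"]]))  # source column value
    return "".join(parts)

# Hypothetical config: field 0, then a "_" separator, then field 1.
rowkey_columns = [
    {"index": 0, "type": "string"},
    {"index": -1, "type": "string", "value": "_"},
    {"index": 1, "type": "string"},
]
row = ["user01", "20180321"]
print(build_rowkey(row, rowkey_columns))  # user01_20180321
```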

  • versionColumn

    • Description: Specify the timestamp when data is written into HBase. Current time, specified time column, and specified time are supported (choose one out of the three). The current time is used if this parameter is not configured.

      • index: The index of the Reader column that holds the timestamp, starting from 0. The column value must be convertible to the long data type. The index is -1 when a specified time is used.
      • type: If the type is Date, HBase Writer parses the data in the yyyy-MM-dd HH:mm:ss and yyyy-MM-dd HH:mm:ss SSS formats.
      • value: The long value of the specified time.

      The configuration format is as follows.

      1. "versionColumn":{
      2. "index":1
      3. }

      or

      1. "versionColumn":{
      2. "index":-1,
      3. "value":123456789
      4. }
    • Required: No

    • Default value: None
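The resolution order described above (source column, fixed value, or current time) can be illustrated with a small sketch; the helper name and fallback logic are assumptions for illustration, not the plug‑in's actual code:

```python
import time
from datetime import datetime

def resolve_version(row, version_column=None):
    """Resolve the HBase cell timestamp: a source column,
    a specified value, or the current time in milliseconds."""
    if version_column is None:
        return int(time.time() * 1000)            # current time (default)
    if version_column["index"] == -1:
        return int(version_column["value"])       # specified time
    cell = row[version_column["index"]]
    # Try the supported Date formats first, then treat the value as a long.
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M:%S %f"):
        try:
            return int(datetime.strptime(cell, fmt).timestamp() * 1000)
        except ValueError:
            pass
    return int(cell)                              # already a long value
```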

  • nullMode

    • Description: How to process null values read from the source. The following two methods are supported.

      • skip: The column is not written into HBase.
      • empty: HConstants.EMPTY_BYTE_ARRAY (that is, new byte[0]) is written into HBase.
    • Required: No

    • Default value: skip
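The difference between the two modes can be sketched as follows (an illustrative helper, with hypothetical cell tuples, not the plug‑in's actual code):

```python
# Python stand-in for HConstants.EMPTY_BYTE_ARRAY (new byte[0]).
EMPTY_BYTE_ARRAY = b""

def cells_to_write(row_cells, null_mode="skip"):
    """Apply nullMode: 'skip' drops null cells entirely,
    'empty' writes a zero-length byte array instead."""
    out = []
    for name, value in row_cells:
        if value is None:
            if null_mode == "skip":
                continue                          # column not written at all
            out.append((name, EMPTY_BYTE_ARRAY))  # empty byte array written
        else:
            out.append((name, value))
    return out

cells = [("cf1:q1", b"v1"), ("cf1:q2", None)]
print(cells_to_write(cells, "skip"))   # [('cf1:q1', b'v1')]
print(cells_to_write(cells, "empty"))  # [('cf1:q1', b'v1'), ('cf1:q2', b'')]
```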

  • walFlag

    • Description: When committing data to a RegionServer in the cluster (Put/Delete operations), the HBase client first writes the WAL (Write Ahead Log, an HLog shared by all regions on a RegionServer). The client writes data into the MemStore only after the WAL write succeeds, and only then is it notified that the commit succeeded. If the WAL write fails, the client is notified that the commit failed. Disable walFlag (false) to skip writing the WAL and improve write performance.

    • Required: No

    • Default value: false

  • writeBufferSize

    • Description: The size (in bytes) of the HBase client write buffer. Use it together with autoflush.

      autoflush: If set to true, the HBase client flushes each put immediately. If set to false, the client sends a write request to the HBase server only when the write buffer is filled with puts.

    • Required: No

    • Default value: 8M

Development in wizard mode

Currently, development in wizard mode is not supported.

Development in script mode

Configure a job to write data from a local machine into HBase 1.1.x.

  {
      "type": "job",
      "traceId": "your traceId",
      "version": "1.0",
      "configuration": {
          "setting": {
              "errorLimit": {
                  "record": "0"
              },
              "speed": {
                  "mbps": "1"
              }
          },
          "transformer": [],
          "reader": {
              "plugin": "stream",
              "parameter": {}
          },
          "writer": {
              "plugin": "hbase11x",
              "parameter": {
                  "haveKerberos": true,
                  "kerberosKeytabFilePath": "/opt/datax/xxx.keytab",
                  "kerberosPrincipal": "xxx/hadoopclient@xxx.xxx",
                  "hbaseMasterKerberosPrincipal": "xxx",
                  "hbaseRegionserverKerberosPrincipal": "xxx",
                  "hbaseRpcProtection": "xxx",
                  "hbaseConfig": {
                      "hbase.rootdir": "hdfs://10.101.85.161:9000/hbase",
                      "hbase.cluster.distributed": "true",
                      "hbase.zookeeper.quorum": "v101085161.sqa.zmf"
                  },
                  "table": "writer",
                  "mode": "normal",
                  "rowkeyColumn": [
                      {
                          "index": 0,
                          "type": "string"
                      },
                      {
                          "index": -1,
                          "type": "string",
                          "value": "_"
                      }
                  ],
                  "column": [
                      {
                          "index": 1,
                          "name": "cf1:q1",
                          "type": "string"
                      },
                      {
                          "index": 2,
                          "name": "cf1:q2",
                          "type": "string"
                      },
                      {
                          "index": 3,
                          "name": "cf1:q3",
                          "type": "string"
                      },
                      {
                          "index": 4,
                          "name": "cf2:q1",
                          "type": "string"
                      },
                      {
                          "index": 5,
                          "name": "cf2:q2",
                          "type": "string"
                      },
                      {
                          "index": 6,
                          "name": "cf2:q3",
                          "type": "string"
                      }
                  ],
                  "versionColumn": {
                      "index": -1,
                      "value": "123456789"
                  },
                  "encoding": "utf-8"
              }
          }
      }
  }