This topic describes the parameters that are supported by HBase11xsql Writer and how to configure HBase11xsql Writer by using the codeless user interface (UI) and code editor.

HBase11xsql Writer writes large amounts of data to HBase tables that are created based on Phoenix. Phoenix encodes the primary key of a table into the HBase rowkey. If you use the native HBase API to write data to a table that is created based on Phoenix, you must perform this conversion manually, which is time-consuming and error-prone. HBase11xsql Writer performs the conversion for you, so no manual data conversion is required.

HBase11xsql Writer connects to a remote HBase table by using Java Database Connectivity (JDBC), and executes an UPSERT statement to write data to the HBase table.
Notice HBase11xsql Writer supports only exclusive resource groups for Data Integration, but not the default resource group or custom resource groups for Data Integration. For more information, see Create and use an exclusive resource group for Data Integration and Add a custom resource group for Data Integration.

Column order

The column order configured in the writer must match the column order configured in the reader. The column order in the reader determines the order of the columns in each row of output data, and the column order in the writer determines the order in which the writer expects the columns in each row of input data. Example:

Specified column order in the reader: c1, c2, c3, c4.

Specified column order in the writer: x1, x2, x3, x4.

In this case, the value of Column c1 in the reader is assigned to Column x1 in the writer. If the specified column order in the writer is x1, x2, x4, x3, the value of Column c3 is assigned to Column x4 and the value of Column c4 is assigned to Column x3.
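The following fragment is a minimal sketch of how the preceding example maps to the column settings of the reader and the writer in a synchronization node. The column names c1 to c4 and x1 to x4 are placeholders from the example, not real columns.
    "reader": {
      "parameter": {
        "column": ["c1", "c2", "c3", "c4"]
      }
    },
    "writer": {
      "parameter": {
        "column": ["x1", "x2", "x3", "x4"] // c1 is written to x1, c2 to x2, and so on.
      }
    }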

Features

HBase11xsql Writer can write data to an HBase table that has indexes and synchronously update all index tables of that table.

Limits

HBase11xsql Writer has the following limits:
  • HBase11xsql Writer can write data only to HBase 1.x.
  • HBase11xsql Writer can write data only to the tables that are created based on Phoenix but not to native HBase tables.
  • HBase11xsql Writer cannot write data with timestamps.

How it works

HBase11xsql Writer connects to an HBase table by using the Phoenix JDBC driver and executes UPSERT statements to write large amounts of data to the table. When HBase11xsql Writer writes data to the table, Phoenix synchronously updates the index tables of the table.

Parameters

plugin
  Description: The writer type. Set this parameter to hbase11xsql.
  Required: Yes
  Default value: No default value

table
  Description: The name of the table to which you want to write data. The name is case-sensitive. In most cases, the name of a table that is created based on Phoenix is in all uppercase letters.
  Required: Yes
  Default value: No default value

column
  Description: The names of the columns to which you want to write data. The names are case-sensitive. In most cases, the name of each column in a table that is created based on Phoenix is in all uppercase letters.
  Note
    • HBase11xsql Writer writes data in the order of the columns that are obtained from the reader.
    • You do not need to specify a data type for each column. HBase11xsql Writer automatically obtains the column metadata from Phoenix.
  Required: Yes
  Default value: No default value

hbaseConfig
  Description: The properties of the HBase cluster. The hbase.zookeeper.quorum parameter is required. It specifies the ZooKeeper ensemble servers. For a sample configuration, see the fragment after this table.
  Note
    • Separate multiple IP addresses with commas (,), such as ip1,ip2,ip3.
    • The zookeeper.znode.parent parameter is optional. Default value: /hbase.
  Required: Yes
  Default value: No default value

batchSize
  Description: The maximum number of data records to write at a time.
  Required: No
  Default value: 256

nullMode
  Description: The method that is used to process null values. Valid values:
    • skip: HBase11xsql Writer does not write null values to the HBase table.
    • empty: HBase11xsql Writer writes 0 or an empty string instead of null values to the HBase table. For a column of a numeric type, HBase11xsql Writer writes 0. For a column of the VARCHAR type, HBase11xsql Writer writes an empty string.
  Required: No
  Default value: skip
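
For example, the hbaseConfig parameter might look like the following fragment. The IP addresses and the znode path are placeholders used only for illustration, not real values.
      "hbaseConfig": {
        "hbase.zookeeper.quorum": "ip1,ip2,ip3", // Separate multiple ZooKeeper server addresses with commas.
        "zookeeper.znode.parent": "/hbase" // Optional. Default value: /hbase.
      }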

Configure HBase11xsql Writer by using the code editor

In the following code, a synchronization node is configured to write data to an HBase table by using the code editor. For more information, see Create a sync node by using the code editor.
{
  "type": "job",
  "version": "1.0",
  "configuration": {
    "setting": {
      "errorLimit": {
        "record": "0"
      },
      "speed": {
            "throttle":true,// Specifies whether to enable bandwidth throttling. The value false indicates that bandwidth throttling is disabled, and the value true indicates that bandwidth throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent":1, // The maximum number of parallel threads. 
            "mbps":"1"// The maximum transmission rate.
      }
    },
    "reader": {
      "plugin": "odps",
      "parameter": {
        "datasource": "",
        "table": "",
        "column": [],
        "partition": ""
      }
    },
    "plugin": "hbase11xsql",
    "parameter": {
      "table": "The name of the table to which you want to write data. The table name is case-sensitive.",
      "hbaseConfig": {
        "hbase.zookeeper.quorum": "The IP addresses of ZooKeeper ensemble servers of the destination HBase cluster. Obtain the IP addresses from product engineers (PEs).",
        "zookeeper.znode.parent": "The root znode of the destination HBase cluster. Obtain the znode information from PEs."
      },
      "column": [
        "columnName"
      ],
      "batchSize": 256,
      "nullMode": "skip"
    }
  }
}

FAQ

Q: What is the appropriate number of parallel threads? Can I increase the number of parallel threads to speed up the data synchronization?

A: We recommend that you set the number of parallel threads to a value from 5 to 10. During data import, the default Java virtual machine (JVM) heap size is 2 GB. Parallel synchronization requires multiple threads, but if too many threads run at the same time, synchronization does not become faster and job performance may deteriorate due to frequent garbage collection (GC).
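The following fragment is a minimal sketch of where the number of parallel threads is set in the job configuration shown in the preceding example. The value 5 is used only to illustrate the recommended range.
      "speed": {
        "throttle": true,
        "concurrent": 5, // A value in the recommended range of 5 to 10.
        "mbps": "1"
      }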

Q: What is the appropriate value for the batchSize parameter?

A: The default value of the batchSize parameter is 256. Set the batchSize parameter based on the amount of data in each row. In most cases, each write operation is expected to write 2 MB to 4 MB of data. To obtain an appropriate value, divide the target data volume of a write operation by the data volume of a row.
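For example, assuming that each row is about 10 KB and that each write operation targets about 2.5 MB (both figures are hypothetical and used only for illustration), the calculation is 2,560 KB / 10 KB = 256, which matches the default value of the batchSize parameter.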