This topic describes the features, data types, and parameters supported by HBase11xsql Writer and how to configure it by using the code editor.

HBase11xsql Writer allows you to write data in batches to HBase tables created through Phoenix. Phoenix encodes the primary key into the rowkey. If you use the native HBase API to write data to an HBase table created through Phoenix, you must perform this conversion manually, which is tedious and error-prone. HBase11xsql Writer performs the conversion for you and supports tables in which all values are packed into a single cell per column family.

Specifically, HBase11xsql Writer connects to a remote HBase data store through Java Database Connectivity (JDBC), and runs an UPSERT statement to write data to the HBase data store.

Features

HBase11xsql Writer supports writing data to an indexed table and synchronously updating all of its index tables.

Limits

The limits of the HBase11xsql Writer are as follows:
  • HBase11xsql Writer can write data only to HBase 1.x.
  • HBase11xsql Writer supports only tables created through Phoenix. Native HBase tables are not supported.
  • HBase11xsql Writer cannot write data with specified timestamps.

How it works

HBase11xsql Writer connects to an HBase data store through Phoenix, which is a JDBC driver, and runs an UPSERT statement to write data in batches to the destination table. With Phoenix, you can synchronously update indexed tables when you write data.
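The mechanism described above can be sketched in plain Phoenix JDBC. This is an illustrative sketch, not the writer's actual source code: the table name, column names, values, and ZooKeeper quorum below are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Collections;

public class PhoenixUpsertSketch {

    // Builds the parameterized UPSERT statement that is run for each record.
    static String buildUpsert(String table, String[] columns) {
        String cols = String.join(", ", columns);
        String marks = String.join(", ", Collections.nCopies(columns.length, "?"));
        return "UPSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
    }

    // Sketch of the write path. Requires a reachable HBase/Phoenix cluster,
    // so it runs only when a ZooKeeper quorum is passed on the command line.
    static void upsertOneRow(String quorum, String sql) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:" + quorum);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, "CA");          // hypothetical column values
            ps.setString(2, "San Jose");
            ps.setLong(3, 912332L);
            ps.executeUpdate();
            conn.commit();                  // Phoenix autocommit is off by default
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical table and columns; Phoenix identifiers are usually uppercase.
        String sql = buildUpsert("US_POPULATION",
                new String[] {"STATE", "CITY", "POPULATION"});
        // Prints: UPSERT INTO US_POPULATION (STATE, CITY, POPULATION) VALUES (?, ?, ?)
        System.out.println(sql);
        if (args.length > 0) {
            upsertOneRow(args[0], sql);
        }
    }
}
```

Because UPSERT goes through Phoenix rather than the raw HBase API, the primary key is encoded into the rowkey automatically and any index tables are updated in the same operation.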

Parameters

  • plugin
    Description: The writer type. Set the value to hbase11xsql.
    Required: Yes. Default value: None.
  • table
    Description: The name of the destination table. The name is case-sensitive. The name of a table created through Phoenix usually consists of uppercase letters.
    Required: Yes. Default value: None.
  • column
    Description: The names of the columns. The names are case-sensitive. The name of each column in a table created through Phoenix usually consists of uppercase letters.
    Note:
      • HBase11xsql Writer writes data strictly in the order of the columns obtained from the reader.
      • You do not need to specify a data type for each column. HBase11xsql Writer automatically obtains column metadata from Phoenix.
    Required: Yes. Default value: None.
  • hbaseConfig
    Description: The properties of the HBase cluster. The hbase.zookeeper.quorum property is required. It specifies the ZooKeeper ensemble servers.
    Note:
      • Separate the IP addresses with commas (,). Example: ip1,ip2,ip3.
      • The zookeeper.znode.parent property is optional. Default value: /hbase.
    Required: Yes. Default value: None.
  • batchSize
    Description: The maximum number of data records to write at a time.
    Required: No. Default value: 256.
  • nullMode
    Description: The method for processing null values. Valid values:
      • skip: HBase11xsql Writer does not write null values to the HBase data store.
      • empty: HBase11xsql Writer writes 0 or an empty string instead of null values: 0 for a column of a numeric type, and an empty string for a column of the VARCHAR type.
    Required: No. Default value: skip.

Configure HBase11xsql Writer by using the code editor

Example:
{
  "type": "job",
  "version": "1.0",
  "configuration": {
    "setting": {
      "errorLimit": {
        "record": "0"
      },
      "speed": {
        "mbps": "1",
        "concurrent": "1"
      }
    },
    "reader": {
      "plugin": "odps",
      "parameter": {
        "datasource": "",
        "table": "",
        "column": [],
        "partition": ""
      }
    },
    "writer": {
      "plugin": "hbase11xsql",
      "parameter": {
        "table": "The case-sensitive name of the destination table",
        "hbaseConfig": {
          "hbase.zookeeper.quorum": "The IP addresses of the ZooKeeper ensemble servers of the destination HBase cluster. Obtain the addresses from product engineers (PEs).",
          "zookeeper.znode.parent": "The root znode of the destination HBase cluster. Obtain the value from PEs."
        },
        "column": [
          "columnName"
        ],
        "batchSize": 256,
        "nullMode": "skip"
      }
    }
  }
}

Column order

The column order specified in the writer must match that specified in the reader. When you configure the column order in the reader, you specify the order of columns in each row for the output data. When you configure the column order in the writer, you specify the expected order of columns for the input data. Example:

Column order specified in the reader: c1, c2, c3, c4.

Column order specified in the writer: x1, x2, x3, x4.

In this case, the value of column c1 is assigned to column x1 in the writer. If the column order specified in the writer is x1, x2, x4, x3, the value of column c3 is assigned to column x4 and the value of column c4 is assigned to column x3.
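Expressed in the job configuration format shown earlier, the c1-to-x1 mapping above looks like the following fragment (c1..c4 and x1..x4 are the placeholder names from this example, not real columns):

```json
"reader": {
  "parameter": { "column": ["c1", "c2", "c3", "c4"] }
},
"writer": {
  "parameter": { "column": ["x1", "x2", "x3", "x4"] }
}
```

Values are matched by position, not by name, so reordering either list changes which destination column receives which value.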

FAQ

  • Q: What is the proper number of concurrent threads? Can I increase the number of concurrent threads to speed up the synchronization?

    A: We recommend 5 to 10 concurrent threads. Increasing the number beyond this does not speed up the synchronization. In the data import process, the default Java virtual machine (JVM) heap size is 2 GB, and concurrent synchronization requires multiple threads. However, too many threads may fail to speed up the synchronization and can even degrade performance because of frequent garbage collection (GC).

  • Q: What is the proper value for the batchSize parameter?

    A: The default value of the batchSize parameter is 256. Set a value based on the data volume of each row. Generally, each write operation should carry 2 MB to 4 MB of data, so set batchSize to that target volume divided by the average data volume of a row.
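The calculation in the answer above can be sketched as follows. The 3 MB target and 8 KB average row size are illustrative assumptions, not values from the product:

```java
public class BatchSizeEstimate {

    // Estimates batchSize as the target bytes per write divided by the
    // average bytes per row, clamped to at least 1.
    static int estimateBatchSize(long targetBytesPerWrite, long bytesPerRow) {
        return (int) Math.max(1, targetBytesPerWrite / bytesPerRow);
    }

    public static void main(String[] args) {
        long targetBytes = 3L * 1024 * 1024; // aim for ~3 MB per write (within the 2-4 MB range)
        long rowBytes = 8 * 1024;            // assumed average row size: 8 KB
        System.out.println(estimateBatchSize(targetBytes, rowBytes)); // prints 384
    }
}
```

With an 8 KB row, this yields a batchSize of 384, in the same order of magnitude as the default of 256.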