This topic describes the parameters that are supported by HBase11xsql Writer and how to configure HBase11xsql Writer by using the codeless user interface (UI) and code editor.
HBase11xsql Writer writes large amounts of data to HBase tables that are created based on Phoenix. Phoenix encodes the primary key into the rowkey. If you use the native HBase API to write data to an HBase table that is created based on Phoenix, you must manually perform this data conversion, which is time-consuming and error-prone. HBase11xsql Writer allows you to write data to these tables without manual data conversion.
Column order
The columns specified in the writer must map one-to-one, by position, to the columns specified in the reader. The column order in the reader determines the order of the fields in each row of the output data, and the column order in the writer specifies the expected order of the fields in each row of the input data. Example:
Specified column order in the reader: c1, c2, c3, c4.
Specified column order in the writer: x1, x2, x3, x4.
In this case, the value of Column c1 in the reader is assigned to Column x1 in the writer, the value of Column c2 to Column x2, and so on, as shown in the configuration sketch below. If the specified column order in the writer is x1, x2, x4, x3, the value of Column c3 is assigned to Column x4 and the value of Column c4 is assigned to Column x3.
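The following fragment is a minimal sketch of how this mapping appears in a job configuration. The column names c1 to c4 and x1 to x4 are the placeholder names from the example above, not columns of a real table.
"reader": {
  "parameter": {
    "column": ["c1", "c2", "c3", "c4"]
  }
},
"writer": {
  "parameter": {
    "column": ["x1", "x2", "x3", "x4"]
  }
}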
Features
HBase11xsql Writer can write data to an HBase table that has indexes and synchronously update all the index tables of that table.
Limits
- HBase11xsql Writer can write data only to HBase 1.x.
- HBase11xsql Writer can write data only to the tables that are created based on Phoenix but not to native HBase tables.
- HBase11xsql Writer cannot write data with timestamps.
How it works
HBase11xsql Writer connects to an HBase table by using the Phoenix JDBC driver and executes UPSERT statements, in the form UPSERT INTO &lt;table&gt; (&lt;columns&gt;) VALUES (&lt;values&gt;), to write large amounts of data to the table in batches. Phoenix synchronously updates the index tables when HBase11xsql Writer writes data to an HBase table.
Parameters
Parameter | Description | Required | Default value |
---|---|---|---|
plugin | The writer type. Set this parameter to hbase11xsql. | Yes | No default value |
table | The name of the table to which you want to write data. The name is case-sensitive. In normal cases, the name of a table that is created based on Phoenix is all capitalized. | Yes | No default value |
column | The names of the columns to which you want to write data. The names are case-sensitive. In normal cases, the name of each column in a table that is created based on Phoenix is all capitalized. | Yes | No default value |
hbaseConfig | The properties of the HBase cluster. The hbase.zookeeper.quorum parameter is required. It specifies the ZooKeeper ensemble servers. | Yes | No default value |
batchSize | The maximum number of data records to write at a time. | No | 256 |
nullMode | The method that is used to process null values. | No | skip |
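For reference, the hbase.zookeeper.quorum value is a comma-separated list of ZooKeeper server addresses, and zookeeper.znode.parent is the root znode of the HBase cluster. The following fragment is a minimal sketch; the host names and the /hbase znode are placeholder values for illustration only.
"hbaseConfig": {
  "hbase.zookeeper.quorum": "zk-host1,zk-host2,zk-host3",
  "zookeeper.znode.parent": "/hbase"
}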
Configure HBase11xsql Writer by using the code editor
{
  "type": "job",
  "version": "1.0",
  "configuration": {
    "setting": {
      "errorLimit": {
        "record": "0"
      },
      "speed": {
        "throttle": true, // Specifies whether to enable bandwidth throttling. The value false indicates that bandwidth throttling is disabled, and the value true indicates that bandwidth throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true.
        "concurrent": 1, // The maximum number of parallel threads.
        "mbps": "1" // The maximum transmission rate.
      }
    },
    "reader": {
      "plugin": "odps",
      "parameter": {
        "datasource": "",
        "table": "",
        "column": [],
        "partition": ""
      }
    },
    "writer": {
      "plugin": "hbase11xsql",
      "parameter": {
        "table": "The name of the table to which you want to write data. The table name is case-sensitive.",
        "hbaseConfig": {
          "hbase.zookeeper.quorum": "The IP addresses of the ZooKeeper ensemble servers of the destination HBase cluster. Obtain the IP addresses from product engineers (PEs).",
          "zookeeper.znode.parent": "The root znode of the destination HBase cluster. Obtain the znode information from PEs."
        },
        "column": [
          "columnName"
        ],
        "batchSize": 256,
        "nullMode": "skip"
      }
    }
  }
}
FAQ
Q: What is the appropriate number of parallel threads? Can I increase the number of parallel threads to speed up the data synchronization?
A: We recommend that you set the number of parallel threads to a value from 5 to 10. During data import, the default size of the Java virtual machine (JVM) heap is 2 GB. Parallel synchronization requires multiple threads, but running an excessive number of threads at the same time does not speed up the synchronization and may degrade job performance because of frequent garbage collection (GC).
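In the code editor, the number of parallel threads corresponds to the concurrent setting in the speed section of the job configuration shown above. The following fragment is a minimal sketch that sets five parallel threads; the throttle value is illustrative.
"speed": {
  "throttle": false,
  "concurrent": 5
}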
Q: What is the appropriate value for the batchSize parameter?
A: The default value of the batchSize parameter is 256. Set the batchSize parameter based on the amount of data in each row. In most cases, each write operation writes 2 MB to 4 MB of data. To determine the value, divide the amount of data per write operation by the amount of data per row. For example, if each row contains about 10 KB of data, a value of approximately 300 (3 MB ÷ 10 KB) is appropriate.