HBase20xsql Reader reads data from Phoenix tables that are mapped to HBase SQL tables. This topic describes the data types and parameters that are supported by HBase20xsql Reader and how to configure HBase20xsql Reader by using the code editor.

Prerequisites

Before you configure HBase20xsql Reader, you must configure an HBase data source. For more information, see Add an HBase data source.

How it works

HBase20xsql Reader connects to the query server of Phoenix by using the Phoenix thin client, generates an SQL statement based on your configurations, and then sends the statement to the query server. The query server executes the statement to read data from the HBase data source and returns the obtained data to HBase20xsql Reader. Then, HBase20xsql Reader assembles the data into abstract datasets of the data types supported by Data Integration and sends the datasets to a writer.

Limits

  • Only HBase2.x data sources and Phoenix 5.x are supported if you run synchronization nodes that use HBase20xsql Reader to synchronize data.
  • You can shard a table based on a single column in the table, and the column must be the primary key column of the table.
  • If you want to evenly shard a table based on the number of parallel threads, the values in the column that is used for table sharding must be of an integer or string data type.
  • The table name, schema name, and column names for an HBase table are case-sensitive and must be in the same case as the table name, schema name, and column names for the mapped Phoenix table.
  • HBase20xsql Reader can read data only by using the Phoenix query server. You can use HBase20xsql Reader only after you start the Phoenix query server in your Phoenix service.

Data types

HBase20xsql Reader supports most Phoenix data types. Make sure that the data types of your database are supported.

The following table lists the data types that are supported by HBase20xsql Reader.
Data Integration data type Phoenix data type
long INTEGER, TINYINT, SMALLINT, and BIGINT
double FLOAT, DECIMAL, and DOUBLE
string CHAR and VARCHAR
date DATE, TIME, and TIMESTAMP
bytes BINARY and VARBINARY
boolean BOOLEAN

Parameters

Parameter Description Required Default value
queryServerAddress The address of the Phoenix query server. If you use ApsaraDB for HBase Performance-enhanced Edition (Lindorm) and you want to pass through the user and password parameters, you can append the settings of these parameters to the value of the queryServerAddress parameter. Example: http://127.0.0.1:8765;user=root;password=root. Yes No default value
serialization The serialization protocol used by the Phoenix query server. No PROTOBUF
table The name of the table from which you want to read data. The name is case-sensitive. Yes No default value
schema The schema of the table. No No default value
column The names of the columns from which you want to read data. Specify the names in a JSON array. If you leave this parameter empty, all columns in the source table are read. This parameter is empty by default. No Empty string
splitKey The column that is used for table sharding when HBase20xsql Reader reads data. If you configure this parameter, the source table is sharded based on the value of this parameter. Data Integration then runs parallel threads to read data. This improves data synchronization efficiency. You can use one of the following methods to shard a table. If the splitPoints parameter is left empty, table sharding is performed by using Method 1.
  • Method 1: Find the maximum value and minimum value in the column specified by the splitKey parameter and evenly shard the table based on the value of the concurrent parameter.
    Note You can shard a table based on a column in which values are of an integer or string data type.
  • Method 2: Shard a table based on the value of the splitPoints parameter. Then, the data is synchronized by using the parallel threads specified by the concurrent parameter.
Yes No default value
splitPoints The sharding point. If you shard a table based on the maximum value and minimum value of the column that is used for table sharding, data may be intensively distributed to specific regions. We recommend that you specify a value for the splitPoints parameter based on the start key and end key of a region to ensure that a query statement is used to query data only in a region obtained after the table sharding. No No default value
where The WHERE clause. You can configure this parameter to filter data in the source table. HBase20xsql Reader generates an SQL statement based on the settings of the column, table, and where parameters and uses the generated statement to read data. No No default value
querySql The SQL statement that is used for refined data filtering. If you configure the querySql parameter and the queryserverAddress parameter that is required, HBase20xsql Reader ignores the column, table, where, and splitKey parameters that you configured and uses the setting of this parameter for data filtering. No No default value

Configure HBase20xsql Reader by using the codeless UI

This method is not supported.

Configure HBase20xsql Reader by using the code editor

In the following code, a synchronization node is configured to read data from HBase. For more information, see Create a synchronization node by using the code editor.
{
    "type":"job",
    "version":"2.0",// The version number. 
    "steps":[
        {
            "stepType":"hbase20xsql",// The reader type. 
            "parameter":{
                "queryServerAddress": "http://127.0.0.1:8765",  // The address of the Phoenix query server. 
                "serialization": "PROTOBUF",  // The serialization protocol used by the Phoenix query server. 
                "table": "TEST",    // The name of the table from which you want to read data. 
                "column": ["ID", "NAME"],   // The names of the columns from which you want to read data. 
                "splitKey": "ID"    // The column that is used for table sharding when HBase20xsql Reader reads data. The column must be the primary key column of the source table. 
            },
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of dirty data records allowed. 
        },
        "speed":{
            "throttle":true,// Specifies whether to enable bandwidth throttling. The value false indicates that bandwidth throttling is disabled, and the value true indicates that bandwidth throttling is enabled. The mbps parameter takes effect only when the throttle parameter is set to true. 
            "concurrent":1,// The maximum number of parallel threads. 
            "mbps":"12"// The maximum transmission rate. 
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}