Use SQL statements to access an ApsaraDB for HBase table - ApsaraDB for HBase

This topic describes how to use SQL statements to access an ApsaraDB for HBase table.

Prerequisites

The version of LindormTable is 2.4.3.2 or later. For more information about how to view or upgrade the version of LindormTable, see Release notes of LindormTable and Upgrade the minor engine version of a Lindorm instance.

Background information

LindormTable allows you to directly access the tables that are created by using Lindorm Shell or ApsaraDB for HBase API for Java. An ApsaraDB for HBase table is a schema-free table. Columns in an ApsaraDB for HBase table are the dynamic columns of the VARBINARY data type, which represents a byte array. For more information about dynamic columns, see Dynamic columns. ApsaraDB for HBase provides the column mapping feature and supports HBase-compatible data types for ApsaraDB for HBase tables. This way, you can use Lindorm SQL statements for columns that are written based on ApsaraDB for HBase API for Java and use a wide range of data types and secondary indexes.

Syntax

In Lindorm SQL, you can add mappings for the qualifiers in a custom column family in an ApsaraDB for HBase table. This way, you can use the SQL statements for subsequent queries.

You can use the following syntax to add and remove mappings:

dynamic_column_mapping_statement   := ALTER TABLE table_name MAP DYNAMIC COLUMN
                                      qualifer_definition hbase_type;
dynamic_column_unmapping_statement := ALTER TABLE table_name UNMAP DYNAMIC COLUMN
                                      qualifer_definition_list;
qualifer_definition_list           := qualifer_definition
                                      (',' qualifer_definition)*
qualifer_definition                := [ family_name ':' ] qualifier_name
hbase_type                         := HLONG | HINTEGER | HSHORT | HFLOAT |
                                      HDOUBLE | HSTRING | HBOOLEAN

The following table describes the mapping data types that can be specified by the hbase_type parameter.

SQL data type	Java data type	Description
HLONG	java.lang.Long	Data is written to the column of the ApsaraDB for HBase table by using the Bytes.toBytes(long) function.
HINTEGER	java.lang.Integer	Data is written to the column of the ApsaraDB for HBase table by using the Bytes.toBytes(int) function.
HSHORT	java.lang.Short	Data is written to the column of the ApsaraDB for HBase table by using the Bytes.toBytes(short) function.
HFLOAT	java.lang.Float	Data is written to the column of the ApsaraDB for HBase table by using the Bytes.toBytes(float) function.
HDOUBLE	java.lang.Double	Data is written to the column of the ApsaraDB for HBase table by using the Bytes.toBytes(double) function.
HSTRING	java.lang.String	Data is written to the column of the ApsaraDB for HBase table by using the Bytes.toBytes(String) function.
HBOOLEAN	java.lang.Boolean	Data is written to the columns of the ApsaraDB for HBase table by using the Bytes.toBytes(boolean) function.

Note

LindormTable V2.5.1 and later support the mappings of rowkeys. For the mapping method of the rowkeys, see the mapping method of the qualifiers. When you map the rowkeys, make sure that the rowkeys are enclosed by using backticks (` `).
In other programming languages, you can use the toBytes method in the org.apache.hadoop.hbase.util.Bytes Java class to encode and write data.
In Java, the Bytes.toBytes(String) function uses UTF-8 for data encoding. In other programming languages, UTF-8 is also used for data encoding when you use the toBytes method to convert strings into bytes.

Data preparations

In this topic, ApsaraDB for HBase API for Java is used to create a sample ApsaraDB for HBase table and write data to the table. For more information, see Use ApsaraDB for HBase API for Java to develop applications.

Note

For more information about how to create an ApsaraDB for HBase table and write data to the table by using other methods, see Use Lindorm Shell to connect to LindormTable.

// Create a sample ApsaraDB for HBase table named dt and specify f1 as the column family name.           
try (Admin admin = connection.getAdmin()) {
            Table table = connection.getTable(TableName.valueOf("dt"));
            HTableDescriptor htd = new HTableDescriptor(TableName.valueOf("dt"));
            htd.addFamily(new HColumnDescriptor(Bytes.toBytes("f1")));
            admin.createTable(htd);
            }
    
// Write data to the sample table.
try (Table table = connection.getTable(TableName.valueOf("dt"))) {
    byte[] rowkey = Bytes.toBytes("row1");
    byte[] family = Bytes.toBytes("f1");
    Put put = new Put(rowkey);
    // Write data of the STRING type and specify name as the column name.
    String name = "Some one";
    put.addColumn(family, Bytes.toBytes("name"), Bytes.toBytes(name));
    // Write data of the INT type and specify age as the column name.
    int age = 25;
    put.addColumn(family, Bytes.toBytes("age"), Bytes.toBytes(age));
    // Write data of the LONG type and specify time as the column name.
    long timestamp = 1656675491000L;
    put.addColumn(family, Bytes.toBytes("time"), Bytes.toBytes(timestamp));
    // Write data of the SHORT type and specify buycode as the column name.
    short buycode = 123;
    put.addColumn(family, Bytes.toBytes("buycode"), Bytes.toBytes(buycode));
    // Write data of the FLOAT type and specify price as the column name.
    float price = 12.3f;
    put.addColumn(family, Bytes.toBytes("price"), Bytes.toBytes(price));
    // Write data of the DOUBLE type and specify price2 as the column name.
    double price2 = 12.33333;
    put.addColumn(family, Bytes.toBytes("price2"), Bytes.toBytes(price2));
    // Write data of the BOOLEAN type and specify isMale as the column name.
    boolean isMale = true;
    put.addColumn(family, Bytes.toBytes("isMale"), Bytes.toBytes(isMale));

    // Write null values for all types of data.
    //put.addColumn(family, qualifier, null);

    table.put(put);
    }

Procedure

The following example shows how to create an ApsaraDB for HBase table named dt and use SQL statements to access the ApsaraDB for HBase table.

Obtain the endpoint of an ApsaraDB for HBase cluster.
1. Log on to the ApsaraDB for HBase console.
2. On the Clusters page, find the instance that you want to manage and click the instance ID.
3. In the left-side navigation pane, click Database Connection.
4. Obtain the value of Java API Address.
Change the value of Java API Address to the endpoint that is used by SQL statements to access the ApsaraDB for HBase table. To change the value of Java API Address, perform the following steps:
1. Add the jdbc:lindorm:table:url=http:// prefix before the value of Java API Address.
2. Change the port number in the value of Java API Address from 30020 to 30060.
  Note
  If the endpoint obtained in the ApsaraDB for HBase console is ld-bp1ietqp4fby3****-proxy-hbaseue.hbaseue.rds.aliyuncs.com:30020, the actual connection string that is used by SQL statements to access the ApsaraDB for HBase table must be jdbc:lindorm:table:url=http://ld-bp1ietqp4fby3****-proxy-hbaseue.hbaseue.rds.aliyuncs.com:30060.
Use Lindorm-cli to connect to and use LindormTable. For more information about how to connect to and use LindormTable, see Use Lindorm-cli to connect to and use LindormTable.

Execute the ALTER TABLE statement to add mappings for the data that is written to the columns of the dt table.

ALTER TABLE dt MAP DYNAMIC COLUMN `ROW` HSTRING, f1:name HSTRING, f1:age HINTEGER, f1:time HLONG, f1:buycode HSHORT, f1:price HFLOAT, f1:price2 HDOUBLE, f1:isMale HBOOLEAN;

Note

Adding column mappings refers to specifying data types for the columns regardless of whether to write data to the columns.
The system converts the bytes back to the original data based on schemas. You must specify a correct data type mapped to a SQL data type in the Lindorm SQL statement.

In the following example, if you specify HINTEGER as the data type for the column named f:age2, the system may retrieve incorrect data by invoking the Bytes.toInt() function.

int age = 25;
byte[] ageValue = Bytes.toBytes(age);
put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("age"), ageValue);//The data type for the column named f:age is INT, and is mapped to the HINTEGER data type in the Lindorm SQL statement. 
String age2 = "25";
byte[] age2Value = Bytes.toBytes(name);
put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("age2"), age2Value);// The data type of the column named f:age2 is STRING, and is mapped to the HSTRING data type in the Lindorm SQL statement.

Execute the DESCRIBE statement to view the mappings of the current schema.
```
DESCRIBE dt;
```
Note
For more information about the syntax of the DESCRIBE statement, see DESCRIBE/SHOW/USE.

Query data in the dt table by using the following SQL statements:

SELECT * FROM dt LIMIT 1;
SELECT * FROM dt WHERE f1:isMale=true LIMIT 1;
SELECT * FROM dt WHERE f1:name='Some one' LIMIT 1;
SELECT * FROM dt WHERE f1:time>1656675490000 and f1:time<1656675492000 LIMIT 1;

Create a secondary index.
Secondary indexes are used to reduce data query time at the cost of storage space. Secondary indexes allow you to query data based on non-primary key columns. This improves query efficiency but requires more storage space. For more information about syntax limits on secondary indexes, see CREATE INDEX and Secondary indexes.
1. Modify the attribute of the primary table named dt.
```
ALTER TABLE dt SET 'MUTABILITY' = 'MUTABLE_LATEST';
```
  Note
  If you use a custom timestamp, you must set the attribute of the primary table to MUTABLE_ALL.
2. Create a secondary index.
```
CREATE INDEX idx ON dt(f1:age) 'INDEX_COVERED_TYPE'='COVERED_DYNAMIC_COLUMNS';
```
3. If you specify the async parameter when you create the secondary index, build historical data from the primary table to an index table. Then, you can query historical data by using the secondary index. Skip this step if you do not specify the async parameter when you create the secondary index.
```
BUILD INDEX idx ON dt;
```
4. View the index.
```
SHOW INDEX FROM dt;
```
  The following result is returned:
```
+---------------+----------- -+-------------+--------------+------------------+---------------+-----------------+----------------+-------------+
| TABLE_SCHEMA  | DATA_TABLE  | INDEX_NAME  | INDEX_STATE  |  INDEX_PROGRESS  |  INDEX_TYPE   |  INDEX_COVERED  |  INDEX_COLUMN  |  INDEX_TTL  |
+---------------+-------------+-------------+--------------+------------------+---------------+-----------------+----------------+-------------+
| default       | dt          | idx         | ACTIVE       | 100%             | SECONDARY     |  TRUE           |  f1:age,ROW    |             |
+---------------+-------------+-------------+--------------+------------------+---------------+-----------------+----------------+-------------+
```
  Note
  If the value of INDEX_STATE in the returned result is ACTIVE, the index is built.
  The value of INDEX_PROGRESS in the returned result specifies the progress of index building.
5. (Optional) Execute the EXPLAIN statement to view the execution plan. You can check whether the secondary index is hit.
```
EXPLAIN SELECT * FROM dt WHERE f1:age=23 LIMIT 1;
```
(Optional) Remove one or more column mappings.
- Remove one column mapping. The following sample code provides an example:
```
ALTER TABLE dt UNMAP DYNAMIC COLUMN f1:isMale;
```
- Remove multiple column mappings. The following sample code provides an example:
```
ALTER TABLE dt UNMAP DYNAMIC COLUMN f1:price2, f1:price2;
```