This topic describes how to use Spark Streaming SQL to perform data analysis and interactive development on the HBase data source.

CREATE TABLE syntax

CREATE TABLE tbName
USING hbase
OPTIONS(propertyName=propertyValue[,propertyName=propertyValue]*);

Table schema

When you create an HBase data table, you do not need to explicitly define its fields. The fields are taken from the catalog option. Example:
spark-sql> CREATE DATABASE IF NOT EXISTS default;
spark-sql> USE default;
spark-sql> DROP TABLE IF EXISTS hbase_table_test;
spark-sql> CREATE TABLE hbase_table_test
         > USING hbase
         > OPTIONS(
         > catalog='{"table":{"namespace":"default","name":"test"},"rowkey":"key", "columns":{"key":{"cf":"rowkey", "col":"key", "type":"string"},"data":{"cf":"info", "col":"data", "type":"string"}}}',
         > hbaseConfiguration='{"hbase.zookeeper.quorum":"a.b.c.d:2181"}');

spark-sql> DESC hbase_table_test;
key     string  NULL
data    string  NULL
Time taken: 0.436 seconds, Fetched 2 row(s)
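
If the catalog is maintained in code rather than typed by hand, the statement shown above can be assembled from plain dictionaries. The following Python sketch is one possible way to do that. The helper name render_create_table is hypothetical and is not part of Spark or HBase. The helper only builds the statement text and does not run it.

```python
import json


def render_create_table(table_name, catalog, hbase_conf):
    """Return a CREATE TABLE ... USING hbase statement as a string.

    Hypothetical helper: it formats text only and never contacts Spark or HBase.
    """
    # Compact separators keep each option value on one line.
    catalog_json = json.dumps(catalog, separators=(",", ":"), ensure_ascii=False)
    conf_json = json.dumps(hbase_conf, separators=(",", ":"), ensure_ascii=False)
    for value in (catalog_json, conf_json):
        # Each option value is wrapped in single quotes, so a single quote or a
        # backslash inside the JSON could change how the SQL literal is parsed.
        if "'" in value or "\\" in value:
            raise ValueError("quotes and backslashes are not supported in option values")
    return (
        f"CREATE TABLE {table_name}\n"
        "USING hbase\n"
        "OPTIONS(\n"
        f"catalog='{catalog_json}',\n"
        f"hbaseConfiguration='{conf_json}');"
    )


# Same table, column families, and ZooKeeper address as the example above.
catalog = {
    "table": {"namespace": "default", "name": "test"},
    "rowkey": "key",
    "columns": {
        "key": {"cf": "rowkey", "col": "key", "type": "string"},
        "data": {"cf": "info", "col": "data", "type": "string"},
    },
}
statement = render_create_table(
    "hbase_table_test", catalog, {"hbase.zookeeper.quorum": "a.b.c.d:2181"}
)
print(statement)
```

Rejecting quotes and backslashes instead of escaping them keeps the sketch small. A production helper would need to follow the escaping rules of the SQL parser in use.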

Parameters

Parameter | Description | Required
catalog | The description of fields in an HBase data table, in the JSON format. | Yes
hbaseConfiguration | The HBase configuration, in the JSON format. For example, set the HBase endpoint to {"hbase.zookeeper.quorum":"a.b.c.d:2181"}. | Yes

The following example shows the catalog configuration for the schema of an HBase data table named table1:
{
  "table":{"namespace":"default", "name":"table1"},
  "rowkey":"key",
  "columns":{
    "col0":{"cf":"rowkey", "col":"key", "type":"string"},
    "col1":{"cf":"cf1", "col":"col1", "type":"boolean"},
    "col2":{"cf":"cf2", "col":"col2", "type":"double"},
    "col3":{"cf":"cf3", "col":"col3", "type":"float"},
    "col4":{"cf":"cf4", "col":"col4", "type":"int"},
    "col5":{"cf":"cf5", "col":"col5", "type":"bigint"},
    "col6":{"cf":"cf6", "col":"col6", "type":"smallint"},
    "col7":{"cf":"cf7", "col":"col7", "type":"string"},
    "col8":{"cf":"cf8", "col":"col8", "type":"tinyint"}
  }
}
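
A catalog can be checked before it is passed to CREATE TABLE. The following Python sketch parses a catalog string and lists the field names and types it declares, which is the information the DESC output reports for the table. The function name schema_from_catalog, the set of accepted type names (copied from the example above), and the rule that exactly one column in the rowkey column family must match the rowkey entry are assumptions made for illustration. The data source itself may accept more than this sketch does.

```python
import json

# Type names that appear in the catalog example above. Assumption: the data
# source may support further types that are not listed here.
KNOWN_TYPES = {
    "boolean", "double", "float", "int",
    "bigint", "smallint", "string", "tinyint",
}


def schema_from_catalog(catalog_json):
    """Return [(field_name, type_name), ...] declared by a catalog string."""
    catalog = json.loads(catalog_json)
    columns = catalog["columns"]
    rowkey = catalog["rowkey"]

    # Assumption: the row key is a single column whose column family is
    # "rowkey" and whose "col" value equals the top-level "rowkey" entry.
    rowkey_fields = [
        name for name, spec in columns.items()
        if spec["cf"] == "rowkey" and spec["col"] == rowkey
    ]
    if len(rowkey_fields) != 1:
        raise ValueError("expected exactly one column mapped to the row key")

    for name, spec in columns.items():
        if spec["type"] not in KNOWN_TYPES:
            raise ValueError(f"unrecognized type for field {name}: {spec['type']}")

    # json.loads preserves the order in which the fields were written.
    return [(name, spec["type"]) for name, spec in columns.items()]


# A shortened version of the table1 catalog above.
sample = """
{
  "table":{"namespace":"default", "name":"table1"},
  "rowkey":"key",
  "columns":{
    "col0":{"cf":"rowkey", "col":"key", "type":"string"},
    "col4":{"cf":"cf4", "col":"col4", "type":"int"}
  }
}
"""
print(schema_from_catalog(sample))  # [('col0', 'string'), ('col4', 'int')]
```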