This topic describes how to use Spark SQL to analyze data in the HBase data source and to develop interactively against it.

CREATE TABLE syntax

CREATE TABLE tbName
USING hbase
OPTIONS(propertyName=propertyValue[,propertyName=propertyValue]*);
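
Because the option values are JSON strings embedded in single-quoted SQL literals, it is easy to make quoting mistakes when you write the statement by hand. The following is a minimal Python sketch that renders the statement from a dictionary of options. The helper name build_create_table is hypothetical and is not part of Spark or of the HBase data source. The sketch also assumes that values containing a single quote or a backslash should be rejected rather than escaped.

```python
import json


def build_create_table(table_name, options):
    """Render CREATE TABLE ... USING hbase OPTIONS(...) from a dict.

    Hypothetical helper for illustration only. Dict and list values are
    serialized as compact JSON; string values are used as they are.
    """
    rendered = []
    for name, value in options.items():
        text = value if isinstance(value, str) else json.dumps(
            value, separators=(",", ":"))
        # Assumption: refuse values that would need escaping inside a
        # single-quoted SQL literal instead of guessing the escape rules.
        if "'" in text or "\\" in text:
            raise ValueError(f"option {name!r} contains a quote or backslash")
        rendered.append(f"{name}='{text}'")
    return f"CREATE TABLE {table_name}\nUSING hbase\nOPTIONS({', '.join(rendered)});"


if __name__ == "__main__":
    print(build_create_table("hbase_table_test", {
        "catalog": {
            "table": {"namespace": "default", "name": "test"},
            "rowkey": "key",
            "columns": {
                "key": {"cf": "rowkey", "col": "key", "type": "string"},
                "data": {"cf": "info", "col": "data", "type": "string"},
            },
        },
        "hbaseConfiguration": {"hbase.zookeeper.quorum": "a.b.c.d:2181"},
    }))
```

You can paste the printed statement into the spark-sql shell or pass it to any client that submits Spark SQL.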

Parameters

Parameter | Description | Required
catalog | The description of the fields in an HBase data table, in JSON format. | Yes
hbaseConfiguration | The HBase configuration, in JSON format. For example, {"hbase.zookeeper.quorum":"a.b.c.d:2181"} specifies the ZooKeeper endpoint of the HBase cluster. | Yes

The following code provides an example of the catalog configuration for the schema of an HBase data table named table1:
{
  "table":{"namespace":"default", "name":"table1"},
  "rowkey":"key",
  "columns":{
    "col0":{"cf":"rowkey", "col":"key", "type":"string"},
    "col1":{"cf":"cf1", "col":"col1", "type":"boolean"},
    "col2":{"cf":"cf2", "col":"col2", "type":"double"},
    "col3":{"cf":"cf3", "col":"col3", "type":"float"},
    "col4":{"cf":"cf4", "col":"col4", "type":"int"},
    "col5":{"cf":"cf5", "col":"col5", "type":"bigint"},
    "col6":{"cf":"cf6", "col":"col6", "type":"smallint"},
    "col7":{"cf":"cf7", "col":"col7", "type":"string"},
    "col8":{"cf":"cf8", "col":"col8", "type":"tinyint"}
  }
}
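
For tables with many columns, you can generate the catalog instead of typing it. The following is a minimal Python sketch, not an official tool: the function build_catalog and its tuple layout are hypothetical. It follows the convention shown in the example above, where the row key column uses "rowkey" as its "cf" value and its "col" value matches the top-level "rowkey" field.

```python
import json


def build_catalog(namespace, table, rowkey, columns):
    """Build a catalog JSON string for the HBase data source.

    columns is a list of (spark_column, column_family, qualifier, type)
    tuples. Hypothetical helper for illustration only.
    """
    cols = {
        name: {"cf": cf, "col": col, "type": typ}
        for name, cf, col, typ in columns
    }
    # Mirror the documented example: exactly one column maps to the row key.
    key_cols = [c for c in cols.values()
                if c["cf"] == "rowkey" and c["col"] == rowkey]
    if len(key_cols) != 1:
        raise ValueError(f"expected one column mapped to row key {rowkey!r}")
    return json.dumps({
        "table": {"namespace": namespace, "name": table},
        "rowkey": rowkey,
        "columns": cols,
    })


if __name__ == "__main__":
    print(build_catalog("default", "table1", "key", [
        ("col0", "rowkey", "key", "string"),
        ("col1", "cf1", "col1", "boolean"),
        ("col2", "cf2", "col2", "double"),
    ]))
```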

Table schema

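When you create a table that uses the HBase data source, you do not need to explicitly define the fields in the CREATE TABLE statement. The fields are taken from the columns that are declared in the catalog option. Example: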
spark-sql> CREATE DATABASE IF NOT EXISTS default;
spark-sql> USE default;
spark-sql> DROP TABLE IF EXISTS hbase_table_test;
spark-sql> CREATE TABLE hbase_table_test
         > USING hbase
         > OPTIONS(
         > catalog='{"table":{"namespace":"default","name":"test"},"rowkey":"key", "columns":{"key":{"cf":"rowkey", "col":"key", "type":"string"},"data":{"cf":"info", "col":"data", "type":"string"}}}',
         > hbaseConfiguration='{"hbase.zookeeper.quorum":"a.b.c.d:2181"}');

spark-sql> DESC hbase_table_test;
key     string  NULL
data    string  NULL
Time taken: 0.436 seconds, Fetched 2 row(s)
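
To check a catalog before you create the table, you can list the column names and types that it declares and compare them with the DESC output. The following is a minimal Python sketch under that assumption. The function describe is hypothetical, and the sketch only reads the JSON; it does not connect to Spark or HBase. Whether the data source always reports the columns in catalog order is also an assumption.

```python
import json

# The catalog string from the CREATE TABLE example above.
CATALOG = (
    '{"table":{"namespace":"default","name":"test"},"rowkey":"key", '
    '"columns":{"key":{"cf":"rowkey", "col":"key", "type":"string"},'
    '"data":{"cf":"info", "col":"data", "type":"string"}}}'
)


def describe(catalog_json):
    """Return (column_name, type) pairs in the order the catalog lists them."""
    catalog = json.loads(catalog_json)
    return [(name, spec["type"]) for name, spec in catalog["columns"].items()]


if __name__ == "__main__":
    for name, typ in describe(CATALOG):
        print(f"{name}\t{typ}")
```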