This topic describes how to use Spark Streaming SQL to perform data analysis and interactive development on the HBase data source.
CREATE TABLE syntax
CREATE TABLE tbName
USING hbase
OPTIONS(propertyName=propertyValue[,propertyName=propertyValue]*);
Table schema
When you create an HBase data table, you do not need to explicitly define its fields. The fields are derived from the columns defined in the catalog option, as the following example shows:
spark-sql> CREATE DATABASE IF NOT EXISTS default;
spark-sql> USE default;
spark-sql> DROP TABLE IF EXISTS hbase_table_test;
spark-sql> CREATE TABLE hbase_table_test
> USING hbase
> OPTIONS(
> catalog='{"table":{"namespace":"default","name":"test"},"rowkey":"key", "columns":{"key":{"cf":"rowkey", "col":"key", "type":"string"},"data":{"cf":"info", "col":"data", "type":"string"}}}',
> hbaseConfiguration='{"hbase.zookeeper.quorum":"a.b.c.d:2181"}');
spark-sql> DESC hbase_table_test;
key string NULL
data string NULL
Time taken: 0.436 seconds, Fetched 2 row(s)
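The DESC output above lists the entries of the catalog's "columns" object: each key becomes a column name, and its "type" value becomes the column type. The following Python sketch makes that mapping explicit. It assumes that the schema follows the order of the "columns" object, which matches the DESC output above. The function name schema_from_catalog is hypothetical; the code only parses the JSON and is not how the hbase data source is implemented.

```python
import json

# Catalog copied from the hbase_table_test example above.
CATALOG = (
    '{"table":{"namespace":"default","name":"test"},"rowkey":"key", '
    '"columns":{"key":{"cf":"rowkey", "col":"key", "type":"string"},'
    '"data":{"cf":"info", "col":"data", "type":"string"}}}'
)


def schema_from_catalog(catalog_json):
    """Return (column_name, type) pairs in the order they appear in "columns".

    Hypothetical helper for illustration only; it mirrors what DESC shows.
    """
    catalog = json.loads(catalog_json)  # dicts keep JSON key order (Python 3.7+)
    return [(name, spec["type"]) for name, spec in catalog["columns"].items()]


for name, col_type in schema_from_catalog(CATALOG):
    print(name, col_type)
# key string
# data string
```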
Parameters
Parameter | Description | Required |
---|---|---|
catalog | The description of the fields in the HBase data table, in JSON format. | Yes |
hbaseConfiguration | The HBase configuration, in JSON format. For example, to set the HBase endpoint, specify {"hbase.zookeeper.quorum":"a.b.c.d:2181"}. | Yes |
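Both parameters are JSON documents passed as single-quoted string literals inside OPTIONS, so hand-editing them is error-prone. The following Python sketch serializes the two options from dictionaries and assembles a statement with the same shape as the hbase_table_test example. The helper name hbase_create_table_sql is hypothetical and not part of Spark or EMR, and a.b.c.d:2181 is the placeholder address from the example. As a simplifying assumption, the helper rejects values that contain a single quote instead of trying to escape them.

```python
import json


def hbase_create_table_sql(table_name, catalog, hbase_conf):
    """Build a CREATE TABLE ... USING hbase statement from Python dicts.

    Hypothetical helper for illustration; it is not part of Spark or EMR.
    """
    # Serialize both options compactly so each fits in one SQL string literal.
    catalog_json = json.dumps(catalog, separators=(",", ":"))
    conf_json = json.dumps(hbase_conf, separators=(",", ":"))
    for value in (catalog_json, conf_json):
        # Each option value is wrapped in single quotes below, so a single
        # quote inside the JSON would end the SQL literal early.
        if "'" in value:
            raise ValueError("option values must not contain single quotes")
    return (
        f"CREATE TABLE {table_name}\n"
        "USING hbase\n"
        "OPTIONS(\n"
        f"catalog='{catalog_json}',\n"
        f"hbaseConfiguration='{conf_json}');"
    )


catalog = {
    "table": {"namespace": "default", "name": "test"},
    "rowkey": "key",
    "columns": {
        "key": {"cf": "rowkey", "col": "key", "type": "string"},
        "data": {"cf": "info", "col": "data", "type": "string"},
    },
}
hbase_conf = {"hbase.zookeeper.quorum": "a.b.c.d:2181"}  # placeholder address

print(hbase_create_table_sql("hbase_table_test", catalog, hbase_conf))
```

The generated statement can be pasted into spark-sql. Using compact separators keeps each option on one line, as in the example.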
The following example shows the catalog configuration for the schema of an HBase data table named table1:
{
"table":{"namespace":"default", "name":"table1"},
"rowkey":"key",
"columns":{
"col0":{"cf":"rowkey", "col":"key", "type":"string"},
"col1":{"cf":"cf1", "col":"col1", "type":"boolean"},
"col2":{"cf":"cf2", "col":"col2", "type":"double"},
"col3":{"cf":"cf3", "col":"col3", "type":"float"},
"col4":{"cf":"cf4", "col":"col4", "type":"int"},
"col5":{"cf":"cf5", "col":"col5", "type":"bigint"},
"col6":{"cf":"cf6", "col":"col6", "type":"smallint"},
"col7":{"cf":"cf7", "col":"col7", "type":"string"},
"col8":{"cf":"cf8", "col":"col8", "type":"tinyint"}
}
}
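A malformed catalog is easier to spot before it reaches CREATE TABLE. The following Python sketch checks the conventions that are visible in the two examples in this topic: the "table" object has a namespace and a name, one column uses "cf":"rowkey" with a "col" equal to the "rowkey" value, and every column has cf, col, and type entries. The names check_catalog and EXAMPLE_TYPES are hypothetical. EXAMPLE_TYPES contains only the types that appear in the table1 example, which is an assumption; the hbase data source may accept other types, and the sketch also assumes a single-column row key.

```python
import json

# Types that appear in the table1 example. This is an assumption for the
# sketch; the data source may accept other types as well.
EXAMPLE_TYPES = {"string", "boolean", "double", "float", "int",
                 "bigint", "smallint", "tinyint"}


def check_catalog(catalog_json):
    """Return a list of problems found in a catalog string (illustrative checks only)."""
    catalog = json.loads(catalog_json)
    problems = []

    table = catalog.get("table", {})
    for key in ("namespace", "name"):
        if key not in table:
            problems.append(f"table.{key} is missing")

    columns = catalog.get("columns", {})
    # Assumes a single-column row key, as in both examples in this topic.
    rowkey_columns = [n for n, spec in columns.items() if spec.get("cf") == "rowkey"]
    if not rowkey_columns:
        problems.append('no column has "cf":"rowkey"')
    for name in rowkey_columns:
        if columns[name].get("col") != catalog.get("rowkey"):
            problems.append(f'{name}: "col" does not match the "rowkey" value')

    for name, spec in columns.items():
        missing = [k for k in ("cf", "col", "type") if k not in spec]
        if missing:
            problems.append(f"{name}: missing {', '.join(missing)}")
        elif spec["type"] not in EXAMPLE_TYPES:
            problems.append(f"{name}: unexpected type {spec['type']!r}")
    return problems


# The table1 catalog from the example above.
TABLE1_CATALOG = """{
  "table":{"namespace":"default", "name":"table1"},
  "rowkey":"key",
  "columns":{
    "col0":{"cf":"rowkey", "col":"key", "type":"string"},
    "col1":{"cf":"cf1", "col":"col1", "type":"boolean"},
    "col2":{"cf":"cf2", "col":"col2", "type":"double"},
    "col3":{"cf":"cf3", "col":"col3", "type":"float"},
    "col4":{"cf":"cf4", "col":"col4", "type":"int"},
    "col5":{"cf":"cf5", "col":"col5", "type":"bigint"},
    "col6":{"cf":"cf6", "col":"col6", "type":"smallint"},
    "col7":{"cf":"cf7", "col":"col7", "type":"string"},
    "col8":{"cf":"cf8", "col":"col8", "type":"tinyint"}
  }
}"""

print(check_catalog(TABLE1_CATALOG))  # []
```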