This topic describes how to use Spark to access HBase or Lindorm.
Use Spark SQL statements to access HBase
Sample command for starting Spark SQL with the required JAR packages:
spark-sql --jars alihbase-connector-2.1.0.jar,alihbase-client-2.1.0.jar,hbase-spark-1.0.1-SNAPSHOT.jar,/hbase_home/hbase-shaded-client-2.1.0.jar,/hbase_home/hbase-shaded-mapreduce-2.1.0.jar
Note
- You can replace the versions in alihbase-connector-2.1.0.jar and alihbase-client-2.1.0.jar based on your business requirements.
- 1.0.1 in hbase-spark-1.0.1-SNAPSHOT.jar indicates the version of the Spark DataSource org.apache.hadoop.hbase.spark. You can specify a version that is compatible with your cluster based on the version of HBase Connector.
- /hbase_home/hbase-shaded-client-2.1.0.jar and /hbase_home/hbase-shaded-mapreduce-2.1.0.jar are installation files of open source HBase.
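Because the JAR versions can differ from cluster to cluster, it may help to assemble the command from version values instead of editing the long string by hand. The following is a minimal sketch, not part of any HBase or Spark tooling: the function name spark_sql_command and its parameters are hypothetical, and the defaults simply reproduce the sample command above.

```python
import shlex


def spark_sql_command(
    connector_version="2.1.0",
    client_version="2.1.0",
    hbase_spark_version="1.0.1-SNAPSHOT",
    hbase_version="2.1.0",
    hbase_home="/hbase_home",
):
    """Return the spark-sql command line for the given JAR versions.

    All parameter names are illustrative; the defaults match the sample
    command in this topic.
    """
    jars = [
        f"alihbase-connector-{connector_version}.jar",
        f"alihbase-client-{client_version}.jar",
        f"hbase-spark-{hbase_spark_version}.jar",
        f"{hbase_home}/hbase-shaded-client-{hbase_version}.jar",
        f"{hbase_home}/hbase-shaded-mapreduce-{hbase_version}.jar",
    ]
    # --jars expects one comma-separated value; quote it for the shell
    # in case a path contains spaces or other special characters.
    return "spark-sql --jars " + shlex.quote(",".join(jars))


if __name__ == "__main__":
    print(spark_sql_command())
```

Change only the arguments that differ in your environment, for example spark_sql_command(hbase_home="/opt/hbase"), where /opt/hbase is a placeholder path.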
The following example creates a Spark SQL table named test_hbase that maps to the HBase table test1 in the default namespace, and then reads data from the table.
CREATE TABLE test_hbase
USING org.apache.hadoop.hbase.spark
OPTIONS (
  'catalog'=
  '{
    "table": {
      "namespace": "default",
      "name": "test1"
    },
    "rowkey": "rowkey",
    "columns": {
      "rowkey": {
        "cf": "rowkey",
        "col": "rowkey",
        "type": "string"
      },
      "name": {
        "cf": "f1",
        "col": "name",
        "type": "string"
      }
    }
  }'
  ,'hbase.spark.use.hbasecontext'='false');
SELECT * FROM test_hbase;
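The catalog in the CREATE TABLE statement above is a JSON document embedded in a SQL string literal, so a missing brace or quotation mark is easy to introduce when you add columns. The following is a minimal sketch of generating the statement from a column mapping. The helper names build_catalog and create_table_sql are hypothetical and are not part of the HBase connector. The row key check is an assumption based only on the example above, where the row key column uses the column family "rowkey" and a "col" value equal to the "rowkey" field.

```python
import json


def build_catalog(namespace, table, rowkey, columns):
    """Build the catalog dictionary.

    columns maps each Spark column name to a tuple of
    (column family, column qualifier, type).
    """
    # Assumption taken from the example in this topic: the row key column
    # has cf "rowkey" and a col value that matches the "rowkey" field.
    has_rowkey = any(
        cf == "rowkey" and col == rowkey for cf, col, _ in columns.values()
    )
    if not has_rowkey:
        raise ValueError(f"no column maps to row key {rowkey!r}")
    return {
        "table": {"namespace": namespace, "name": table},
        "rowkey": rowkey,
        "columns": {
            name: {"cf": cf, "col": col, "type": typ}
            for name, (cf, col, typ) in columns.items()
        },
    }


def create_table_sql(spark_table, catalog):
    """Render the CREATE TABLE statement with the catalog as a JSON string."""
    catalog_json = json.dumps(catalog, indent=2)
    # A single quotation mark would end the SQL string literal early.
    if "'" in catalog_json:
        raise ValueError("catalog values must not contain single quotes")
    return (
        f"CREATE TABLE {spark_table}\n"
        "USING org.apache.hadoop.hbase.spark\n"
        "OPTIONS (\n"
        "'catalog'=\n"
        f"'{catalog_json}'\n"
        ",'hbase.spark.use.hbasecontext'='false');"
    )


if __name__ == "__main__":
    catalog = build_catalog(
        "default",
        "test1",
        "rowkey",
        {
            "rowkey": ("rowkey", "rowkey", "string"),
            "name": ("f1", "name", "string"),
        },
    )
    print(create_table_sql("test_hbase", catalog))
```

The printed statement can be pasted into the spark-sql session. With the arguments shown, it is equivalent to the statement in this topic and differs only in whitespace.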
The procedure for using Spark to access Lindorm is the same as the procedure for using Spark to access HBase.