This topic describes how to use Spark to access HBase or Lindorm.

Use Spark SQL statements to access HBase

Run the following command to start the Spark SQL CLI and load the required JAR files:
spark-sql --jars alihbase-connector-2.1.0.jar,alihbase-client-2.1.0.jar,hbase-spark-1.0.1-SNAPSHOT.jar,/hbase_home/hbase-shaded-client-2.1.0.jar,/hbase_home/hbase-shaded-mapreduce-2.1.0.jar
Note
  • You can change the versions of alihbase-connector-2.1.0.jar and alihbase-client-2.1.0.jar as required.
  • In hbase-spark-1.0.1-SNAPSHOT.jar, 1.0.1 is the version of the Spark data source org.apache.hadoop.hbase.spark. Choose a version that is compatible with your cluster and with the version of HBase Connector that you use.
  • /hbase_home/hbase-shaded-client-2.1.0.jar and /hbase_home/hbase-shaded-mapreduce-2.1.0.jar are files from the open source HBase installation. Replace /hbase_home with the HBase installation directory in your environment.
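If you change JAR versions often, assembling the --jars value in a script can be less error-prone than editing it by hand. The following Python code is a minimal sketch of that idea, not part of the product. The function build_spark_sql_command and its parameter names are hypothetical, and the defaults only mirror the sample command. It assumes that the connector, client, and hbase-spark JARs are in the working directory and that the shaded JARs are in the HBase installation directory.

```python
import shlex


def build_spark_sql_command(
    connector_version: str = "2.1.0",
    client_version: str = "2.1.0",
    hbase_spark_version: str = "1.0.1-SNAPSHOT",
    hbase_home: str = "/hbase_home",
    hbase_version: str = "2.1.0",
) -> list:
    """Return the spark-sql command as an argument list (hypothetical helper)."""
    jars = [
        # alihbase JARs: change these versions to match your cluster.
        f"alihbase-connector-{connector_version}.jar",
        f"alihbase-client-{client_version}.jar",
        # Spark data source org.apache.hadoop.hbase.spark.
        f"hbase-spark-{hbase_spark_version}.jar",
        # Shaded JARs from the open source HBase installation directory.
        f"{hbase_home}/hbase-shaded-client-{hbase_version}.jar",
        f"{hbase_home}/hbase-shaded-mapreduce-{hbase_version}.jar",
    ]
    # --jars takes one comma-separated list with no spaces.
    return ["spark-sql", "--jars", ",".join(jars)]


if __name__ == "__main__":
    # Print a shell-ready command line.
    print(shlex.join(build_spark_sql_command()))
```

With the default arguments, the script prints the same command as the sample above. Returning a list instead of a single string lets you pass it directly to subprocess.run without shell quoting issues.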
The following example creates a Spark SQL table named test_hbase that maps to the HBase table test1 in the default namespace, and then queries the table.
CREATE TABLE test_hbase
USING org.apache.hadoop.hbase.spark
OPTIONS (
  'catalog'=
        '{
            "table": {
                "namespace": "default",
                "name": "test1"
            },
            "rowkey": "rowkey",
            "columns": {
                "rowkey": {
                    "cf": "rowkey",
                    "col": "rowkey",
                    "type": "string"
                },
                "name": {
                    "cf": "f1",
                    "col": "name",
                    "type": "string"
                }
            }
        }'
,'hbase.spark.use.hbasecontext'='false');

SELECT * FROM test_hbase;
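The catalog is a JSON document that maps each Spark column to an HBase column family (cf) and qualifier (col). The column whose cf is "rowkey" maps to the HBase row key. For tables with many columns, generating the catalog can be less error-prone than writing it by hand. The following Python code is a minimal sketch that builds the same catalog and CREATE TABLE statement as the example. The helpers build_hbase_catalog and build_create_table_sql are hypothetical, and the sketch assumes that table and column names contain no single quotes.

```python
import json


def build_hbase_catalog(namespace, table, rowkey_col, columns, rowkey_type="string"):
    """Return a catalog JSON string (hypothetical helper).

    columns maps each Spark column name to a (column_family, qualifier, type) tuple.
    """
    cols = {
        # The row key is declared as a column in the reserved family "rowkey".
        rowkey_col: {"cf": "rowkey", "col": rowkey_col, "type": rowkey_type},
    }
    for name, (cf, qualifier, col_type) in columns.items():
        cols[name] = {"cf": cf, "col": qualifier, "type": col_type}
    catalog = {
        "table": {"namespace": namespace, "name": table},
        "rowkey": rowkey_col,
        "columns": cols,
    }
    return json.dumps(catalog, indent=2)


def build_create_table_sql(spark_table, catalog_json, use_hbasecontext=False):
    """Return a CREATE TABLE statement for the org.apache.hadoop.hbase.spark data source."""
    # A single quote would end the SQL string literal early, so reject it.
    if "'" in catalog_json:
        raise ValueError("catalog must not contain single quotes")
    flag = "true" if use_hbasecontext else "false"
    return (
        f"CREATE TABLE {spark_table}\n"
        "USING org.apache.hadoop.hbase.spark\n"
        f"OPTIONS ('catalog'='{catalog_json}', "
        f"'hbase.spark.use.hbasecontext'='{flag}');"
    )


if __name__ == "__main__":
    catalog = build_hbase_catalog(
        "default", "test1", "rowkey", {"name": ("f1", "name", "string")}
    )
    print(build_create_table_sql("test_hbase", catalog))
```

To map more HBase columns, add entries to the columns dictionary, for example "age": ("f1", "age", "int"), and paste the printed statement into the Spark SQL CLI.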

You can use Spark to access Lindorm by following the same procedure as for HBase.
