Access Table Store tables with Spark or Spark SQL

You can use Spark and Spark SQL to access data in Table Store directly through the dependency packages released by Table Store and E-MapReduce (EMR).

Install Spark/Spark SQL

  1. Download the Spark installation package that complies with the following requirements:
    • Release version: 1.6.2
    • Package type: Pre-built for Hadoop 2.6
    • Download type: Direct Download
  2. Unpack the installation package as follows.
    $ cd /home/admin
    $ tar -zxvf spark-1.6.2-bin-hadoop2.6.tgz
    $ mv spark-1.6.2-bin-hadoop2.6 spark-1.6.2

Install JDK 7 or later

  1. Download and install JDK 7 or a later version.
  2. Check the installation status as follows.
    $ java -version
    java version "1.8.0_77"
    Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
    Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)

Download Java SDK for Table Store

  1. Download the Java SDK dependency package (version 4.1.0 or later).
    Note: The dependency package is updated along with the Java SDK. Download the package that matches the latest Java SDK release.
  2. Move the SDK package to the Spark directory as follows.
    $ mv tablestore-4.1.0-jar-with-dependencies.jar /home/admin/spark-1.6.2/

Download EMR dependency package

  • Download the Alibaba Cloud EMR dependency package.
    Note: For more information about EMR, see the E-MapReduce documentation.
  • Rename the emr-sdk_2.10-1.3.0-20161025.065936-1.jar file and move it to the Spark directory as follows.
    $ mv emr-sdk_2.10-1.3.0-20161025.065936-1.jar /home/admin/spark-1.6.2/emr-sdk_2.10-1.3.0-SNAPSHOT.jar

Run Spark SQL

$ cd /home/admin/spark-1.6.2/
$ bin/spark-sql --master local --jars tablestore-4.1.0-jar-with-dependencies.jar,emr-sdk_2.10-1.3.0-SNAPSHOT.jar
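
Once the Spark SQL shell is running, you can map a Table Store table to an external table and query it. The statement below is a sketch, not a definitive reference: the table name `pet`, the column names, and all property values are hypothetical placeholders, and the storage handler class follows the EMR Table Store integration as documented for this dependency package (check the EMR documentation for the exact class name in your version).

```sql
-- Sketch: map an existing Table Store table (here called "pet", a
-- placeholder) to an external table in Spark SQL. Replace the endpoint,
-- credentials, table name, and column mapping with your own values.
CREATE EXTERNAL TABLE pet (
  name STRING,
  owner STRING,
  species STRING,
  birth STRING
)
STORED BY 'com.aliyun.openservices.tablestore.hive.TableStoreStorageHandler'
WITH SERDEPROPERTIES (
  "tablestore.columns.mapping" = "name,owner,species,birth"
)
TBLPROPERTIES (
  "tablestore.endpoint" = "<YourEndpoint>",
  "tablestore.access_key_id" = "<YourAccessKeyId>",
  "tablestore.access_key_secret" = "<YourAccessKeySecret>",
  "tablestore.table.name" = "pet"
);

-- The external table can then be queried like any other Spark SQL table:
SELECT * FROM pet;
```

The external table is only a mapping; dropping it does not delete the underlying Table Store table or its data.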