This topic describes how to prepare the environment for accessing a Tablestore table
by using Hive and HadoopMR.
Use Hive and HadoopMR to access a Tablestore table
You can use Hive and HadoopMR to access and analyze data in Tablestore directly by
using the dependency packages released by the Tablestore and E-MapReduce teams.
Install JDK V7 or later
- Download and install JDK V7 or later.
- Follow the example to check the installation.
$ java -version
java version "1.8.0_77"
Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
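The version check above can also be scripted. The following is a minimal sketch, not an official tool: java_major is a hypothetical helper name, and the parsing assumes the usual version "x.y.z" line printed by java -version.

```shell
#!/bin/sh
# Hypothetical helper: extract the major Java version from a version string.
# Handles the legacy "1.x" scheme (1.8.0_77 -> 8) and the newer one (11.0.2 -> 11).
java_major() {
    v=$1
    case "$v" in
        1.*) v=${v#1.} ;;          # drop the legacy "1." prefix
    esac
    echo "${v%%[._-]*}"            # keep everything before the first ".", "_" or "-"
}

# java -version writes to stderr, so merge it into stdout before parsing.
if command -v java >/dev/null 2>&1; then
    raw=$(java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1)
    major=$(java_major "$raw")
    if [ "${major:-0}" -ge 7 ]; then
        echo "JDK $raw meets the V7 requirement"
    else
        echo "JDK $raw is too old; V7 or later is required" >&2
    fi
else
    echo "java was not found in PATH" >&2
fi
```

If the script reports that java was not found, check that the JDK bin directory is in PATH before you continue.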
Install Hadoop and start the Hadoop environment
- Download a Hadoop installation package of version 2.6.0 or later. For more
information, visit Index of /apache/hadoop/common.
- Decompress the package and install Hadoop based on the cluster conditions.
- Follow the example to start the Hadoop environment.
$ bin/start-all.sh
# Check whether the service is started.
$ jps
24017 NameNode
24835 Jps
24131 DataNode
24438 ResourceManager
5114 HMaster
24287 SecondaryNameNode
24527 NodeManager
- Add the Hadoop path to /etc/profile, and then run the source /etc/profile command for the configuration to take effect.
export HADOOP_HOME=/data/hadoop/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin
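The jps check in the steps above can be automated. The sketch below assumes the daemon names shown in the sample output for a single-node setup; missing_daemons is a hypothetical helper name, and the list of expected daemons should be adjusted to match your cluster layout.

```shell
#!/bin/sh
# Hypothetical helper: read jps output on stdin and print, one per line,
# each expected Hadoop daemon that does not appear in it.
missing_daemons() {
    out=$(cat)
    for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Compare the second column exactly, so that NameNode does not
        # match SecondaryNameNode.
        printf '%s\n' "$out" \
            | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' \
            || printf '%s\n' "$d"
    done
}

if command -v jps >/dev/null 2>&1; then
    missing=$(jps | missing_daemons)
    if [ -z "$missing" ]; then
        echo "All expected Hadoop daemons are running"
    else
        echo "Missing daemons:" $missing >&2
    fi
fi
```

An empty result means that every daemon in the list was found. A non-empty result names the daemons to look for in the Hadoop logs.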
Download the Hive installation package and install Hive
- Download the Hive installation package of the bin.tar.gz type. For more information
about the download path, visit DOWNLOADS.
- Follow the example to decompress the installation package.
$ tar -zxvf apache-hive-2.1.0-bin.tar.gz -C /home/admin/
$ mv /home/admin/apache-hive-2.1.0-bin /home/admin/hive-2.1.0
- Follow the example to initialize the schema.
# Go to the specified directory.
$ cd /home/admin/hive-2.1.0/
# Start initialization. To use MySQL, replace derby with mysql.
# If an error occurs, run rm -rf metastore_db/ to delete the directory, and then run the command again.
$ ./bin/schematool -initSchema -dbType derby
- Follow the example to start the Hive environment.
$ ./bin/hive
# Check whether the service is started.
hive> show databases;
OK
default
Time taken: 0.207 seconds, Fetched: 1 row(s)
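The same check can be run without opening the interactive prompt, which is convenient in setup scripts. This is a sketch under two assumptions: Hive is installed in /home/admin/hive-2.1.0 as in the steps above, and has_default_db is a hypothetical helper name. It relies on the -e option of the Hive CLI, which runs a quoted query string and exits.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the query output on stdin lists the
# built-in "default" database on a line of its own.
has_default_db() {
    grep -qx 'default'
}

# Assumed install location, taken from the decompression step above.
HIVE_HOME=${HIVE_HOME:-/home/admin/hive-2.1.0}

if [ -x "$HIVE_HOME/bin/hive" ]; then
    # Discard stderr so that only the query result reaches the helper.
    if "$HIVE_HOME/bin/hive" -e 'show databases;' 2>/dev/null | has_default_db; then
        echo "Hive is working"
    else
        echo "Hive did not return the default database" >&2
    fi
fi
```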
Download Tablestore SDK for Java
- Download the dependency package of Tablestore SDK for Java V4.1.0 or later
from the Maven repository. For more information about the download path, see Tablestore SDK for Java.
The dependency package is released together with each version of the SDK for Java. Download
the latest version of the dependency package.
- Follow the example to move the SDK package to the Hive directory.
$ mv tablestore-4.1.0-jar-with-dependencies.jar /home/admin/hive-2.1.0/
Download Alibaba Cloud EMR SDK
Download the dependency package of EMR SDK. For more information about the
download path, visit aliyun-emapreduce-datasources.