

Last Updated: Mar 15, 2018

Using Hive/HadoopMR to access Table Store

Data preparation

Create a table named pet in Table Store and import the following data:

Name     Owner   Species  Sex  Birth       Death
Fluffy   Harold  cat      f    1993-02-04
Claws    Gwen    cat      m    1994-03-17
Buffy    Harold  dog      f    1989-05-13
Fang     Benny   dog      m    1990-08-27
Bowser   Diane   dog      m    1979-08-31  1995-07-29
Chirpy   Gwen    bird     f    1998-09-11
Whistler Gwen    bird          1997-12-09
Slim     Benny   snake    m    1996-04-29
Puffball Diane   hamster  f    1999-03-30

Note that the Name column is the only Primary Key column.

Because Table Store is schema-free (see the Data model topic), you do not need to write anything (such as NULL) into blank cells.

Example for accessing by Hive


Prepare the environment as described in Prerequisites: Hadoop, Hive, a JDK, and the dependency packages of the Table Store SDK and EMR.


  # You can add HADOOP_HOME and HADOOP_CLASSPATH to /etc/profile
  $ export HADOOP_HOME=${Your Hadoop Path}
  $ export HADOOP_CLASSPATH=emr-tablestore-1.4.2.jar:tablestore-4.3.1-jar-with-dependencies.jar:joda-time-2.9.4.jar
  $ bin/hive
  hive> CREATE EXTERNAL TABLE pet
    (name STRING, owner STRING, species STRING, sex STRING, birth STRING, death STRING)
    STORED BY 'com.aliyun.openservices.tablestore.hive.TableStoreStorageHandler'
    WITH SERDEPROPERTIES(
      "tablestore.columns.mapping"="name,owner,species,sex,birth,death")
    TBLPROPERTIES (
      "tablestore.endpoint"="YourEndpoint",
      "tablestore.access_key_id"="YourAccessKeyId",
      "tablestore.access_key_secret"="YourAccessKeySecret",
      "tablestore.table.name"="pet");
  hive> SELECT * FROM pet;
  Bowser   Diane   dog      m     1979-08-31  1995-07-29
  Buffy    Harold  dog      f     1989-05-13  NULL
  Chirpy   Gwen    bird     f     1998-09-11  NULL
  Claws    Gwen    cat      m     1994-03-17  NULL
  Fang     Benny   dog      m     1990-08-27  NULL
  Fluffy   Harold  cat      f     1993-02-04  NULL
  Puffball Diane   hamster  f     1999-03-30  NULL
  Slim     Benny   snake    m     1996-04-29  NULL
  Whistler Gwen    bird     NULL  1997-12-09  NULL
  Time taken: 5.045 seconds, Fetched 9 row(s)
  hive> SELECT * FROM pet WHERE birth > "1995-01-01";
  Chirpy   Gwen    bird     f     1998-09-11  NULL
  Puffball Diane   hamster  f     1999-03-30  NULL
  Slim     Benny   snake    m     1996-04-29  NULL
  Whistler Gwen    bird     NULL  1997-12-09  NULL
  Time taken: 1.41 seconds, Fetched 4 row(s)

Parameter descriptions


    • tablestore.columns.mapping (optional): By default, the field names of the external table (written in lower case, following Hive conventions) are the same as the column names (the names of Primary Key or attribute columns) in Table Store. However, because of case sensitivity or charset issues, the names may differ; in that case, specify tablestore.columns.mapping. Its value is a comma-separated string in which no space may appear before or after a comma. Each item is a Table Store column name, listed in the same order as the field names of the external table.

      Note: Table Store allows blank characters in column names, so a blank space is considered part of the column name.


    • tablestore.endpoint (required): The endpoint. You can view the endpoint information of the instance on the Table Store console.

    • tablestore.instance (optional): The instance name. If not specified, it is taken from the first field of tablestore.endpoint.

    • tablestore.table.name (required): The table name in Table Store.

    • tablestore.access_key_id, tablestore.access_key_secret (required): See Access control.

    • tablestore.sts_token (optional): See Security Token.
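For instance, if the Table Store columns were created with capitalized names, the lower-case Hive fields would not match them by default and a mapping would be required. The following DDL is an illustrative sketch only (the capitalized column names and "YourInstance" are hypothetical, not from the data preparation above):

```sql
CREATE EXTERNAL TABLE pet
  (name STRING, owner STRING, species STRING, sex STRING, birth STRING, death STRING)
STORED BY 'com.aliyun.openservices.tablestore.hive.TableStoreStorageHandler'
WITH SERDEPROPERTIES(
  "tablestore.columns.mapping"="Name,Owner,Species,Sex,Birth,Death")
TBLPROPERTIES (
  "tablestore.endpoint"="YourEndpoint",
  "tablestore.instance"="YourInstance",
  "tablestore.access_key_id"="YourAccessKeyId",
  "tablestore.access_key_secret"="YourAccessKeySecret",
  "tablestore.table.name"="pet");
```

Here the i-th item of tablestore.columns.mapping names the Table Store column backing the i-th Hive field, so the Hive field owner reads the Table Store column Owner.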

Example for accessing by HadoopMR

The following example illustrates how to count rows in pet using HadoopMR.

Code examples

  • Construct Mappers and Reducers.

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    import com.aliyun.openservices.tablestore.hadoop.PrimaryKeyWritable;
    import com.aliyun.openservices.tablestore.hadoop.RowWritable;

    public class RowCounter {
        public static class RowCounterMapper
                extends Mapper<PrimaryKeyWritable, RowWritable, Text, LongWritable> {
            private final static Text agg = new Text("TOTAL");
            private final static LongWritable one = new LongWritable(1);

            @Override
            public void map(PrimaryKeyWritable key, RowWritable value, Context context)
                    throws IOException, InterruptedException {
                // Emit ("TOTAL", 1) for every row fetched from Table Store.
                context.write(agg, one);
            }
        }

        public static class IntSumReducer
                extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            public void reduce(Text key, Iterable<LongWritable> values, Context context)
                    throws IOException, InterruptedException {
                // Sum the counts emitted by the mappers (and the combiner).
                long sum = 0;
                for (LongWritable val : values) {
                    sum += val.get();
                }
                context.write(key, new LongWritable(sum));
            }
        }
    }

    Each time HadoopMR fetches a row from pet, it calls the mapper's map() method. The first two parameters, of types PrimaryKeyWritable and RowWritable, correspond to the row's Primary Key and the contents of the row, respectively. You can obtain the Primary Key object and the Row object defined by the Table Store Java SDK by invoking PrimaryKeyWritable.getPrimaryKey() and RowWritable.getRow().
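    As an illustration only (plain Java with no Hadoop dependencies), the dataflow of this job for the 9-row pet table can be simulated as follows: the map phase emits the pair ("TOTAL", 1) once per row, and the reduce phase sums the values that share a key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowCountSim {
    public static void main(String[] args) {
        int rows = 9; // the pet table holds 9 rows

        // Map phase: each call to map() emits the pair ("TOTAL", 1).
        List<Map.Entry<String, Long>> emitted = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            emitted.add(Map.entry("TOTAL", 1L));
        }

        // Reduce phase: sum all values that share the same key.
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, Long> e : emitted) {
            result.merge(e.getKey(), e.getValue(), Long::sum);
        }
        System.out.println(result); // {TOTAL=9}
    }
}
```

    The combiner configured below (IntSumReducer) performs the same summation early, on each mapper's local output, so less data crosses the network.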

  • Configure Table Store as the data source of the mapper.

    // fetchCriteria() and main() below also belong to the RowCounter class.
    private static RangeRowQueryCriteria fetchCriteria() {
        // Scan the whole table: one range from INF_MIN to INF_MAX on the Primary Key.
        RangeRowQueryCriteria res = new RangeRowQueryCriteria("YourTableName");
        res.setMaxVersions(1);
        List<PrimaryKeyColumn> lower = new ArrayList<PrimaryKeyColumn>();
        List<PrimaryKeyColumn> upper = new ArrayList<PrimaryKeyColumn>();
        lower.add(new PrimaryKeyColumn("YourPkeyName", PrimaryKeyValue.INF_MIN));
        upper.add(new PrimaryKeyColumn("YourPkeyName", PrimaryKeyValue.INF_MAX));
        res.setInclusiveStartPrimaryKey(new PrimaryKey(lower));
        res.setExclusiveEndPrimaryKey(new PrimaryKey(upper));
        return res;
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "row count");
        job.addFileToClassPath(new Path("hadoop-connector.jar"));
        job.setJarByClass(RowCounter.class);
        job.setMapperClass(RowCounterMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        job.setInputFormatClass(TableStoreInputFormat.class);
        TableStoreInputFormat.setEndpoint(job, "https://YourInstance.Region.ots.aliyuncs.com/");
        TableStoreInputFormat.setCredential(job, "YourAccessKeyId", "YourAccessKeySecret");
        TableStoreInputFormat.addCriteria(job, fetchCriteria());
        FileOutputFormat.setOutputPath(job, new Path("output"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    In the preceding example, job.setInputFormatClass(TableStoreInputFormat.class) is used to set Table Store as the data source. To complete the example, the following steps are also required:

    • Deploy hadoop-connector.jar to the cluster and add it to the classpath. The local path of hadoop-connector.jar is specified by addFileToClassPath(). The code example assumes that hadoop-connector.jar is in the current path.

    • Specify the endpoint and Access Key when accessing Table Store. They can be set using TableStoreInputFormat.setEndpoint() and TableStoreInputFormat.setCredential().

    • Specify a table to count.


      • TableStoreInputFormat.addCriteria() can be invoked multiple times; each invocation adds one RangeRowQueryCriteria object.
      • Use setFilter() and addColumnsToGet() to filter out unneeded rows and columns on the server side, which reduces cost and improves the performance of Table Store.
      • Adding RangeRowQueryCriteria objects for multiple tables merges their data in a single job.
      • Adding multiple RangeRowQueryCriteria objects for a single table tunes the splits: the TableStore-Hadoop Connector then splits the user's input ranges based on the specified criteria.

Run the program

  $ HADOOP_CLASSPATH=hadoop-connector.jar bin/hadoop jar row-counter.jar
  ...
  $ find output -type f
  output/_SUCCESS
  output/part-r-00000
  output/._SUCCESS.crc
  output/.part-r-00000.crc
  $ cat output/part-r-00000
  TOTAL 9

Data type conversion

Table Store and Hive/Spark support different sets of data types.

The following list shows the data type conversion support from Table Store types to the Hive numeric types TINYINT, SMALLINT, INT, BIGINT, FLOAT, and DOUBLE:

    • INTEGER: converts to BIGINT without loss; converts to TINYINT, SMALLINT, INT, FLOAT, and DOUBLE with limited precision.
    • DOUBLE: converts to DOUBLE without loss; converts to TINYINT, SMALLINT, INT, BIGINT, and FLOAT with limited precision.
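What "limited precision" means can be demonstrated in plain Java, independent of Hive and Table Store: a Table Store INTEGER is a 64-bit value, so casting it to a 32-bit int or to a floating-point type can lose information.

```java
public class PrecisionDemo {
    public static void main(String[] args) {
        // A 64-bit value just past double's exact integer range (2^53):
        long big = 9007199254740993L; // 2^53 + 1
        double asDouble = (double) big;
        // The round trip does not recover the original value:
        System.out.println((long) asDouble); // 9007199254740992

        // Casting to a 32-bit int keeps only the low 32 bits:
        long wide = 4294967298L; // 2^32 + 2
        System.out.println((int) wide); // 2
    }
}
```

The same effect applies whenever a Table Store INTEGER column is mapped to a narrower Hive type, so prefer BIGINT for INTEGER columns when exact values matter.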