You can use a Hive internal table or external table to access Phoenix data in your E-MapReduce (EMR) cluster. This topic describes how to use Hive in an EMR cluster to access Phoenix data.

Prerequisites

An EMR Hadoop cluster is created, and HBase, ZooKeeper, and Phoenix are selected from the optional services when you create the cluster. For more information, see Create a cluster.
Note Phoenix is not supported in EMR V4.X or EMR V5.X. This topic applies only to EMR V3.X.

Access Phoenix data by using a Hive internal table

If no table exists in Phoenix, you can create an internal table in Hive. A table with the same schema is then automatically created in Phoenix. In this example, an internal table is created in Hive to access Phoenix data.

  1. Open the Hive CLI.
    1. Log on to the master node of your EMR cluster in SSH mode. For more information, see Log on to a cluster.
    2. Run the following command to open the Hive CLI:
      hive
      If the following information is returned, the Hive CLI is opened:
      Logging initialized using configuration in file:/etc/ecm/hive-conf-2.3.5-2.0.3/hive-log4j2.properties Async: true
      Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
  2. Create an internal table in Hive and query data in the table.
    1. Run the following command to create an internal table in Hive:
      create table phoenix_hive_create_internal(
        s1 string,
        i1 int,
        f1 float,
        d1 double
      )
      stored by 'org.apache.phoenix.hive.PhoenixStorageHandler'
      tblproperties(
        "phoenix.table.name" = "phoenix_hive_create_internal",
        "phoenix.rowkeys" = "s1,i1",
        "phoenix.column.mapping" = "s1:s1,i1:i1,f1:f1,d1:d1",
        "phoenix.table.options" = "SALT_BUCKETS=10,DATA_BLOCK_ENCODING='DIFF'"
      );
      Note The PhoenixStorageHandler stores the table data in Phoenix and reads it back when the Hive table is queried.
    2. Run the following command to insert data into the internal table:
      insert into phoenix_hive_create_internal values('wyk',1,2.3412,3.14);
    3. Run the following command to query data in the table:
      select * from phoenix_hive_create_internal;
      The following information is returned:
      OK
      wyk     1       2.3412  3.14
      Time taken: 0.569 seconds, Fetched: 1 row(s)
  3. Open the Phoenix CLI.
    1. Log on to the master node of your EMR cluster in SSH mode. For more information, see Log on to a cluster.
    2. Run the following command to go to the phoenix-current directory:
      cd /usr/lib/phoenix-current/
    3. Run the following command to open the Phoenix CLI:
      python ./bin/sqlline.py 
  4. Run the following command to check whether the table in Phoenix contains the same data as the Hive internal table:
    select * from phoenix_hive_create_internal;
    The following information is returned:
    +------+-----+---------+-------+
    |  s1  | i1  |   f1    |  d1   |
    +------+-----+---------+-------+
    | wyk  | 1   | 2.3412  | 3.14  |
    +------+-----+---------+-------+
    Note The table in Phoenix contains the same data as the Hive internal table, which confirms that Hive can access Phoenix data.
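
The internal table created above is a Hive managed table, so its lifecycle is tied to Hive. The following sketch summarizes two points worth keeping in mind; it assumes the default PhoenixStorageHandler behavior in this example, so verify on a test table first:

```sql
-- Filters on row key columns such as s1 can be pushed down to Phoenix,
-- so only matching rows are transferred back to Hive:
select * from phoenix_hive_create_internal where s1 = 'wyk';

-- Dropping a managed (internal) table in Hive also drops the
-- corresponding table in Phoenix, including its data. If Phoenix
-- should keep the data, use an external table instead (see the
-- next section).
-- drop table phoenix_hive_create_internal;
```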

Access Phoenix data by using a Hive external table

To use Hive to access an existing Phoenix table, create an external table in Hive and map it to the Phoenix table. The external table then reads data directly from the Phoenix table.

  1. Open the Hive CLI.
    1. Log on to the master node of your EMR cluster in SSH mode. For more information, see Log on to a cluster.
    2. Run the following command to open the Hive CLI:
      hive
      If the following information is returned, the Hive CLI is opened:
      Logging initialized using configuration in file:/etc/ecm/hive-conf-2.3.5-2.0.3/hive-log4j2.properties Async: true
      Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
  2. Run the following command to create an external table named ext_table in Hive and establish a mapping between the Hive external table and the Phoenix table:
    create external table ext_table(
      s1 string,
      i1 int,
      f1 float,
      d1 double
    )
    stored by 'org.apache.phoenix.hive.PhoenixStorageHandler'
    tblproperties(
      "phoenix.table.name" = "phoenix_hive_create_internal",
      "phoenix.rowkeys" = "s1, i1",
      "phoenix.column.mapping" = "s1:s1, i1:i1, f1:f1, d1:d1"
    );
  3. Run the following command to query data in the ext_table external table in Hive:
    select * from ext_table;
    The following information is returned:
    OK
    wyk 1 2.3412  3.14
    Time taken: 0.24 seconds, Fetched: 1 row(s)
    Note The ext_table external table returns the same data as the Phoenix table, which confirms that Hive can access Phoenix data.
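
Unlike the internal table in the previous section, an external table only references the Phoenix table: dropping it removes the Hive-side metadata but leaves the Phoenix table and its data intact. The sketch below illustrates this, and also shows that phoenix.column.mapping lets the Hive column names differ from the Phoenix column names (the format is hivecolumn:phoenixcolumn). The table name ext_table_renamed and the Hive column names in it are illustrative only:

```sql
-- Dropping the external table removes only the Hive mapping; the
-- Phoenix table phoenix_hive_create_internal and its data remain.
drop table ext_table;

-- The mapping can be recreated at any time, optionally with different
-- Hive column names mapped to the existing Phoenix columns:
create external table ext_table_renamed(
  name string,
  id int,
  score float,
  price double
)
stored by 'org.apache.phoenix.hive.PhoenixStorageHandler'
tblproperties(
  "phoenix.table.name" = "phoenix_hive_create_internal",
  "phoenix.rowkeys" = "name,id",
  "phoenix.column.mapping" = "name:s1,id:i1,score:f1,price:d1"
);
```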

References