This topic describes how to use Hive to perform basic operations, such as creating databases and tables, in an E-MapReduce (EMR) cluster.

Prerequisites

An EMR cluster is created. For more information, see Create a cluster.

Open the Hive command line

  1. Log on to the master node of the cluster in SSH mode. For more information, see Connect to the master node of an EMR cluster in SSH mode.
  2. Run the following command to switch to the hadoop user:
    su hadoop
  3. Run the following command to open the Hive command line:
    hive
    If the following information is returned, the Hive command line is opened:
    Logging initialized using configuration in file:/etc/ecm/hive-conf-2.3.5-2.0.3/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
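
If you only need to run a single statement, you may not need the interactive command line at all. The following is a minimal sketch that passes one statement to the Hive CLI with the -e option. It assumes that you run it as the hadoop user on the master node. The statement show databases; is only an example, and the fallback branch exists so that the sketch does nothing harmful on a machine without Hive.

```shell
#!/usr/bin/env bash
# Sketch: run one HiveQL statement without opening the interactive prompt.
# Assumption: executed as the hadoop user on the master node.
QUERY='show databases;'

if command -v hive >/dev/null 2>&1; then
    # -e executes the quoted statement and then exits.
    hive -e "$QUERY"
else
    # Fallback so the sketch is harmless on machines without Hive.
    echo "hive not found on PATH; skipped: $QUERY"
fi
```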

Manage databases

A database named testdb is used as an example.

  • Create a database
    create database if not exists testdb;

    If the returned information contains OK, the database testdb is created.

  • View database information
    desc database testdb;
    Information similar to that shown in the following figure is returned.
    [Figure: View database information]
  • Use a database
    use testdb;
    Information similar to that shown in the following figure is returned.
    [Figure: Use a database]
  • Delete a database
    drop database if exists testdb;

    If the returned information contains OK, the database is deleted.
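
By default, Hive refuses to drop a database that still contains tables. If that happens, the cascade keyword drops the database together with its tables. The following sketch shows one way to do this from the shell. It assumes that you really want to discard every table in testdb, and the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: drop a database together with any tables it still contains.
# CASCADE is needed because the default behavior (RESTRICT) fails
# when the database is not empty.
DB_NAME='testdb'   # example database name used in this topic
STATEMENT="drop database if exists ${DB_NAME} cascade;"

if command -v hive >/dev/null 2>&1; then
    hive -e "$STATEMENT"
else
    # Fallback so the sketch is harmless on machines without Hive.
    echo "hive not found on PATH; skipped: $STATEMENT"
fi
```

Because cascade deletes tables without further confirmation, check the database contents with show tables; before you run it.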

Manage tables

A table named t is used as an example.

  • Create a table
    create table if not exists t (id bigint, value string);

    If the returned information contains OK, the table t is created.

  • View table information
    desc formatted t;
    Information similar to that shown in the following figure is returned.
    [Figure: View table information]
  • Query all existing tables
    show tables;
    The following information is returned:
    OK
    t
  • Delete a table
    drop table if exists t;

    If the returned information contains OK, the table is deleted.
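
When the same table statements must be run repeatedly, it can be convenient to keep them in a script file. The following sketch writes the create, describe, and list statements from this section to a temporary file and runs the file with the -f option of the Hive CLI. The file location is hypothetical, and the drop statement is left out on purpose so that the table t still exists afterward.

```shell
#!/usr/bin/env bash
# Sketch: keep the table statements in a script file and run them with hive -f.
HQL_FILE="$(mktemp /tmp/manage_tables.XXXXXX)"   # hypothetical temporary file

cat > "$HQL_FILE" <<'EOF'
-- Statements taken from this section, in order.
create table if not exists t (id bigint, value string);
desc formatted t;
show tables;
EOF

if command -v hive >/dev/null 2>&1; then
    # -f runs every statement in the file and then exits.
    hive -f "$HQL_FILE"
else
    # Fallback so the sketch is harmless on machines without Hive.
    echo "hive not found on PATH; script left at $HQL_FILE"
fi
```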

Execute SQL statements

  • Insert data
    insert into table t select 1, 'value-1';
    If information similar to the following is returned, the data is inserted:
    OK
    Time taken: 14.73 seconds
  • Query up to 10 records in a table
    select * from t limit 10;
    The following information is returned:
    OK
    1       value-1
    Time taken: 11.48 seconds, Fetched: 1 row(s)
  • Aggregate data
    select value, count(id) from t group by value;
    The following information is returned:
    OK
    value-1 1
    Time taken: 20.11 seconds, Fetched: 1 row(s)
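
To keep a query result for later use, you can redirect the output of the Hive CLI to a file. The following sketch runs the aggregation statement above in silent mode (-S) and saves the rows to a file. The output file name is hypothetical, and the sketch assumes that the table t already exists in the current database.

```shell
#!/usr/bin/env bash
# Sketch: save the aggregation result to a file instead of reading it on screen.
QUERY='select value, count(id) from t group by value;'
OUT_FILE='value_counts.tsv'   # hypothetical output file name

if command -v hive >/dev/null 2>&1; then
    # -S (silent mode) keeps log messages out of the output;
    # the CLI prints result columns separated by tabs.
    hive -S -e "$QUERY" > "$OUT_FILE"
    echo "result written to $OUT_FILE"
else
    # Fallback so the sketch is harmless on machines without Hive.
    echo "hive not found on PATH; skipped: $QUERY"
fi
```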