This topic describes how to use Hive to perform basic operations, such as create databases and tables, in an E-MapReduce (EMR) cluster.

Prerequisites

An EMR cluster is created. For more information, see Create a cluster.

Open the Hive CLI

  1. Log on to the EMR cluster in SSH mode. For more information, see Log on to a cluster.
  2. Run the following command to switch to the hadoop user:
    su hadoop
  3. Run the following command to open the Hive CLI:
    hive

Manage databases

In this example, a database named testdb is used.

  1. Run the following command to create a database:
    create database if not exists testdb;

    If the returned information contains OK, the testdb database is created.

  2. Run the following command to query the information about a database:
    desc database testdb;
  3. Run the following command to use a database:
    use testdb;
  4. Run the following command to delete a database:
    drop database if exists testdb;

    If the returned information contains OK, the database is deleted.

Manage tables

In this example, a table named t is used.

  1. Run the following command to create a table:
    create table if not exists t (id bigint, value string);

    If the returned information contains OK, the t table is created.

  2. Run the following command to query the information about a table:
    desc formatted t;
  3. Run the following command to query all existing tables:
    show tables;
    The following information is returned:
    OK
    t
  4. Run the following command to delete a table:
    drop table if exists t;

    If the returned information contains OK, the table is deleted.

Execute SQL statements

  1. Execute the following statement to insert data:
    insert into table t select 1, 'value-1';
    If the returned information contains OK, the data is inserted.
    OK
    Time taken: 14.73 seconds
  2. Execute the following statement to query the first 10 records in a table:
    select * from t limit 10;
    The following information is returned:
    OK
    1       value-1
    Time taken: 11.48 seconds, Fetched: 1 row(s)
  3. Execute the following statement to aggregate data:
    select value, count(id) from t group by value;
    The following information is returned:
    OK
    value-1 1
    Time taken: 20.11 seconds, Fetched: 1 row(s)