Object Storage Service: Use Impala on an EMR cluster to query data stored in OSS-HDFS

Last Updated: Mar 20, 2026

This topic describes how to use Impala on an E-MapReduce (EMR) cluster to query data stored in OSS-HDFS.

Prerequisites

Before you begin, ensure that you have:

Usage notes

  • OSS-HDFS paths use the format oss://<bucket-name>.<endpoint>/<path>. Specify this format in the LOCATION clause when creating a database or table.
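The path format above can be assembled with a small shell helper. This is a minimal sketch: the function name oss_hdfs_uri, the bucket name examplebucket, and the endpoint value cn-hangzhou.oss-dls.aliyuncs.com are illustrative placeholders, not values taken from this topic.

```shell
#!/usr/bin/env bash
# Build an OSS-HDFS URI of the form oss://<bucket-name>.<endpoint>/<path>.
# oss_hdfs_uri is a hypothetical helper; all argument values below are placeholders.
oss_hdfs_uri() {
  local bucket="$1"
  local endpoint="$2"
  local path="${3#/}"   # drop one leading slash so the result has no double slash
  if [ -z "$bucket" ] || [ -z "$endpoint" ]; then
    echo "usage: oss_hdfs_uri <bucket-name> <endpoint> [path]" >&2
    return 1
  fi
  printf 'oss://%s.%s/%s\n' "$bucket" "$endpoint" "$path"
}

# Example call with placeholder values.
oss_hdfs_uri examplebucket cn-hangzhou.oss-dls.aliyuncs.com impala
```

The printed string is what goes inside the quotes of a LOCATION clause.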

Query data stored in OSS-HDFS

Step 1: Log on to the EMR cluster

  1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

  2. Click the EMR cluster that you created.

  3. On the Nodes tab, click the expand icon on the left side of the node group.

  4. Click the ECS instance ID. On the Instances page, click Connect next to the instance ID.

For details on connecting via SSH key pair or SSH password from Windows or Linux, see Log on to a cluster.

Step 2: Connect to Impala

Run the following command to open an Impala shell session. For more information, see Connect to Impala.

impala-shell -i core-1-1
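Besides the interactive session, impala-shell can run a single statement with the -q option and then exit, which is convenient for a quick connectivity check. The following is a sketch only: the wrapper function run_impala_query and the IMPALAD_HOST and IMPALA_SHELL variables are hypothetical names introduced for illustration, and the host defaults to core-1-1 as in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical variables: override them if your node or binary is named differently.
IMPALAD_HOST="${IMPALAD_HOST:-core-1-1}"
IMPALA_SHELL="${IMPALA_SHELL:-impala-shell}"

# Run one SQL statement non-interactively and print the result.
run_impala_query() {
  "$IMPALA_SHELL" -i "$IMPALAD_HOST" -q "$1"
}

# Only contact the cluster when impala-shell is actually installed here.
if command -v "$IMPALA_SHELL" >/dev/null 2>&1; then
  run_impala_query 'SHOW DATABASES;'
else
  echo "$IMPALA_SHELL not found; run this on an EMR cluster node" >&2
fi
```

If the statement returns a list of databases, the connection to the Impala daemon works and you can continue with the interactive session.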

Step 3: Create a database, table, and query data

  1. Create a database with its storage location set to an OSS-HDFS path.

       CREATE DATABASE store LOCATION 'oss://<bucket-name>.<endpoint>/impala';

    Replace <bucket-name> and <endpoint> with your actual bucket name and endpoint.

  2. Switch to the database.

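       USE store;

  3. Create an external table in Parquet format. Because the statement has no LOCATION clause, the table data is stored under the database location.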

       CREATE EXTERNAL TABLE customer_demographics (
         `cd_demo_sk`            INT,
         `cd_gender`             STRING,
         `cd_marital_status`     STRING,
         `cd_education_status`   STRING,
         `cd_purchase_estimate`  INT,
         `cd_credit_rating`      STRING,
         `cd_dep_count`          INT,
         `cd_dep_employed_count` INT,
         `cd_dep_college_count`  INT)
       STORED AS PARQUET;

  4. Insert sample data.

       INSERT INTO customer_demographics
       VALUES
         (1, 'Male',   'Single',  'Graduate',      1000, 'AAA', 2, 1, 1),
         (2, 'Female', 'Married', 'Undergraduate', 2000, 'BBB', 3, 2, 2);

  5. Query the data.

       SELECT * FROM customer_demographics;
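After the steps above, you may want to confirm that the database and the table files are located in OSS-HDFS. The following sketch writes a few verification statements to a temporary file and runs them with the -f option of impala-shell. The statements and the helper verification_sql are additions for illustration and are not part of the original procedure; the host name core-1-1 is taken from Step 2.

```shell
#!/usr/bin/env bash
IMPALAD_HOST="${IMPALAD_HOST:-core-1-1}"   # host name used in Step 2

# Print statements that show where the database and the table files are stored.
# verification_sql is a hypothetical helper name.
verification_sql() {
  cat <<'EOF'
DESCRIBE DATABASE EXTENDED store;
SHOW FILES IN store.customer_demographics;
SELECT COUNT(*) FROM store.customer_demographics;
EOF
}

# Only contact the cluster when impala-shell is installed on this machine.
if command -v impala-shell >/dev/null 2>&1; then
  sql_file="$(mktemp)"
  verification_sql > "$sql_file"
  impala-shell -i "$IMPALAD_HOST" -f "$sql_file"
  rm -f "$sql_file"
else
  echo "impala-shell not found; run this on an EMR cluster node" >&2
fi
```

The location reported for the database and the file paths listed for the table should begin with the oss:// prefix that you specified in the LOCATION clause.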