
E-MapReduce:Use Hive to access Iceberg data

Last Updated: Mar 26, 2026

Use Hive internal tables or external tables to read from and write to Iceberg tables in your E-MapReduce (EMR) Hadoop cluster.

Supported operations

The following operations are supported when using Hive with Iceberg in EMR:

  • Creating a table (internal or external)

  • Inserting data

  • Querying data

Limitations

  • Hive can read from and write to Iceberg tables only on Hadoop clusters running EMR V3.38.0, EMR V5.4.0, or a later minor version.

  • On clusters running EMR V3.38.0 or EMR V5.4.0, a compatibility issue affects the Hive–Iceberg integration. Clear the hive.metastore.event.listeners (V3.38.0) or metastore.event.listeners (V5.4.0) parameter before proceeding. See Step 1 below.

  • When the cluster Metadata is set to DLF Unified Metadata:

    • On EMR V3.38.X and EMR V5.3.X–V5.4.X, only Hive external tables are supported.

    • On EMR V3.39.0+ and EMR V5.5.0+, both internal and external tables are supported.

Prerequisites

Before you begin, ensure that you have:

  • A Hadoop cluster running EMR V3.38.0, EMR V5.4.0, or a later minor version. To create one, see Create a cluster.

Step 1: Fix the Hive–Iceberg compatibility issue (EMR V3.38.0 and V5.4.0 only)

Skip this step if your cluster runs a version other than EMR V3.38.0 or EMR V5.4.0.

  1. Go to the Hive service configuration page.

    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

    2. In the top navigation bar, select the region where your cluster resides and select a resource group.

    3. Find the target cluster and click Services in the Actions column.

    4. On the Services tab, click Configure in the Hive section.

  2. Clear the listener parameter.

    • EMR V3.38.0: Search for hive.metastore.event.listeners and delete its value.

    • EMR V5.4.0: Search for metastore.event.listeners and delete its value.

  3. Save the configuration.

    1. Click Save.

    2. In the dialog box, set the Execution Reason parameter and click Save.

  4. Restart the Hive service. For details, see Restart a service.

Step 2: Open the Hive CLI

  1. Log on to the master node of your EMR cluster over SSH. For details, see Log on to a cluster.

  2. Run the following command to start the Hive CLI:

    hive

    The following output confirms that the Hive CLI has started:

    Logging initialized using configuration in file:/etc/ecm/hive-conf-2.3.5-2.0.3/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.

Step 3: Create a table

Follow the path that matches your cluster's Metadata setting.

DLF Unified Metadata

When the cluster Metadata is set to DLF Unified Metadata, follow these steps.

Configure the DLF catalog (EMR V3.38.X and V5.3.X–V5.4.X only)

For clusters on EMR V3.39.0+ or V5.5.0+, the Data Lake Formation (DLF) catalog is pre-configured. Skip to Create the table.

Run the following commands in the Hive CLI. Replace all placeholder values with your actual settings.

SET iceberg.catalog=dlf_catalog;
SET iceberg.catalog.dlf_catalog.type=custom;
SET iceberg.catalog.dlf_catalog.io-impl=org.apache.iceberg.hadoop.HadoopFileIO;
SET iceberg.catalog.dlf_catalog.catalog-impl=org.apache.iceberg.aliyun.dlf.DlfCatalog;
SET iceberg.catalog.dlf_catalog.access.key.id=<yourAccessKeyId>;
SET iceberg.catalog.dlf_catalog.access.key.secret=<yourAccessKeySecret>;
SET iceberg.catalog.dlf_catalog.warehouse=<yourOSSWarehousePath>;
SET iceberg.catalog.dlf_catalog.dlf.catalog-id=<yourCatalogId>;
SET iceberg.catalog.dlf_catalog.dlf.endpoint=<yourDLFEndpoint>;
SET iceberg.catalog.dlf_catalog.dlf.region-id=<yourDLFRegionId>;

Object Storage Service (OSS) is used as the warehouse in this example. The parameters are described in the following table. For additional parameters, see Configuration of DLF metadata.

  • iceberg.catalog: Name of the Iceberg catalog. Set to dlf_catalog.

  • iceberg.catalog.dlf_catalog.type: Catalog type. Set to custom for DLF.

  • iceberg.catalog.dlf_catalog.io-impl: File I/O implementation class.

  • iceberg.catalog.dlf_catalog.catalog-impl: Catalog implementation class for DLF.

  • <yourAccessKeyId>: Your Alibaba Cloud AccessKey ID.

  • <yourAccessKeySecret>: Your Alibaba Cloud AccessKey secret.

  • <yourOSSWarehousePath>: OSS path of the data warehouse, for example, oss://mybucket/warehouse.

  • <yourCatalogId>: DLF catalog ID.

  • <yourDLFEndpoint>: DLF service endpoint.

  • <yourDLFRegionId>: ID of the region where DLF is deployed, for example, cn-hangzhou.
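For reference, the SET commands above might look like the following once the placeholders are filled in. Every value here (bucket name, catalog ID, endpoint, and masked credentials) is hypothetical; substitute your own settings.

```sql
-- Hypothetical example values; replace each value with your own settings.
SET iceberg.catalog=dlf_catalog;
SET iceberg.catalog.dlf_catalog.type=custom;
SET iceberg.catalog.dlf_catalog.io-impl=org.apache.iceberg.hadoop.HadoopFileIO;
SET iceberg.catalog.dlf_catalog.catalog-impl=org.apache.iceberg.aliyun.dlf.DlfCatalog;
SET iceberg.catalog.dlf_catalog.access.key.id=LTAI****************;
SET iceberg.catalog.dlf_catalog.access.key.secret=****************;
SET iceberg.catalog.dlf_catalog.warehouse=oss://mybucket/warehouse;
SET iceberg.catalog.dlf_catalog.dlf.catalog-id=123456789****;
SET iceberg.catalog.dlf_catalog.dlf.endpoint=dlf.cn-hangzhou.aliyuncs.com;
SET iceberg.catalog.dlf_catalog.dlf.region-id=cn-hangzhou;
```

These settings apply only to the current Hive CLI session, so they must be rerun in each new session.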

Create the table

EMR V3.39.0+ and V5.5.0+: Create an internal Iceberg table.

CREATE DATABASE iceberg_db;
USE iceberg_db;

CREATE TABLE hive_iceberg (
    id   BIGINT,
    data STRING
) PARTITIONED BY (
    dt STRING
) STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
  LOCATION 'hdfs://master-1-1.c-****:9000/user/hive/warehouse/iceberg_db/hive_iceberg'
  TBLPROPERTIES ('iceberg.catalog'='dlf');
INSERT INTO hive_iceberg VALUES (1, 'abc', '20230407'), (2, 'hello', '20230407');
SELECT * FROM hive_iceberg;

The LOCATION path can be an HDFS or OSS path. Update it to match your environment.
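Because dt is a partition column, predicates on it let Iceberg prune partitions at query time instead of scanning the whole table. A minimal sketch, assuming the hive_iceberg table and rows created above:

```sql
-- Query a single partition; Iceberg skips data files from other dt values.
SELECT id, data
FROM hive_iceberg
WHERE dt = '20230407';
```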

EMR V3.38.X and V5.3.X–V5.4.X: Create a Hive external table that maps to an existing Iceberg table.

CREATE EXTERNAL TABLE iceberg_db.sample_tbl
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'oss://some_bucket/some_path/sample'
TBLPROPERTIES (
  'iceberg.catalog' = '<yourCatalogName>',
  'name'            = 'iceberg_db.sample'
);

Replace <yourCatalogName> with the catalog name that matches your cluster version. For the correct catalog name, see Configuration of DLF metadata. The table iceberg_db.sample must already exist. To create it, see Basic usage.

Built-in MySQL or Self-managed RDS

When the cluster Metadata is set to Built-in MySQL or Self-managed RDS, choose one of the following approaches.

Default catalog: Run the following statement directly.

CREATE TABLE iceberg_db.sample_tbl (
  id   BIGINT,
  name STRING
)
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';

Custom catalog (example: Hive catalog): Iceberg supports several custom catalog types. This example uses a Hive catalog.

  1. Register the Hive catalog.

    SET iceberg.catalog.hive_catalog.type=hive;
  2. Create the table with the catalog reference.

    CREATE TABLE iceberg_db.sample_tbl (
      id   BIGINT,
      name STRING
    )
    STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
    TBLPROPERTIES ('iceberg.catalog'='hive_catalog');
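To confirm that the table is registered against the custom catalog, you can inspect its definition and table properties. A sketch, assuming the iceberg_db.sample_tbl table created above:

```sql
-- The output should show the HiveIcebergStorageHandler and the
-- iceberg.catalog=hive_catalog table property.
SHOW CREATE TABLE iceberg_db.sample_tbl;
DESCRIBE FORMATTED iceberg_db.sample_tbl;
```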

Step 4: Insert data

INSERT INTO iceberg_db.sample_tbl VALUES (4, 'd'), (5, 'e'), (6, 'f');

Troubleshooting

return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

This error occurs because the fb303 JAR is missing from the classpath. Add it before running the insert statement.

  • Non-DataLake clusters:

    add jar /usr/lib/hive-current/lib/libfb303-0.9.3.jar;
  • DataLake clusters:

    add jar /opt/apps/HIVE/hive-current/lib/libfb303-0.9.3.jar;

Step 5: Query data

SELECT * FROM iceberg_db.sample_tbl;
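Beyond a full scan, standard Hive predicates and aggregates also work against Iceberg tables through the storage handler. For example, assuming the rows inserted in Step 4:

```sql
-- Filter and aggregate on the Iceberg table.
SELECT * FROM iceberg_db.sample_tbl WHERE id >= 5;
SELECT COUNT(*) FROM iceberg_db.sample_tbl;
```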

Examples

Example 1: External table with DLF metadata (EMR V5.4.0)

This example creates a Hive external table on an EMR V5.4.0 cluster using DLF Unified Metadata and maps it to an existing Iceberg table.

Setup: Create a Hadoop cluster of EMR V5.4.0 with Metadata set to DLF Unified Metadata. For details, see Create a cluster.

  1. Complete Step 1 and Step 2.

  2. Configure the DLF catalog. Replace all placeholder values with your actual settings.

    SET iceberg.catalog=dlf_catalog;
    SET iceberg.catalog.dlf_catalog.type=custom;
    SET iceberg.catalog.dlf_catalog.io-impl=org.apache.iceberg.hadoop.HadoopFileIO;
    SET iceberg.catalog.dlf_catalog.catalog-impl=org.apache.iceberg.aliyun.dlf.DlfCatalog;
    SET iceberg.catalog.dlf_catalog.access.key.id=<yourAccessKeyId>;
    SET iceberg.catalog.dlf_catalog.access.key.secret=<yourAccessKeySecret>;
    SET iceberg.catalog.dlf_catalog.warehouse=<yourOSSWarehousePath>;
    SET iceberg.catalog.dlf_catalog.dlf.catalog-id=<yourCatalogId>;
    SET iceberg.catalog.dlf_catalog.dlf.endpoint=<yourDLFEndpoint>;
    SET iceberg.catalog.dlf_catalog.dlf.region-id=<yourDLFRegionId>;

    In this example, dlf_catalog is the catalog name and OSS is the warehouse. For parameter details, see Configuration of DLF metadata.

  3. Create a Hive external table named iceberg_db.sample_ext that maps to the existing iceberg_db.sample Iceberg table.

    CREATE EXTERNAL TABLE iceberg_db.sample_ext
    STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
    LOCATION 'oss://mybucket/iceberg-test/warehouse/iceberg_db.db/sample'
    TBLPROPERTIES (
        'iceberg.catalog' = 'dlf_catalog',
        'name'            = 'iceberg_db.sample'
    );
  4. Query the table.

    SELECT * FROM iceberg_db.sample_ext;

    Output:

    OK
    1    a
    2    b
    3    c
    Time taken: 19.075 seconds, Fetched: 3 row(s)
  5. Insert additional rows.

    INSERT INTO iceberg_db.sample_ext VALUES (4, 'd'), (5, 'e'), (6, 'f');
  6. Query again to confirm the insert.

    SELECT * FROM iceberg_db.sample_ext;

    Output:

    OK
    1    a
    2    b
    3    c
    4    d
    5    e
    6    f
    Time taken: 18.908 seconds, Fetched: 6 row(s)

Example 2: Internal table with built-in MySQL metadata (EMR V5.4.0)

This example creates a Hive internal table in Iceberg format on an EMR V5.4.0 cluster using the default Hive metadata.

Setup: Create a Hadoop cluster of EMR V5.4.0 with Metadata set to Built-in MySQL. For details, see Create a cluster.

  1. Complete Step 1 and Step 2.

  2. Create a database.

    CREATE DATABASE IF NOT EXISTS iceberg_db;
  3. Create the internal table.

    CREATE TABLE iceberg_db.sample_tbl (
      id   BIGINT,
      name STRING
    )
    STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';
  4. Insert rows.

    INSERT INTO iceberg_db.sample_tbl VALUES (1, 'a'), (2, 'b'), (3, 'c');
  5. Query the table.

    SELECT * FROM iceberg_db.sample_tbl;

    Output:

    OK
    1    a
    2    b
    3    c
    Time taken: 0.233 seconds, Fetched: 3 row(s)

What's next