Use Hive internal tables or external tables to read from and write to Iceberg tables in your E-MapReduce (EMR) Hadoop cluster.
Supported operations
The following operations are supported when using Hive with Iceberg in EMR:
Creating a table (internal or external)
Inserting data
Querying data
Limitations
Hive can read from and write to Iceberg tables only on Hadoop clusters running EMR V3.38.0, EMR V5.4.0, or later minor versions.
On clusters running EMR V3.38.0 or EMR V5.4.0, a compatibility issue affects the Hive–Iceberg integration. Clear the hive.metastore.event.listeners (EMR V3.38.0) or metastore.event.listeners (EMR V5.4.0) parameter before proceeding. See Step 1 below.
When the cluster Metadata is set to DLF Unified Metadata:
On EMR V3.38.X and EMR V5.3.X–V5.4.X, only Hive external tables are supported.
On EMR V3.39.0+ and EMR V5.5.0+, both internal and external tables are supported.
Prerequisites
Before you begin, ensure that you have:
A Hadoop cluster running EMR V3.38.0, EMR V5.4.0, or a later minor version. To create one, see Create a cluster.
Step 1: Fix the Hive–Iceberg compatibility issue (EMR V3.38.0 and V5.4.0 only)
Skip this step if your cluster runs a version other than EMR V3.38.0 or EMR V5.4.0.
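The version gate above can be expressed as a tiny check. The sketch below is illustrative only: the helper name and the EMR-x.y.z version-string format are assumptions, not an EMR tool; the affected versions come from this section.

```shell
# Hypothetical helper: prints "yes" only for the two releases this
# section names as affected by the compatibility issue.
needs_listener_fix() {
  case "$1" in
    EMR-3.38.0|EMR-5.4.0) echo yes ;;
    *) echo no ;;
  esac
}

needs_listener_fix EMR-5.4.0   # prints "yes"
needs_listener_fix EMR-5.5.0   # prints "no"
```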
Go to the Hive service configuration page.
Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.
In the top navigation bar, select the region where your cluster resides and select a resource group.
Find the target cluster and click Services in the Actions column.
On the Services tab, click Configure in the Hive section.
Clear the listener parameter.
EMR V3.38.0: Search for hive.metastore.event.listeners and delete its value.

EMR V5.4.0: Search for metastore.event.listeners and delete its value.
Save the configuration.
Click Save.
In the dialog box, set the Execution Reason parameter and click Save.
Restart the Hive service. For details, see Restart a service.
Step 2: Open the Hive CLI
Log on to the master node of your EMR cluster over SSH. For details, see Log on to a cluster.
Run the following command to start the Hive CLI:
hive
The following output confirms that the Hive CLI has started:
Logging initialized using configuration in file:/etc/ecm/hive-conf-2.3.5-2.0.3/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Step 3: Create a table
Follow the path that matches your cluster's Metadata setting.
DLF Unified Metadata
When the cluster Metadata is set to DLF Unified Metadata, follow these steps.
Configure the DLF catalog (EMR V3.38.X and V5.3.X–V5.4.X only)
For clusters on EMR V3.39.0+ or V5.5.0+, the Data Lake Formation (DLF) catalog is pre-configured. Skip to Create the table.
Run the following commands in the Hive CLI. Replace all placeholder values with your actual settings.
SET iceberg.catalog=dlf_catalog;
SET iceberg.catalog.dlf_catalog.type=custom;
SET iceberg.catalog.dlf_catalog.io-impl=org.apache.iceberg.hadoop.HadoopFileIO;
SET iceberg.catalog.dlf_catalog.catalog-impl=org.apache.iceberg.aliyun.dlf.DlfCatalog;
SET iceberg.catalog.dlf_catalog.access.key.id=<yourAccessKeyId>;
SET iceberg.catalog.dlf_catalog.access.key.secret=<yourAccessKeySecret>;
SET iceberg.catalog.dlf_catalog.warehouse=<yourOSSWarehousePath>;
SET iceberg.catalog.dlf_catalog.dlf.catalog-id=<yourCatalogId>;
SET iceberg.catalog.dlf_catalog.dlf.endpoint=<yourDLFEndpoint>;
SET iceberg.catalog.dlf_catalog.dlf.region-id=<yourDLFRegionId>;
Object Storage Service (OSS) is used as the warehouse in this example. The parameters are described in the following table. For additional parameters, see Configuration of DLF metadata.
| Parameter | Description |
|---|---|
| iceberg.catalog | Name of the Iceberg catalog. Set to dlf_catalog. |
| iceberg.catalog.dlf_catalog.type | Catalog type. Set to custom for DLF. |
| iceberg.catalog.dlf_catalog.io-impl | File I/O implementation class. |
| iceberg.catalog.dlf_catalog.catalog-impl | Catalog implementation class for DLF. |
| <yourAccessKeyId> | Your Alibaba Cloud AccessKey ID. |
| <yourAccessKeySecret> | Your Alibaba Cloud AccessKey secret. |
| <yourOSSWarehousePath> | OSS path of the data warehouse, for example, oss://mybucket/warehouse. |
| <yourCatalogId> | DLF catalog ID. |
| <yourDLFEndpoint> | DLF service endpoint. |
| <yourDLFRegionId> | Region ID where DLF is deployed, for example, cn-hangzhou. |
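These SET statements apply only to the current Hive session, so they must be reissued each time you start the CLI. One way to avoid retyping them is to save them to a file and pass it to the CLI with hive -i, which runs an initialization script at startup. The sketch below is a minimal illustration; the file path is an assumption, and the placeholders must still be replaced with your real values.

```shell
# Write the catalog settings to an initialization file (truncated here to
# the first few statements; add the remaining SET lines the same way).
cat > /tmp/dlf_catalog.hql <<'EOF'
SET iceberg.catalog=dlf_catalog;
SET iceberg.catalog.dlf_catalog.type=custom;
SET iceberg.catalog.dlf_catalog.io-impl=org.apache.iceberg.hadoop.HadoopFileIO;
SET iceberg.catalog.dlf_catalog.catalog-impl=org.apache.iceberg.aliyun.dlf.DlfCatalog;
EOF

# Start the Hive CLI with the settings applied automatically:
# hive -i /tmp/dlf_catalog.hql
```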
Create the table
EMR V3.39.0+ and V5.5.0+: Create an internal Iceberg table.
CREATE DATABASE iceberg_db;
USE iceberg_db;
CREATE TABLE hive_iceberg (
id BIGINT,
data STRING
) PARTITIONED BY (
dt STRING
) STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'hdfs://master-1-1.c-****:9000/user/hive/warehouse/iceberg_db/hive_iceberg'
TBLPROPERTIES ('iceberg.catalog'='dlf');
INSERT INTO hive_iceberg VALUES (1, 'abc', '20230407'), (2, 'hello', '20230407');
SELECT * FROM hive_iceberg;
The LOCATION path can be an HDFS or OSS path. Update it to match your environment.
EMR V3.38.X and V5.3.X–V5.4.X: Create a Hive external table that maps to an existing Iceberg table.
CREATE EXTERNAL TABLE iceberg_db.sample_tbl
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'oss://some_bucket/some_path/sample'
TBLPROPERTIES (
'iceberg.catalog' = '<yourCatalogName>',
'name' = 'iceberg_db.sample'
);
Replace <yourCatalogName> with the catalog name that matches your cluster version. For the correct catalog name, see Configuration of DLF metadata. The table iceberg_db.sample must already exist. To create it, see Basic usage.
Built-in MySQL or Self-managed RDS
When the cluster Metadata is set to Built-in MySQL or Self-managed RDS, choose one of the following approaches.
Default catalog: Run the following statement directly.
CREATE TABLE iceberg_db.sample_tbl (
id BIGINT,
name STRING
)
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';
Custom catalog (example: Hive catalog): Iceberg supports several custom catalog types. This example uses a Hive catalog.
Register the Hive catalog.
SET iceberg.catalog.hive_catalog.type=hive;
Create the table with the catalog reference.
CREATE TABLE iceberg_db.sample_tbl (
id BIGINT,
name STRING
)
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
TBLPROPERTIES ('iceberg.catalog'='hive_catalog');
Step 4: Insert data
INSERT INTO iceberg_db.sample_tbl VALUES (4, 'd'), (5, 'e'), (6, 'f');
Troubleshooting
return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
This error occurs because the fb303 JAR is missing from the classpath. Add it before running the insert statement.
Non-DataLake clusters:
add jar /usr/lib/hive-current/lib/libfb303-0.9.3.jar
DataLake clusters:
add jar /opt/apps/HIVE/hive-current/lib/libfb303-0.9.3.jar
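The jar's exact path and version suffix can differ between releases, so it can help to search for it rather than hard-coding the path. The helper below is an illustrative sketch, not an EMR utility; the directories to search are passed in as arguments.

```shell
# Hypothetical helper: print the first fb303 jar found under the given
# directories (missing directories are silently ignored).
find_fb303() {
  find "$@" -name 'libfb303-*.jar' 2>/dev/null | head -n 1
}

# On a cluster you would run, for example:
# find_fb303 /usr/lib/hive-current/lib /opt/apps/HIVE/hive-current/lib
```

The resulting path can then be used in the add jar statement.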
Step 5: Query data
SELECT * FROM iceberg_db.sample_tbl;
Examples
Example 1: External table with DLF metadata (EMR V5.4.0)
This example creates a Hive external table on an EMR V5.4.0 cluster using DLF Unified Metadata and maps it to an existing Iceberg table.
Setup: Create a Hadoop cluster of EMR V5.4.0 with Metadata set to DLF Unified Metadata. For details, see Create a cluster.
Configure the DLF catalog. Replace all placeholder values with your actual settings.
SET iceberg.catalog=dlf_catalog;
SET iceberg.catalog.dlf_catalog.type=custom;
SET iceberg.catalog.dlf_catalog.io-impl=org.apache.iceberg.hadoop.HadoopFileIO;
SET iceberg.catalog.dlf_catalog.catalog-impl=org.apache.iceberg.aliyun.dlf.DlfCatalog;
SET iceberg.catalog.dlf_catalog.access.key.id=<yourAccessKeyId>;
SET iceberg.catalog.dlf_catalog.access.key.secret=<yourAccessKeySecret>;
SET iceberg.catalog.dlf_catalog.warehouse=<yourOSSWarehousePath>;
SET iceberg.catalog.dlf_catalog.dlf.catalog-id=<yourCatalogId>;
SET iceberg.catalog.dlf_catalog.dlf.endpoint=<yourDLFEndpoint>;
SET iceberg.catalog.dlf_catalog.dlf.region-id=<yourDLFRegionId>;
In this example, dlf_catalog is the catalog name and OSS is the warehouse. For parameter details, see Configuration of DLF metadata.
Create a Hive external table named iceberg_db.sample_ext that maps to the existing iceberg_db.sample Iceberg table.
CREATE EXTERNAL TABLE iceberg_db.sample_ext
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
LOCATION 'oss://mybucket/iceberg-test/warehouse/iceberg_db.db/sample'
TBLPROPERTIES (
'iceberg.catalog' = 'dlf_catalog',
'name' = 'iceberg_db.sample'
);
Query the table.
SELECT * FROM iceberg_db.sample_ext;
Output:
OK
1 a
2 b
3 c
Time taken: 19.075 seconds, Fetched: 3 row(s)
Insert additional rows.
INSERT INTO iceberg_db.sample_ext VALUES (4, 'd'), (5, 'e'), (6, 'f');
Query again to confirm the insert.
SELECT * FROM iceberg_db.sample_ext;
Output:
OK
1 a
2 b
3 c
4 d
5 e
6 f
Time taken: 18.908 seconds, Fetched: 6 row(s)
Example 2: Internal table with built-in MySQL metadata (EMR V5.4.0)
This example creates a Hive internal table in Iceberg format on an EMR V5.4.0 cluster using the default Hive metadata.
Setup: Create a Hadoop cluster of EMR V5.4.0 with Metadata set to Built-in MySQL. For details, see Create a cluster.
Create a database.
CREATE DATABASE IF NOT EXISTS iceberg_db;
Create the internal table.
CREATE TABLE iceberg_db.sample_tbl (
id BIGINT,
name STRING
)
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';
Insert rows.
INSERT INTO iceberg_db.sample_tbl VALUES (1, 'a'), (2, 'b'), (3, 'c');
Query the table.
SELECT * FROM iceberg_db.sample_tbl;
Output:
OK
1 a
2 b
3 c
Time taken: 0.233 seconds, Fetched: 3 row(s)
What's next
For an overview of Iceberg support in EMR, see Iceberg.
For the full list of DLF catalog parameters, see Configuration of DLF metadata.