E-MapReduce: Use Hive to access Iceberg data

Last Updated: Jul 21, 2025

You can use a Hive internal table or a Hive external table to access Iceberg data. This topic describes how to read data from and write data to Iceberg tables by using Hive in your E-MapReduce (EMR) cluster.

Prerequisites

An EMR Hadoop cluster is created. For more information, see Create a cluster.

Note

This topic applies only to Hadoop clusters of EMR V3.38.0, EMR V5.4.0, or a later minor version of either.

Limits

You can use Hive to read data from or write data to an Iceberg table only if your Hadoop cluster is of EMR V3.38.0, EMR V5.4.0, or a later minor version of either.

Procedure

  1. Optional: If your Hadoop cluster is of EMR V3.38.0 or EMR V5.4.0, modify configuration items.

    This step is required because the integration between Hive and Iceberg has compatibility issues in EMR V3.38.0 and EMR V5.4.0.

    1. Go to the Hive service page.

      1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

      2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.

      3. On the EMR on ECS page, find the desired cluster and click Services in the Actions column.

      4. On the Services tab, click Configure in the Hive section.

    2. Modify configuration items based on the version of the cluster.

      • If the cluster is of EMR V3.38.0, search for the hive.metastore.event.listeners parameter on the Configure tab of the Hive service page and delete the value of this parameter.

      • If the cluster is of EMR V5.4.0, search for the metastore.event.listeners parameter on the Configure tab of the Hive service page and delete the value of this parameter.

    3. Save the configurations.

      1. Click Save.

      2. In the dialog box that appears, configure the Execution Reason parameter and click Save.

    4. Restart the Hive service. For more information, see Restart a service.

  2. Open the Hive CLI.

    1. Log on to the master node of your EMR cluster in SSH mode. For more information, see Log on to a cluster.

    2. Run the following command to open the Hive CLI:

      hive

      If the following information is returned, the Hive CLI is started:

      Logging initialized using configuration in file:/etc/ecm/hive-conf-2.3.5-2.0.3/hive-log4j2.properties Async: true
      Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
  3. Create a table.

    • If you set the Metadata parameter to DLF Unified Metadata when you create the cluster, perform the following steps to create a table:

      In this case, you can use only a Hive external table to access data in an Iceberg table.

      1. For clusters of EMR V3.38.X and clusters of EMR V5.3.X to EMR V5.4.X, you must run the following commands to allow Hive to access the metadata of Iceberg tables in Data Lake Formation (DLF). For clusters of other versions, the required settings are configured by default.

        SET iceberg.catalog=dlf_catalog;
        SET iceberg.catalog.dlf_catalog.type=custom;
        SET iceberg.catalog.dlf_catalog.io-impl=org.apache.iceberg.hadoop.HadoopFileIO;
        SET iceberg.catalog.dlf_catalog.catalog-impl=org.apache.iceberg.aliyun.dlf.DlfCatalog;
        SET iceberg.catalog.dlf_catalog.access.key.id=<yourAccessKeyId>;
        SET iceberg.catalog.dlf_catalog.access.key.secret=<yourAccessKeySecret>;
        SET iceberg.catalog.dlf_catalog.warehouse=<yourOSSWarehousePath>;
        SET iceberg.catalog.dlf_catalog.dlf.catalog-id=<yourCatalogId>;
        SET iceberg.catalog.dlf_catalog.dlf.endpoint=<yourDLFEndpoint>;
        SET iceberg.catalog.dlf_catalog.dlf.region-id=<yourDLFRegionId>;
        Note

        In the example, dlf_catalog is the name of the catalog that you created. Object Storage Service (OSS) is used as the data warehouse. For more information about other parameters, see Configuration of DLF metadata.

      2. Create an Iceberg table.

        • For clusters of EMR V3.39.0 or a later minor version and clusters of EMR V5.5.0 or a later minor version, create an internal Iceberg table.

          CREATE DATABASE iceberg_db;
          USE iceberg_db;
          CREATE TABLE hive_iceberg (
              id BIGINT,
              data STRING
          ) PARTITIONED BY (
              dt STRING
          ) STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
            LOCATION 'hdfs://master-1-1.c-****:9000/user/hive/warehouse/iceberg_db/hive_iceberg'
            TBLPROPERTIES ('iceberg.catalog'='dlf');
          INSERT INTO hive_iceberg VALUES (1, 'abc', '20230407'), (2, 'hello', '20230407');
          SELECT * FROM hive_iceberg;
          Note

          LOCATION 'hdfs://master-1-1.c-****:9000/user/hive/warehouse/iceberg_db/hive_iceberg': the path where the data is stored. HDFS and OSS paths are supported. Change the path based on your business requirements.

        • For clusters of EMR V3.38.X and clusters of EMR V5.3.X to EMR V5.4.X, create an external Iceberg table to map the iceberg_db.sample table.

          CREATE EXTERNAL TABLE iceberg_db.sample_tbl
          STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
          LOCATION 'oss://some_bucket/some_path/sample'
          TBLPROPERTIES (
            'iceberg.catalog'='<yourCatalogName>',
            'name'='iceberg_db.sample'
          );
          Note

          Parameter settings:

          • <yourCatalogName> specifies the name of the catalog, which varies based on the version of your cluster. You can view the catalog name that matches the version of your cluster in Configuration of DLF metadata.

          • iceberg_db.sample is the name of an existing Iceberg table. You can create the iceberg_db.sample table in advance. For more information, see Basic usage.

    • If you set the Metadata parameter to Built-in MySQL or Self-managed RDS when you create the cluster, perform the following steps to create a table:

      • If you use the default catalog, execute the following statement to create a table:

        CREATE TABLE iceberg_db.sample_tbl (
          id BIGINT,
          name STRING
        )
        STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';
      • If you use a custom catalog, perform the following steps:

        Note

        Iceberg supports various types of custom catalogs, such as Hive and Hadoop catalogs. In this example, a Hive catalog is used. A sketch of a Hadoop catalog configuration is provided at the end of this step.

        1. Run the following command to allow Hive to access Iceberg data:

          SET iceberg.catalog.hive_catalog.type=hive;
        2. Create a Hive table.

          CREATE TABLE iceberg_db.sample_tbl (
            id BIGINT,
            name STRING
          )
          STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
          TBLPROPERTIES ('iceberg.catalog'='hive_catalog');
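
        Note

        If you use a Hadoop catalog instead of a Hive catalog, Iceberg's Hive runtime accepts the same iceberg.catalog.<name>.* properties. The following is a minimal sketch; hadoop_catalog is a placeholder catalog name and <yourWarehousePath> is a placeholder path:

          SET iceberg.catalog.hadoop_catalog.type=hadoop;
          SET iceberg.catalog.hadoop_catalog.warehouse=<yourWarehousePath>;

        You would then reference the catalog in TBLPROPERTIES ('iceberg.catalog'='hadoop_catalog') when you create the table.
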
  4. Execute the following statement to insert data into the table that you created:

    INSERT INTO iceberg_db.sample_tbl VALUES (4, 'd'), (5, 'e'), (6, 'f');

    If the return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask error message appears when you insert data, the fb303 JAR cannot be found on the classpath. To resolve this issue, run one of the following commands in the Hive CLI:

    • Non-DataLake clusters

      ADD JAR /usr/lib/hive-current/lib/libfb303-0.9.3.jar;
    • DataLake clusters

      ADD JAR /opt/apps/HIVE/hive-current/lib/libfb303-0.9.3.jar;
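
    Note

    The ADD JAR statement takes effect only for the current Hive session, so run it again in each new session if the error recurs. As a quick check, you can list the JARs on the session classpath:

      -- Confirm that the fb303 JAR was added to the session classpath.
      LIST JARS;
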
  5. Execute the following statement to view data in the table that you created:

    SELECT * FROM iceberg_db.sample_tbl;
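
    To confirm that the table is backed by Iceberg, you can inspect its metadata from the Hive CLI. This is a minimal check, and the exact output layout varies by Hive version:

      -- The storage handler shown in the output should be
      -- org.apache.iceberg.mr.hive.HiveIcebergStorageHandler.
      DESCRIBE FORMATTED iceberg_db.sample_tbl;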

Example 1

Use a Hive external table to read data from or write data to an existing Iceberg table. In this example, DLF is used to manage metadata.

  1. Create a Hadoop cluster of EMR V5.4.0. Set the Metadata parameter to DLF Unified Metadata when you create the cluster. For more information, see Create a cluster.

  2. Modify configuration items. For more information, see Step 1.

  3. Open the Hive CLI. For more information, see Step 2.

  4. Run the following commands to allow Hive to access the metadata of Iceberg tables in DLF. Replace the placeholder values in the commands with actual values.

    SET iceberg.catalog=dlf_catalog;
    SET iceberg.catalog.dlf_catalog.type=custom;
    SET iceberg.catalog.dlf_catalog.io-impl=org.apache.iceberg.hadoop.HadoopFileIO;
    SET iceberg.catalog.dlf_catalog.catalog-impl=org.apache.iceberg.aliyun.dlf.DlfCatalog;
    SET iceberg.catalog.dlf_catalog.access.key.id=<yourAccessKeyId>;
    SET iceberg.catalog.dlf_catalog.access.key.secret=<yourAccessKeySecret>;
    SET iceberg.catalog.dlf_catalog.warehouse=<yourOSSWarehousePath>;
    SET iceberg.catalog.dlf_catalog.dlf.catalog-id=<yourCatalogId>;
    SET iceberg.catalog.dlf_catalog.dlf.endpoint=<yourDLFEndpoint>;
    SET iceberg.catalog.dlf_catalog.dlf.region-id=<yourDLFRegionId>;
    Note

    In the example, dlf_catalog is the default name of the catalog. OSS is used as the data warehouse. For more information about other parameters, see Configuration of DLF metadata.

  5. Execute the following statement to create a Hive external table named iceberg_db.sample_ext and map the table to the existing iceberg_db.sample table:

    CREATE EXTERNAL TABLE iceberg_db.sample_ext
    STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler'
    LOCATION 'oss://mybucket/iceberg-test/warehouse/iceberg_db.db/sample'
    TBLPROPERTIES (
        'iceberg.catalog'='dlf_catalog',
        'name'='iceberg_db.sample'
    );
  6. Execute the following statement to query data from the iceberg_db.sample_ext table:

    SELECT * FROM iceberg_db.sample_ext;

    The following output is returned:

    OK
    1    a
    2    b
    3    c
    Time taken: 19.075 seconds, Fetched: 3 row(s)
  7. Execute the following statement to insert data into the iceberg_db.sample_ext table:

    INSERT INTO iceberg_db.sample_ext VALUES (4, 'd'), (5, 'e'), (6, 'f');
  8. Execute the following statement to query data from the iceberg_db.sample_ext table:

    SELECT * FROM iceberg_db.sample_ext;

    The following output is returned:

    OK
    1    a
    2    b
    3    c
    4    d
    5    e
    6    f
    Time taken: 18.908 seconds, Fetched: 6 row(s)
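
    Note

    If you no longer need the Hive-side mapping, you can drop the external table. Assuming default settings (the external.table.purge property is not set), standard Hive external-table semantics apply: only the metastore entry is removed, and the Iceberg data files in OSS stay in place. A minimal sketch:

      -- Removes only the Hive mapping; the Iceberg data in OSS is not deleted.
      DROP TABLE iceberg_db.sample_ext;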

Example 2

Create a Hive internal table in the Iceberg format and read data from or write data to the table. In this example, the built-in Hive metastore is used to manage metadata.

  1. Create a Hadoop cluster of EMR V5.4.0. Set the Metadata parameter to Built-in MySQL when you create the cluster. For more information, see Create a cluster.

  2. Modify configuration items. For more information, see Step 1.

  3. Open the Hive CLI. For more information, see Step 2.

  4. Execute the following statement to create a database named iceberg_db:

    CREATE DATABASE IF NOT EXISTS iceberg_db;
  5. Execute the following statement to create a Hive internal table named iceberg_db.sample_tbl:

    CREATE TABLE iceberg_db.sample_tbl (
      id BIGINT,
      name STRING
    )
    STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';
  6. Execute the following statement to insert data into the iceberg_db.sample_tbl table:

    INSERT INTO iceberg_db.sample_tbl VALUES (1, 'a'), (2, 'b'), (3, 'c');
  7. Execute the following statement to query data from the iceberg_db.sample_tbl table:

    SELECT * FROM iceberg_db.sample_tbl;

    The following output is returned:

    OK
    1    a
    2    b
    3    c
    Time taken: 0.233 seconds, Fetched: 3 row(s)
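
    The Iceberg-backed table supports standard HiveQL reads. For example, you can run a filtered query (a minimal illustration):

      -- Standard predicates and projections work as with any Hive table.
      SELECT id, name FROM iceberg_db.sample_tbl WHERE id >= 2;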

References

  • For more information about Iceberg, see Iceberg.

  • For more information about the parameters that you must configure when you use DLF to manage the metadata of an Iceberg table, see Configuration of DLF metadata.