E-MapReduce:Integrate Paimon with Hive

Last Updated: Mar 26, 2026

E-MapReduce (EMR) lets you query Paimon tables directly from Hive using two approaches: through a Hive catalog (where Spark writes metadata to Hive Metastore) or through a Hive external table mapped to a Paimon table path in OSS.

Prerequisites

Before you begin, ensure that you have:

  • A DataLake or custom EMR cluster with both Hive and Paimon services installed

  • EMR V3.46.0 or a later minor version, or EMR V5.12.0 or a later minor version

For instructions on creating a cluster, see Create a cluster.

Query via Hive catalog or DLF catalog

Use Spark to write Paimon data with a Hive catalog, then query the data from Hive. If you selected DLF Unified Metadata for Metadata when creating your EMR cluster, you can use a DLF catalog instead — it synchronizes metadata to Data Lake Formation (DLF) and makes it queryable from Hive.
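If your cluster uses DLF Unified Metadata, the catalog configuration changes only in where metadata is stored. The sketch below assumes that your Paimon version accepts dlf as the metastore value and that no Metastore URI is needed in that mode; verify both against the release notes for your EMR version:

```
spark-sql \
  --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
  --conf spark.sql.catalog.paimon.metastore=dlf \
  --conf spark.sql.catalog.paimon.warehouse=oss://<yourBucketName>/warehouse
```

After the session starts, the same USE, CREATE, and INSERT statements apply; the metadata is synchronized to DLF instead of Hive Metastore.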

Step 1: Start Spark SQL with a Hive catalog

spark-sql \
  --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
  --conf spark.sql.catalog.paimon.metastore=hive \
  --conf spark.sql.catalog.paimon.uri=thrift://master-1-1:9083 \
  --conf spark.sql.catalog.paimon.warehouse=oss://<yourBucketName>/warehouse

The parameters have the following meanings:

  • spark.sql.catalog.paimon: Defines a Spark catalog named paimon, backed by org.apache.paimon.spark.SparkCatalog.

  • spark.sql.catalog.paimon.metastore: Set to hive to synchronize metadata to Hive Metastore.

  • spark.sql.catalog.paimon.uri: The address and port of Hive Metastore. The value thrift://master-1-1:9083 connects to Hive Metastore running on the master-1-1 node at port 9083.

  • spark.sql.catalog.paimon.warehouse: The physical location of the data warehouse in OSS. Replace <yourBucketName> with your bucket name.
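If you query the catalog regularly, the same four properties can be set once in spark-defaults.conf instead of being passed on every invocation. This is a sketch; the file location and whether cluster-level Spark configuration is managed through the EMR console instead depend on your setup:

```
spark.sql.catalog.paimon            org.apache.paimon.spark.SparkCatalog
spark.sql.catalog.paimon.metastore  hive
spark.sql.catalog.paimon.uri        thrift://master-1-1:9083
spark.sql.catalog.paimon.warehouse  oss://<yourBucketName>/warehouse
```

With these in place, running spark-sql with no --conf flags starts a session in which the paimon catalog is already available.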

Step 2: Create a Paimon table and write data

-- Switch to the paimon catalog.
USE paimon;

-- Create a test database and switch to it.
CREATE DATABASE test_db;
USE test_db;

-- Create a Paimon table with a primary key.
CREATE TABLE test_tbl (
    uuid int,
    name string,
    price double
) TBLPROPERTIES (
    'primary-key' = 'uuid'
);

-- Write data to the table.
INSERT INTO test_tbl VALUES (1, 'apple', 3.5), (2, 'banana', 4.0), (3, 'cherry', 20.5);
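Because test_tbl declares uuid as its primary key, inserting a row with an existing key overwrites the earlier row rather than adding a duplicate (Paimon's default merge behavior for primary-key tables). A quick check you can run in the same Spark SQL session; the new price value is illustrative:

```
-- Re-insert uuid 1 with a different price; Paimon upserts on the primary key.
INSERT INTO test_tbl VALUES (1, 'apple', 4.5);

-- The table still contains three rows, with the row for uuid 1 updated.
SELECT * FROM test_tbl ORDER BY uuid;
```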

Step 3: Start the Hive CLI

hive

Step 4: Query the Paimon table

SELECT * FROM test_db.test_tbl;

Query via an external table

Map a Hive external table to an existing Paimon table path in OSS, then query it directly.

CREATE EXTERNAL TABLE test_ext_tbl
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
LOCATION 'oss://<yourBucketName>/warehouse/test_db.db/test_tbl';

SELECT * FROM test_ext_tbl;

Replace <yourBucketName> with your OSS bucket name.
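Once created, the external table behaves like any other Hive table for reads. For example, these hypothetical follow-up queries filter and aggregate the sample data:

```
SELECT name, price FROM test_ext_tbl WHERE price > 4.0;

SELECT COUNT(*) FROM test_ext_tbl;
```

Writes made to the underlying Paimon table from Spark become visible to these queries without recreating the external table, because the table is only a mapping onto the Paimon path.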