E-MapReduce (EMR) lets you query Paimon tables directly from Hive using two approaches: through a Hive catalog (where Spark writes metadata to Hive Metastore) or through a Hive external table mapped to a Paimon table path in OSS.
Prerequisites
Before you begin, ensure that you have:
- A DataLake or custom EMR cluster with both Hive and Paimon services installed
- EMR V3.46.0 or a later minor version, or EMR V5.12.0 or a later minor version
For instructions on creating a cluster, see Create a cluster.
Query via Hive catalog or DLF catalog
Use Spark to write Paimon data with a Hive catalog, then query the data from Hive. If you selected DLF Unified Metadata for Metadata when you created your EMR cluster, you can use a DLF catalog instead, which synchronizes metadata to Data Lake Formation (DLF) and also makes the tables queryable from Hive (see the DLF variant after the parameter table below).
Step 1: Start Spark SQL with a Hive catalog
spark-sql \
--conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
--conf spark.sql.catalog.paimon.metastore=hive \
--conf spark.sql.catalog.paimon.uri=thrift://master-1-1:9083 \
--conf spark.sql.catalog.paimon.warehouse=oss://<yourBucketName>/warehouse
| Parameter | Description |
|---|---|
| spark.sql.catalog.paimon | Defines a Spark catalog named paimon, backed by org.apache.paimon.spark.SparkCatalog. |
| spark.sql.catalog.paimon.metastore | Set to hive to synchronize metadata to Hive Metastore. |
| spark.sql.catalog.paimon.uri | The address and port of Hive Metastore. The value thrift://master-1-1:9083 connects to the Hive Metastore that runs on the master-1-1 node on port 9083. |
| spark.sql.catalog.paimon.warehouse | The physical location of the data warehouse in OSS. Replace <yourBucketName> with the name of your OSS bucket. |
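If your cluster uses DLF Unified Metadata, the startup command is similar. The following is a minimal sketch that assumes your Paimon and EMR versions accept dlf as the metastore type; check the Paimon service configuration of your EMR release if the option differs:
spark-sql \
--conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
--conf spark.sql.catalog.paimon.metastore=dlf \
--conf spark.sql.catalog.paimon.warehouse=oss://<yourBucketName>/warehouse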
Step 2: Create a Paimon table and write data
-- Switch to the paimon catalog.
USE paimon;
-- Create a test database and switch to it.
CREATE DATABASE test_db;
USE test_db;
-- Create a Paimon table with a primary key.
CREATE TABLE test_tbl (
uuid int,
name string,
price double
) TBLPROPERTIES (
'primary-key' = 'uuid'
);
-- Write data to the table.
INSERT INTO test_tbl VALUES (1, 'apple', 3.5), (2, 'banana', 4.0), (3, 'cherry', 20.5);
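You can verify the write from the same Spark SQL session before switching to Hive:
-- Read back the rows that were just inserted.
SELECT * FROM test_tbl;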
Step 3: Start the Hive CLI
hive
Step 4: Query the Paimon table
SELECT * FROM test_db.test_tbl;
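With the rows inserted in Step 2, the query should return results similar to the following (row order may vary):
1	apple	3.5
2	banana	4.0
3	cherry	20.5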
Query via an external table
Map a Hive external table to an existing Paimon table path in OSS, then query it directly.
CREATE EXTERNAL TABLE test_ext_tbl
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
LOCATION 'oss://<yourBucketName>/warehouse/test_db.db/test_tbl';
SELECT * FROM test_ext_tbl;
Replace <yourBucketName> with your OSS bucket name.
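If the query returns no rows, confirm that LOCATION points to the same warehouse path used in the Hive catalog example. For instance, you can inspect the resolved table definition from the Hive CLI:
DESCRIBE FORMATTED test_ext_tbl;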