All Products
Search
Document Center

E-MapReduce:Integrate Paimon with Hive

Last Updated:Apr 17, 2026

E-MapReduce lets you query Paimon data from Hive. This topic provides examples.

Prerequisites

You must have a DataLake or custom cluster with both Hive and Paimon selected. For more information, see Create a cluster.

Limitations

Querying Paimon data from Hive is supported only on E-MapReduce (EMR) clusters running EMR-3.46.0 or later and EMR-5.12.0 or later.

Procedure

  1. Connect to the cluster's master node over SSH. For more information, see Log on to the master node of a cluster.

  2. Use Spark to write data to Paimon.

    When services synchronize metadata to the Hive Metastore through a Hive catalog, you can directly query those tables in Hive. Alternatively, if you selected Metadata for the DLF Unified Metadata option during cluster creation, services can also synchronize metadata to a DLF catalog, which Hive can then query.

    This example demonstrates how to write data to the Hive catalog using Spark.

    1. Start Spark SQL.

      Use the spark-sql command-line tool to connect to a Paimon catalog and specify the Hive Metastore endpoint and data warehouse path.

      spark-sql \
        --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
        --conf spark.sql.catalog.paimon.metastore=hive \
        --conf spark.sql.catalog.paimon.uri=thrift://master-1-1:9083 \
        --conf spark.sql.catalog.paimon.warehouse=oss://<yourBucketName>/warehouse
      Note
      • spark.sql.catalog.paimon: Defines a catalog named paimon.

      • spark.sql.catalog.paimon.metastore: Specifies the metastore type for the catalog. Set this to hive to synchronize metadata with a Hive Metastore.

      • spark.sql.catalog.paimon.uri: The address and port of the Hive Metastore service. A value of thrift://master-1-1:9083 means Spark SQL connects to the Hive Metastore service running on the master-1-1 host and listening on port 9083 to retrieve metadata.

      • spark.sql.catalog.paimon.warehouse: Configures the root path for the data warehouse. Replace the value based on your environment. <yourBucketName> is your OSS bucket name. For more information about how to create a bucket, see Create a bucket.

    2. Create a database and a table.

      Create a test database named test_db and a table named test_tbl in Paimon, and then insert data.

      -- Switch to the paimon catalog.
      USE paimon;
      
      -- Create and use a test database in the paimon catalog.
      CREATE DATABASE test_db;
      USE test_db;
      
      -- Create a Paimon table.
      CREATE TABLE test_tbl (
          uuid int,
          name string,
          price double
      ) TBLPROPERTIES (
          'primary-key' = 'uuid'
      );
      
      -- Write data to the Paimon table.
      INSERT INTO test_tbl VALUES (1, 'apple', 3.5), (2, 'banana', 4.0), (3, 'cherry', 20.5);
  3. Query Paimon data from Hive.

    1. Connect to Hive.

    2. Run the following HiveQL statement to query the newly inserted data:

      select * from test_db.test_tbl;

      The statement returns the following output:

      OK
      1       apple   3.5
      2       banana  4.0
      3       cherry  20.5
  4. Query Paimon data through a Hive external table.

    You can also map a Paimon table from a specified path as a Hive external table and query its data directly.

    1. Connect to Hive.

    2. In Hive, run the following command to create the external table:

      CREATE EXTERNAL TABLE test_ext_tbl
      STORED BY 'org.apache.paimon.hive.PaimonStorageHandler'
      LOCATION 'oss:// <yourBucketName>/warehouse/test_db.db/test_tbl';

      Parameters

      • STORED BY 'org.apache.paimon.hive.PaimonStorageHandler': Specifies the Paimon storage handler.

      • LOCATION: Specifies the storage path of the Paimon table.

        Replace oss://<yourBucketName>/warehouse/test_db.db/test_tbl with the actual storage path of your Paimon table.

    3. Run the following SQL statement to query the external table:

      SELECT * FROM test_ext_tbl;