E-MapReduce lets you query Paimon data from Hive. This topic provides examples.
Prerequisites
You must have a DataLake or custom cluster with both Hive and Paimon selected. For more information, see Create a cluster.
Limitations
Querying Paimon data from Hive is supported only on E-MapReduce (EMR) clusters running EMR-3.46.0 or later and EMR-5.12.0 or later.
Procedure
-
Connect to the cluster's master node over SSH. For more information, see Log on to the master node of a cluster.
-
Use Spark to write data to Paimon.
When services synchronize metadata to the Hive Metastore through a Hive catalog, you can directly query those tables in Hive. Alternatively, if you selected Metadata for the DLF Unified Metadata option during cluster creation, services can also synchronize metadata to a DLF catalog, which Hive can then query.
This example demonstrates how to write data to the Hive catalog using Spark.
-
Start Spark SQL.
Use the
spark-sqlcommand-line tool to connect to a Paimon catalog and specify the Hive Metastore endpoint and data warehouse path.spark-sql \ --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \ --conf spark.sql.catalog.paimon.metastore=hive \ --conf spark.sql.catalog.paimon.uri=thrift://master-1-1:9083 \ --conf spark.sql.catalog.paimon.warehouse=oss://<yourBucketName>/warehouseNote-
spark.sql.catalog.paimon: Defines a catalog named paimon. -
spark.sql.catalog.paimon.metastore: Specifies the metastore type for the catalog. Set this tohiveto synchronize metadata with a Hive Metastore. -
spark.sql.catalog.paimon.uri: The address and port of the Hive Metastore service. A value ofthrift://master-1-1:9083means Spark SQL connects to the Hive Metastore service running on themaster-1-1host and listening on port 9083 to retrieve metadata. -
spark.sql.catalog.paimon.warehouse: Configures the root path for the data warehouse. Replace the value based on your environment.<yourBucketName>is your OSS bucket name. For more information about how to create a bucket, see Create a bucket.
-
-
Create a database and a table.
Create a test database named
test_dband a table namedtest_tblin Paimon, and then insert data.-- Switch to the paimon catalog. USE paimon; -- Create and use a test database in the paimon catalog. CREATE DATABASE test_db; USE test_db; -- Create a Paimon table. CREATE TABLE test_tbl ( uuid int, name string, price double ) TBLPROPERTIES ( 'primary-key' = 'uuid' ); -- Write data to the Paimon table. INSERT INTO test_tbl VALUES (1, 'apple', 3.5), (2, 'banana', 4.0), (3, 'cherry', 20.5);
-
-
Query Paimon data from Hive.
-
Run the following HiveQL statement to query the newly inserted data:
select * from test_db.test_tbl;The statement returns the following output:
OK 1 apple 3.5 2 banana 4.0 3 cherry 20.5
-
Query Paimon data through a Hive external table.
You can also map a Paimon table from a specified path as a Hive external table and query its data directly.
-
In Hive, run the following command to create the external table:
CREATE EXTERNAL TABLE test_ext_tbl STORED BY 'org.apache.paimon.hive.PaimonStorageHandler' LOCATION 'oss:// <yourBucketName>/warehouse/test_db.db/test_tbl';Parameters
-
STORED BY 'org.apache.paimon.hive.PaimonStorageHandler': Specifies the Paimon storage handler. -
LOCATION: Specifies the storage path of the Paimon table.Replace
oss://<yourBucketName>/warehouse/test_db.db/test_tblwith the actual storage path of your Paimon table.
-
-
Run the following SQL statement to query the external table:
SELECT * FROM test_ext_tbl;