E-MapReduce: Integrate Paimon with Trino

Last Updated: Jun 26, 2023

E-MapReduce (EMR) allows you to query Paimon data in Trino. This topic describes how to configure the Trino service for Paimon and run such queries.

Limits

You can query Paimon data in Trino only in clusters of EMR V3.46.0 or a later minor version, or EMR V5.12.0 or a later minor version.

Procedure

  1. Modify the warehouse parameter.

    Paimon stores data and metadata in a file system such as Hadoop Distributed File System (HDFS) or an object storage system such as Object Storage Service (OSS). The root path for storage is specified by the warehouse parameter.

    1. Go to the Configure tab of the Trino service page.

      1. Log on to the EMR on ECS console.

      2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.

      3. On the EMR on ECS page, find the desired cluster and click Services in the Actions column.

      4. On the Services tab, find the Trino service and click Configure.

    2. On the Configure tab, click the paimon.properties tab.

    3. Change the value of the warehouse parameter to the root path for storage.

    4. Save the configuration.

      1. Click Save.

      2. In the dialog box that appears, configure the Execution Reason parameter and click Save.
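
    After the change is saved, the relevant entry on the paimon.properties tab would look roughly like the following. This is only a sketch: the bucket path is the placeholder used in the Spark example later in this topic, so substitute your own root path.

    ```properties
    # Root path under which Paimon stores data and metadata.
    # Placeholder value (assumption): replace <yourBucketName> with your own bucket.
    warehouse=oss://<yourBucketName>/warehouse
    ```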

  2. Optional. Modify the metastore parameter.

    The type of Metastore that Trino uses is set automatically based on the services that you selected when you created the cluster. To change it, modify the value of the metastore parameter on the paimon.properties tab of the Configure tab on the Trino service page.

    Valid values of the metastore parameter for Paimon:

    • filesystem: Metadata is stored in a file system or an object storage system.

    • hive: Metadata is synchronized to the specified Hive Metastore.

    • dlf: Metadata is synchronized to Data Lake Formation (DLF).
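
    For instance, to keep metadata alongside the data files, as the filesystem catalog in the Spark example of this topic does, the entry could be set as sketched below. Only the key and the three values listed above come from this topic; leave any other keys on the tab unchanged.

    ```properties
    # One of: filesystem, hive, dlf (the values listed above).
    metastore=filesystem
    ```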

  3. Restart the Trino service.

    1. In the upper-right corner of the Configure tab on the Trino service page, choose More > Restart.

    2. In the dialog box that appears, configure the Execution Reason parameter and click OK.

    3. In the Confirm message, click OK.

  4. Query data of Paimon.

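    The following example uses Spark to write data to a filesystem catalog and then queries that data in Trino. For Trino to find the table, the metastore parameter in paimon.properties must be set to filesystem, and the warehouse parameter must point to the same root path that is used in the Spark command.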

    1. Run the following command to start Spark SQL:

      spark-sql --conf spark.sql.catalog.paimon=org.apache.paimon.spark.SparkCatalog \
          --conf spark.sql.catalog.paimon.metastore=filesystem \
          --conf spark.sql.catalog.paimon.warehouse=oss://<yourBucketName>/warehouse

    2. Execute the following Spark SQL statements to create a Paimon table in the created catalog and write data to the table:

      -- Create a test database in the created catalog and use the database. 
      CREATE DATABASE paimon.test_db;
      USE paimon.test_db;
      
      -- Create a Paimon table. 
      CREATE TABLE test_tbl (
          uuid int,
          name string,
          price double
      ) TBLPROPERTIES (
          'primary-key' = 'uuid'
      );
      
      -- Write data to the Paimon table. 
      INSERT INTO test_tbl VALUES (1, 'apple', 3.5), (2, 'banana', 4.0), (3, 'cherry', 20.5);

    3. Run the following command to start Trino:

      trino --server master-1-1:9090 --catalog paimon --schema default --user hadoop

    4. Execute the following statements to query the data that is written to the Paimon table:

      USE test_db;
      
      SELECT * FROM test_tbl;
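
For scripted checks, the same query can be issued without opening the interactive prompt. The sketch below assumes the coordinator address, catalog, and user from the preceding step, and relies on the `--execute` option of the Trino CLI. The variable names and the fallback message are illustrative and not part of EMR.

```shell
# Assumed values, copied from the interactive example in this topic.
TRINO_SERVER="master-1-1:9090"
QUERY="SELECT * FROM test_db.test_tbl"

# Run the query once and exit; fall back to a message if the CLI is not installed.
if command -v trino >/dev/null 2>&1; then
    trino --server "$TRINO_SERVER" --catalog paimon --user hadoop --execute "$QUERY"
else
    echo "trino CLI not found on PATH" >&2
fi
```

If the Spark statements above succeeded and Trino reads the same warehouse path, the query should return the three rows that were inserted.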