Apache Hudi is a data lake framework that supports updates and deletes on data stored in Hadoop-compatible file systems, and also lets you consume change data incrementally. For more information, see Apache Hudi. This topic describes how to read data from and write data to a Hudi table in EMR Serverless Spark.
Prerequisites
A workspace is created. For more information, see Create a workspace.
Procedure
Step 1: Create an SQL session
Go to the Sessions page.
Log on to the EMR console.
In the left-side navigation pane, choose EMR Serverless > Spark.
On the Spark page, click the name of the workspace that you want to manage.
In the left-side navigation pane of the EMR Serverless Spark page, choose Operation Center > Sessions.
On the SQL Sessions tab, click Create SQL Session.
In the Spark Configuration section of the Create SQL Session page, add the following code and click Create. For more information, see Manage SQL sessions.
In the following code, the default catalog of the workspace is used. For information about how to use an external Hive Metastore as a catalog, see Use EMR Serverless Spark to connect to an external Hive Metastore.
spark.sql.extensions             org.apache.spark.sql.hudi.HoodieSparkSessionExtension
spark.sql.catalog.spark_catalog  org.apache.spark.sql.hudi.catalog.HoodieCatalog
spark.serializer                 org.apache.spark.serializer.KryoSerializer
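For reference, the annotated fragment below glosses what each property does; the comments are an explanatory addition, not part of the required configuration:

```
# Registers Hudi's SQL extensions so the session understands Hudi DDL and DML
# statements (for example, MERGE INTO).
spark.sql.extensions             org.apache.spark.sql.hudi.HoodieSparkSessionExtension

# Routes operations on the built-in spark_catalog through Hudi's catalog
# implementation so that Hudi tables are created and resolved correctly.
spark.sql.catalog.spark_catalog  org.apache.spark.sql.hudi.catalog.HoodieCatalog

# Uses Kryo serialization, which Hudi requires for Spark serialization.
spark.serializer                 org.apache.spark.serializer.KryoSerializer
```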
Step 2: Read data from and write data to a Hudi table
Go to the data development page of EMR Serverless Spark.
In the left-side navigation pane of the EMR Serverless Spark page, click Data Development.
On the Development tab, click the create icon. In the Create dialog box, set the Name parameter to users_task, keep the default value SparkSQL for the Type parameter, and then click OK.
On the users_task tab, copy the following code to the code editor:
CREATE DATABASE IF NOT EXISTS ss_hudi_db;

CREATE TABLE ss_hudi_db.hudi_tbl (
  id   INT,
  name STRING
) USING hudi
TBLPROPERTIES (
  type = 'cow',
  primaryKey = 'id'
);

INSERT INTO ss_hudi_db.hudi_tbl VALUES (1, "a"), (2, "b");

SELECT id, name FROM ss_hudi_db.hudi_tbl ORDER BY id;

DROP TABLE ss_hudi_db.hudi_tbl;
DROP DATABASE ss_hudi_db;

Select a database from the Default Database drop-down list and the created SQL session from the SQL Sessions drop-down list.
Click Run and verify the query results in the output area.
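Because Hudi supports record-level updates and deletes (as noted in the overview above), you can extend the example with statements such as the following before dropping the table. This is a sketch that assumes the same ss_hudi_db.hudi_tbl table definition (copy-on-write table with primaryKey = 'id'):

```sql
-- Update the name of the record whose primary key is 2.
UPDATE ss_hudi_db.hudi_tbl SET name = 'c' WHERE id = 2;

-- Delete the record whose primary key is 1.
DELETE FROM ss_hudi_db.hudi_tbl WHERE id = 1;

-- INSERT on a Hudi table upserts by primary key: the row with id = 2 is
-- updated in place, and the row with id = 3 is inserted as a new record.
INSERT INTO ss_hudi_db.hudi_tbl VALUES (2, 'd'), (3, 'e');

SELECT id, name FROM ss_hudi_db.hudi_tbl ORDER BY id;
```

These DML statements require the HoodieSparkSessionExtension configured for the SQL session in Step 1.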
References
For information about how to develop and orchestrate SQL jobs, see Get started with the development of Spark SQL jobs.
For more information about Hudi, see Apache Hudi.