All Products
Search
Document Center

E-MapReduce:DDL statements

Last Updated:Jul 18, 2025

This topic describes the table creation statements that are supported after Hudi is integrated with Spark SQL.

Background information

When you use Spark SQL to create a Hudi table, you can configure parameters in options to configure the table. The following table describes the parameters.

Important

In a version later than Hudi 0.10, options is replaced with tblproperties.

Parameter

Description

Required

primaryKey

The primary key column. If you use multiple fields as primary key columns, separate the fields with commas (,).

Yes.

type

The type of the table. Valid values:

  • cow: the Copy-On-Write type. This is the default value.

  • mor: the Merge-On-Read type.

No.

preCombineField

The version field.

This parameter corresponds to the DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY field in the Hudi table.

We recommend that you configure this parameter. Otherwise, the upsert operation is not supported.

payloadClass

Default value: DefaultHoodieRecordPayload.

This parameter corresponds to the DataSourceWriteOptions.PAYLOAD_CLASS_OPT_KEY field in the Hudi table.

No.

Prerequisites

An E-MapReduce (EMR) cluster that contains the Spark and Hudi services is created. For more information, see Create a cluster.

limits

Only clusters of EMR V3.36.0 or a later minor version and clusters of EMR V5.2.0 or a later minor version allow you to use Spark SQL to read data from or write data to Hudi.

Methods to start Spark SQL

  • Spark 2 or Spark 3 and Hudi of a version earilier than 0.11
    spark-sql \
    --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
    --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension'
  • Spark 3 and Hudi 0.11 or later
    spark-sql \
    --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
    --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
    --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'

Create a non-partitioned table

Examples:

  • Create a non-partitioned table that is of the cow type and uses id as the primary key column.

    create table if not exists h0(
      id bigint, 
      name string, 
      price double
    ) using hudi
    options (
      type = 'cow',
      primaryKey = 'id'
    );
  • Create a non-partitioned table that is of the mor type and uses id and name as the primary key columns.

    create table if not exists h0(
      id bigint, 
      name string, 
      price double
    ) using hudi
    options (
      type = 'mor',
      primaryKey = 'id,name'
    );
  • Create a non-partitioned table of the cow type.

    create table if not exists h0(
      id bigint, 
      name string, 
      price double
    ) using hudi
    options (
      type = 'cow'
    );

Create a partitioned table

Example:

create table if not exists h_p0 (
id bigint,
name string,
dt string, 
hh string
) using hudi
location 'oss://xxx/h_p0'
options (
  type = 'cow',
  primaryKey = 'id',
  preCombineField = 'id'
 )
partitioned by (dt, hh);
Note

In the command, the location parameter specifies the storage path of the table. The path can be an Object Storage Service (OSS) storage path or a Hadoop Distributed File System (HDFS) storage path. The primary key column is id, the partition fields are dt and hh, and the version field is id.

Create an external table

You can create an external table based on an existing Hudi table. Example:

create table h0
using hudi
location '/xx/xx/h0';

CTAS statement

This section provides examples on how to use the CREATE TABLE AS SELECT (CTAS) statement.

  • Example 1:

    create table if not exists h1 using hudi 
    as 
    select 1 as id, 'a1' as name, 10 as price;
  • Example 2:

    create table if not exists h2 using hudi
    partitioned by (dt)
    location '/xx/xx/h2'
    options (
      type = 'mor',
      primaryKey = 'id,name'
    ) as
    select 1 as id, 'a1' as name, 20 as price, '2021-01-03' as dt;