Kyuubi in an E-MapReduce (EMR) cluster runs Spark 3.x on YARN. Each Spark 3.x engine maps to one Spark application on YARN. Flink, Trino, and Spark 2.x are not supported.
Prerequisites
Before you begin, ensure that you have:
- YARN and Spark 3.x installed in the EMR cluster
- All users authenticated via Lightweight Directory Access Protocol (LDAP) or Kerberos
Share levels
The share level determines how many users share a single Kyuubi engine. Set kyuubi.engine.share.level on the kyuubi-defaults.conf tab of the Kyuubi service page in the EMR console.
| Share level | Engine scope | Isolation degree | Sharability | Use case |
|---|---|---|---|---|
| CONNECTION | One engine per session | High | Low | Large-scale ETL, ad hoc queries |
| USER | One engine per user | Medium | Medium | — |
| GROUP | One engine per resource group | Low | High | — |
| SERVER | One engine per cluster | Highest (high-security cluster) / Lowest (standard cluster) | High-security clusters: administrators only | Administrators |
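For example, to have all sessions from the same user share one engine, you could set the following on the kyuubi-defaults.conf tab (USER is also Kyuubi's default share level):

```
# kyuubi-defaults.conf
kyuubi.engine.share.level=USER
```

Changing the share level takes effect for engines launched after the configuration is applied; engines that are already running keep their original share level until they stop.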
Submit jobs to Kyuubi engines
The Kyuubi server starts and stops engines automatically. When a new user connects via kyuubi-beeline for the first time, the server launches a new Spark 3.x engine — no manual start is required.
The following examples use the USER share level. All users have passed LDAP or Kerberos authentication.
Submit a job as a new user
When user1 connects for the first time, the Kyuubi server automatically starts a new Spark 3.x engine:
kyuubi-beeline -n user1 \
-u "jdbc:hive2://master-1-1:10009/tpcds_parquet_1000" \
-f query1.sql
Configure Spark executor resources
Two methods are available to configure the resources used by Spark 3.x engines.
Method 1: Set resources in the JDBC URL
Pass Spark parameters directly in the connection URL:
# Set user config via JDBC connection URL
kyuubi-beeline -n user2 \
-u "jdbc:hive2://master-1-1:10009/tpcds_parquet_1000?spark.dynamicAllocation.enabled=false;spark.executor.cores=2;spark.executor.memory=4g;spark.executor.instances=4" \
-f query1.sql
Method 2: Set per-user defaults in kyuubi-defaults.conf
Add user-specific entries in the format ___username___.spark.param=value on the kyuubi-defaults.conf tab:
# Set user default config in kyuubi-defaults.conf
# ___user2___.spark.dynamicAllocation.enabled=false
# ___user2___.spark.executor.memory=5g
# ___user2___.spark.executor.cores=2
# ___user2___.spark.executor.instances=10
kyuubi-beeline -n user2 \
-u "jdbc:hive2://master-1-1:10009/tpcds_parquet_1000" \
-f query1.sql
Reuse a running engine
After a job completes, the Spark 3.x engine stays alive for a period before shutting down. Submitting another job within that window reuses the existing engine instead of launching a new YARN application, which reduces job startup time.
The idle timeout is controlled by kyuubi.session.engine.idle.timeout (default: PT30M, 30 minutes). To change the timeout, update this parameter on the kyuubi-defaults.conf tab.
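For instance, to shorten the idle window to 10 minutes (the value shown here is illustrative), set the parameter in ISO-8601 duration format:

```
# kyuubi-defaults.conf
kyuubi.session.engine.idle.timeout=PT10M
```

A shorter timeout releases YARN resources sooner; a longer one increases the chance that follow-up jobs reuse a warm engine.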
Submit jobs to different engines from the same user
To run workloads in separate engines for different business lines, use kyuubi.engine.share.level.subdomain in the JDBC URL:
kyuubi-beeline -n user4 \
-u "jdbc:hive2://master-1-1:10009/biz1?kyuubi.engine.share.level.subdomain=biz1" \
-f query1.sql
kyuubi-beeline -n user4 \
-u "jdbc:hive2://master-1-1:10009/biz2?kyuubi.engine.share.level.subdomain=biz2" \
-f query2.sql
kyuubi-beeline -n user4 \
-u "jdbc:hive2://master-1-1:10009/biz3?kyuubi.engine.share.level.subdomain=biz3" \
-f query3.sql
Each subdomain maps to a distinct Spark engine, so biz1, biz2, and biz3 run in isolation.
Share one engine across multiple sessions
At the USER share level, multiple sessions from the same user share a single Spark 3.x engine. For example, if user1 submits two jobs from separate terminals simultaneously, both jobs run on the same engine:
# Terminal 1
kyuubi-beeline -n user1 \
-u "jdbc:hive2://master-1-1:10009/biz1" \
-f query1.sql
# Terminal 2
kyuubi-beeline -n user1 \
-u "jdbc:hive2://master-1-1:10009/biz2" \
-f query2.sql
Executor resources are allocated according to Spark's default scheduling rules.
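To confirm that both sessions reuse one engine rather than starting two, you can list the running YARN applications while the jobs are active; with the USER share level, both terminals' jobs should map to a single Spark application:

```shell
# List running Spark applications on YARN. At the USER share level,
# user1's two concurrent sessions appear under one application ID.
yarn application -list -appStates RUNNING -appTypes SPARK
```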