E-MapReduce: Manage Kyuubi compute engines

Last Updated: Mar 26, 2026

Kyuubi in an E-MapReduce (EMR) cluster runs Spark 3.x on YARN. Each Spark 3.x engine maps to one Spark application on YARN. Flink, Trino, and Spark 2.x are not supported.

Prerequisites

Before you begin, ensure that you have:

  • YARN and Spark 3.x installed in the EMR cluster

  • All users authenticated via Lightweight Directory Access Protocol (LDAP) or Kerberos

Share levels

The share level determines how many users share a single Kyuubi engine. Set kyuubi.engine.share.level on the kyuubi-defaults.conf tab of the Kyuubi service page in the EMR console.

| Share level | Engine scope | Isolation degree | Sharability | Use case |
| --- | --- | --- | --- | --- |
| CONNECTION | One engine per session | High | Low | Large-scale ETL, ad hoc queries |
| USER | One engine per user | Medium | Medium | |
| GROUP | One engine per resource group | Low | High | |
| SERVER | One engine per cluster | Highest (high-security cluster) / Lowest (standard cluster) | High-security clusters: administrators only | Administrators |
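
For example, to keep one engine per user (the USER level from the table above, which is typically Kyuubi's default), you can set the share level as follows on the kyuubi-defaults.conf tab. The value shown is only an illustration; choose the level that matches your isolation and sharing needs.

# Illustrative setting: one Spark 3.x engine per user
kyuubi.engine.share.level=USER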

Submit jobs to Kyuubi engines

The Kyuubi server starts and stops engines automatically. When a new user connects via kyuubi-beeline for the first time, the server launches a new Spark 3.x engine; no manual start is required.

The following examples use the USER share level. All users have passed LDAP or Kerberos authentication.

Submit a job as a new user

When user1 connects for the first time, the Kyuubi server automatically starts a new Spark 3.x engine:

kyuubi-beeline -n user1 \
  -u "jdbc:hive2://master-1-1:10009/tpcds_parquet_1000" \
  -f query1.sql
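
Because each engine maps to one Spark application on YARN, you can confirm the launch from a shell on the cluster. This is an optional check, and the exact application name varies by version:

# List running YARN applications; the newly started Kyuubi engine appears as a Spark application
yarn application -list -appStates RUNNING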

Configure Spark executor resources

Two methods are available to configure the resources used by Spark 3.x engines.

Method 1: Set resources in the JDBC URL

Pass Spark parameters directly in the connection URL:

# Set user config via JDBC connection URL
kyuubi-beeline -n user2 \
  -u "jdbc:hive2://master-1-1:10009/tpcds_parquet_1000?spark.dynamicAllocation.enabled=false;spark.executor.cores=2;spark.executor.memory=4g;spark.executor.instances=4" \
  -f query1.sql
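
Because dynamic allocation is disabled in this URL, the engine holds a fixed set of resources for its lifetime: 4 executors × 2 cores × 4 GiB, that is, 8 cores and 16 GiB of executor memory in total, plus per-executor memory overhead and the driver.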

Method 2: Set per-user defaults in kyuubi-defaults.conf

Add user-specific entries in the format ___username___.spark.param=value on the kyuubi-defaults.conf tab:

# Set user default config in kyuubi-defaults.conf
# ___user2___.spark.dynamicAllocation.enabled=false
# ___user2___.spark.executor.memory=5g
# ___user2___.spark.executor.cores=2
# ___user2___.spark.executor.instances=10

kyuubi-beeline -n user2 \
  -u "jdbc:hive2://master-1-1:10009/tpcds_parquet_1000" \
  -f query1.sql
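
Note that connection-level settings generally take precedence: if user2 also passes Spark parameters in the JDBC URL as in Method 1, those values typically override the ___user2___ defaults from kyuubi-defaults.conf for that connection.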

Reuse a running engine

After a job completes, the Spark 3.x engine stays alive for a period before shutting down. Submitting another job within that window reuses the existing engine instead of launching a new YARN application, which reduces job startup time.

The idle timeout is controlled by kyuubi.session.engine.idle.timeout (default: PT30M, 30 minutes). To change the timeout, update this parameter on the kyuubi-defaults.conf tab.
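
For example, to shut down idle engines after 10 minutes instead of 30, you might set the following on the kyuubi-defaults.conf tab. The value uses ISO-8601 duration format, and PT10M is only an illustration:

# Illustrative: stop idle Spark 3.x engines after 10 minutes
kyuubi.session.engine.idle.timeout=PT10M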

Submit jobs to different engines from the same user

To run workloads in separate engines for different business lines, use kyuubi.engine.share.level.subdomain in the JDBC URL:

kyuubi-beeline -n user4 \
  -u "jdbc:hive2://master-1-1:10009/biz1?kyuubi.engine.share.level.subdomain=biz1" \
  -f query1.sql

kyuubi-beeline -n user4 \
  -u "jdbc:hive2://master-1-1:10009/biz2?kyuubi.engine.share.level.subdomain=biz2" \
  -f query2.sql

kyuubi-beeline -n user4 \
  -u "jdbc:hive2://master-1-1:10009/biz3?kyuubi.engine.share.level.subdomain=biz3" \
  -f query3.sql

Each subdomain maps to a distinct Spark engine, so biz1, biz2, and biz3 run in isolation.

Share one engine across multiple sessions

Multiple sessions from the same user share a single Spark 3.x engine. For example, if user1 submits two jobs from separate terminals simultaneously, both jobs run on the same engine:

# Terminal 1
kyuubi-beeline -n user1 \
  -u "jdbc:hive2://master-1-1:10009/biz1" \
  -f query1.sql

# Terminal 2
kyuubi-beeline -n user1 \
  -u "jdbc:hive2://master-1-1:10009/biz2" \
  -f query2.sql

Executor resources are allocated according to Spark's default scheduling rules.
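
Within one shared engine, Spark schedules concurrent jobs in FIFO order by default. If the two sessions' queries contend for executors, one possible adjustment (a sketch, not an EMR-specific recommendation) is to start the engine with Spark's FAIR scheduler. Because this is an application-level Spark setting, it takes effect only when a new engine is launched:

# Illustrative: request FAIR scheduling across concurrent jobs in the shared engine
kyuubi-beeline -n user1 \
  -u "jdbc:hive2://master-1-1:10009/biz1?spark.scheduler.mode=FAIR" \
  -f query1.sql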
