Kyuubi in an E-MapReduce (EMR) cluster runs Spark 3.x on YARN. Each Spark 3.x engine maps to one Spark application on YARN. Flink, Trino, and Spark 2.x are not supported.
Prerequisites
Before you begin, ensure that you have:
- YARN and Spark 3.x installed in the EMR cluster
- All users authenticated via Lightweight Directory Access Protocol (LDAP) or Kerberos
Share levels
The share level determines how many users share a single Kyuubi engine. Set kyuubi.engine.share.level on the kyuubi-defaults.conf tab of the Kyuubi service page in the EMR console.
| Share level | Engine scope | Isolation degree | Sharability | Use case |
|---|---|---|---|---|
| CONNECTION | One engine per session | High | Low | Large-scale ETL, ad hoc queries |
| USER | One engine per user | Medium | Medium | — |
| GROUP | One engine per resource group | Low | High | — |
| SERVER | One engine per cluster | Highest (high-security cluster) / Lowest (standard cluster) | High-security clusters: administrators only | Administrators |
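For example, to have all sessions from the same user share one engine, you could set the following on the kyuubi-defaults.conf tab (USER is also Kyuubi's default share level):

```
# kyuubi-defaults.conf
kyuubi.engine.share.level=USER
```

Changing the share level takes effect for engines launched after the configuration is applied; engines that are already running keep their original share level until they stop.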
Submit jobs to Kyuubi engines
The Kyuubi server starts and stops engines automatically. When a new user connects via kyuubi-beeline for the first time, the server launches a new Spark 3.x engine — no manual start is required.
The following examples use the USER share level. All users have passed LDAP or Kerberos authentication.
Submit a job as a new user
When user1 connects for the first time, the Kyuubi server automatically starts a new Spark 3.x engine:
kyuubi-beeline -n user1 \
-u "jdbc:hive2://master-1-1:10009/tpcds_parquet_1000" \
-f query1.sql
Configure Spark executor resources
Two methods are available to configure the resources used by Spark 3.x engines.
Method 1: Set resources in the JDBC URL
Pass Spark parameters directly in the connection URL:
# Set user config via JDBC connection URL
kyuubi-beeline -n user2 \
-u "jdbc:hive2://master-1-1:10009/tpcds_parquet_1000?spark.dynamicAllocation.enabled=false;spark.executor.cores=2;spark.executor.memory=4g;spark.executor.instances=4" \
-f query1.sql
Method 2: Set per-user defaults in kyuubi-defaults.conf
Add user-specific entries in the format ___username___.spark.param=value on the kyuubi-defaults.conf tab:
# Set user default config in kyuubi-defaults.conf
# ___user2___.spark.dynamicAllocation.enabled=false
# ___user2___.spark.executor.memory=5g
# ___user2___.spark.executor.cores=2
# ___user2___.spark.executor.instances=10
kyuubi-beeline -n user2 \
-u "jdbc:hive2://master-1-1:10009/tpcds_parquet_1000" \
-f query1.sql
Reuse a running engine
After a job completes, the Spark 3.x engine stays alive for a period before shutting down. Submitting another job within that window reuses the existing engine instead of launching a new YARN application, which reduces job startup time.
The idle timeout is controlled by kyuubi.session.engine.idle.timeout (default: PT30M, 30 minutes). To change the timeout, update this parameter on the kyuubi-defaults.conf tab.
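For instance, to shorten the idle window to 10 minutes (the value shown here is illustrative), set the parameter in ISO-8601 duration format:

```
# kyuubi-defaults.conf
kyuubi.session.engine.idle.timeout=PT10M
```

A shorter timeout releases YARN resources sooner; a longer one increases the chance that follow-up jobs reuse a warm engine.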
Submit jobs to different engines from the same user
To run workloads in separate engines for different business lines, use kyuubi.engine.share.level.subdomain in the JDBC URL:
kyuubi-beeline -n user4 \
-u "jdbc:hive2://master-1-1:10009/biz1?kyuubi.engine.share.level.subdomain=biz1" \
-f query1.sql
kyuubi-beeline -n user4 \
-u "jdbc:hive2://master-1-1:10009/biz2?kyuubi.engine.share.level.subdomain=biz2" \
-f query2.sql
kyuubi-beeline -n user4 \
-u "jdbc:hive2://master-1-1:10009/biz3?kyuubi.engine.share.level.subdomain=biz3" \
-f query3.sql
Each subdomain maps to a distinct Spark engine, so biz1, biz2, and biz3 run in isolation.
Share one engine across multiple sessions
At the USER share level, multiple sessions from the same user share a single Spark 3.x engine. For example, if user1 submits two jobs from separate terminals simultaneously, both jobs run on the same engine:
# Terminal 1
kyuubi-beeline -n user1 \
-u "jdbc:hive2://master-1-1:10009/biz1" \
-f query1.sql
# Terminal 2
kyuubi-beeline -n user1 \
-u "jdbc:hive2://master-1-1:10009/biz2" \
-f query2.sql
Executor resources are allocated according to Spark's default scheduling rules.
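To confirm that both sessions reuse one engine rather than starting two, you can list the running YARN applications while the jobs are active; with the USER share level, both terminals' jobs should map to a single Spark application:

```shell
# List running Spark applications on YARN. At the USER share level,
# user1's two concurrent sessions appear under one application ID.
yarn application -list -appStates RUNNING -appTypes SPARK
```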