All Products
Search
Document Center

E-MapReduce:Manage Kyuubi Gateways

Last Updated:Mar 26, 2026

Kyuubi Gateway provides Java Database Connectivity (JDBC) and ODBC interfaces to connect EMR Serverless Spark with SQL query and business intelligence (BI) tools such as Tableau and Power BI. It supports multi-tenancy and resource isolation for enterprise workloads.

Create a Kyuubi Gateway

Prerequisites

Before you begin, make sure you have:

Steps

  1. Go to the Gateway page.

    1. Log on to the EMR console.

    2. In the left navigation pane, choose EMR Serverless > Spark.

    3. Click the name of the workspace you want to manage.

    4. In the left navigation pane of the workspace page, click O&M Center > Gateway.

  2. On the Kyuubi Gateway page, click Create Kyuubi Gateway.

  3. Configure the following parameters, then click Create

    A large number of Spark configuration items reduces instantaneous concurrency for Spark job submission.
    ParameterDescription
    NameThe gateway name. Use only lowercase letters, digits, and hyphens (-). The name must start and end with a letter or digit.
    Kyuubi Gateway ResourceThe compute resource allocated to the gateway. Default: 2 CPU, 8 GB. See the table below for specs and recommended maximum concurrent connections.
    Kyuubi VersionThe Kyuubi version for the gateway. If you use DLF (formerly DLF 2.5) as the Data Catalog, set this to 1.9.2-0.0.1 or later.
    Engine VersionThe Spark engine version. For details, see Engine versions.
    Associated QueueThe queue where the gateway is deployed. Spark jobs submitted through the gateway run under the gateway creator's identity.
    Authentication MethodToken-based authentication only. After creating the gateway, generate a token for it. See Manage tokens.
    Service High AvailabilityWhen enabled, deploys three or more Kyuubi servers for high availability. Configure Number Of Kyuubi Servers and Zookeeper cluster address (comma-separated endpoints, for example, zk1:2181,zk2:2181,zk3:2181). Make sure the network is reachable.
    Network ConnectionThe VPC network connection for accessing data sources or external services. See Network connectivity between EMR Serverless Spark and other VPCs.
    Endpoint(Public)Disabled by default. When enabled, the gateway is accessible via a public endpoint. Otherwise, access is through an internal endpoint.
    Kyuubi ConfigurationKyuubi configuration key-value pairs, space-separated (for example, kyuubi.engine.pool.size 1). See the supported configurations below.
    Spark ConfigurationSpark configuration key-value pairs, space-separated (for example, spark.sql.catalog.paimon.metastore dlf). All parameters are supported except spark.kubernetes.*.

    Resource specs and recommended concurrent connections

    SpecRecommended max concurrent connections
    1 CPU, 4 GB10
    2 CPU, 8 GB (default)20
    4 CPU, 16 GB30
    8 CPU, 32 GB45
    16 CPU, 64 GB85
    32 CPU, 128 GB135

    Supported Kyuubi configurations

    kyuubi.engine.pool.size
    kyuubi.engine.pool.size.threshold
    kyuubi.engine.share.level
    kyuubi.engine.single.spark.session
    kyuubi.session.engine.idle.timeout
    kyuubi.session.engine.initialize.timeout
    kyuubi.engine.security.token.max.lifetime
    kyuubi.session.engine.check.interval
    kyuubi.session.idle.timeout
    kyuubi.session.engine.request.timeout
    kyuubi.session.engine.login.timeout
    kyuubi.backend.engine.exec.pool.shutdown.timeout
    kyuubi.backend.server.exec.pool.shutdown.timeout
    kyuubi.backend.server.exec.pool.keepalive.time
    kyuubi.frontend.thrift.login.timeout
    kyuubi.operation.status.polling.timeout
    kyuubi.engine.pool.selectPolicy
    kyuubi.authentication
    kyuubi.kinit.principal
    kyuubi.kinit.keytab
    kyuubi.authentication.ldap.*
    kyuubi.hadoop.proxyuser.hive.hosts
    kyuubi.hadoop.proxyuser.hive.groups
    kyuubi.hadoop.proxyuser.kyuubi.hosts
    kyuubi.hadoop.proxyuser.kyuubi.groups
    kyuubi.ha.*
  4. On the Kyuubi Gateway page, find the gateway and click Start in the Actions column.

Manage tokens

Tokens are required for authentication when connecting to a Kyuubi Gateway.

  1. On the Kyuubi Gateway page, find the target gateway and click Tokens in the Actions column.

  2. Click Create Token.

  3. Configure the following parameters, then click OK.

    ParameterDescription
    NameThe token name.
    Expired AtThe token expiration time in days (minimum: 1). Enabled by default; expires after 365 days.
    Assigned ToThe Resource Access Management (RAM) user or RAM role to which the token is assigned. Required when using DLF (formerly DLF 2.5) as the default on the Catalog tab. Select from the drop-down list of users and roles added in Access control. The assigned identity is used to access DLF when submitting Spark jobs through the gateway. Make sure the RAM user or RAM role has DLF access permissions. See Add an authorization.
  4. Copy the token immediately after creation.

    Important

    The token value is only shown once and cannot be retrieved later. If the token expires or is lost, create a new token or reset the existing one.

Connect to a Kyuubi Gateway

Gather the following values before connecting. All connection methods use the same placeholders.

PlaceholderWhere to find it
<endpoint>Overview tab of the gateway
<port>443 for public endpoints; 80 for internal same-region endpoints
<token>Token Management page — copy immediately after creation
<tokenname>Token Management page
<UserName/RoleName>Access control — the RAM user or RAM role added to the gateway
If your gateway uses DLF (formerly DLF 2.5) as the default Data Catalog, you must include the user=<UserName/RoleName> parameter in connection strings. Each method below notes where to add it.

Connect using Beeline

Before connecting, make sure your Beeline version is compatible with the Kyuubi server version. If Beeline is not installed, see Getting Started - Apache Kyuubi.

The JDBC URL format is:

jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>

If you use DLF, add user=<UserName/RoleName> after transportMode=http;:

beeline -u "jdbc:hive2://<endpoint>:<port>/;transportMode=http;user=<UserName/RoleName>;httpPath=cliservice/token/<token>"

For other catalogs:

beeline -u "jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>"

To set session parameters, append them after #:

beeline -u "jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>;#spark.sql.shuffle.partitions=100;spark.executor.instances=2;"

Connect using Java

  1. Add the following dependencies to your pom.xml file. Replace version numbers as needed.

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>2.3.9</version>
        </dependency>
    </dependencies>
  2. Connect to the gateway with the following code. The JDBC URL format is: jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token> If you use DLF, append ;user=<UserName/RoleName> to the URL.

    import org.apache.hive.jdbc.HiveStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;
    
    public class Main {
        public static void main(String[] args) throws Exception {
            // For DLF: append ;user=<UserName/RoleName> to the URL
            String url = "jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>";
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            Connection conn = DriverManager.getConnection(url);
            HiveStatement stmt = (HiveStatement) conn.createStatement();
    
            String sql = "select * from students;";
            System.out.println("Running " + sql);
            ResultSet res = stmt.executeQuery(sql);
    
            ResultSetMetaData md = res.getMetaData();
            String[] columns = new String[md.getColumnCount()];
            for (int i = 0; i < columns.length; i++) {
                columns[i] = md.getColumnName(i + 1);
            }
            while (res.next()) {
                System.out.print("Row " + res.getRow() + "=[");
                for (int i = 0; i < columns.length; i++) {
                    if (i != 0) {
                        System.out.print(", ");
                    }
                    System.out.print(columns[i] + "='" + res.getObject(i + 1) + "'");
                }
                System.out.println(")]");
            }
    
            conn.close();
        }
    }

Connect using Python

  1. Install the required packages.

    pip3 install pyhive thrift
  2. Connect to the gateway. Store sensitive values in environment variables rather than hardcoding them in your script. Set the following environment variables before running:

    • KYUUBI_ENDPOINT — the gateway endpoint

    • KYUUBI_PORT — the port number

    • KYUUBI_TOKEN — the token value

    • KYUUBI_USERNAME — the RAM user or role name (required when using DLF; for other catalogs, set this to the token name)

    import os
    from pyhive import hive
    
    if __name__ == '__main__':
        cursor = hive.connect(
            os.environ['KYUUBI_ENDPOINT'],
            port=os.environ['KYUUBI_PORT'],
            scheme='http',
            username=os.environ['KYUUBI_USERNAME'],
            password=os.environ['KYUUBI_TOKEN']
        ).cursor()
        cursor.execute('show databases')
        print(cursor.fetchall())
        cursor.close()

    When using DLF, set KYUUBI_USERNAME to the RAM user or role name (<UserName/RoleName>). For other catalogs, set KYUUBI_USERNAME to the token name (<tokenname>).

Connect using the REST API

Kyuubi Gateway supports open-source-compatible Representational State Transfer (REST) APIs over HTTP. The following API paths are supported:

  • /api/v1/sessions/*

  • /api/v1/operations/*

  • /api/v1/batches/*

Example 1: Run an SQL query

All four steps use token-based authentication in the URL path (/token/<token>). If you use DLF, add -u '<UserName/RoleName>:<password>' to the curl command in each step, where <password> is a placeholder and can be any value.

  1. Create a session and specify Spark configurations. Replace <dev_queue> with the actual queue name. If you use DLF:

    curl -X 'POST' \
      'http://<endpoint>:<port>/api/v1/sessions/token/<token>' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -u '<UserName/RoleName>:<password>' \
      -d '{
      "configs": {
        "set:hivevar:spark.emr.serverless.kyuubi.engine.queue": "<dev_queue>"
      }
    }'

    If you use another catalog:

    curl -X 'POST' \
      'http://<endpoint>:<port>/api/v1/sessions/token/<token>' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
      "configs": {
        "set:hivevar:spark.emr.serverless.kyuubi.engine.queue": "<dev_queue>"
      }
    }'

    The response returns a session handle (identifier) that identifies this session:

    {"identifier":"619e6ded-xxxx-xxxx-xxxx-c2a43f6fac46","kyuubiInstance":"0.0.0.0:10099"}

    Use the identifier value as <sessionHandle> in subsequent requests.

  2. Submit a SQL statement. If you use DLF:

    curl -X 'POST' \
      'http://<endpoint>:<port>/api/v1/sessions/<sessionHandle>/operations/statement/token/<token>' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -u '<UserName/RoleName>:<password>' \
      -d '{
      "statement": "select * from test;",
      "runAsync": true,
      "queryTimeout": 0,
      "confOverlay": {
        "additionalProp1": "string",
        "additionalProp2": "string"
      }
    }'

    If you use another catalog:

    curl -X 'POST' \
      'http://<endpoint>:<port>/api/v1/sessions/<sessionHandle>/operations/statement/token/<token>' \
      -H 'accept: application/json' \
      -H 'Content-Type: application/json' \
      -d '{
      "statement": "select * from test;",
      "runAsync": true,
      "queryTimeout": 0,
      "confOverlay": {
        "additionalProp1": "string",
        "additionalProp2": "string"
      }
    }'

    The response returns an operation handle:

    {"identifier":"a743e8ff-xxxx-xxxx-xxxx-a66fec66cfa4"}

    Use the identifier value as <operationHandle> in subsequent requests.

  3. Check the statement status. If you use DLF:

    curl --location -X 'GET' \
      'http://<endpoint>:<port>/api/v1/operations/<operationHandle>/event/token/<token>' \
      -H 'accept: application/json' \
      -u '<UserName/RoleName>:<password>'

    If you use another catalog:

    curl --location -X 'GET' \
      'http://<endpoint>:<port>/api/v1/operations/<operationHandle>/event/token/<token>' \
      -H 'accept: application/json'
  4. Retrieve the results. If you use DLF:

    curl --location -X 'GET' \
      'http://<endpoint>:<port>/api/v1/operations/<operationHandle>/rowset/token/<token>/?maxrows=100&fetchorientation=FETCH_NEXT' \
      -H 'accept: application/json' \
      -u '<UserName/RoleName>:<password>'

    If you use another catalog:

    curl --location -X 'GET' \
      'http://<endpoint>:<port>/api/v1/operations/<operationHandle>/rowset/token/<token>/?maxrows=100&fetchorientation=FETCH_NEXT' \
      -H 'accept: application/json'

Example 2: Submit a batch job

Use the batches API to submit a Spark batch job. The gateway starts a Spark application and runs the specified task.

Download the test JAR package: spark-examples_2.12-3.3.1.jar. This is the SparkPi example that comes with Spark and calculates the value of Pi (π).

If you use DLF:

curl --location \
  --request POST 'http://<endpoint>:<port>/api/v1/batches/token/<token>' \
  --user '<UserName/RoleName>:<password>' \
  --form 'batchRequest="{
    \"batchType\": \"SPARK\",
    \"className\": \"org.apache.spark.examples.SparkPi\",
    \"name\": \"kyuubi-spark-pi\",
    \"resource\": \"oss://bucket/path/to/spark-examples_2.12-3.3.1.jar\"
  }";type=application/json'

If you use another catalog:

curl --location \
  --request POST 'http://<endpoint>:<port>/api/v1/batches/token/<token>' \
  --form 'batchRequest="{
    \"batchType\": \"SPARK\",
    \"className\": \"org.apache.spark.examples.SparkPi\",
    \"name\": \"kyuubi-spark-pi\",
    \"resource\": \"oss://bucket/path/to/spark-examples_2.12-3.3.1.jar\"
  }";type=application/json'

Configure a high-availability Kyuubi Gateway

A high-availability (HA) Kyuubi Gateway uses ZooKeeper for coordination and requires a custom JDBC driver for HA-aware connections.

Prerequisites

Enable high availability

When creating or editing a Kyuubi Gateway, enable Service High Availability, set the number of Kyuubi servers and the ZooKeeper cluster address, and select a network connection for Network Connection.

Connect to the HA gateway

After enabling HA, the gateway coordinates through ZooKeeper. Use the same placeholders described in Connect to a Kyuubi Gateway.

Connect using Beeline

The HA Beeline connection requires a custom JDBC Driver JAR.

  1. Download kyuubi-hive-jdbc-1.9.2.jar.

  2. Replace the default JDBC Driver JAR file.

    1. Back up the existing driver:

      mv /your_path/apache-kyuubi-1.9.2-bin/beeline-jars /bak_path
      Note

      On EMR on ECS, the default Kyuubi path is /opt/apps/KYUUBI/kyuubi-1.9.2-1.0.0/beeline-jars. To find the installation path, run env | grep KYUUBI_HOME.

    2. Copy the downloaded driver to the Beeline directory:

      cp /download/serverless-spark-kyuubi-hive-jdbc-1.9.2.jar /your_path/apache-kyuubi-1.9.2-bin/beeline-jars
  3. Connect using Beeline:

    /your_path/apache-kyuubi-1.9.2-bin/bin/beeline -u 'jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>'

Connect using Java

The HA Java connection uses a shaded JDBC driver from Serverless Spark instead of the standard Hive JDBC driver.

  1. Download serverless-spark-kyuubi-hive-jdbc-shaded-1.9.2.jar.

  2. Install the driver to your local Maven repository:

    mvn install:install-file \
      -Dfile=/download/serverless-spark-kyuubi-hive-jdbc-shaded-1.9.2.jar \
      -DgroupId=org.apache.kyuubi \
      -DartifactId=kyuubi-hive-jdbc-shaded \
      -Dversion=1.9.2-ss \
      -Dpackaging=jar
  3. Add the following dependencies to your pom.xml file:

    <dependencies>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-common</artifactId>
          <version>3.0.0</version>
        </dependency>
        <dependency>
          <groupId>org.apache.kyuubi</groupId>
          <artifactId>kyuubi-hive-jdbc-shaded</artifactId>
          <version>1.9.2-ss</version>
        </dependency>
    </dependencies>
  4. Connect with the following code. Note that the driver class and import differ from the standard (non-HA) Java example.

    import org.apache.kyuubi.jdbc.hive.KyuubiStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;
    
    public class Main {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>";
            Class.forName("org.apache.kyuubi.jdbc.KyuubiHiveDriver");
            Connection conn = DriverManager.getConnection(url);
            KyuubiStatement stmt = (KyuubiStatement) conn.createStatement();
    
            String sql = "select * from test;";
            ResultSet res = stmt.executeQuery(sql);
    
            ResultSetMetaData md = res.getMetaData();
            String[] columns = new String[md.getColumnCount()];
            for (int i = 0; i < columns.length; i++) {
                columns[i] = md.getColumnName(i + 1);
            }
            while (res.next()) {
                System.out.print("Row " + res.getRow() + "=[");
                for (int i = 0; i < columns.length; i++) {
                    if (i != 0) {
                        System.out.print(", ");
                    }
                    System.out.print(columns[i] + "='" + res.getObject(i + 1) + "'");
                }
                System.out.println(")]");
            }
    
            conn.close();
        }
    }

View Spark jobs submitted through Kyuubi

Monitor Spark jobs submitted through a Kyuubi Gateway on the Kyuubi Application tab of the Job History page. The tab shows Application ID, Application Name, Application Status, and Start At for each job.

The Application ID (in spark-xxxx format) is generated by the Spark engine and matches the ID returned by the Kyuubi client, uniquely identifying each task instance.

To view jobs for a specific gateway:

  1. On the Kyuubi Gateway page, click the target gateway name.

  2. In the upper-right corner, click Applications. The page lists all Spark jobs submitted through this gateway.

    image

    image