E-MapReduce: Manage Kyuubi Gateways

Last Updated: Nov 29, 2025

Kyuubi Gateway provides Java Database Connectivity (JDBC) and ODBC interfaces to seamlessly connect Serverless Spark with SQL query and business intelligence (BI) tools, such as Tableau and Power BI. This enables efficient data access and analysis. The gateway also features multitenancy and resource isolation to meet the demands of enterprise applications.

Create a Kyuubi Gateway

  1. Go to the Gateway page.

    1. Log on to the EMR console.

    2. In the navigation pane on the left, choose EMR Serverless > Spark.

    3. On the Spark page, click the name of the workspace that you want to manage.

    4. On the EMR Serverless Spark page, in the navigation pane on the left, choose O&M Center > Gateway.

  2. On the Kyuubi Gateway page, click Create Kyuubi Gateway.

  3. On the Create Kyuubi Gateway page, configure the following parameters and click Create.

    Parameter

    Description

    Name

    The name of the new gateway. The name can contain only lowercase letters, digits, and hyphens (-). It must start and end with a letter or a digit.

    Kyuubi Gateway Resource

    The resource specification of the gateway. The default value is 2 CPU, 8 GB.

    The supported specifications and their recommended maximum concurrent connections are as follows:

    • 1 CPU, 4 GB: 10

    • 2 CPU, 8 GB: 20

    • 4 CPU, 16 GB: 30

    • 8 CPU, 32 GB: 45

    • 16 CPU, 64 GB: 85

    • 32 CPU, 128 GB: 135

    Note

    If a job carries a large number of Spark configuration items, the instantaneous concurrency for Spark job submission decreases.

    Kyuubi Version

    The Kyuubi version that the gateway uses.

    Note

    If you use DLF (formerly DLF 2.5) in the Data Catalog, you must set Kyuubi Version to 1.9.2-0.0.1 or later.

    Engine Version

    The engine version that the gateway uses. For more information about engine version numbers, see Engine versions.

    Associated Queue

    The gateway is deployed in the selected queue. When you submit a Spark job through the gateway, the job is submitted using the identity of the gateway creator.

    Authentication Method

    Only token-based authentication is supported.

    After you create a gateway, you must generate a unique authentication token for it. This token is used for identity verification and access control in subsequent requests. For more information about how to create a token, see Manage gateways.

    Service High Availability

    If you enable service high availability, three or more Kyuubi Servers are deployed to achieve high availability.

    After you enable this feature, configure the following parameters:

    • Number Of Kyuubi Servers: The number of Kyuubi servers.

    • Zookeeper cluster address: A high-availability Kyuubi Gateway depends on a Zookeeper cluster. Enter the Zookeeper cluster endpoint. If the cluster has multiple nodes, separate them with commas (,), for example, zk1:2181,zk2:2181,zk3:2181. Make sure that the gateway can reach the Zookeeper cluster over the network.

    Network Connection

    The network connection that is used to access the data sources or external services in a virtual private cloud (VPC). For information about how to create a network connection, see Network connectivity between EMR Serverless Spark and other VPCs.

    Endpoint(Public)

    This feature is disabled by default, and Kyuubi is accessed through an internal endpoint. If you enable this feature, Kyuubi can be accessed through a public endpoint.

    Kyuubi Configuration

    Enter the Kyuubi configuration items. Separate each configuration key and its value with a space. For example: kyuubi.engine.pool.size 1.

    Only the following Kyuubi configurations are supported.

    kyuubi.engine.pool.size
    kyuubi.engine.pool.size.threshold
    kyuubi.engine.share.level
    kyuubi.engine.single.spark.session
    kyuubi.session.engine.idle.timeout
    kyuubi.session.engine.initialize.timeout
    kyuubi.engine.security.token.max.lifetime
    kyuubi.session.engine.check.interval
    kyuubi.session.idle.timeout
    kyuubi.session.engine.request.timeout
    kyuubi.session.engine.login.timeout
    kyuubi.backend.engine.exec.pool.shutdown.timeout
    kyuubi.backend.server.exec.pool.shutdown.timeout
    kyuubi.backend.server.exec.pool.keepalive.time
    kyuubi.frontend.thrift.login.timeout
    kyuubi.operation.status.polling.timeout
    kyuubi.engine.pool.selectPolicy
    kyuubi.authentication
    kyuubi.kinit.principal
    kyuubi.kinit.keytab
    kyuubi.authentication.ldap.*
    kyuubi.hadoop.proxyuser.hive.hosts
    kyuubi.hadoop.proxyuser.hive.groups
    kyuubi.hadoop.proxyuser.kyuubi.hosts
    kyuubi.hadoop.proxyuser.kyuubi.groups
    kyuubi.ha.*

    Spark Configuration

    Enter the Spark configuration items. Separate each configuration key and its value with a space. All parameters are supported except spark.kubernetes.* parameters. For example: spark.sql.catalog.paimon.metastore dlf.

  4. On the Kyuubi Gateway page, find the gateway that you created and click Start in the Actions column.
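
The configuration rules above (space-separated key-value pairs, a fixed allow-list of Kyuubi keys, and no spark.kubernetes.* keys) can also be checked on the client side before you save a gateway. The following sketch is our own illustration, not part of any EMR SDK; the key list is abridged from the table above:

```python
import fnmatch

# Abridged from the supported-key list above; entries ending in ".*"
# are wildcard patterns (e.g. kyuubi.ha.*, kyuubi.authentication.ldap.*).
SUPPORTED_KYUUBI_KEYS = [
    "kyuubi.engine.pool.size",
    "kyuubi.engine.pool.size.threshold",
    "kyuubi.engine.share.level",
    "kyuubi.engine.single.spark.session",
    "kyuubi.session.engine.idle.timeout",
    "kyuubi.authentication",
    "kyuubi.authentication.ldap.*",
    "kyuubi.ha.*",
    # ... remaining keys from the table above
]

def check_kyuubi_conf(line: str) -> tuple[str, str]:
    """Split a 'key value' line and verify the key is on the allow-list."""
    key, _, value = line.strip().partition(" ")
    if not any(fnmatch.fnmatch(key, p) for p in SUPPORTED_KYUUBI_KEYS):
        raise ValueError(f"unsupported Kyuubi configuration: {key}")
    return key, value

def check_spark_conf(line: str) -> tuple[str, str]:
    """Split a 'key value' line; spark.kubernetes.* keys are not allowed."""
    key, _, value = line.strip().partition(" ")
    if key.startswith("spark.kubernetes."):
        raise ValueError(f"unsupported Spark configuration: {key}")
    return key, value
```

Running such a check locally surfaces typos before the gateway rejects the configuration at creation time.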

Manage tokens

  1. On the Kyuubi Gateway page, find the target gateway and click Tokens in the Actions column.

  2. Click Create Token.

  3. In the Create Token dialog box, configure the parameters and click OK.

    Parameter

    Description

    Name

    The name of the new token.

    Expired At

    The expiration time of the token. The number of days must be 1 or greater. By default, expiration is enabled and the token expires after 365 days.

    Assigned To

    The RAM user or RAM role to which the token is assigned. From the drop-down list, select the RAM user or RAM role that you added in Access control. This identity is used to access DLF when you connect to the Kyuubi Gateway to submit a Spark job.

    Note
    • If DLF (formerly DLF 2.5) is the default catalog on the Catalog tab, you must configure this parameter.

    • Make sure that the specified RAM user or RAM role has the permissions to access DLF. For more information about how to grant permissions, see Add an authorization.

  4. Copy the token information.

    Important

    You must immediately copy the token information after the token is created because you cannot retrieve it later. If your token expires or is lost, you must create a new token or reset the token.
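
As a small illustration of the expiration rule above (at least 1 day, 365 days by default), the following helper is our own sketch and assumes nothing about how the service itself computes expiry:

```python
from datetime import datetime, timedelta

def token_expiry(created_at: datetime, days: int = 365) -> datetime:
    """Return the Expired At time for a token created at created_at."""
    if days < 1:
        raise ValueError("the number of days must be 1 or greater")
    return created_at + timedelta(days=days)
```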

Connect to a Kyuubi Gateway

When you connect to a Kyuubi Gateway, replace the placeholders in the JDBC URL:

  • <endpoint>: The endpoint information that you can obtain from the Overview tab.

  • <port>: The port number. The port number is 443 for public endpoints and 80 for internal same-region endpoints.

  • <token>: The token information that you copied from the Token Management page.

  • <tokenname>: The token name. You can obtain it from the Token Management page.

  • <UserName/RoleName>: The RAM user or RAM role that you added to Access control.
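
As a quick illustration of the URL format used throughout this section, the following sketch assembles the JDBC URL from the placeholders above. The helper name is our own, not part of any SDK; the parameter order mirrors the Beeline examples in this section:

```python
from typing import Optional

def build_jdbc_url(endpoint: str, port: int, token: str,
                   user: Optional[str] = None,
                   session_conf: Optional[dict] = None) -> str:
    """Assemble the Kyuubi JDBC URL from the placeholders listed above.

    user is only required when DLF (formerly DLF 2.5) is the default
    catalog; session_conf holds Spark session parameters appended
    after '#', as in the Beeline session-parameter example.
    """
    url = f"jdbc:hive2://{endpoint}:{port}/;transportMode=http"
    if user:
        url += f";user={user}"  # placed before httpPath, as in the DLF example
    url += f";httpPath=cliservice/token/{token}"
    if session_conf:
        url += ";#" + ";".join(f"{k}={v}" for k, v in session_conf.items())
    return url
```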

Connect using Beeline

Before you connect to a Kyuubi Gateway, make sure that your Beeline version is compatible with the Kyuubi server version. If you do not have Beeline installed, see Getting Started - Apache Kyuubi.

Select one of the following methods based on the default catalog that is configured on the Catalog page.

Use DLF (formerly DLF 2.5)

beeline -u "jdbc:hive2://<endpoint>:<port>/;transportMode=http;user=<UserName/RoleName>;httpPath=cliservice/token/<token>"

Use another catalog

beeline -u "jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>"

When you connect using Beeline, you can modify session parameters. For example: beeline -u "jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>;#spark.sql.shuffle.partitions=100;spark.executor.instances=2;".

Connect using Java

  • Update the pom.xml file.

    Replace hadoop-common and hive-jdbc with the appropriate dependency versions.

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.0.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-jdbc</artifactId>
            <version>2.3.9</version>
        </dependency>
    </dependencies>
  • Write Java code to connect to the Kyuubi Gateway.

    Select one of the following methods based on the default catalog that is configured on the Catalog page.

    Use DLF (formerly DLF 2.5)

    import org.apache.hive.jdbc.HiveStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;
    
    public class Main {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>;user=<UserName/RoleName>";
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            Connection conn = DriverManager.getConnection(url);
            HiveStatement stmt = (HiveStatement) conn.createStatement();
    
    
            String sql = "select * from students;";
            System.out.println("Running " + sql);
            ResultSet res = stmt.executeQuery(sql);
    
            ResultSetMetaData md = res.getMetaData();
            String[] columns = new String[md.getColumnCount()];
            for (int i = 0; i < columns.length; i++) {
                columns[i] = md.getColumnName(i + 1);
            }
            while (res.next()) {
                System.out.print("Row " + res.getRow() + "=[");
                for (int i = 0; i < columns.length; i++) {
                    if (i != 0) {
                        System.out.print(", ");
                    }
                    System.out.print(columns[i] + "='" + res.getObject(i + 1) + "'");
                }
            System.out.println("]");
            }
    
            conn.close();
        }
    }

    Use another catalog

    import org.apache.hive.jdbc.HiveStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.ResultSetMetaData;
    
    public class Main {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>";
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            Connection conn = DriverManager.getConnection(url);
            HiveStatement stmt = (HiveStatement) conn.createStatement();
    
    
            String sql = "select * from students;";
            System.out.println("Running " + sql);
            ResultSet res = stmt.executeQuery(sql);
    
            ResultSetMetaData md = res.getMetaData();
            String[] columns = new String[md.getColumnCount()];
            for (int i = 0; i < columns.length; i++) {
                columns[i] = md.getColumnName(i + 1);
            }
            while (res.next()) {
                System.out.print("Row " + res.getRow() + "=[");
                for (int i = 0; i < columns.length; i++) {
                    if (i != 0) {
                        System.out.print(", ");
                    }
                    System.out.print(columns[i] + "='" + res.getObject(i + 1) + "'");
                }
            System.out.println("]");
            }
    
            conn.close();
        }
    }

Connect using Python

  1. Run the following command to install the PyHive and Thrift packages.

    pip3 install pyhive thrift
  2. Write a Python script to connect to the Kyuubi Gateway.

    The following Python script is an example of how to connect to a Kyuubi Gateway and display a list of databases.

    Select one of the following methods based on the default catalog that is configured on the Catalog page.

    Use DLF (formerly DLF 2.5)

    from pyhive import hive
    
    if __name__ == '__main__':
        cursor = hive.connect('<endpoint>',
                              port="<port>",
                              scheme='http',
                              username='<UserName/RoleName>',
                              password='<token>').cursor()
        cursor.execute('show databases')
        print(cursor.fetchall())
        cursor.close()

    Use another catalog

    from pyhive import hive
    
    if __name__ == '__main__':
        cursor = hive.connect('<endpoint>',
                              port="<port>",
                              scheme='http',
                              username='<tokenname>',
                              password='<token>').cursor()
        cursor.execute('show databases')
        print(cursor.fetchall())
        cursor.close()

Connect using the REST API

Kyuubi Gateway provides open source-compatible Representational State Transfer (REST) APIs that support interaction with the Kyuubi service over HTTP. Currently, only the following API paths are supported:

  • /api/v1/sessions/*

  • /api/v1/operations/*

  • /api/v1/batches/*

The following examples show how to connect to a Kyuubi Gateway using the REST API.

  • Example 1: Start a session and run an SQL query.

    1. Create a session and specify Spark configurations.

      Select one of the following methods based on the default catalog that is configured on the Catalog page.

      Note
      • spark.emr.serverless.kyuubi.engine.queue specifies the queue that the Spark job uses at runtime. Replace <dev_queue> with the actual queue name.

      • <UserName/RoleName>: Replace this with the actual username or role name.

      • <password>: This is a placeholder. You can enter any value.

      Use DLF (formerly DLF 2.5)

      curl -X 'POST' \
        'http://<endpoint>:<port>/api/v1/sessions/token/<token>' \
        -H 'accept: application/json' \
        -H 'Content-Type: application/json' \
        -u '<UserName/RoleName>:<password>' \
        -d '{
        "configs": {
          "set:hivevar:spark.emr.serverless.kyuubi.engine.queue": "<dev_queue>"
        }
      }'

      Use another catalog

      curl -X 'POST' \
        'http://<endpoint>:<port>/api/v1/sessions/token/<token>' \
        -H 'accept: application/json' \
        -H 'Content-Type: application/json' \
        -d '{
        "configs": {
          "set:hivevar:spark.emr.serverless.kyuubi.engine.queue": "<dev_queue>"
        }
      }'

      A message similar to the following is returned. In the message, identifier indicates the Kyuubi session handle, which uniquely identifies a session. In this topic, this value is referred to as <sessionHandle>.

      {"identifier":"619e6ded-xxxx-xxxx-xxxx-c2a43f6fac46","kyuubiInstance":"0.0.0.0:10099"}
    2. Create a statement.

      Use DLF (formerly DLF 2.5)

      curl -X 'POST' \
        'http://<endpoint>:<port>/api/v1/sessions/<sessionHandle>/operations/statement/token/<token>' \
        -H 'accept: application/json' \
        -H 'Content-Type: application/json' \
        -u '<UserName/RoleName>:<password>' \
        -d '{
        "statement": "select * from test;",
        "runAsync": true,
        "queryTimeout": 0,
        "confOverlay": {
          "additionalProp1": "string",
          "additionalProp2": "string"
        }
      }'

      Use another catalog

      curl -X 'POST' \
        'http://<endpoint>:<port>/api/v1/sessions/<sessionHandle>/operations/statement/token/<token>' \
        -H 'accept: application/json' \
        -H 'Content-Type: application/json' \
        -d '{
        "statement": "select * from test;",
        "runAsync": true,
        "queryTimeout": 0,
        "confOverlay": {
          "additionalProp1": "string",
          "additionalProp2": "string"
        }
      }'

      A message similar to the following is returned. Here, identifier indicates the Kyuubi operation handle, which uniquely identifies a specific operation. In this topic, this value is referred to as <operationHandle>.

      {"identifier":"a743e8ff-xxxx-xxxx-xxxx-a66fec66cfa4"}
    3. Retrieve the statement status.

      Use DLF (formerly DLF 2.5)

      curl --location -X 'GET' \
        'http://<endpoint>:<port>/api/v1/operations/<operationHandle>/event/token/<token>' \
        -H 'accept: application/json' \
        -u '<UserName/RoleName>:<password>'

      Use another catalog

      curl --location -X 'GET' \
        'http://<endpoint>:<port>/api/v1/operations/<operationHandle>/event/token/<token>' \
        -H 'accept: application/json'
    4. Retrieve the statement result.

      Use DLF (formerly DLF 2.5)

      curl --location -X 'GET' \
        'http://<endpoint>:<port>/api/v1/operations/<operationHandle>/rowset/token/<token>/?maxrows=100&fetchorientation=FETCH_NEXT' \
        -H 'accept: application/json'  \
        -u '<UserName/RoleName>:<password>'

      Use another catalog

      curl --location -X 'GET' \
        'http://<endpoint>:<port>/api/v1/operations/<operationHandle>/rowset/token/<token>/?maxrows=100&fetchorientation=FETCH_NEXT' \
        -H 'accept: application/json'
  • Example 2: Use the batches API to submit a batch job.

    You can submit a Spark batch processing job to the Kyuubi Gateway using the REST API. The Kyuubi Gateway starts a Spark application and runs the specified task based on the parameters in the request.

    In this example, in addition to replacing information such as <endpoint>, <port>, and <token>, you must also download the test JAR package by clicking spark-examples_2.12-3.3.1.jar.

    Note

    This JAR package is a simple example that ships with Spark. It estimates the value of pi (π).

    Use DLF (formerly DLF 2.5)

    curl --location \
      --request POST 'http://<endpoint>:<port>/api/v1/batches/token/<token>' \
      --user '<UserName/RoleName>:<password>' \
      --form 'batchRequest="{
        \"batchType\": \"SPARK\",
        \"className\": \"org.apache.spark.examples.SparkPi\",
        \"name\": \"kyuubi-spark-pi\",
        \"resource\": \"oss://bucket/path/to/spark-examples_2.12-3.3.1.jar\"
      }";type=application/json'

    Use another catalog

    curl --location \
      --request POST 'http://<endpoint>:<port>/api/v1/batches/token/<token>' \
      --form 'batchRequest="{
        \"batchType\": \"SPARK\",
        \"className\": \"org.apache.spark.examples.SparkPi\",
        \"name\": \"kyuubi-spark-pi\",
        \"resource\": \"oss://bucket/path/to/spark-examples_2.12-3.3.1.jar\"
      }";type=application/json'
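
The two examples above can also be scripted. The following sketch wraps the session-flow calls from Example 1 and builds the batchRequest JSON from Example 2 using only the Python standard library. The endpoint paths and field names are taken from the curl commands above; the client class and helper names are our own illustration, and the client assumes a non-DLF catalog, so it sends no basic-auth credentials:

```python
import json
import urllib.request

class KyuubiRestClient:
    """Minimal stdlib wrapper around the session-flow REST calls shown above."""

    def __init__(self, endpoint: str, port: int, token: str):
        self.base = f"http://{endpoint}:{port}/api/v1"
        self.token = token

    def _call(self, method: str, path: str, payload=None):
        # All supported paths are rooted at /api/v1 (sessions, operations, batches).
        data = json.dumps(payload).encode() if payload is not None else None
        req = urllib.request.Request(
            self.base + path, data=data, method=method,
            headers={"accept": "application/json",
                     "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    def open_session(self, queue: str) -> str:
        """Create a session; returns the <sessionHandle> identifier."""
        body = {"configs": {
            "set:hivevar:spark.emr.serverless.kyuubi.engine.queue": queue}}
        return self._call("POST", f"/sessions/token/{self.token}",
                          body)["identifier"]

    def submit_statement(self, session_handle: str, statement: str) -> str:
        """Run SQL asynchronously; returns the <operationHandle> identifier."""
        body = {"statement": statement, "runAsync": True, "queryTimeout": 0}
        return self._call(
            "POST",
            f"/sessions/{session_handle}/operations/statement/token/{self.token}",
            body)["identifier"]

    def fetch_rows(self, operation_handle: str, max_rows: int = 100):
        """Retrieve the result rowset for a finished operation."""
        return self._call(
            "GET",
            f"/operations/{operation_handle}/rowset/token/{self.token}/"
            f"?maxrows={max_rows}&fetchorientation=FETCH_NEXT")

def build_batch_request(class_name: str, name: str, resource: str,
                        batch_type: str = "SPARK") -> str:
    """Serialize the batchRequest form field used by /api/v1/batches."""
    return json.dumps({
        "batchType": batch_type,
        "className": class_name,
        "name": name,
        "resource": resource,
    })
```

A typical run would call open_session, then submit_statement, then poll the event endpoint until the operation completes before calling fetch_rows, mirroring steps 1 through 4 of Example 1.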

Configure and connect to a high-availability Kyuubi Gateway

  1. Establish network connectivity.

    For more information, see Network connectivity between EMR Serverless Spark and other VPCs. Make sure that your client can access the Zookeeper cluster in the target VPC. For example, you can use the Zookeeper component of Alibaba Cloud MSE or EMR on ECS.

  2. Enable high availability for the Kyuubi Gateway.

    When you create or edit a Kyuubi Gateway, enable Service High Availability, configure the relevant parameters, and select an established network connection for Network Connection.

  3. Connect to the high-availability Kyuubi Gateway.

    After you complete the preceding configurations, the Kyuubi Gateway is configured for high availability through Zookeeper. You can verify its availability by connecting to it through the REST API or JDBC.

    When you connect to a Kyuubi Gateway, replace the placeholders in the JDBC URL:

    • <endpoint>: The endpoint information that you can obtain from the Overview tab.

    • <port>: The port number. The port number is 443 for public endpoints and 80 for internal same-region endpoints.

    • <token>: The token information that you copied from the Token Management page.

    • <tokenname>: The token name. You can obtain it from the Token Management page.

    • <UserName/RoleName>: The RAM user or RAM role that you added to Access control.

    The following examples show how to connect to a high-availability Kyuubi Gateway.

    Connect using Beeline

    1. Download the JDBC Driver JAR file by clicking kyuubi-hive-jdbc-1.9.2.jar.

    2. Replace the JDBC Driver JAR file.

      1. Back up and move the original JDBC Driver JAR file.

        mv /your_path/apache-kyuubi-1.9.2-bin/beeline-jars /bak_path
        Note

        If you are using EMR on ECS, the default path for Kyuubi is /opt/apps/KYUUBI/kyuubi-1.9.2-1.0.0/beeline-jars. If you do not know the Kyuubi installation path, you can find it by running the env | grep KYUUBI_HOME command.

      2. Replace it with the new JDBC Driver JAR file.

        cp /download/serverless-spark-kyuubi-hive-jdbc-1.9.2.jar /your_path/apache-kyuubi-1.9.2-bin/beeline-jars
    3. Connect using Beeline.

      /your_path/apache-kyuubi-1.9.2-bin/bin/beeline -u 'jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>'

    Connect using Java

    1. Download the shaded package by clicking serverless-spark-kyuubi-hive-jdbc-shaded-1.9.2.jar.

    2. Install the JDBC Driver to the Maven repository.

      Run the following command to install the JDBC Driver provided by Serverless Spark to your local Maven repository.

      mvn install:install-file \
        -Dfile=/download/serverless-spark-kyuubi-hive-jdbc-shaded-1.9.2.jar \
        -DgroupId=org.apache.kyuubi \
        -DartifactId=kyuubi-hive-jdbc-shaded \
        -Dversion=1.9.2-ss \
        -Dpackaging=jar
    3. Modify the pom.xml file.

      Add the following dependencies to your project's pom.xml file.

      <dependencies>
          <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.0.0</version>
          </dependency>
          <dependency>
            <groupId>org.apache.kyuubi</groupId>
            <artifactId>kyuubi-hive-jdbc-shaded</artifactId>
            <version>1.9.2-ss</version>
          </dependency>
      </dependencies>
    4. Write the example Java code.

      import org.apache.kyuubi.jdbc.hive.KyuubiStatement;
      
      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.ResultSet;
      import java.sql.ResultSetMetaData;
      
      
      public class Main {
          public static void main(String[] args) throws Exception {
              String url = "jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>";
              Class.forName("org.apache.kyuubi.jdbc.KyuubiHiveDriver");
              Connection conn = DriverManager.getConnection(url);
              KyuubiStatement stmt = (KyuubiStatement) conn.createStatement();
      
      
              String sql = "select * from test;";
              ResultSet res = stmt.executeQuery(sql);
      
              ResultSetMetaData md = res.getMetaData();
              String[] columns = new String[md.getColumnCount()];
              for (int i = 0; i < columns.length; i++) {
                  columns[i] = md.getColumnName(i + 1);
              }
              while (res.next()) {
                  System.out.print("Row " + res.getRow() + "=[");
                  for (int i = 0; i < columns.length; i++) {
                      if (i != 0) {
                          System.out.print(", ");
                      }
                      System.out.print(columns[i] + "='" + res.getObject(i + 1) + "'");
                  }
                System.out.println("]");
              }
      
              conn.close();
          }
      }

View the list of Spark jobs submitted by Kyuubi

For Spark jobs submitted through Kyuubi, you can view detailed job information on the Kyuubi Application tab of the Job History page, including the application ID, name, status, and start time. This helps you track and manage Spark jobs submitted by Kyuubi.

  1. On the Kyuubi Gateway page, click the name of the Kyuubi Gateway that you want to view.

  2. In the upper-right corner, click Applications.


    On this page, you can view the details of all Spark jobs submitted through this Kyuubi Gateway. The Application ID (spark-xxxx) is generated by the Spark engine and is identical to the Application ID that is returned when you connect with the Kyuubi client. This ID uniquely identifies the task instance.
