Kyuubi Gateway provides Java Database Connectivity (JDBC) and ODBC interfaces to seamlessly connect Serverless Spark with SQL query and business intelligence (BI) tools, such as Tableau and Power BI. This enables efficient data access and analysis. The gateway also features multitenancy and resource isolation to meet the demands of enterprise applications.
Create a Kyuubi Gateway
Go to the Gateway page.
Log on to the EMR console.
In the navigation pane on the left, choose .
On the Spark page, click the name of the workspace that you want to manage.
On the EMR Serverless Spark page, in the navigation pane on the left, click .
On the Kyuubi Gateway page, click Create Kyuubi Gateway.
On the Create Kyuubi Gateway page, configure the following parameters and click Create.
Parameter
Description
Name
The name of the new gateway. The name can contain only lowercase letters, digits, and hyphens (-). It must start and end with a letter or a digit.
Kyuubi Gateway Resource
The default is 2 CPU, 8 GB. The supported specifications and their recommended maximum concurrent connections are as follows:
1 CPU, 4 GB: 10
2 CPU, 8 GB: 20
4 CPU, 16 GB: 30
8 CPU, 32 GB: 45
16 CPU, 64 GB: 85
32 CPU, 128 GB: 135
Note: If there are too many Spark configuration items, the instantaneous concurrency for Spark job submission decreases.
Kyuubi Version
The Kyuubi version that the gateway uses.
Note: If you use DLF (formerly DLF 2.5) as the data catalog, you must set Kyuubi Version to 1.9.2-0.0.1 or later.
Engine Version
The engine version that the gateway uses. For more information about engine version numbers, see Engine versions.
Associated Queue
The gateway is deployed in the selected queue. When you submit a Spark job through the gateway, the job is submitted using the identity of the gateway creator.
Authentication Method
Only token-based authentication is supported.
After you create a gateway, you must generate a unique authentication token for it. This token is used for identity verification and access control in subsequent requests. For more information about how to create a token, see Manage gateways.
Service High Availability
If you enable service high availability, three or more Kyuubi Servers are deployed to achieve high availability.
After you enable this feature, configure the following parameters:
Number Of Kyuubi Servers: The number of Kyuubi servers.
Zookeeper Cluster Address: A high-availability Kyuubi Gateway depends on a Zookeeper cluster. Enter the Zookeeper cluster endpoint. If the cluster has multiple nodes, separate them with commas (,), for example, zk1:2181,zk2:2181,zk3:2181. Make sure that the network is connected.
Network Connection
The network connection that is used to access the data sources or external services in a virtual private cloud (VPC). For information about how to create a network connection, see Network connectivity between EMR Serverless Spark and other VPCs.
Endpoint (Public)
This feature is disabled by default. If you enable this feature, the system accesses Kyuubi through a public endpoint. Otherwise, Kyuubi is accessed through an internal endpoint by default.
Kyuubi Configuration
Enter the Kyuubi configuration information. Separate items with a space. For example: kyuubi.engine.pool.size 1. Only the following Kyuubi configurations are supported:
kyuubi.engine.pool.size
kyuubi.engine.pool.size.threshold
kyuubi.engine.share.level
kyuubi.engine.single.spark.session
kyuubi.session.engine.idle.timeout
kyuubi.session.engine.initialize.timeout
kyuubi.engine.security.token.max.lifetime
kyuubi.session.engine.check.interval
kyuubi.session.idle.timeout
kyuubi.session.engine.request.timeout
kyuubi.session.engine.login.timeout
kyuubi.backend.engine.exec.pool.shutdown.timeout
kyuubi.backend.server.exec.pool.shutdown.timeout
kyuubi.backend.server.exec.pool.keepalive.time
kyuubi.frontend.thrift.login.timeout
kyuubi.operation.status.polling.timeout
kyuubi.engine.pool.selectPolicy
kyuubi.authentication
kyuubi.kinit.principal
kyuubi.kinit.keytab
kyuubi.authentication.ldap.*
kyuubi.hadoop.proxyuser.hive.hosts
kyuubi.hadoop.proxyuser.hive.groups
kyuubi.hadoop.proxyuser.kyuubi.hosts
kyuubi.hadoop.proxyuser.kyuubi.groups
kyuubi.ha.*
Spark Configuration
Enter the Spark configuration information. Separate items with a space. All parameters are supported except parameters of the spark.kubernetes.* type. For example: spark.sql.catalog.paimon.metastore dlf.
On the Kyuubi Gateway page, find the gateway that you created and click Start in the Actions column.
Manage tokens
On the Kyuubi Gateway page, find the target gateway and click Tokens in the Actions column.
Click Create Token.
In the Create Token dialog box, configure the parameters and click OK.
Parameter
Description
Name
The name of the new token.
Expired At
The expiration time of the token. The number of days must be 1 or greater. By default, expiration is enabled and the token expires after 365 days.
Assigned To
Specify the RAM user or RAM role to which the token is assigned. This identity is used to access DLF when you connect to the Kyuubi Gateway to submit a Spark job. From the drop-down list, select a RAM user or RAM role that you added in Access control.
Note: If you use DLF (formerly DLF 2.5) as the default catalog on the Catalog tab, you must configure this parameter. Make sure that the specified Resource Access Management (RAM) user or RAM role has the permissions to access DLF. For more information about how to grant permissions, see Add an authorization.
Copy the token information.
Important: You must copy the token information immediately after the token is created because you cannot retrieve it later. If your token expires or is lost, you must create a new token or reset the token.
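The expiration window is simple date arithmetic; this small sketch (the function name is illustrative, not part of any API) shows how the 365-day default and the 1-day minimum translate into a concrete expiry timestamp.

```python
from datetime import datetime, timedelta

def token_expiry(created_at: datetime, days: int = 365) -> datetime:
    """Compute a token's expiration time. The console default is
    365 days, and the minimum allowed value is 1 day."""
    if days < 1:
        raise ValueError("token expiration must be at least 1 day")
    return created_at + timedelta(days=days)
```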
Connect to a Kyuubi Gateway
When you connect to a Kyuubi Gateway, replace the placeholders in the JDBC URL:
<endpoint>: The endpoint information, which you can obtain from the Overview tab.
<port>: The port number. The port number is 443 for public endpoints and 80 for internal same-region endpoints.
<token>: The token information that you copied from the Token Management page.
<tokenname>: The token name, which you can obtain from the Token Management page.
<UserName/RoleName>: The RAM user or RAM role that you added in Access control.
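To avoid hand-editing the URL for every environment, the placeholders can be substituted programmatically. The following Python sketch assembles the HTTP-mode JDBC URL described above; the helper name is our own, and the `user` parameter is only needed when DLF is the default catalog.

```python
def kyuubi_jdbc_url(endpoint: str, port: int, token: str, user: str = None) -> str:
    """Assemble the HTTP-mode Kyuubi JDBC URL from the placeholders
    described above. Pass `user` (a RAM user or RAM role name) only
    when DLF is the default catalog."""
    url = (f"jdbc:hive2://{endpoint}:{port}/;transportMode=http;"
           f"httpPath=cliservice/token/{token}")
    if user:
        url += f";user={user}"
    return url
```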
Connect using Beeline
Before you connect to a Kyuubi Gateway, make sure that your Beeline version is compatible with the Kyuubi server version. If you do not have Beeline installed, see Getting Started - Apache Kyuubi.
Select one of the following methods based on the default catalog that is configured on the Catalog page.
Use DLF (formerly DLF 2.5)
beeline -u "jdbc:hive2://<endpoint>:<port>/;transportMode=http;user=<UserName/RoleName>;httpPath=cliservice/token/<token>"
Use another catalog
beeline -u "jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>"
When you connect using Beeline, you can modify session parameters. For example:
beeline -u "jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>;#spark.sql.shuffle.partitions=100;spark.executor.instances=2;"
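The `#key=value` suffix in the last command above carries Spark session overrides. A small helper can append them consistently; this is a sketch based on the URL format shown in the Beeline example, not an official utility.

```python
def with_session_confs(jdbc_url: str, confs: dict) -> str:
    """Append Spark session overrides to a Kyuubi JDBC URL using the
    ';#key=value;key=value;' suffix shown in the Beeline example."""
    if not confs:
        return jdbc_url
    suffix = ";".join(f"{k}={v}" for k, v in confs.items())
    return f"{jdbc_url};#{suffix};"
```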
Connect using Java
Update the pom.xml file.
Replace hadoop-common and hive-jdbc with the appropriate dependency versions.
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-jdbc</artifactId>
        <version>2.3.9</version>
    </dependency>
</dependencies>
Write Java code to connect to the Kyuubi Gateway.
Select one of the following methods based on the default catalog that is configured on the Catalog page.
Use DLF (formerly DLF 2.5)
import org.apache.hive.jdbc.HiveStatement;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;

public class Main {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>;user=<UserName/RoleName>";
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(url);
        HiveStatement stmt = (HiveStatement) conn.createStatement();
        String sql = "select * from students;";
        System.out.println("Running " + sql);
        ResultSet res = stmt.executeQuery(sql);
        ResultSetMetaData md = res.getMetaData();
        String[] columns = new String[md.getColumnCount()];
        for (int i = 0; i < columns.length; i++) {
            columns[i] = md.getColumnName(i + 1);
        }
        while (res.next()) {
            System.out.print("Row " + res.getRow() + "=[");
            for (int i = 0; i < columns.length; i++) {
                if (i != 0) {
                    System.out.print(", ");
                }
                System.out.print(columns[i] + "='" + res.getObject(i + 1) + "'");
            }
            System.out.println("]");
        }
        conn.close();
    }
}
Use another catalog
import org.apache.hive.jdbc.HiveStatement;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;

public class Main {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>";
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(url);
        HiveStatement stmt = (HiveStatement) conn.createStatement();
        String sql = "select * from students;";
        System.out.println("Running " + sql);
        ResultSet res = stmt.executeQuery(sql);
        ResultSetMetaData md = res.getMetaData();
        String[] columns = new String[md.getColumnCount()];
        for (int i = 0; i < columns.length; i++) {
            columns[i] = md.getColumnName(i + 1);
        }
        while (res.next()) {
            System.out.print("Row " + res.getRow() + "=[");
            for (int i = 0; i < columns.length; i++) {
                if (i != 0) {
                    System.out.print(", ");
                }
                System.out.print(columns[i] + "='" + res.getObject(i + 1) + "'");
            }
            System.out.println("]");
        }
        conn.close();
    }
}
Connect using Python
Run the following command to install the PyHive and Thrift packages.
pip3 install pyhive thrift
Write a Python script to connect to the Kyuubi Gateway.
The following Python script is an example of how to connect to a Kyuubi Gateway and display a list of databases.
Select one of the following methods based on the default catalog that is configured on the Catalog page.
Use DLF (formerly DLF 2.5)
from pyhive import hive

if __name__ == '__main__':
    cursor = hive.connect('<endpoint>', port="<port>", scheme='http',
                          username='<UserName/RoleName>', password='<token>').cursor()
    cursor.execute('show databases')
    print(cursor.fetchall())
    cursor.close()
Use another catalog
from pyhive import hive

if __name__ == '__main__':
    cursor = hive.connect('<endpoint>', port="<port>", scheme='http',
                          username='<tokenname>', password='<token>').cursor()
    cursor.execute('show databases')
    print(cursor.fetchall())
    cursor.close()
Connect using the REST API
Kyuubi Gateway provides open source-compatible Representational State Transfer (REST) APIs that support interaction with the Kyuubi service over HTTP. Currently, only the following API paths are supported:
/api/v1/sessions/*
/api/v1/operations/*
/api/v1/batches/*
The following examples show how to connect to a Kyuubi Gateway using the REST API.
Example 1: Start a session and run an SQL query.
Create a session and specify Spark configurations.
Select one of the following methods based on the default catalog that is configured on the Catalog page.
Note:
spark.emr.serverless.kyuubi.engine.queue specifies the queue that the Spark job uses at runtime. Replace <dev_queue> with the actual queue name.
<UserName/RoleName>: Replace this with the actual username or role name.
<password>: This is a placeholder. You can enter any value.
Use DLF (formerly DLF 2.5)
curl -X 'POST' \
'http://<endpoint>:<port>/api/v1/sessions/token/<token>' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-u '<UserName/RoleName>:<password>' \
-d '{
  "configs": {
    "set:hivevar:spark.emr.serverless.kyuubi.engine.queue": "<dev_queue>"
  }
}'
Use another catalog
curl -X 'POST' \
'http://<endpoint>:<port>/api/v1/sessions/token/<token>' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
  "configs": {
    "set:hivevar:spark.emr.serverless.kyuubi.engine.queue": "<dev_queue>"
  }
}'
A message similar to the following is returned. In the message, identifier indicates the Kyuubi session handle, which uniquely identifies a session. In this topic, this value is referred to as <sessionHandle>.
{"identifier":"619e6ded-xxxx-xxxx-xxxx-c2a43f6fac46","kyuubiInstance":"0.0.0.0:10099"}
Create a statement.
Use DLF (formerly DLF 2.5)
curl -X 'POST' \
'http://<endpoint>:<port>/api/v1/sessions/<sessionHandle>/operations/statement/token/<token>' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-u '<UserName/RoleName>:<password>' \
-d '{
  "statement": "select * from test;",
  "runAsync": true,
  "queryTimeout": 0,
  "confOverlay": {
    "additionalProp1": "string",
    "additionalProp2": "string"
  }
}'
Use another catalog
curl -X 'POST' \
'http://<endpoint>:<port>/api/v1/sessions/<sessionHandle>/operations/statement/token/<token>' \
-H 'accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
  "statement": "select * from test;",
  "runAsync": true,
  "queryTimeout": 0,
  "confOverlay": {
    "additionalProp1": "string",
    "additionalProp2": "string"
  }
}'
A message similar to the following is returned. Here, identifier indicates the Kyuubi operation handle, which uniquely identifies a specific operation. In this topic, this value is referred to as <operationHandle>.
{"identifier":"a743e8ff-xxxx-xxxx-xxxx-a66fec66cfa4"}
Retrieve the statement status.
Use DLF (formerly DLF 2.5)
curl --location -X 'GET' \
'http://<endpoint>:<port>/api/v1/operations/<operationHandle>/event/token/<token>' \
-H 'accept: application/json' \
-u '<UserName/RoleName>:<password>'
Use another catalog
curl --location -X 'GET' \
'http://<endpoint>:<port>/api/v1/operations/<operationHandle>/event/token/<token>' \
-H 'accept: application/json'
Retrieve the statement result.
Use DLF (formerly DLF 2.5)
curl --location -X 'GET' \
'http://<endpoint>:<port>/api/v1/operations/<operationHandle>/rowset/token/<token>/?maxrows=100&fetchorientation=FETCH_NEXT' \
-H 'accept: application/json' \
-u '<UserName/RoleName>:<password>'
Use another catalog
curl --location -X 'GET' \
'http://<endpoint>:<port>/api/v1/operations/<operationHandle>/rowset/token/<token>/?maxrows=100&fetchorientation=FETCH_NEXT' \
-H 'accept: application/json'
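The four curl calls above follow a fixed URL pattern. The sketch below collects that path-building logic in one place so it can be reused with any HTTP client; the class name is illustrative, and only the URL construction is shown, with the actual requests left to the caller.

```python
class KyuubiRestPaths:
    """Build the REST API URLs used in Example 1 (create session ->
    create statement -> poll status -> fetch rows). The paths mirror
    the curl commands above."""

    def __init__(self, endpoint: str, port: int, token: str):
        self.base = f"http://{endpoint}:{port}/api/v1"
        self.token = token

    def create_session(self) -> str:
        return f"{self.base}/sessions/token/{self.token}"

    def create_statement(self, session_handle: str) -> str:
        return (f"{self.base}/sessions/{session_handle}"
                f"/operations/statement/token/{self.token}")

    def statement_event(self, operation_handle: str) -> str:
        return f"{self.base}/operations/{operation_handle}/event/token/{self.token}"

    def statement_rowset(self, operation_handle: str, maxrows: int = 100) -> str:
        return (f"{self.base}/operations/{operation_handle}/rowset/token/"
                f"{self.token}/?maxrows={maxrows}&fetchorientation=FETCH_NEXT")
```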
Example 2: Use the batches API to submit a batch job.
You can submit a Spark batch processing job to the Kyuubi Gateway using the REST API. The Kyuubi Gateway starts a Spark application and runs the specified task based on the parameters in the request.
In this example, in addition to replacing information such as <endpoint>, <port>, and <token>, you must also download the test JAR package by clicking spark-examples_2.12-3.3.1.jar.
Note: This JAR package is a simple example that comes with Spark. It is used to calculate the value of pi (π).
Use DLF (formerly DLF 2.5)
curl --location \
--request POST 'http://<endpoint>:<port>/api/v1/batches/token/<token>' \
--user '<UserName/RoleName>:<password>' \
--form 'batchRequest="{ \"batchType\": \"SPARK\", \"className\": \"org.apache.spark.examples.SparkPi\", \"name\": \"kyuubi-spark-pi\", \"resource\": \"oss://bucket/path/to/spark-examples_2.12-3.3.1.jar\" }";type=application/json'
Use another catalog
curl --location \
--request POST 'http://<endpoint>:<port>/api/v1/batches/token/<token>' \
--form 'batchRequest="{ \"batchType\": \"SPARK\", \"className\": \"org.apache.spark.examples.SparkPi\", \"name\": \"kyuubi-spark-pi\", \"resource\": \"oss://bucket/path/to/spark-examples_2.12-3.3.1.jar\" }";type=application/json'
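The escaped batchRequest JSON in the form field above is easier to get right if it is generated rather than hand-escaped. This sketch builds the same JSON body; the function name is illustrative.

```python
import json

def batch_request_json(class_name: str, name: str, resource: str) -> str:
    """Serialize the batchRequest form field used by the batches API:
    a SPARK batch job pointing at a JAR stored in OSS."""
    return json.dumps({
        "batchType": "SPARK",
        "className": class_name,
        "name": name,
        "resource": resource,
    })
```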
Configure and connect to a high-availability Kyuubi Gateway
Establish network connectivity.
For more information, see Network connectivity between EMR Serverless Spark and other VPCs. Make sure that your client can access the Zookeeper cluster in the target VPC. For example, you can use the Zookeeper component of Alibaba Cloud MSE or EMR on ECS.
Enable high availability for the Kyuubi Gateway.
When you create or edit a Kyuubi Gateway, enable Service High Availability, configure the relevant parameters, and select an established network connection for Network Connection.
Connect to the high-availability Kyuubi Gateway.
After you complete the preceding configurations, the Kyuubi Gateway is configured for high availability through Zookeeper. You can verify its availability by connecting to it through the REST API or JDBC.
When you connect to a Kyuubi Gateway, replace the placeholders in the JDBC URL:
<endpoint>: The endpoint information that you can obtain from the Overview tab.<port>: The port number. The port number is 443 for public endpoints and 80 for internal same-region endpoints.<token>: The token information that you copied from the Token Management page.<tokenname>: The token name. You can obtain it from the Token Management page.<UserName/RoleName>: The RAM user or RAM role that you added to Access control.
The following examples show how to connect to a high-availability Kyuubi Gateway.
Connect using Beeline
Download the JDBC Driver JAR file by clicking kyuubi-hive-jdbc-1.9.2.jar.
Replace the JDBC Driver JAR file.
Back up and move the original JDBC Driver JAR file.
mv /your_path/apache-kyuubi-1.9.2-bin/beeline-jars /bak_path
Note: If you are using EMR on ECS, the default path for Kyuubi is /opt/apps/KYUUBI/kyuubi-1.9.2-1.0.0/beeline-jars. If you do not know the Kyuubi installation path, you can find it by running the env | grep KYUUBI_HOME command.
Replace it with the new JDBC Driver JAR file.
cp /download/serverless-spark-kyuubi-hive-jdbc-1.9.2.jar /your_path/apache-kyuubi-1.9.2-bin/beeline-jars
Connect using Beeline.
/your_path/apache-kyuubi-1.9.2-bin/bin/beeline -u 'jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>'
Connect using Java
Download the shaded package by clicking serverless-spark-kyuubi-hive-jdbc-shaded-1.9.2.jar.
Install the JDBC Driver to the Maven repository.
Run the following command to install the JDBC Driver provided by Serverless Spark to your local Maven repository.
mvn install:install-file \
-Dfile=/download/serverless-spark-kyuubi-hive-jdbc-shaded-1.9.2.jar \
-DgroupId=org.apache.kyuubi \
-DartifactId=kyuubi-hive-jdbc-shaded \
-Dversion=1.9.2-ss \
-Dpackaging=jar
Modify the pom.xml file.
Add the following dependencies to your project's pom.xml file.
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.kyuubi</groupId>
        <artifactId>kyuubi-hive-jdbc-shaded</artifactId>
        <version>1.9.2-ss</version>
    </dependency>
</dependencies>
Write the example Java code.
import org.apache.kyuubi.jdbc.hive.KyuubiStatement;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;

public class Main {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:hive2://<endpoint>:<port>/;transportMode=http;httpPath=cliservice/token/<token>";
        Class.forName("org.apache.kyuubi.jdbc.KyuubiHiveDriver");
        Connection conn = DriverManager.getConnection(url);
        KyuubiStatement stmt = (KyuubiStatement) conn.createStatement();
        String sql = "select * from test;";
        ResultSet res = stmt.executeQuery(sql);
        ResultSetMetaData md = res.getMetaData();
        String[] columns = new String[md.getColumnCount()];
        for (int i = 0; i < columns.length; i++) {
            columns[i] = md.getColumnName(i + 1);
        }
        while (res.next()) {
            System.out.print("Row " + res.getRow() + "=[");
            for (int i = 0; i < columns.length; i++) {
                if (i != 0) {
                    System.out.print(", ");
                }
                System.out.print(columns[i] + "='" + res.getObject(i + 1) + "'");
            }
            System.out.println("]");
        }
        conn.close();
    }
}
View the list of Spark jobs submitted by Kyuubi
For Spark jobs submitted through Kyuubi, you can view detailed job information, including the Application ID, Application Name, Application Status, and Start At time, on the Kyuubi Application tab of the Job History page. This helps you understand and manage the Spark jobs submitted by Kyuubi.
On the Kyuubi Gateway page, click the target Kyuubi Gateway.
In the upper-right corner, click Applications.

On this page, you can view the details of all Spark jobs submitted through this Kyuubi Gateway. The Application ID (spark-xxxx) is generated by the Spark engine and is identical to the Application ID that is returned when you connect with the Kyuubi client. This ID uniquely identifies the task instance.
