Serverless Spark supports Kerberos authentication for the Kyuubi Gateway, enforcing identity verification and access control for all job submissions. After you complete the setup, clients must authenticate with Kerberos before submitting tasks to the gateway.
How it works
Kerberos uses a ticket-based authentication model. Before a client can connect to Kyuubi Gateway, it exchanges credentials with the Key Distribution Center (KDC) to obtain tickets. The following steps describe the authentication flow:
-
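The Kerberos client (for example, kinit) requests authentication for a principal from the KDC, using the principal's secret (a password or keytab file). The secret itself is never sent over the network.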
-
The KDC returns a ticket-granting ticket (TGT).
-
The client stores the TGT in a ticket cache.
-
The JDBC client (kyuubi-beeline) reads the TGT from the cache.
-
The JDBC client sends the TGT and the Kyuubi server principal to the KDC.
-
The KDC returns a client-to-server ticket.
-
The JDBC client presents the client-to-server ticket to Kyuubi Gateway to authenticate.
Understanding this flow explains why each configuration step is required: the keytab lets Kyuubi Gateway authenticate itself to the KDC, the kinit step gets a TGT for the client, and the JDBC URL principal parameter tells the client which server to request a ticket for.
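You can reproduce most of this flow from a shell with the MIT Kerberos client tools. This helps you tell a Kerberos problem apart from a JDBC problem. kinit performs steps 1 through 3. kvno requests a client-to-server ticket for a named service principal (steps 5 and 6) and stores it in the cache. The sketch below adds a small helper, has_ticket_for (a hypothetical name), that checks whether the cache holds a ticket for a given principal. It assumes the MIT klist output format, in which the service principal is the last field of each ticket line.

```shell
#!/bin/sh
# has_ticket_for PRINCIPAL
# Succeeds if the current ticket cache holds a ticket whose service principal
# is exactly PRINCIPAL. Parses MIT klist output: on each ticket line the
# service principal is the last whitespace-separated field.
has_ticket_for() {
  klist 2>/dev/null | awk -v p="$1" '$NF == p { found = 1 } END { exit !found }'
}

# Example, after kinit (steps 1-3) and kvno (steps 5-6):
#   kvno 'kyuubi/<fqdn>@<REALM>'
#   has_ticket_for 'krbtgt/<REALM>@<REALM>' && echo "TGT cached"
#   has_ticket_for 'kyuubi/<fqdn>@<REALM>' && echo "client-to-server ticket cached"
```

The Java-based JDBC client typically keeps its client-to-server ticket in memory. As a result, after a kyuubi-beeline session the cache may show only the TGT.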
Limitations
-
The E-MapReduce (EMR) cluster and the Serverless Spark workspace must be in the same region.
-
Only one Kyuubi Gateway can be created in a workspace where Kerberos is enabled.
Prerequisites
Before you begin, make sure that you have:
-
An EMR on ECS cluster with Kerberos authentication enabled. For details, see Create a cluster.
-
A Serverless Spark workspace with Kerberos authentication enabled. For details, see Enable Kerberos authentication.
Create network connectivity
Kyuubi Gateway runs inside Serverless Spark. To reach the Kerberos cluster, configure PrivateLink to establish a private network connection between the two environments.
Create an endpoint
An endpoint is created and maintained by the service consumer. Associate it with an endpoint service to access the Kerberos cluster through PrivateLink. For details, see Endpoints.
-
Log on to the Endpoint console.
-
Click Create Endpoint. On the Create Endpoint page, configure the following parameters and click OK.
| Parameter | Description |
|---|---|
| Region | Select the region where the endpoint resides. It must be the same region as the Kerberos cluster and the Serverless Spark workspace. |
| Endpoint name | Enter a custom name for the endpoint. |
| Endpoint type | Select Interface Endpoint. |
| Endpoint service | Click Select Service, then select or enter the target endpoint service ID. Note: To get the endpoint service ID, submit a ticket that includes the Serverless Spark workspace ID (for example, w-f8cfXXXXXX), the VPC ID of the Kerberos cluster (for example, vpc-bp1tXXXXXX), and two zones that have available vSwitches (for example, I and J). If you do not know which zones are supported in your region, ask customer service in the same ticket. |
| VPC | Select the VPC of the Kerberos cluster. |
| Security groups | Select the security groups to associate with the elastic network interface (ENI) of the endpoint. By default, you can add up to nine security groups per endpoint. |
| Zone and vSwitch | Select the zones that you specified in the ticket and their corresponding vSwitches. |
| IP version | Select the network type: IPv4 for IPv4-only client access, or Dual-stack for both IPv4 and IPv6 access. Dual-stack requires the service provider to complete dual-stack configuration first. |
| Resource group | Select the resource group for the endpoint. |
| Tag | Optionally, enter a tag key and tag value. |
-
On the Basic information page, confirm that Status is Active. The endpoint domain name is in this format:
ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com
-
Log on to the Kerberos cluster and verify that the endpoint domain name is reachable, for example by running ping against it.
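A ping test shows only that the name resolves and that the host replies to ICMP. A TCP connection to the gateway port is a more direct check. The sketch below assumes port 10009, which is the port used in the JDBC URL later in this guide, and uses a hypothetical helper name, check_port. It relies on bash's /dev/tcp redirection and the coreutils timeout command. The gateway answers on this port only after it has been configured and started, so run the check again after you start Kyuubi Gateway.

```shell
#!/bin/sh
# check_port HOST PORT [TIMEOUT_SECONDS]
# Succeeds if a TCP connection to HOST:PORT opens before the timeout expires.
check_port() {
  local host="$1" port="$2" secs="${3:-5}"
  # bash opens the socket while processing the redirection to /dev/tcp.
  # Host and port are passed as arguments, not spliced into the script text.
  if timeout "$secs" bash -c ': > "/dev/tcp/$1/$2"' check_port "$host" "$port" 2>/dev/null; then
    echo "OK: $host:$port is reachable"
  else
    echo "FAIL: cannot connect to $host:$port within ${secs}s" >&2
    return 1
  fi
}

# Example (Kyuubi Gateway must already be running):
#   check_port ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com 10009
```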

Configure domain name resolution (optional)
The default endpoint domain name is long. For convenience, map it to a shorter custom name using an internal authoritative DNS zone. For details, see Internal authoritative domain names.
-
Log on to the Alibaba Cloud DNS console. On the Authoritative zone tab, click User defined zones, then click Add zone.
-
Enter an authoritative zone name, select the VPCs where the domain name applies, and click OK. This guide uses kyuubi-kerberos.abc as an example. If the Domain name type option is available, select Internal authoritative acceleration zone. If the option is not shown, the zone type defaults to internal authoritative acceleration.
-
On the User defined zones tab, find the zone and click Settings in the Actions column. In the dialog box, click Add record and select Form editor mode.
-
Set Record type to CNAME. For Hostname, enter a value such as test. For Record value, enter the endpoint domain name:
ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com
After you click OK, the domain name test.kyuubi-kerberos.abc maps to the endpoint.
-
Log on to the Kerberos cluster and test connectivity:
ping test.kyuubi-kerberos.abc
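A successful ping shows that the custom name resolves, but it does not show which record it resolves through. To confirm that the CNAME record points to the endpoint domain name that you entered as the record value, you can compare the resolved CNAME target. The sketch below uses dig (from the bind-utils or dnsutils package), and check_cname is a hypothetical helper name. Run it on a host in a VPC that the zone applies to.

```shell
#!/bin/sh
# check_cname NAME EXPECTED_TARGET
# Succeeds if the first CNAME record of NAME points to EXPECTED_TARGET.
check_cname() {
  local name="$1" expected="${2%.}" target
  # dig +short prints the CNAME target with a trailing dot; strip it.
  target="$(dig +short CNAME "$name" | head -n 1)"
  target="${target%.}"
  if [ -n "$target" ] && [ "$target" = "$expected" ]; then
    echo "OK: $name -> $target"
  else
    echo "MISMATCH: $name -> ${target:-<no CNAME record>} (expected $expected)" >&2
    return 1
  fi
}

# Example:
#   check_cname test.kyuubi-kerberos.abc ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com
```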
Create a keytab
Run the following steps on the Kerberos cluster.
-
Open the Kerberos admin tool:
kadmin.local
-
Create a principal in the format kyuubi/<fqdn>@<REALM>. For <fqdn>, use the endpoint domain name, or the custom domain name if you configured a CNAME. The host part must match the host name that clients use in the JDBC URL.
addprinc -randkey kyuubi/ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com@EMR.C-DFD4*****C204.COM
-
Export the keytab file and exit:
xst -kt /root/kyuubi.keytab kyuubi/ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com@EMR.C-DFD4*****C204.COM
quit
-
Upload the keytab to an Object Storage Service (OSS) bucket:
hadoop fs -put /root/kyuubi.keytab oss://<YOUR_BUCKET>.<region>.oss-dls.aliyuncs.com/
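If you script this setup, you can run the kadmin.local session above non-interactively with kadmin.local -q, which runs a single query and exits. The sketch below uses two hypothetical helper names, kyuubi_principal and create_kyuubi_keytab. It uses xst -k, the keytab option spelling documented for MIT kadmin. It also lowercases the host part of the principal, on the assumption that clients substitute _HOST with a lowercased host name, as Hadoop's SecurityUtil does. KADMIN and KLIST are overridable variables added for dry runs.

```shell
#!/bin/sh
# Sketch: non-interactive variant of the kadmin.local session above.
# KADMIN and KLIST default to the MIT Kerberos tools; override them for a dry run.
KADMIN="${KADMIN:-kadmin.local}"
KLIST="${KLIST:-klist}"

# kyuubi_principal FQDN REALM  ->  kyuubi/<fqdn>@<REALM>
kyuubi_principal() {
  local fqdn="$1" realm="$2"
  case "$fqdn" in
    ''|*/*|*@*)
      echo "usage: kyuubi_principal <fqdn> <REALM>" >&2
      return 2 ;;
  esac
  if [ -z "$realm" ]; then
    echo "usage: kyuubi_principal <fqdn> <REALM>" >&2
    return 2
  fi
  # Assumption: clients substitute _HOST with a lowercased host name.
  printf 'kyuubi/%s@%s\n' "$(printf '%s' "$fqdn" | tr '[:upper:]' '[:lower:]')" "$realm"
}

# create_kyuubi_keytab PRINCIPAL KEYTAB_PATH
create_kyuubi_keytab() {
  local principal="$1" keytab="$2"
  if [ -z "$principal" ] || [ -z "$keytab" ]; then
    echo "usage: create_kyuubi_keytab <principal> <keytab-path>" >&2
    return 2
  fi
  # -randkey: the service authenticates only with the keytab, never a password.
  # Depending on the MIT Kerberos version, kadmin.local -q may exit 0 even if
  # the query itself fails, so read its output.
  "$KADMIN" -q "addprinc -randkey $principal" || return 1
  # xst (alias of ktadd) randomizes the key again by default, which
  # invalidates any keytab exported earlier for the same principal.
  "$KADMIN" -q "xst -k $keytab $principal" || return 1
  # List the entries to confirm the principal name before uploading to OSS.
  "$KLIST" -kt "$keytab"
}

# Example:
#   p="$(kyuubi_principal ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com '<REALM>')"
#   create_kyuubi_keytab "$p" /root/kyuubi.keytab
```

Because xst changes the key, rerunning create_kyuubi_keytab after the upload leaves the keytab in OSS out of date. Upload the new file again in that case.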
Configure Kyuubi Gateway
Add the following Kyuubi configuration parameters to the gateway:
kyuubi.authentication KERBEROS
kyuubi.kinit.principal kyuubi/ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com@EMR.C-DFD43******7C204.COM
kyuubi.kinit.keytab /opt/kyuubi/work-dir/kyuubi.keytab
kyuubi.files oss://bucket/path/to/kyuubi.keytab
| Parameter | Description |
|---|---|
| kyuubi.authentication | Authentication method. Set to KERBEROS. |
| kyuubi.kinit.principal | The principal that Kyuubi Gateway uses for Kerberos authentication, in the format <user>/<host>@<realm>. |
| kyuubi.kinit.keytab | Local path of the keytab file inside the gateway. The directory /opt/kyuubi/work-dir/ is fixed; replace only the file name. |
| kyuubi.files | OSS path of the keytab file uploaded in the previous step. At startup, Kyuubi downloads the file to the path specified by kyuubi.kinit.keytab. |
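To avoid mismatches between the four values, you can generate them from the principal and the OSS location of the keytab. The sketch below uses a hypothetical helper, render_kyuubi_conf. It derives kyuubi.kinit.keytab from the file name of the OSS object, on the assumption that the file keeps its original name when Kyuubi downloads it into /opt/kyuubi/work-dir/.

```shell
#!/bin/sh
# render_kyuubi_conf PRINCIPAL OSS_KEYTAB_PATH
# Prints the four gateway properties as "key value" lines.
render_kyuubi_conf() {
  local principal="$1" oss_keytab="$2" name
  name="${oss_keytab##*/}"            # file name part of the OSS path
  case "$oss_keytab" in
    oss://*) ;;
    *) name="" ;;                     # not an OSS URI: treat as invalid
  esac
  if [ -z "$principal" ] || [ -z "$name" ]; then
    echo "usage: render_kyuubi_conf <principal> oss://<bucket>/<path>/<file>.keytab" >&2
    return 2
  fi
  # printf reuses its format for each key/value pair.
  printf '%s %s\n' \
    kyuubi.authentication KERBEROS \
    kyuubi.kinit.principal "$principal" \
    kyuubi.kinit.keytab "/opt/kyuubi/work-dir/$name" \
    kyuubi.files "$oss_keytab"
}

# Example:
#   render_kyuubi_conf 'kyuubi/<fqdn>@<REALM>' oss://bucket/path/to/kyuubi.keytab
```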
If your Spark jobs access a Kerberos-enabled Hive Metastore (HMS), add the following Spark configuration parameters:
spark.hadoop.hive.metastore.uris thrift://master-1-1.c-1d36*****e840c.cn-hangzhou.emr.aliyuncs.com:9083
spark.hadoop.hive.imetastoreclient.factory.class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClientFactory
spark.hive.metastore.kerberos.principal hive/_HOST@EMR.C-DFD4*****C204.COM
spark.hive.metastore.sasl.enabled true
spark.emr.serverless.network.service.name <network_name>
| Parameter | Description |
|---|---|
| spark.hadoop.hive.metastore.uris | HMS address in Thrift format. |
| spark.hadoop.hive.imetastoreclient.factory.class | Factory class that creates the HMS client. Use org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClientFactory. |
| spark.hive.metastore.kerberos.principal | Principal of HMS in the Kerberos environment. |
| spark.hive.metastore.sasl.enabled | Set to true to enable Kerberos authentication for HMS. |
| spark.emr.serverless.network.service.name | Name of the network connection between Serverless Spark and the EMR cluster. |
HMS address format rules:
-
High availability (HA) cluster: Specify multiple Thrift addresses separated by commas. Use hostnames; IP addresses are not supported in HA mode.
-
Single HMS address: You can use an IP address, but spark.hive.metastore.kerberos.principal must then be in the format hive/<hostname-of-HMS>@<REALM>.
-
The hive/_HOST@<REALM> shorthand, where _HOST resolves to the actual hostname at runtime, works only when spark.hadoop.hive.metastore.uris uses a hostname.
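You can check these rules before you save the configuration. The following sketch uses a hypothetical helper, check_hms_conf. It splits spark.hadoop.hive.metastore.uris on commas and treats dotted-quad hosts as IPv4 addresses. It rejects the two combinations that the rules exclude: multiple URIs that contain an IP address, and an IP address combined with a _HOST principal. It does not detect IPv6 addresses.

```shell
#!/bin/sh
# check_hms_conf METASTORE_URIS HMS_PRINCIPAL
# The body runs in a subshell, so the IFS and globbing changes stay local.
check_hms_conf() (
  uris="$1"; principal="$2"; count=0; has_ip=0
  if [ -z "$uris" ]; then
    echo "ERROR: spark.hadoop.hive.metastore.uris is empty" >&2
    exit 1
  fi
  IFS=','
  set -f                              # no glob expansion while splitting
  for uri in $uris; do
    uri="$(printf '%s' "$uri" | tr -d '[:space:]')"
    host="${uri#thrift://}"
    host="${host%%:*}"                # drop the port
    count=$((count + 1))
    # Dotted-quad hosts are treated as IPv4 addresses.
    if printf '%s\n' "$host" | grep -Eq '^[0-9]+(\.[0-9]+){3}$'; then
      has_ip=1
    fi
  done
  if [ "$count" -gt 1 ] && [ "$has_ip" -eq 1 ]; then
    echo "ERROR: HA mode (multiple URIs) requires hostnames, not IP addresses" >&2
    exit 1
  fi
  case "$principal" in
    *_HOST*)
      if [ "$has_ip" -eq 1 ]; then
        echo "ERROR: hive/_HOST@<REALM> needs a hostname; use hive/<hostname-of-HMS>@<REALM>" >&2
        exit 1
      fi ;;
  esac
  echo "OK"
)
```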
Save the configuration and start Kyuubi Gateway.
Submit a job
Use a show databases query to verify that Kyuubi Gateway is working correctly with Kerberos authentication.
-
Log on to the Kerberos cluster and create a Kerberos user for job submission:
kadmin.local
addprinc -randkey hadoop
xst -kt /root/hadoop.keytab hadoop
quit
-
Authenticate using the keytab:
kinit -kt /root/hadoop.keytab hadoop
This caches a TGT for the hadoop principal, which kyuubi-beeline reads in the next step.
-
Connect to Kyuubi Gateway and run a test query:
/opt/apps/KYUUBI/kyuubi-1.9.2-1.0.0/bin/kyuubi-beeline -u 'jdbc:hive2://ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com:10009/;principal=kyuubi/_HOST@EMR.C-DFD43*****7C204.COM'
Notes about the JDBC URL:
-
principal specifies the Kyuubi server principal. After _HOST is substituted, the value must match kyuubi.kinit.principal on the server side.
-
_HOST is a placeholder that the client replaces with the host name from the JDBC URL at runtime.
-
The URL is wrapped in single quotes to prevent the shell from splitting the command at the ; character.
-
After connecting, run show databases to confirm that Spark jobs start successfully.
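For repeated testing, you can wrap the client-side steps in a small script. In the sketch below, ensure_tgt and kyuubi_jdbc_url are hypothetical helper names. ensure_tgt runs kinit only when klist -s reports no valid credentials or when the cached default principal differs. It assumes the MIT klist output format ("Default principal: ..."). The example passes the query with the beeline -e option and keeps the URL in double quotes so that the shell does not interpret the ; character.

```shell
#!/bin/sh
# Client-side sketch of "Submit a job". BEELINE defaults to the path used above.
BEELINE="${BEELINE:-/opt/apps/KYUUBI/kyuubi-1.9.2-1.0.0/bin/kyuubi-beeline}"

# ensure_tgt KEYTAB PRINCIPAL: run kinit only if no valid TGT for PRINCIPAL is cached.
ensure_tgt() {
  local keytab="$1" principal="$2" current
  # klist -s prints nothing and exits non-zero if the cache is missing or expired.
  if klist -s 2>/dev/null; then
    current="$(klist 2>/dev/null | sed -n 's/^Default principal: //p')"
    case "$current" in
      "$principal"|"$principal"@*)
        echo "TGT already cached for $current"
        return 0 ;;
    esac
  fi
  kinit -kt "$keytab" "$principal"
}

# kyuubi_jdbc_url HOST PORT PRINCIPAL
kyuubi_jdbc_url() {
  printf 'jdbc:hive2://%s:%s/;principal=%s\n' "$1" "$2" "$3"
}

# Example (keep "$url" quoted because it contains ';'):
#   ensure_tgt /root/hadoop.keytab hadoop
#   url="$(kyuubi_jdbc_url ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com 10009 'kyuubi/_HOST@<REALM>')"
#   "$BEELINE" -u "$url" -e 'show databases'
```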
Configure proxy user access for HDFS and HMS
If Spark jobs need to access a Kerberos-enabled HMS or Hadoop Distributed File System (HDFS), add the following two properties to core-site.xml for the HADOOP-COMMON or HDFS component on the cluster:
hadoop.proxyuser.kyuubi.hosts = *
hadoop.proxyuser.kyuubi.groups = *
These properties allow the kyuubi service account (the user that starts Kyuubi Gateway) to impersonate other users when it accesses HDFS or HMS on their behalf. The hosts value controls which hosts the kyuubi account can connect from, and the groups value controls which user groups it can impersonate. Without these properties, HDFS and HMS reject the impersonated requests, so Spark jobs submitted by users other than kyuubi cannot access HDFS or HMS.
The wildcard values (*) grant access from all hosts and for all user groups. For production environments, replace * with specific values — for example, set hosts to the IP addresses of your Kyuubi Gateway nodes and groups to the groups that your client users belong to.
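If you edit core-site.xml by hand instead of through the EMR console, the two properties take the standard Hadoop <property> form. The sketch below uses a hypothetical helper, proxyuser_properties, which prints both entries for a given service user, hosts list, and groups list. This lets you pass restricted values instead of *.

```shell
#!/bin/sh
# proxyuser_properties USER HOSTS GROUPS
# Prints the hadoop.proxyuser.<USER>.hosts/groups entries for core-site.xml.
proxyuser_properties() {
  local user="$1" hosts="$2" groups="$3"
  if [ -z "$user" ] || [ -z "$hosts" ] || [ -z "$groups" ]; then
    echo "usage: proxyuser_properties <user> <hosts> <groups>" >&2
    return 2
  fi
  # printf reuses the format for each (user, key, value) triple.
  # Values are printed as-is; they are not XML-escaped.
  printf '<property>\n  <name>hadoop.proxyuser.%s.%s</name>\n  <value>%s</value>\n</property>\n' \
    "$user" hosts "$hosts" \
    "$user" groups "$groups"
}

# Example (hypothetical gateway IPs and group name):
#   proxyuser_properties kyuubi 192.168.0.10,192.168.0.11 analysts
```

After the restart, you can check the effective value on a cluster node with hdfs getconf -confKey hadoop.proxyuser.kyuubi.hosts.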
Newer versions of EMR DataLake clusters include these properties by default. After adding the properties, restart the HDFS or HMS service for the changes to take effect.