E-MapReduce: Use Kerberos with Kyuubi Gateway

Last Updated: Mar 26, 2026

Serverless Spark supports Kerberos authentication for the Kyuubi Gateway, enforcing identity verification and access control for all job submissions. After you complete the setup, clients must authenticate with Kerberos before submitting tasks to the gateway.

How it works

Kerberos uses a ticket-based authentication model. Before a client can connect to Kyuubi Gateway, it exchanges credentials with the Key Distribution Center (KDC) to obtain tickets. The following steps describe the authentication flow:

  1. The Kerberos client authenticates to the KDC with a principal and its secret (a password or a keytab file).

  2. The KDC returns a ticket-granting ticket (TGT).

  3. The client stores the TGT in a ticket cache.

  4. The JDBC client (kyuubi-beeline) reads the TGT from the cache.

  5. The JDBC client sends the TGT and the Kyuubi server principal to the KDC.

  6. The KDC returns a client-to-server ticket.

  7. The JDBC client presents the client-to-server ticket to Kyuubi Gateway to authenticate.

This flow explains why each configuration step below is required. The keytab lets Kyuubi Gateway authenticate itself to the KDC. The kinit step obtains a TGT for the client. The principal parameter in the JDBC URL tells the client which server principal to request a ticket for.

Limitations

  • The E-MapReduce (EMR) cluster and the Serverless Spark workspace must be in the same region.

  • Only one Kyuubi Gateway can be created in a workspace where Kerberos is enabled.

Prerequisites

Before you begin, make sure that you have:

Create network connectivity

Kyuubi Gateway runs inside Serverless Spark. To connect it with the Kerberos cluster, configure PrivateLink to establish a private network connection between the two environments.

Create an endpoint

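An endpoint is created and maintained by the service consumer. In this setup, you create the endpoint in the VPC of the Kerberos cluster and associate it with the endpoint service for Kyuubi Gateway. Clients on the cluster can then reach the gateway through PrivateLink. For details, see Endpoints.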

  1. Log on to the Endpoint console.

  2. On the Create Endpoint page, configure the following parameters and click OK.

    • Region: Select the region where the endpoint resides. It must match the region of the Kerberos cluster and the Serverless Spark workspace.

    • Endpoint name: Enter a custom name for the endpoint.

    • Endpoint type: Select Interface Endpoint.

    • Endpoint service: Click Select Service, then select or enter the target endpoint service ID.

      Note: To get the endpoint service ID, submit a ticket that includes the following: the Serverless Spark workspace ID (for example, w-f8cfXXXXXX), the VPC ID of the Kerberos cluster (for example, vpc-bp1tXXXXXX), and two zones that have available vSwitches (for example, zones I and J). In the ticket, ask customer service which zones are supported in your region.

    • VPC: Select the VPC of the Kerberos cluster.

    • Security groups: Select the security groups to associate with the endpoint elastic network interface (ENI). By default, you can add up to nine security groups to an endpoint.

    • Zone and vSwitch: Select the zones that you specified in the ticket and their corresponding vSwitches.

    • IP version: Select IPv4 for IPv4-only client access, or Dual-stack for both IPv4 and IPv6 access. Dual-stack requires the service provider to complete dual-stack configuration first.

    • Resource group: Select the resource group for the endpoint.

    • Tag: Optional. Enter a tag key and a tag value.


  3. On the Basic information page, confirm that Status is Active. The endpoint domain name is in this format:

    ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com
  4. Log on to the Kerberos cluster and verify that it can reach the endpoint. For example, run ping against the endpoint domain name.


Configure domain name resolution (optional)

The default endpoint domain name is long. For convenience, map it to a shorter custom name using an internal authoritative DNS zone. For details, see Internal authoritative domain names.

  1. Log on to the console. On the Authoritative zone tab, click User defined zones, then click Add zone.

  2. Enter an authoritative zone name, select the VPCs where the domain name applies, and click OK. This guide uses kyuubi-kerberos.abc as an example.

    If the Domain name type option is available, select Internal authoritative acceleration zone. If the option is not shown, the zone is created as an internal authoritative acceleration zone by default.
  3. On the User defined zones tab, find the zone and click Settings in the Actions column. In the dialog box, click Add record and select Form editor mode.

  4. Set Record type to CNAME. For Hostname, enter a value such as test. For Record value, enter the endpoint domain name:

    ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com

    After you click OK, the domain test.kyuubi-kerberos.abc maps to the endpoint.

  5. Log on to the Kerberos cluster and test connectivity:

    ping test.kyuubi-kerberos.abc

Create a keytab

Run the following steps on the Kerberos cluster.

  1. Log on to the Kerberos cluster.

  2. Open the Kerberos admin tool:

    kadmin.local
  3. Create a principal in the format kyuubi/<fqdn>@<REALM>. For <fqdn>, use the host name that clients connect to. This is the endpoint domain name, or the custom domain name if you configured a CNAME record. For <REALM>, use the realm of the Kerberos cluster:

    addprinc -randkey kyuubi/ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com@EMR.C-DFD4*****C204.COM
  4. Export the keytab file and exit:

    xst -kt /root/kyuubi.keytab kyuubi/ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com@EMR.C-DFD4*****C204.COM
    quit
  5. Upload the keytab to an Object Storage Service (OSS) bucket:

    hadoop fs -put /root/kyuubi.keytab oss://<YOUR_BUCKET>.<region>.oss-dls.aliyuncs.com/
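You can also run steps 3 and 4 non-interactively, because kadmin.local accepts a single query through its -q option. The following is a minimal sketch, not part of the official procedure. kyuubi_principal and print_keytab_cmds are hypothetical helper names. The script only prints the commands, plus a klist -kt check of the exported keytab, so you can review them before running them on the Kerberos cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: they print commands and never contact the KDC.

# kyuubi_principal FQDN REALM -> kyuubi/<fqdn>@<REALM>
kyuubi_principal() {
  local fqdn="$1" realm="$2"
  # By convention Kerberos realms are upper case, as in this guide's examples.
  if [[ -z "$fqdn" || -z "$realm" || "$realm" != "${realm^^}" ]]; then
    echo "usage: kyuubi_principal <fqdn> <UPPER-CASE REALM>" >&2
    return 1
  fi
  # Principal names are compared case-sensitively; keep the host part lower case.
  echo "kyuubi/${fqdn,,}@${realm}"
}

# print_keytab_cmds FQDN REALM [KEYTAB_PATH]
print_keytab_cmds() {
  local principal keytab="${3:-/root/kyuubi.keytab}"
  principal="$(kyuubi_principal "$1" "$2")" || return 1
  printf 'kadmin.local -q "addprinc -randkey %s"\n' "$principal"
  printf 'kadmin.local -q "xst -kt %s %s"\n' "$keytab" "$principal"
  # klist -kt lists the principals (with timestamps) stored in the keytab.
  printf 'klist -kt %s\n' "$keytab"
}

# Example (the realm is masked in this guide, so quote it):
# print_keytab_cmds ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com 'EMR.C-DFD4*****C204.COM'
```

Printing the commands instead of executing them keeps the helper safe to try anywhere. Before you run the output on the Kerberos cluster, check that the principal uses the same host name that clients will put in the JDBC URL.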

Configure Kyuubi Gateway

Add the following Kyuubi configuration parameters to the gateway:

kyuubi.authentication              KERBEROS
kyuubi.kinit.principal             kyuubi/ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com@EMR.C-DFD43******7C204.COM
kyuubi.kinit.keytab                /opt/kyuubi/work-dir/kyuubi.keytab
kyuubi.files                       oss://bucket/path/to/kyuubi.keytab
  • kyuubi.authentication: The authentication method. Set it to KERBEROS.

  • kyuubi.kinit.principal: The principal that Kyuubi Gateway uses for Kerberos authentication, in the format <user>/<host>@<realm>. Use the principal that you exported to the keytab.

  • kyuubi.kinit.keytab: The local path of the keytab file inside the gateway. The directory /opt/kyuubi/work-dir/ is fixed; you can change only the file name.

  • kyuubi.files: The OSS path of the keytab file that you uploaded in the previous step. At startup, Kyuubi downloads the file to the path specified by kyuubi.kinit.keytab.
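kyuubi.kinit.keytab and kyuubi.files must refer to the same keytab. One way to avoid a mismatch is to derive the local path from the OSS path. The sketch below prints the four parameters in the layout shown above. kyuubi_gateway_conf is a hypothetical helper, not part of Kyuubi or Serverless Spark. The sketch assumes that you keep the OSS object name as the file name under /opt/kyuubi/work-dir/.

```bash
#!/usr/bin/env bash
# kyuubi_gateway_conf PRINCIPAL OSS_KEYTAB_PATH
# Hypothetical helper: prints the Kerberos parameters for Kyuubi Gateway.
kyuubi_gateway_conf() {
  local principal="$1" oss_keytab="$2"
  local name="${oss_keytab##*/}"   # file name part of the OSS path
  if [[ -z "$principal" || "$oss_keytab" != oss://* || -z "$name" ]]; then
    echo "usage: kyuubi_gateway_conf <principal> oss://<bucket>/<path>/<file>.keytab" >&2
    return 1
  fi
  # The directory /opt/kyuubi/work-dir/ is fixed; only the file name varies.
  # printf reuses its format string for each key/value pair.
  printf '%-34s %s\n' \
    kyuubi.authentication  KERBEROS \
    kyuubi.kinit.principal "$principal" \
    kyuubi.kinit.keytab    "/opt/kyuubi/work-dir/$name" \
    kyuubi.files           "$oss_keytab"
}

# Example:
# kyuubi_gateway_conf 'kyuubi/ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com@EMR.C-DFD43******7C204.COM' oss://bucket/path/to/kyuubi.keytab
```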

If your Spark jobs access a Kerberos-enabled Hive Metastore (HMS), add the following Spark configuration parameters:

spark.hadoop.hive.metastore.uris                    thrift://master-1-1.c-1d36*****e840c.cn-hangzhou.emr.aliyuncs.com:9083
spark.hadoop.hive.imetastoreclient.factory.class    org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClientFactory
spark.hive.metastore.kerberos.principal             hive/_HOST@EMR.C-DFD4*****C204.COM
spark.hive.metastore.sasl.enabled                   true
spark.emr.serverless.network.service.name           <network_name>
  • spark.hadoop.hive.metastore.uris: The HMS address, in the format thrift://<host>:<port>.

  • spark.hadoop.hive.imetastoreclient.factory.class: The factory class that creates the HMS client. Set it to org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClientFactory.

  • spark.hive.metastore.kerberos.principal: The HMS principal in the Kerberos environment.

  • spark.hive.metastore.sasl.enabled: Set it to true to enable Kerberos authentication for HMS.

  • spark.emr.serverless.network.service.name: The name of the network connection between Serverless Spark and the EMR cluster.

HMS address format rules:

  • High availability (HA) cluster: Specify multiple Thrift addresses separated by commas. Use hostnames; IP addresses are not supported in HA mode.

  • Single HMS address: You can use an IP address, but spark.hive.metastore.kerberos.principal must then be in the format hive/<hostname-of-HMS>@<REALM>.

  • The hive/_HOST@<REALM> shorthand, where _HOST resolves to the actual hostname at runtime, works only when spark.hadoop.hive.metastore.uris uses hostnames.
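You can check these three rules before you save the configuration. The following bash function takes the values of spark.hadoop.hive.metastore.uris and spark.hive.metastore.kerberos.principal and reports which rule is violated. check_hms_settings is a hypothetical function, not an EMR or Hive tool. As a simplifying assumption, it treats only dotted IPv4 literals as IP addresses.

```bash
#!/usr/bin/env bash
# check_hms_settings URIS PRINCIPAL
# Hypothetical pre-flight check for the HMS address format rules.
check_hms_settings() {
  local uris="$1" principal="$2"
  local uri_re='^thrift://([^:/]+):[0-9]+$' ipv4_re='^[0-9]+(\.[0-9]+){3}$'
  local -a list
  local uri host count=0 has_ip=0
  IFS=',' read -ra list <<< "$uris"
  for uri in "${list[@]}"; do
    uri="${uri// /}"                   # tolerate spaces after commas
    if [[ "$uri" =~ $uri_re ]]; then
      host="${BASH_REMATCH[1]}"
    else
      echo "not a thrift://<host>:<port> address: '$uri'" >&2
      return 1
    fi
    count=$((count + 1))
    if [[ "$host" =~ $ipv4_re ]]; then has_ip=1; fi
  done
  if (( count == 0 )); then
    echo "no metastore URI given" >&2; return 1
  fi
  # Rule 1: several URIs (HA mode) require hostnames.
  if (( count > 1 && has_ip )); then
    echo "HA mode: use hostnames, not IP addresses" >&2; return 1
  fi
  # Rule 3: hive/_HOST@<REALM> works only with hostnames.
  # (Rule 2, a single IP with hive/<hostname-of-HMS>@<REALM>, passes.)
  if [[ "$principal" == *_HOST* ]] && (( has_ip )); then
    echo "with an IP address, use hive/<hostname-of-HMS>@<REALM> instead of _HOST" >&2
    return 1
  fi
  return 0
}
```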

Save the configuration and start Kyuubi Gateway.
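Before you submit a job, you can confirm from the Kerberos cluster that the gateway port is reachable through the endpoint. ping only shows that the name resolves and that the endpoint answers ICMP. The following sketch instead opens a TCP connection to port 10009, the port used in the JDBC URL in the next section. check_tcp is a hypothetical helper. It relies on bash's /dev/tcp redirection, which most Linux builds of bash support, and on the timeout command from GNU coreutils.

```bash
#!/usr/bin/env bash
# check_tcp HOST [PORT] [TIMEOUT_SECONDS]
# Hypothetical helper: succeeds if a TCP connection to HOST:PORT opens in time.
check_tcp() {
  local host="$1" port="${2:-10009}" secs="${3:-5}"
  if [[ -z "$host" || ! "$port" =~ ^[0-9]+$ ]]; then
    echo "usage: check_tcp <host> [port] [timeout-seconds]" >&2
    return 2
  fi
  # A child bash opens /dev/tcp/HOST/PORT on fd 3; timeout stops it if the connect hangs.
  if timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "OK: $host:$port accepts TCP connections"
  else
    echo "FAIL: no TCP connection to $host:$port within ${secs}s" >&2
    return 1
  fi
}

# Example (run on the Kerberos cluster after Kyuubi Gateway has started):
# check_tcp ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com 10009
```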

Submit a job

Use a show databases query to verify that Kyuubi Gateway is working correctly with Kerberos authentication.

  1. Log on to the Kerberos cluster and create a Kerberos user for job submission:

    kadmin.local
    addprinc -randkey hadoop
    xst -kt /root/hadoop.keytab hadoop
    quit
  2. Authenticate using the keytab:

    kinit -kt /root/hadoop.keytab hadoop

    This caches a TGT for the hadoop principal, which kyuubi-beeline reads in the next step.

  3. Connect to Kyuubi Gateway and run a test query:

    /opt/apps/KYUUBI/kyuubi-1.9.2-1.0.0/bin/kyuubi-beeline -u 'jdbc:hive2://ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com:10009/;principal=kyuubi/_HOST@EMR.C-DFD43*****7C204.COM'

    Note about the JDBC URL:

    • principal specifies the Kyuubi server principal. The value must match kyuubi.kinit.principal on the server side.

    • _HOST is a placeholder that resolves at runtime to the host name in the JDBC URL. The Kyuubi principal in the keytab must therefore use that same host name.

    • The URL is wrapped in single quotes to prevent the shell from splitting the command at the ; character.

  4. After connecting, run show databases to confirm that Spark jobs start successfully.

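You can also script the client side of the flow (steps 3 to 7 in How it works). The sketch below uses hypothetical helper names: ensure_tgt, kyuubi_jdbc_url, and server_principal_for. It assumes the MIT Kerberos client tools, where klist -s exits with status 0 only if a readable, unexpired ticket cache exists. It also assumes that the client replaces _HOST with the lower-cased host from the JDBC URL, as Hadoop's SecurityUtil does. Under that assumption, the principal that server_principal_for prints must match kyuubi.kinit.principal.

```bash
#!/usr/bin/env bash
# Hypothetical client-side helpers for the Submit a job steps.

# ensure_tgt KEYTAB PRINCIPAL: run kinit unless a valid ticket cache already exists.
# Limitation: klist -s does not check which principal the cached TGT belongs to.
ensure_tgt() {
  local keytab="$1" principal="$2"
  if [[ ! -r "$keytab" ]]; then
    echo "keytab not readable: $keytab" >&2
    return 1
  fi
  if klist -s; then
    return 0
  fi
  kinit -kt "$keytab" "$principal"
}

# kyuubi_jdbc_url HOST REALM [PORT]: URL in the same shape as the one in step 3.
kyuubi_jdbc_url() {
  local host="$1" realm="$2" port="${3:-10009}"
  printf 'jdbc:hive2://%s:%s/;principal=kyuubi/_HOST@%s\n' "$host" "$port" "$realm"
}

# server_principal_for URL: the server principal the client requests a ticket for,
# assuming _HOST is replaced by the lower-cased host part of the URL.
server_principal_for() {
  local url="$1" re='^jdbc:hive2://([^:/]+):[0-9]+/.*;principal=([^;]+)'
  local host principal
  if [[ "$url" =~ $re ]]; then
    host="${BASH_REMATCH[1]}"
    principal="${BASH_REMATCH[2]}"
  else
    echo "unrecognized JDBC URL: $url" >&2
    return 1
  fi
  echo "${principal/_HOST/${host,,}}"
}

# Example:
# ensure_tgt /root/hadoop.keytab hadoop
# url="$(kyuubi_jdbc_url ep-xxxxxxxxxxx.epsrv-xxxxxxxxxxx.cn-hangzhou.privatelink.aliyuncs.com 'EMR.C-DFD43*****7C204.COM')"
# server_principal_for "$url"
# /opt/apps/KYUUBI/kyuubi-1.9.2-1.0.0/bin/kyuubi-beeline -u "$url" -e 'show databases;'
```

The last example line uses beeline's -e option to run a single query and exit, which is convenient for a scripted smoke test. If server_principal_for prints a principal that does not exist in the KDC, the ticket request in step 6 fails. For example, this happens when the URL uses the CNAME but the keytab was created for the endpoint domain name.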

Configure proxy user access for HDFS and HMS

If Spark jobs need to access a Kerberos-enabled HMS or Hadoop Distributed File System (HDFS), add the following two properties to core-site.xml for the HADOOP-COMMON or HDFS component on the cluster:

hadoop.proxyuser.kyuubi.hosts = *
hadoop.proxyuser.kyuubi.groups = *

These properties allow the kyuubi service account (the user that starts Kyuubi Gateway) to impersonate other users when it accesses HDFS or HMS. The hosts value controls the hosts from which the kyuubi account can connect, and the groups value controls which user groups it can impersonate. Without these properties, HDFS and HMS can reject the requests that Kyuubi makes on behalf of other users, so Spark jobs that run as users other than kyuubi may fail.

The wildcard values (*) grant access from all hosts and for all user groups. For production environments, replace * with specific values — for example, set hosts to the IP addresses of your Kyuubi Gateway nodes and groups to the groups that your client users belong to.
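In raw core-site.xml form, each of the two properties is a <property> element with a <name> and a <value>. The sketch below prints both entries, using * as the default for both values. You can also pass the narrower values recommended above for production. proxyuser_properties is a hypothetical generator, not a Hadoop tool, and the IP addresses and group name in the example are placeholders.

```bash
#!/usr/bin/env bash
# proxyuser_properties [USER] [HOSTS] [GROUPS]
# Hypothetical generator for the hadoop.proxyuser.<user>.hosts/groups entries.
proxyuser_properties() {
  local user="${1:-kyuubi}" hosts="${2:-*}" groups="${3:-*}"
  local key value
  for key in hosts groups; do
    if [[ "$key" == hosts ]]; then value="$hosts"; else value="$groups"; fi
    printf '<property>\n  <name>hadoop.proxyuser.%s.%s</name>\n  <value>%s</value>\n</property>\n' \
      "$user" "$key" "$value"
  done
}

# Example: restrict to two gateway IPs and one client group (placeholder values).
# proxyuser_properties kyuubi '192.168.0.10,192.168.0.11' analysts
```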

Newer versions of EMR DataLake clusters include these properties by default. After adding the properties, restart the HDFS or HMS service for the changes to take effect.