After you register an E-MapReduce (EMR) cluster with DataWorks, configure the Kyuubi connection information to control how DataWorks authenticates when running Kyuubi tasks. You can use the default EMR identity or a custom username and password.
Background information
Apache Kyuubi is a distributed and multi-tenant gateway that provides query services, such as SQL queries, for data lake query engines such as Spark, Flink, and Trino. For more information, see Kyuubi.
Prerequisites
Before you begin, make sure you have:
-
Added the Kyuubi service to your EMR cluster. For more information, see Add the Kyuubi service.
-
Attached the EMR cluster as a DataWorks computing resource and completed resource group initialization. For more information, see Data Development (new version): Attach an EMR computing resource.
You must complete resource group initialization when attaching the EMR computing resource. Without it, the Kyuubi configuration page is unavailable.
Configure the Kyuubi connection information
-
Go to the Kyuubi configuration page.
-
Log on to the DataWorks console. In the top navigation bar, select a region. In the left-side navigation pane, choose More > Management Center.
-
On the page that appears, select the target workspace from the drop-down list and click Go to Management Center.
-
In the left-side navigation pane, click Computing Resources.
-
Find the target EMR cluster and click Kyuubi Configuration > Edit Kyuubi Configuration.
-
Select a connection mode.
| Connection mode | Description | When to use |
| --- | --- | --- |
| Connection Information of Alibaba Cloud EMR Cluster | Uses the Default Access Identity you specified when registering the EMR cluster. | Default. Use this when the cluster's built-in identity is sufficient for your tasks. |
| Custom Configuration Information | Uses a custom username and password to log on to Kyuubi via JDBC. | Use this when you need a dedicated identity or custom credentials for Kyuubi access. |
-
(Optional) If you selected Custom Configuration Information, configure the JDBC URL. The JDBC URL format is:
jdbc:hive2://host:port/;user=<logon username>;password=<logon password>
Note: The first time you select Custom Configuration Information, the JDBC URL is automatically populated based on the account information you configured when registering the EMR cluster. You can modify the pre-filled URL.
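To make the URL shape concrete, the following is a minimal Python sketch that assembles a string in the format above. The helper name `build_kyuubi_jdbc_url` and all sample values (host, port, username, password) are hypothetical and used only for illustration. They are not part of DataWorks or Kyuubi, and the sketch does not handle credentials that contain special characters.

```python
def build_kyuubi_jdbc_url(host: str, port: int, user: str, password: str) -> str:
    """Hypothetical helper: fill in the documented JDBC URL template.

    Template: jdbc:hive2://host:port/;user=<logon username>;password=<logon password>
    """
    return f"jdbc:hive2://{host}:{port}/;user={user};password={password}"


# Sample values are placeholders, not real endpoints or credentials.
url = build_kyuubi_jdbc_url(
    host="kyuubi.example.internal",
    port=10009,
    user="etl_user",
    password="example-password",
)
print(url)
```

A string built this way could be pasted into the JDBC URL field in place of the pre-filled value.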
How `DATAWORKS_PROXY_USER` affects the JDBC URL
If you selected Pass Proxy User Information when registering the EMR cluster, DataWorks appends the `hive.server2.proxy.user` configuration to the JDBC URL each time an EMR task runs. The behavior depends on whether you include the `DATAWORKS_PROXY_USER` placeholder in the URL:
| Scenario | Behavior |
| --- | --- |
| `DATAWORKS_PROXY_USER` is not in the JDBC URL | DataWorks appends the `hive.server2.proxy.user` value to the end of the URL at task runtime. |
| `DATAWORKS_PROXY_USER` is in the JDBC URL | DataWorks replaces the placeholder with the `hive.server2.proxy.user` value at task runtime. |
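The two scenarios can be modeled with a short Python sketch. This is an illustration of the documented behavior, not DataWorks' actual implementation. The function name `resolve_proxy_user` is hypothetical, and the exact text that DataWorks appends (assumed here to be `;hive.server2.proxy.user=<value>`) as well as the position of the placeholder in the sample URL are assumptions.

```python
PLACEHOLDER = "DATAWORKS_PROXY_USER"


def resolve_proxy_user(jdbc_url: str, proxy_user: str) -> str:
    """Hypothetical model of how the proxy user ends up in the JDBC URL."""
    if PLACEHOLDER in jdbc_url:
        # Placeholder present: substitute it with the proxy user value.
        return jdbc_url.replace(PLACEHOLDER, proxy_user)
    # Placeholder absent: append the setting to the end of the URL.
    # The ";key=value" form of the appended text is an assumption.
    return f"{jdbc_url};hive.server2.proxy.user={proxy_user}"


base = "jdbc:hive2://host:10009/;user=u;password=p"

# Scenario 1: no placeholder, so the setting is appended.
appended = resolve_proxy_user(base, "alice")

# Scenario 2: the placeholder is replaced in place.
replaced = resolve_proxy_user(
    base + ";hive.server2.proxy.user=DATAWORKS_PROXY_USER", "alice"
)

print(appended)
print(replaced)
```

Under these assumptions both calls yield the same URL. Including the placeholder mainly lets you control where in the URL the proxy user value appears.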
What's next
Follow the Data development process guide to configure component environments and run data development tasks in DataWorks.