E-MapReduce (EMR) Serverless Spark supports connecting to an external Hive Metastore service to access data stored in the Hive Metastore. This topic describes how to configure this connection in EMR Serverless Spark to efficiently manage and use data resources in your operating environment.
Prerequisites
A workspace and an SQL session have been created. For more information, see Create a workspace and Manage SQL sessions.
Limitations
You must restart existing sessions in the workspace to use Hive Metastore.
After you set Hive Metastore as the default data catalog, your flow tasks use it by default.
Procedure
Step 1: Prepare the Hive Metastore service
This topic uses the Hive Metastore of EMR on ECS as an example of an external service. If a Hive Metastore service already exists in your virtual private cloud (VPC), you can skip this step.
On the EMR on ECS page, you can create a DataLake cluster that includes the Hive service and has Metadata set to Built-in MySQL. For more information, see Create a cluster.
Log on to the master node of the cluster using Secure Shell (SSH). For more information, see Log on to a cluster.
Run the following command to enter the Hive command line.
hiveRun the following command to create a table named `dw_users` that points to Object Storage Service (OSS) and then write data to the table.
CREATE TABLE `dw_users`( `name` string) LOCATION 'oss://<yourBucket>/path/to/file'; INSERT INTO dw_users select 'Bob';
Step 2: Add a network connection
Go to the Network Connectivity page.
Log on to the EMR console.
In the navigation pane on the left, choose .
On the Spark page, click the target workspace.
On the EMR Serverless Spark page, click Network Connectivity in the navigation pane on the left.
On the Network Connectivity page, click Add Network Connection.
In the Add Network Connection dialog box, configure the following parameters and click OK.
Parameter
Description
Connection Name
Enter a name for the new connection.
VPC
Select the same VPC as the EMR cluster.
VSwitch
Select the same vSwitch that is in the same VPC as the EMR cluster.
The network connection is established when the Status is Successful.

Step 3: Open the Hive Metastore service port
Obtain the CIDR block of the vSwitch specified in the network connection.
Log in to the VPC console or the and navigate to the VSwitches page to find the CIDR block of the vSwitch.

Add a security group rule.
Log on to the EMR on ECS console.
On the EMR on ECS page, click the target cluster ID.
On the Basic Information page, click the link for Cluster Security Group.
On the Security Groups page, click Add Rule. Set Source and Port, and then click OK.
Parameter
Description
Port
Enter port 9083.
Source
Enter the CIDR block of the vSwitch that you obtained in the previous step.
ImportantTo prevent security risks from external attacks, do not set Source to 0.0.0.0/0.
Step 4: Connect to Hive Metastore
On the EMR Serverless Spark page, in the navigation pane on the left, click Data Catalog.
On the Data Catalog page, click Add Data Catalog.
In the dialog box, click External Hive Metastore. Configure the following parameters and click OK.

Parameter
Description
Network Connectivity
Select the network connection that you added in Step 2.
Metastore Endpoint
The URI of the Hive Metastore. The format is
thrift://<IP address of Hive metastore>:9083.<IP address of Hive metastore>is the internal IP address of the HMS service. In this example, it is the internal IP address of the master node of the EMR cluster. You can find it on the Node Management page of the EMR cluster.NoteIf high availability (HA) is enabled for HMS, you can enter the endpoints of multiple nodes. Separate the URIs with commas (,), for example: thrift://<IP address of Hive metastore 1>:9083,thrift://<IP address of Hive metastore 2>:9083.
Kerberos Keytab File Path
The path to the Kerberos keytab file.
Kerberos Principal
The name of the principal contained in the keytab file. It is used for identity verification with the Kerberos service. You can run the
klist -kt <keytab file>command to view the principal name in the target keytab file.
Step 5: Use Hive Metastore
On the Data Catalog page, find hive_metastore and click Set As Default in the Actions column to make it the default data catalog for the workspace.

Restart the SQL session.
If an SQL session already exists in your workspace, you must stop and then restart the session for the hive_metastore settings to take effect.
Query data from a table in the Hive Metastore using SQL Developer.
Create a new SQL job. For more information, see Develop a SparkSQL job.
Run the following command to query the `dw_users` table.
SELECT * FROM dw_users;