
E-MapReduce: Set metadata for a Spark cluster

Last Updated: Mar 26, 2026

E-MapReduce (EMR) on ACK supports two metadata management options for Spark clusters: Data Lake Formation (DLF), a managed service, or a self-managed Hive metastore. This topic describes how to configure each option.

Choose a metadata management method

Criterion | DLF (recommended) | Self-managed Hive metastore
Management | Fully managed by Alibaba Cloud | Managed by you
Best for | Production environments where you do not need to maintain independent metadatabases; environments with multiple big data compute engines (MaxCompute, Hologres, Machine Learning Platform for AI); or environments with multiple EMR clusters | Existing Hive metastore deployments that you want to reuse
Setup effort | Enable with one click in the console | Configure a Thrift URI and deploy the client configuration

Prerequisites

Before you begin, ensure that you have:

  • A Spark cluster created on the EMR on ACK page of the E-MapReduce console. For more information, see Step 2: Create a cluster.

  • (If using DLF) DLF activated. For more information, see Quick start.

  • (If using a self-managed Hive metastore) A self-managed Hive metastore that is accessible from the Container Service for Kubernetes (ACK) cluster you created.
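To check the last prerequisite, you can test whether the metastore's Thrift port is reachable over TCP from inside the ACK cluster, for example from a pod that has Python installed. The following is a minimal standard-library sketch; the function name metastore_reachable is hypothetical, the default port 9083 is assumed, and a successful connection only shows that something is listening on the port, not that the Hive metastore is healthy.

```python
import socket

# Assumed default Thrift port of the Hive metastore service.
DEFAULT_METASTORE_PORT = 9083


def metastore_reachable(host: str, port: int = DEFAULT_METASTORE_PORT,
                        timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: this proves only that a listener accepts TCP
    connections on the port, not that the listener is a Hive metastore.
    """
    try:
        # create_connection resolves the host and performs a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, metastore_reachable("192.168.0.10") returns True only if a TCP handshake with port 9083 on that address completes within the timeout. A False result usually points to a network path problem, such as a security group rule or VPC routing between the ACK cluster and the metastore host.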

Method 1 (recommended): Manage metadata by using DLF

  1. Log on to the EMR on ACK console. On the EMR on ACK page, find your Spark cluster and click its name.

  2. On the Cluster Details tab, click Enable next to Data Lake Formation (DLF).

  3. In the Enable DLF dialog, click OK.

After you enable DLF, the metadata (such as database and table definitions) of jobs submitted to the Spark cluster is automatically stored in DLF.

Method 2: Manage metadata by using a self-managed Hive metastore

  1. Log on to the EMR on ACK console. On the EMR on ACK page, find your Spark cluster and click Configure in the Actions column.

  2. On the Configure tab, click the spark-defaults.conf tab.

  3. Click Add Configuration Item and set the following parameters:

    Key: spark.hadoop.hive.metastore.uris
    Value: thrift://<IP address of the self-managed Hive metastore>:9083

    Replace <IP address of the self-managed Hive metastore> with the IP address of your Hive metastore. The URI uses the Thrift protocol; 9083 is the default Hive metastore port, so change it if your metastore listens on a different port.

  4. Click OK. In the dialog that appears, enter a reason in the Execution Reason field and click Save.

  5. At the bottom of the Configure tab, click Deploy Client Configuration. In the dialog that appears, enter a reason in the Execution Reason field, click OK, and then click OK in the Confirm dialog.
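The Key and Value from step 3 can also be generated in a script, which helps when you manage the value programmatically or list more than one metastore. The sketch below is a hypothetical helper, not part of EMR on ACK; it assumes IPv4 addresses or host names (IPv6 literals would need brackets) and relies on Hive's hive.metastore.uris property accepting a comma-separated list of URIs.

```python
from typing import Iterable, Union

METASTORE_URIS_KEY = "spark.hadoop.hive.metastore.uris"
DEFAULT_METASTORE_PORT = 9083  # default Hive metastore Thrift port


def build_metastore_uris(hosts: Union[str, Iterable[str]],
                         port: int = DEFAULT_METASTORE_PORT) -> str:
    """Build a value such as thrift://192.168.0.10:9083 (comma-separated for several hosts)."""
    if isinstance(hosts, str):
        hosts = [hosts]  # treat a single string as one host, not as characters
    host_list = [h.strip() for h in hosts if h.strip()]
    if not host_list:
        raise ValueError("at least one metastore host is required")
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return ",".join(f"thrift://{h}:{port}" for h in host_list)


def spark_defaults_line(hosts: Union[str, Iterable[str]],
                        port: int = DEFAULT_METASTORE_PORT) -> str:
    """Return the Key and Value as one whitespace-separated spark-defaults.conf line."""
    return f"{METASTORE_URIS_KEY} {build_metastore_uris(hosts, port)}"
```

For example, spark_defaults_line(["192.168.0.10"]) returns "spark.hadoop.hive.metastore.uris thrift://192.168.0.10:9083", which matches the Key and Value entered in step 3 for a metastore at that address.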

After you deploy the client configuration, the metadata of jobs submitted to the Spark cluster is automatically stored in the self-managed Hive metastore.
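If you can read the spark-defaults.conf content after deploying the client configuration, a quick syntax check can catch a mistyped URI before jobs fail. The sketch below is hypothetical: it takes the file content as a string (where the deployed file lives is not covered in this topic), assumes whitespace-separated key/value lines as in Spark's spark-defaults.conf.template, and does not handle "="-style separators, escapes, or line continuations.

```python
from typing import Dict, List
from urllib.parse import urlsplit

METASTORE_URIS_KEY = "spark.hadoop.hive.metastore.uris"


def parse_spark_defaults(text: str) -> Dict[str, str]:
    """Parse whitespace-separated key/value lines; later keys override earlier ones."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        parts = line.split(None, 1)
        props[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return props


def check_metastore_uris(text: str) -> List[str]:
    """Return a list of problems; an empty list means the entry looks well formed."""
    value = parse_spark_defaults(text).get(METASTORE_URIS_KEY)
    if not value:
        return [f"{METASTORE_URIS_KEY} is not set"]
    problems: List[str] = []
    # hive.metastore.uris may hold several comma-separated URIs.
    for uri in (u.strip() for u in value.split(",")):
        try:
            parts = urlsplit(uri)
            host, port = parts.hostname, parts.port
        except ValueError:
            # Malformed network location or a port that is not a valid number.
            problems.append(f"{uri}: could not be parsed")
            continue
        if parts.scheme != "thrift":
            problems.append(f"{uri}: scheme must be thrift")
        if not host or port is None:
            problems.append(f"{uri}: expected thrift://<host>:<port>")
    return problems
```

This check is purely syntactic: an empty result means the entry is well formed, not that the metastore is reachable or that jobs will succeed.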