E-MapReduce: Configure a StarRocks cluster to query data from a high-availability cluster

Last Updated: Feb 26, 2024

If High Service Availability is turned on for an E-MapReduce (EMR) cluster, a StarRocks cluster needs additional configuration before it can query Hadoop Distributed File System (HDFS) data in that cluster. In high-availability mode, HDFS clients address the file system by a logical nameservice rather than by a single NameNode, so the StarRocks cluster must know the nameservice and the NameNodes behind it. This topic describes how to configure the StarRocks cluster accordingly.

Prerequisites

  • A cluster that contains the HDFS service is created and High Service Availability is turned on for the cluster. For example, you can create a DataLake cluster or a custom cluster. For more information, see Create a cluster.

    Note

    In this example, a DataLake cluster is created.

  • A StarRocks cluster is created. For more information, see Create a StarRocks cluster.

Limits

The StarRocks cluster and the cluster that contains the HDFS service must be deployed in the same virtual private cloud (VPC) and in the same zone.

Procedure

  1. Go to the Configure tab of the StarRocks service page.

    1. Log on to the EMR console. In the left-side navigation pane, click EMR on ECS.

    2. In the top navigation bar, select the region in which your cluster resides and select a resource group based on your business requirements.

    3. On the EMR on ECS page, find the StarRocks cluster that you want to manage and click Services in the Actions column.

    4. On the Services tab, find the StarRocks service and click Configure.

  2. Add or modify configuration items on the hdfs-site.xml tab.

    1. On the Configure tab of the StarRocks service page, click the hdfs-site.xml tab.

    2. Add or modify the configuration items that are described in the following table.

      Configuration item

      Description

      dfs.nameservices

      Set this configuration item to the value that you specified for the dfs.nameservices configuration item in the hdfs-site.xml configuration file of the HDFS service in the DataLake cluster.

      Default value: hdfs-cluster.

      dfs.ha.namenodes.[nameservice ID]

      Set this configuration item to the value that you specified for the dfs.ha.namenodes.[nameservice ID] configuration item in the hdfs-site.xml configuration file of the HDFS service in the DataLake cluster.

      For clusters of EMR V3.X, the default value is nn1,nn2. For clusters of EMR V5.X, the default value is nn1,nn2,nn3.

      dfs.namenode.rpc-address.[nameservice ID].[name node ID]

      Set each of these configuration items to the value that you specified for the corresponding dfs.namenode.rpc-address.[nameservice ID].[name node ID] configuration item in the hdfs-site.xml configuration file of the HDFS service in the DataLake cluster.

      • For clusters of EMR V3.X, configure the dfs.namenode.rpc-address.hdfs-cluster.nn1 and dfs.namenode.rpc-address.hdfs-cluster.nn2 configuration items.

      • For clusters of EMR V5.X, configure the dfs.namenode.rpc-address.hdfs-cluster.nn1, dfs.namenode.rpc-address.hdfs-cluster.nn2, and dfs.namenode.rpc-address.hdfs-cluster.nn3 configuration items.

      dfs.client.failover.proxy.provider.[nameservice ID]

      Set this configuration item to the value that you specified for the dfs.client.failover.proxy.provider.[nameservice ID] configuration item in the hdfs-site.xml configuration file of the HDFS service in the DataLake cluster.

  3. Save the configurations.

    1. On the Configure tab of the StarRocks service page, click Save.

    2. In the dialog box that appears, configure the Execution Reason parameter and click Save.

  4. Restart the StarRocks service.

    1. On the StarRocks service page, choose More > Restart in the upper-right corner.

    2. In the dialog box that appears, configure the Execution Reason parameter and click OK.

    3. In the Confirm message, click OK.

      After the StarRocks service is restarted, you can use the StarRocks cluster to query HDFS data in the DataLake cluster.
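
The configuration items in step 2 all come from the hdfs-site.xml file of the HDFS service in the DataLake cluster, and their names depend on each other: the nameservice ID determines the dfs.ha.namenodes key, and the NameNode IDs determine the dfs.namenode.rpc-address keys. As a minimal sketch of that relationship, the following Python script reads an hdfs-site.xml excerpt and prints the items to copy to the StarRocks cluster. The embedded XML is a hypothetical sample: the hostnames, port 8020, and the failover proxy provider value (the stock Apache Hadoop class) are placeholders, so use the values from your own cluster. The helper names load_properties and ha_client_items are invented for this illustration and are not part of EMR or StarRocks.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of the DataLake cluster's hdfs-site.xml.
# Hostnames, the port, and the provider class are placeholder values.
SAMPLE_HDFS_SITE = """<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>dfs.nameservices</name><value>hdfs-cluster</value></property>
  <property><name>dfs.ha.namenodes.hdfs-cluster</name><value>nn1,nn2,nn3</value></property>
  <property>
    <name>dfs.namenode.rpc-address.hdfs-cluster.nn1</name>
    <value>master-1-1.example.internal:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hdfs-cluster.nn2</name>
    <value>master-1-2.example.internal:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hdfs-cluster.nn3</name>
    <value>master-1-3.example.internal:8020</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.hdfs-cluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
"""


def split_ids(value):
    """Split a comma-separated ID list and drop empty entries."""
    return [part.strip() for part in value.split(",") if part.strip()]


def load_properties(xml_text):
    """Return every <property> name/value pair in the XML as a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def ha_client_items(props):
    """Collect the items from step 2, in the order of the table.

    Raises KeyError if an expected item is missing from the source file.
    """
    items = {"dfs.nameservices": props["dfs.nameservices"]}
    for ns in split_ids(props["dfs.nameservices"]):
        nn_key = f"dfs.ha.namenodes.{ns}"
        items[nn_key] = props[nn_key]
        # One RPC address item per NameNode ID (two or three, per the table).
        for nn in split_ids(props[nn_key]):
            rpc_key = f"dfs.namenode.rpc-address.{ns}.{nn}"
            items[rpc_key] = props[rpc_key]
        proxy_key = f"dfs.client.failover.proxy.provider.{ns}"
        items[proxy_key] = props[proxy_key]
    return items


if __name__ == "__main__":
    for key, value in ha_client_items(load_properties(SAMPLE_HDFS_SITE)).items():
        print(f"{key} = {value}")
```

Unrelated properties such as dfs.replication are ignored. A KeyError from ha_client_items indicates that the source file lacks one of the items listed in the table, which is worth checking before you edit the hdfs-site.xml tab of the StarRocks service.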

References

If you want to query data from a high-security cluster for which Kerberos authentication is enabled, you must use a valid Kerberos credential for authentication when you query data from the cluster. For more information, see Configure a StarRocks cluster to query data from a high-security cluster.