E-MapReduce: Create an SSH tunnel to access web UIs of open source components

Last Updated: Feb 21, 2024

In an E-MapReduce (EMR) cluster, the ports used to access the web UIs of open source components such as Hadoop, Spark, and Flink are closed for security purposes, so you cannot access these web UIs directly. Instead, you can access them in the EMR console or through an SSH tunnel that you create on your on-premises server. This topic describes how to create an SSH tunnel to access the web UIs of open source components.

Prerequisites

  • An EMR cluster is created. For more information, see Create a cluster.

  • Your on-premises server is connected to the master node of the cluster. You can turn on Assign Public Network IP during cluster creation to associate an elastic IP address (EIP) with your cluster. You can also assign a fixed public IP address or an EIP address to the master node of your cluster in the ECS console after the cluster is created. For more information, see Bind an ENI.
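
Before you create a tunnel, it can help to confirm that the master node accepts SSH connections from your on-premises server. The following is a minimal sketch, not an official EMR tool: the function name ssh_check_cmd, the IP address 203.0.113.10, and the key path /path/to/key.pem are placeholders chosen for illustration. The function only prints the command so that you can review it before you run it.

```shell
# Hypothetical helper: print an ssh command that opens a connection to the
# master node and exits immediately. Arguments: <public IP> [key file path].
# Paths that contain spaces are not handled in this sketch.
ssh_check_cmd() {
  local host="$1" key="${2:-}"
  local cmd="ssh -o ConnectTimeout=5"   # give up after 5 seconds
  if [ -n "$key" ]; then
    cmd="$cmd -i $key"
  fi
  echo "$cmd root@$host exit"
}

# Example with placeholder values (203.0.113.10 is a documentation address):
ssh_check_cmd 203.0.113.10 /path/to/key.pem
```

If the printed command completes without an error when you run it, the SSH tunnel commands in the following sections should be able to reach the same host.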

Obtain the name and the public IP address of the master node

  1. Go to the Nodes tab.

    1. Log on to the EMR console.

    2. In the top navigation bar, select the region in which your cluster resides and select a resource group based on your business requirements.

    3. On the EMR on ECS page, find the desired cluster and click Nodes in the Actions column.

  2. On the Nodes tab, find the master node group and click the open icon.

  3. In the Node Name/ID and Public IP Address columns, view the name and public IP address of the master node.


Enable dynamic port forwarding

Dynamic port forwarding creates an SSH tunnel between a port on your on-premises server and the master node of an EMR cluster. A SOCKS proxy on your on-premises server listens on that port, and the traffic sent to the port is forwarded to the master node through the SSH tunnel.

  1. Create an SSH tunnel.

    Important

    The command returns no output after the tunnel is created. Keep the command running on your on-premises server for as long as you need the tunnel.

    • Use a key:

      ssh -i <Storage path of the key file> -N -D 8157 root@<Public IP address of the master node>
    • Use a username and a password:

      ssh -N -D 8157 root@<Public IP address of the master node>

    Parameter description:

    • 8157: the local port used in this example. You can replace it with any unoccupied port on your on-premises server.

    • -N: No remote command is run. The connection is used only to forward ports.

    • -D: Dynamic port forwarding is enabled. The SSH client starts a SOCKS proxy that listens on the specified local port.

    • <Public IP address of the master node>: For more information about how to obtain the public IP address of the master node, see Obtain the name and the public IP address of the master node.

    • <Storage path of the key file>: the path where the key file is stored.

  2. Configure the Google Chrome browser.

    You can use one of the following methods to configure the Google Chrome browser:

    • Use the CLI

      1. Open the CLI and go to the local installation directory of the Google Chrome browser client.

        The default installation directory of Google Chrome depends on the operating system.

        | Operating system | Default installation directory of Google Chrome |
        | --- | --- |
        | Mac OS X | /Applications/Google\ Chrome.app/Contents/MacOS |
        | Linux | /usr/bin/google-chrome |
        | Windows | C:\Program Files (x86)\Google\Chrome\Application\ |

      2. Run the following command in the default installation directory of Google Chrome:

        Windows and Linux

        chrome --proxy-server="socks5://localhost:8157" --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" --user-data-dir=/tmp/

        Mac OS X

        ./Google\ Chrome --proxy-server="socks5://localhost:8157" --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" --user-data-dir=/tmp/

        Parameter description:

        • /tmp/: the directory in which Chrome stores the user data for this session. On Linux and Mac OS X, use a temporary path such as /tmp/. On Windows, replace /tmp/ with a path similar to /c:/tmppath/.

        • 8157: the local port used in this example. You can replace it with any unoccupied port on your on-premises server, but it must match the port that you specified when you created the SSH tunnel.

      3. Enter http://<Name of the master node>:<Port number> in the address bar of the browser and press Enter to access a specific web UI.

        For more information about the ports of components, see Common ports of services. For more information about how to obtain the name of the master node, see Obtain the name and the public IP address of the master node.

        For example, enter http://emr-header-1:8088 in the address bar of the browser and press Enter to access the web UI of YARN.

    • Use a Google Chrome extension

      Extensions allow you to easily manage and use proxies in your web browser. You can use an extension to browse web pages and access web UIs at the same time.

      1. Add the Google Chrome extension Proxy SwitchyOmega.

      2. Click this extension and select Options from the shortcut menu.

      3. On the SwitchyOmega page, click New profile in the left-side navigation pane. In the New Profile dialog box, enter a profile name, such as SSH tunnel, in the Profile name field, select PAC Profile, and then click Create.

      4. Enter the following content in the PAC Script editor:

        // Regular-expression helper. It is not called by FindProxyForURL below.
        function regExpMatch(url, pattern) {
          try { return new RegExp(pattern).test(url); } catch(ex) { return false; }
        }

        // Send requests whose URL matches a cluster host name pattern through
        // the SOCKS proxy on local port 8157. Send all other requests directly.
        function FindProxyForURL(url, host) {
            if (shExpMatch(url, "*localhost*")) return "SOCKS5 localhost:8157";
            if (shExpMatch(url, "*emr-header*")) return "SOCKS5 localhost:8157";
            if (shExpMatch(url, "*emr-worker*")) return "SOCKS5 localhost:8157";
            if (shExpMatch(url, "*master*")) return "SOCKS5 localhost:8157";
            if (shExpMatch(url, "*core*")) return "SOCKS5 localhost:8157";

            return "DIRECT";
        }
      5. In the left-side navigation pane, click Apply changes to complete the configurations.

      6. Open Google Chrome. Click the SwitchyOmega extension. Then, select the SSH tunnel profile that you created.

      7. Enter http://<Name of the master node>:<Port number> in the address bar of the browser and press Enter to access a specific web UI.

        For more information about the ports of components, see Common ports of services. For more information about how to obtain the name of the master node, see Obtain the name and the public IP address of the master node.

        For example, enter http://emr-header-1:8088 in the address bar of the browser and press Enter to access the web UI of YARN.
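
The dynamic port forwarding steps above can also be checked from the command line without a browser. The following is a minimal sketch under stated assumptions: the function names socks_tunnel_cmd and socks_probe_cmd, the IP address 203.0.113.10, and the key path /path/to/key.pem are placeholders that do not come from EMR. Both functions only print commands. The second one prints a curl command that uses the --socks5-hostname option, so that the node name is resolved on the far side of the tunnel rather than on your on-premises server.

```shell
# Sketch only: names and defaults are illustrative.
SOCKS_PORT="${SOCKS_PORT:-8157}"   # local port used in this topic

# Print the dynamic port forwarding command for a master node IP address.
# Arguments: <public IP> [key file path]
socks_tunnel_cmd() {
  local host="$1" key="${2:-}"
  if [ -n "$key" ]; then
    echo "ssh -i $key -N -D $SOCKS_PORT root@$host"
  else
    echo "ssh -N -D $SOCKS_PORT root@$host"
  fi
}

# Print a curl command that requests a web UI through the SOCKS proxy.
# Arguments: <node name> <port>
socks_probe_cmd() {
  local node="$1" port="$2"
  echo "curl --socks5-hostname localhost:$SOCKS_PORT http://$node:$port/"
}

socks_tunnel_cmd 203.0.113.10 /path/to/key.pem
socks_probe_cmd emr-header-1 8088
```

Run the first printed command in one terminal and keep it running, then run the second in another terminal. An HTML response suggests that the tunnel and the proxy port are working before you configure the browser.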

Enable local port forwarding

Important

If you use this method to access a web UI, you cannot go to the job details page.

Local port forwarding maps a port on your on-premises server to a port on the master node, so that you can access a web application that runs on the master node through the local port. No SOCKS proxy is required.

  1. Run the following command on your on-premises server to create an SSH tunnel:

    Important

    The command returns no output after the tunnel is created. Keep the command running on your on-premises server for as long as you need the tunnel.

    • Use a key:

      ssh -i <Storage path of the key file> -N -L 8157:<Name of the master node>:8088 root@<Public IP address of the master node>
    • Use a username and a password:

      ssh -N -L 8157:<Name of the master node>:8088 root@<Public IP address of the master node>

    Parameter description:

    • -L: Local port forwarding is enabled. Connections to the specified local port are forwarded through the tunnel to the specified port of the web server that runs on the master node.

    • 8088: the port that is used to access ResourceManager on the master node. You can replace this port based on your business requirements.

      For more information about the ports of components, see Common ports of services. For more information about how to obtain the name of the master node, see Obtain the name and the public IP address of the master node.

    • 8157: the local port used in this example. You can replace it with any unoccupied port on your on-premises server.

    • <Public IP address of the master node>: For more information about how to obtain the public IP address of the master node, see Obtain the name and the public IP address of the master node.

    • <Storage path of the key file>: the path where the key file is stored.

  2. Keep your on-premises server running. Open a browser, enter http://localhost:8157/ in the address bar of the browser, and then press Enter.
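
Because each -L option maps one local port to one remote port, a single ssh command can carry several -L options when you need more than one web UI. The following is a minimal sketch: the function name local_forward_cmd, the IP address 203.0.113.10, and the local port 8158 are assumptions made for illustration, and 19888 is the JobHistory Server web UI port of Hadoop 2.X listed in Common ports of services. The function only prints the command.

```shell
# Hypothetical helper: print an ssh command with one -L option per
# "<local port>:<remote port>" pair.
# Arguments: <public IP> <master node name> <pair> [<pair> ...]
local_forward_cmd() {
  local host="$1" node="$2"
  shift 2
  local cmd="ssh -N" pair
  for pair in "$@"; do
    # ${pair%%:*} is the local port, ${pair##*:} is the remote port.
    cmd="$cmd -L ${pair%%:*}:$node:${pair##*:}"
  done
  echo "$cmd root@$host"
}

# Forward local 8157 to YARN (8088) and local 8158 to JobHistory (19888).
local_forward_cmd 203.0.113.10 emr-header-1 8157:8088 8158:19888
```

With this example, http://localhost:8157/ would reach the YARN web UI and http://localhost:8158/ would reach the JobHistory Server web UI while the printed command is running.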

Common ports of services

| Service | Port | Description |
| --- | --- | --- |
| Hadoop 2.X | 50070 | The web UI port of HDFS. Parameter: dfs.namenode.http-address or dfs.http.address. Note: The dfs.http.address parameter is deprecated but can still be used. |
| Hadoop 2.X | 50075 | The web UI port of DataNode. |
| Hadoop 2.X | 50010 | The service port of DataNode. This port is used to transfer data. |
| Hadoop 2.X | 50020 | The port of the inter-process communication (IPC) service. |
| Hadoop 2.X | 8020 | The remote procedure call (RPC) port of HDFS in a high-availability (HA) cluster. |
| Hadoop 2.X | 8025 | The port of ResourceManager. Parameter: yarn.resourcemanager.resource-tracker.address. |
| Hadoop 2.X | 9000 | The RPC port of HDFS in a non-HA cluster. Parameter: fs.defaultFS or fs.default.name. Note: The fs.default.name parameter is deprecated but can still be used. |
| Hadoop 2.X | 8088 | The web UI port of YARN. |
| Hadoop 2.X | 8485 | The RPC port of JournalNode. |
| Hadoop 2.X | 8019 | The port of ZKFailoverController (ZKFC). |
| Hadoop 2.X | 19888 | The web UI port of JobHistory Server. Parameter: mapreduce.jobhistory.webapp.address. |
| Hadoop 2.X | 10020 | The IPC port of JobHistory Server. Parameter: mapreduce.jobhistory.address. |
| Hadoop 3.X | 8020 | The port of NameNode. Parameter: dfs.namenode.http-address or dfs.http.address. Note: The dfs.http.address parameter is deprecated but can still be used. |
| Hadoop 3.X | 9870, 9871 | The port of NameNode. |
| Hadoop 3.X | 9866 | The port of DataNode. |
| Hadoop 3.X | 9864 | The port of DataNode. |
| Hadoop 3.X | 9865 | The port of DataNode. |
| Hadoop 3.X | 8088 | The port of ResourceManager. Parameter: yarn.resourcemanager.webapp.address. |
| MapReduce | 8021 | The port of JobTracker. Parameter: mapreduce.jobtracker.address. |
| ZooKeeper | 2181 | The port that is used to connect a client to ZooKeeper. |
| ZooKeeper | 2888 | The internal communication port of a ZooKeeper cluster. The leader listens on this port. |
| ZooKeeper | 3888 | The ZooKeeper port that is used to elect a leader. |
| HBase | 16010 | The web UI port of the master node of HBase. Parameter: hbase.master.info.port. |
| HBase | 16000 | The port of HMaster. Parameter: hbase.master.port. |
| HBase | 16030 | The web UI management port of RegionServer of HBase. Parameter: hbase.regionserver.info.port. |
| HBase | 16020 | The port of HRegionServer. Parameter: hbase.regionserver.port. |
| HBase | 9099 | The port of Thrift Server. |
| Hive | 9083 | The default listening port of the MetaStore service. |
| Hive | 10000 | The Java Database Connectivity (JDBC) port of Hive. |
| Hive | 10001 | The JDBC port of Spark Thrift Server. |
| Spark | 7077 | The port on which the master node of Spark communicates with the worker nodes. Also the port on which a standalone cluster submits applications. |
| Spark | 8080 | The web UI port of the master node. This port is used to schedule resources. |
| Spark | 8081 | The web UI port of a worker node. This port is used to schedule resources. |
| Spark | 4040 | The web UI port of Driver. This port is used to schedule tasks. |
| Spark | 18080 | The web UI port of Spark History Server. |
| Kafka | 9092 | The RPC port that is used for communication among the nodes of a Kafka cluster. |
| Redis | 6379 | The port of the Redis service. |
| Hue | 8888 | The web UI port of Hue. |
| Oozie | 11000 | The web UI port of Oozie. |
| Druid | 18888 | The web UI port of Druid. |
| Druid | 18090 | The port of Overlord. Parameter: druid.plaintextPort on the overlord.runtime tab. |
| Druid | 18091 | The port of MiddleManager. Parameter: druid.plaintextPort on the middleManager.runtime tab. |
| Druid | 18081 | The port of Coordinator. Parameter: druid.plaintextPort on the coordinator.runtime tab. |
| Druid | 18083 | The port of Historical. Parameter: druid.plaintextPort on the historical.runtime tab. |
| Druid | 18082 | The port of Broker. Parameter: druid.plaintextPort on the broker.runtime tab. |
| Ganglia | 9292 | The web UI port of Ganglia. |
| Ranger | 6080 | The web UI port of Ranger. |
| Kafka Manager | 8085 | The port of Kafka Manager. |
| Superset | 18088 | The web UI port of Superset. |
| Impala | 21050 | The JDBC port that is used to connect to Impala. |
| Presto | 9090 | The web UI port of Presto. |
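
The web UI ports listed above can be kept in a small lookup so that URLs of the form http://<Name of the master node>:<Port number> do not have to be typed from memory. The following is a minimal sketch that covers only some of the web UI ports from this topic. The function names web_ui_port and web_ui_url and the short service keys such as hdfs2 and spark-history are made up for illustration.

```shell
# Hypothetical lookup: map a short service key to its web UI port.
# Port values are taken from the list of common ports in this topic.
web_ui_port() {
  case "$1" in
    yarn)          echo 8088 ;;
    hdfs2)         echo 50070 ;;   # HDFS web UI, Hadoop 2.X
    jobhistory)    echo 19888 ;;
    hbase-master)  echo 16010 ;;
    spark-history) echo 18080 ;;
    hue)           echo 8888 ;;
    oozie)         echo 11000 ;;
    ranger)        echo 6080 ;;
    presto)        echo 9090 ;;
    *)             return 1 ;;
  esac
}

# Print the URL of a web UI. Arguments: <node name> <service key>
web_ui_url() {
  local node="$1" port
  port="$(web_ui_port "$2")" || { echo "unknown service: $2" >&2; return 1; }
  echo "http://$node:$port"
}

web_ui_url emr-header-1 yarn
```

The printed URL is reachable only through one of the tunnels described in this topic, and the actual ports of a cluster may differ if the corresponding parameters were changed.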

References

For more information about how to access the web UIs of open source components in the EMR console, see Access the web UIs of open source components.