If the jobs and workflows for a cluster in the E-MapReduce (EMR) console do not meet your business requirements, you can log on to the master node of the cluster to perform the required operations.

Prerequisites

To add a Google Chrome extension, your local machine must be able to access the domain chrome.google.com.

Environment variables of a cluster

Environment variables are preconfigured on the cluster. The following environment variables are frequently used:
  • JAVA_HOME
  • HADOOP_HOME
  • HADOOP_CONF_DIR
  • HADOOP_LOG_DIR
  • YARN_LOG_DIR
  • HIVE_HOME
  • HIVE_CONF_DIR
  • PIG_HOME
  • PIG_CONF_DIR
Note We recommend that you do not change the values of these variables. Otherwise, unexpected errors may occur.
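
To check these variables without changing them, you can print their current values after you log on to a node. The following is a minimal sketch: the function name show_emr_env is hypothetical, and the variable names are taken from the list above, with the Hadoop log directory assumed to be HADOOP_LOG_DIR.

```shell
# Hypothetical helper: print each frequently used EMR environment variable.
# Variables that are not exported in the current shell are shown as "<not set>".
show_emr_env() {
  for name in JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR HADOOP_LOG_DIR YARN_LOG_DIR \
              HIVE_HOME HIVE_CONF_DIR PIG_HOME PIG_CONF_DIR; do
    # printenv exits with a non-zero status when the variable is not exported.
    value=$(printenv "$name") || value='<not set>'
    printf '%s=%s\n' "$name" "$value"
  done
}

show_emr_env
```

The function only reads the variables, so it is consistent with the recommendation not to change their values.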

Log on to the master node

  1. Run the following command to log on to the master node:
    ssh root@ip.of.master

    Obtain the public IP address of the master node in the Instance Info section of the Cluster Overview page.

    Note The master node is configured with a public IP address by default.
  2. Enter the password you configured when you created the cluster.

Use SSH to log on to a Linux master node without a password

You may need to log on to the cluster frequently for management. To simplify this, you can set up password-free SSH logon from your local machine. Follow these steps if your local machine runs the Linux operating system:
  1. Log on to the master node.
  2. Switch to the hadoop or hdfs user.
  3. Copy the SSH private key to your local machine.
    sz ~/.ssh/id_rsa
  4. Go back to your local machine and log on to the master node again.
    ssh -i <private key storage path>/id_rsa hadoop@120.*.*.*

    If you only have this one SSH private key, you can store it in the ~/.ssh/ directory. This way, you can log on to the master node without using the -i option to specify an SSH private key.
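
The ssh client normally rejects a private key file that other users can read, so the copied key usually needs restricted permissions before the logon in step 4 works. The following sketch shows one way to store the key in the default location. The function name install_private_key and the example paths are hypothetical and are not part of EMR.

```shell
# Hypothetical helper: store a downloaded private key where ssh looks for it by default.
# Arguments: path of the downloaded key, optional target directory (default: ~/.ssh).
install_private_key() {
  src=$1
  dest_dir=${2:-"$HOME/.ssh"}
  dest="$dest_dir/id_rsa"
  # Do not overwrite a key that already exists in the target directory.
  if [ -e "$dest" ]; then
    echo "refusing to overwrite existing key: $dest" >&2
    return 1
  fi
  mkdir -p "$dest_dir" || return 1
  chmod 700 "$dest_dir" || return 1   # only the owner may access the directory
  cp "$src" "$dest" || return 1
  chmod 600 "$dest"                   # only the owner may read or write the key
}
```

For example, after you run install_private_key ~/Downloads/id_rsa on your local machine (the download path is an assumption), ssh hadoop@120.*.*.* can pick up the key without the -i option.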

Use SSH to log on to a Windows master node without a password

You may need to log on to the cluster frequently for management. To simplify this, you can set up password-free SSH logon from your local machine. Use one of the following methods if your local machine runs the Windows operating system:
  • Method 1: Use PuTTY for logon.
    1. Download PuTTY and PuTTYgen.
    2. Run PuTTYgen to load the SSH private key.
      Notice Keep the SSH private key safe. If you accidentally lose or disclose your SSH private key, immediately generate a new SSH private key.
    3. Keep the default settings and click Save private key.

      A PuTTY private key file is generated. The file name extension is ppk.

    4. Run PuTTY and enter the public IP address of the target master node on the Session page.
      Note Enter the logon username together with the public IP address, for example, hadoop@<master node IP address>.
    5. In the left-side navigation pane, choose Connection > SSH > Auth to configure the generated .ppk file.
    6. Click Open. You are logged on to the master node.
  • Method 2: Use Cygwin or MinGW for logon.

    Cygwin and MinGW are easy-to-use tools that simulate a Linux environment on Windows. The operations are the same as those for a local machine that runs Linux. For more information, see Use SSH to log on to a Linux master node without a password.

    We recommend that you use MinGW because it is more lightweight than Cygwin. If you cannot access the MinGW official website, download Git for Windows and use the built-in Git BASH instead.

View the Web UIs of Hadoop, Spark, and Ganglia

The ports of Web UIs for monitoring Hadoop, Spark, and Ganglia in an EMR cluster are disabled for security purposes. If you want to access these Web UIs, you need to create an SSH tunnel and enable port forwarding. Port forwarding methods are described as follows:
Notice The following operations are performed on your local machine, not on a node of the cluster.
  • Method 1: dynamic port forwarding
    • Use a private key:

      Create an SSH tunnel to allow communication between your local machine and a dynamic port on the master node of an EMR cluster.

      ssh -i /path/id_xxx -ND 8157 root@masterNodeIP
    • Use a username and password:
      ssh -ND 8157 root@masterNodeIP
      Note Replace 8157 with an unoccupied port number on your local machine.
    After dynamic port forwarding is enabled, you can view a Web UI by using one of the following methods:
    • (Recommended) Google Chrome
      Run the following command to access a Web UI:
      chrome --proxy-server="socks5://localhost:8157" --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" --user-data-dir=/tmp/

      The --user-data-dir option specifies the directory in which Chrome stores the profile data for this session. Set it to a temporary directory, for example, d:/tmppath on Windows or /tmp/ on Linux and Mac OS X.

      The installation path of Google Chrome depends on the operating system. For more information, see the following table.
      Operating system    Google Chrome installation path
      Mac OS X            /Applications/Google Chrome.app/Contents/MacOS/Google Chrome
      Linux               /usr/bin/google-chrome
      Windows             C:\Program Files (x86)\Google\Chrome\Application\chrome.exe
    • Extension
      • Use a Google Chrome extension to view the Web UIs.

        By using this method, you can browse web pages and access a cluster Web UI at the same time.

        1. Add the Google Chrome extension Proxy SwitchyOmega.
        2. Click this extension and choose Options from the shortcut menu.
        3. On the SwitchyOmega page that appears, click New profile in the left-side navigation pane. In the New Profile dialog box that appears, enter a profile name, for example, SSH tunnel, and select PAC Profile.
        4. Enter the following content in the PAC Script editor:
          function regExpMatch(url, pattern) {    
            try { return new RegExp(pattern).test(url); } catch(ex) { return false; }    
          }
          
          function FindProxyForURL(url, host) {
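              // Requests for localhost and EMR node hostnames go through the SSH tunnel.
              // Port 8157 must match the port used in the ssh -ND command.
              // To also match nodes by private IP address, add a rule for your VPC subnet prefix (for example, 172.31).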
          
              if (shExpMatch(url, "*localhost*")) return "SOCKS5 localhost:8157";
              if (shExpMatch(url, "*emr-header*")) return "SOCKS5 localhost:8157";
              if (shExpMatch(url, "*emr-worker*")) return "SOCKS5 localhost:8157";
          
              return 'DIRECT';
          }
        5. In the left-side navigation pane, click Apply changes to complete the configurations.
        6. Start the command line and run one of the following commands:
          # Method 1: Use a private key.
          ssh -i /path/id_xxx -ND 8157 hadoop@masterNodeIP
          # Method 2: Use a username and password.
          ssh -ND 8157 hadoop@masterNodeIP
        7. After the command is executed, click the SwitchyOmega extension. Then, select the SSH tunnel profile that you created.
        8. In the address bar, enter the IP address of a node and the port number to access a specific Web UI.

          The node is the one that you connected to by using the preceding SSH command. In most cases, it is the master node. Two frequently used ports are port 8088 for YARN and port 50070 for HDFS.

      • Configure a local proxy to view the Web UIs.
        After password-free SSH logon is enabled between your local machine and the master node of an EMR cluster, you need to configure a local proxy to view the Web UI of Hadoop, Spark, or Ganglia in the browser. Detailed steps are as follows:
        1. Install the FoxyProxy Standard extension in Google Chrome or Firefox.
        2. After the installation is complete, restart your browser, open a text editor, enter the following content, and save it as an .xml file:
          <?xml version="1.0" encoding="UTF-8"?>
          <foxyproxy>
          <proxies>
          <proxy name="aliyun-emr-socks-proxy" id="2322596116" notes="" fromSubscription="false" enabled="true" mode="manual" selectedTabIndex="2" lastresort="false" animatedIcons="true" includeInCycle="true" color="#0055E5" proxyDNS="true" noInternalIPs="false" autoconfMode="pac" clearCacheBeforeUse="false" disableCache="false" clearCookiesBeforeUse="false" rejectCookies="false">
          <matches>
          <match enabled="true" name="120.*" pattern="http://120.*" isRegEx="false" isBlackList="false" isMultiLine="false" caseSensitive="false" fromSubscription="false" ></match>
          </matches>
          <manualconf host="localhost" port="8157" socksversion="5" isSocks="true" username="" password="" domain="" ></manualconf>
          </proxy>
          </proxies>
          </foxyproxy>

          Key parameter description:

          • 8157: the local port used to establish an SSH connection between your local machine and the master node of an EMR cluster. This port must match the port you used in the SSH command executed on the command line.
          • 120.*: the pattern used to match the IP address of the master node. Replace this value with the actual IP address of the master node.
        3. In the browser, choose Foxyproxy > Options.
        4. In the Import FoxyProxy Setting dialog box that appears, click Import/Export to upload the .xml file you have edited, and click Add.
        5. In the browser, choose Foxyproxy > Use Proxy aliyun-emr-socks-proxy for all URLs.
        6. Type localhost:8088 in the address bar and press Enter. The Web UI of Hadoop appears.
  • Method 2: local port forwarding
    Notice If you use this method to view a Web UI, errors occur when you try to open subpages.
    • Use a private key:
      ssh -i /path/id_rsa -N -L 8157:masterNodeIP:8088 hadoop@masterNodeIP
    • Use a username and password:
      ssh -N -L 8157:masterNodeIP:8088 hadoop@masterNodeIP 
    Key parameter description:
    • path: the path where a private key is stored
    • masterNodeIP: the IP address of the master node you want to access
    • 8088: the port used to access the ResourceManager process on the master node
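
Because local port forwarding maps one local port to one remote port, each Web UI needs its own tunnel. The following sketch prints the tunnel command for a given local port and Web UI port instead of running it, so you can review the command first. The function name local_forward_cmd, the local port 8158, and the IP address 192.0.2.10 are placeholders chosen for illustration. Ports 8088 (YARN) and 50070 (HDFS) are the ones named in this topic.

```shell
# Hypothetical helper: print the ssh command that forwards a local port
# to a Web UI port on the master node.
# Arguments: local port, Web UI port, master node IP, optional private key path.
local_forward_cmd() {
  local_port=$1
  ui_port=$2
  master_ip=$3
  key_path=${4:-}
  if [ -n "$key_path" ]; then
    # Private key logon
    printf 'ssh -i %s -N -L %s:%s:%s hadoop@%s\n' \
      "$key_path" "$local_port" "$master_ip" "$ui_port" "$master_ip"
  else
    # Username and password logon
    printf 'ssh -N -L %s:%s:%s hadoop@%s\n' \
      "$local_port" "$master_ip" "$ui_port" "$master_ip"
  fi
}

# 192.0.2.10 is a placeholder; replace it with the IP address of your master node.
local_forward_cmd 8157 8088 192.0.2.10 /path/id_rsa   # YARN ResourceManager
local_forward_cmd 8158 50070 192.0.2.10               # HDFS
```

After you run a printed command on your local machine, the corresponding Web UI is available at localhost followed by the local port, for example, localhost:8157 for the first command. The key path is printed without quotation marks, so a path that contains spaces must be quoted before you run the command.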