This topic describes how to configure Knox in the E-MapReduce (EMR) console and how to use a Knox account to access web UIs of open source components such as Hadoop Distributed File System (HDFS), YARN, Spark, and Ganglia over the Internet.

Prerequisites

An EMR cluster is created. For more information, see Create a cluster.

Preparations

  • Configure a security group rule
    1. Obtain the public IP address of your on-premises machine.

      For security purposes, we recommend that you allow access only from your current public IP address when you configure the security group rule. To obtain your current public IP address, visit http://myip.ipip.net/.

    2. Enable port 8443.
      1. Log on to the Alibaba Cloud EMR console and go to the Cluster Overview page of your cluster. In the Network Info section, click the Security Group ID link.
      2. On the Security Group Rules page, click Add Security Group Rule in the upper-right corner.
      3. In the Add Security Group Rule dialog box, set Port Range to 8443/8443.
      4. Set Authorization Object to the public IP address that you obtained in Step 1.
      5. Click OK.
    Notice
    • To prevent attacks from external users, you are not allowed to set Authorization Object to 0.0.0.0/0.
    • If no public IP address was assigned to the cluster when you created it, you can add one in the Elastic Compute Service (ECS) console. Then, go back to the EMR console, click Instances in the left-side navigation pane of the Cluster Overview page, and click Update Instance Info in the upper-right corner of the Instances page to synchronize the instance information immediately.
    • After the public IP address is assigned, you must bind the domain name to the public IP address.
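A single-address Authorization Object can be expressed as a /32 CIDR block. The following sketch is only a local sanity check and assumes a hypothetical helper named authz_object: it turns the IPv4 address obtained in Step 1 into a /32 block and refuses 0.0.0.0, which would expose port 8443 to every address. It does not call any Alibaba Cloud API; you still enter the result in the console.

```shell
# Hypothetical helper: print a /32 CIDR block for one IPv4 address, for use as
# the Authorization Object of the inbound rule on port 8443/8443.
# Rejects malformed input and the "any address" value 0.0.0.0.
authz_object() {
  ip=$1
  # Only digits and dots, no empty octets, no leading or trailing dot.
  case $ip in
    ''|*[!0-9.]*|*..*|.*|*.)
      echo "invalid IPv4 address: $ip" >&2
      return 1
      ;;
  esac
  # Split on dots and require exactly four octets.
  old_ifs=$IFS
  IFS=.
  set -- $ip
  IFS=$old_ifs
  if [ $# -ne 4 ]; then
    echo "invalid IPv4 address: $ip" >&2
    return 1
  fi
  for octet in "$@"; do
    if [ "$octet" -gt 255 ]; then
      echo "octet out of range: $octet" >&2
      return 1
    fi
  done
  if [ "$ip" = "0.0.0.0" ]; then
    echo "refusing 0.0.0.0: it would open port 8443 to everyone" >&2
    return 1
  fi
  echo "$ip/32"
}
```

For example, authz_object 203.0.113.7 prints 203.0.113.7/32, while authz_object 0.0.0.0 prints an error and returns a non-zero status.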
  • Set a Knox account

    When you access Knox, you must enter a username and password, which Knox authenticates against an LDAP service. You can use the Apache Directory Server LDAP service that runs in the cluster or your own LDAP service.

    • Use the LDAP service of Apache Directory Server in the cluster

      Method 1 (recommended):

      On the Users page of the cluster, add a Knox account. For more information, see Manage user accounts.

      Method 2:
      1. Log on to the master node of the cluster in SSH mode. For more information, see Log on to a cluster.
      2. Prepare your username, such as Tom, and run the following commands to open the users.ldif file:
        su knox
        cd /usr/lib/knox-current/templates
        vi users.ldif

        In the file, replace emr-guest and EMR GUEST with Tom, and replace setPassword with the password for this user.

      3. Run the following commands to import user data to LDAP:
        su knox
        cd /usr/lib/knox-current/templates
        sh ldap-sample-users.sh
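The edits in Step 2 can also be scripted instead of made in vi. This is a minimal sketch under stated assumptions: the helper name customize_ldif is hypothetical, and it assumes, as Step 2 describes, that users.ldif contains the literal placeholders emr-guest, EMR GUEST, and setPassword. It keeps the original template as users.ldif.bak.

```shell
# Hypothetical helper: apply the Step 2 edits to a users.ldif template.
# Assumes the template contains the literal placeholders emr-guest,
# EMR GUEST, and setPassword. The original is kept as <file>.bak.
customize_ldif() {
  file=$1 user=$2 password=$3
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 1
  fi
  cp "$file" "$file.bak" || return 1
  # Replace both spellings of the sample user and the password placeholder.
  # The user name and password must not contain '|', '&', or '\',
  # because sed treats them specially in this expression.
  sed -e "s|emr-guest|$user|g" \
      -e "s|EMR GUEST|$user|g" \
      -e "s|setPassword|$password|g" \
      "$file.bak" > "$file"
}
```

For example, as the knox user in /usr/lib/knox-current/templates, run customize_ldif users.ldif Tom 'your-password' and then continue with Step 3. A password typed on the command line is stored in your shell history, so clear the history afterward if that is a concern.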
    • Use your own LDAP service
      1. Log on to the EMR console and go to the Configure tab of the Knox service. Click the cluster-topo tab in the Service Configuration section.
      2. Configure the parameters in the xml-direct-to-file-content field.
        Parameter                            Description
        main.ldapRealm.userDnTemplate        Specifies a distinguished name (DN) template.
        main.ldapRealm.contextFactory.url    Specifies the domain name and port number of your LDAP server.
      3. In the upper-right corner of the Service Configuration section, click Save.
      4. In the Confirm Changes dialog box, configure the parameters and click OK.
      5. In the upper-right corner of the Knox service page, choose Actions > Restart Knox.
      6. In the Cluster Activities dialog box, configure the parameters and click OK.

        In the Confirm message, click OK.

      7. Open the port that Knox uses to access your LDAP service over the Internet, such as port 10389.

        Follow the same steps that you used to open port 8443, but set Rule Direction to Outbound.
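The DN template tells Knox which LDAP entry to bind as: the Knox LDAP realm, which builds on Apache Shiro's JndiLdapRealm, replaces the first {0} token in main.ldapRealm.userDnTemplate with the login name. The sketch below, with a hypothetical function name expand_dn_template, mimics that substitution so you can preview the resulting DN. The sample template uid={0},ou=people,dc=hadoop,dc=apache,dc=org comes from the Apache Knox sample topology and is only an assumption here; use the layout of your own directory.

```shell
# Hypothetical helper: preview the bind DN that a userDnTemplate produces.
# Mirrors the realm behavior of splitting the template at the first {0}.
expand_dn_template() {
  template=$1 user=$2
  case $template in
    *'{0}'*) ;;
    *)
      echo "template has no {0} token: $template" >&2
      return 1
      ;;
  esac
  # prefix: text before the first {0}; suffix: text after it.
  prefix=${template%%'{0}'*}
  suffix=${template#*'{0}'}
  printf '%s\n' "$prefix$user$suffix"
}
```

You can then check that the DN binds to your server, for example with the OpenLDAP client: ldapsearch -x -H ldap://<your-ldap-host>:10389 -D "<DN>" -W -b "<DN>" -s base. The command prompts for the password and returns the entry if the bind succeeds.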

Use the Knox account to access the web UIs of other components

You can use your Knox account to access the web UIs of other components, such as HDFS, YARN, Spark, and Ganglia.

  • Use a URL in the EMR console
    1. Log on to the Alibaba Cloud EMR console.
    2. In the top navigation bar, select the region where your cluster resides and select a resource group based on your business requirements.
    3. Click the Cluster Management tab.
    4. On the Cluster Management page, find the cluster whose services you want to access and click Details in the Actions column.
    5. In the left-side navigation pane of the Cluster Overview page, click Connect Strings.
    6. On the Public Connect Strings page, click the URL of the component that you want to access.
  • Use the public IP address of the cluster
    1. On the Cluster Overview page, obtain the public IP address of your cluster.
    2. In the address bar of the browser, enter the URL of the component that you want to access and press Enter.
      • HDFS: https://{Public IP address of the cluster}:8443/gateway/cluster-topo/hdfs/
      • YARN: https://{Public IP address of the cluster}:8443/gateway/cluster-topo/yarn/
      • Spark History: https://{Public IP address of the cluster}:8443/gateway/cluster-topo/sparkhistory/
      • Ganglia: https://{Public IP address of the cluster}:8443/gateway/cluster-topo/ganglia/
      • Storm: https://{Public IP address of the cluster}:8443/gateway/cluster-topo/storm/
      • Oozie: https://{Public IP address of the cluster}:8443/gateway/cluster-topo/oozie/
      • Tez: https://{Public IP address of the cluster}:8443/gateway/cluster-topo/tez-ui2/
      • Impala Catalogd: https://{Public IP address of the cluster}:8443/gateway/cluster-topo/impalalog/
      • Impala Statestored: https://{Public IP address of the cluster}:8443/gateway/cluster-topo/impalastore/
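All of these URLs follow one pattern: https://<public IP>:8443/gateway/cluster-topo/<path>/. The following sketch, assuming a hypothetical helper named knox_url and hypothetical short component keys, builds them from the public IP address of the cluster. The paths are copied from the list above.

```shell
# Hypothetical helper: build the Knox gateway URL for a component web UI.
# The component keys are illustrative; the paths match the list above.
knox_url() {
  host=$1 component=$2
  case $component in
    hdfs)               path=hdfs ;;
    yarn)               path=yarn ;;
    sparkhistory)       path=sparkhistory ;;
    ganglia)            path=ganglia ;;
    storm)              path=storm ;;
    oozie)              path=oozie ;;
    tez)                path=tez-ui2 ;;
    impala-catalogd)    path=impalalog ;;
    impala-statestored) path=impalastore ;;
    *)
      echo "unknown component: $component" >&2
      return 1
      ;;
  esac
  printf 'https://%s:8443/gateway/cluster-topo/%s/\n' "$host" "$path"
}
```

For a command-line check, you can pass the result to curl, for example curl -k -u Tom "$(knox_url 203.0.113.7 yarn)". With -u and no password, curl prompts for the password; -k skips certificate verification, which you may need if the gateway presents a self-signed certificate. Use -k only for such checks.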

Access control

Knox offers service-level access control. You can manage access permissions on a specific service by user, user group, or IP address. For more information, see Apache Knox authorization.

Example:
  • Scenario: Authorize only user Tom to access the web UI of YARN.
  • Procedure:
    1. Log on to the EMR console and go to the Configure tab of the Knox service. Click the cluster-topo tab in the Service Configuration section.
    2. Configure the parameters in the xml-direct-to-file-content field.
      Add the following provider between the <gateway> and </gateway> tags:
      <provider>
          <role>authorization</role>
          <name>AclsAuthz</name>
          <enabled>true</enabled>
          <param>
              <name>YARNUI.acl</name>
              <value>Tom;*;*</value>
          </param>
      </provider>
      Note Set value in the username;group;ipaddr format, which specifies the user, user group, and IP address that are allowed. To leave any of the three unrestricted, enter an asterisk (*) as a wildcard.
    3. In the upper-right corner of the Service Configuration section, click Save.
    4. In the Confirm Changes dialog box, configure the parameters and click OK.
    5. In the upper-right corner of the Knox service page, choose Actions > Restart Knox.
    6. In the Cluster Activities dialog box, configure the parameters and click OK.

      In the Confirm message, click OK.

    Warning Knox allows you to use the RESTful API of a service to perform operations on that service. For example, you can use the RESTful API of HDFS to add files to or remove files from HDFS. For security purposes, do not use the LDAP username and password that are saved in the Knox software directory to access a service.
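To reason about what a value such as Tom;*;* permits, the following sketch models the check in simplified form. It is an illustration under stated assumptions, not the Knox implementation: the helper names acl_allows and in_list are hypothetical, it assumes the three fields are combined with AND (the AclsAuthz default documented by Apache Knox), treats each field as a comma-separated list, and supports only exact values and the * wildcard.

```shell
# Simplified model of an AclsAuthz value "users;groups;ipaddrs".
# Assumptions: AND mode, comma-separated entries per field,
# only exact matches or the * wildcard (no IP ranges or patterns).

# in_list VALUE LIST: succeed if LIST is "*" or contains VALUE as an item.
in_list() {
  [ "$2" = "*" ] && return 0
  case ",$2," in
    *",$1,"*) return 0 ;;
  esac
  return 1
}

# acl_allows ACL USER GROUPS IP: GROUPS is the user's comma-separated groups.
acl_allows() {
  acl=$1 user=$2 groups=$3 ip=$4
  acl_users=$(printf '%s' "$acl" | cut -d';' -f1)
  acl_groups=$(printf '%s' "$acl" | cut -d';' -f2)
  acl_ips=$(printf '%s' "$acl" | cut -d';' -f3)
  in_list "$user" "$acl_users" || return 1
  in_list "$ip" "$acl_ips" || return 1
  # Group check: wildcard, or at least one of the user's groups is listed.
  [ "$acl_groups" = "*" ] && return 0
  old_ifs=$IFS
  IFS=,
  for g in $groups; do
    if in_list "$g" "$acl_groups"; then
      IFS=$old_ifs
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}
```

For example, acl_allows 'Tom;*;*' Tom users 203.0.113.7 succeeds, and the same call with any other user name fails, which matches the scenario of allowing only Tom to open the YARN web UI.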

FAQ

  • Q: After I restart Knox, Knox stops providing services and the error message "Failed to start gateway: org.apache.hadoop.gateway.services.ServiceLifecycleException: Gateway SSL Certificate is Expired" appears, as shown in the following figure. What do I do?
    (Figure: Error)
  • A: Perform the following steps:
    1. Log on to your cluster in SSH mode. For more information, see Log on to a cluster.
    2. Run the following command to rename the keystore file that contains the expired SSL certificate:
      sudo mv /usr/lib/knox-current/data/security/keystores/gateway.jks /usr/lib/knox-current/data/security/keystores/bak_gateway.jks
      Note You can also move the SSL certificate to another directory.
    3. Restart Knox.
      1. In the Components section of the Knox service page, click Restart in the Actions column that corresponds to Knox.
      2. In the Cluster Activities dialog box, specify Description and click OK.
      3. In the Confirm message, click OK.

        To view the restart progress, click History in the upper-right corner of the Knox service page. Knox has restarted when the Status column shows Successful.
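The mv command in Step 2 silently overwrites bak_gateway.jks if you ever repeat this fix. The following sketch, a hypothetical helper named backup_keystore, adds a timestamp to the backup name and refuses to overwrite an existing file. Run it with root privileges, as the sudo in Step 2 implies. The restart in Step 3 works because Knox generally creates a new self-signed certificate when it starts and finds no gateway.jks; treat that as an assumption to confirm for your Knox version.

```shell
# Hypothetical helper: move an expired Knox keystore aside without
# overwriting an earlier backup. The default path is the one from Step 2.
backup_keystore() {
  keystore=${1:-/usr/lib/knox-current/data/security/keystores/gateway.jks}
  if [ ! -f "$keystore" ]; then
    echo "keystore not found: $keystore" >&2
    return 1
  fi
  dir=$(dirname "$keystore")
  # Timestamped name, for example bak_<YYYYmmddHHMMSS>_gateway.jks.
  backup="$dir/bak_$(date +%Y%m%d%H%M%S)_$(basename "$keystore")"
  if [ -e "$backup" ]; then
    echo "backup already exists: $backup" >&2
    return 1
  fi
  # Print the backup path only if the move succeeded.
  mv "$keystore" "$backup" && printf '%s\n' "$backup"
}
```

The function prints the path of the backup, which has the form .../keystores/bak_<timestamp>_gateway.jks.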