
Create Gateway

Last Updated: Aug 11, 2018

Gateway Introduction

A Gateway is an ECS instance that resides in the same intranet as the E-MapReduce cluster. You can use a Gateway for load balancing and security isolation, or to submit jobs to the E-MapReduce cluster. You can create a Gateway in either of the following two ways:

  • (Recommended) Create it directly on the E-MapReduce console.
  • Set up a Gateway manually.

Create a Gateway on E-MapReduce Console

Before you create a Gateway, make sure that you have created an E-MapReduce cluster. To create a Gateway, follow these steps:

  1. Log on to the E-MapReduce Console.
  2. On the Cluster List page, click Create Gateway in the upper-right corner.
  3. Configure the Gateway on the Create Gateway page.

    • Billing Method: With the Subscription method, you pay for a period of time in advance. This is cheaper than the Pay-As-You-Go method, and the discount grows with the length of the period (for example, it is largest when you pay for three years at one time). The Pay-As-You-Go method charges by the hour, based on the actual number of hours that you use the product.
    • Cluster: The cluster that the Gateway is created for, that is, the cluster to which the Gateway can submit jobs. The Gateway automatically configures a Hadoop environment that is consistent with that cluster.
    • Configuration: The ECS instance specifications available in the zone.
    • System Disk Type: The system disk type of the Gateway node. There are two types of system disks: SSD cloud disks and efficient cloud disks. The available disk types vary by server model and region. By default, the system disk is released when the cluster is released.
    • System Disk Size: The minimum is 40 GB and the maximum is 500 GB. The default value is 300 GB.
    • Data Disk Type: The data disk type of the Gateway node. There are two types of data disks: SSD cloud disks and efficient cloud disks. The available disk types vary by server model and region. By default, the data disks are released when the cluster is released.
    • Data Disk Size: The minimum is 200 GB and the maximum is 4000 GB. The default value is 300 GB.
    • Quantity: The number of data disks. The minimum is 1 and the maximum is 10.
    • Cluster Name: The name of the Gateway. It can be 1 to 64 characters in length and can contain only Chinese characters, letters, numbers, hyphens (-), and underscores (_).
    • Password/Key Pair
      • Password Mode: Enter the password for logging on to the Gateway in the text box.
      • Key Pair Mode: Select the name of the key pair used to log on to the Gateway from the drop-down menu. If you have not created a key pair yet, click create a key pair on the right to go to the ECS console and create one. After the Gateway is created, the key pair is bound to the ECS instance on which the Gateway is located.
  4. Click Create to save the configuration.

    The newly created Gateway is displayed in the cluster list, and its status in the Status column changes to Idle when the creation is complete.

Set up a Gateway manually

Network Environment

Make sure that the Gateway machine is in the security group of the corresponding EMR cluster, so that the Gateway node can access the EMR cluster. For information about setting the security group of the machine, see the ECS security group settings instructions.
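Before going further, you can confirm from the Gateway host that the master node is reachable. The following is a minimal sketch; the IP address 192.168.0.1 is a placeholder for your master node's private IP, and check_port is an illustrative helper (not part of the EMR tooling) that uses bash's built-in /dev/tcp redirection.

```shell
#!/usr/bin/env bash
# check_port HOST PORT: succeed if a TCP connection can be opened within 2 seconds.
check_port() {
  timeout 2 bash -c ">/dev/tcp/$1/$2" 2>/dev/null
}

# 192.168.0.1 is a placeholder for your master node's private IP;
# port 22 must be open for the scp-based deployment below to work.
if check_port 192.168.0.1 22; then
  echo "master node reachable on port 22"
else
  echo "cannot reach master node; check the ECS security group rules"
fi
```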

Software Environment

  • System environment: CentOS 7.2 or later is recommended.
  • Java environment: JDK 1.7 or a later version must be installed; OpenJDK 1.8.0 is recommended.
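A quick way to check the installed Java runtime on the Gateway host is sketched below. The java_major helper is illustrative (not part of the EMR tooling); it extracts the major version from the first line of `java -version` output.

```shell
#!/usr/bin/env bash
# java_major BANNER_LINE: extract the major Java version from a `java -version`
# banner, e.g. 'openjdk version "1.8.0_131"' -> 8, '... "11.0.2"' -> 11.
java_major() {
  echo "$1" | sed -E 's/.*"(1\.)?([0-9]+).*/\2/'
}

# The deploy scripts below assume JDK 1.7 or later.
if type java >/dev/null 2>&1; then
  ver=$(java -version 2>&1 | head -n 1)
  echo "detected: $ver (major version $(java_major "$ver"))"
else
  echo "java not found; on CentOS install with: yum install -y java-1.8.0-openjdk"
fi
```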

Procedure

EMR 2.7 and later versions, 3.2 and later versions

To create a Gateway for these versions, we recommend that you use the EMR console.

If you want to set up a Gateway manually, copy the following script to the Gateway host and run it. The command is sh deploy.sh <master_ip> master_password_file.

  • deploy.sh is the script name; its content is shown below.
  • master_ip is the IP address of the master node in the cluster, which must be accessible from the Gateway host.
  • master_password_file is a file that stores the password of the master node, written directly in the file as plain text.
  #!/usr/bin/bash
  if [ $# != 2 ]
  then
      echo "Usage: $0 master_ip master_password_file"
      exit 1;
  fi
  masterip=$1
  masterpwdfile=$2
  if ! type sshpass >/dev/null 2>&1; then
      yum install -y sshpass
  fi
  if ! type java >/dev/null 2>&1; then
      yum install -y java-1.8.0-openjdk
  fi
  mkdir -p /opt/apps
  mkdir -p /etc/ecm
  echo "Start to copy package from $masterip to local gateway(/opt/apps)"
  echo " -copying hadoop-2.7.2"
  sshpass -f $masterpwdfile scp -r -o 'StrictHostKeyChecking no' root@$masterip:/usr/lib/hadoop-current /opt/apps/
  echo " -copying hive-2.0.1"
  sshpass -f $masterpwdfile scp -r root@$masterip:/usr/lib/hive-current /opt/apps/
  echo " -copying spark-2.1.1"
  sshpass -f $masterpwdfile scp -r root@$masterip:/usr/lib/spark-current /opt/apps/
  echo "Start to link /usr/lib/\${app}-current to /opt/apps/\${app}"
  if [ -L /usr/lib/hadoop-current ]
  then
      unlink /usr/lib/hadoop-current
  fi
  ln -s /opt/apps/hadoop-current /usr/lib/hadoop-current
  if [ -L /usr/lib/hive-current ]
  then
      unlink /usr/lib/hive-current
  fi
  ln -s /opt/apps/hive-current /usr/lib/hive-current
  if [ -L /usr/lib/spark-current ]
  then
      unlink /usr/lib/spark-current
  fi
  ln -s /opt/apps/spark-current /usr/lib/spark-current
  echo "Start to copy conf from $masterip to local gateway(/etc/ecm)"
  sshpass -f $masterpwdfile scp -r root@$masterip:/etc/ecm/hadoop-conf /etc/ecm/hadoop-conf
  sshpass -f $masterpwdfile scp -r root@$masterip:/etc/ecm/hive-conf /etc/ecm/hive-conf
  sshpass -f $masterpwdfile scp -r root@$masterip:/etc/ecm/spark-conf /etc/ecm/spark-conf
  echo "Start to copy environment from $masterip to local gateway(/etc/profile.d)"
  sshpass -f $masterpwdfile scp root@$masterip:/etc/profile.d/hdfs.sh /etc/profile.d/
  sshpass -f $masterpwdfile scp root@$masterip:/etc/profile.d/yarn.sh /etc/profile.d/
  sshpass -f $masterpwdfile scp root@$masterip:/etc/profile.d/hive.sh /etc/profile.d/
  sshpass -f $masterpwdfile scp root@$masterip:/etc/profile.d/spark.sh /etc/profile.d/
  if [ -L /usr/lib/jvm/java ]
  then
      unlink /usr/lib/jvm/java
  fi
  echo "" >>/etc/profile.d/hdfs.sh
  echo export JAVA_HOME=/usr/lib/jvm/jre-1.8.0 >>/etc/profile.d/hdfs.sh
  echo "Start to copy host info from $masterip to local gateway(/etc/hosts)"
  sshpass -f $masterpwdfile scp root@$masterip:/etc/hosts /etc/hosts_bak
  cat /etc/hosts_bak | grep emr | grep cluster >>/etc/hosts
  if ! id hadoop >& /dev/null
  then
      useradd hadoop
  fi
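As a concrete example, a run of the script above might look like the following. The IP address 192.168.0.1 and the password value are placeholders; the password file should be readable only by you, since it holds the master node's root password in plain text.

```shell
#!/usr/bin/env bash
# Create the password file with restrictive permissions (umask 077 makes the
# new file mode 600). 'YourMasterRootPassword' is a placeholder value.
umask 077
echo 'YourMasterRootPassword' > master_password_file

# Run the deployment script against the master node. The guard makes this
# sketch safe to copy-paste on a host where deploy.sh does not exist yet.
if [ -f deploy.sh ]; then
  sh deploy.sh 192.168.0.1 master_password_file
fi
```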

EMR versions earlier than 2.7, versions earlier than 3.2

Copy the following script to the Gateway host and run it. The command is sh deploy.sh <master_ip> master_password_file.

  • deploy.sh is the script name; its content is shown below.
  • master_ip is the IP address of the master node in the cluster, which must be accessible from the Gateway host.
  • master_password_file is a file that stores the password of the master node, written directly in the file as plain text.
  #!/usr/bin/bash
  if [ $# != 2 ]
  then
      echo "Usage: $0 master_ip master_password_file"
      exit 1;
  fi
  masterip=$1
  masterpwdfile=$2
  if ! type sshpass >/dev/null 2>&1; then
      yum install -y sshpass
  fi
  if ! type java >/dev/null 2>&1; then
      yum install -y java-1.8.0-openjdk
  fi
  mkdir -p /opt/apps
  mkdir -p /etc/emr
  echo "Start to copy package from $masterip to local gateway(/opt/apps)"
  echo " -copying hadoop-2.7.2"
  sshpass -f $masterpwdfile scp -r -o 'StrictHostKeyChecking no' root@$masterip:/usr/lib/hadoop-current /opt/apps/
  echo " -copying hive-2.0.1"
  sshpass -f $masterpwdfile scp -r root@$masterip:/usr/lib/hive-current /opt/apps/
  echo " -copying spark-2.1.1"
  sshpass -f $masterpwdfile scp -r root@$masterip:/usr/lib/spark-current /opt/apps/
  echo "Start to link /usr/lib/\${app}-current to /opt/apps/\${app}"
  if [ -L /usr/lib/hadoop-current ]
  then
      unlink /usr/lib/hadoop-current
  fi
  ln -s /opt/apps/hadoop-current /usr/lib/hadoop-current
  if [ -L /usr/lib/hive-current ]
  then
      unlink /usr/lib/hive-current
  fi
  ln -s /opt/apps/hive-current /usr/lib/hive-current
  if [ -L /usr/lib/spark-current ]
  then
      unlink /usr/lib/spark-current
  fi
  ln -s /opt/apps/spark-current /usr/lib/spark-current
  echo "Start to copy conf from $masterip to local gateway(/etc/emr)"
  sshpass -f $masterpwdfile scp -r root@$masterip:/etc/emr/hadoop-conf /etc/emr/hadoop-conf
  sshpass -f $masterpwdfile scp -r root@$masterip:/etc/emr/hive-conf /etc/emr/hive-conf
  sshpass -f $masterpwdfile scp -r root@$masterip:/etc/emr/spark-conf /etc/emr/spark-conf
  echo "Start to copy environment from $masterip to local gateway(/etc/profile.d)"
  sshpass -f $masterpwdfile scp root@$masterip:/etc/profile.d/hadoop.sh /etc/profile.d/
  if [ -L /usr/lib/jvm/java ]
  then
      unlink /usr/lib/jvm/java
  fi
  ln -s /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.131-3.b12.el7_3.x86_64/jre /usr/lib/jvm/java
  echo "Start to copy host info from $masterip to local gateway(/etc/hosts)"
  sshpass -f $masterpwdfile scp root@$masterip:/etc/hosts /etc/hosts_bak
  cat /etc/hosts_bak | grep emr | grep cluster >>/etc/hosts
  if ! id hadoop >& /dev/null
  then
      useradd hadoop
  fi
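After either script completes, the client directories on the Gateway should be symlinks into /opt/apps. The following sketch spot-checks them; link_target is an illustrative helper, not part of the deployment scripts.

```shell
#!/usr/bin/env bash
# link_target PATH: print the target of a symlink (empty if PATH is not one).
link_target() {
  readlink "$1" 2>/dev/null
}

# After deployment you would expect, for example:
#   /usr/lib/hadoop-current -> /opt/apps/hadoop-current
#   /usr/lib/hive-current   -> /opt/apps/hive-current
#   /usr/lib/spark-current  -> /opt/apps/spark-current
for app in hadoop hive spark; do
  echo "/usr/lib/${app}-current -> $(link_target /usr/lib/${app}-current)"
done
```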

Test

  • Hive

    [hadoop@iZ23bc05hrvZ ~]$ hive
    hive> show databases;
    OK
    default
    Time taken: 1.124 seconds, Fetched: 1 row(s)
    hive> create database school;
    OK
    Time taken: 0.362 seconds
    hive>
  • Run the Hadoop job

    [hadoop@iZ23bc05hrvZ ~]$ hadoop jar /usr/lib/hadoop-current/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 10 10
    Number of Maps  = 10
    Samples per Map = 10
    Wrote input for Map #0
    Wrote input for Map #1
    Wrote input for Map #2
    Wrote input for Map #3
    Wrote input for Map #4
    Wrote input for Map #5
    Wrote input for Map #6
    Wrote input for Map #7
    Wrote input for Map #8
    Wrote input for Map #9
    File Input Format Counters
        Bytes Read=1180
    File Output Format Counters
        Bytes Written=97
    Job Finished in 29.798 seconds
    Estimated value of Pi is 3.20000000000000000000