This topic describes how to use Oozie in EMR.

Prerequisites

An EMR Hadoop cluster has been created, and Oozie was selected as an optional service during cluster creation. For more information, see Create a cluster.

Preparations

In this topic, macOS is used as the local operating system. Port forwarding is set up over SSH, and Google Chrome accesses the cluster through the forwarded port.

  1. Log on to the master node. For more information, see Connect to the master node of an EMR cluster in SSH mode.
    ssh root@xx.xx.xx.xx

    xx.xx.xx.xx indicates the public IP address of the master node.

  2. Enter your password.
  3. On your local machine, view the content of the id_rsa.pub file.
    cat ~/.ssh/id_rsa.pub
  4. Run the following commands on the master node to create the .ssh directory (if it does not already exist) and open the authorized_keys file for editing:
    mkdir -p ~/.ssh/
    vim ~/.ssh/authorized_keys
  5. Paste the public key returned in Step 3 into the authorized_keys file and save the file. You can then run the ssh root@xx.xx.xx.xx command on your local machine to log on to the master node without a password.
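  6. Run the following command on your local machine to perform port forwarding. The -D 8157 option opens a SOCKS proxy on local port 8157, and the -N option keeps the session open without running a remote command: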
    ssh -i ~/.ssh/id_rsa -ND 8157 root@xx.xx.xx.xx
  7. Leave the port forwarding session running, open a new Terminal window, and run the following command to start Google Chrome through the proxy:
    /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --proxy-server="socks5://localhost:8157" --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" --user-data-dir=/tmp
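
Steps 4 and 5 can also be done without an editor. The following is a minimal sketch, assuming a POSIX shell on the master node. The helper name install_pubkey is hypothetical and not part of EMR. It creates the directory only if it is missing, appends the key only if it is not already present, and applies the restrictive permissions conventionally used for SSH key files.

```shell
# Hypothetical helper: append a public key to an authorized_keys file.
# $1 = public key text (the output of `cat ~/.ssh/id_rsa.pub` in Step 3)
# $2 = target .ssh directory (optional, defaults to ~/.ssh)
install_pubkey() {
  key="$1"
  ssh_dir="${2:-$HOME/.ssh}"
  auth_file="$ssh_dir/authorized_keys"

  mkdir -p "$ssh_dir"      # no error if the directory already exists
  chmod 700 "$ssh_dir"
  touch "$auth_file"
  # Append the key only if this exact line is not already in the file.
  grep -qxF -- "$key" "$auth_file" || printf '%s\n' "$key" >> "$auth_file"
  chmod 600 "$auth_file"
}

# Example (run on the master node, pasting the key from Step 3):
#   install_pubkey "ssh-rsa AAAA... user@laptop"
```

Because the key is appended only when it is missing, the helper can be run more than once without duplicating entries.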

Access the Oozie web UI

In the address bar of Google Chrome, enter one of the following URLs to access the Oozie web UI:
  • Public IP address:11000/oozie
  • localhost:11000/oozie
  • Internal IP address:11000/oozie
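
To confirm from the command line that the web UI is reachable through the tunnel, you can query the status endpoint of the Oozie Web Services API. This is a sketch under assumptions: curl is installed on your local machine, the port forwarding session from the preparations is still running on port 8157, and the helper names oozie_url and check_oozie are hypothetical.

```shell
# Hypothetical helpers. The port values come from the steps above.
OOZIE_PORT=11000
SOCKS_PROXY="localhost:8157"

# Build the base URL of the Oozie web UI for a given host name or IP address.
oozie_url() {
  printf 'http://%s:%s/oozie\n' "$1" "$OOZIE_PORT"
}

# Query the Oozie server status through the SOCKS5 tunnel.
# --socks5-hostname lets the proxy side resolve the host name.
check_oozie() {
  curl --silent --show-error --max-time 10 \
       --socks5-hostname "$SOCKS_PROXY" \
       "$(oozie_url "$1")/v1/admin/status"
}

# Example (requires the running tunnel; replace the address with your own):
#   check_oozie xx.xx.xx.xx
```

If the server is reachable, the response is a small JSON document that reports the system mode of the Oozie server.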

Submit a workflow job

ShareLib is installed in EMR clusters by default, so you do not need to install it before you submit an Oozie workflow job.
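
Whether a given workflow picks up ShareLib depends on its job.properties file: Oozie adds the ShareLib libraries to a job when the oozie.use.system.libpath property is set to true. The following sketch checks a properties file for that setting. The helper name uses_sharelib is hypothetical, and the sample files used in this topic may or may not set the property.

```shell
# Hypothetical helper: succeed (exit status 0) if the given properties file
# contains the line oozie.use.system.libpath=true, allowing optional spaces.
uses_sharelib() {
  grep -Eq '^[[:space:]]*oozie\.use\.system\.libpath[[:space:]]*=[[:space:]]*true[[:space:]]*$' "$1"
}

# Example:
#   uses_sharelib examples/apps/map-reduce/job.properties && echo "ShareLib is enabled"
```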

  1. In the job.properties file, set the nameNode and jobTracker properties based on the cluster type. The jobTracker property points to the YARN ResourceManager.
    • Non-HA cluster
      nameNode=hdfs://emr-header-1:9000
      jobTracker=emr-header-1:8032
    • HA cluster
      nameNode=hdfs://emr-cluster
      jobTracker=rm1,rm2
  2. Submit an Oozie workflow job.
    • Non-HA cluster
      1. Log on to the master node. For more information, see Connect to the master node of an EMR cluster in SSH mode.
        ssh root@Public IP address of the master node
      2. Download the sample code.
        [root@emr-header-1 ~]# su oozie
        [oozie@emr-header-1 root]$ cd /tmp
        [oozie@emr-header-1 tmp]$ wget http://emr-sample-projects.oss-cn-hangzhou.aliyuncs.com/oozie-examples/oozie-examples.zip
        [oozie@emr-header-1 tmp]$ unzip oozie-examples.zip
      3. Synchronize the Oozie workflow code to HDFS.
        [oozie@emr-header-1 tmp]$ hadoop fs -copyFromLocal examples/ /user/oozie/examples
      4. Submit an Oozie workflow job.
        [oozie@emr-header-1 tmp]$ $OOZIE_HOME/bin/oozie job -config examples/apps/map-reduce/job.properties -run
        If the command succeeds, a job ID similar to the following is returned:
        job: 0000000-160627195651086-oozie-oozi-W
      5. Access the Oozie web UI.

        You can view the submitted Oozie workflow job.

    • HA cluster
      1. Log on to the primary master node. For more information, see Connect to the master node of an EMR cluster in SSH mode.
        ssh root@Public IP address of the primary master node
      2. Download the sample code.
        [root@emr-header-1 ~]# su oozie
        [oozie@emr-header-1 root]$ cd /tmp
        [oozie@emr-header-1 tmp]$ wget http://emr-sample-projects.oss-cn-hangzhou.aliyuncs.com/oozie-examples/oozie-examples-ha.zip
        [oozie@emr-header-1 tmp]$ unzip oozie-examples-ha.zip
      3. Synchronize the Oozie workflow code to HDFS.
        [oozie@emr-header-1 tmp]$ hadoop fs -copyFromLocal examples/ /user/oozie/examples
      4. Submit an Oozie workflow job.
        [oozie@emr-header-1 tmp]$ $OOZIE_HOME/bin/oozie job -config examples/apps/map-reduce/job.properties -run
        If the command succeeds, a job ID similar to the following is returned:
        job: 0000000-160627195651086-oozie-oozi-W
      5. Access the Oozie web UI.

        You can view the submitted Oozie workflow job.
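
The configuration and submission steps above can be sketched as a small script. This is an illustration under assumptions, not the procedure EMR prescribes: the helper names set_cluster_endpoints, extract_job_id, and submit_and_show are hypothetical, $OOZIE_HOME must be set, and the Oozie server URL must already be available to the CLI, as the submit command in this topic implies. The script rewrites nameNode and jobTracker for the cluster type, submits the job, and then prints the job status with oozie job -info instead of opening the web UI.

```shell
# Hypothetical helper: set nameNode and jobTracker in a job.properties file.
# $1 = path to job.properties, $2 = "ha" for an HA cluster (anything else = non-HA)
set_cluster_endpoints() {
  file="$1"
  if [ "$2" = "ha" ]; then
    name_node="hdfs://emr-cluster"
    job_tracker="rm1,rm2"
  else
    name_node="hdfs://emr-header-1:9000"
    job_tracker="emr-header-1:8032"
  fi
  tmp=$(mktemp)
  # Rewrite only the two endpoint lines; all other properties are kept as-is.
  sed -e "s|^nameNode=.*|nameNode=$name_node|" \
      -e "s|^jobTracker=.*|jobTracker=$job_tracker|" "$file" > "$tmp"
  cat "$tmp" > "$file"     # keeps the original file's permissions
  rm -f "$tmp"
}

# Hypothetical helper: extract the ID from a line such as
# "job: 0000000-160627195651086-oozie-oozi-W".
extract_job_id() {
  printf '%s\n' "$1" | sed -n 's/^job: *//p'
}

# Hypothetical helper: submit the sample workflow and print its status.
# $1 = "ha" or "non-ha". Run from /tmp after unzipping the sample code.
submit_and_show() {
  props="examples/apps/map-reduce/job.properties"
  set_cluster_endpoints "$props" "$1"
  out=$("$OOZIE_HOME/bin/oozie" job -config "$props" -run) || return 1
  job_id=$(extract_job_id "$out")
  "$OOZIE_HOME/bin/oozie" job -info "$job_id"
}

# Example:
#   submit_and_show non-ha
```

Nothing runs until submit_and_show is called, so the helpers can be sourced and tried individually first.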