This topic describes how to use Oozie in EMR.
Prerequisites
An EMR Hadoop cluster is created, and Oozie is selected from the optional services
during cluster creation. For more information, see Create a cluster.
Preparations
In this topic, the on-premises machine runs macOS. An SSH tunnel is used for port
forwarding, and Google Chrome is used to access the web UI through the tunnel.
- Log on to the master node. For more information, see Connect to the master node of an EMR cluster in SSH mode.
ssh root@xx.xx.xx.xx
xx.xx.xx.xx indicates the public IP address of the master node.
- Enter your password.
- View the content of the ~/.ssh/id_rsa.pub file on the on-premises machine, for example, by running cat ~/.ssh/id_rsa.pub.
- Run the following commands on the master node to create a .ssh directory and open the authorized_keys file:
mkdir -p ~/.ssh/
vim ~/.ssh/authorized_keys
- Copy the content of the id_rsa.pub file from Step 3 to the authorized_keys file in the created .ssh directory. Then, run the ssh root@xx.xx.xx.xx command to log on to the master node in password-free mode.
- Run the following command on the on-premises machine to perform port forwarding:
ssh -i ~/.ssh/id_rsa -ND 8157 root@xx.xx.xx.xx
- Open a new Terminal window (the port forwarding command keeps running in the previous one) and run the following command to start Google Chrome:
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --proxy-server="socks5://localhost:8157" --host-resolver-rules="MAP * 0.0.0.0 , EXCLUDE localhost" --user-data-dir=/tmp
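The key setup and port forwarding steps above can be sketched as a small script. This is a minimal sketch under stated assumptions, not part of the original procedure: the address 203.0.113.10 is a documentation placeholder for the public IP address of the master node, the function names are hypothetical, and the one-line key append (including the chmod calls) is an alternative to editing authorized_keys in vim. The script only prints the commands so that you can review them before you run them.

```shell
#!/bin/sh
# Placeholder values; override them through the environment.
MASTER_IP="${MASTER_IP:-203.0.113.10}"     # assumed public IP of the master node
SOCKS_PORT="${SOCKS_PORT:-8157}"           # local SOCKS port used in this topic
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_rsa}"  # private key; the public key is KEY_FILE.pub

# Print a command that appends the local public key to authorized_keys on the
# master node. It replaces the manual copy-and-paste steps and still asks for
# the root password once.
install_key_cmd() {
  printf "cat %s.pub | ssh root@%s 'mkdir -p ~/.ssh && chmod 700 ~/.ssh && cat >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'\n" \
    "$KEY_FILE" "$MASTER_IP"
}

# Print the dynamic port forwarding command from the procedure above.
# -N runs no remote command; -D opens a local SOCKS proxy on SOCKS_PORT.
tunnel_cmd() {
  printf 'ssh -i %s -ND %s root@%s\n' "$KEY_FILE" "$SOCKS_PORT" "$MASTER_IP"
}

install_key_cmd
tunnel_cmd
```

To run the printed commands instead of only displaying them, paste them into a terminal on the on-premises machine. The tunnel command stays in the foreground until you stop it.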
Access the Oozie web UI
In the address bar of Google Chrome, enter one of the following URLs to access the
Oozie web UI:
- <Public IP address of the master node>:11000/oozie
- localhost:11000/oozie
- <Internal IP address of the master node>:11000/oozie
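Before you open the browser, you may want to confirm that the Oozie server is reachable through the tunnel. The following sketch assumes that curl is installed on the on-premises machine and that the SOCKS tunnel on port 8157 is running. The function names are hypothetical. The /v1/admin/status path is the admin status endpoint of the Oozie web services API.

```shell
#!/bin/sh
SOCKS_PORT="${SOCKS_PORT:-8157}"  # local SOCKS port opened by the ssh -ND command

# Build the Oozie base URL for a host. Port 11000 and the /oozie path
# come from the URLs listed above.
oozie_url() {
  printf 'http://%s:11000/oozie\n' "$1"
}

# Query the Oozie admin status endpoint through the SOCKS proxy.
# --socks5-hostname makes the proxy side resolve the host name.
check_oozie() {
  curl --silent --max-time 10 \
       --socks5-hostname "localhost:${SOCKS_PORT}" \
       "$(oozie_url "$1")/v1/admin/status"
}

# Usage (requires the running tunnel; replace the argument with the
# internal IP address of the master node):
#   check_oozie 192.168.0.10
```

A healthy server typically answers with a small JSON document that reports the system mode. An empty response usually means that the tunnel is down or that Oozie is not running.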
Submit a workflow job
ShareLib is installed in EMR clusters by default, so you do not need to install it
before you submit an Oozie workflow job.
- In the job.properties file, set the nameNode and jobTracker (ResourceManager address) parameters based on the cluster type.
- Non-HA cluster
nameNode=hdfs://emr-header-1:9000
jobTracker=emr-header-1:8032
- HA cluster
nameNode=hdfs://emr-cluster
jobTracker=rm1,rm2
- Submit an Oozie workflow job.
- Non-HA cluster
- Log on to the master node. For more information, see Connect to the master node of an EMR cluster in SSH mode.
ssh root@<Public IP address of the master node>
- Download the sample code.
[root@emr-header-1 ~]# su oozie
[oozie@emr-header-1 root]$ cd /tmp
[oozie@emr-header-1 tmp]$ wget http://emr-sample-projects.oss-cn-hangzhou.aliyuncs.com/oozie-examples/oozie-examples.zip
[oozie@emr-header-1 tmp]$ unzip oozie-examples.zip
- Synchronize the Oozie workflow code to HDFS.
[oozie@emr-header-1 tmp]$ hadoop fs -copyFromLocal examples/ /user/oozie/examples
- Submit an Oozie workflow job.
[oozie@emr-header-1 tmp]$ $OOZIE_HOME/bin/oozie job -config examples/apps/map-reduce/job.properties -run
If the command is successfully executed, the following information is returned:
job: 0000000-160627195651086-oozie-oozi-W
- Access the Oozie web UI to view the submitted Oozie workflow job.
- HA cluster
- Log on to the primary master node. For more information, see Connect to the master node of an EMR cluster in SSH mode.
ssh root@<Public IP address of the primary master node>
- Download the sample code.
[root@emr-header-1 ~]# su oozie
[oozie@emr-header-1 root]$ cd /tmp
[oozie@emr-header-1 tmp]$ wget http://emr-sample-projects.oss-cn-hangzhou.aliyuncs.com/oozie-examples/oozie-examples-ha.zip
[oozie@emr-header-1 tmp]$ unzip oozie-examples-ha.zip
- Synchronize the Oozie workflow code to HDFS.
[oozie@emr-header-1 tmp]$ hadoop fs -copyFromLocal examples/ /user/oozie/examples
- Submit an Oozie workflow job.
[oozie@emr-header-1 tmp]$ $OOZIE_HOME/bin/oozie job -config examples/apps/map-reduce/job.properties -run
If the command is successfully executed, the following information is returned:
job: 0000000-160627195651086-oozie-oozi-W
- Access the Oozie web UI to view the submitted Oozie workflow job.
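In addition to the web UI, the job can be checked from the command line. The following sketch extracts the job ID from the "job: ..." line returned by the submit command and passes it to the -info option of the oozie CLI. The function names are hypothetical. The sketch also assumes that the OOZIE_URL environment variable is set on the master node, which the submit commands above suggest because they omit the -oozie option.

```shell
#!/bin/sh
# Extract the job ID from the "job: <id>" line printed by "oozie job ... -run".
extract_job_id() {
  printf '%s\n' "$1" | sed -n 's/^job: *//p'
}

# Show the status and actions of a workflow job.
# Assumes OOZIE_HOME is set, as in the submit commands above.
job_info() {
  "$OOZIE_HOME/bin/oozie" job -info "$1"
}

# Example with the sample output from the steps above (no cluster needed):
extract_job_id "job: 0000000-160627195651086-oozie-oozi-W"

# Usage on the master node, as the oozie user in /tmp:
#   out=$("$OOZIE_HOME/bin/oozie" job -config examples/apps/map-reduce/job.properties -run)
#   job_info "$(extract_job_id "$out")"
```

The example call prints 0000000-160627195651086-oozie-oozi-W. If OOZIE_URL is not set in your environment, add the -oozie option with the Oozie URL to the oozie commands.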