This topic uses an example to help you understand and use the features of E-MapReduce (EMR) clusters and jobs. For this example, you create a Spark job, run the job on the cluster to calculate Pi (π), and then view the result in the console.

Prerequisites

Background information

If the resource packages and data sources involved in jobs are stored in OSS, you must use the following OSS paths when you create a job:

  • The oss:// prefix indicates an OSS path that jobs can directly read data from and write data to, similar to hdfs://. In most cases, this type of path is used for job data sources.
  • The ossref:// prefix also indicates an OSS path. The difference is that a resource at an ossref:// path is downloaded to a local disk on the cluster, and the path in the command line is replaced with that local path. This makes ossref:// convenient for running native code: you do not need to log on to an instance to upload code and dependent resource packages. In most cases, this type of path is used for job resource packages.

    For example, a .jar resource package is stored in the ossref://xxxxxx/xxx.jar path. When the job runs, EMR automatically downloads this package to the cluster.

    Notice Downloading large data resources from a path that starts with the ossref prefix may cause the job to fail. Use this prefix only for small resource packages, such as .jar files.
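    As a sketch, a single job's content can combine both path types. The bucket, jar, and class names below are hypothetical placeholders, not values from this example:

    ```shell
    # Hypothetical job content: the jar at the ossref:// path is downloaded
    # to the cluster before the job starts, while the oss:// paths are read
    # and written directly by the job at run time.
    --class com.example.WordCount \
    --master yarn-client \
    ossref://examplebucket/jars/wordcount.jar \
    oss://examplebucket/input/ \
    oss://examplebucket/output/
    ```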

Procedure

  1. Log on to the Alibaba Cloud E-MapReduce console.
  2. Click the Data Platform tab.
  3. In the Projects section, find the target project and click Edit Job in the Actions column.
  4. In the left-side navigation pane, right-click the target folder and choose Create Job from the shortcut menu.
    Note You can also right-click the target folder to create a subfolder, rename the folder, or delete the folder.
  5. Specify Name, Description, and Job Type. For example, select Spark from the Job Type drop-down list.
  6. Click OK.
  7. Configure the job content. The following code is an example:
    --class org.apache.spark.examples.SparkPi --master yarn-client --driver-memory 512m --num-executors 1 --executor-memory 1g --executor-cores 2 /usr/lib/spark-current/examples/jars/spark-examples_2.11-2.1.1.jar 10
    Note Modify the name of the Spark package in the /usr/lib/spark-current/examples/jars/spark-examples_2.11-2.1.1.jar directory based on the Spark version in your cluster. For example, if the Spark version is 2.1.1, the .jar package is named spark-examples_2.11-2.1.1.jar. If the Spark version is 2.2.0, the .jar package is named spark-examples_2.11-2.2.0.jar.
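    If you are unsure which Spark version your cluster runs, you can list the examples directory on the master node to find the matching jar name. The path below assumes the default EMR layout used in the job content above:

    ```shell
    # Run on the EMR master node to find the examples jar that matches
    # the installed Spark version:
    ls /usr/lib/spark-current/examples/jars/ | grep spark-examples
    ```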
  8. Click the Run button in the upper-right corner.
  9. View the job log and confirm the results.

    After you run the job, you can view the operation log on the Records tab in the lower part of the page. Click Details to go to the details page, where you can view the job submission log and the YARN container logs.
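    Besides the console, you can retrieve the aggregated YARN container logs from the command line on the cluster. This uses the standard YARN CLI; replace the placeholder with the application ID shown in the job submission log:

    ```shell
    # Print the aggregated container logs for a finished application:
    yarn logs -applicationId <application_id>
    ```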