This section describes how to use clusters and jobs in E-MapReduce (EMR). In the following example, you create a Spark job, run it on a cluster to calculate Pi (π), and then view the result in the console.
- Create a cluster
For more information, see Step 3: Create a cluster.
- Create a job
- Click the Data Platform tab to go to the project list page. Click New Project in the upper-right corner.
- In the New Project dialog box, enter the project name and description, and then click Create.
- Click Design Workflow to the right of the project to go to the Edit Jobs page.
- On the left side of the Edit Jobs page, right-click the folder in which you want to create the job and select New Job.
- Enter the job name and description.
- Select Spark as the job type.
- Click OK.
Note You can also right-click a folder and then choose to create a subfolder, rename the folder, or delete the folder.
- Enter the following job parameters:
--class org.apache.spark.examples.SparkPi --master yarn-client --driver-memory 512m --num-executors 1 --executor-memory 1g --executor-cores 2 /usr/lib/spark-current/examples/jars/spark-examples_2.11-2.1.1.jar 10
The final argument, 10, is passed to the SparkPi main class and sets the number of partitions (slices) used for the calculation.
Note The name of the JAR file /usr/lib/spark-current/examples/jars/spark-examples_2.11-2.1.1.jar depends on the version of Spark in the cluster. For example, if the Spark version is 2.1.1, the JAR file is named spark-examples_2.11-2.1.1.jar. If the cluster runs a different Spark version, such as 2.2.0, the file name contains that version instead.
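The JAR path and the parameter string above can also be assembled programmatically. The following Python sketch is illustrative only: the helper names `spark_examples_jar` and `sparkpi_args` are hypothetical, and the default Scala version `2.11` is taken from the example file name. Check the actual file name in `/usr/lib/spark-current/examples/jars/` on your cluster.

```python
import shlex
from typing import List

# Directory taken from the example path; verify it on your cluster.
EXAMPLES_DIR = "/usr/lib/spark-current/examples/jars"


def spark_examples_jar(spark_version: str, scala_version: str = "2.11") -> str:
    """Hypothetical helper: build the examples JAR path for a Spark version.

    Follows the pattern spark-examples_<scala>-<spark>.jar seen in the
    example, e.g. spark-examples_2.11-2.1.1.jar for Spark 2.1.1.
    """
    return f"{EXAMPLES_DIR}/spark-examples_{scala_version}-{spark_version}.jar"


def sparkpi_args(spark_version: str, slices: int = 10) -> List[str]:
    """Hypothetical helper: the SparkPi job parameters as an argument list."""
    return [
        "--class", "org.apache.spark.examples.SparkPi",
        "--master", "yarn-client",
        "--driver-memory", "512m",
        "--num-executors", "1",
        "--executor-memory", "1g",
        "--executor-cores", "2",
        spark_examples_jar(spark_version),
        str(slices),  # application argument: number of partitions (slices)
    ]


if __name__ == "__main__":
    # Print the parameters in the single-line form entered in the job editor.
    print(shlex.join(sparkpi_args("2.1.1")))
```

Keeping the parameters as a list and joining them with `shlex.join` (Python 3.8 or later) avoids quoting mistakes if a path ever contains spaces.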
- Click Run.
- View the job log entries and confirm the results.
After the job runs, click the Log tab at the bottom of the page to view the running log. Click View Details to go to the details page, where you can view details such as the job submission log and the YARN container logs.
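SparkPi estimates π by Monte Carlo sampling: it draws random points in the square [-1, 1] × [-1, 1] and counts how many fall inside the unit circle, which covers π/4 of the square's area. The following single-machine Python sketch mirrors that idea without Spark. It assumes, as in the Spark 2.x example source, that each slice contributes 100,000 samples; the function name `estimate_pi` is hypothetical. In the Spark example source, the driver prints the result as a line beginning with `Pi is roughly`, which is the line to look for in the job log.

```python
import random
from typing import Optional


def estimate_pi(slices: int = 10, samples_per_slice: int = 100_000,
                seed: Optional[int] = None) -> float:
    """Local Monte Carlo estimate of pi, modeled on the SparkPi example.

    SparkPi spreads the sampling across `slices` partitions; this sketch
    draws the same total number of samples sequentially on one machine.
    """
    rng = random.Random(seed)
    n = slices * samples_per_slice
    inside = 0
    for _ in range(n):
        # Sample a point uniformly from the square [-1, 1] x [-1, 1].
        x = rng.random() * 2 - 1
        y = rng.random() * 2 - 1
        if x * x + y * y <= 1:
            inside += 1
    # Area ratio of circle to square is pi / 4, so scale the hit rate by 4.
    return 4.0 * inside / n


if __name__ == "__main__":
    print(f"Pi is roughly {estimate_pi(seed=42)}")
```

Increasing the slice count raises the total number of samples, so the estimate typically gets closer to π at the cost of more work.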
OSS and ossref
The oss:// prefix indicates that a data path points to Object Storage Service (OSS). The job reads data from or writes data to that path directly, in the same way as with an hdfs:// path.
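Because both prefixes are URI schemes, a data path can be split into its parts with standard URI parsing. The following Python sketch uses `urllib.parse.urlparse` from the standard library. The helper name `describe_path` and the example paths are made up, and treating the first component after `oss://` as the bucket name is an assumption based on the usual OSS path layout.

```python
from typing import Tuple
from urllib.parse import urlparse


def describe_path(path: str) -> Tuple[str, str, str]:
    """Hypothetical helper: split a data path into (scheme, authority, key).

    For oss:// paths the authority is assumed to be the bucket; for hdfs://
    paths it is the NameNode host and port, if one is given.
    """
    parsed = urlparse(path)
    return parsed.scheme, parsed.netloc, parsed.path.lstrip("/")


if __name__ == "__main__":
    # Placeholder paths for illustration only.
    for p in ("oss://examplebucket/input/data.txt",
              "hdfs://namenode:9000/input/data.txt"):
        print(describe_path(p))
```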
The ossref:// prefix also indicates that the data path points to an OSS path. However, instead of reading the path directly, EMR downloads the referenced file to a local disk and replaces the path in the command line with that local path. This makes it easier to run your own code, because you do not need to log on to the cluster to upload the code and its dependent resource packages.
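The ossref:// behavior amounts to rewriting the command line before the job starts. The following Python sketch shows that substitution under stated assumptions: `resolve_ossref` and `fetch` are hypothetical names, `fetch` stands in for the actual OSS download (which EMR performs for you), and the local directory `/tmp/ossref-cache` is made up for illustration.

```python
import posixpath
from typing import Callable, List, Tuple

# Hypothetical local directory; EMR's real download location may differ.
LOCAL_DIR = "/tmp/ossref-cache"
PREFIX = "ossref://"


def resolve_ossref(args: List[str], fetch: Callable[[str, str], None]) -> List[str]:
    """Replace each ossref:// argument with a local file path.

    `fetch(remote, local)` is a hypothetical downloader. oss:// paths,
    options, and plain values are passed through unchanged.
    """
    resolved: List[str] = []
    for arg in args:
        if arg.startswith(PREFIX):
            key = arg[len(PREFIX):]  # "bucket/path/to/object"
            # Keep the object's relative path so that same-named files from
            # different directories do not overwrite each other.
            local = posixpath.join(LOCAL_DIR, key)
            fetch("oss://" + key, local)
            resolved.append(local)
        else:
            resolved.append(arg)
    return resolved


if __name__ == "__main__":
    downloads: List[Tuple[str, str]] = []

    def record(remote: str, local: str) -> None:
        """Stub downloader: records the request instead of contacting OSS."""
        downloads.append((remote, local))

    print(resolve_ossref(["ossref://examplebucket/jars/app.jar",
                          "oss://examplebucket/in", "5"], record))
    print(downloads)
```

Only ossref:// arguments are touched, which matches the distinction above: oss:// paths are still read and written in place by the job itself.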
In this example, the ossref://xxxxxx/xxx.jar parameter represents the JAR package that contains the job's code, which is stored on OSS. When the job runs, the JAR package is automatically downloaded to the cluster and executed. The two oss://xxxx paths and the two values that follow the JAR package are passed to the main class in the JAR package as arguments.
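With spark-submit, the first non-option argument is the application JAR; the options before it configure Spark, and every argument after it is passed to the main class. Assuming EMR hands the job parameters to spark-submit, as the option names suggest, the following Python sketch separates the two parts. It is a simplified, hypothetical parser (`split_submit_args` is not an EMR or Spark API): it assumes every option before the JAR takes exactly one value, which holds for the options in this section but not for flag-only options such as `--verbose`. The class name and OSS paths in the example are made-up placeholders.

```python
import shlex
from typing import List, Tuple


def split_submit_args(command: str) -> Tuple[List[str], str, List[str]]:
    """Split a parameter string into (Spark options, application JAR, app args).

    Simplifying assumption: each option before the JAR has the form
    `--name value`.
    """
    tokens = shlex.split(command)
    options: List[str] = []
    i = 0
    # Consume `--name value` pairs until the first non-option token.
    while i < len(tokens) and tokens[i].startswith("--"):
        options.extend(tokens[i:i + 2])
        i += 2
    if i >= len(tokens):
        raise ValueError("no application JAR found in the parameters")
    # The JAR is the primary resource; everything after it goes to main().
    return options, tokens[i], tokens[i + 1:]


if __name__ == "__main__":
    opts, jar, app_args = split_submit_args(
        "--class com.example.Main --master yarn-client "
        "ossref://examplebucket/jars/app.jar "
        "oss://examplebucket/in oss://examplebucket/out 5 32"
    )
    print(opts)
    print(jar)
    print(app_args)
```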