This section shows how to create and use clusters and jobs in E-MapReduce (EMR). In this example, you create a Spark job, run it on the cluster to calculate Pi (π), and then view the result in the console.
- Create a cluster
- In the Alibaba Cloud E-MapReduce console, click the Cluster tab to go to the clusters list page. Click Create Cluster in the upper-right corner.
- Software Configuration
- Select the latest EMR version. Example: EMR 3.13.0.
- Select the default software configuration.
- Hardware Configuration
- Select Pay-As-You-Go.
- If no security group has been created, enter a name and then create one.
- Select a master instance with 4 cores and 8 GB of memory.
- Select two core instances with 4 cores and 8 GB of memory.
- Use the default settings for the remaining options.
- Basic Configuration
- Enter the name of the cluster.
- Make sure that the running log feature is enabled, and specify a path to store the job log. Use an OSS bucket in the same region as the cluster. If no such bucket exists, create one.
- Enter the password that is used to log on to the cluster.
- Click OK to create the cluster.
- Create a job
- Click the Data Platform tab to go to the project list page. Click New Project in the upper-right corner.
- In the New Project dialog box, enter the project name and description, and then click Create.
- Click Design Workflow to the right of the specified project to go to the Edit Jobs page.
- On the left side of the Edit Jobs page, right-click the folder in which you want to create the job and select New Job.
- Enter the job name and description.
- Select Spark as the job type.
- Click OK.
Note: You can also right-click a folder to create a subfolder, rename the folder, or delete the folder.
- Enter parameters as follows:
--class org.apache.spark.examples.SparkPi --master yarn-client --driver-memory 512m --num-executors 1 --executor-memory 1g --executor-cores 2 /usr/lib/spark-current/examples/jars/spark-examples_2.11-2.1.1.jar 10
Note: The name of the JAR file /usr/lib/spark-current/examples/jars/spark-examples_2.11-2.1.1.jar depends on the version of Spark in the cluster. For example, if the version of Spark is 2.1.1, the JAR file is named spark-examples_2.11-2.1.1.jar. If the cluster runs a different version, such as 2.2.0, the file name changes accordingly.
- Click Run.
- View the job log entries and confirm the results.
After the job has run, click the Log tab at the bottom of the page to view the running log. Click View Details to go to the details page, where you can view information such as the job submission log and the YARN container log.
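To help interpret the result in the log, the following is a minimal local sketch in Python of the kind of computation the SparkPi example performs: a Monte Carlo estimate that samples random points in a square and counts how many fall inside the unit circle. It is not the code that runs on the cluster. It assumes that the trailing argument `10` in the job parameters is the number of slices (partitions) and that each slice contributes 100,000 samples, as in the upstream Apache Spark example. The names `JOB_PARAMS`, `SAMPLES_PER_SLICE`, `slices_from_params`, and `estimate_pi` are hypothetical and used only for illustration.

```python
import random
import shlex

# Parameter string copied from the job definition in this section.
JOB_PARAMS = (
    "--class org.apache.spark.examples.SparkPi --master yarn-client "
    "--driver-memory 512m --num-executors 1 --executor-memory 1g "
    "--executor-cores 2 "
    "/usr/lib/spark-current/examples/jars/spark-examples_2.11-2.1.1.jar 10"
)

# Assumption: mirrors the per-slice sample count of the upstream SparkPi example.
SAMPLES_PER_SLICE = 100_000


def slices_from_params(params: str) -> int:
    """Return the trailing application argument, read as the number of slices."""
    return int(shlex.split(params)[-1])


def estimate_pi(slices: int, seed: int = 0) -> float:
    """Estimate Pi from the fraction of random points inside the unit circle."""
    rng = random.Random(seed)  # fixed seed so repeated local runs agree
    n = SAMPLES_PER_SLICE * slices
    inside = 0
    for _ in range(n):
        # Draw a point uniformly from the square [-1, 1] x [-1, 1].
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    # Area of circle / area of square = Pi / 4.
    return 4.0 * inside / n


if __name__ == "__main__":
    slices = slices_from_params(JOB_PARAMS)
    print(f"Pi is roughly {estimate_pi(slices)}")
```

Because the estimate is random, the value reported by the cluster job will differ slightly from run to run and from this local sketch. The upstream Spark example prints a line that begins with "Pi is roughly", so searching the job log for that phrase is a quick way to confirm that the job succeeded.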