In this tutorial, you will learn what roles clusters, jobs, and execution plans play in E-MapReduce and how they are used. You will also create a Spark Pi job and run it in the cluster. Finally, you can view the approximate value of Pi calculated by the job on the console page.
- Create a cluster.
- At the top of the EMR product console, click Cluster Management, and then click Create Cluster in the upper-right corner.
- Software configurations.
- Use the latest EMR product version, such as EMR-3.4.1.
- Use the default software configuration.
- Hardware configuration.
- Select Pay-as-You-Go.
- If there is no security group, click New and enter the security group name.
- Select 4 cores and 8 GB of memory for the master node.
- Select 4 cores and 8 GB of memory for the core node (one instance).
- Keep the default values for the other settings.
- Basic configurations.
- Enter the name of the cluster.
- Select the log path for saving job logs, and make sure that the logging feature is on. Create an OSS bucket in the region of the cluster.
- Enter the password.
- Create a cluster.
- Create a job.
- Click the Data Platform tab at the top to enter the Project List page.
- In the Operation column of the specified project, click Design Workflow.
- On the left side of the Job Editing page, right-click the folder in which you want to create the job, and select New Job.
- In the New Job dialog box, enter the job name and job description, and select Spark as the job type.
Once the job type is selected, it cannot be modified.
- Click OK.
Note: You can also create a subfolder, rename a folder, or delete a folder by right-clicking the folder.
- Enter parameters in the Content box as follows.
--class org.apache.spark.examples.SparkPi --master yarn-client --driver-memory 512m --num-executors 1 --executor-memory 1g --executor-cores 2 /usr/lib/spark-current/examples/jars/spark-examples_2.11-2.1.1.jar 10
Notice: The name of the JAR file in /usr/lib/spark-current/examples/jars/spark-examples_2.11-2.1.1.jar depends on the Spark version in the cluster. For example, if the Spark version is 2.1.1, the file name is spark-examples_2.11-2.1.1.jar; if the Spark version is 2.2.0, the file name is
- Click Run.
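The parameter string entered in the Content box can also be assembled programmatically, which makes its dependency on the Spark version explicit. The following is a minimal Python sketch and not part of E-MapReduce. The helper names `spark_examples_jar` and `spark_pi_args` are hypothetical, and the sketch assumes that the Scala version in the file name stays 2.11 and that the JAR file sits in the directory shown in the parameters above. Check the actual file name on your cluster before relying on the result.

```python
# Hypothetical helpers (not an E-MapReduce API) that rebuild the job
# parameters from this tutorial for a given Spark version.

def spark_examples_jar(spark_version, scala_version="2.11",
                       jar_dir="/usr/lib/spark-current/examples/jars"):
    """Return the path of the Spark examples JAR.

    Assumption: the file follows the pattern
    spark-examples_<scala_version>-<spark_version>.jar, as in the tutorial.
    """
    return f"{jar_dir}/spark-examples_{scala_version}-{spark_version}.jar"


def spark_pi_args(spark_version, slices=10):
    """Build the content of the Spark Pi job as a single parameter string."""
    parts = [
        "--class org.apache.spark.examples.SparkPi",
        "--master yarn-client",
        "--driver-memory 512m",
        "--num-executors 1",
        "--executor-memory 1g",
        "--executor-cores 2",
        spark_examples_jar(spark_version),
        str(slices),  # trailing argument passed to SparkPi itself
    ]
    return " ".join(parts)


if __name__ == "__main__":
    # Reproduces the parameter string used in this tutorial (Spark 2.1.1).
    print(spark_pi_args("2.1.1"))
```

Paste the printed string into the Content box. Only the `spark_version` argument needs to change when the cluster runs a different Spark release, provided the naming assumption holds.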
- View job logs and confirm the results.
After the job runs, you can view its running log on the Logs tab at the bottom of the page. Click Log Details to go to the detailed log page of the job, where you can see information such as the job's submission log and the YARN container logs.
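To make sense of the value that appears in the log, it helps to know roughly how a Spark Pi job arrives at it. The sketch below is a plain Python approximation of the Monte Carlo idea, not the Spark code itself: it throws random points into the square [-1, 1] x [-1, 1] and counts how many land inside the unit circle. It assumes that the trailing 10 in the job parameters is the number of slices and that each slice contributes 100,000 points. The function `estimate_pi` and its parameters are hypothetical names chosen for illustration.

```python
import random


def estimate_pi(slices=10, samples_per_slice=100_000, seed=None):
    """Estimate Pi by sampling random points in the square [-1, 1] x [-1, 1].

    The fraction of points that fall inside the unit circle approaches
    Pi / 4, so multiplying that fraction by 4 gives an estimate of Pi.
    Assumption: the sample count per slice mirrors the Spark example.
    """
    rng = random.Random(seed)
    total = slices * samples_per_slice
    inside = 0
    for _ in range(total):
        x = rng.random() * 2 - 1
        y = rng.random() * 2 - 1
        if x * x + y * y <= 1:
            inside += 1
    return 4.0 * inside / total


if __name__ == "__main__":
    print("Pi is roughly", estimate_pi())
```

Because the points are random, the result differs slightly from run to run, which is why the job reports only an approximate value. A larger number of slices means more samples and usually a closer estimate.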