This topic describes how to configure a Spark job.

Prerequisites

A project is created. For more information, see Manage projects.

Procedure

  1. Log on to the Alibaba Cloud E-MapReduce console with an Alibaba Cloud account.
  2. Click the Data Platform tab.
  3. In the Projects section, click Edit Job in the row of a project.
  4. In the left-side navigation pane, right-click the required folder and choose Create Job from the shortcut menu.
    Note You can also right-click a folder to create a subfolder, or to rename or delete the folder.
  5. In the dialog box that appears, set the Name and Description parameters, and select Spark from the Job Type drop-down list.
    This option indicates that a Spark job will be created. You can use the following command syntax to submit a Spark job:
    spark-submit [options] --class [MainClass] xxx.jar args
  6. Click OK.
  7. In the Content field, specify the command-line arguments required to submit the job.
    Enter only the arguments that follow spark-submit. Do not include spark-submit itself.

    The following examples demonstrate how to specify the arguments required to submit Spark and Python Spark jobs.

    • Create a Spark job.

      Create a Spark WordCount job. A minimal sketch of the word count class that this example references is provided after these examples.

      • Name: Wordcount
      • Job Type: Spark
      • Arguments:
        • If you submit the job from the command line, enter the following complete command:
          spark-submit --master yarn-client --driver-memory 7G --executor-memory 5G --executor-cores 1 --num-executors 32 --class com.aliyun.emr.checklist.benchmark.SparkWordCount emr-checklist_2.10-0.1.0.jar oss://emr/checklist/data/wc oss://emr/checklist/data/wc-counts 32
        • In the Content field, enter only the following arguments:
          --master yarn-client --driver-memory 7G --executor-memory 5G --executor-cores 1 --num-executors 32 --class com.aliyun.emr.checklist.benchmark.SparkWordCount ossref://emr/checklist/jars/emr-checklist_2.10-0.1.0.jar oss://emr/checklist/data/wc oss://emr/checklist/data/wc-counts 32
          Notice If the JAR file of a job is stored in OSS, you can reference the file by using an ossref path, such as ossref://emr/checklist/jars/emr-checklist_2.10-0.1.0.jar. Click Enter an OSS path in the lower part of the page. In the dialog box that appears, set File Prefix to OSSREF and specify the file in File Path. The system automatically completes the OSS path of the file.
    • Create a Python Spark job.

      In addition to Scala and Java Spark jobs, you can create Python Spark jobs in E-MapReduce. In the following example, a Spark Kmeans job that runs a Python script is created.

      • Name: Python-Kmeans
      • Job Type: Spark
      • Arguments:
        --master yarn-client --driver-memory 7g --num-executors 10 --executor-memory 5g --executor-cores 1 ossref://emr/checklist/python/kmeans.py oss://emr/checklist/data/kddb 5 32
      • You can reference Python script resources by using the ossref protocol.
      • You cannot install Python toolkits online.
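
    For reference, the JAR file in the preceding Spark example contains the com.aliyun.emr.checklist.benchmark.SparkWordCount class, whose source code is not included in this topic. The following is a minimal sketch of what such a class might look like, assuming the classic Spark RDD API and the three positional arguments (input path, output path, and number of partitions) that the sample command passes:

      package com.aliyun.emr.checklist.benchmark

      import org.apache.spark.{SparkConf, SparkContext}

      object SparkWordCount {
        def main(args: Array[String]): Unit = {
          // Expected arguments, matching the sample command:
          // <inputPath> <outputPath> <numPartitions>
          val Array(inputPath, outputPath, numPartitions) = args
          val conf = new SparkConf().setAppName("SparkWordCount")
          val sc = new SparkContext(conf)
          try {
            sc.textFile(inputPath, numPartitions.toInt) // read input, e.g. from OSS
              .flatMap(_.split("\\s+"))                 // split each line into words
              .map(word => (word, 1))                   // pair each word with a count of 1
              .reduceByKey(_ + _)                       // sum the counts for each word
              .saveAsTextFile(outputPath)               // write the results, e.g. to OSS
          } finally {
            sc.stop()
          }
        }
      }

    The actual class in emr-checklist_2.10-0.1.0.jar may differ. This sketch only illustrates the shape of a Spark job that matches the submitted arguments.
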
  8. Click Save.