Spark job configuration

Last Updated: Dec 28, 2017

  1. Log on to the Alibaba Cloud E-MapReduce console and open the Job List page.

  2. Click Create a job in the upper right corner to enter the job creation page.

  3. Enter the job name.

  4. Select the Spark job type to create a Spark job. On the E-MapReduce back end, Spark jobs are submitted with a command of the following form:

    spark-submit [options] --class [MainClass] xxx.jar args
  5. Fill in the Parameters option box with the command line parameters required to submit this Spark job. Enter only the parameters that follow “spark-submit”. The examples below show how to fill in the parameters for Spark and pyspark jobs.

    • Create a Spark job

      Create a Spark WordCount job:

      • Job name: Wordcount

      • Type: Select Spark

      • Parameters:

        • Command submitted to the command line:

          spark-submit --master yarn-client --driver-memory 7G --executor-memory 5G --executor-cores 1 --num-executors 32 --class com.aliyun.emr.checklist.benchmark.SparkWordCount emr-checklist_2.10-0.1.0.jar oss://emr/checklist/data/wc oss://emr/checklist/data/wc-counts 32
        • Enter only the following in the E-MapReduce job Parameters box:

          --master yarn-client --driver-memory 7G --executor-memory 5G --executor-cores 1 --num-executors 32 --class com.aliyun.emr.checklist.benchmark.SparkWordCount ossref://emr/checklist/jars/emr-checklist_2.10-0.1.0.jar oss://emr/checklist/data/wc oss://emr/checklist/data/wc-counts 32

        The job Jar package is saved in OSS and is referenced as ossref://emr/checklist/jars/emr-checklist_2.10-0.1.0.jar. Click Select OSS path to browse and select the Jar package from OSS; the system automatically completes its absolute path on OSS. Then switch the path's default “oss” protocol to the “ossref” protocol.

    • Create a pyspark job

      In addition to Scala and Java job types, E-MapReduce also supports pyspark jobs written in Python. Create a Spark Kmeans job from a Python script:

      • Job name: Python-Kmeans

      • Type: Spark

      • Parameters:

        --master yarn-client --driver-memory 7g --num-executors 10 --executor-memory 5g --executor-cores 1 --jars ossref://emr/checklist/jars/emr-core-0.1.0.jar ossref://emr/checklist/python/ oss://emr/checklist/data/kddb 5 32
      • Python script resources can also be referenced using the “ossref” protocol.

      • For pyspark jobs, installing Python packages online is not supported.

  6. Select the policy for failed operations.

  7. Click OK to complete the Spark job definition.
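Because the Parameters box holds only what follows “spark-submit”, the full command run on the back end is simply the box contents prepended with spark-submit. A minimal Python sketch of this composition, using the WordCount example above (build_submit_command is a hypothetical helper for illustration, not an E-MapReduce API):

```python
import shlex

def build_submit_command(parameters: str) -> list:
    """Prepend "spark-submit" to the contents of the Parameters box.

    shlex.split tokenizes the box contents the same way a shell would,
    including any quoted arguments.
    """
    return ["spark-submit"] + shlex.split(parameters)

# Parameters box contents from the Spark WordCount example above:
cmd = build_submit_command(
    "--master yarn-client --driver-memory 7G --executor-memory 5G "
    "--executor-cores 1 --num-executors 32 "
    "--class com.aliyun.emr.checklist.benchmark.SparkWordCount "
    "ossref://emr/checklist/jars/emr-checklist_2.10-0.1.0.jar "
    "oss://emr/checklist/data/wc oss://emr/checklist/data/wc-counts 32"
)
print(" ".join(cmd))
```

The first token of the resulting command is always "spark-submit", which is why it must not be repeated inside the Parameters box.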
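The protocol switch described in the Jar package example above, from the default “oss” protocol filled in by Select OSS path to the “ossref” protocol, is a simple prefix substitution. A minimal sketch (to_ossref is a hypothetical helper name, not part of E-MapReduce):

```python
def to_ossref(path: str) -> str:
    """Switch the default "oss://" protocol of an OSS path to "ossref://".

    Paths that already use "ossref://" (or any other scheme) are
    returned unchanged.
    """
    if path.startswith("oss://"):
        return "ossref://" + path[len("oss://"):]
    return path

print(to_ossref("oss://emr/checklist/jars/emr-checklist_2.10-0.1.0.jar"))
# ossref://emr/checklist/jars/emr-checklist_2.10-0.1.0.jar
```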
