E-MapReduce:Run Spark jobs on ARM-based nodes

Last Updated: May 26, 2023

By default, EMR on ACK clusters run Spark jobs on nodes that use the x86 architecture. You can also run Spark jobs on elastic container instances that use the ARM architecture. This topic describes how to run Spark jobs on ARM-based nodes.

Prerequisites

A Spark cluster is created in the EMR on ACK console.

Procedure

  1. Add virtual nodes that are required by Elastic Container Instance to an ACK cluster. For more information, see Method 2: Add ARM-based virtual nodes in the Schedule workloads to ARM-based nodes topic.
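
    After the virtual nodes are added, you can optionally confirm that they carry the ARM architecture label that the later steps select on. The following command is a minimal sketch that assumes kubectl is configured to access your ACK cluster:

    # List the nodes that are labeled with the arm64 architecture.
    kubectl get nodes -l kubernetes.io/arch=arm64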

  2. Submit Spark jobs in an EMR on ACK cluster. For more information, see Submit a Spark job.

    Method 1: Submit a Spark job by using a CRD

    When you submit a job by using a Custom Resource Definition (CRD), configure the following parameters:

    • image: In this example, registry-vpc.cn-hangzhou.aliyuncs.com/emr/spark-py:emr-3.3.1-1.1.7-arm is used. Replace cn-hangzhou with the ID of the region that you use.

    • annotations: Add alibabacloud.com/burst-resource: "eci_only" to the annotations parameter.

    • nodeSelector: Add kubernetes.io/arch: arm64 to the nodeSelector parameter.

    Sample code:

    apiVersion: "sparkoperator.k8s.io/v1beta2"
    kind: SparkApplication
    metadata:
      name: spark-pi-eci
    spec:
      type: Scala
      sparkVersion: 3.3.1
      mainClass: org.apache.spark.examples.SparkPi
      # The specific image that is used by ARM-based nodes. 
      image: registry-vpc.cn-hangzhou.aliyuncs.com/emr/spark-py:emr-3.3.1-1.1.7-arm
      mainApplicationFile: "local:///opt/spark/examples/spark-examples.jar"
      arguments:
        - "100000"
      driver:
        cores: 2
        coreLimit: 2000m
        memory: 4g
        # Configure the annotations parameter so that the driver runs Spark jobs by using Elastic Container Instance. 
        annotations:
          alibabacloud.com/burst-resource: "eci_only"
        # Configure the nodeSelector parameter to specify ARM-based nodes. 
        nodeSelector:
          kubernetes.io/arch: arm64
      executor:
        cores: 4
        coreLimit: 4000m
        memory: 8g
        instances: 10
        # Configure the annotations parameter to allow all executors to run Spark jobs by using Elastic Container Instance. 
        annotations:
          alibabacloud.com/burst-resource: "eci_only"
        # Configure the nodeSelector parameter to specify ARM-based nodes. 
        nodeSelector:
          kubernetes.io/arch: arm64
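
    After you save the manifest, submit it to the cluster and check where the pods are scheduled. The following commands are a minimal sketch that assumes the manifest is saved as spark-pi-eci.yaml and that the job runs in the default namespace:

    # Submit the SparkApplication resource.
    kubectl apply -f spark-pi-eci.yaml
    # Verify that the driver and executor pods are scheduled to ARM-based virtual nodes.
    kubectl get pods -o wide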

    Method 2: Run Spark jobs based on a Spark configuration file

    You can configure the image, annotations, and node selectors in a Spark configuration file to run Spark jobs on ARM-based nodes. The values are the same as the values that you specify in Method 1.
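
    The following snippet shows what the configuration items described in this method look like in spark-defaults.conf. This is a minimal sketch; replace cn-hangzhou with the ID of the region that you use:

    spark.kubernetes.container.image                                       registry-vpc.cn-hangzhou.aliyuncs.com/emr/spark-py:emr-3.3.1-1.1.7-arm
    spark.kubernetes.driver.annotation.alibabacloud.com/burst-resource    eci_only
    spark.kubernetes.driver.node.selector.kubernetes.io/arch              arm64
    spark.kubernetes.executor.annotation.alibabacloud.com/burst-resource  eci_only
    spark.kubernetes.executor.node.selector.kubernetes.io/arch            arm64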

    1. Go to the spark-defaults.conf tab.

      1. Log on to the EMR on ACK console.

      2. On the EMR on ACK page, find the cluster that you want to manage and click Configure in the Actions column.

      3. On the Configure tab, click the spark-defaults.conf tab.

    2. Enable Elastic Container Instance for the Spark cluster.

      1. On the spark-defaults.conf tab, click Add Configuration Item.

      2. In the Add Configuration Item dialog box, add the configuration items that are described in the following table.

        Configuration item: spark.kubernetes.container.image
        Description: The Spark image.
        Value: registry-vpc.cn-hangzhou.aliyuncs.com/emr/spark-py:emr-3.3.1-1.1.7-arm
        Note: You must replace cn-hangzhou with the ID of the region that you use.

        Configuration item: spark.kubernetes.driver.annotation.alibabacloud.com/burst-resource
        Description: Specifies whether the Spark driver uses Elastic Container Instance to run Spark jobs.
        Value: eci_only

        Configuration item: spark.kubernetes.driver.node.selector.kubernetes.io/arch
        Description: The node selector of the Spark driver.
        Value: arm64

        Configuration item: spark.kubernetes.executor.annotation.alibabacloud.com/burst-resource
        Description: Specifies whether the Spark executor uses Elastic Container Instance to run Spark jobs.
        Value: eci_only

        Configuration item: spark.kubernetes.executor.node.selector.kubernetes.io/arch
        Description: The node selector of the Spark executor.
        Value: arm64

      3. Click OK.

      4. In the dialog box that appears, configure the Execution Reason parameter and click Save.

    3. Deploy the configurations.

      1. In the lower part of the Configure tab, click Deploy Client Configuration.

      2. In the dialog box that appears, configure the Execution Reason parameter and click OK.