
Container Service for Kubernetes: Overview of Spark on ACK

Last Updated: Mar 26, 2026

Spark on Container Service for Kubernetes (ACK) lets you build an efficient, flexible, and scalable big data processing platform on Kubernetes without managing the underlying infrastructure. It extends the open-source Spark Operator with ACK-native capabilities: deep integration with Alibaba Cloud storage, observability, and elastic computing resources.

How it works

When you submit a Spark job through Spark Operator, the following happens:

  1. Spark Operator receives the SparkApplication resource and creates a driver pod in your target namespace.

  2. The driver pod creates executor pods, connects to them, and executes application code.

  3. When the job completes, executor pods are cleaned up.

  4. Spark Operator manages the entire lifecycle — configuration, submission, and retry — so you don't need to interact with spark-submit directly.

Key features

Simplified development and operations

  • Portability: Package Spark applications and their dependencies into container images for easy migration between Kubernetes clusters.

  • Observability: Monitor job status through Spark History Server. Collect and analyze logs with Simple Log Service and metrics with Managed Service for Prometheus.

  • Workflow orchestration: Use Apache Airflow or Argo Workflows to manage Spark jobs, automate data pipeline scheduling, and ensure consistent deployments across environments.

  • Multi-version support: Run multiple versions of Spark jobs concurrently in a single ACK cluster.

Job scheduling and resource management

  • Job queue management: Integrated with ack-kube-queue for flexible job queue and resource quota management.

  • Multiple scheduling strategies: Supports Gang Scheduling and Capacity Scheduling through the ACK scheduler.

  • Multi-architecture scheduling: Supports hybrid use of x86 and Arm Elastic Compute Service (ECS) resources.

  • Multi-cluster scheduling: Use ACK One to distribute Spark jobs across multiple clusters.

  • Elastic computing: Combines node autoscaling and instant elasticity with Elastic Container Instance (ECI) and Container Compute Service (ACS) resources for on-demand scaling without maintaining ECS instances.

  • Workload colocation: Integrated with ack-koordinator to colocate multiple workload types and improve cluster resource utilization.
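As a hedged sketch of one of these strategies: with the ACK scheduler, gang scheduling is typically requested by attaching coscheduling labels to the job's pods so the driver and executors are scheduled all-or-nothing. The label keys below follow the common pod-group convention and are an assumption; verify them against the ACK scheduler documentation for your cluster version.

```yaml
# Hypothetical SparkApplication fragment: executor labels requesting
# gang scheduling (assumed pod-group label keys).
executor:
  instances: 4
  labels:
    pod-group.scheduling.sigs.k8s.io/name: spark-pagerank   # Pod group name (assumed key)
    pod-group.scheduling.sigs.k8s.io/min-available: "4"     # Schedule only when all 4 executors fit
```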

Performance and stability

  • Shuffle performance: Use Apache Celeborn as the Remote Shuffle Service (RSS) to achieve storage-compute separation and reduce out-of-memory (OOM) errors and fetch failures.

  • Data access acceleration: Use Fluid's distributed cache to speed up data access for Spark jobs reading from OSS or remote storage.

Architecture

Component | Role
Client | Submit jobs via kubectl or Arena
Workflow | Orchestrate jobs via Apache Airflow or Argo Workflows
Spark Operator | Automate job lifecycle management (configuration, submission, retry)
Observability | Monitor job status and collect logs and metrics via Spark History Server, Simple Log Service, and Managed Service for Prometheus
Remote Shuffle Service (RSS) | Improve shuffle performance and stability via Apache Celeborn
Cache | Accelerate data access via Fluid
Cloud infrastructure | ECS instances, elastic container instances, ACS clusters, Object Storage Service (OSS), File Storage NAS (NAS), disks, elastic network interfaces (ENIs), virtual private clouds (VPCs), and Server Load Balancer (SLB) instances

Billing

Installing Spark-related ACK components (ack-spark-operator, ack-spark-history-server, and others) is free. Standard ACK cluster fees — cluster management fees and associated cloud resource fees — apply. For details, see Billing overview.

Additional fees from other cloud products may apply. For example, Simple Log Service charges for log collection, and OSS or NAS charges apply for data read and write operations by Spark jobs.

Getting started

Running Spark jobs on ACK follows a layered setup: start with the basics, add observability, then tune for performance.


Prerequisites

Before you begin, make sure you have:

  • A running ACK cluster with kubectl access configured

  • Sufficient permissions to create and manage pods in your cluster. Run the following command to verify:

    kubectl auth can-i create pods
  • A dedicated namespace for Spark jobs (this guide uses spark):

    kubectl create namespace spark
  • A service account for Spark driver pods. The driver pod requires permissions to create, list, and delete executor pods and services. Ensure the service account (for example, spark-operator-spark) has the appropriate RBAC permissions in your job namespace before submitting jobs.
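If you need to create that service account and its permissions yourself, the following is a minimal sketch. The role and binding names are illustrative, and the ack-spark-operator component can also create these objects for you; adjust the rules to your operator version.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-operator-spark
  namespace: spark
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-driver-role          # Illustrative name
  namespace: spark
rules:
  # The driver creates and cleans up executor pods plus its own
  # services and configmaps in the job namespace.
  - apiGroups: [""]
    resources: ["pods", "services", "configmaps", "persistentvolumeclaims"]
    verbs: ["create", "get", "list", "watch", "delete", "deletecollection"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-driver-rolebinding   # Illustrative name
  namespace: spark
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: spark-driver-role
subjects:
  - kind: ServiceAccount
    name: spark-operator-spark
    namespace: spark
```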

Basic usage

Step 1: Build a Spark container image

Use the open-source Spark image directly, or customize it to add dependencies such as OSS support or Celeborn RSS. The following Dockerfile adds the common dependencies used in this guide.


ARG SPARK_IMAGE=spark:3.5.4

FROM ${SPARK_IMAGE}

# Add dependencies for Hadoop Aliyun OSS support.
# Note the trailing slash on the destination: without it, ADD with a URL
# source writes the download to a file literally named "jars".
ADD --chown=spark:spark --chmod=644 https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aliyun/3.3.4/hadoop-aliyun-3.3.4.jar ${SPARK_HOME}/jars/
ADD --chown=spark:spark --chmod=644 https://repo1.maven.org/maven2/com/aliyun/oss/aliyun-sdk-oss/3.17.4/aliyun-sdk-oss-3.17.4.jar ${SPARK_HOME}/jars/
ADD --chown=spark:spark --chmod=644 https://repo1.maven.org/maven2/org/jdom/jdom2/2.0.6.1/jdom2-2.0.6.1.jar ${SPARK_HOME}/jars/

# Add dependency for log4j-layout-template-json
ADD --chown=spark:spark --chmod=644 https://repo1.maven.org/maven2/org/apache/logging/log4j/log4j-layout-template-json/2.24.1/log4j-layout-template-json-2.24.1.jar ${SPARK_HOME}/jars/

# Add dependency for Celeborn
ADD --chown=spark:spark --chmod=644 https://repo1.maven.org/maven2/org/apache/celeborn/celeborn-client-spark-3-shaded_2.12/0.5.3/celeborn-client-spark-3-shaded_2.12-0.5.3.jar ${SPARK_HOME}/jars/

Build the image and push it to your image repository, then reference it in your SparkApplication resources.

Step 2: Deploy Spark Operator and run your first job

Deploy the ack-spark-operator component and set spark.jobNamespaces=["spark"] so it only watches jobs in the spark namespace.
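As a sketch, if you deploy the component with Helm-style values, the fragment for the setting above (key path taken from the spark.jobNamespaces setting named in this step) would look like:

```yaml
# values fragment for ack-spark-operator:
# restrict the operator to the spark namespace so it ignores
# SparkApplication resources created elsewhere.
spark:
  jobNamespaces:
    - spark
```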

The following is a minimal SparkApplication that runs the SparkPi example — enough to verify that Spark Operator is working:


apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark  # Must be in the namespace list specified by spark.jobNamespaces
spec:
  type: Scala
  mode: cluster
  # Replace <SPARK_IMAGE> with your own Spark container image
  image: <SPARK_IMAGE>
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.4.jar
  arguments:
    - "5000"
  sparkVersion: 3.5.4
  driver:
    cores: 1
    coreLimit: 1200m
    memory: 512m
    template:
      spec:
        containers:
          - name: spark-kubernetes-driver
        serviceAccount: spark-operator-spark
  executor:
    instances: 1
    cores: 1
    coreLimit: 1200m
    memory: 512m
    template:
      spec:
        containers:
          - name: spark-kubernetes-executor
  restartPolicy:
    type: Never
Setting restartPolicy.type to Never is appropriate for batch jobs that should not retry on failure. For production pipelines that require automatic retries, set the type to OnFailure and configure onFailureRetries and onFailureRetryInterval.

For more information, see Use Spark Operator to run Spark jobs.

Step 3: Read and write OSS data

Spark jobs can access OSS using Hadoop Aliyun SDK, Hadoop AWS SDK, or JindoSDK. Include the corresponding dependencies in your container image and configure the Hadoop parameters in the job.
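The sample below reads credentials from a Secret named spark-oss-secret. A sketch of that Secret follows; the variable names are those read by the EnvironmentVariableCredentialsProvider configured in the job, and the key values are placeholders.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: spark-oss-secret
  namespace: spark
type: Opaque
stringData:
  # Environment variables expected by EnvironmentVariableCredentialsProvider
  OSS_ACCESS_KEY_ID: <YOUR_ACCESS_KEY_ID>
  OSS_ACCESS_KEY_SECRET: <YOUR_ACCESS_KEY_SECRET>
```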


This example runs SparkPageRank and reads input data from OSS. Upload your test dataset to OSS first — see Read and write OSS data in Spark jobs for instructions.

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pagerank
  namespace: spark
spec:
  type: Scala
  mode: cluster
  # Replace <SPARK_IMAGE> with your own Spark image
  image: <SPARK_IMAGE>
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPageRank
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.4.jar
  arguments:
    # Replace <OSS_BUCKET> with your OSS bucket name
    - oss://<OSS_BUCKET>/data/pagerank_dataset.txt
    # Number of iterations
    - "10"
  sparkVersion: 3.5.4
  hadoopConf:
    fs.oss.impl: org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem
    # Replace <OSS_ENDPOINT> with the OSS endpoint, for example oss-cn-beijing-internal.aliyuncs.com
    fs.oss.endpoint: <OSS_ENDPOINT>
    fs.oss.credentials.provider: com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider
  driver:
    cores: 1
    coreLimit: "1"
    memory: 4g
    template:
      spec:
        containers:
          - name: spark-kubernetes-driver
            envFrom:
              # Read OSS credentials from a Kubernetes Secret
              - secretRef:
                  name: spark-oss-secret
        serviceAccount: spark-operator-spark
  executor:
    instances: 2
    cores: 1
    coreLimit: "1"
    memory: 4g
    template:
      spec:
        containers:
          - name: spark-kubernetes-executor
            envFrom:
              - secretRef:
                  name: spark-oss-secret
  restartPolicy:
    type: Never

For more information, see Read and write OSS data in Spark jobs.

Observability

Deploy Spark History Server

Deploy ack-spark-history-server in the spark namespace. It reads Spark event logs from a configured storage backend (PVC, OSS/OSS-HDFS, or HDFS) and exposes them through a web UI.
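The examples in this section mount a PersistentVolumeClaim named nas-pvc. As a sketch under stated assumptions, a statically provisioned NAS volume using the ACK CSI driver might look like the following; the mount target and capacity are placeholders:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nas-pv
spec:
  capacity:
    storage: 100Gi
  accessModes: ["ReadWriteMany"]
  csi:
    driver: nasplugin.csi.alibabacloud.com
    volumeHandle: nas-pv
    volumeAttributes:
      # Placeholder NAS mount target address
      server: <NAS_MOUNT_TARGET>.nas.aliyuncs.com
      path: /
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nas-pvc
  namespace: spark
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: ""      # Bind to the statically provisioned PV above
  resources:
    requests:
      storage: 100Gi
```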

The following example configures Spark History Server to read event logs from a NAS file system at /spark/event-logs:


# Spark configuration
sparkConf:
  spark.history.fs.logDirectory: file:///mnt/nas/spark/event-logs

# Environment variables
env:
  - name: SPARK_DAEMON_MEMORY
    value: 7g

# Data volume
volumes:
  - name: nas
    persistentVolumeClaim:
      claimName: nas-pvc

# Data volume mount
volumeMounts:
  - name: nas
    subPath: spark/event-logs
    mountPath: /mnt/nas/spark/event-logs

# Adjust resource size based on the number and scale of Spark jobs
resources:
  requests:
    cpu: 2
    memory: 8Gi
  limits:
    cpu: 2
    memory: 8Gi

Mount the same NAS file system in your Spark jobs and configure spark.eventLog.dir to write event logs to the same path. The following example shows a complete job with event logging enabled:


apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark
spec:
  type: Scala
  mode: cluster
  # Replace <SPARK_IMAGE> with your Spark image
  image: <SPARK_IMAGE>
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.4.jar
  arguments:
    - "5000"
  sparkVersion: 3.5.4
  sparkConf:
    spark.eventLog.enabled: "true"
    spark.eventLog.dir: file:///mnt/nas/spark/event-logs
  driver:
    cores: 1
    coreLimit: 1200m
    memory: 512m
    template:
      spec:
        containers:
          - name: spark-kubernetes-driver
            volumeMounts:
              - name: nas
                subPath: spark/event-logs
                mountPath: /mnt/nas/spark/event-logs
        volumes:
          - name: nas
            persistentVolumeClaim:
              claimName: nas-pvc
        serviceAccount: spark-operator-spark
  executor:
    instances: 1
    cores: 1
    coreLimit: 1200m
    memory: 512m
    template:
      spec:
        containers:
          - name: spark-kubernetes-executor
  restartPolicy:
    type: Never

For more information, see Use Spark History Server to view information about Spark jobs.

Collect Spark logs with Simple Log Service

When running many jobs in a cluster, use Simple Log Service to centrally collect stdout and stderr logs from all Spark containers for querying and analysis.


This example configures Simple Log Service to collect logs from /opt/spark/logs/*.log in Spark containers.

apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark
spec:
  type: Scala
  mode: cluster
  # Replace <SPARK_IMAGE> with the Spark image built in Step 1
  image: <SPARK_IMAGE>
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.4.jar
  arguments:
    - "5000"
  sparkVersion: 3.5.4
  # Read log4j2.properties from the specified ConfigMap
  sparkConfigMap: spark-log-conf
  sparkConf:
    spark.eventLog.enabled: "true"
    spark.eventLog.dir: file:///mnt/nas/spark/event-logs
  driver:
    cores: 1
    coreLimit: 1200m
    memory: 512m
    template:
      spec:
        containers:
          - name: spark-kubernetes-driver
            volumeMounts:
              - name: nas
                subPath: spark/event-logs
                mountPath: /mnt/nas/spark/event-logs
        serviceAccount: spark-operator-spark
        volumes:
          - name: nas
            persistentVolumeClaim:
              claimName: nas-pvc
  executor:
    instances: 1
    cores: 1
    coreLimit: 1200m
    memory: 512m
    template:
      spec:
        containers:
          - name: spark-kubernetes-executor
  restartPolicy:
    type: Never

For more information, see Use Simple Log Service to collect the logs of Spark jobs.
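One way to wire up the collection side is an AliyunLogConfig resource. The following is a sketch assuming the Logtail components of Simple Log Service are installed in the cluster; the logstore name is a placeholder, and the field layout should be checked against the Simple Log Service CRD documentation for your component version.

```yaml
apiVersion: log.alibabacloud.com/v1alpha1
kind: AliyunLogConfig
metadata:
  name: spark-driver-log
  namespace: spark
spec:
  # Placeholder logstore name
  logstore: spark-logs
  logtailConfig:
    inputType: file
    configName: spark-driver-log
    inputDetail:
      logType: common_reg_log
      # Collect the rolling log files written inside Spark containers
      logPath: /opt/spark/logs
      filePattern: "*.log"
      # Read files from containers rather than the host path
      dockerFile: true
```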

Performance optimization

Improve shuffle performance with RSS

Shuffle operations involve significant disk I/O, data serialization, and network I/O — common sources of OOM errors and fetch failures in large-scale jobs. Configure Apache Celeborn as the Remote Shuffle Service to achieve storage-compute separation and improve shuffle stability.

Deploy the ack-celeborn component first, then reference it in your job configuration. All examples use spark.shuffle.manager: org.apache.spark.shuffle.celeborn.SparkShuffleManager and spark.celeborn.master.endpoints pointing to the Celeborn master pods.


apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pagerank
  namespace: spark
spec:
  type: Scala
  mode: cluster
  # Replace <SPARK_IMAGE> with your Spark image
  image: <SPARK_IMAGE>
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPageRank
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.4.jar
  arguments:
    - oss://<OSS_BUCKET>/data/pagerank_dataset.txt
    - "10"
  sparkVersion: 3.5.4
  hadoopConf:
    fs.oss.impl: org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem
    fs.oss.endpoint: <OSS_ENDPOINT>
    fs.oss.credentials.provider: com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider
  sparkConfigMap: spark-log-conf
  sparkConf:
    spark.eventLog.enabled: "true"
    spark.eventLog.dir: file:///mnt/nas/spark/event-logs

    # Celeborn RSS configuration
    spark.shuffle.manager: org.apache.spark.shuffle.celeborn.SparkShuffleManager
    # KryoSerializer is required because Java serializer does not support relocation
    spark.serializer: org.apache.spark.serializer.KryoSerializer
    # Configure based on the number of Celeborn master replicas
    spark.celeborn.master.endpoints: celeborn-master-0.celeborn-master-svc.celeborn.svc.cluster.local,celeborn-master-1.celeborn-master-svc.celeborn.svc.cluster.local,celeborn-master-2.celeborn-master-svc.celeborn.svc.cluster.local
    spark.celeborn.client.spark.shuffle.writer: hash
    spark.celeborn.client.push.replicate.enabled: "false"
    spark.sql.adaptive.localShuffleReader.enabled: "false"
    spark.sql.adaptive.enabled: "true"
    spark.sql.adaptive.skewJoin.enabled: "true"
    spark.shuffle.sort.io.plugin.class: org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO
    spark.dynamicAllocation.shuffleTracking.enabled: "false"
    spark.executor.userClassPathFirst: "false"
  driver:
    cores: 1
    coreLimit: "1"
    memory: 4g
    template:
      spec:
        containers:
          - name: spark-kubernetes-driver
            envFrom:
              - secretRef:
                  name: spark-oss-secret
            volumeMounts:
              - name: nas
                subPath: spark/event-logs
                mountPath: /mnt/nas/spark/event-logs
        volumes:
          - name: nas
            persistentVolumeClaim:
              claimName: nas-pvc
        serviceAccount: spark-operator-spark
  executor:
    instances: 2
    cores: 1
    coreLimit: "1"
    memory: 4g
    template:
      spec:
        containers:
          - name: spark-kubernetes-executor
            envFrom:
              - secretRef:
                  name: spark-oss-secret
  restartPolicy:
    type: Never

For more information, see Use Celeborn as RSS in Spark jobs.

Define elastic resource scheduling priority

Use ECI-based pods with a ResourcePolicy to run Spark jobs on demand and pay only for actual resource usage. The ACK scheduler automatically assigns pods to ECS or ECI resources based on the configured strategy — no changes to the SparkApplication spec are required.


This example prioritizes ECS resources (up to 10 pods) and falls back to elastic container instances (up to 10 pods) when ECS capacity is insufficient:

apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: spark
  namespace: spark
spec:
  # Apply this strategy to pods launched by Spark Operator
  selector:
    sparkoperator.k8s.io/launched-by-spark-operator: "true"
  strategy: prefer
  units:
    # First: use ECS resources, up to 10 pods
    - resource: ecs
      max: 10
      podLabels:
        k8s.aliyun.com/resource-policy-wait-for-ecs-scaling: "true"
      nodeSelector:
        node.alibabacloud.com/instance-charge-type: PostPaid
    # Second: use ECI resources, up to 10 pods
    - resource: eci
      max: 10
  ignorePreviousPod: false
  ignoreTerminatingPod: true
  preemptPolicy: AfterAllUnits
  whenTryNextUnits:
    policy: TimeoutOrExceedMax
    # Wait up to 30 seconds for ECS autoscaling before falling back to ECI
    timeout: 30s

For more information, see Use elastic container instances to run Spark jobs.

Configure Dynamic Resource Allocation

Dynamic Resource Allocation (DRA) adjusts executor count based on workload size, preventing both resource starvation and waste. The following example configures DRA together with Celeborn RSS:


apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pagerank
  namespace: spark
spec:
  type: Scala
  mode: cluster
  # Replace <SPARK_IMAGE> with your Spark image
  image: <SPARK_IMAGE>
  imagePullPolicy: IfNotPresent
  mainClass: org.apache.spark.examples.SparkPageRank
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.5.4.jar
  arguments:
    - oss://<OSS_BUCKET>/data/pagerank_dataset.txt
    - "10"
  sparkVersion: 3.5.4
  hadoopConf:
    fs.oss.impl: org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem
    fs.oss.endpoint: <OSS_ENDPOINT>
    fs.oss.credentials.provider: com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider
  sparkConfigMap: spark-log-conf
  sparkConf:
    # ====================
    # Event log
    # ====================
    spark.eventLog.enabled: "true"
    spark.eventLog.dir: file:///mnt/nas/spark/event-logs

    # ====================
    # Celeborn
    # Ref: https://github.com/apache/celeborn/blob/main/README.md#spark-configuration
    # ====================
    # Shuffle manager class name changed in 0.3.0:
    # before 0.3.0: `org.apache.spark.shuffle.celeborn.RssShuffleManager`
    # since 0.3.0: `org.apache.spark.shuffle.celeborn.SparkShuffleManager`
    spark.shuffle.manager: org.apache.spark.shuffle.celeborn.SparkShuffleManager
    # Must use KryoSerializer because Java serializer does not support relocation
    spark.serializer: org.apache.spark.serializer.KryoSerializer
    # Configure based on the number of Celeborn master replicas
    spark.celeborn.master.endpoints: celeborn-master-0.celeborn-master-svc.celeborn.svc.cluster.local,celeborn-master-1.celeborn-master-svc.celeborn.svc.cluster.local,celeborn-master-2.celeborn-master-svc.celeborn.svc.cluster.local
    # options: hash, sort
    # Hash shuffle writer uses (partition count) * (celeborn.push.buffer.max.size) * (spark.executor.cores) memory.
    # Sort shuffle writer uses less memory — use it when partition count is large.
    spark.celeborn.client.spark.shuffle.writer: hash
    # Enable server-side data replication if you have more than one worker
    # If your Celeborn is using HDFS, set this to false
    spark.celeborn.client.push.replicate.enabled: "false"
    spark.sql.adaptive.localShuffleReader.enabled: "false"
    spark.sql.adaptive.enabled: "true"
    spark.sql.adaptive.skewJoin.enabled: "true"
    # Required for Spark >= 3.5.0 to support dynamic resource allocation with Celeborn
    spark.shuffle.sort.io.plugin.class: org.apache.spark.shuffle.celeborn.CelebornShuffleDataIO
    spark.executor.userClassPathFirst: "false"

    # ====================
    # Dynamic resource allocation
    # Ref: https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation
    # ====================
    spark.dynamicAllocation.enabled: "true"
    # Disable shuffle tracking when using Celeborn as RSS (Spark >= 3.4.0)
    spark.dynamicAllocation.shuffleTracking.enabled: "false"
    spark.dynamicAllocation.initialExecutors: "3"
    spark.dynamicAllocation.minExecutors: "0"
    spark.dynamicAllocation.maxExecutors: "10"
    # Release idle executors after 60 seconds
    spark.dynamicAllocation.executorIdleTimeout: 60s
    # Release executors that have cached data blocks after the specified timeout (default: infinity)
    # spark.dynamicAllocation.cachedExecutorIdleTimeout:
    # Request additional executors when scheduling backlog exceeds 1 second
    spark.dynamicAllocation.schedulerBacklogTimeout: 1s
    spark.dynamicAllocation.sustainedSchedulerBacklogTimeout: 1s
  driver:
    cores: 1
    coreLimit: "1"
    memory: 4g
    template:
      spec:
        containers:
          - name: spark-kubernetes-driver
            envFrom:
              - secretRef:
                  name: spark-oss-secret
            volumeMounts:
              - name: nas
                subPath: spark/event-logs
                mountPath: /mnt/nas/spark/event-logs
        volumes:
          - name: nas
            persistentVolumeClaim:
              claimName: nas-pvc
        serviceAccount: spark-operator-spark
  executor:
    cores: 1
    coreLimit: "1"
    memory: 4g
    template:
      spec:
        containers:
          - name: spark-kubernetes-executor
            envFrom:
              - secretRef:
                  name: spark-oss-secret
  restartPolicy:
    type: Never

For more information, see Configure dynamic resource allocation for Spark jobs.

Use Fluid to accelerate data access

If your data is in a remote data center or you're hitting data access bottlenecks, use Fluid's distributed cache to accelerate reads for Spark jobs.
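A sketch of the Fluid side follows, assuming the Fluid component with JindoRuntime support is installed; the bucket, endpoint, replica count, and cache sizing are placeholders to adapt to your data volume:

```yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: spark-data
  namespace: spark
spec:
  mounts:
    # Placeholder OSS bucket holding the input data
    - mountPoint: oss://<OSS_BUCKET>/data
      name: data
      options:
        fs.oss.endpoint: <OSS_ENDPOINT>
---
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: spark-data        # Must match the Dataset name to bind them
  namespace: spark
spec:
  replicas: 2
  tieredstore:
    levels:
      # Cache hot data in memory on the cache workers
      - mediumtype: MEM
        quota: 2Gi
```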

For more information, see Use Fluid to accelerate data access for Spark applications.
