You can run Spark jobs in the Container Service for Kubernetes (ACK) console and use Alluxio to accelerate data processing in a distributed manner. ACK provides Spark Operator to simplify the procedure of submitting Spark jobs, and Spark History Server to record the historical data of Spark jobs and facilitate troubleshooting. This topic describes how to set up an environment in the ACK console to run Spark jobs.
Background information
Create an ACK cluster
- When you set the instance type of worker nodes, select ecs.d1ne.6xlarge in the Big Data Network Performance Enhanced instance family and set the number of worker nodes to 20.
- Each worker node of the ecs.d1ne.6xlarge instance type has 12 HDDs. Each HDD has a storage capacity of 5 TB. Before you mount the HDDs, you must partition and format them (see the example commands after this list). For more information, see Partition and format a data disk larger than 2 TiB in size.
- After you partition and format the HDDs, mount them to the ACK cluster. You can run the df -h command to query the mount information of the HDDs. Figure 1 shows an example of the command output.
- The 12 file paths under the /mnt directory are used in the configuration file of Alluxio. ACK provides a simplified method to mount data disks when the cluster has a large number of nodes. For more information, see Use LVM to manage local storage.
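The following commands are a minimal sketch of partitioning, formatting, and mounting a single HDD. The device name /dev/vdb and the mount path /mnt/disk1 are assumptions; repeat the steps for each of the 12 disks with the device names and paths of your nodes.

# Assumption: /dev/vdb is one of the 12 HDDs; adjust for your environment.
# Create a GPT partition table (required for disks larger than 2 TiB) and one partition.
parted -s /dev/vdb mklabel gpt
parted -s /dev/vdb mkpart primary ext4 0% 100%
# Format the partition with the ext4 file system.
mkfs.ext4 /dev/vdb1
# Mount the partition under /mnt (hypothetical path /mnt/disk1).
mkdir -p /mnt/disk1
mount /dev/vdb1 /mnt/disk1
# Verify the mount information.
df -h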

Create an OSS bucket
You must create an Object Storage Service (OSS) bucket to store data, including the test data generated by TPC-DS, test results, and test logs. In this example, the name of the OSS bucket is cloudnativeai. For more information about how to create an OSS bucket, see Create buckets.
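If you prefer the command line, the following ossutil command is a minimal sketch of creating the bucket used in this example. It assumes that ossutil is already configured with your AccessKey pair and region endpoint.

# Create the OSS bucket that stores test data, results, and logs.
ossutil mb oss://cloudnativeai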
Install ack-spark-operator
You can install ack-spark-operator and use the component to simplify the procedure of submitting Spark jobs: after the component is installed, you submit a Spark job by creating a SparkApplication resource, as shown in the sample manifest after the following steps.
- Log on to the ACK console.
- In the left-side navigation pane of the ACK console, choose Marketplace > Marketplace.
- On the Marketplace page, click the App Catalog tab. Find and click ack-spark-operator.
- On the ack-spark-operator page, click Deploy.
- In the Deploy wizard, select a cluster and namespace, and then click Next.
- On the Parameters wizard page, set the parameters and click OK.
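The following SparkApplication manifest is a minimal sketch of how a job can be submitted after the operator is installed. The job name, image, jar path, Spark version, and service account are assumptions for illustration; replace them with values from your environment.

apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi                  # hypothetical job name
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "apache/spark:3.1.1"     # assumption: any image that contains Spark
  mainClass: org.apache.spark.examples.SparkPi
  # Assumption: the examples jar shipped inside the image.
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar"
  sparkVersion: "3.1.1"
  restartPolicy:
    type: Never
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark         # assumption: a service account that can manage executor pods
  executor:
    cores: 1
    instances: 2
    memory: "512m"

You can submit the job by running kubectl apply -f spark-pi.yaml and track its status by running kubectl get sparkapplication -n default.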
Install ack-spark-history-server
ack-spark-history-server records the logs and events of Spark jobs and provides a user interface to help you troubleshoot issues.
When you install ack-spark-history-server, you must specify the parameters related to the OSS bucket on the Parameters wizard page. The OSS bucket is used to store historical data of Spark jobs. For more information about how to install ack-spark-history-server, see Install ack-spark-operator. The following sample shows the OSS-related parameters:
oss:
  enableOSS: true
  # Specify your AccessKey ID.
  alibabaCloudAccessKeyId: ""
  # Specify your AccessKey secret.
  alibabaCloudAccessKeySecret: ""
  # The endpoint of the OSS bucket, such as oss-cn-beijing.aliyuncs.com.
  alibabaCloudOSSEndpoint: ""
  # The OSS file path, such as oss://bucket-name/path.
  eventsDir: "oss://cloudnativeai/spark/spark-events"
After the installation is complete, run the following command to query the Service through which you can access the web UI of ack-spark-history-server:
kubectl get service ack-spark-history-server -n {YOUR-NAMESPACE}
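For a Spark job to appear in Spark History Server, the job must write its event logs to the same OSS path. The following sparkConf snippet in a SparkApplication is a minimal sketch; spark.eventLog.enabled and spark.eventLog.dir are standard Spark properties, and the path matches the eventsDir value configured above.

sparkConf:
  # Write Spark event logs to the OSS path that ack-spark-history-server reads.
  "spark.eventLog.enabled": "true"
  "spark.eventLog.dir": "oss://cloudnativeai/spark/spark-events"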
Install Alluxio
You must run Helm commands to install Alluxio in the ACK cluster.
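The following commands and configuration are a minimal sketch of an Alluxio installation with Helm, based on the open source Alluxio chart. The chart repository URL, chart version, and the contents of config.yaml are assumptions; adapt them to the Alluxio release you deploy, and list all 12 /mnt paths in the tiered store.

# config.yaml: hedged sample values for the Alluxio chart.
properties:
  # Assumption: mount the OSS bucket as the root under file system.
  alluxio.master.mount.table.root.ufs: oss://cloudnativeai/
tieredstore:
  levels:
    - level: 0
      alias: HDD
      mediumtype: HDD
      type: hostPath
      # Only two of the 12 HDD paths are shown here.
      path: /mnt/disk1,/mnt/disk2
      quota: 5TB,5TB

# Add the chart repository (assumption: the URL and version from the open
# source Alluxio documentation) and install Alluxio with the configuration file.
helm repo add alluxio-charts https://alluxio-charts.storage.googleapis.com/openSource/2.3.0
helm install alluxio -f config.yaml alluxio-charts/alluxio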