This topic compares the performance of Spark SQL queries on 1 TB of data in a Container Service for Kubernetes (ACK) cluster with and without the Alluxio distributed cache.
The following table lists the ACK cluster configurations.
| Item | Configuration |
| --- | --- |
| Cluster type | Standard dedicated cluster |
| Elastic Compute Service (ECS) instance | |
| Number of worker nodes | 20 |
- Software versions
  - Apache Spark: 2.4.5
  - Alluxio: 2.3.0
- Spark configurations

  | Parameter | Value |
  | --- | --- |
  | spark.driver.cores | 5 |
  | spark.driver.memory | 20480 MB |
  | spark.executor.cores | 7 |
  | spark.executor.memory | 20480 MB |
  | spark.executor.instances | 20 |
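As a minimal sketch, assuming the job is launched with `spark-submit`, the Spark configurations listed above can be rendered as `--conf` arguments, and the total executor resources they request can be computed. `SPARK_CONF`, `to_submit_args`, and `executor_totals` are hypothetical helpers for illustration, not part of the original test setup. The totals cover only `spark.executor.memory` and exclude the memory overhead that Spark requests on top of it.

```python
# Spark settings from the configuration list; memory values are in MB.
SPARK_CONF = {
    "spark.driver.cores": 5,
    "spark.driver.memory": 20480,    # MB
    "spark.executor.cores": 7,
    "spark.executor.memory": 20480,  # MB
    "spark.executor.instances": 20,
}

# Keys whose values need an explicit size unit.
MEMORY_KEYS = {"spark.driver.memory", "spark.executor.memory"}


def to_submit_args(conf):
    """Render settings as spark-submit --conf arguments.

    Memory values get an 'm' suffix, which Spark reads as mebibytes.
    """
    args = []
    for key, value in conf.items():
        rendered = f"{value}m" if key in MEMORY_KEYS else str(value)
        args.extend(["--conf", f"{key}={rendered}"])
    return args


def executor_totals(conf):
    """Return (cores, memory in MB) requested by all executors, excluding overhead."""
    instances = conf["spark.executor.instances"]
    return (instances * conf["spark.executor.cores"],
            instances * conf["spark.executor.memory"])


if __name__ == "__main__":
    print(" ".join(to_submit_args(SPARK_CONF)))
    cores, memory_mb = executor_totals(SPARK_CONF)
    print(f"executors request {cores} cores and {memory_mb} MB of memory")
```

With these values, the 20 executors request 140 cores and 409600 MB (400 GiB) of executor memory in total.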
The following table lists the total time that each benchmark took to run 104 queries on 1 TB of data. The queries are run sequentially, one after another.
| Benchmark | Total time for 104 queries (minutes) |
| --- | --- |
| Spark with OSS | 180 |
| Spark with Alluxio Cold | 145 |
| Spark with Alluxio Warm | 137 |
The test results show that the Alluxio cache reduces the total query time. In the first (cold) run, Alluxio must load data from Object Storage Service (OSS) into its cache, so the gain is smaller: the run takes 145 minutes, about 19% less than the 180 minutes without Alluxio. In subsequent (warm) runs, the data is already cached and the total time drops to 137 minutes, about 24% less than the OSS baseline.
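The percentages above follow from the results table. The following minimal sketch recomputes them, along with the average time per query and the speedup relative to the OSS baseline. The helper names are hypothetical; the only inputs are the three totals and the query count from the table.

```python
# Total time in minutes for 104 queries, copied from the results table.
TOTAL_MINUTES = {
    "Spark with OSS": 180,
    "Spark with Alluxio Cold": 145,
    "Spark with Alluxio Warm": 137,
}
QUERY_COUNT = 104
BASELINE = "Spark with OSS"


def reduction_vs(baseline_minutes, minutes):
    """Percentage of time saved relative to a baseline run."""
    return (baseline_minutes - minutes) / baseline_minutes * 100


def summarize(totals, baseline, query_count):
    """Return (name, total, minutes per query, % saved, speedup) for each benchmark."""
    base = totals[baseline]
    return [
        (name, minutes, minutes / query_count,
         reduction_vs(base, minutes), base / minutes)
        for name, minutes in totals.items()
    ]


if __name__ == "__main__":
    for name, total, per_query, saved, speedup in summarize(
            TOTAL_MINUTES, BASELINE, QUERY_COUNT):
        print(f"{name:<24} {total:>4} min  {per_query:5.2f} min/query  "
              f"{saved:5.1f}% saved  {speedup:4.2f}x")
    # Additional gain of the warm run over the cold run.
    print(f"warm vs. cold: {reduction_vs(145, 137):.1f}% saved")
```

The warm run saves about 5.5% compared with the cold run, so most of the improvement over OSS is already visible in the first run.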