Elastic Remote Direct Memory Access (eRDMA) allows you to process requests at ultra-low latency. This topic describes how to create a Spark cluster that contains eRDMA-enhanced Elastic Compute Service (ECS) instances as nodes and use Benchmark to test the load processing performance of the Spark cluster.
Background information
Benchmark is a performance benchmarking tool that is used to test load processing performance, including load execution time, transmission rate, throughput, and resource utilization.
Step 1: Make preparations
Before you test the load processing performance of a Spark cluster, set up the environment that the test requires. The preparations include preparing the Hadoop and Spark machines, installing Hadoop, and installing and configuring eRDMA.
Prepare a Hadoop environment. If Hadoop clusters already exist, skip this step.
Requirements on hardware and software environments
Prepare the following Hadoop version, Spark version, and ECS instances:
Hadoop version: Hadoop 3.2.1.
Spark version: Spark 3.2.1.
ECS instances:
ECS instance type: See Overview.
Number of vCPUs per ECS instance: 16.
Number of ECS instances: Four. One ECS instance serves as the master node and the other three ECS instances serve as worker nodes in a Hadoop cluster.
Installation procedure
Log on to the ECS instance that serves as the master node.
For more information, see Connect to a Linux instance by using a password or key.
Configure eRDMA.
Install the required drivers.
For more information, see Configure eRDMA on an enterprise-level instance.
Configure network settings.
Run the following command to open the hosts file:

```shell
vim /etc/hosts
```

Press the I key to enter Insert mode and modify the following content in the file:

```
192.168.201.83 poc-t5m0 master1
192.168.201.84 poc-t5w0
192.168.201.86 poc-t5w1
192.168.201.85 poc-t5w2
```

Note: Replace the IP addresses with the IP addresses of the actual eRDMA interfaces (ERIs).

Press the Esc key to exit Insert mode. Enter :wq and press the Enter key to save the file and exit.
Configure Yet Another Resource Negotiator (YARN) settings.
Note: If the default network interface controller (NIC) of the ECS instance supports eRDMA, you do not need to configure YARN settings.
Run the following commands in sequence to open the yarn-env.sh file:

```shell
cd /opt/hadoop-3.2.1/etc/hadoop
vim yarn-env.sh
```

Press the I key to enter Insert mode and add the following content to the file:

```shell
RDMA_IP=`ip addr show eth1 | grep "inet\b" | awk '{print $2}' | cut -d/ -f1`
export YARN_NODEMANAGER_OPTS="-Dyarn.nodemanager.hostname=$RDMA_IP"
```

Note: Replace eth1 with the actual ERI name.

Press the Esc key to exit Insert mode. Enter :wq and press the Enter key to save the file and exit.
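The RDMA_IP assignment above extracts the ERI's IPv4 address from the output of `ip addr`. The following sketch runs the same pipeline against sample `ip addr show` output, so you can see what each stage keeps; the interface details and addresses are illustrative:

```shell
# Sample output of `ip addr show eth1` (illustrative values)
sample='3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 192.168.201.84/24 brd 192.168.201.255 scope global eth1
    inet6 fe80::216:3eff:fe00:1/64 scope link'

# Same pipeline as in yarn-env.sh: keep the IPv4 "inet" line (the \b word
# boundary excludes "inet6"), take the address field, strip the prefix length
RDMA_IP=$(printf '%s\n' "$sample" | grep "inet\b" | awk '{print $2}' | cut -d/ -f1)
echo "$RDMA_IP"
```

On a real instance, the backquoted command in yarn-env.sh produces the same result directly from the live interface.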
Configure Spark.
Note: If the default NIC of the ECS instance supports eRDMA, you do not need to configure Spark.
Run the following commands in sequence to open the spark-env.sh file:

```shell
cd /opt/spark-3.2.1-bin-hadoop3.2/conf
vim spark-env.sh
```

Press the I key to enter Insert mode and add the following content to the file:

```shell
export SPARK_LOCAL_IP=`/sbin/ip addr show eth1 | grep "inet\b" | awk '{print $2}' | cut -d/ -f1`
```

Note: Replace eth1 with the actual ERI name.

Press the Esc key to exit Insert mode. Enter :wq and press the Enter key to save the file and exit.
Run the following command to start Hadoop Distributed File System (HDFS) and YARN:
```shell
$HADOOP_HOME/sbin/start-all.sh
```
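After start-all.sh completes, the HDFS and YARN daemons should be running on the master node. As a hedged check, you can look for them in the output of jps; the listing and PIDs below are illustrative, but the same parsing works on real `jps` output:

```shell
# Illustrative `jps` output on the master node after start-all.sh (PIDs are made up)
jps_output='12001 NameNode
12233 SecondaryNameNode
12510 ResourceManager
13870 Jps'

# Confirm that each expected daemon appears in the listing
for daemon in NameNode SecondaryNameNode ResourceManager; do
  printf '%s\n' "$jps_output" | grep -qw "$daemon" && echo "$daemon is running"
done
```

On the worker nodes, you would instead expect DataNode and NodeManager processes.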
Step 2: Download the Benchmark installation package
This section describes how to download the Benchmark installation package.
Run the following command to download the Benchmark installation package:
```shell
wget https://mracc-release.oss-cn-beijing.aliyuncs.com/erdma-spark/spark-erdma-jverbs.tar.gz
```

Run the following command to decompress the spark-erdma-jverbs.tar.gz installation package:

```shell
tar -zxvf spark-erdma-jverbs.tar.gz
```

The installation package contains the following components:
erdmalib: the native library that is required to run the spark-erdma plug-in. This library corresponds to the libdisni.so file.
plugin-sparkrdma: the plug-in and dependency library that support Spark RDMA, which correspond to the spark-eRDMA-1.0-for-spark-3.2.1.jar and disni-2.1-jar-with-dependencies.jar files.
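If you want to inspect an archive before extracting it, `tar -tzf` lists its contents. The sketch below builds a stand-in archive that mimics the documented layout, since the real package must first be downloaded; the file names come from the component descriptions above, and the same listing flags work on the real spark-erdma-jverbs.tar.gz:

```shell
# Build a stand-in archive that mimics the documented package layout
mkdir -p demo/erdmalib demo/plugin-sparkrdma
touch demo/erdmalib/libdisni.so \
      demo/plugin-sparkrdma/spark-eRDMA-1.0-for-spark-3.2.1.jar \
      demo/plugin-sparkrdma/disni-2.1-jar-with-dependencies.jar
tar -czf spark-erdma-demo.tar.gz demo

# List the archive contents without extracting
tar -tzf spark-erdma-demo.tar.gz
```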
Step 3: Run a Benchmark test
This section describes how to use Benchmark to test the load processing performance of the Spark cluster.
Run the following commands to modify IP routes.
Note: If the default NIC of your ECS instance supports eRDMA, skip this step.

```shell
route del -net 192.168.201.0 netmask 255.255.255.0 metric 0 dev eth0 && \
route add -net 192.168.201.0 netmask 255.255.255.0 metric 1000 dev eth0
```

Note: Replace the IP addresses with the gateway IP address of the actual ERI.
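The route commands above operate on the ERI's /24 network address, not on a single host address. As a quick sanity check, the network address can be derived from any ERI address and the netmask by ANDing the octets; the address below is the sample worker address used earlier in the hosts file:

```shell
# Derive the network address from an ERI address and its netmask (sample values)
ip=192.168.201.84
mask=255.255.255.0

# Split both dotted quads into octets, then AND them pairwise
old_ifs=$IFS; IFS=.
set -- $ip;   i1=$1; i2=$2; i3=$3; i4=$4
set -- $mask; m1=$1; m2=$2; m3=$3; m4=$4
IFS=$old_ifs

network="$((i1 & m1)).$((i2 & m2)).$((i3 & m3)).$((i4 & m4))"
echo "$network"
```

The result is the `-net` argument to pass to the route commands for your own ERI subnet.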
Configure Spark.
Run the following command to open the spark-jverbs-erdma.conf configuration file:

```shell
vim /opt/spark-3.2.1-bin-hadoop3.2/conf/spark-jverbs-erdma.conf
```

Press the I key to enter Insert mode and modify the following content in the file:

```
spark.master yarn
spark.deploy-mode client
#driver
spark.driver.cores 4
spark.driver.memory 19g
#executor
spark.executor.instances 12
spark.executor.memory 10g
spark.executor.cores 4
spark.executor.heartbeatInterval 60s
#shuffle
spark.task.maxFailures 4
spark.default.parallelism 36
spark.sql.shuffle.partitions 192
spark.shuffle.compress true
spark.shuffle.spill.compress true
#other
spark.network.timeout 3600
spark.sql.broadcastTimeout 3600
spark.eventLog.enabled false
spark.eventLog.dir hdfs://master1:9000/sparklogs
spark.eventLog.compress true
spark.yarn.historyServer.address master1:18080
spark.serializer org.apache.spark.serializer.KryoSerializer
#eRDMA
spark.driver.extraLibraryPath /path/erdmalib
spark.executor.extraLibraryPath /path/erdmalib
spark.driver.extraClassPath /path/spark-eRDMA-1.0-for-spark-3.2.1.jar:/path/disni-2.1-jar-with-dependencies.jar
spark.executor.extraClassPath /path/spark-eRDMA-1.0-for-spark-3.2.1.jar:/path/disni-2.1-jar-with-dependencies.jar
spark.shuffle.manager org.apache.spark.shuffle.sort.RdmaShuffleManager
spark.shuffle.sort.io.plugin.class org.apache.spark.shuffle.rdma.RdmaLocalDiskShuffleDataIO
spark.shuffle.rdma.recvQueueDepth 128
```

Note: Set the spark.shuffle.compress parameter to false to achieve a better acceleration ratio. The preceding sample uses the Spark resource settings, such as the spark.executor.instances, spark.executor.memory, spark.executor.cores, and spark.sql.shuffle.partitions parameters, of an ECS instance that has 32 vCPUs and 128 GB of memory. Modify the Spark resource settings based on the actual cluster scale or instance specifications.

Press the Esc key to exit Insert mode. Enter :wq and press the Enter key to save the file and exit.
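The executor settings can be sanity-checked against the per-worker resources. The sketch below works out the per-worker footprint that the sample values imply, assuming the three-worker layout from Step 1; it is a rough aid for resizing, not a tuning rule:

```shell
# Sample values from spark-jverbs-erdma.conf and the three-worker cluster layout
workers=3
executor_instances=12
executor_cores=4
executor_mem_gb=10

# Per-worker footprint implied by the settings
per_worker=$((executor_instances / workers))
echo "executors per worker:   $per_worker"
echo "cores per worker used:  $((per_worker * executor_cores))"
echo "memory per worker used: $((per_worker * executor_mem_gb)) GB"
```

If you change the cluster scale or instance size, keep the implied per-worker cores and memory within the resources that YARN makes available on each node.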
Run the following commands in sequence to generate data:

```shell
cd /opt/spark-3.2.1-bin-hadoop3.2/conf
spark-submit --properties-file /opt/spark-3.2.1-bin-hadoop3.2/conf/spark-normal.conf --class com.databricks.spark.sql.perf.tpcds.TPCDS_Bench_DataGen spark-sql-perf_2.12-0.5.1-SNAPSHOT.jar hdfs://master1:9000/tmp/tpcds_400 tpcds_400 400 parquet
```

Note: 400 indicates the amount of data to generate, in GB. Change the value based on the cluster scale.

Run the following command to run a Benchmark test:

```shell
spark-submit --properties-file /opt/spark-3.2.1-bin-hadoop3.2/conf/spark-jverbs-erdma.conf --class com.databricks.spark.sql.perf.tpcds.TPCDS_Bench_RunAllQuery spark-sql-perf_2.12-0.5.1-SNAPSHOT.jar all hdfs://master1:9000/tmp/tpcds_400 tpcds_400 /tmp/tpcds_400_result
```

After the test is complete, you can view the load execution time of the Spark cluster in the test result.
Note: You can delete the spark-erdma plug-in configurations from the files in the Spark conf directory, or log on to another Spark cluster that does not support eRDMA, and use the preceding method to perform another Benchmark test. Then, you can compare the two test results to see the performance differences between a Spark cluster that supports eRDMA and one that does not.