Elastic Compute Service:Deploy a Spark cluster on eRDMA-enhanced instances

Last Updated:Jul 04, 2024

Elastic Remote Direct Memory Access (eRDMA) allows you to process requests at ultra-low latency. This topic describes how to create a Spark cluster that contains eRDMA-enhanced Elastic Compute Service (ECS) instances as nodes and use Benchmark to test the load processing performance of the Spark cluster.

Background information

Benchmark is a performance benchmarking tool that is used to test load processing performance, including load execution time, transmission rate, throughput, and resource utilization.

Step 1: Make preparations

Before you test the load processing performance of a Spark cluster, make preparations to set up the environment that is required for the test. The preparations include preparing Hadoop and Spark machines, installing Hadoop, and installing and configuring eRDMA.

  1. Prepare a Hadoop environment. If Hadoop clusters already exist, skip this step.

    • Requirements on hardware and software environments

      Prepare the following Hadoop version, Spark version, and ECS instances:

      • Hadoop version: Hadoop 3.2.1.

      • Spark version: Spark 3.2.1.

      • ECS instances:

        • ECS instance type: See Overview.

        • Number of vCPUs per ECS instance: 16.

        • Number of ECS instances: Four. One ECS instance serves as the master node and the other three ECS instances serve as worker nodes in a Hadoop cluster.

    • Installation procedure

  2. Log on to the ECS instance that serves as the master node.

    For more information, see Connect to a Linux instance by using a password or key.

  3. Configure eRDMA.

    • Install the required drivers.

      For more information, see Configure eRDMA on an enterprise-level instance.
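As a quick sanity check after the driver installation, you can confirm that the eRDMA device is visible. This is a hedged sketch: it assumes the rdma-core utilities are installed, and eRDMA devices typically appear with names such as erdma_0.

```shell
# Hedged check: list RDMA devices after driver installation.
# Assumes the rdma-core utilities are available on the instance.
if command -v ibv_devices >/dev/null 2>&1; then
    devices=$(ibv_devices)
else
    devices="ibv_devices not found; install rdma-core to inspect RDMA devices"
fi
echo "$devices"
```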

    • Configure network settings.

      1. Run the following command to open the hosts file:

        vim /etc/hosts
      2. Press the I key to enter Insert mode and then add the following content to the file:

        192.168.201.83 poc-t5m0        master1
        192.168.201.84 poc-t5w0
        192.168.201.86 poc-t5w1
        192.168.201.85 poc-t5w2
        Note

        Replace the IP addresses with the IP addresses of actual eRDMA interfaces (ERIs).

      3. Press the Esc key to exit Insert mode. Enter :wq and press the Enter key to save and exit the file.
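After you save the file, you can confirm that the new entries resolve. The check below uses the example hostnames from the file above; substitute your actual hostnames.

```shell
# Verify that the hostnames added to /etc/hosts resolve to the ERI addresses.
# The hostnames below are the examples from the file above.
for host in poc-t5m0 poc-t5w0 poc-t5w1 poc-t5w2; do
    getent hosts "$host" || echo "$host: not resolvable yet"
done
```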

    • Configure Yet Another Resource Negotiator (YARN) settings.

      Note

      If the default network interface controller (NIC) of the ECS instance supports eRDMA, you do not need to configure YARN settings.

      1. Run the following commands in sequence to open the yarn-env.sh file:

        cd /opt/hadoop-3.2.1/etc/hadoop
        vim yarn-env.sh
      2. Press the I key to enter Insert mode and add the following content to the file:

        RDMA_IP=`ip addr show eth1 | grep "inet\b" | awk '{print $2}' | cut -d/ -f1`
        export YARN_NODEMANAGER_OPTS="-Dyarn.nodemanager.hostname=$RDMA_IP"
        Note

        Replace eth1 with an actual ERI name.

      3. Press the Esc key to exit Insert mode. Enter :wq and press the Enter key to save and exit the file.
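To see what the RDMA_IP pipeline above extracts, you can run it against a sample line of `ip addr show` output. The address below is illustrative; on a real instance the value comes from the ERI.

```shell
# Illustration of the extraction pipeline used for RDMA_IP above,
# applied to a sample `ip addr show` line (the address is an example).
sample='    inet 192.168.201.83/24 brd 192.168.201.255 scope global dynamic eth1'
RDMA_IP=$(echo "$sample" | grep "inet\b" | awk '{print $2}' | cut -d/ -f1)
echo "$RDMA_IP"
```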

    • Configure Spark.

      Note

      If the default NIC of the ECS instance supports eRDMA, you do not need to configure Spark.

      1. Run the following commands in sequence to open the spark-env.sh file:

        cd /opt/spark-3.2.1-bin-hadoop3.2/conf
        vim spark-env.sh
      2. Press the I key to enter Insert mode and add the following content to the file:

        export SPARK_LOCAL_IP=`/sbin/ip addr show eth1 | grep "inet\b" | awk '{print $2}' | cut -d/ -f1`
        Note

        Replace eth1 with an actual ERI name.

      3. Press the Esc key to exit Insert mode. Enter :wq and press the Enter key to save and exit the file.

  4. Run the following command to start Hadoop Distributed File System (HDFS) and YARN:

    $HADOOP_HOME/sbin/start-all.sh
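After the daemons start, a quick hedged check is to list the running Java processes. On the master node you would typically expect to see NameNode, SecondaryNameNode, and ResourceManager; on worker nodes, DataNode and NodeManager.

```shell
# List running Hadoop/YARN daemons with jps (assumes a JDK is on PATH).
if command -v jps >/dev/null 2>&1; then
    daemons=$(jps)
else
    daemons="jps not found; check that \$JAVA_HOME/bin is on PATH"
fi
echo "$daemons"
```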

Step 2: Download the Benchmark installation package

This section describes how to download the Benchmark installation package.

  1. Run the following command to download the Benchmark installation package:

    wget https://mracc-release.oss-cn-beijing.aliyuncs.com/erdma-spark/spark-erdma-jverbs.tar.gz
  2. Run the following command to decompress the spark-erdma-jverbs.tar.gz installation package:

    tar -zxvf spark-erdma-jverbs.tar.gz

    The following components are included in the installation package:

    • erdmalib: the native library that is required to run the spark-erdma plug-in. This library corresponds to the libdisni.so file.

    • plugin-sparkrdma: the plug-in and dependency library that support Spark RDMA, which correspond to the spark-eRDMA-1.0-for-spark-3.2.1.jar and disni-2.1-jar-with-dependencies.jar files.

Step 3: Run a Benchmark test

This section describes how to use Benchmark to test the load processing performance of the Spark cluster.

  1. Run the following commands to modify the IP routes:

    Note

    If the default NIC of your ECS instance supports eRDMA, skip this step.

    route del -net 192.168.201.0 netmask 255.255.255.0 metric 0 dev eth0 && \
    route add -net 192.168.201.0 netmask 255.255.255.0 metric 1000 dev eth0
    Note

    Replace the IP addresses with the gateway IP address of the actual ERI.
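You can confirm the change by inspecting the routing table. The snippet below assumes the example 192.168.201.0/24 subnet from the commands above; replace it with your actual subnet.

```shell
# Show the route entry for the example subnet; falls back to a message
# if no matching entry exists yet.
route_entry=$(ip route show 2>/dev/null | grep "192.168.201.0/24" || echo "no entry for 192.168.201.0/24 yet")
echo "$route_entry"
```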

  2. Configure Spark.

    1. Run the following command to open the spark-jverbs-erdma.conf configuration file:

      vim /opt/spark-3.2.1-bin-hadoop3.2/conf/spark-jverbs-erdma.conf
    2. Press the I key to enter Insert mode and modify the following content in the file:

      spark.master yarn
      spark.submit.deployMode client
      #driver
      spark.driver.cores 4
      spark.driver.memory 19g
      #executor
      spark.executor.instances 12
      spark.executor.memory 10g
      spark.executor.cores 4
      spark.executor.heartbeatInterval   60s
      #shuffle
      spark.task.maxFailures 4
      spark.default.parallelism 36
      spark.sql.shuffle.partitions 192
      spark.shuffle.compress            true
      spark.shuffle.spill.compress      true
      
      #other
      spark.network.timeout 3600
      spark.sql.broadcastTimeout 3600
      spark.eventLog.enabled             false
      spark.eventLog.dir                 hdfs://master1:9000/sparklogs
      spark.eventLog.compress            true
      spark.yarn.historyServer.address   master1:18080
      spark.serializer                  org.apache.spark.serializer.KryoSerializer
      
      #eRDMA
      spark.driver.extraLibraryPath   /path/erdmalib
      spark.executor.extraLibraryPath   /path/erdmalib
      spark.driver.extraClassPath       /path/spark-eRDMA-1.0-for-spark-3.2.1.jar:/path/disni-2.1-jar-with-dependencies.jar
      spark.executor.extraClassPath     /path/spark-eRDMA-1.0-for-spark-3.2.1.jar:/path/disni-2.1-jar-with-dependencies.jar
      spark.shuffle.manager org.apache.spark.shuffle.sort.RdmaShuffleManager
      spark.shuffle.sort.io.plugin.class org.apache.spark.shuffle.rdma.RdmaLocalDiskShuffleDataIO
      spark.shuffle.rdma.recvQueueDepth  128
      Note
      • To achieve a better acceleration ratio, you can set the spark.shuffle.compress parameter to false.

      • The preceding sample uses the Spark resource settings, such as the spark.executor.instances, spark.executor.memory, spark.executor.cores, and spark.sql.shuffle.partitions parameters, of an ECS instance that has 32 vCPUs and 128 GB of memory. Modify the Spark resource settings based on the actual cluster scale or instance specifications.

    3. Press the Esc key to exit Insert mode. Enter :wq and press the Enter key to save and exit the file.

  3. Run the following commands in sequence to generate data:

    cd /opt/spark-3.2.1-bin-hadoop3.2/conf
    spark-submit --properties-file /opt/spark-3.2.1-bin-hadoop3.2/conf/spark-normal.conf --class com.databricks.spark.sql.perf.tpcds.TPCDS_Bench_DataGen spark-sql-perf_2.12-0.5.1-SNAPSHOT.jar hdfs://master1:9000/tmp/tpcds_400 tpcds_400 400 parquet
    Note

    The value 400 specifies the amount of data to generate, in GB. Change the value based on the cluster scale.

  4. Run the following command to run a Benchmark test:

    spark-submit --properties-file /opt/spark-3.2.1-bin-hadoop3.2/conf/spark-jverbs-erdma.conf --class com.databricks.spark.sql.perf.tpcds.TPCDS_Bench_RunAllQuery spark-sql-perf_2.12-0.5.1-SNAPSHOT.jar all hdfs://master1:9000/tmp/tpcds_400 tpcds_400 /tmp/tpcds_400_result

    After the test is complete, you can view the load execution time of the Spark cluster in the test result.

    Note

    To compare performance, delete the spark-erdma plug-in configurations from the files in the Spark conf directory, or log on to another Spark cluster that does not support eRDMA, and run the same Benchmark test again. You can then compare the two test results to measure the performance difference between a Spark cluster that supports eRDMA and one that does not.
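One hypothetical way to produce the non-eRDMA baseline configuration is to copy the tuned file and filter out the eRDMA-specific keys. The sketch below runs on a small inline sample; in practice, point it at the real spark-jverbs-erdma.conf and write the result to a separate file.

```shell
# Sketch: strip the eRDMA-specific keys from a Spark conf so the same
# workload can be re-run without the plug-in. Demonstrated on an inline
# sample; replace the here-document with your real spark-jverbs-erdma.conf.
sample_conf=$(mktemp)
cat > "$sample_conf" <<'EOF'
spark.master yarn
spark.shuffle.manager org.apache.spark.shuffle.sort.RdmaShuffleManager
spark.shuffle.rdma.recvQueueDepth 128
EOF
baseline=$(grep -vE 'Rdma|rdma|extraLibraryPath|extraClassPath' "$sample_conf")
echo "$baseline"
rm -f "$sample_conf"
```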