The technology behind the TPCx-BB ranking

1. Background Introduction

Recently, TPC Benchmark Express-BigBench (TPCx-BB) published its latest world ranking, and Alibaba Cloud's self-developed Shenlong big data acceleration engine took first place in the TPCx-BB SF3000 ranking.

TPCx-BB results are ranked along two dimensions: performance and price/performance. On the performance dimension, Alibaba Cloud leads the second-place result by 41.6%, reaching 2,187.42 BBQpm; on the price/performance dimension it leads by roughly 40%, coming down to 346.53 USD/BBQpm.

(TPCx-BB SF3000 performance ranking)

(TPCx-BB SF3000 price/performance ranking)

We would like to take this opportunity to share the technology behind this first-place result.

2. Overview of the Shenlong Big Data Acceleration Engine MRACC

Alibaba Cloud's self-developed MRACC (Apsara Compute MapReduce Accelerator) is the key to achieving these results.

With demand for data processing growing rapidly, many enterprises build their own open-source big data clusters with Spark, Hadoop, or distributions such as HDP and CDH, processing data volumes from terabytes to petabytes on clusters ranging from a few nodes to several thousand. The MRACC Shenlong big data acceleration engine is designed for these self-built scenarios, relying on the Shenlong architecture to provide acceleration for commonly used components such as Spark, Hadoop, and Alluxio.

Building on the characteristics of Alibaba Cloud's Shenlong architecture, MRACC applies integrated software and hardware optimization, forming a distinctive performance advantage: complex SQL query scenarios run 2-3 times faster than community Spark, and eRDMA accelerates Spark by a further 30%. With the support of the Shenlong big data acceleration engine, enterprises running big data clusters on Alibaba Cloud ECS instances achieve higher performance and better price/performance.

3. Introduction to MRACC-Spark

Since its launch in 2010, Spark has developed for more than ten years and has become the preferred engine for big data batch computing. MRACC therefore focuses its optimization on Spark, the most widely used big data engine. Because big data tasks are heavily I/O-bound, MRACC combines the cloud architecture's advantages in networking and storage to apply both software and hardware acceleration: on the software side, SQL engine optimization, caching, file pruning, and indexing, along with offloading compression and similar operations to heterogeneous devices; on the network side, eRDMA acceleration, running the data exchange of the shuffle phase over the eRDMA network, which reduces latency and significantly improves CPU utilization.

4. Spark SQL Engine Optimization

Since Spark 2.x, the Spark SQL, DataFrames, and Datasets interfaces have gradually replaced the low-level RDD API as Spark's mainstream programming model. The community has invested heavily here: by one count, nearly half of the optimizations since the Spark 3.0 release focus on Spark SQL. Using Spark SQL instead of Hive for offline tasks has become a mainstream choice for many enterprises.
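As a minimal illustration of that migration path (our own sketch, with placeholder database and table names), an offline Hive-style job can be run directly through Spark SQL while still reading the existing Hive metastore:

    import org.apache.spark.sql.SparkSession

    // Minimal sketch: running an offline Hive-style query through Spark SQL.
    // Database and table names are placeholders, not taken from the article.
    val spark = SparkSession.builder()
      .appName("offline-job-on-spark-sql")
      .enableHiveSupport()               // reuse the existing Hive metastore
      .getOrCreate()

    val dailySales = spark.sql(
      """SELECT store_id, SUM(amount) AS total_amount
        |FROM sales_db.orders
        |WHERE dt = '2021-12-01'
        |GROUP BY store_id""".stripMargin)

    dailySales.write.mode("overwrite").saveAsTable("sales_db.daily_store_sales")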

We have made optimizations in the analyzer, optimizer, planner, and execution stages of the SQL engine. Spark 3.0 brought significant improvements to the SQL engine, with the AQE (Adaptive Query Execution) and DPP (Dynamic Partition Pruning) mechanisms receiving wide attention. However, the dynamic pruning in open-source Spark only supports partition-key pruning; it does not support pruning on non-partition keys or through subqueries. We optimized this area to support dynamic data pruning for subqueries, which can significantly reduce the amount of data involved in computation.
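As an illustration of where such pruning applies (a sketch of our own, using TPC-DS-style placeholder names rather than a query from the benchmark run), consider a fact/dimension join whose join key is not a partition column:

    import org.apache.spark.sql.SparkSession

    // Minimal sketch of a query shape that benefits from dynamic data pruning.
    // The selective filter sits on the dimension table `item`; the fact table
    // `store_sales` joins on item_sk, which is not a partition column, so plain
    // dynamic partition pruning would not help. With dynamic data pruning, the
    // dimension-side subquery produces a runtime filter that prunes the
    // store_sales scan before the join.
    val spark = SparkSession.builder().appName("dynamic-data-pruning-demo").getOrCreate()

    val prunedRevenue = spark.sql(
      """SELECT ss.item_sk, SUM(ss.net_paid) AS revenue
        |FROM store_sales ss
        |JOIN item i ON ss.item_sk = i.item_sk
        |WHERE i.category = 'Electronics'
        |GROUP BY ss.item_sk""".stripMargin)

    prunedRevenue.show()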

In the execution phase of the physical plan, we support TopN sorting within windows, which greatly improves the performance of SQL statements with limits, and we support advanced features such as Parquet row-group pruning and Bloom filter joins. The CBO (cost-based optimization) mechanism of Spark SQL can effectively improve SQL execution efficiency, but when too many tables participate in a join, the CBO search cost explodes. We added a genetic-algorithm-based search that solves this problem.
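A typical query shape that benefits from window TopN sorting looks like the sketch below (our own example, with placeholder table names):

    import org.apache.spark.sql.SparkSession

    // Minimal sketch of a "top N per group" window query. With window TopN
    // sorting, the engine only needs to keep the 10 best rows per store while
    // sorting, instead of fully sorting every partition and filtering afterwards.
    val spark = SparkSession.builder().appName("window-topn-demo").getOrCreate()

    val topItemsPerStore = spark.sql(
      """SELECT store_sk, item_sk, revenue
        |FROM (
        |  SELECT store_sk, item_sk, revenue,
        |         ROW_NUMBER() OVER (PARTITION BY store_sk
        |                            ORDER BY revenue DESC) AS rn
        |  FROM store_item_revenue
        |) ranked
        |WHERE rn <= 10""".stripMargin)

    topItemsPerStore.show()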

In addition, MRACC-Spark supports features such as deduplication, join elimination based on foreign keys, and integrity constraints, and, in combination with Delta Lake, it supports data insertion, deletion, and update operations.
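For reference, the corresponding insert, delete, and update operations on a Delta Lake table use the standard Delta Lake API, roughly as sketched below (placeholder table names; this is not an MRACC-specific interface):

    import io.delta.tables.DeltaTable
    import org.apache.spark.sql.SparkSession

    // Minimal sketch of insert / delete / update on a Delta Lake table.
    val spark = SparkSession.builder().appName("delta-lake-dml-demo").getOrCreate()

    val orders = DeltaTable.forName(spark, "sales_db.orders_delta")

    // Update: apply a 10% discount to one order.
    orders.updateExpr(
      condition = "order_id = 1001",
      set = Map("amount" -> "amount * 0.9"))

    // Delete: drop cancelled orders.
    orders.delete("status = 'CANCELLED'")

    // Insert: append newly arrived rows.
    spark.table("staging.new_orders")
      .write.format("delta").mode("append")
      .saveAsTable("sales_db.orders_delta")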

5. Network RDMA Optimization

At the 2021 Hangzhou Yunqi Conference, Alibaba Cloud released the fourth-generation Shenlong architecture, providing the industry's first large-scale elastic RDMA acceleration capability. RDMA is a high-performance network transmission technology that provides direct memory access and data transfer bypassing the kernel, thereby reducing CPU overhead and delivering a low-latency, high-performance network. In distributed computing, the shuffle process is unavoidable and consumes a lot of computing and network resources, making it a key optimization target for big data workloads. Based on the data exchange characteristics of Spark's in-memory computation during the shuffle stage, shuffle data exchange can be transformed into a memory-to-network-to-memory mode, fully exploiting RDMA's user-mode direct memory interaction, low latency, and low CPU consumption. Ultimately, a 30% performance improvement was achieved on end-to-end benchmarks such as TPCx-HS.
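Spark allows the shuffle implementation to be swapped by configuration, so an RDMA-backed shuffle path can in principle be enabled along the lines of the sketch below; the shuffle-manager class name is an illustrative placeholder, not the actual MRACC setting:

    import org.apache.spark.sql.SparkSession

    // Hypothetical configuration sketch: Spark lets the shuffle implementation
    // be swapped via spark.shuffle.manager. The class name below is a
    // placeholder for an RDMA-backed shuffle manager, not the real MRACC one.
    val spark = SparkSession.builder()
      .appName("shuffle-over-erdma")
      .config("spark.shuffle.manager", "com.example.rdma.RdmaShuffleManager")
      .config("spark.shuffle.compress", "true")   // standard Spark shuffle setting
      .getOrCreate()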

6. Performance Optimization Results

In the end, on the 10 TB TPC-DS dataset, performance improved by 2.19x compared with the latest Spark 3.1 release; on TPCx-BB, the result leads second place by 41.6%.

(Figure 5: TPC-DS and TPCx-BB performance results)

7. Prospects

At present, all of these optimizations are packaged and delivered to customers as plugins; customer code basically does not need to change, so customers can adopt them directly.
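One plausible shape of such a plugin-style delivery uses Spark's standard spark.jars and spark.sql.extensions configuration; the jar path and extension class name below are placeholders, not the real MRACC artifacts:

    import org.apache.spark.sql.SparkSession

    // Hypothetical sketch of enabling an accelerator plugin through Spark's
    // standard extension points, with no change to the job code itself.
    val spark = SparkSession.builder()
      .appName("existing-job-unchanged")
      .config("spark.jars", "/opt/mracc/mracc-spark-plugin.jar")
      .config("spark.sql.extensions", "com.example.mracc.MraccSparkSessionExtensions")
      .getOrCreate()

    // The existing SQL / DataFrame code of the job runs below, unmodified.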

In the future, we will continue to combine software and hardware to maximize performance optimization for Alibaba Cloud's big data customers, iterating on these integrated capabilities to build a Shenlong big data acceleration service with higher performance and lower cost for our users.

Appendix: Introduction to TPCx-BB

TPCx-BB is an end-to-end big data benchmark based on a retail scenario, released by the Transaction Processing Performance Council (TPC). It supports mainstream distributed big data processing engines and simulates complete online and offline business processes with 30 queries, covering declarative queries, procedural queries, data mining, and machine learning algorithms. TPCx-BB testing features large data volumes, complex characteristics, and diverse data sources, bringing it close to real business scenarios and making it an important reference for infrastructure selection across industries.

The results of TPCx-BB comprehensively and accurately reflect the overall performance of end-to-end big data systems. The test covers structured, semi-structured, and unstructured data, and evaluates the performance, price/performance, service, and power consumption of big data hardware and software from the perspective of customers' real scenarios.
