Shenlong Big Data Acceleration Engine Ranked First in TPCx-BB
Brief introduction: The Shenlong big data acceleration engine targets common big data components such as Spark, Hadoop, and Alluxio. It builds on the characteristics of the Alibaba Cloud Shenlong architecture and applies integrated software and hardware optimization to gain a distinctive performance advantage. As a result, complex SQL query scenarios run 2-3 times faster than on the community version of Spark, and Spark accelerated with eRDMA performs 30% better.
Recently, the TPC Express BigBench benchmark (TPCx-BB for short) published its latest world ranking, and the Shenlong big data acceleration engine independently developed by Alibaba Cloud took first place in the TPCx-BB@3000 ranking.
TPCx-BB results are reported along two dimensions: performance and price/performance. In this ranking, Alibaba Cloud reached 2187.42 BBQpm, leading second place by 41.6% on performance, and lowered its price/performance figure to 346.53 USD/BBQpm, leading second place by 40%.
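To make the two figures above easier to relate, here is a minimal Python sketch of the arithmetic behind a price/performance number. It assumes the usual reading of USD/BBQpm as total priced system cost divided by the performance score. The function names are hypothetical, and the total cost it derives is only what the two published numbers imply, not a figure stated in this article.

```python
def price_performance(total_cost_usd: float, bbqpm: float) -> float:
    """Price/performance in USD per BBQpm (lower is better)."""
    if bbqpm <= 0:
        raise ValueError("bbqpm must be positive")
    return total_cost_usd / bbqpm


def implied_total_cost(price_perf_usd: float, bbqpm: float) -> float:
    """Total cost implied by a price/performance figure and a score."""
    return price_perf_usd * bbqpm


def relative_lead(leader: float, runner_up: float) -> float:
    """Fractional lead of `leader` over `runner_up` (0.416 means 41.6%)."""
    if runner_up <= 0:
        raise ValueError("runner_up must be positive")
    return (leader - runner_up) / runner_up


if __name__ == "__main__":
    # Figures quoted in the article.
    score = 2187.42        # BBQpm
    price_perf = 346.53    # USD/BBQpm
    cost = implied_total_cost(price_perf, score)
    print(f"Implied total priced cost: {cost:,.2f} USD")
    # Round trip: dividing the implied cost by the score returns the metric.
    print(f"Price/performance: {price_performance(cost, score):.2f} USD/BBQpm")
```

Note that higher is better for BBQpm while lower is better for USD/BBQpm, so a "lead" on the second metric means a smaller number.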
TPCx-BB is an end-to-end big data benchmark based on retail scenarios, published by the Transaction Processing Performance Council (TPC), an international benchmarking standards body. It supports mainstream distributed big data processing engines and simulates an entire online and offline business process. The workload consists of 30 queries that cover descriptive and procedural queries, data mining, and machine learning algorithms. The benchmark is characterized by large data volumes, complex data features, and diverse data sources. This brings it closer to real business scenarios and makes it an important reference for infrastructure selection across industries.
TPCx-BB results reflect the overall performance of an end-to-end big data system. The test covers structured, semi-structured, and unstructured data, so it can evaluate the software and hardware performance, price/performance, service, and power consumption of a big data system more comprehensively, from the perspective of customers' actual scenarios.
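For readers who want to see how a BBQpm score is put together, the following Python sketch follows the performance metric as it is commonly summarized from the public TPCx-BB specification: a load test, a power test, and a throughput test are combined into a single queries-per-minute figure. The formula and the 0.1 load weighting are reproduced here from memory of that specification and should be checked against the current version. The function name and the sample timings are hypothetical.

```python
import math
from statistics import geometric_mean
from typing import Sequence


def bbqpm(scale_factor: int,
          load_seconds: float,
          power_query_seconds: Sequence[float],
          throughput_seconds: float,
          streams: int) -> float:
    """Sketch of BBQpm@SF = SF * 60 * M / (T_LD + sqrt(T_PT * T_TT)).

    Assumed definitions (verify against the TPCx-BB specification):
      M    = number of queries (30 in TPCx-BB)
      T_LD = 0.1 * load test elapsed time
      T_PT = M * geometric mean of the power test query times
      T_TT = throughput test elapsed time / number of streams
    """
    m = len(power_query_seconds)
    if m == 0 or streams <= 0:
        raise ValueError("need at least one query time and one stream")
    t_ld = 0.1 * load_seconds
    t_pt = m * geometric_mean(power_query_seconds)
    t_tt = throughput_seconds / streams
    return scale_factor * 60 * m / (t_ld + math.sqrt(t_pt * t_tt))


if __name__ == "__main__":
    # Hypothetical timings, not taken from any published result.
    times = [60.0] * 30
    print(bbqpm(3000, load_seconds=600.0, power_query_seconds=times,
                throughput_seconds=3600.0, streams=2))
```

Because the power test enters through a geometric mean, speeding up a few long-running queries helps less than improving all 30 queries evenly, which is one reason broad engine-level optimization matters for this benchmark.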
MRACC (ApsaraCompute MapReduce Accelerator), developed in-house by Alibaba Cloud, is the key to this first-place ranking. The Shenlong big data acceleration engine targets common big data components such as Spark, Hadoop, and Alluxio. It builds on the characteristics of the Alibaba Cloud Shenlong architecture and applies integrated software and hardware optimization to gain a distinctive performance advantage. As a result, complex SQL query scenarios run 2-3 times faster than on the community version of Spark, and Spark accelerated with eRDMA performs 30% better.
Specifically, because big data tasks are IO-heavy, MRACC draws on the network and storage strengths of the cloud architecture to accelerate both software and hardware. On the software side, it optimizes the SQL engine with caching, file pruning, indexing, and similar techniques, and it tries to offload operations such as compression to heterogeneous devices. On the network side, it uses eRDMA: data exchange in the shuffle stage runs over the eRDMA network, which reduces latency and greatly improves CPU utilization.
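The file clipping (pruning) optimization mentioned above can be illustrated with a toy example. The sketch below shows the general idea of skipping files whose stored minimum and maximum values cannot match a range filter. It is a generic illustration of the technique, not MRACC's actual implementation, and the `FileStats` type, the file paths, and the `prune_files` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileStats:
    """Per-file min/max statistics for one column (hypothetical layout)."""
    path: str
    min_value: int
    max_value: int


def prune_files(files: List[FileStats], low: int, high: int) -> List[FileStats]:
    """Keep only files whose [min, max] range overlaps the filter [low, high].

    Files outside the range cannot contain matching rows, so a query
    engine can skip reading them entirely and save IO.
    """
    return [f for f in files if f.max_value >= low and f.min_value <= high]


if __name__ == "__main__":
    catalog = [
        FileStats("part-0000.parquet", 1, 100),
        FileStats("part-0001.parquet", 101, 200),
        FileStats("part-0002.parquet", 201, 300),
    ]
    # Filter equivalent to: WHERE col BETWEEN 150 AND 220
    for f in prune_files(catalog, 150, 220):
        print(f.path)
```

The benefit grows when data is sorted or clustered on the filtered column, because each file's value range is narrower and more files fall entirely outside the predicate.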
The combination of MRACC and Shenlong cloud servers opens up new possibilities for big data on the cloud and gives users higher performance and better price/performance.