Alibaba Cloud Technology Sets World Sorting Records

October 30, 2015

Alibaba Cloud's FuxiSort solution sorts 100TB of data in 377 seconds

Hangzhou, China, October 30, 2015 – Alibaba Cloud, Alibaba Group's (NYSE: BABA) cloud computing division, is pleased to announce that FuxiSort, the company's distributed computation framework, has set new world records for the GraySort and MinuteSort benchmarks in both the Daytona (general-purpose sorting) and Indy (sorting only 100-byte records) categories.

In 2015 results recently released on the Sort Benchmark website, Alibaba Cloud's FuxiSort took less than six-and-a-half minutes (377 seconds) to sort 100 TB of data, crushing a 23.4-minute record set in 2014 by Apache Spark, which had itself replaced a previous Hadoop record of 72 minutes.

The Daytona categories of the GraySort and MinuteSort benchmarks have long been considered the gold standard for measuring the scalability and efficiency of general-purpose distributed computing systems. These latest results reflect Alibaba Cloud's leadership in handling extremely demanding Internet-scale, data-intensive computation workloads. Sort Benchmark was first held in 1987 with single systems, and gradually accepted computing clusters as processing hardware from 1998. GraySort, named after the pioneering computer scientist Jim Gray, has evolved over the years into a benchmark for sorting at least 100TB of data, while MinuteSort focuses on sorting as much data as possible in one minute.

The Alibaba Cloud team employed a cluster of 3,377 commodity servers [1] to set the Daytona GraySort record of 15.9TB/min and the Daytona MinuteSort record of 7.7TB, improvements of 3.6x and 2.1x over the previous records, respectively.

Sort Benchmark competition    2014 World Records                            2015 World Records
Daytona GraySort              Apache Spark: 4.27TB/min; UCSD: 4.35TB/min    FuxiSort: 15.9TB/min
Indy GraySort                 Baidu: 8.38TB/min                             FuxiSort: 18.2TB/min
Daytona MinuteSort            Samsung: 3.7TB/min                            FuxiSort: 7.7TB/min
Indy MinuteSort               Baidu: 7.0TB/min                              FuxiSort: 11TB/min

Source: SortBenchmark.org. The larger the number, the better the performance.
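
For readers who want to check how the figures above relate, the short Python sketch below converts a sort time into a TB/min rate and computes the ratio between two rates. It is only an illustration of the arithmetic, not part of FuxiSort or the Sort Benchmark tooling; the function names are made up for this example, and it assumes 100TB of data and the times and rates quoted in this release.

```python
def throughput_tb_per_min(data_tb: float, seconds: float) -> float:
    """Return the sort rate in TB/min for `data_tb` terabytes sorted in `seconds`."""
    return data_tb / seconds * 60.0


def improvement(new_rate: float, old_rate: float) -> float:
    """Return how many times faster `new_rate` is than `old_rate`."""
    return new_rate / old_rate


# Figures quoted in this release (100TB assumed for both GraySort runs).
fuxi_graysort = throughput_tb_per_min(100.0, 377.0)          # FuxiSort, 377 seconds
spark_graysort = throughput_tb_per_min(100.0, 23.4 * 60.0)   # Apache Spark, 23.4 minutes

print(f"FuxiSort Daytona GraySort: {fuxi_graysort:.2f} TB/min")
print(f"Spark Daytona GraySort:    {spark_graysort:.2f} TB/min")
print(f"GraySort ratio vs 4.35 TB/min: {improvement(15.9, 4.35):.2f}x")
print(f"MinuteSort ratio vs 3.7 TB:    {improvement(7.7, 3.7):.2f}x")
```

The computed rates agree with the 15.9TB/min and 4.27TB/min entries in the table, and the two ratios are in line with the stated improvements of roughly 3.6x and 2.1x.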

"Making a clean sweep of the 2015 GraySort and MinuteSort categories for both Daytona and Indy categories with FuxiSort in our first year of participation is a clear validation of Alibaba Cloud's performance leadership. We cannot rest on our laurels and will strive to process even higher volumes of data in shorter times going forward. Ultimately, our ultimate goal is to offer our customers the best possible experience at all times," said Chao Li, team leader, Fuxi.

"As more mobile devices and sensors from the Internet of Things put data online, we will be capturing and analyzing ever larger volumes of data in various formats. Gaining accurate, actionable insights affordably and quickly from increasingly large volumes of data will require smarter technologies. Alibaba Cloud has proven expertise in this field, and we are committed to pushing the state-of-the-art technologies harder, faster, and further."

FuxiSort is built on top of Apsara, a general-purpose computing system developed in-house from scratch by Alibaba Cloud. Apsara, which debuted in 2011, manages cluster resources within a data center and schedules parallel execution for a wide range of distributed online and offline applications. Apsara is the foundation for the majority of public cloud services offered by Alibaba Cloud, including Open Data Processing Service (ODPS), Open Storage Service (OSS) and Open Table Service (OTS). It supports all data-processing workloads within Alibaba Group as well. Fuxi, named after a god in Chinese mythology, is the framework that handles cluster-resource management and job scheduling within Apsara.

Apsara has been deployed on hundreds of thousands of physical servers in Alibaba Cloud data centers. A single Apsara cluster can be scaled up to 5,000 servers with hundreds of petabytes of storage and hundreds of thousands of CPU cores, making this computational engine one of the most powerful of its kind in the world. Together, these clusters form the backbone of Alibaba Cloud's comprehensive suite of cloud services.

For further technical details of FuxiSort and Apsara, please refer to the technical report at http://sortbenchmark.org/FuxiSort2015.pdf. For more information on the Sort Benchmarks and Benchmark Categories, please visit http://sortbenchmark.org.

About Alibaba Cloud

Established in September 2009, Alibaba Cloud, Alibaba Group's cloud computing division, develops highly scalable platforms for cloud computing and data management. It provides a comprehensive suite of cloud computing services to support the participants of Alibaba Group's online and mobile commerce ecosystem, including sellers and other third-party customers and businesses. Alibaba Cloud is a business unit within Alibaba Group.

[1] 3,134 nodes x (dual Xeon E5-2630 2.30 GHz, 96 GB memory, 12 x 2 TB SATA HD, 10 Gb/s Ethernet) and 243 nodes x (dual Xeon E5-2650v2 2.60 GHz, 128 GB memory, 12 x 2 TB SATA HD, 10 Gb/s Ethernet)