By Zhang Rui.
Zhang Rui is the head of the Alibaba Database Technologies Team.
Alibaba's Double 11 Shopping Festival is one of the grand engineering projects of the Internet age. It has driven a great deal of technological innovation and has given Alibaba Cloud a unique platform for testing all sorts of new technologies.
This year Alibaba aims to support the highest possible QPS value at the midnight launch of the event, which is when traffic hits its highest peak. Doing so will allow Alibaba to provide the best user experience possible. However, this requires a powerful solution that offers the greatest levels of elasticity and stability while still being cost effective.
In this article, we will look into how Alibaba Cloud can achieve this with its latest generation of cloud database technologies.
As you probably all know, making databases elastic is a tough job because databases can be demanding in terms of performance and because the migration of massive amounts of data can also be very costly. The first approach to this problem of making databases more elastic is to deploy them in the cloud. The elasticity of the cloud can help secure the resources demanded by databases.
However, there are several challenges involved in deploying databases in the cloud. Consider these major challenges.
Through several years of research and development, we have created a series of solutions to overcome all of the above challenges. First, databases deployed in the cloud can use Alibaba Cloud's high-performance Elastic Compute Service (ECS) solution, which uses the Storage Performance Development Kit (SPDK) and Data Plane Development Kit (DPDK) frameworks together with NVM Express (NVMe) storage. This solution greatly reduces virtualization overhead. Second, we built the Hybrid Cloud Database Management (HDM) solution, which allows you to easily manage both your on- and off-cloud environments at the same time. Using this solution, we were able to quickly connect all of our internal hybrid cloud systems in support of the Double 11 shopping festival. And third, you can use Alibaba Cloud's Virtual Private Cloud (VPC) to achieve hybrid cloud connectivity. Internally, we used it to connect Alibaba Group's internal network with Alibaba Cloud's public cloud network.
Sometimes, cloud resources may not be enough for us to achieve the ultimate level of elasticity. However, with the help of offline/online hybrid deployment technologies, databases can use the computing resources of offline clusters to maximize elasticity and minimize costs. Containerization and the isolation of storage from computing are fundamental to successful offline/online hybrid deployment. This is because containerization isolates and centrally schedules computing nodes, and isolating storage from computing is an important foundation for the elastic scheduling capability of databases. Major technological advances over the last few years, such as 25G high-speed networking, RDMA technologies, and high-performance distributed storage, have made isolated storage a reality.
The preceding figure shows the isolated storage database architecture used at Alibaba. This architecture contains storage, network, and computing layers. The storage layer uses Alibaba's proprietary distributed storage system, named Apsara Distributed File System (nicknamed Pangu). The computing nodes of the database are deployed on PouchContainer, Alibaba's in-house container solution, and are connected to the storage nodes through a 25G high-speed network.
Next, for the successful isolation of storage in the above solution, we worked hard to optimize the Apsara Distributed File System. Here's what we did:
We also worked hard to optimize the database side. Among these optimizations, the most important one was reducing the traffic volume between computing and storage nodes, which in turn limited the overall impact of network latency on database performance. To do this, we first doubled the database throughput by optimizing the synchronization of redo logs, and then disabled the doublewrite buffer of the databases, because Apsara Distributed File System supports atomic writes. Making these changes helped to increase database throughput by 20% and save a total of 100% of network bandwidth resources.
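To illustrate why disabling the doublewrite buffer reduces traffic between computing and storage nodes, here is a minimal sketch. It assumes the standard InnoDB behavior in which each flushed page is written once to the doublewrite area and once to its final location, and it uses InnoDB's default 16 KB page size. The function name and the page count are hypothetical, and the model is not meant to reproduce the measured figures quoted above.

```python
PAGE_SIZE = 16 * 1024  # InnoDB's default page size in bytes


def flush_traffic_bytes(dirty_pages: int, doublewrite: bool) -> int:
    """Bytes sent to storage when flushing the given number of dirty pages."""
    # With doublewrite, each page is written to the doublewrite area and to the data file.
    copies = 2 if doublewrite else 1
    return dirty_pages * PAGE_SIZE * copies


with_dw = flush_traffic_bytes(1000, doublewrite=True)
without_dw = flush_traffic_bytes(1000, doublewrite=False)
print(with_dw, without_dw)
```

In this simplified model, page-flush traffic halves once the storage layer guarantees atomic page writes and the extra copy is no longer needed.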
Containerization and isolated storage made the databases stateless and gave them superior scheduling capabilities. As a result, during the peak hours of Double 11, we could mount shared storage to different offline computing clusters and scale the databases quickly.
When it came to developing better databases at Alibaba, we moved from originally using Oracle databases to developing our own in-house solutions. In particular, we developed AliSQL, which is based on MySQL, and the distributed middleware called Taobao Distributed Data Layer (TDDL). Since 2016, we have been working hard on developing another generation of database technology, which we call X-DB, with X representing our pursuit of the ultimate level of performance and capabilities.
Alibaba's business scenarios are highly demanding on database performance:
These requirements define several of the important features that we require in the new generation of databases: strong consistency, global deployment capabilities, a distributed structure, high performance, high availability, and the automatic management of the data lifecycle.
Below is the architecture of X-DB:
The X-DB architecture, as shown in the figure above, introduces the Paxos distributed consensus protocol stack and can be deployed across remote locations. Despite the increased network latency, X-DB still maintains high performance and high throughput: in three-node mode within the same region, its throughput is on par with that of a standalone host. With this architecture, X-DB is also highly tolerant of network jitter.
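The idea behind consensus-based replication can be sketched as follows: a write is considered committed once a strict majority of nodes has acknowledged it, so one slow or unreachable node out of three does not block the commit. This is a deliberately simplified illustration, not X-DB's implementation and not full Paxos (there are no ballots, leader election, or recovery here). The `Replica` class and `replicate` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    name: str
    reachable: bool = True
    log: List[str] = field(default_factory=list)

    def append(self, entry: str) -> bool:
        """Persist the entry and acknowledge, unless the node is unreachable."""
        if not self.reachable:
            return False
        self.log.append(entry)
        return True


def replicate(entry: str, replicas: List[Replica]) -> bool:
    """Return True if a strict majority of replicas acknowledged the entry."""
    acks = sum(1 for r in replicas if r.append(entry))
    return acks >= len(replicas) // 2 + 1


nodes = [Replica("a"), Replica("b"), Replica("c", reachable=False)]
print(replicate("txn-1", nodes))  # True: two of three nodes acknowledged
```

Because only a majority is required, a commit waits for the faster acknowledgements rather than the slowest node, which is one reason such designs tolerate jitter on individual links.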
Now, let's consider some of the core technologies of X-DB:
The above comparison pits our in-house X-DB solution against Oracle's official MySQL Group Replication. This is a standard sysbench test with all three nodes deployed in the same data center. In the Insert scenario, X-DB records a QPS value 2.4 times that of MySQL Group Replication, with a lower response time.
Next, this is a standard sysbench test under the remote deployment mode. In the Insert scenario, X-DB (50,366 QPS) shows a commanding advantage by handling nearly 6 times the QPS of MySQL Group Replication (8,481 QPS) with a much shorter response time. X-DB's response time is 58 ms, about 39% of the 150 ms response time of MySQL Group Replication.
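The ratios in the remote deployment test can be checked directly from the reported figures. The variable names below are only for illustration.

```python
# Figures reported for the Insert scenario in remote deployment mode
xdb_qps, mgr_qps = 50_366, 8_481
xdb_rt_ms, mgr_rt_ms = 58, 150

qps_ratio = xdb_qps / mgr_qps
rt_fraction = xdb_rt_ms / mgr_rt_ms

print(f"QPS ratio: {qps_ratio:.2f}x")                               # QPS ratio: 5.94x
print(f"Response time: {rt_fraction:.1%} of MySQL Group Replication")  # 38.7%
```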
By replacing the traditional master/slave mode with a same-region, cross-zone, three-node deployment mode, we can guarantee high data quality and high availability across zones. Features of this solution include strong cross-zone data consistency, zero data loss when a single zone becomes unavailable, failover within seconds when a single zone becomes unavailable, and self-contained automatic failover and failback without the need for third-party components. Plus, all of this can be achieved at no extra cost compared with the traditional master/slave mode.
In cross-region deployment mode, we can achieve active geo-redundancy by using more fundamental database technologies and reduce the deployment from three regions with six replicas (in the master/slave mode) to three regions with five replicas (in the three-region, five-node, four-data-center mode). From the business perspective, these features have several clear advantages: strong cross-region data consistency, zero data loss when a single region becomes unavailable, high performance in cross-region strong synchronization scenarios, and optional prioritized failover to another replica in the same region.
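Whether a replica placement can survive the loss of an entire region comes down to whether the remaining nodes still form a majority. The sketch below checks this for an assumed 2 + 2 + 1 placement of five nodes across three regions. The article does not state the exact placement, so the layout and the function name are assumptions for illustration.

```python
from typing import Dict


def survives_region_loss(nodes_per_region: Dict[str, int]) -> bool:
    """True if losing any single region still leaves a strict majority of nodes."""
    total = sum(nodes_per_region.values())
    majority = total // 2 + 1
    return all(total - lost >= majority for lost in nodes_per_region.values())


# Assumed placement: five nodes over three regions
five_node = {"region-1": 2, "region-2": 2, "region-3": 1}
print(survives_region_loss(five_node))  # True

# Counter-example: three of five nodes in one region
skewed = {"region-1": 3, "region-2": 1, "region-3": 1}
print(survives_region_loss(skewed))  # False
```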
X-KV is our enhanced version of the official MySQL Memcached plugin. After much optimization work this year, X-KV now supports more data types as well as non-unique indexes, combined indexes, the MultiGet function, and online schema change. The biggest difference is that it supports SQL conversion through TDDL. The advantages of X-KV are its superior read performance, strong consistency, and low response time. It can also reduce overall cost, particularly maintenance costs, because X-KV's support of SQL allows applications to be migrated transparently.
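As a rough illustration of what SQL conversion means here, the sketch below recognizes a simple primary-key point lookup and turns it into a key-value `get`, leaving everything else on the normal SQL path. This is not TDDL's actual conversion logic. The function name, the `table:value` key format, and the `primary_keys` mapping are all hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Matches only the simplest form: SELECT * FROM <table> WHERE <column> = <value>
_POINT_LOOKUP = re.compile(
    r"^\s*SELECT\s+\*\s+FROM\s+(\w+)\s+WHERE\s+(\w+)\s*=\s*'?(\w+)'?\s*;?\s*$",
    re.IGNORECASE,
)


def sql_to_kv_get(sql: str, primary_keys: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return ("get", key) for a primary-key point lookup, else None."""
    match = _POINT_LOOKUP.match(sql)
    if not match:
        return None  # not a simple point lookup; keep using SQL
    table, column, value = match.groups()
    if primary_keys.get(table) != column:
        return None  # the filter is not on the primary key
    return ("get", f"{table}:{value}")  # hypothetical key format


pks = {"orders": "order_id"}
print(sql_to_kv_get("SELECT * FROM orders WHERE order_id = 42", pks))
print(sql_to_kv_get("SELECT * FROM orders WHERE buyer = 'bob'", pks))
```

The application keeps issuing SQL in both cases, which is what makes the migration transparent: only the middleware decides which path a statement takes.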
TDDL for X-KV also provides the following features:
The turnover of the Double 11 Shopping Festival keeps growing. This in turn has kept the synchronization latency between the online buyer database and the seller database high in the last couple of years. As a result, sellers weren't able to process Double 11 orders promptly, and the seller database suffered from poor performance due to many complex searches. In the past, we tried setting up independent queues, combining synchronization channels, and optimizing the seller databases for large sellers, but none of these completely solved the underlying issue.
The answer to this was ESDB, a proprietary distributed document database based on Alibaba Cloud Elasticsearch (ES) that supports SQL APIs, allowing applications to seamlessly migrate from SQL to ESDB. With this solution, we provided dynamic secondary hashing for larger sellers, which in turn eliminated the data synchronization bottleneck. At the same time, ESDB also supports complex searches.
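One way to picture secondary hashing for large sellers is the routing sketch below: most sellers map to a single shard by seller ID, while a large seller's orders are spread over several shards by a second hash on the order ID. This is only an illustration of the general technique, not ESDB's implementation. The shard count, the `fanout` table, and the function names are hypothetical.

```python
import zlib
from typing import Dict

NUM_SHARDS = 64  # hypothetical shard count


def _hash(value: str) -> int:
    """Stable hash, so routing is the same across processes."""
    return zlib.crc32(value.encode("utf-8"))


def route(seller_id: str, order_id: str, fanout: Dict[str, int]) -> int:
    """Pick a shard; sellers with fanout > 1 are spread by a secondary hash."""
    base = _hash(seller_id) % NUM_SHARDS
    spread = fanout.get(seller_id, 1)   # 1 means all data stays on one shard
    offset = _hash(order_id) % spread   # secondary hash on the order ID
    return (base + offset) % NUM_SHARDS


fanout = {"big-seller": 8}  # can be adjusted as a seller grows
small = {route("small-seller", f"order-{i}", fanout) for i in range(1000)}
large = {route("big-seller", f"order-{i}", fanout) for i in range(1000)}
print(len(small), len(large))
```

Keeping the fan-out in a lookup table is what makes the scheme dynamic in this sketch: raising a seller's entry spreads new writes across more shards without touching other sellers.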
Here are the top four technical challenges for the database monitoring system:
The architecture of the database monitoring system has gone through three generations. The first generation consisted of an agent and a MySQL database. The second generation consisted of an agent, DataHub, and a distributed NoSQL database. The third generation consists of an agent, the real-time computing engine, and HiTSDB. Let's discuss the third generation below.
HiTSDB, Alibaba's own in-house time series database, is perfect for storing massive amounts of monitoring data, solving the issues of the previous generations. The real-time computing engine pre-processes performance data every second and stores the data in HiTSDB. This third generation of architecture is highly optimized, making sure that these monitoring capabilities do not degrade performance, even during extreme high-traffic scenarios. Moreover, having the monitoring system always on has helped Alibaba gain greater insights about our databases and locate issues quickly.
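The per-second pre-processing step can be sketched as a simple aggregation: raw samples are grouped by second, instance, and metric, and reduced to a few summary values before being written to the time series store. The sample layout, the chosen statistics, and the function name are assumptions for illustration, and the write to HiTSDB is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[float, str, str, float]  # (unix timestamp, instance, metric, value)
Key = Tuple[int, str, str]              # (second, instance, metric)


def aggregate_per_second(samples: Iterable[Sample]) -> Dict[Key, Dict[str, float]]:
    """Group raw samples into one-second buckets and summarize each bucket."""
    buckets: Dict[Key, List[float]] = defaultdict(list)
    for ts, instance, metric, value in samples:
        buckets[(int(ts), instance, metric)].append(value)
    return {
        key: {"avg": sum(vals) / len(vals), "max": max(vals), "count": len(vals)}
        for key, vals in buckets.items()
    }


raw = [
    (100.1, "db1", "qps", 10.0),
    (100.9, "db1", "qps", 30.0),
    (101.2, "db1", "qps", 5.0),
]
for key, stats in sorted(aggregate_per_second(raw).items()):
    print(key, stats)
```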
Alibaba possesses the most experienced database administrators (DBAs) in the industry and has accumulated a massive bank of performance diagnosis data. In the future, we aim to combine the experience of our DBAs with our knowledge of big data and machine intelligence, so that further database analysis and optimization can be done by machines rather than DBAs three years from now. We see self-diagnosis, self-optimization, and self-O&M as the future trends in database technology.
This year's Double 11 also saw Alibaba's pilot run of CloudDBA. By analyzing monitoring data and all SQL statements, we achieved SQL self-optimization (in particular the optimization of slow SQL calls), space optimization through the analysis of useless tables and useless indexes, access model optimization with SQL and KV, and the prediction of storage space growth.
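As one example of what predicting storage space growth could look like, the sketch below fits a least-squares straight line to daily usage figures and extrapolates it forward. The article does not describe the model CloudDBA actually uses, so the linear fit, the function name, and the sample numbers are assumptions.

```python
from typing import Sequence


def predict_storage_gb(daily_usage_gb: Sequence[float], days_ahead: int) -> float:
    """Extrapolate storage usage with a least-squares linear fit over past days."""
    n = len(daily_usage_gb)
    if n < 2:
        raise ValueError("need at least two daily samples")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_usage_gb) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_usage_gb))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    # The last observed day has index n - 1; project days_ahead beyond it.
    return intercept + slope * (n - 1 + days_ahead)


history = [100.0, 102.0, 104.0, 106.0]  # hypothetical usage, growing 2 GB per day
print(predict_storage_gb(history, days_ahead=3))
```

A straight line is the simplest possible choice; workloads with seasonal peaks such as Double 11 would likely need a model that accounts for them.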
Our vision of next year's Double 11 can be defined in three keywords: Higher, Faster, Smarter.