This topic describes the trends of the database industry and the evolution of PolarDB-X.

Industry trends

Database systems play a crucial role

Databases, operating systems (OSs), and middleware are the three pillars of system software. All three are indispensable to enterprise IT systems: enterprises use them to build Internet-based, application-level information management systems and the core platforms that store and manage data. A database system is the hub through which all application software processes and exchanges data, and it plays a core role in storing, querying, analyzing, and processing data. The efficiency and stability of a database system, together with the programming languages that it supports, determine the performance of the upper-layer applications that use it and the productivity of developers. According to 2017 statistics from Gartner, a global IT research and consulting company, the worldwide revenue of basic enterprise software totaled USD 195.852 billion, of which databases accounted for USD 38.8 billion. Databases therefore contributed about 20 percent of the total (38.8 / 195.852 ≈ 19.8 percent) and are a major revenue source of fundamental software.

Distributed databases are a trend

In recent years, the rapid development of the Internet and big data technologies has fueled explosive growth of e-commerce in China. Transaction volumes during the Double 11 shopping festival have grown rapidly year after year, and Chinese e-commerce has won international recognition. This growth puts heavy pressure on backend business support systems. For example, transaction surges during Double 11 strain not only the e-commerce websites that customers access but also the supporting systems behind them, such as the logistics systems of logistics companies, the payment systems of banks, and the warehouse systems of retailers.

As inclusive finance and digital finance expand, data management and processing capabilities have become important for financial institutions that want to improve business performance. At the same time, the flourishing mobile Internet and digital payments require financial systems to handle new data models in specific scenarios, such as transactions between core accounts, online and mobile payment business, and real-time transaction monitoring and metric analysis.

Traditional bank payment systems adopt an architecture that combines mainframes or minicomputers from IBM, databases from Oracle, and storage systems from EMC. This architecture is known as the IOE architecture. Systems based on the IOE architecture are costly and can leave users over-reliant on the products and services of IBM, Oracle, and EMC. They are also built in a centralized manner, whereas the rise of the mobile Internet and digital payments has brought exponential business growth to the core payment systems of banks. As Moore's law slows down, the performance of a single server becomes a bottleneck. To resolve this bottleneck, engineers began to explore how to transform traditional centralized systems into distributed systems. Instead of scaling up a single high-performance database, they scale out database instances across mid-range and low-end servers to improve performance at a lower cost. As a result, large-scale distributed transactional databases are becoming the trend for storing and managing data in distributed systems.

Evolution of PolarDB-X

Overview

PolarDB-X is a distributed database service developed by Alibaba Cloud. It integrates the SQL engine of Distributed Relational Database Service (DRDS) with the storage technologies of X-DB, a distributed database service that was also developed in-house. Based on this integrated cloud-native architecture, PolarDB-X supports up to tens of millions of concurrent connections and hundreds of petabytes of data storage. PolarDB-X aims to provide solutions for mass storage, ultra-high concurrent throughput, performance bottlenecks on large tables, and efficiency in complex computing. PolarDB-X has been applied and tested in Double 11 shopping festivals and in the business of Alibaba Cloud customers from various industries, and has proven able to support the digital transformation of enterprises.

PolarDB-X uses standard relational database technologies to provide its core features, together with comprehensive management, O&M, and productization capabilities. As a result, PolarDB-X provides stable, reliable, scalable, and maintainable database services that can be operated in much the same way as traditional standalone MySQL databases.

PolarDB-X has been used in Alibaba Cloud and Apsara Stack for years. It has been tested by the core transaction business during each Double 11 shopping festival and by the business of Alibaba Cloud customers from various industries. PolarDB-X supports the core online business of a large number of users in industries such as the Internet, finance and payment, education, communications, and public utilities. PolarDB-X has become the standard distributed database service for all core online business of Alibaba Group and for the business of Alibaba Cloud customers.

History

Taobao was registered in 2003 and adopted the classic Linux, Apache, MySQL, PHP (LAMP) stack. As the number of users grew rapidly, standalone MySQL databases could no longer meet the storage requirements of the business. Taobao therefore upgraded its technical architecture and replaced the MySQL databases with Oracle databases. As the number of users continued to grow quickly, even a growing number of Oracle databases could not meet the scalability requirements of the business. To resolve this issue, Alibaba Group launched the de-IOE campaign in 2009. This started the evolution of PolarDB-X.

TDDL

A key step of the de-IOE campaign was to find alternatives to Oracle databases. During this period, the huge volume of business on Taobao challenged the existing commercial databases. To prevent performance bottlenecks that might occur as the business continued to grow, Alibaba Group decided to develop its own technologies and manage its databases independently. As the x86 architecture matured, the gap in stability between x86 servers and minicomputers narrowed. Meanwhile, MySQL adopted a lightweight thread model and supported high concurrency, and the MySQL ecosystem gradually improved. As a result, the new solution adopted sharding and a distributed architecture that combined Taobao Distributed Data Layer (TDDL) with AliSQL. TDDL and AliSQL were developed based on the open source MySQL engine. TDDL aimed to resolve scalability issues and was used as a system architecture. However, TDDL could not be delivered as a service.

DRDS

In 2014, after the TDDL architecture had matured, Alibaba Cloud started its journey to cloud databases. Alibaba Cloud developed database sharding and table sharding technologies and launched a distributed cloud database service based on DRDS and ApsaraDB RDS for MySQL. DRDS featured a shared-nothing architecture, focused on providing a solution for storage expansion, and was delivered to users as a productized database service.
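The idea behind database sharding and table sharding can be sketched in a few lines. The following Python example is only an illustration under stated assumptions, not the routing algorithm that DRDS actually uses: the shard counts, the naming scheme, and the `route` function are all hypothetical, and CRC32 stands in for whatever hash function a real system applies.

```python
import zlib
from typing import Tuple

DB_COUNT = 4        # hypothetical number of physical databases
TABLES_PER_DB = 8   # hypothetical number of physical tables in each database


def route(shard_key: str, logical_table: str = "orders") -> Tuple[str, str]:
    """Map a shard key to a (physical database, physical table) pair."""
    # crc32 is deterministic across processes, unlike Python's built-in hash().
    slot = zlib.crc32(shard_key.encode("utf-8")) % (DB_COUNT * TABLES_PER_DB)
    # The quotient selects the database, the remainder selects the table in it.
    db_index, table_index = divmod(slot, TABLES_PER_DB)
    return (
        f"{logical_table}_db_{db_index:02d}",
        f"{logical_table}_{table_index:02d}",
    )


if __name__ == "__main__":
    for key in ("user-1", "user-2", "user-3"):
        print(key, "->", route(key))
```

Because each physical database holds only its own slice of the data and shares nothing with the others, capacity can be added by adding databases. The trade-off is that queries that do not carry the shard key must visit every shard.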

PolarDB-X 1.0

Alibaba Cloud continuously iterated on its database services to resolve the pain points of database sharding and table sharding. DRDS gained kernel features such as distributed transactions, global secondary indexes, and asynchronous DDL operations. Alibaba Cloud also continuously improved compatibility with the SQL syntax, so that DRDS supported complex optimizations such as subquery unnesting and JOIN pushdown. DRDS also provided O&M capabilities such as smooth scale-out, consistent backup and restoration, SQL flashback, and SQL audit. In this phase, Alibaba Cloud kept extending the boundaries of the sharding technology and explored its maximum capability. This process drove DRDS to provide an increasingly stable and standardized database service that could be applied to multiple scenarios, and transformed DRDS from middleware into a distributed database service. The release of the new PolarDB-X product line in 2019 marked a milestone.
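To show why a global secondary index helps in a sharded database, the following Python sketch models a toy `orders` table that is sharded by `buyer_id` and an index table that is sharded by `seller_id`. Every name here (`ShardedOrders`, `shard_of`, the column names, the shard count) is hypothetical, and the sketch does not reflect how DRDS implements indexes internally.

```python
from collections import defaultdict

SHARD_COUNT = 4  # hypothetical number of shards


def shard_of(value: int) -> int:
    """Pick a shard by simple modulo hashing (an assumption for this sketch)."""
    return value % SHARD_COUNT


class ShardedOrders:
    def __init__(self):
        # Primary table, sharded by buyer_id: one dict per shard, order_id -> row.
        self.primary = [dict() for _ in range(SHARD_COUNT)]
        # Index table, sharded by seller_id: seller_id -> [(buyer_id, order_id)].
        self.index = [defaultdict(list) for _ in range(SHARD_COUNT)]

    def insert(self, order_id: int, buyer_id: int, seller_id: int) -> None:
        row = {"order_id": order_id, "buyer_id": buyer_id, "seller_id": seller_id}
        # The two writes may land on different shards. A real system would
        # need to commit them atomically, for example in one distributed
        # transaction; this toy version simply performs both.
        self.primary[shard_of(buyer_id)][order_id] = row
        self.index[shard_of(seller_id)][seller_id].append((buyer_id, order_id))

    def find_by_seller_scan(self, seller_id: int):
        """Without the index: every primary shard must be scanned."""
        rows = [
            row
            for shard in self.primary
            for row in shard.values()
            if row["seller_id"] == seller_id
        ]
        return rows, SHARD_COUNT

    def find_by_seller_indexed(self, seller_id: int):
        """With the index: one index shard, then only the shards that hold rows."""
        index_shard = shard_of(seller_id)
        visited = {("index", index_shard)}
        rows = []
        for buyer_id, order_id in self.index[index_shard].get(seller_id, []):
            primary_shard = shard_of(buyer_id)
            visited.add(("primary", primary_shard))
            rows.append(self.primary[primary_shard][order_id])
        return rows, len(visited)
```

Both lookups return the same rows, but the indexed lookup visits only the shards that actually hold matching data. The cost is an extra cross-shard write on every insert, which is one reason that global secondary indexes and distributed transactions tend to appear together.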

PolarDB-X 2.0

In 2018, performance bottlenecks emerged on the instances at the computing layer. For example, database instances could not provide the REPEATABLE READ isolation level for transactions. The capability of computation pushdown was limited by the compatibility with the SQL syntax. Data queries and data transmission were inefficient. Linearizable consistency between data replicas could not be ensured. Although these issues seemed intractable at the time, they indicated that the computing layer must be deeply integrated with the storage layer.

AliSQL is an independent branch of MySQL and has continued to receive updates since its release. X-DB is built on top of AliSQL and uses the X-Paxos protocol library and the X-Engine storage engine. X-DB has been tested by the business of Alibaba Group for years. It adopts a three-replica storage mechanism and delivers excellent performance at a low cost.
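The value of keeping three replicas under a Paxos-family protocol comes down to majority arithmetic. The following Python sketch shows only that general rule. The function names are hypothetical, and the sketch says nothing about the internals of X-Paxos.

```python
def majority(replica_count: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replica_count // 2 + 1


def is_committed(ack_count: int, replica_count: int = 3) -> bool:
    """A write is treated as durable once a majority has acknowledged it."""
    return ack_count >= majority(replica_count)


def tolerated_failures(replica_count: int = 3) -> int:
    """Replicas that can fail while a majority is still reachable."""
    return replica_count - majority(replica_count)
```

With three replicas, two acknowledgements are enough to commit a write, and the group keeps working when any single replica is lost.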

PolarDB-X is developed based on the cloud-native architecture of PolarDB. PolarDB-X uses remote direct memory access (RDMA) to optimize the architecture that decouples storage from computing. A PolarDB-X cluster consists of one primary node and one or more read-only nodes. PolarDB-X provides a resource pool to reduce costs. PolarDB-X can also optimize SQL queries and provides data backup and restoration. You can restore data and scale automatically within seconds. This makes PolarDB-X one of the fastest-growing Alibaba Cloud database services.

Years of accumulated technical exploration and experience have driven Alibaba Cloud to think about how distributed cloud databases should be developed.

Users expect cloud databases to prevent data loss even if a server breaks down. This requires databases that ensure strong consistency and support disaster recovery to provide high availability. As the mobile Internet and IoT become more widely used, an explosive amount of data is generated. Since the outbreak of the COVID-19 pandemic, more enterprises have become concerned about the costs of their IT systems. Enterprise users therefore expect high-performance, low-cost databases that can scale both computing power and storage capacity. In the database market, users also expect to be charged based on the queries that they run.

Therefore, next-generation distributed databases must provide financial-grade high availability and disaster recovery. These databases must also support horizontal scaling, low-cost storage, on-demand scaling, transparent distribution, hybrid transactional and analytical processing (HTAP), and integration with new hardware.

In 2021, Alibaba Cloud released PolarDB-X 2.0, a next-generation cloud-native distributed database service that is based on the SQL engine of DRDS, the storage technologies of X-DB, and the cloud-native features of PolarDB. PolarDB-X 2.0 focuses on the issues that standalone databases cannot resolve. It ensures data consistency in distributed systems and allows you to smoothly migrate data from standalone databases to distributed databases. It provides low-cost storage and auto scaling based on cloud-native technologies. PolarDB-X 2.0 is also available in several delivery modes: you can deploy database instances on Alibaba Cloud or Apsara Stack, or use the lightweight software edition of PolarDB-X 2.0.