This article is a transcript of a speech by Li Feifei, VP of Alibaba Group and Head of Alibaba Cloud Database Products BU, on cloud-native databases delivered at the Alibaba Cloud Developer Conference.
I think cloud-native will be the standard for cloud usage in the future. Cloud computing resources are ubiquitous and inexhaustible, and you don't have to worry about where and how many cloud resources are available. The ultimate goal of cloud native is to have resources flow like tap water: we use it every day without needing to know its source.
Through years of practice, experimentation, and exploration, Alibaba Cloud has grown alongside developers in the cloud-native database field. We believe cloud-native databases should focus on five areas:
1) Cloud-Native Distributed Database
Deeply integrate cloud-native and distributed technologies, combining the Share Nothing, Share Storage, and Share Everything architectures.
2) Intelligent, Self-Driving Database
With AI and machine learning technologies, the database system can be self-driven, making database services easier for developers to manage and use through capabilities such as automatic parameter tuning, index recommendation, and anomaly detection.
3) Security and Credibility
Security and credibility are very important. For example, how can you ensure that data is encrypted and secured end to end, and that security and credibility are maintained throughout storage, transmission, and computing?
4) Online-Offline Data Processing Unification
Can we shorten the data pipeline by unifying online and offline processing, from online processing and online analysis to offline storage? This would allow developers to access and process data with ease.
5) IoT Multi-Mode
With the rapid development of AIoT, IoT, and IoV, can we build a multi-mode IoT database for developers and applications?
These are the five directions we think are important going forward. Thanks to our efforts in these directions, Alibaba Cloud achieved a breakthrough last year when Gartner recognized it as a global database leader.
We believe that for any developer, the most important thing in the data layer is the data management lifecycle. The following explains the end-to-end data lifecycle from a developer's perspective.
The first step is data production and integration, which covers efficient data integration, data cleaning, data transmission, and data backup. The next step is real-time data processing, which involves traditional relational databases and online transaction processing (OLTP). After that come data analysis and discovery, data desensitization, and data lineage.
The preceding content covers the entire lifecycle of data management. We use it to build different solutions and work with developers and partners to build the final killer app for applications, industries, and customers.
The following describes which tools we provide to developers in various stages of the lifecycle and what they can do with these tools.
Data production and integration form the first stage of the data lifecycle. Data must be collected, stored, and processed before it can be put to use.
As shown in the preceding figure, Alibaba Cloud provides Data Transmission Service (DTS) for real-time incremental or full synchronization of over 17 different data sources, which makes it very simple to implement real-time data synchronization for applications from multiple heterogeneous data sources to multiple heterogeneous targets.
Database Backup (DBS) unifies data backups across clouds and between on-cloud and off-cloud environments, allowing data to flow seamlessly across multiple clouds and endpoints.
Database Management Service (DMS) helps users with task orchestration, data analysis, and lineage analysis.
Together, these services constitute Alibaba Cloud's basic capabilities in data production and integration.
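As a rough illustration of the full and incremental synchronization mentioned for DTS, the following Python sketch copies a snapshot of a source table and then replays ordered change events against the target. The function names and the (operation, key, value) event format are assumptions made for illustration only; they are not the DTS API.

```python
def full_sync(source_rows):
    """Full synchronization: copy a snapshot of every source row to a new target."""
    return dict(source_rows)


def incremental_sync(target, change_events):
    """Incremental synchronization: replay ordered change events on the target.

    Each event is a hypothetical (operation, key, value) tuple.
    """
    for operation, key, value in change_events:
        if operation == "delete":
            target.pop(key, None)  # deleting a missing key is a no-op
        elif operation in ("insert", "update"):
            target[key] = value
        else:
            raise ValueError(f"unknown operation: {operation}")
    return target


source = {1: "alice", 2: "bob"}
target = full_sync(source)

# Changes captured on the source after the snapshot was taken.
changes = [("update", 2, "bobby"), ("insert", 3, "carol"), ("delete", 1, None)]
incremental_sync(target, changes)
print(target)  # {2: 'bobby', 3: 'carol'}
```

The order of the change events matters: replaying them in the order they occurred on the source is what keeps the target consistent with it.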
The next step is real-time data processing.
As developers, our priority is to ensure that applications in online transaction scenarios are always online and data is never lost. Here, we provide various options.
1) ApsaraDB for RDS Provides Enterprise-Level Database Autonomy.
First, Alibaba Cloud provides ApsaraDB for RDS.
Each cloud vendor has RDS, but what is the difference between Alibaba Cloud RDS and others?
With the evolution to cloud-native database 2.0, Alibaba Cloud RDS provides an enterprise-level autonomous database service.
First, we built a cloud-native management platform based on Kubernetes and implemented microservice and containerized deployment for all management and control capabilities to shield the underlying resources of multiple heterogeneous services. By doing so, we can provide a cloud-native development and deployment environment for developers.
On top of this platform, we use AI and machine learning technologies to build a self-driving database platform. It provides many capabilities for developers, such as automatic stress testing: we can automatically generate stress-testing data whose workload is almost the same as in the real environment, so developers can test their online systems better. In addition, we provide a series of autonomous service capabilities, such as index recommendation and parameter optimization.
Database Autonomy Service (DAS) can help discover and solve many problems faced by developers, such as slow online applications and fully occupied thread pools.
2) Cloud-Native Relational Database PolarDB
In addition to Alibaba Cloud RDS, PolarDB is one of the core capabilities of cloud-native database 2.0.
We ensure PolarDB is 100% compatible with MySQL and PostgreSQL and highly compatible with Oracle syntax to help developers develop better applications on PolarDB. This allows developers to migrate to the cloud easily.
Many enterprises and developers need to deploy their applications globally. For example, the online education and gaming industries need their applications to serve users from nearby locations. Alibaba Cloud has launched a global deployment capability called Global Database. This means PolarDB can be deployed across availability zones (AZs) with zero RPO and extremely low RTO. Because data is synchronized across AZs in real time, users can access developers' applications from a nearby location.
To give developers a better experience with Alibaba Cloud products, we have launched a more cost-effective PolarDB instance with free I/O bandwidth, priced at only 30-40% of the cloud-native databases offered by other cloud vendors.
We also carried out performance tests.
We used SysBench to run transaction-processing and read-write tests under CPU-intensive and I/O-intensive workloads.
The preceding figure shows the performance comparison between PolarDB and other cloud-native databases under CPU-intensive and I/O-intensive workloads. We can see that PolarDB exhibits excellent performance under both workloads.
3) Cloud-Native Distributed Database PolarDB-X
Developers often deal with scenarios involving massive data and high or ultra-high concurrency. To this end, Alibaba Cloud launched PolarDB-X, the distributed edition of PolarDB, which separates storage and computing in a cloud-native architecture to support a unified distributed database.
PolarDB-X supports high concurrency, global secondary indexing, HTAP complex queries, distributed transactions, and online elastic scaling.
Let's take the global secondary indexing shown in the preceding figure as an example. It supports ACID and allows developers to focus on business application development instead of database or table sharding.
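To make the idea of a global secondary index more concrete, here is a minimal Python sketch. It assumes a toy table that is hash-partitioned by an integer primary key, plus one index that spans all shards. The class and method names are hypothetical, and this is not how PolarDB-X implements the feature; a real system would update the row and its index entry inside a distributed transaction.

```python
from collections import defaultdict


class ShardedTable:
    """Toy table hash-partitioned by primary key, with one global secondary index."""

    def __init__(self, num_shards, indexed_column):
        self.shards = [{} for _ in range(num_shards)]
        self.indexed_column = indexed_column
        # The index is global: it maps an indexed value to primary keys on any shard.
        self.global_index = defaultdict(set)

    def _shard_id(self, pk):
        return pk % len(self.shards)

    def upsert(self, pk, row):
        shard = self.shards[self._shard_id(pk)]
        old = shard.get(pk)
        # Assumption: row and index change together. Here that is trivially
        # true because everything lives in one process.
        if old is not None:
            self.global_index[old[self.indexed_column]].discard(pk)
        shard[pk] = row
        self.global_index[row[self.indexed_column]].add(pk)

    def find_by_index(self, value):
        """Look up matching primary keys first, then read only the owning shards."""
        pks = sorted(self.global_index.get(value, ()))
        return [self.shards[self._shard_id(pk)][pk] for pk in pks]

    def find_by_scan(self, value):
        """Without the index, every shard has to be scanned."""
        hits = []
        for shard in self.shards:
            hits.extend((pk, row) for pk, row in shard.items()
                        if row[self.indexed_column] == value)
        return [row for _, row in sorted(hits, key=lambda hit: hit[0])]


orders = ShardedTable(num_shards=4, indexed_column="buyer")
orders.upsert(1, {"buyer": "alice", "amount": 30})
orders.upsert(2, {"buyer": "bob", "amount": 12})
orders.upsert(7, {"buyer": "alice", "amount": 5})
print(orders.find_by_index("alice"))
```

Both lookup paths return the same rows, but the indexed path does not need to know how the table is sharded, which is the point made in the talk: the application queries by a non-partition column without handling sharding itself.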
We use X-Paxos to maintain two data copies and one log copy. In addition, the three copies can be deployed across availability zones, with zero RPO across data centers in the same city.
After data has been processed in real time, how can we find information in the large amounts of accumulated transaction data? This brings us to data analysis and discovery.
AnalyticDB (ADB) is built on a cloud-native architecture featuring compute-storage separation and on-demand elastic scaling of computing resources. Compared with traditional data warehouses, ADB costs about one-third as much.
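The link between three copies and zero RPO can be sketched with a simple majority-acknowledgment rule. The Python below is a simplification under stated assumptions, not X-Paxos itself: it has no leader election, terms, or log repair, and the Replica class and replicate function are hypothetical names. Two replicas hold data and log, and the third holds only the log.

```python
class Replica:
    def __init__(self, name, stores_data, online=True):
        self.name = name
        self.stores_data = stores_data  # False for the log-only copy
        self.online = online
        self.log = []
        self.data = {}

    def persist(self, entry):
        """Append the entry to the local log; an offline replica cannot acknowledge."""
        if not self.online:
            return False
        self.log.append(entry)
        return True

    def apply(self, entry):
        """Apply a committed entry; the log-only copy keeps no table data."""
        if self.stores_data:
            key, value = entry
            self.data[key] = value


def replicate(replicas, entry):
    """Commit the entry only if a majority of replicas persisted it."""
    acked = [r for r in replicas if r.persist(entry)]
    committed = len(acked) > len(replicas) // 2
    if committed:
        for replica in acked:
            replica.apply(entry)
    return committed


replicas = [
    Replica("az-a", stores_data=True),
    Replica("az-b", stores_data=True),
    Replica("az-c", stores_data=False),  # log-only copy
]
print(replicate(replicas, ("k1", "v1")))  # True: 3 of 3 acknowledged
replicas[1].online = False
print(replicate(replicas, ("k2", "v2")))  # True: 2 of 3 is still a majority
replicas[2].online = False
print(replicate(replicas, ("k3", "v3")))  # False: 1 of 3 is not a majority
```

Because a write is reported as committed only after a majority has it in its log, losing any single availability zone cannot lose an acknowledged write. The log-only copy counts toward that majority without the storage cost of a third full data copy. The sketch does not clean up the uncommitted entry left in the surviving replica's log, which a real protocol would have to resolve.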
We implement hot and cold data separation on this cloud-native architecture, with prices as low as 114 yuan per month for 1 TB. One copy of the data served by multiple computing engines will be a major trend in data analysis, because it lets us adapt to different workloads and reduces the overall cost of offline ETL and online interactive analysis.
These technologies help developers build applications with online-offline unification, supporting both offline ETL and online interactive analysis. Essentially, it is a combination of the MPP architecture and the BSP model.
We are also highly compatible with the ecosystem. We will soon release Spark-compatible editions that bring these open-source ecosystems into ADB, enabling load-based intelligent scheduling and hybrid application support.
Finally, let's discuss data development and management.
We provide an all-in-one online data platform for database developers. The developer community of Alibaba Cloud enables hundreds of thousands of database developers to use DMS to access and manage heterogeneous database resources.
DMS supports all the familiar databases, including PolarDB, PolarDB-X, RDS, AnalyticDB, MySQL, Oracle, and SQL Server.
DMS provides all-in-one capabilities covering data assets, database design, database development, data integration, and data services. This helps developers implement data-based O&M, disaster recovery/multi-active, T+1/real-time/archiving, centralized data processing, BI reports, multi-dimensional analysis, and more.
As the global leader in cloud-native databases, Alibaba Cloud will be the first cloud vendor to open-source its core cloud-native database technologies, and we invite developers to work with us to build cloud-native database 2.0.
We have open-sourced the Paxos-based high-availability cluster edition of PolarDB for PostgreSQL (PG). It is currently available on GitHub.
In September, we will release a highly scalable distributed version based on hybrid logical clocks (HLC). In 2022, we will release a sharding and plug-in version based on the Share Nothing architecture. RDS AliSQL has long been open source in the MySQL ecosystem. Now, after a major upgrade, RDS will release RDS GalaxySQL, the Paxos high-availability edition, and the cloud-native distributed edition.
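For readers unfamiliar with HLC hybrid clocks, the following Python sketch follows the hybrid logical clock algorithm as it is commonly described in the literature. It is not taken from PolarDB, and the class name and the injectable physical_clock parameter are assumptions made for illustration. Each timestamp is a pair (l, c), where l tracks the largest physical time seen so far and c is a counter that orders events sharing the same l.

```python
import time


class HybridLogicalClock:
    """Timestamps are (l, c): l tracks the largest physical time seen, c breaks ties."""

    def __init__(self, physical_clock=None):
        # physical_clock returns integer milliseconds; injectable for testing.
        self.physical_clock = physical_clock or (lambda: int(time.time() * 1000))
        self.l = 0
        self.c = 0

    def now(self):
        """Timestamp a local or send event."""
        previous_l = self.l
        self.l = max(previous_l, self.physical_clock())
        self.c = self.c + 1 if self.l == previous_l else 0
        return (self.l, self.c)

    def update(self, remote):
        """Merge the timestamp carried by a received message."""
        remote_l, remote_c = remote
        previous_l = self.l
        self.l = max(previous_l, remote_l, self.physical_clock())
        if self.l == previous_l and self.l == remote_l:
            self.c = max(self.c, remote_c) + 1
        elif self.l == previous_l:
            self.c += 1
        elif self.l == remote_l:
            self.c = remote_c + 1
        else:
            self.c = 0
        return (self.l, self.c)


wall = [10]  # fake physical clock so the example is deterministic
clock = HybridLogicalClock(physical_clock=lambda: wall[0])
first = clock.now()
second = clock.now()
merged = clock.update((15, 3))  # message from a node whose clock is ahead
wall[0] = 20
later = clock.now()
print(first, second, merged, later)  # (10, 0) (10, 1) (15, 4) (20, 0)
```

The timestamps stay ordered even when the local physical clock lags behind a remote node, which is why this kind of clock is attractive for ordering transactions across nodes without a central timestamp service.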
Let's take a look at what needs to be done:
The figure above shows the open-source components that we have prepared for the MySQL and PG ecosystem communities.
Databases are accelerating their migration to the cloud, and cloud-native and distributed technologies are reshaping the entire database technology stack. Alibaba Cloud has extensive practical experience in Internet services and cloud database services and has accumulated technologies in high availability, distributed databases, cloud-native architecture, and storage-compute separation. We are opening these technologies to the public in the form of components and systems, and we will work with open-source communities to build a cloud-native distributed database ecosystem. All open-source components use one of the most developer-friendly licenses, the Apache License, Version 2.0. Developers are welcome to join us in building the world's leading cloud-native database 2.0 community with Chinese characteristics.
The open-source PolarDB for PG uses the X-Paxos protocol to help developers quickly build databases with zero RPO, high compatibility, and high availability. The three-node model now replaces the traditional master/slave model. All open-source components work out of the box, allowing developers to enjoy the capabilities of PolarDB, and they continue to grow with the PostgreSQL and MySQL ecosystems. Welcome to the open-source community of cloud-native database 2.0!
Open-source address: https://github.com/alibaba/PolarDB-for-PostgreSQL