This topic describes the PolarDB-X architecture and its benefits.
Architecture evolution
The evolution of distributed database services has been driven by three types of technologies: sharding, NewSQL, and cloud-native database technology. Each provides unique benefits and features. The PolarDB-X architecture builds on Distributed Relational Database Service (DRDS) and X-DB technologies to ensure stability, integrates the cloud-native technologies of PolarDB, and provides NewSQL capabilities to ensure data consistency in distributed systems. This allows PolarDB-X to provide database services based on a cloud-native, distributed architecture.
Overall architecture
The following figure shows the overall architecture of PolarDB-X.
Core components
Global meta service (GMS): provides distributed metadata and a global timestamp distributor named Timestamp Oracle (TSO) and maintains meta information such as tables, schemas, and statistics. GMS also maintains security information such as accounts and permissions.
Compute node (CN): provides a distributed SQL engine that contains core optimizers and executors. A CN uses a stateless SQL engine to provide distributed routing and computing and uses the two-phase commit protocol (2PC) to coordinate distributed transactions. A CN also executes DDL statements in a distributed manner and maintains global indexes.
Data node (DN): provides a data storage engine. A DN uses Paxos to provide highly reliable storage services and uses multiversion concurrency control (MVCC) for distributed transactions. A DN also provides the pushdown computation feature to push down operators such as Project, Filter, Join, and Agg in distributed systems, and supports local SSDs and shared storage.
Change data capture (CDC): provides a primary/secondary replication protocol that is compatible with MySQL. The primary/secondary replication protocol is compatible with the protocols and data formats that are supported by MySQL binary logging. CDC uses the primary/secondary replication protocol to exchange data.
Columnar node: provides persistent columnstore indexes and maintains and updates columnstore indexes in real time based on changes recorded in distributed transaction logs to facilitate efficient analytical query processing. By leveraging object storage and working in tandem with CNs, a columnar node provides the scalability required for real-time updates and the capability to execute snapshot-consistent queries.
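The interaction between GMS, CNs, and DNs during a distributed transaction can be illustrated with a minimal sketch. The Python below is a toy model under stated assumptions, not the PolarDB-X implementation: the class and function names (TimestampOracle, DataNode, two_phase_commit) are hypothetical, the TSO is reduced to an in-memory counter, and failures are simulated with a flag. It shows only the general shape of 2PC: every participant must vote yes in the prepare phase before a single commit timestamp is applied on all of them.

```python
from dataclasses import dataclass, field
from itertools import count


class TimestampOracle:
    """Toy stand-in for the TSO: hands out strictly increasing timestamps."""

    def __init__(self):
        self._counter = count(1)

    def next_timestamp(self):
        return next(self._counter)


@dataclass
class DataNode:
    """Toy participant: stages writes on prepare, applies them on commit."""

    name: str
    healthy: bool = True
    staged: dict = field(default_factory=dict)     # txn_id -> pending writes
    committed: dict = field(default_factory=dict)  # key -> (value, commit_ts)

    def prepare(self, txn_id, writes):
        if not self.healthy:
            return False  # vote "no"
        self.staged[txn_id] = dict(writes)
        return True  # vote "yes"

    def commit(self, txn_id, commit_ts):
        for key, value in self.staged.pop(txn_id, {}).items():
            self.committed[key] = (value, commit_ts)

    def rollback(self, txn_id):
        self.staged.pop(txn_id, None)


def two_phase_commit(tso, txn_id, writes_by_node):
    """Coordinator role (played by a CN in this sketch).

    writes_by_node is a list of (DataNode, writes) pairs. Returns the commit
    timestamp, or None if any participant voted no and the work was rolled back.
    """
    prepared = []
    # Phase 1: ask every participant to prepare.
    for node, writes in writes_by_node:
        if node.prepare(txn_id, writes):
            prepared.append(node)
        else:
            for participant in prepared:
                participant.rollback(txn_id)
            return None
    # Phase 2: obtain one commit timestamp and apply it on all participants.
    commit_ts = tso.next_timestamp()
    for node in prepared:
        node.commit(txn_id, commit_ts)
    return commit_ts
```

In this sketch, a transaction that touches two healthy nodes receives one shared commit timestamp, and a transaction in which any node votes no leaves no staged or committed writes behind.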
Core modules
The PolarDB-X architecture is similar to that of a traditional standalone relational database. The PolarDB-X architecture contains the network layer, SQL parsing layer, optimization layer, execution layer, and storage layer. The optimization layer supports logical optimization and physical optimization. The execution layer supports single-node two-phase execution, single-node parallel execution, and cross-node parallel execution. The storage layer supports various optimization and execution methods for traditional standalone databases.
Database tools
PolarDB-X is highly compatible with MySQL protocols and ecosystems. PolarDB-X supports common MySQL drivers such as Java Database Connectivity (JDBC) and Open Database Connectivity (ODBC) and is compatible with multiple programming languages such as Java, Go, C, C++, and Python. PolarDB-X also supports data import and export tools and various client GUIs. The following figure shows the complete set of database tools that are supported by PolarDB-X.
PolarDB-X can work with the following database tools to form a closed loop:
Data Management Service (DMS) is a cloud-based version of the Alibaba database service platform that has been used for more than 10 years. DMS offers a ready-to-use, centralized, web-based database management terminal that supports multiple database types and environments. You do not need to install the terminal or perform O&M operations. DMS can help enterprise users quickly build a database DevOps solution that ensures the same level of security, efficiency, and compliance that is provided by the solution of Alibaba Group.
Database Autonomy Service (DAS) is a cloud service that uses machine learning and the experience of database experts to automate perception, repair, optimization, O&M, and security management for databases. DAS simplifies database management and prevents service failures that are caused by manual operations. This ensures the stability, security, and efficiency of database services.
Data Transmission Service (DTS) supports data sources such as relational databases, NoSQL databases, and big data analytics services. DTS provides data migration, change tracking, and real-time synchronization to allow you to transfer data within milliseconds in an asynchronous manner in public and hybrid cloud scenarios. The underlying infrastructure of DTS uses the active geo-redundancy architecture that is used to handle the workloads of Alibaba Group services during the Double 11 Shopping Festival. This way, DTS can provide real-time data streams for thousands of downstream applications.
Data Disaster Recovery is a cloud native backup platform that is cost-effective and can be used to meet the business requirements of enterprises. It can protect databases that are deployed in multiple environments, such as data centers, third-party clouds, public clouds, and hybrid clouds.
Architecture
As an advanced cloud service that is provided by Alibaba Cloud, PolarDB-X provides a complete range of capabilities such as visualized O&M, diversified delivery forms, and a complete set of API operations. PolarDB-X can also work with multiple database tools.
PolarDB-X instances are deployed based on Kubernetes and hosted on high-performance physical servers.
A PolarDB-X instance consists of multiple nodes. When you purchase a PolarDB-X instance, you must specify at least two nodes. After the instance is created, you can scale in or scale out the instance by at least one node. Each node supports various instance specifications, such as 4c16g, 8c32g, and 16c64g.
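The sizing rules above can be expressed as a short calculation. The following Python is a sketch that assumes a specification string such as 8c32g means 8 CPU cores and 32 GB of memory per node; the helper names (NodeSpec, parse_spec, instance_capacity) are hypothetical and are not part of any PolarDB-X API.

```python
import re
from dataclasses import dataclass

MIN_NODES = 2  # the text above states that an instance needs at least two nodes


@dataclass(frozen=True)
class NodeSpec:
    cpu_cores: int
    memory_gb: int


def parse_spec(spec):
    """Parse a string such as '8c32g' (assumed: cores, then memory in GB)."""
    match = re.fullmatch(r"(\d+)c(\d+)g", spec)
    if match is None:
        raise ValueError(f"unrecognized node specification: {spec!r}")
    return NodeSpec(int(match.group(1)), int(match.group(2)))


def instance_capacity(spec, nodes):
    """Total CPU cores and memory of an instance with `nodes` identical nodes."""
    if nodes < MIN_NODES:
        raise ValueError(f"an instance needs at least {MIN_NODES} nodes")
    node = parse_spec(spec)
    return NodeSpec(node.cpu_cores * nodes, node.memory_gb * nodes)
```

For example, instance_capacity("8c32g", 3) describes an instance with 24 cores and 96 GB of memory in total under these assumptions.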
PolarDB-X provides the following instance families to meet the requirements for different levels of resource isolation: general-purpose, dedicated, and dedicated host.
General-purpose: Idle computing resources such as CPU cores are shared among PolarDB-X instances on the same server. This improves cost-effectiveness.
Dedicated: The computing resources of a server such as the CPU cores that are allocated to each PolarDB-X instance are exclusive to the instance. This improves performance stability.
PolarDB-X supports various delivery forms in public and hybrid clouds. In the Alibaba Cloud public cloud, you can deploy PolarDB-X instances across regions and zones. For network and data security, PolarDB-X supports virtual private clouds (VPCs), IP address whitelists, asymmetric encryption, and Transparent Data Encryption (TDE). In Alibaba Cloud hybrid cloud environments, you can use DBStack to deploy PolarDB-X instances in a lightweight manner. This way, you can deploy databases and perform O&M operations on your existing hardware.
Benefits
High availability
PolarDB-X is developed based on the architecture of X-DB and uses X-Paxos to ensure strong consistency and deliver a recovery point objective (RPO) of 0 if a node fails over. These capabilities have been proven year after year during the Double 11 Shopping Festival. PolarDB-X also provides various deployment solutions and disaster recovery capabilities. For example, you can deploy a PolarDB-X instance across three data centers in the same region or five data centers in three regions and use Paxos to ensure fully synchronous replication. You can also deploy a PolarDB-X instance by using the three data centers across two zones, geo-disaster recovery, or active geo-redundancy solution and use binary logs to implement asynchronous replication. To ensure efficient data transmission across regions, PolarDB-X uses batching and pipelining to optimize network performance.
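The three-data-center and five-data-center layouts follow from the majority rule that Paxos-style protocols rely on: a write is durable once a majority of replicas acknowledge it. The Python below is a small worked example of that arithmetic. It is a general property of majority quorums, not a description of X-Paxos internals, and the function names are illustrative.

```python
def majority(replicas):
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas):
    """Number of replicas that can fail while a majority is still reachable."""
    return replicas - majority(replicas)


def can_commit(acks, replicas):
    """A write is considered committed once a majority has acknowledged it."""
    return acks >= majority(replicas)
```

With three replicas, a majority is two and one failure is tolerated. With five replicas, a majority is three and two failures are tolerated.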
High compatibility
PolarDB-X is compatible with MySQL syntax, such as SQL statements and function types. PolarDB-X provides TSO to keep the data globally consistent across distributed transactions. PolarDB-X uses TSO and 2PC to provide complete atomicity, consistency, isolation, and durability (ACID) capabilities and supports the Read-Committed and Repeatable-Read isolation levels for distributed systems. PolarDB-X provides global secondary indexes based on distributed transactions. PolarDB-X can write multiple transactions at the same time to ensure strong consistency between data in index tables and base tables. PolarDB-X uses a cost-based optimizer (CBO) to select indexes. In metadata and ecosystem integration scenarios, PolarDB-X uses the online DDL feature to ensure metadata consistency in distributed systems. In terms of hardware, PolarDB-X is compatible with major operating systems and chips that are developed by Chinese vendors, such as Kirin, Kunpeng, and Hygon.
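How globally ordered timestamps support consistent reads can be sketched with a toy multiversion store. The Python below is a simplified model under stated assumptions, not the PolarDB-X storage engine: VersionedStore is a hypothetical name, timestamps are plain integers that stand in for TSO values, and the visibility rule shown (a version is visible if its commit timestamp is at or before the reader's snapshot timestamp) is the common textbook form of timestamp-based MVCC.

```python
class VersionedStore:
    """Toy multiversion store: each key keeps a list of (commit_ts, value)."""

    def __init__(self):
        self._versions = {}

    def write(self, key, value, commit_ts):
        """Record a new committed version of `key`."""
        versions = self._versions.setdefault(key, [])
        versions.append((commit_ts, value))
        versions.sort(key=lambda version: version[0])  # keep oldest first

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts."""
        visible = None
        for commit_ts, value in self._versions.get(key, []):
            if commit_ts <= snapshot_ts:
                visible = value
            else:
                break
        return visible
```

In this model, a transaction that keeps one snapshot timestamp for its whole lifetime sees the same data on every read, which corresponds to Repeatable-Read behavior. Taking a fresh snapshot timestamp for each statement corresponds to Read-Committed behavior.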
Vendors of major distributed databases in the industry rarely provide services that can be used to collect data change logs such as redo logs and binary logs in distributed databases. In previous years, enterprises developed relational databases on a large scale based on the established ecosystems and standards in the industry. PolarDB-X provides binary logs that are fully compatible with MySQL databases. You can obtain binary logs from a PolarDB-X database by using the standard binary log dump protocol, in the same way as from a MySQL database.
High scalability
PolarDB-X uses a shared-nothing architecture for the horizontal scaling of instances and online scaling of databases. In online transaction processing (OLTP) scenarios, PolarDB-X can process tens of millions of concurrent requests and store petabytes of data. In online analytical processing (OLAP) scenarios, PolarDB-X uses massively parallel processing (MPP) technology to linearly improve the query performance after an instance is scaled out. This helps meet the requirements for complex report queries such as TPC-H.
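The idea behind parallel execution on a shared-nothing architecture can be shown with a scatter-gather aggregation. The Python below is a minimal sketch, not the PolarDB-X MPP engine: shards are plain in-memory lists, threads stand in for nodes, and the function names are hypothetical. Each shard filters and pre-aggregates its own rows, and only the small partial results are merged, which is why adding shards can spread the work.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_aggregate(rows, predicate):
    """Runs on one shard: filter locally and return (sum, count)."""
    matched = [amount for amount in rows if predicate(amount)]
    return sum(matched), len(matched)


def distributed_average(shards, predicate):
    """Scatter the filter and aggregation to every shard, then merge partials."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(
            pool.map(lambda rows: partial_aggregate(rows, predicate), shards)
        )
    total = sum(part_sum for part_sum, _ in partials)
    matched = sum(part_count for _, part_count in partials)
    return total / matched if matched else None
```

The average is computed from per-shard sums and counts rather than per-shard averages, because averaging the averages would give a wrong result when shards hold different numbers of matching rows.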
HTAP
Continued development of mobile Internet and IoT devices can lead to explosive data growth. Traditional OLTP and OLAP solutions are developed based on a simple read/write splitting or extract, transform, and load (ETL) model. In a traditional OLTP or OLAP solution, data is extracted from an online database and stored in a data warehouse for computing based on the T+1 mechanism. These traditional solutions provide poor real-time performance and inefficient link utilization and result in high storage and maintenance costs.
PolarDB-X can handle mixed OLTP and OLAP workloads. This allows you to run TPC-C and TPC-H benchmark tests in the same instance at the same time. During the tests, the analytical processing workloads do not affect the stability of the transaction processing workloads.
PolarDB-X also provides multiple innovative features. For example, PolarDB-X can accurately identify transaction processing workloads and analytical processing workloads at the computing layer. After the workloads are identified, PolarDB-X intelligently routes the workloads to different replicas by implementing consistent reads on multiple replicas. By default, MPP is enabled on the analytical processing link. This way, resource isolation is ensured, and the query performance is linearly improved for OLAP scenarios. PolarDB-X optimizes the pushdown computation feature at the storage layer. Future versions will include a high-performance columnar storage engine that provides the hybrid transaction/analytical processing (HTAP) capability for hybrid row-column storage.
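The workload identification and routing described above can be pictured as a simple decision at the computing layer. The Python below is a hypothetical sketch: the text does not say how PolarDB-X classifies a query, so the use of an estimated optimizer cost, the threshold value, and the names Route and route_query are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical cutoff: queries with an estimated cost at or above this value
# are treated as analytical. The real classification rule is not given here.
AP_COST_THRESHOLD = 10_000


@dataclass(frozen=True)
class Route:
    workload: str   # "TP" or "AP"
    target: str     # which replica serves the query in this sketch
    use_mpp: bool   # whether parallel (MPP) execution is used


def route_query(estimated_cost, threshold=AP_COST_THRESHOLD):
    """Classify a query by estimated cost and choose where it runs."""
    if estimated_cost >= threshold:
        # Analytical queries go to a separate replica so that they do not
        # compete with transactions for resources; MPP is on for this path.
        return Route("AP", "read-only replica", True)
    return Route("TP", "primary", False)
```

Keeping the two paths on different replicas is what provides the resource isolation in this model. The consistent-read guarantee across replicas is not modeled.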
Extreme elasticity
PolarDB-X supports the cloud native technologies of PolarDB. PolarDB uses shared storage and provides network optimization capabilities that are based on Remote Direct Memory Access (RDMA). This way, PolarDB-X provides rapid backup and scalability and allows you to expand the storage capacity of an instance based on your business requirements. PolarDB-X can write multiple data records at the same time. This capability and the shared storage allow you to flexibly scale an instance within seconds without the need to migrate data. This allows PolarDB-X to provide a smooth user experience during a scaling operation.
Ecosystem compatibility
PolarDB-X fully embraces and contributes to the open source ecosystem of MySQL. PolarDB-X has full control over its codebase and is compatible with distributed MySQL databases. The PolarDB-X architecture is simple and open. This makes PolarDB-X easy to operate and maintain. PolarDB-X can work with other Alibaba Cloud database services such as DTS, Data Disaster Recovery, and DMS to form a closed loop. This way, PolarDB-X can be connected to the entire ecosystem of Alibaba Cloud.