Architecture evolution

Distributed database services in the industry are built on three types of technology: sharding, NewSQL, and cloud native databases. Each type provides unique benefits and features for distributed database architectures. The PolarDB-X architecture leverages Distributed Relational Database Service (DRDS) and X-DB technologies to ensure stability, integrates the cloud native technologies of PolarDB, and provides NewSQL capabilities to ensure data consistency in distributed systems. This way, PolarDB-X provides database services based on a cloud native and distributed architecture.

Overall architecture

The following figure shows the overall architecture of PolarDB-X.

Core components
  • Global meta service (GMS): provides distributed metadata and a global timestamp distributor named Timestamp Oracle (TSO) and maintains meta information such as tables, schemas, and statistics. GMS also maintains security information such as accounts and permissions.
  • Compute node (CN): provides a distributed SQL engine that contains core optimizers and executors. A compute node uses a stateless SQL engine to provide distributed routing and computing and uses the two-phase commit protocol (2PC) to coordinate distributed transactions. A compute node also executes DDL statements in a distributed manner and maintains global indexes.
  • Data node (DN): provides a data storage engine. A data node uses Paxos to provide highly reliable storage services and uses multiversion concurrency control (MVCC) for distributed transactions. A data node also provides the pushdown computation feature to push down operators such as Project, Filter, Join, and Agg in distributed systems, and supports local SSDs and shared storage.
  • Change data capture (CDC): provides a primary/secondary replication protocol that is compatible with the protocols and data formats of MySQL binary logging. CDC uses this protocol to exchange data.
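
The interaction between the TSO in GMS and the 2PC coordination on a compute node can be illustrated with a minimal sketch. It is a generic textbook model, not the PolarDB-X implementation: the class and function names (TimestampOracle, DataNode, two_phase_commit) are hypothetical, and durability, timeouts, and recovery are left out.

```python
import itertools


class TimestampOracle:
    """Hypothetical stand-in for the TSO: hands out strictly increasing timestamps."""

    def __init__(self):
        self._counter = itertools.count(1)

    def next_timestamp(self):
        return next(self._counter)


class DataNode:
    """Hypothetical 2PC participant that stages writes and applies them on commit."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare
        self.staged = {}     # txn_id -> pending writes
        self.committed = {}  # key -> (value, commit_ts)

    def prepare(self, txn_id, writes):
        # A real participant would persist the writes before voting yes.
        if self.fail_on_prepare:
            return False
        self.staged[txn_id] = dict(writes)
        return True

    def commit(self, txn_id, commit_ts):
        for key, value in self.staged.pop(txn_id, {}).items():
            self.committed[key] = (value, commit_ts)

    def rollback(self, txn_id):
        self.staged.pop(txn_id, None)


def two_phase_commit(tso, txn_id, writes_by_node):
    """Coordinator logic (assumed to run on a compute node).

    Returns the commit timestamp, or None if the transaction was aborted.
    """
    prepared = []
    # Phase 1: ask every participant to prepare.
    for node, writes in writes_by_node.items():
        if node.prepare(txn_id, writes):
            prepared.append(node)
        else:
            # A single "no" vote aborts the whole transaction.
            for participant in prepared:
                participant.rollback(txn_id)
            return None
    # Phase 2: all participants voted yes, so commit with one global timestamp.
    commit_ts = tso.next_timestamp()
    for node in prepared:
        node.commit(txn_id, commit_ts)
    return commit_ts
```

In this sketch, every write of a transaction carries the same commit timestamp on all data nodes, which is what lets a reader later decide visibility consistently across nodes.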
Core modules

The architecture of a PolarDB-X database is similar to that of a traditional standalone relational database. The PolarDB-X architecture contains the network layer, SQL parsing layer, optimization layer, execution layer, and storage layer. The optimization layer supports logical optimization and physical optimization. The execution layer supports single-node two-phase execution, single-node parallel execution, and cross-node parallel execution. The storage layer supports various optimization and execution methods of traditional standalone databases.

Database tools

PolarDB-X is highly compatible with MySQL protocols and ecosystems. PolarDB-X supports common MySQL drivers such as Java Database Connectivity (JDBC) and Open Database Connectivity (ODBC) and is compatible with multiple programming languages such as Java, Go, C, C++, and Python. PolarDB-X also supports tools that are used to import and export data and various client GUIs. PolarDB-X can work with the following database tools to form a closed loop:
  • Data Management (DMS) is a cloud version of the Alibaba database service platform that has been used for more than 10 years. DMS offers a ready-to-use and unified web-based database management terminal that supports multiple database types and environments. You do not need to install the terminal or perform O&M operations. DMS can help enterprise users build a database DevOps solution in a short period of time that ensures the same level of security, efficiency, and compliance as that provided by the solution of Alibaba Group.
  • Database Autonomy Service (DAS) is a cloud service that uses machine learning and the experience of database experts to automate perception, repair, optimization, O&M, and security management for databases. DAS simplifies database management and prevents service failures that are caused by manual operations. This helps ensure the stability, security, and efficiency of database services.
  • Data Transmission Service (DTS) supports data sources such as relational databases, NoSQL databases, and big data analytics services. DTS provides data migration, change tracking, and real-time synchronization. This allows you to transfer data within milliseconds in an asynchronous manner in public and hybrid cloud scenarios. The underlying infrastructure of DTS uses the active geo-redundancy architecture that is used to handle the workloads of Alibaba Group services during the Double 11 Shopping Festival. This way, DTS can provide real-time data streams to thousands of downstream applications.
  • Database Backup is a cloud native backup platform that is cost-effective and can be used to meet the business requirements of enterprises. Database Backup can protect databases that are deployed in multiple environments, such as data centers, third-party clouds, public clouds, and hybrid clouds.

Architecture

PolarDB-X is an advanced cloud service that is provided by Alibaba Cloud. PolarDB-X provides a complete range of capabilities such as visualized O&M, diversified delivery forms, and a complete set of API operations. PolarDB-X can also work with multiple database tools.

PolarDB-X instances are deployed based on Kubernetes and hosted on high-performance physical servers.
  1. A PolarDB-X instance consists of multiple nodes. When you purchase a PolarDB-X instance, you must specify at least two nodes. After the instance is created, you can scale in or scale out the instance by at least one node. Each node supports various instance types, such as 4c16g, 8c32g, and 16c64g.
  2. PolarDB-X provides the following instance families to meet the requirements for different levels of resource isolation: general-purpose, dedicated, and dedicated host. For the general-purpose instance family, idle computing resources such as CPU cores are shared among PolarDB-X instances on the same server. This makes resource utilization more cost-effective. For the dedicated instance family, the computing resources of a server such as CPU cores that are allocated to each PolarDB-X instance are exclusive to the instance. This improves performance stability.
  3. PolarDB-X supports various delivery forms in public and hybrid clouds. In public clouds, instances can be deployed across regions and zones. In terms of the network and security, PolarDB-X supports virtual private clouds (VPCs), IP address whitelists, asymmetric encryption, and Transparent Data Encryption (TDE) to secure data. In hybrid clouds, instances can be deployed by using DBStack in a lightweight manner. This way, you can deploy databases and perform O&M operations on the existing hardware of your machines.

Benefits

High availability

PolarDB-X is developed based on the architecture of X-DB and uses X-Paxos to ensure strong consistency and deliver a zero recovery point objective (RPO) in case of a node failover. These capabilities have been proven over years of use during the Double 11 Shopping Festival. PolarDB-X also provides various deployment solutions and disaster recovery capabilities. For example, you can deploy your PolarDB-X instance across three data centers in the same region or five data centers in three regions and use Paxos to ensure fully synchronous replication. You can also use the three data centers across two zones, geo-disaster recovery, or active geo-redundancy solution to deploy your PolarDB-X instance and use binary logs to implement asynchronous replication. To ensure efficient data transmission across regions, PolarDB-X uses batching and pipelining to optimize network performance.
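
As a rough illustration of why Paxos-based deployments use three or five data centers, the following sketch computes majority quorum sizes. It assumes one replica per data center and a plain majority rule. The function names are hypothetical, and X-Paxos itself is not modeled.

```python
def majority(replicas):
    """Smallest number of replicas that forms a majority quorum."""
    return replicas // 2 + 1


def tolerated_failures(replicas):
    """Number of replicas that can fail while a majority is still reachable."""
    return replicas - majority(replicas)


def is_durable(acks, replicas):
    """A write counts as durable once a majority has acknowledged it."""
    return acks >= majority(replicas)


if __name__ == "__main__":
    for n in (3, 5):
        print(f"{n} replicas: quorum={majority(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Under these assumptions, a three-replica deployment keeps accepting writes after the loss of one data center, and a five-replica deployment after the loss of two. Because a committed write already exists on a majority, a failover to a surviving majority does not lose it, which is the intuition behind a zero RPO.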

High compatibility

PolarDB-X is compatible with MySQL syntax, including SQL statements and function types. PolarDB-X provides TSO to keep data globally consistent across distributed transactions. PolarDB-X uses TSO and 2PC to provide complete atomicity, consistency, isolation, and durability (ACID) capabilities and supports the Read Committed and Repeatable Read isolation levels for distributed systems. PolarDB-X provides global secondary indexes based on distributed transactions. PolarDB-X can write multiple transactions at a time to ensure strong consistency between index tables and base tables. PolarDB-X uses a cost-based optimizer (CBO) to select indexes. In terms of metadata and ecosystem integration, PolarDB-X uses the online DDL feature to ensure metadata consistency in distributed systems. In terms of hardware, PolarDB-X is compatible with major operating systems and chips that are developed by Chinese vendors, such as Kirin, Kunpeng, and Hygon.
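
How globally ordered timestamps support snapshot reads can be sketched with a simplified multiversion model. This is a generic illustration under stated assumptions, not the PolarDB-X storage format: the Version type and read_at function are hypothetical, and each row is assumed to keep a list of versions tagged with a TSO-issued commit timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    commit_ts: int  # assumed to be issued by a global timestamp service


def read_at(versions, snapshot_ts):
    """Return the newest value committed at or before snapshot_ts, or None."""
    visible = [v for v in versions if v.commit_ts <= snapshot_ts]
    if not visible:
        return None
    return max(visible, key=lambda v: v.commit_ts).value


if __name__ == "__main__":
    history = [Version("a", 10), Version("b", 20)]
    # A reader whose snapshot falls between the two commits sees only "a".
    print(read_at(history, 15))
```

In this simplified model, the two isolation levels differ only in when the snapshot timestamp is taken: once per transaction for Repeatable Read, and once per statement for Read Committed.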

Few vendors of major distributed databases provide services that can be used to collect data change logs such as redo logs and binary logs in distributed databases. Over the years, however, enterprises have built relational database systems on a large scale based on the established ecosystems and standards in the industry. PolarDB-X provides global binary logs that are fully compatible with MySQL databases. You can treat a PolarDB-X database like a MySQL database and use the standard binary log dump protocol to obtain binary logs.

High scalability

PolarDB-X uses a shared-nothing architecture for horizontal scaling of instances and online scaling of databases. In online transaction processing (OLTP) scenarios, PolarDB-X can process tens of millions of concurrent requests and store petabytes of data. In online analytical processing (OLAP) scenarios, PolarDB-X uses the massively parallel processing (MPP) capability to linearly improve the query performance after an instance is scaled out. This helps meet the requirements for complex report queries such as TPC-H.
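
The idea behind shared-nothing scaling and parallel aggregation can be sketched as follows. The example assumes hash partitioning on a key column with CRC32, which is an illustrative choice and not a description of the partitioning functions that PolarDB-X actually uses. All function names are hypothetical.

```python
import zlib


def shard_for(key, shard_count):
    """Deterministically map a key to a shard index (assumed hash partitioning)."""
    return zlib.crc32(str(key).encode("utf-8")) % shard_count


def load(rows, key_column, shard_count):
    """Distribute rows across shards; each shard owns its rows exclusively."""
    shards = [[] for _ in range(shard_count)]
    for row in rows:
        shards[shard_for(row[key_column], shard_count)].append(row)
    return shards


def scatter_gather_sum(shards, column):
    """Compute a partial SUM on every shard, then combine the partial results."""
    partials = [sum(row[column] for row in shard) for shard in shards]
    return sum(partials)


if __name__ == "__main__":
    rows = [{"id": i, "amount": i * 10} for i in range(1, 101)]
    shards = load(rows, "id", 4)
    print(scatter_gather_sum(shards, "amount"))
```

Each partial sum touches only the rows of one shard, so the per-shard work can run in parallel and shrinks as shards are added. This is the intuition behind the near-linear improvement described above for MPP queries.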

HTAP

The continued development of mobile Internet and IoT devices leads to explosive data growth. Traditional OLTP and OLAP solutions are based on a simple read/write splitting model or an extract, transform, and load (ETL) model. In such a solution, data is extracted from an online database to a data warehouse for computing based on the T+1 mechanism. These solutions provide poor real-time performance, use links inefficiently, and incur high storage and maintenance costs.

PolarDB-X can handle mixed OLTP and OLAP workloads. You can run TPC-C and TPC-H benchmark tests in the same instance at the same time, and during the test period the analytical processing workloads do not affect the stability of the transaction processing workloads. PolarDB-X also provides multiple innovative features. For example, PolarDB-X can accurately identify transaction processing workloads and analytical processing workloads at the computing layer. After the workloads are identified, PolarDB-X intelligently routes them to different replicas by implementing consistent reads on multiple replicas. By default, the MPP capability is enabled on the analytical processing link. This way, resource isolation is ensured, and the query performance is linearly improved for OLAP scenarios. PolarDB-X optimizes the pushdown computation feature at the storage layer. Future versions will include a high-performance columnar storage engine that provides the hybrid transaction/analytical processing (HTAP) capability for hybrid row-column storage.
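
The workload identification and routing described above can be sketched as a simple classifier. The sketch assumes that classification is based on the optimizer's estimated number of scanned rows and a fixed threshold. Both are illustrative assumptions, because the actual criteria that PolarDB-X uses are not described here. The names QueryPlan, classify, and route are hypothetical.

```python
from dataclasses import dataclass

PRIMARY = "primary"
READ_ONLY_REPLICA = "read-only replica"

# Hypothetical cutoff: plans that scan at least this many rows count as analytical.
AP_ROW_THRESHOLD = 100_000


@dataclass(frozen=True)
class QueryPlan:
    sql: str
    estimated_rows_scanned: int  # assumed to come from a cost-based optimizer


def classify(plan, threshold=AP_ROW_THRESHOLD):
    """Label a plan as transactional ("TP") or analytical ("AP")."""
    return "AP" if plan.estimated_rows_scanned >= threshold else "TP"


def route(plan):
    """Send analytical work to a replica so that it cannot starve transactions."""
    return READ_ONLY_REPLICA if classify(plan) == "AP" else PRIMARY


if __name__ == "__main__":
    point_lookup = QueryPlan("SELECT * FROM orders WHERE id = 1", 1)
    report = QueryPlan("SELECT region, SUM(amount) FROM orders GROUP BY region", 5_000_000)
    print(route(point_lookup), "|", route(report))
```

Routing analytical plans to separate replicas is what provides the resource isolation in this sketch. Consistent reads on the replica, which the text above relies on, are outside the scope of the example.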

Ultra elasticity

PolarDB-X supports the cloud native technologies of PolarDB. PolarDB uses shared storage and provides network optimization capabilities that are based on Remote Direct Memory Access (RDMA). This way, PolarDB-X provides rapid backup and scalability and allows you to expand the storage capacity of an instance based on your business requirements. PolarDB-X can write multiple data records at a time. This capability and the shared storage allow you to flexibly scale an instance within seconds without the need to migrate data. This way, PolarDB-X provides a smooth user experience during a scaling operation.

Ecosystem compatibility

PolarDB-X is fully compatible with the open source ecosystem of MySQL and contributes to it. PolarDB-X manages its source code and is compatible with distributed MySQL databases. The architecture of PolarDB-X is simple and open, which makes PolarDB-X easy to operate and maintain. PolarDB-X can work with other Alibaba Cloud database services such as DTS, DMS, and Database Backup to form a closed loop. This way, PolarDB-X can be connected to the entire ecosystem of Alibaba Cloud.