PolarDB: Product Overview

Last Updated: Jul 04, 2023

What is PolarDB?

PolarDB is a new-generation database service that is developed by Alibaba Cloud. This service decouples computing from storage and uses integrated software and hardware. PolarDB is a secure and reliable database service that provides auto scaling, high performance, and mass storage. PolarDB is 100% compatible with MySQL and PostgreSQL and highly compatible with Oracle.

PolarDB provides three engines: PolarDB for MySQL, PolarDB for PostgreSQL, and PolarDB for Xscale. Years of best practices in Double 11 events prove that PolarDB can offer the flexibility of open source ecosystems and the high performance and security of commercial cloud-native databases.

| Engine | Ecosystem | Architecture | Platform | Scenario |
| --- | --- | --- | --- | --- |
| PolarDB for MySQL | MySQL | Shared storage and compute-storage decoupled architecture | Public cloud and Apsara Stack Enterprise Edition | Cloud-native databases in the MySQL ecosystem |
| PolarDB for PostgreSQL | PostgreSQL and Oracle | Shared storage and compute-storage decoupled architecture | Public cloud, Apsara Stack Enterprise Edition, and DBStack | Cloud-native databases in the PostgreSQL ecosystem |
| PolarDB for Xscale | MySQL | Shared nothing and distributed architecture | Public cloud, Apsara Stack Enterprise Edition, and DBStack | Large-scale data and ultra-high concurrent applications in the MySQL ecosystem |

Architecture of PolarDB for MySQL and PolarDB for PostgreSQL

PolarDB for MySQL and PolarDB for PostgreSQL both use a shared-storage architecture that decouples compute from storage. The architecture is cloud-native, integrates software and hardware, and is built on shared distributed storage. Physical replication and Remote Direct Memory Access (RDMA) are used between the primary node and read-only nodes to reduce latency and accelerate data synchronization. This resolves the weak data consistency caused by asynchronous replication and ensures zero data loss if a single point of failure (SPOF) occurs. The architecture also enables node scaling within seconds.

Core components

  • PolarProxy

PolarDB for MySQL uses PolarProxy to provide external services to applications. PolarProxy forwards requests from applications to database nodes. You can use the proxy to perform authentication, data protection, and session persistence. The proxy parses SQL statements, sends write requests to the primary node, and evenly distributes read requests across the read-only nodes.

  • Compute nodes

A cluster contains one primary node and multiple read-only nodes. A cluster of Multi-master Cluster Edition (only for PolarDB for MySQL) supports multiple primary nodes and multiple read-only nodes. Compute nodes can be either general-purpose or dedicated.

  • Shared storage

Multiple nodes in a cluster share storage resources. A single cluster supports up to 100 TB of storage capacity.
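
The read/write splitting described for PolarProxy can be illustrated with a minimal Python sketch. It is not PolarProxy's implementation: the `ReadWriteSplitter` class, the node names, and the keyword list are hypothetical, and the sketch classifies a statement by its first keyword only, whereas a real proxy parses the full statement and also handles transactions and locking reads.

```python
import itertools

# Hypothetical list of statement keywords treated as writes in this sketch.
WRITE_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "REPLACE",
                  "CREATE", "ALTER", "DROP", "TRUNCATE"}


class ReadWriteSplitter:
    """Toy router: writes go to the primary, reads rotate over read-only nodes."""

    def __init__(self, primary, read_only_nodes):
        self.primary = primary
        # Fall back to the primary when the cluster has no read-only nodes.
        self._readers = itertools.cycle(read_only_nodes or [primary])

    def route(self, sql):
        # Classify by the first keyword only; this is a simplification.
        stripped = sql.strip()
        first_word = stripped.split(None, 1)[0].upper() if stripped else ""
        if first_word in WRITE_KEYWORDS:
            return self.primary
        # Round-robin spreads reads evenly across the read-only nodes.
        return next(self._readers)
```

With a primary and two read-only nodes, consecutive `SELECT` statements alternate between the read-only nodes, while an `INSERT` is always routed to the primary.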

Architecture benefits

  • Large storage capacity

The maximum storage capacity of a cluster is 100 TB. You do not need to purchase additional clusters for database sharding to work around the storage limit of a single host. This simplifies application development and reduces the O&M workload.

  • Cost-effectiveness

PolarDB decouples computing and storage. You are charged only for the computing resources when you add read-only nodes to a PolarDB cluster. In traditional database solutions, you are charged for both computing and storage resources when you add nodes.

  • Elastic scaling within minutes

PolarDB supports rapid scaling of computing resources based on container virtualization, shared storage, and compute-storage decoupling. It takes only 5 minutes to add or remove a node. Storage capacity is scaled up automatically, and your services are not interrupted during the scale-up process.

  • Read consistency

Log sequence numbers (LSNs) are used in cluster addresses to ensure global consistency and avoid inconsistency caused by replication latency between the primary node and read-only nodes.

  • Millisecond-level latency in physical replication

Redo-based physical replication is used instead of binary log-based logical replication to improve the efficiency and stability of primary/secondary replication. No delays occur even if you perform DDL operations on large tables, such as adding indexes or fields.

  • Data backup within seconds

Snapshots that are implemented based on the distributed storage can back up a database with terabytes of data in a few minutes. During the entire backup process, no locks are required, which ensures high efficiency and minimized impacts on your business. Data can be backed up anytime.
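
The LSN-based read consistency described above can be sketched as follows. This is an illustration under assumptions, not PolarDB's actual mechanism: the `Node` and `Session` classes, the `commit_write` and `pick_reader` helpers, and the rule that each write advances the LSN by one are all hypothetical. The idea shown is that a read is sent to a read-only node only if that node has already replayed the log up to the session's latest write; otherwise the read falls back to the primary node.

```python
class Node:
    def __init__(self, name, applied_lsn=0):
        self.name = name
        self.applied_lsn = applied_lsn  # highest redo LSN this node has applied


class Session:
    """Tracks the LSN of the latest write made in this session."""

    def __init__(self):
        self.last_write_lsn = 0


def commit_write(primary, session):
    # Assumption for the sketch: each write advances the primary's LSN by one.
    primary.applied_lsn += 1
    session.last_write_lsn = primary.applied_lsn
    return session.last_write_lsn


def pick_reader(primary, read_only_nodes, session):
    # A read-only node is safe only if it has replayed the session's last write.
    for node in read_only_nodes:
        if node.applied_lsn >= session.last_write_lsn:
            return node
    # No read-only node has caught up yet, so read from the primary.
    return primary
```

Right after a write, `pick_reader` returns the primary node until at least one read-only node reports an applied LSN that covers the write.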

Architecture of PolarDB-X

PolarDB for Xscale (PolarDB-X) uses a shared-nothing architecture that decouples compute from storage. This architecture allows you to plan capacity for each layer based on your business requirements and to scale out on a large scale.

Core components

  • Global meta service (GMS): provides distributed metadata and a global timestamp distributor named Timestamp Oracle (TSO) and maintains meta information such as tables, schemas, and statistics. GMS also maintains security information such as accounts and permissions.

  • Compute node (CN): provides a distributed SQL engine that contains core optimizers and executors. A compute node uses a stateless SQL engine to provide distributed routing and computing and uses the two-phase commit protocol (2PC) to coordinate distributed transactions. A compute node also executes DDL statements in a distributed manner and maintains global indexes.

  • Data node (DN): provides a data storage engine. A data node uses Paxos to provide highly reliable storage services and uses multiversion concurrency control (MVCC) for distributed transactions. A data node also provides the pushdown computation feature to push down operators such as Project, Filter, Join, and Agg in distributed systems, and supports local SSDs and shared storage.

  • Change data capture (CDC): provides a primary/secondary replication protocol that is compatible with MySQL, including the protocols and data formats that are supported by MySQL binary logging. CDC uses this protocol to exchange data.
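
How a compute node might combine a timestamp from TSO with the two-phase commit protocol can be sketched as below. This is a simplified, in-memory illustration and not PolarDB-X code: `TimestampOracle`, `DataNode`, and `two_phase_commit` are hypothetical names, and the sketch leaves out durability, failure recovery, and Paxos replication.

```python
import itertools


class TimestampOracle:
    """Hands out strictly increasing timestamps (stand-in for TSO in GMS)."""

    def __init__(self):
        self._counter = itertools.count(1)

    def next_timestamp(self):
        return next(self._counter)


class DataNode:
    """Stand-in for a DN that can prepare, commit, or roll back a transaction."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.staged = {}     # txn_id -> pending writes
        self.committed = {}  # key -> (commit_ts, value)

    def prepare(self, txn_id, writes):
        if not self.can_prepare:
            return False
        self.staged[txn_id] = dict(writes)
        return True

    def commit(self, txn_id, commit_ts):
        for key, value in self.staged.pop(txn_id).items():
            self.committed[key] = (commit_ts, value)

    def rollback(self, txn_id):
        self.staged.pop(txn_id, None)


def two_phase_commit(tso, txn_id, writes_by_node):
    """Coordinator logic a CN might run; returns the commit timestamp or None."""
    prepared = []
    # Phase 1: every participant must vote yes.
    for node, writes in writes_by_node:
        if node.prepare(txn_id, writes):
            prepared.append(node)
        else:
            # One "no" vote aborts the transaction on all prepared nodes.
            for participant in prepared:
                participant.rollback(txn_id)
            return None
    # Phase 2: all participants commit with the same timestamp.
    commit_ts = tso.next_timestamp()
    for node in prepared:
        node.commit(txn_id, commit_ts)
    return commit_ts
```

Because every participant commits with the same timestamp, a reader that uses a single snapshot timestamp sees either all of a transaction's writes or none of them.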

Architecture benefits

  • High availability

PolarDB-X is developed based on the architecture of X-DB and uses X-Paxos to ensure strong consistency and deliver a recovery point objective (RPO) of zero if a node failover occurs. These capabilities have been proven over years of use during Double 11 events. PolarDB-X also provides various deployment solutions and disaster recovery capabilities. For example, you can deploy your PolarDB-X instance across three data centers in the same region or five data centers in three regions and use Paxos to ensure fully synchronous replication. You can also deploy your PolarDB-X instance by using the three data centers across two zones, geo-disaster recovery, or active geo-redundancy solution, and use binary logs to implement asynchronous replication. To ensure efficient data transmission across regions, PolarDB-X uses batching and pipelining to optimize network performance.

  • High compatibility

PolarDB for Xscale is compatible with MySQL features such as SQL statements and function types. PolarDB-X provides TSO to keep data globally consistent across distributed transactions. PolarDB-X uses TSO and 2PC to provide complete atomicity, consistency, isolation, and durability (ACID) capabilities and to support the Read-Committed and Repeatable-Read isolation levels for distributed systems.

PolarDB-X provides global secondary indexes based on distributed transactions. PolarDB-X can write multiple transactions at a time to ensure strong consistency between the data in index tables and base tables, and uses a cost-based optimizer (CBO) to select indexes. In terms of metadata and ecosystem integration, PolarDB for Xscale uses the online DDL feature to ensure metadata consistency in distributed systems. In terms of hardware, PolarDB-X is compatible with major operating systems and chips that are developed by Chinese vendors, such as Kirin, Kunpeng, and Hygon.

Few vendors of major distributed databases provide services that collect data change logs, such as redo logs and binary logs, from distributed databases. Over the years, enterprises have built relational databases on a large scale on top of established industry ecosystems and standards. PolarDB for Xscale provides binary logs that are fully compatible with MySQL. You can use the standard binary log dump protocol to obtain binary logs from a PolarDB for Xscale database in the same manner as from a MySQL database.

  • High scalability

PolarDB for Xscale uses a shared-nothing architecture to support cluster configuration changes and online scaling of databases. In online transaction processing (OLTP) scenarios, PolarDB-X can process tens of millions of concurrent requests and store petabytes of data. In online analytical processing (OLAP) scenarios, PolarDB-X uses the massively parallel processing (MPP) capability to linearly improve query performance after an instance is scaled out. This helps meet the requirements of complex report queries such as those in the TPC-H benchmark.

  • HTAP

The continued development of the mobile Internet and IoT devices can lead to explosive data growth. Traditional OLTP and OLAP solutions are built on simple read/write splitting or on an extract, transform, and load (ETL) model, in which data is extracted from an online database to a data warehouse for computing based on the T+1 mechanism. These traditional solutions provide poor real-time performance and inefficient link utilization and incur high storage and maintenance costs.

PolarDB for Xscale can handle mixed workloads of OLTP and OLAP. This allows you to run TPC-C and TPC-H benchmark tests in the same instance at the same time. During the test period, the analytical processing workloads do not affect the stability of the transaction processing workloads. PolarDB-X also provides multiple innovative features. For example, PolarDB-X can accurately distinguish transaction processing workloads from analytical processing workloads at the computing layer. PolarDB-X then intelligently routes the workloads to different replicas by implementing consistent reads on multiple replicas. By default, the MPP capability is enabled on the analytical processing link. This way, resource isolation is ensured, and query performance is linearly improved for OLAP scenarios.
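
The workload identification and routing described above can be illustrated with a small sketch. The source does not state how PolarDB-X identifies workloads, so the approach here is an assumption: a query is classified by a hypothetical estimate of scanned rows against an arbitrary threshold, and the `Plan`, `classify`, and `route` names are invented for the example.

```python
from dataclasses import dataclass

# Hypothetical cutoff, in estimated rows scanned, between TP and AP queries.
AP_COST_THRESHOLD = 10_000


@dataclass
class Plan:
    target: str    # "primary" or "read_only_replica"
    use_mpp: bool  # whether the query runs on the MPP execution path


def classify(estimated_rows_scanned):
    # Cheap queries count as transactional (TP), expensive ones as analytical (AP).
    return "AP" if estimated_rows_scanned >= AP_COST_THRESHOLD else "TP"


def route(estimated_rows_scanned):
    if classify(estimated_rows_scanned) == "AP":
        # Analytical queries go to a separate replica with MPP enabled,
        # so they do not compete with transactions for resources.
        return Plan(target="read_only_replica", use_mpp=True)
    return Plan(target="primary", use_mpp=False)
```

In this sketch, a point lookup that scans a handful of rows stays on the primary, while a report query that scans millions of rows is sent to a read-only replica with MPP enabled.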