Architecture - PolarDB - Alibaba Cloud Documentation Center

PolarDB for PostgreSQL(Compatible with Oracle) uses a shared-everything architecture, in which shared distributed storage decouples compute from storage.

PolarDB for PostgreSQL(Compatible with Oracle) separates compute nodes from storage nodes, which allows you to scale out an instance in real time. In a traditional database, the computing capability of a single instance is limited, so the usual way to scale out is to create multiple replicas of the instance. Each replica holds a full copy of the data and therefore requires a large amount of storage space. Log data is also synchronized frequently between the primary node and the replicas, which results in high network overheads. When you add a replica to a traditional database cluster, all incremental data must be synchronized to the new replica, which increases the synchronization latency.

In PolarDB for PostgreSQL(Compatible with Oracle), database files and log files, such as redo log files, are stored on shared storage devices, so the primary node and all replicas share the same full data and incremental data. Only the metadata held in the memory of the primary node needs to be synchronized to the replicas. Multiversion concurrency control (MVCC) ensures that reads are consistent across nodes. As a result, both the network overheads of cross-node data synchronization and the synchronization latency are reduced.
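The storage argument above can be made concrete with a small back-of-the-envelope model. The following Python sketch is illustrative only: the function names and the 10 TB figure are assumptions, and the model counts the logical footprint seen by the database nodes, ignoring any redundancy that the storage layer keeps internally.

```python
def traditional_storage_tb(data_tb: float, nodes: int) -> float:
    """Traditional replication: every node (primary and replicas) keeps a full copy."""
    return data_tb * nodes


def shared_storage_tb(data_tb: float, nodes: int) -> float:
    """Shared storage: all nodes mount the same volume, so the footprint is independent of node count."""
    return data_tb


def data_copied_for_new_node_tb(data_tb: float, shared: bool) -> float:
    """Data that must be copied before a newly added node can serve requests."""
    return 0.0 if shared else data_tb


if __name__ == "__main__":
    data_tb = 10.0  # hypothetical data size
    for nodes in (1, 2, 4, 16):
        print(
            f"nodes={nodes:2d} "
            f"traditional={traditional_storage_tb(data_tb, nodes):6.1f} TB "
            f"shared={shared_storage_tb(data_tb, nodes):6.1f} TB"
        )
```

In this simplified model, the traditional footprint grows linearly with the number of nodes, while the shared-storage footprint stays constant and a new node needs no initial data copy.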

The compute nodes of PolarDB for PostgreSQL(Compatible with Oracle) provide multiple features. For example, they parse and optimize SQL statements, execute parallel queries, and process lock-free, high-performance transactions. Memory status is synchronized among compute nodes over a high-throughput physical replication protocol. A cluster can contain up to 16 database nodes: one primary node and up to 15 read-only nodes. Because of the real-time scaling capability described above, a read-only node can be created within 5 minutes. The new read-only node mounts the same volume as the other nodes, so it can provide services without replicating data from them.

Compute nodes and storage nodes communicate by using an intelligent interconnection protocol that is based on database semantics. This protocol allows operators such as filter and projection to be pushed down from the compute layer to the storage layer for execution. Compute nodes and storage nodes are connected over a 25 Gbit/s Remote Direct Memory Access (RDMA) network, which keeps the latency of transactions and queries low and reduces the latency of status synchronization among compute nodes. Communication between compute nodes and storage nodes goes through a user-mode network protocol layer that bypasses the kernel.
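To illustrate why pushing filter and projection operators down to the storage layer helps, the following Python sketch compares how many values a compute node receives with and without pushdown. It is a toy model, not the PolarDB protocol: the class `StorageNodeSketch`, its methods, and the sample table are all hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Sequence

Row = Dict[str, object]


class StorageNodeSketch:
    """Hypothetical storage node that holds the rows of one table."""

    def __init__(self, rows: Sequence[Row]) -> None:
        self._rows = list(rows)

    def scan(self) -> Iterator[Row]:
        """No pushdown: ship every row with every column to the compute node."""
        for row in self._rows:
            yield dict(row)

    def scan_pushdown(
        self, predicate: Callable[[Row], bool], columns: Sequence[str]
    ) -> Iterator[Row]:
        """Pushdown: apply the filter and the projection before shipping rows."""
        for row in self._rows:
            if predicate(row):
                yield {col: row[col] for col in columns}


def values_shipped(rows: List[Row]) -> int:
    """Count the column values that crossed the network in this model."""
    return sum(len(row) for row in rows)


def is_eu(row: Row) -> bool:
    return row["region"] == "eu"


storage = StorageNodeSketch(
    [
        {"id": 1, "region": "eu", "amount": 10},
        {"id": 2, "region": "us", "amount": 20},
        {"id": 3, "region": "eu", "amount": 30},
        {"id": 4, "region": "ap", "amount": 40},
    ]
)
columns = ["id", "amount"]

# Without pushdown, the compute node filters and projects after the transfer.
shipped_plain = list(storage.scan())
result_plain = [{c: r[c] for c in columns} for r in shipped_plain if is_eu(r)]

# With pushdown, the storage node returns only the matching rows and columns.
shipped_pushed = list(storage.scan_pushdown(is_eu, columns))
result_pushed = shipped_pushed

print(values_shipped(shipped_plain), values_shipped(shipped_pushed))
```

Both paths produce the same query result, but in this example the pushdown path ships 4 values instead of 12, which is the kind of saving that the interconnection protocol aims for.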

PolarDB for PostgreSQL(Compatible with Oracle) supports a maximum storage capacity of 100 TB. The storage capacity can be expanded based on your business requirements, and the expansion does not affect your business. You are charged only for the capacity that you use.

Polar File System (PolarFS) is a distributed file system independently developed by Alibaba Cloud, and it serves as the storage infrastructure of PolarDB. PolarFS is a distributed storage system developed in China and designed for database applications. It adopts an I/O stack that runs entirely in user space and provides low-latency, high-performance I/O that is similar to the I/O capability of a local SSD architecture. Its distributed cluster architecture also provides high storage capacity and performance. For more information, see PolarFS: An Ultra-low Latency and Failure Resilient Distributed File System for Shared Storage Cloud Database in VLDB 2018. In addition to high performance and scalability, the core strengths of PolarFS include a series of high-reliability and high-availability storage technologies that are designed to work with PolarDB databases. These technologies draw on long-term experience in meeting the demanding business requirements of PolarDB customers and on large-scale development and O&M experience with public clouds.

PolarProxy is an intelligent database proxy that is built into PolarDB for PostgreSQL(Compatible with Oracle). PolarProxy provides a unified endpoint for applications, so the multiple compute nodes at the underlying layer are transparent to applications. Adding or removing nodes and performing a failover are also transparent to applications, so you do not need to change the endpoint of your database. PolarProxy resides between databases and applications and controls the traffic of sessions and requests by buffering requests, merging traffic, reusing connections, and balancing loads. This allows demanding scenarios, such as high-concurrency scenarios, to be handled.
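The idea of a unified endpoint in front of one primary node and several read-only nodes can be sketched in a few lines of Python. This is a minimal illustration, not how PolarProxy is implemented: the class `ProxySketch`, the node names, the round-robin policy, and the rule that treats every statement starting with `SELECT` as a read are all assumptions. A real proxy would need finer classification (for example, `SELECT ... FOR UPDATE` has to run on the primary node).

```python
from typing import List


class ProxySketch:
    """Hypothetical proxy that hides the node topology behind a single route() call."""

    def __init__(self, primary: str, read_only: List[str]) -> None:
        self.primary = primary
        self.read_only = list(read_only)
        self._next = 0  # round-robin counter for read-only nodes

    def route(self, sql: str) -> str:
        """Return the node that should run the statement."""
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and self.read_only:
            node = self.read_only[self._next % len(self.read_only)]
            self._next += 1
            return node
        # Writes, and reads when no read-only node exists, go to the primary node.
        return self.primary

    def add_read_only(self, node: str) -> None:
        """Scale out: the application keeps using the same endpoint."""
        self.read_only.append(node)

    def failover(self, new_primary: str) -> None:
        """Promote a read-only node; in this sketch the old primary leaves the pool."""
        self.read_only.remove(new_primary)
        self.primary = new_primary


proxy = ProxySketch("primary", ["ro1", "ro2"])
print(proxy.route("SELECT * FROM orders"))        # served by a read-only node
print(proxy.route("UPDATE orders SET paid = 1"))  # served by the primary node
proxy.failover("ro1")
print(proxy.route("INSERT INTO orders VALUES (1)"))  # served by the new primary
```

The application only ever calls `route()` on the proxy, so adding a node or promoting a new primary changes the routing table without changing anything on the application side.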