PolarDB separates compute nodes from storage nodes, which allows a PolarDB instance to be scaled out in real time. A single database instance provides limited computing capability, so the traditional way to scale out is to create multiple database replicas. In this traditional method, each replica must store a full copy of the data, and log data is frequently synchronized between the primary node and the replicas, which results in high network overhead. When replicas are added to a traditional database cluster, all incremental data must also be synchronized to them, so the synchronization latency increases. In PolarDB, database files and log files, such as redo log files, are stored on shared storage devices. The primary node and all replicas therefore share the same full data and incremental data. Only the metadata held in the memory of the primary node needs to be synchronized to the replicas, and multiversion concurrency control (MVCC) ensures that reads are consistent across nodes. As a result, both the network overhead of cross-node data synchronization and the synchronization latency are reduced.
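The idea of sharing data while synchronizing only metadata can be illustrated with a small toy model. This is a sketch under stated assumptions, not PolarDB's actual protocol: the class names `SharedStorage`, `Primary`, and `Replica`, and the use of a single log sequence number (LSN) as the synchronized metadata, are hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorage:
    """Toy stand-in for the shared volume: each key keeps all of its versions."""
    versions: dict = field(default_factory=dict)  # key -> list of (lsn, value)

    def write(self, key, lsn, value):
        self.versions.setdefault(key, []).append((lsn, value))

    def read(self, key, snapshot_lsn):
        # MVCC-style read: return the newest version visible at snapshot_lsn.
        visible = [v for (lsn, v) in self.versions.get(key, []) if lsn <= snapshot_lsn]
        return visible[-1] if visible else None


@dataclass
class Primary:
    storage: SharedStorage
    lsn: int = 0

    def commit(self, key, value):
        self.lsn += 1
        self.storage.write(key, self.lsn, value)
        return self.lsn  # assumption: this small piece of metadata is all that is shipped


@dataclass
class Replica:
    storage: SharedStorage
    applied_lsn: int = 0

    def sync(self, lsn):
        # Only metadata arrives here; the data already sits in shared storage.
        self.applied_lsn = max(self.applied_lsn, lsn)

    def read(self, key):
        return self.storage.read(key, self.applied_lsn)


storage = SharedStorage()
primary = Primary(storage)
replica = Replica(storage)

primary.commit("balance", 100)
latest_lsn = primary.commit("balance", 80)

before_sync = replica.read("balance")  # replica has not learned about any commit yet
replica.sync(latest_lsn)
after_sync = replica.read("balance")   # same shared data, now visible at the new LSN
```

In this model no row data is ever copied to the replica. The replica's view changes only because the snapshot position it reads at moves forward.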
The compute nodes of PolarDB provide multiple features: they parse and optimize SQL statements, execute parallel queries, and process lock-free, high-performance transactions. In-memory state is synchronized among compute nodes by using a high-throughput physical replication protocol. A cluster can contain up to 16 database nodes: one primary node and up to 15 read-only nodes. Because of the real-time scaling capability described above, a read-only node can be created within 5 minutes. The new read-only node mounts the same volume as the other nodes, so it can provide services without replicating data from them.
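The node limit and the shared volume can be expressed as a short model. This is a minimal sketch for illustration only: the `Cluster` class, the node naming scheme, and the volume identifier are hypothetical and do not correspond to any PolarDB API.

```python
MAX_NODES = 16  # limit stated in the text: one primary node plus read-only nodes


class Cluster:
    """Toy cluster in which every node mounts the same shared volume."""

    def __init__(self, volume_id):
        self.volume_id = volume_id
        self.nodes = [{"name": "primary", "role": "primary", "volume": volume_id}]

    def add_read_only_node(self):
        if len(self.nodes) >= MAX_NODES:
            raise ValueError(f"a cluster holds at most {MAX_NODES} nodes")
        # No data copy is modeled here: the new node only records the shared volume.
        node = {
            "name": f"ro-{len(self.nodes)}",
            "role": "read-only",
            "volume": self.volume_id,
        }
        self.nodes.append(node)
        return node


cluster = Cluster("shared-volume-1")  # hypothetical volume name
for _ in range(15):
    cluster.add_read_only_node()

try:
    cluster.add_read_only_node()  # a 17th node exceeds the limit
    rejected = False
except ValueError:
    rejected = True
```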
By using an intelligent interconnection protocol that is based on database semantics, operators such as filter and projection can be pushed down from the compute layer to the storage layer for execution. Compute nodes and storage nodes are connected over a 25 Gbit/s Remote Direct Memory Access (RDMA) network. This keeps the latency of transactions and queries low and reduces the latency of state synchronization among compute nodes. Communication between compute nodes and storage nodes uses a user-mode network protocol layer that bypasses the kernel.
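The effect of pushing filter and projection operators down to the storage layer can be sketched as follows. The example is an illustrative assumption, not the PolarDB interconnection protocol: `storage_scan`, the sample rows, and the `wanted` predicate are hypothetical, and the number of fields returned stands in for network traffic.

```python
ROWS = [
    {"id": 1, "region": "eu", "amount": 40, "note": "first"},
    {"id": 2, "region": "us", "amount": 75, "note": "second"},
    {"id": 3, "region": "eu", "amount": 90, "note": "third"},
    {"id": 4, "region": "eu", "amount": 10, "note": "fourth"},
]


def storage_scan(rows, predicate=None, columns=None):
    """Runs on the storage side; optionally applies a filter and a projection."""
    shipped = []
    for row in rows:
        if predicate is not None and not predicate(row):
            continue  # filter pushed down: the row never leaves storage
        if columns is None:
            shipped.append(dict(row))
        else:
            shipped.append({c: row[c] for c in columns})  # projection pushed down
    return shipped


def wanted(row):
    return row["region"] == "eu" and row["amount"] >= 40


def fields_shipped(rows):
    """Crude proxy for network traffic: total number of fields transferred."""
    return sum(len(r) for r in rows)


# Without pushdown: storage ships every row, the compute node filters and projects.
shipped_all = storage_scan(ROWS)
local_result = [
    {"id": r["id"], "amount": r["amount"]} for r in shipped_all if wanted(r)
]

# With pushdown: storage applies the same filter and projection before shipping.
shipped_pushed = storage_scan(ROWS, predicate=wanted, columns=["id", "amount"])
```

Both paths produce the same result set, but the pushdown path transfers only the fields the query needs.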
PolarDB supports a maximum storage capacity of 100 TB. The storage capacity can be expanded dynamically based on your business requirements without affecting your business, and you are charged only for the capacity that you use. The storage layer of PolarDB uses PolarFS, a distributed file system independently developed by Alibaba Cloud. In China, PolarFS is the first distributed storage system designed for database applications. It adopts an I/O stack that runs entirely in user space and provides low-latency, high-performance I/O that is similar to the I/O capability of a local SSD architecture. For more information, see PolarFS: An Ultra-low Latency and Failure Resilient Distributed File System for Shared Storage Cloud Database in VLDB 2018. Based on its distributed cluster architecture, PolarFS also provides high storage capacity and performance. In addition to high performance and scalability, the core strengths of PolarFS include a series of high-reliability and high-availability storage technologies that are designed to work with PolarDB databases. These technologies are based on long-term experience in meeting the demanding business requirements of PolarDB customers and on large-scale development and O&M experience in public clouds.
PolarProxy is the built-in intelligent database proxy of PolarDB. It resides between applications and databases and provides a unified endpoint for applications, so the underlying compute nodes are transparent to applications. When nodes are added or removed, or when a failover is performed, you do not need to change the endpoint that your application uses. PolarProxy also controls session and request traffic by buffering requests, merging traffic, reusing connections, and balancing loads. This allows PolarDB to handle demanding scenarios such as high concurrency.
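The way a proxy hides node changes behind a single endpoint can be sketched with a toy router. This is a simplified illustration and not the PolarProxy implementation: the `ToyProxy` class, the node names, the rule of sending statements that start with SELECT to read-only nodes, and the round-robin policy are all assumptions made for the example.

```python
class ToyProxy:
    """Single entry point that decides which node serves each statement."""

    def __init__(self, primary, read_only):
        self.primary = primary
        self.read_only = list(read_only)
        self._next = 0  # round-robin cursor over read-only nodes

    def route(self, sql):
        # Assumption: a statement is a read if it starts with SELECT.
        is_read = sql.lstrip().upper().startswith("SELECT")
        if not is_read or not self.read_only:
            return self.primary
        node = self.read_only[self._next % len(self.read_only)]
        self._next += 1
        return node

    def add_read_only(self, node):
        self.read_only.append(node)

    def failover(self, new_primary):
        # Promote a read-only node; the old primary is assumed to have failed.
        self.read_only.remove(new_primary)
        self.primary = new_primary


proxy = ToyProxy("rw-1", ["ro-1", "ro-2"])  # hypothetical node names
statements = ["SELECT 1", "SELECT 2", "UPDATE t SET x = 1", "SELECT 3"]
routes = [proxy.route(s) for s in statements]

# The application keeps calling proxy.route(); only the proxy's view changes.
proxy.failover("ro-1")
write_target = proxy.route("INSERT INTO t VALUES (1)")
read_target = proxy.route("SELECT 4")
```

The application code never refers to an individual node, which is what makes scaling and failover transparent in this model.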