Key features of PolarDB-X

This topic describes the key features of PolarDB-X.

Distributed linear scalability

PolarDB-X horizontally partitions the data in a table across multiple data nodes. A partitioning function determines the partition to which each row belongs. PolarDB-X supports common partitioning functions such as hash partitioning and range partitioning.

As shown in the following example, the orders table in the shop database is split into 12 partitions, orders_00 through orders_11, based on the hash value of the ID attribute of each row. The partitions are evenly distributed across four data nodes. You do not need to be concerned about where the data is physically stored. The distributed SQL layer of PolarDB-X automatically routes queries to the correct nodes and aggregates results from different partitions and nodes.

Figure: Distributed linear scalability
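The routing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CRC32 hash, the round-robin placement of partitions on nodes, and the function names `partition_for`, `node_for`, and `route` are hypothetical and are not the partitioning function or placement policy that PolarDB-X actually uses.

```python
import zlib
from typing import Tuple

NUM_PARTITIONS = 12   # orders_00 .. orders_11, as in the example
NUM_DATA_NODES = 4    # four data nodes, as in the example


def partition_for(order_id: int) -> str:
    """Pick a partition from the hash of the row ID (illustrative hash only)."""
    digest = zlib.crc32(str(order_id).encode("utf-8"))
    return f"orders_{digest % NUM_PARTITIONS:02d}"


def node_for(partition: str) -> int:
    """Assumed placement: partitions are spread round-robin, three per node."""
    index = int(partition.rsplit("_", 1)[1])
    return index % NUM_DATA_NODES


def route(order_id: int) -> Tuple[str, int]:
    """Return the partition and data node that a point query on this ID targets."""
    partition = partition_for(order_id)
    return partition, node_for(partition)
```

A point query on a single ID touches exactly one partition, whereas a query without an ID filter would have to visit every partition and merge the results, which is the work the distributed SQL layer does on your behalf.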

Scale-out and migration

As your business grows, so does the amount of data. In most cases, you must add data nodes to handle the increase. When you add a data node to an instance, PolarDB-X automatically performs a scale-out task to rebalance the data.

In the following example, the data of the orders table is originally distributed across four data nodes. After the number of data nodes in the instance is increased from 4 to 6, PolarDB-X automatically performs a scale-out task to migrate some partitions from the existing nodes to the new nodes. The migration runs in the background on idle resources and does not affect your online business.

Figure: Scale-out and migration
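The rebalancing step can be illustrated with a small Python sketch. It assumes a simple greedy policy that moves whole partitions from the fullest existing nodes to nodes that are below the average. The node names, the `rebalance` function, and the policy itself are hypothetical and only show the idea, not the PolarDB-X scheduler.

```python
from typing import Dict, List, Tuple


def rebalance(layout: Dict[str, List[str]],
              new_nodes: List[str]) -> List[Tuple[str, str, str]]:
    """Add new_nodes to layout and return the moves as (partition, source, target)."""
    for node in new_nodes:
        layout.setdefault(node, [])
    total = sum(len(parts) for parts in layout.values())
    target = total // len(layout)  # partitions per node after rebalancing
    moves = []
    for src in list(layout):
        while len(layout[src]) > target:
            # Find any node that still has room below the target.
            dst = next((n for n in layout if len(layout[n]) < target), None)
            if dst is None:
                break  # remainder partitions stay where they are
            partition = layout[src].pop()
            layout[dst].append(partition)
            moves.append((partition, src, dst))
    return moves


# 12 partitions on 4 nodes (3 each), then scale out to 6 nodes (2 each).
layout = {f"dn{n}": [f"orders_{p:02d}" for p in range(12) if p % 4 == n]
          for n in range(4)}
moves = rebalance(layout, ["dn4", "dn5"])
```

In this example only four of the twelve partitions move, one from each original node, so most of the data stays in place during the scale-out.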

High availability and disaster recovery

In most cases, when a database instance is deployed in a production environment, multiple replicas are created to ensure the high availability and data durability of the instance. Modern databases often use a majority consensus replication protocol such as Paxos to ensure strong consistency between replicas. The protocol requires that at least three nodes exist in the instance, and each write operation is acknowledged by more than half of the nodes. This way, the instance can provide services as expected even if one of the nodes is down. PolarDB-X employs the X-Paxos replication protocol developed by Alibaba. X-Paxos is an enhanced version of Paxos that provides extensive optimization in terms of functionality and performance. This protocol has reliably supported the Double 11 Shopping Festival consecutively for over a decade, which demonstrates its stability and reliability.
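The majority rule at the heart of such protocols is simple arithmetic, sketched below in Python. The function names are hypothetical, and the sketch covers only the acknowledgment count. It does not model leader election, log replication, or any other part of Paxos or X-Paxos.

```python
from typing import List


def has_majority(acks: int, replicas: int) -> bool:
    """A write is durable once more than half of the replicas acknowledge it."""
    return acks > replicas // 2


def tolerated_failures(replicas: int) -> int:
    """Largest number of replicas that can be down while a majority remains."""
    return (replicas - 1) // 2


def write_succeeds(replica_is_up: List[bool]) -> bool:
    """Assume every healthy replica acknowledges the write."""
    acks = sum(1 for up in replica_is_up if up)
    return has_majority(acks, len(replica_is_up))
```

With three replicas, `tolerated_failures(3)` is 1, which matches the statement that the instance keeps serving requests when one node is down.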

The Paxos replication protocol allows you to deploy a PolarDB-X instance across multiple data centers and ensure data center-level disaster recovery. Common deployment methods include three data centers in the same zone and three data centers across two zones. The second method is used in hybrid cloud deployment scenarios. In most cases, one of the three data centers functions as the primary data center due to the characteristics of the Paxos protocol. The primary data center is responsible for providing external services.

Distributed transactions

PolarDB-X natively supports distributed transactions and can guarantee the atomicity, consistency, isolation, and durability (ACID) of the transactions.

PolarDB-X uses Timestamp Oracle (TSO) and multiversion concurrency control (MVCC) to ensure that reads see a consistent snapshot. This way, the intermediate state of a distributed transaction, such as a money transfer, is never read. When a compute node commits a transaction, it executes the transaction and obtains a timestamp from TSO. Then, the compute node commits the timestamp and the data to the multi-version storage engine that is run by a data node. During the read process, if the query involves data stored across multiple partitions, PolarDB-X retrieves a global timestamp to use as the version number for the read operation. Then, PolarDB-X assesses the visibility of each row version to ensure that it reads only the data written by transactions committed before the global timestamp.
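The visibility rule can be sketched as a toy model in Python. The `TimestampOracle` and `Partition` classes are hypothetical stand-ins for TSO and the multi-version storage engine. The sketch deliberately omits the prepare phase and locking that a real implementation needs so that a snapshot taken while a commit is still in flight remains consistent.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class TimestampOracle:
    """Stand-in for TSO: hands out strictly increasing timestamps."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)

    def next_timestamp(self) -> int:
        return next(self._counter)


class Partition:
    """Toy multi-version store: every key keeps all of its committed versions."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[int, int]]] = {}

    def commit(self, key: str, value: int, commit_ts: int) -> None:
        self._versions.setdefault(key, []).append((commit_ts, value))

    def read(self, key: str, snapshot_ts: int) -> Optional[int]:
        # Only versions committed before the snapshot timestamp are visible.
        visible = [(ts, v) for ts, v in self._versions.get(key, [])
                   if ts < snapshot_ts]
        return max(visible)[1] if visible else None


tso = TimestampOracle()
p0, p1 = Partition(), Partition()  # accounts A and B live on different partitions

initial_ts = tso.next_timestamp()
p0.commit("A", 100, initial_ts)
p1.commit("B", 100, initial_ts)

old_snapshot = tso.next_timestamp()  # a reader starts before the transfer

transfer_ts = tso.next_timestamp()   # transfer 30 from A to B
p0.commit("A", 70, transfer_ts)
# Half of the transfer has been applied, but the old snapshot cannot see it.
mid_total = p0.read("A", old_snapshot) + p1.read("B", old_snapshot)
p1.commit("B", 130, transfer_ts)

new_snapshot = tso.next_timestamp()  # a reader starts after the transfer
new_total = p0.read("A", new_snapshot) + p1.read("B", new_snapshot)
```

Both readers observe a total balance of 200: the earlier snapshot sees the state before the transfer, and the later snapshot sees the state after it.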

Distributed transactions are a fundamental feature in distributed systems. For example, in a read/write splitting solution, the multiple versions of data in a transaction are synchronized to the learner replicas to ensure that read-only instances do not read stale data due to synchronization latency. In a log file that records global data changes, distributed transactions are sorted by timestamp. When PolarDB-X performs a point-in-time recovery (PITR), PolarDB-X uses the timestamps of distributed transactions to accurately identify the globally consistent version of data at the corresponding point in time.
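The point-in-time recovery idea can be illustrated with a short Python sketch. The change-log layout (a list of commit timestamp and key/value changes) and the `restore_to` function are assumptions made for the illustration and do not reflect the actual PolarDB-X log format or recovery procedure.

```python
from typing import Dict, List, Tuple

ChangeLog = List[Tuple[int, Dict[str, int]]]  # (commit timestamp, changes)


def restore_to(change_log: ChangeLog, target_ts: int) -> Dict[str, int]:
    """Replay transactions in timestamp order up to and including target_ts."""
    state: Dict[str, int] = {}
    for commit_ts, changes in sorted(change_log, key=lambda entry: entry[0]):
        if commit_ts > target_ts:
            break  # later transactions are not part of the restored version
        state.update(changes)  # a transaction is applied as a whole or not at all
    return state


# Entries may arrive out of order from different nodes; sorting restores the order.
log: ChangeLog = [
    (3, {"A": 70, "B": 130}),
    (1, {"A": 100, "B": 100}),
    (5, {"A": 60, "B": 140}),
]
```

Because each transaction is applied as a unit, `restore_to(log, 4)` yields the balances written at timestamp 3 for both accounts and never a mix of two transactions.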

Integrated centralized-distributed architecture

PolarDB-X supports the integrated centralized-distributed architecture.

By using the integrated centralized-distributed architecture, PolarDB-X combines the scalability and resiliency of distributed databases with the centralized management capability and performance of centralized databases. You can switch between the centralized and distributed modes. In the centralized mode, data nodes (DNs) operate independently and are fully compatible with the single-node database model. When the business grows and an upgrade to a distributed system is required, the architecture can be converted in place into a distributed model. During the upgrade, distributed components are integrated with the existing DNs, and you do not need to migrate data or modify your applications.

To match this architecture, PolarDB-X instances are available in two editions: Standard Edition (centralized) and Enterprise Edition (distributed).

HTAP

PolarDB-X supports hybrid transaction/analytical processing (HTAP). This allows PolarDB-X to handle highly concurrent transactional requests while also serving complex analytical queries. Analytical queries are performed on large amounts of data and require complex computations. For example, you can perform analytical queries to aggregate data within a specific period of time. Compared with common simple queries, analytical queries consume more computing resources and take longer to execute, typically several seconds or minutes.

To improve the performance of complex analytical queries, PolarDB-X uses In-Memory Column Index (IMCI) technology. Combined with vectorized operators, this technology significantly improves analytical processing capabilities.

Compatibility with the MySQL ecosystem

PolarDB-X is developed to ensure full compatibility with the MySQL ecosystem. This section describes the compatibility between PolarDB-X and MySQL in terms of SQL syntax, transaction behavior, and data import and export. For more information, see Compatibility with MySQL.

PolarDB-X is compatible with the MySQL protocol. PolarDB-X instances can communicate with common MySQL clients by using drivers such as Java Database Connectivity (JDBC) drivers, Open Database Connectivity (ODBC) drivers, and Go drivers. MySQL clients can connect to PolarDB-X by using protocols such as SSL, the prepared statement protocol, and Load.

PolarDB-X is compatible with DML, Data Access Language (DAL), and DDL statements in MySQL. The following MySQL features are supported:

Most MySQL functions, including JSON, encryption, and decryption functions
Views, common table expressions (CTEs), window functions, and analytic functions in MySQL 8.0
Various data types in MySQL, including TIMESTAMP and DECIMAL
Common strings, character sets, and collations in MySQL
Most information_schema views

PolarDB-X is compatible with the MySQL binlog replication protocol. You can treat a PolarDB-X cluster as a normal MySQL node and use another MySQL node as the synchronization source or destination of the PolarDB-X cluster. The binary log format of PolarDB-X is the same as that of MySQL and therefore PolarDB-X can also be used in scenarios that use the change data capture (CDC) mechanism. For example, Canal can be used to synchronize data from PolarDB-X to other storage solutions.