PolarDB:Binary logs

Last Updated:Jul 12, 2023

PolarDB-X supports the binary log feature in two modes. This topic describes the two modes and their scenarios.

Overview

MySQL provides binary log files to record data changes. A binary log file can be regarded as a message queue that stores detailed incremental data changes in chronological order. Downstream systems or tools consume the changes in the queue to synchronize data from MySQL in real time. This mechanism is also referred to as change data capture (CDC).

PolarDB-X is a distributed database service that is compatible with the MySQL ecosystem. A PolarDB-X instance uses the CDC component to provide change logs that are compatible with the binary log format of MySQL. The distributed features of PolarDB-X, such as instance scaling, distributed transactions, and global indexes, are not reflected in the change logs. This way, you can use a PolarDB-X instance in the same manner that you use a standalone MySQL database.

PolarDB-X provides binary logs in two modes. The two modes can be used at the same time.

  • Single-stream mode: The single-stream mode is also referred to as the Global Binlog mode. In this mode, the binary logs of all data nodes are merged into a global queue to provide a single log stream that preserves the integrity and order of transactions. The single-stream mode provides a stronger guarantee for data consistency. For example, in a funds transfer scenario, a consistent balance can be queried at any time in downstream MySQL databases that subscribe to the single-stream binary logs of a PolarDB-X instance.

  • Multi-stream mode: The multi-stream mode is also referred to as the Binlog-X mode. In this mode, the binary logs of all data nodes are not merged into a global queue. Instead, the logs are distributed to different log streams by using a hash algorithm. The multi-stream mode compromises the integrity of transactions to some extent, but greatly improves scalability. This resolves the single-point bottleneck that single-stream binary logs face in large-scale clusters.

Single-stream mode

The raw binary logs of all data nodes are sorted and merged into one queue, and the internal details are removed. This way, PolarDB-X provides a log stream that is compatible with the binary log format and dump protocol of MySQL. By default, the single-stream binary log feature is enabled when you purchase a PolarDB-X instance.
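The merge step can be pictured with a minimal Python sketch. It is an illustration only, not the CDC implementation: the `TxnEvent` type, the `commit_ts` field, and the node names are hypothetical, and the sketch assumes that each data node already emits its events in commit-timestamp order.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class TxnEvent(NamedTuple):
    commit_ts: int  # hypothetical global commit timestamp (for example, issued by TSO)
    node: str       # data node that produced the event
    payload: str    # human-readable description of the change


def merge_streams(streams: Iterable[Iterable[TxnEvent]]) -> Iterator[TxnEvent]:
    """Merge per-node streams (each sorted by commit_ts) into one ordered stream."""
    return heapq.merge(*streams, key=lambda event: event.commit_ts)


# Two distributed transactions, each touching both data nodes.
dn1 = [TxnEvent(10, "dn1", "A: balance -100"), TxnEvent(20, "dn1", "C: balance +30")]
dn2 = [TxnEvent(10, "dn2", "B: balance +100"), TxnEvent(20, "dn2", "B: balance -30")]

global_queue = list(merge_streams([dn1, dn2]))
for event in global_queue:
    print(event.commit_ts, event.node, event.payload)
```

In the merged queue, the events that share a commit timestamp sit next to each other, which is what lets a downstream consumer apply each transfer as a whole.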

Usage notes

  • The master and slave nodes of the CDC component replicate binary log files from each other to ensure data consistency between the two nodes. Downstream systems consume binary logs based on the file name and position. The file name and position do not change even if a switchover occurs between the master and slave nodes.

  • Distributed transactions can be merged only if the transaction policy is set to Timestamp Oracle (TSO). Otherwise, only eventual consistency can be ensured for data. By default, the transaction policy in PolarDB-X is TSO.

  • If you need to modify the partition key value of a data row, make sure that the value is modified within a distributed transaction whose transaction policy is set to TSO. This way, the DELETE event is recorded in binary logs earlier than the INSERT event, which ensures data consistency. Specifically, to modify a partition key value with data consistency ensured, you must first set the transaction policy to TSO, and then perform one of the following operations:

    • Execute an UPDATE statement to modify the partition key value.

    • Execute a REPLACE statement to modify the partition key value.

    • Explicitly start a transaction to execute a DELETE statement, modify the partition key value, and then execute an INSERT statement.
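Why the order of the DELETE and INSERT events matters can be shown with a small Python sketch. It assumes a downstream consumer that replays events idempotently into a table keyed by primary key. The `apply_events` helper and the `region` column used as the partition key are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[str, int, Optional[dict]]  # (operation, primary key, row image)


def apply_events(events: List[Event]) -> Dict[int, dict]:
    """Replay events into a dict that stands in for a downstream table."""
    table: Dict[int, dict] = {}
    for op, pk, row in events:
        if op == "DELETE":
            table.pop(pk, None)
        elif op == "INSERT":
            table[pk] = row
    return table


# The row with id=1 moves from partition "east" to partition "west".
initial = ("INSERT", 1, {"id": 1, "region": "east"})
delete_old = ("DELETE", 1, None)
insert_new = ("INSERT", 1, {"id": 1, "region": "west"})

in_order = apply_events([initial, delete_old, insert_new])
out_of_order = apply_events([initial, insert_new, delete_old])

print(in_order)      # the row survives with the new partition key value
print(out_of_order)  # the row is lost because DELETE was replayed last
```

When DELETE is recorded before INSERT, the downstream table ends with the updated row. In the reverse order, the late DELETE removes the row that was just written.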

Multi-stream mode

Multi-stream binary logs are also fully compatible with the binary log format and dump protocol of MySQL. Each binary log stream can be regarded as a log stream from a standalone MySQL database. SQL statements such as CHANGE MASTER and SHOW BINLOG EVENTS can be executed on each log stream to consume or view binary logs.

By default, the multi-stream binary log feature is disabled. To use the multi-stream binary log feature, you must enable this feature in the console. You can create multiple multi-stream groups for a PolarDB-X instance. Each multi-stream group contains multiple log streams. Different groups are isolated from each other. You can configure parameters such as the number of log streams and the splitting level for a multi-stream group based on your business requirements.

Splitting level

The multi-stream binary log feature provides three types of splitting levels. You can configure splitting levels based on your business scenario when you create a multi-stream group.

  • Database level

    Binary logs are distributed to different log streams based on the hash values that are calculated by using database names. This way, the binary logs of a database are always routed to the same log stream in sequence. This splitting level is suitable for scenarios in which a single PolarDB-X instance has a large number of databases. If a transaction does not involve cross-database operations, the integrity of the transaction can be ensured in binary logs that are split at the database level.

  • Table level

    Binary logs are distributed to different log streams based on the hash values that are calculated by using table names. This way, the binary logs of a table are always routed to the same log stream in sequence. This splitting level is suitable for scenarios in which a large number of tables exist, and the operations on a single table, such as DML and DDL operations, are expected to be kept in sequence in a binary log stream.

  • Record level

    Binary logs are distributed to different log streams based on the hash values that are calculated by using the primary key values of data rows. This way, the binary logs of a data row are always routed to the same log stream in sequence. This splitting level is suitable for scenarios where binary logs are expected to be fully dispersed and do not need to be kept in sequence for a database or a table. To use this splitting level, make sure that the data tables contain a primary key. Binary logs of a table that does not contain a primary key are directly discarded.
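The three splitting levels differ only in which value is hashed. The following Python sketch illustrates the idea under stated assumptions: the `route_event` function is hypothetical, and CRC32 is used as a stand-in because this topic does not specify the hash algorithm that PolarDB-X uses.

```python
import zlib
from typing import Optional, Tuple


def route_event(level: str, stream_count: int, database: str, table: str,
                primary_key: Optional[Tuple] = None) -> Optional[int]:
    """Return the index of the log stream for an event, or None if it is discarded."""
    if level == "database":
        key = database
    elif level == "table":
        key = table
    elif level == "record":
        if primary_key is None:
            return None  # tables without a primary key are discarded at record level
        key = repr(primary_key)
    else:
        raise ValueError(f"unknown splitting level: {level}")
    # CRC32 is an assumption for illustration, not the documented algorithm.
    return zlib.crc32(key.encode("utf-8")) % stream_count


# Every event of one table lands in the same stream at the table level.
print(route_event("table", 4, "shop", "orders", (1,)))
print(route_event("table", 4, "shop", "orders", (2,)))
# Rows of one table may be spread across streams at the record level.
print([route_event("record", 4, "shop", "orders", (i,)) for i in range(8)])
```

Because the stream index depends on both the hashed value and the number of streams, changing either one after data has been written would route later events for the same object to a different stream.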

You can configure splitting levels at the service layer and at the database and table layer. After a splitting level is configured at a layer, it cannot be modified. If it were changed, binary logs of the same data would be routed to different log streams, which causes data inconsistency. Before you create a multi-stream group, we recommend that you plan the splitting levels based on your business requirements.

  • Service layer

    The splitting level configured at the service layer is the default splitting level of a multi-stream group. If you do not configure a splitting level for a database or a table, the splitting level configured at the service layer is used.

  • Database and table layer

    You can separately configure a splitting level for a database or table. The splitting level configured for a database or table overrides the splitting level configured at the service layer. This meets the requirements for differentiated management.
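The override rule can be sketched as a simple lookup. The `effective_level` function and its arguments are hypothetical. The sketch also assumes that a table-specific setting takes precedence over a database-specific setting, which this topic does not state.

```python
from typing import Dict, Tuple


def effective_level(service_level: str,
                    database_levels: Dict[str, str],
                    table_levels: Dict[Tuple[str, str], str],
                    database: str, table: str) -> str:
    """Resolve the splitting level that applies to one table."""
    # Assumption: a table-specific setting wins over a database-specific one.
    if (database, table) in table_levels:
        return table_levels[(database, table)]
    if database in database_levels:
        return database_levels[database]
    return service_level  # default of the multi-stream group


database_levels = {"analytics": "database"}
table_levels = {("shop", "orders"): "record"}

print(effective_level("table", database_levels, table_levels, "shop", "orders"))
print(effective_level("table", database_levels, table_levels, "analytics", "events"))
print(effective_level("table", database_levels, table_levels, "shop", "users"))
```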

Usage notes

  • After you create a multi-stream group, you cannot modify the number of log streams. Plan the number of log streams before you create a multi-stream group. We recommend that you configure the number of log streams to be greater than or equal to the number of data nodes.

  • After you create a multi-stream group, you cannot modify the splitting levels that take effect. We recommend that you plan the splitting levels before you create a multi-stream group.

  • If you want to separately configure a splitting level for a new table, configure the splitting level before data is written to the table.

  • You can rename tables after you set the splitting level to table. PolarDB-X always splits logs based on the initial table names.

  • If the splitting level is set to table, the binary logs of a large table may be routed to some specific log streams. This causes a data skew issue. In this case, you can separately configure a splitting level for the large table.

  • If you want to modify the number of log streams or the splitting levels that take effect, you can create another multi-stream group to replace the original multi-stream group. In this case, some O&M operations need to be performed in downstream systems to adjust the log consumption.

  • If the splitting level is set to record, a data table contains a UNIQUE constraint, and unique key swapping occurs, data inconsistency may occur. For example, the value a of the unique key name is held first by the data row whose id is 1 and then by the data row whose id is 2. The execution order of the delete(id=1,name=a) and insert(id=2,name=a) statements in the destination database is uncertain. If the insert(id=2,name=a) statement is executed before the delete(id=1,name=a) statement, a write conflict occurs. In this case, we recommend that you set the splitting level to table.
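The unique key swapping issue can be reproduced with a short Python sketch. It models a destination table as a mapping from id to name with a UNIQUE constraint on name. The `replay` function and the `UniqueConflict` exception are hypothetical and stand in for a downstream database.

```python
from typing import Dict, List, Tuple


class UniqueConflict(Exception):
    """Raised when an insert would violate the UNIQUE constraint on name."""


def replay(rows: Dict[int, str], events: List[Tuple[str, int, str]]) -> Dict[int, str]:
    """Apply (operation, id, name) events to a copy of the destination table."""
    rows = dict(rows)
    for op, row_id, name in events:
        if op == "delete":
            rows.pop(row_id, None)
        elif name in rows.values():
            raise UniqueConflict(f"name={name!r} is already held by another row")
        else:
            rows[row_id] = name
    return rows


destination = {1: "a"}  # the row with id=1 currently holds the unique value a

# Source order: the value a is released by id=1 and then taken by id=2.
print(replay(destination, [("delete", 1, "a"), ("insert", 2, "a")]))

# With record-level splitting, the two events may sit in different log streams,
# so the destination may see them in the opposite order.
try:
    replay(destination, [("insert", 2, "a"), ("delete", 1, "a")])
    conflict = False
except UniqueConflict:
    conflict = True
print("write conflict:", conflict)
```

At the table level, both events stay in one log stream in their original order, so the conflict does not arise.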

Transparent consumption

The CDC component preferentially saves binary log files on local disks, and can upload the files to remote storage such as Object Storage Service (OSS) in real time. Generally, the files are kept on the local disks for a short period of time, and on the remote storage for a longer period of time, such as 15 days. The CDC component provides the transparent consumption feature, which hides the storage differences between the local disks and remote storage. Downstream systems can access the binary log files on the remote storage without any adaptation.

Note

CDC V2.0.0 and later support the transparent consumption feature.
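Conceptually, transparent consumption behaves like a lookup that falls back from local disks to remote storage. The following Python sketch only illustrates that idea: the `locate_binlog` function, the file names, and the in-memory dictionaries are hypothetical and do not reflect how the CDC component is implemented.

```python
from typing import Dict, Tuple


def locate_binlog(file_name: str,
                  local_files: Dict[str, bytes],
                  remote_files: Dict[str, bytes]) -> Tuple[str, bytes]:
    """Return the storage tier and content of a binary log file."""
    if file_name in local_files:
        return "local", local_files[file_name]
    if file_name in remote_files:
        return "remote", remote_files[file_name]
    raise FileNotFoundError(file_name)


# Recent files are still on local disks. Older files exist only on remote storage.
local_files = {"binlog.000003": b"recent events"}
remote_files = {"binlog.000001": b"old events", "binlog.000003": b"recent events"}

print(locate_binlog("binlog.000003", local_files, remote_files)[0])
print(locate_binlog("binlog.000001", local_files, remote_files)[0])
```

The caller requests a file by name in both cases and does not need to know which tier served it.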

Active geo-redundancy

In addition to using external systems as data storage, the CDC component of PolarDB-X supports business deployment in the active geo-redundancy architecture. For example, users are granted write permissions on different data centers based on the areas to which the users belong. In this case, users can only write data in the specified data centers. For read operations, each user can read data from the replica nearest to the geographical area in which the user resides. PolarDB-X uses the CDC component to synchronize data to the replicas when the data is written to a data center.