
An Interpretation of the Source Code of OceanBase (8): Submission and Playback of Transaction Logs

This article discusses the submission and playback of transaction logs.

By Keqing

The seventh article of this series (Understand the Implementation Principles of Database Index) walked through OceanBase's index build process from the source-code perspective and explained the code related to index construction.

This eighth article explains the submission and playback of transaction logs. OceanBase logs (clogs) are similar to the REDO logs of traditional databases. This module persists transaction data during transaction commit and implements a distributed consistency protocol based on Multi_Paxos.

The Design Concept of Log Module

Compared with the log module of a traditional RDBMS, OceanBase's Log Service faces the following challenges:

  • Multi_Paxos replaces the traditional primary/secondary synchronization mechanism to achieve high system availability and high data reliability.
  • Global deployment must be supported, with network latency between replicas reaching tens or even hundreds of milliseconds.
  • As a distributed database, OceanBase uses the partition as the basic unit of data synchronization. A single server must support 100,000 partitions while still writing and reading log data efficiently.
  • At the same time, batching must be considered for every operation to speed up execution. Log Service maintains status information (such as the member list and leader of each replica) and must efficiently support the rich functionality of the distributed system.

The Log's Life

The log module of OceanBase implements a standard Multi_Paxos protocol to ensure that all committed data is recoverable as long as a majority of replicas have not permanently failed.

It also implements out-of-order log commit so that there are no commit-order dependencies between transactions. The following introduction to OceanBase's engineering implementation of Multi_Paxos assumes that readers already understand the core ideas of Multi_Paxos. If you want to know more, please see the question and answer section of the community.

Let's take the transaction log of a partition as an example. The normal flow is as follows:


  1. The transaction layer calls the log_service->submit_log() interface to submit a log (carrying the on_succ_cb callback pointer).
  2. The clog module first assigns log_id and submit_timestamp, submits the log to the sliding window to generate a new log_task, then submits the local disk write and synchronizes the log to the followers through RPC.
  3. When the local disk write completes, log_service->flush_cb() is called to update the log_task status and mark it as locally persisted. When a follower's disk write succeeds, it returns an ACK to the leader.
  4. The leader receives each ACK and updates the ack_list of the log_task.
  5. The leader counts the majority based on steps 3 and 4. Once a majority is reached, the leader sequentially calls the log_task->on_success callbacks of the transactions, sends a confirmed_info message to the followers, and slides the log out of the sliding window.
  6. After receiving the confirmed_info message, the follower tries to slide the log out of its sliding window and, during the slide-out operation, submits the log for replay.

On the follower, committed logs are replayed in real time. The playback process works as follows:


When the follower slides a log out, it increments the counter of the submit_log_task that records the replay point of each partition. This task is then asynchronously submitted to a global thread pool for consumption.

When the global thread pool consumes the submit_log_task of a partition, it reads all logs to be replayed and distributes them to the partition's four replay task queues. Logs are assigned mainly by hashing the trans_id, which ensures that all logs of the same transaction go to one queue. A queue that receives a new task asynchronously submits itself (the task_queue) as a task to the global thread pool mentioned above for consumption.

When the global thread pool consumes a task_queue, it traverses all subtasks in the queue in turn and executes the corresponding application logic according to the log type. At this point, the log has been synchronized to the follower and can be read there.

After reading this article, you should have a corresponding understanding of the submission and playback of transaction logs. The next article in this series will interpret the code of the OceanBase storage layer. Please look forward to it!
