An Interpretation of OceanBase Database Source Code – Module Structure

This article introduces the overall architecture and module composition of the OceanBase database code and the functions of each module.

By Zhu Weng, Director of Core R&D at OceanBase

Zhu Weng (real name Zhifeng Yang) graduated from Peking University and has long worked on distributed database systems. He now takes part in the research and development of OceanBase, Alibaba's proprietary distributed relational database, and is committed to using advanced design to make the HTAP database system a technical benchmark. Within OceanBase, he has been responsible for the SQL engine, the distributed master control module, the multi-model database direction, and the database platform products, and he recently took charge of core innovation research and development. Zhu Weng has an in-depth understanding of C++, distributed system principles, SQL query processing, transaction processing, compilation technology, and engineering efficiency.


At the Database OceanBase 3.0 Summit, OceanBase officially announced that it was going open source and established the OceanBase open-source community, opening 3 million lines of core code to the community. After years of iteration and change, the code of the open-source community edition is not easy for newcomers to get into, so I am writing a series of source code interpretation articles to help readers find their way.

This series covers the following six topics:

  1. The Overall Architecture of the Database: Sort out the overall architecture and module composition of the OceanBase database code and the function of each module
  2. The Process of SQL: Walk through how a SQL statement is executed in the OceanBase database, from receiving and processing the request to returning the results to the client
  3. Partition: Explain the storage layer of the OceanBase database
  4. Transactions: Walk through the external interface of OceanBase database transactions
  5. Tenant: Describe the characteristics of multi-tenancy in OceanBase databases
  6. Virtual Tables: Take apart how OceanBase database virtual tables really work

(Note: This series is a code guide, not a design interpretation. Read it alongside the code, and preferably try things out as you go. Otherwise, it will be nothing more than a textbook.)

The source code interpretation articles help you understand the basic principles of the OceanBase database and see how a database is actually implemented. The same implementation principles carry over to other databases, which makes those easier to learn as well. Once you are familiar with the OceanBase code, you can use it in your own follow-up work or contribute your code to the OceanBase community.

This article is Part 1 of the Source Code Interpretation Series. It introduces the overall architecture and module composition of the OceanBase database code and the functions of each module.

Top-Level Directory


The preceding figure shows the top-level directory. The main code is in the src directory, and the unit test code is in the unittest directory. The unittest directory mirrors the structure and naming of the src directory. For example, the unit test file for src/sql/abc.cpp is unittest/sql/test_abc.cpp. The unit tests use the gtest and gmock frameworks. The unittest directory also contains integration tests for some important components.
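The naming convention above is mechanical enough to express as a small function. The following is a minimal sketch, not code from the OceanBase repository: the helper name unit_test_path_for is hypothetical, and it assumes every source path starts with src/.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of OceanBase): maps a source path under src/
// to the unit test path implied by the naming convention described above.
std::string unit_test_path_for(const std::string &src_path)
{
  const std::string src_prefix = "src/";
  // The convention only applies to files under src/.
  assert(src_path.compare(0, src_prefix.size(), src_prefix) == 0);

  // Strip the "src/" prefix, e.g. "sql/abc.cpp".
  const std::string rel = src_path.substr(src_prefix.size());

  // Split into directory part ("sql/") and file name ("abc.cpp").
  const std::string::size_type slash = rel.find_last_of('/');
  const std::string dir = (slash == std::string::npos) ? "" : rel.substr(0, slash + 1);
  const std::string file = (slash == std::string::npos) ? rel : rel.substr(slash + 1);

  // Same directory layout under unittest/, file name prefixed with "test_".
  return "unittest/" + dir + "test_" + file;
}
```

Calling unit_test_path_for("src/sql/abc.cpp") yields unittest/sql/test_abc.cpp, which matches the example in the text.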

The test directory holds the system tests, where the object under test is a fully started observer. The test/mysql_test directory contains test cases that run on a modified mysql_test framework. They use SQL to check that the system functions correctly.

The cmake directory, together with the build.sh script, is related to compilation, which will be described in detail in a later article.

deps Directory


The deps directory is special. It contains what src depends on. The deps/3rd directory contains a set of tools for downloading and compiling third-party libraries, developed specially for the community edition. deps/easy is a libev-based RPC framework developed by Alibaba's Duo Long in the early years, to which we have made some modifications. OceanBase's RPC framework is built on it, so it was open-sourced together with OceanBase. deps/oblib is the core base library. Why is it here? Because it has been separated from and recombined with the OceanBase code repository many times.

oblib Directory


Generally, the oblib library does not depend on OceanBase src; the dependency runs the other way. The rpc directory is the internal RPC framework used by business code in OceanBase. It depends on libeasy and provides a set of convenient macros for defining an RPC.

The lib directory is the lowest layer of dependencies. It has no external dependencies and contains a large number of basic classes, such as error code definitions, container classes, memory allocators, and the most basic header file, ob_define.h. (If you want a coffee break, change this file and run make.) Generally, the code in the oblib directory, especially the code in the oblib/src/lib directory, is not related to the service code of OceanBase. In other words, if you are working on another C++ project, you can use this library. Note: The OceanBase coding standard does not use STL containers, so the library reinvents a large number of wheels.

The code in the common directory depends on lib, but it is more service-independent than the top-level src. If you build a storage system (even if it is not a database), you may be able to use the public classes here. The most important class is ObObj in ob_object.h, which represents a value together with its type information. For example, if you add a column type, you must change this class.
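To make "a value that contains type information" concrete, here is a minimal sketch of a tagged value. It is not ObObj: the names SketchType and SketchValue are hypothetical, only three types are modeled, and the real class in ob_object.h is far richer. The sketch does show why adding a column type forces a change to such a class: every switch over the type tag needs a new case.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical type tag; the real enumeration in OceanBase has many more members.
enum class SketchType : uint8_t { Null, Int, Double };

// Hypothetical tagged value: a type tag plus a union holding the payload.
class SketchValue {
public:
  SketchValue() : type_(SketchType::Null) { v_.int_ = 0; }

  static SketchValue make_int(int64_t v)
  {
    SketchValue o;
    o.type_ = SketchType::Int;
    o.v_.int_ = v;
    return o;
  }

  static SketchValue make_double(double v)
  {
    SketchValue o;
    o.type_ = SketchType::Double;
    o.v_.double_ = v;
    return o;
  }

  SketchType type() const { return type_; }

  // Accessors check the tag before reading the union member.
  int64_t get_int() const { assert(type_ == SketchType::Int); return v_.int_; }
  double get_double() const { assert(type_ == SketchType::Double); return v_.double_; }

  // Every operation dispatches on the tag, so a new type means a new case here.
  std::string to_string() const
  {
    switch (type_) {
      case SketchType::Null:   return "NULL";
      case SketchType::Int:    return std::to_string(v_.int_);
      case SketchType::Double: return std::to_string(v_.double_);
    }
    return "";
  }

private:
  SketchType type_;
  union { int64_t int_; double double_; } v_;
};
```

For example, SketchValue::make_int(42).get_int() returns 42, while calling get_double() on the same value trips the assertion because the tag does not match.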


The following section focuses on several subdirectories in the deps/oblib/src/common directory. The object directory defines the most important data type, ObObj, and the column data types supported by OceanBase, which can be seen in the enumeration type ObObjType. The values from 36 onward are data types used under Oracle-mode tenants. ObObj is the atom of storage and data processing. The ObRowkey defined in the rowkey directory is the primary key of each row. At the underlying layer, OceanBase stores only index-organized tables, so each row must contain a primary key. A user-visible table without a primary key uses a hidden auto-increment column as its rowkey to simulate one. The memtable and sstable of the storage engine are both indexed by rowkey. The row directory defines ObNewRow, which represents a row of records (you will not find a class named ObRow) and is the molecule of data processing. ObRowIterator, which is based on ObNewRow, is the interface of many operation classes.
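The rowkey idea can be sketched with standard containers. The following is an illustration under stated assumptions, not OceanBase code: SketchRowkey, SketchRow, and SketchTable are hypothetical names, a rowkey is simplified to a vector of integers, and std::map stands in for the rowkey-ordered index. It shows a table that requires a primary key and a table that simulates one with a hidden auto-increment value.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical simplifications: a rowkey is a vector of integer key columns
// (compared lexicographically), and a row is a vector of strings.
using SketchRowkey = std::vector<int64_t>;
using SketchRow = std::vector<std::string>;

// Hypothetical index-organized table: rows live inside the rowkey-ordered index.
class SketchTable {
public:
  explicit SketchTable(bool has_user_primary_key)
      : has_user_pk_(has_user_primary_key), next_hidden_key_(1) {}

  // Table with a user-defined primary key: the caller supplies the rowkey.
  // Returns false if a row with the same rowkey already exists.
  bool insert_with_key(const SketchRowkey &key, const SketchRow &row)
  {
    assert(has_user_pk_);
    return rows_.emplace(key, row).second;
  }

  // Table without a primary key: a hidden auto-increment value acts as the rowkey.
  SketchRowkey insert_without_key(const SketchRow &row)
  {
    assert(!has_user_pk_);
    SketchRowkey key{next_hidden_key_++};
    rows_.emplace(key, row);
    return key;
  }

  // Point lookup by rowkey; returns nullptr when the row does not exist.
  const SketchRow *find(const SketchRowkey &key) const
  {
    auto it = rows_.find(key);
    return it == rows_.end() ? nullptr : &it->second;
  }

  std::size_t size() const { return rows_.size(); }

private:
  bool has_user_pk_;
  int64_t next_hidden_key_;
  std::map<SketchRowkey, SketchRow> rows_;  // ordered by rowkey
};
```

Because every row is reachable only through its rowkey, the table without a primary key still needs the hidden key to have a place in the index at all.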

The log directory defines a set of useful log macros. Macros such as LOG_WARN appear throughout the OceanBase code and are provided in ob_log.h. The interface combines the advantages of printf and cout. It is not as simple as cout. It is strongly typed and has a unified key-value style. We used many template and macro tricks to implement this set of interfaces in an old version of C++. If you get familiar with these interfaces first and then try to contribute code, you will fall in love with them. (They are the "millet plus rifles", the plain but dependable equipment, for debugging distributed systems.)
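A strongly typed, key-value style log interface can be sketched with variadic templates. The code below is not the implementation in ob_log.h: format_log, append_kv, and SKETCH_LOG_WARN are hypothetical names, the output format is an assumption, and it relies on C++11 variadic templates, whereas the real macros were written for an older C++ with other tricks.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Base case of the recursion: no key-value pairs left.
inline void append_kv(std::ostringstream &) {}

// Appends one " key=value" pair, then recurses on the remaining pairs.
// The value is streamed with its own operator<<, so no format string is needed.
template <typename Value, typename... Rest>
void append_kv(std::ostringstream &out, const char *key, const Value &value,
               const Rest &... rest)
{
  assert(key != nullptr);
  out << ' ' << key << '=' << value;
  append_kv(out, rest...);
}

// Hypothetical formatter: "<LEVEL> <message> k1=v1 k2=v2 ...".
template <typename... KeyValues>
std::string format_log(const char *level, const char *message,
                       const KeyValues &... kvs)
{
  // Keys and values must come in pairs; a lone key is a compile-time error.
  static_assert(sizeof...(KeyValues) % 2 == 0,
                "key-value arguments must come in pairs");
  std::ostringstream out;
  out << level << ' ' << message;
  append_kv(out, kvs...);
  return out.str();
}

// Hypothetical macro in the spirit of LOG_WARN; it returns the formatted line
// instead of writing to a log file so that the sketch stays self-contained.
#define SKETCH_LOG_WARN(...) format_log("WARN", __VA_ARGS__)
```

With this sketch, SKETCH_LOG_WARN("fail to open file", "ret", -4016, "path", "/tmp/x") produces the string WARN fail to open file ret=-4016 path=/tmp/x, where the error code and path are made-up values. The uniform key=value layout is what makes such logs easy to search when debugging across many nodes.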

src Directory


Finally, we come to the src directory.

The election directory is the distributed election module. It is relatively independent because no other component can work until an election has decided the first leader. It is independent of the Paxos protocol, and the election protocol requires clock synchronization across nodes. The name clog originally meant commit log, but it has become a proprietary term that refers to the transaction redo logs of OceanBase. The implementation of Paxos is also in this directory. The archive directory is the log archiving component. Backup and recovery depend on it.

The rootserver directory is the master control service of an OceanBase cluster. The name is not accurate. It should really be rootservice, because it is not an independent process but a set of services started by one of the observers. Interested readers can look at the open-source code of OceanBase 0.4. Cluster management and automatic disaster recovery, system bootstrapping, partition replica management and load balancing, and the execution of DDL are all in this component.

The share directory holds public classes that were forcibly separated from their parent, oblib/src/common, which is why their namespace is common instead of share.

The sql directory is the SQL engine. The storage directory is the storage engine. Transaction management is located under storage/transaction.

The observer directory is the assembly shop of all components. The entry points are ob_server.h and ob_service.h. The command processing entry point of the MySQL protocol layer is located in observer/mysql.


The source code of the database is extensive and profound. This article gives an overview of the directory structure of the OceanBase source code. Later articles will take the individual parts apart in more detail. Hopefully, it will be helpful for you.
