OBProxy: A Detailed Explanation of the Functional Modules and Features

Part 1 of the High-Performance Data Access Middleware series offers a detailed explanation of OBProxy function modules and features.

By Zhixin, OceanBase Technical Expert

Preface

OceanBase is a distributed relational database. As ObServer clusters grow, so does the probability that some machine fails or is brought online or taken offline, which makes connecting to an ObServer directly increasingly fragile. OBProxy was created to solve the problems of SQL routing and high availability in distributed database systems.

OBProxy (OceanBase Database Proxy) is a service proxy for OceanBase databases. OBProxy masks the complexity that comes from the distributed nature of the backend ObServer cluster, so accessing a distributed database becomes as simple as accessing a standalone database. To this end, we have planned a nine-article series on OBProxy as high-performance data access middleware. The series covers the deployment, principles, functions, architecture, troubleshooting, best practices, and other aspects of OBProxy to help readers understand it.

In this series, we will learn about OBProxy together. You will look at distributed system problems from the perspective of the full request path and master important topics, such as connection management, data routing, and high availability disaster recovery. You will follow an SQL statement from the ExecuteQuery interface call to the returned results so you can better control distributed databases!

This article offers a detailed explanation of OBProxy function modules and features to help you better understand what OBProxy is, its value, and how to use it.

1. What Is OBProxy?

Terms

Let's first define some terms used in the rest of this article.

Term | Description
ODP | Product name of the proxy (OceanBase Database Proxy)
Obproxy/ObProxy | Alias of ODP; also the name of ODP's binary and process
Obproxyd.sh | Daemon script for the obproxy process, responsible for ODP startup and health checks
ODP Console | Control center used with the sharding feature

OceanBase Database Proxy is a service proxy for OceanBase databases. The user's SQL statement is sent to an ODP node first. ODP selects an appropriate ObServer (the OceanBase database process), forwards the SQL statement to it, and returns the result to the user. First, let's look at where OBProxy sits in the overall architecture.

[Figure 1: OBProxy in the overall architecture]

In the figure, APP represents a business process. Three OBProxy instances (the process name is Obproxy) sit between the APP and the ObServers. In actual deployments, there is generally a Server Load Balancer, such as F5, between the APP and OBProxy that distributes requests across the OBProxy instances. Behind OBProxy are the ObServers; there are six in the figure. OBProxy knows how data is distributed across the ObServers and can forward a user's SQL statement to the machine where the data is located, which is more efficient than forwarding it to a node without the data. The data in table t1 is in P1, the data in table t2 is in P2, and the data in table t3 is in P3. Red indicates the primary replica, and blue indicates a secondary replica. For an insert into t1 statement, OBProxy can send the SQL to the ObServer machine in IDC2 that holds the primary replica of P1.

Why does OBProxy need to send SQL statements to the node where the data resides? Once the SQL statement reaches that node, its execution plan can run locally, without remote RPC calls, so performance is better. In production environments, in addition to data distribution, OBProxy considers the geographical distribution of machines to avoid requests that cross data centers or cities. There are many routing policies, and later articles in this series will introduce them in detail.
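The routing idea above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not OBProxy's implementation: the `Replica` type, the `choose` method, and the rule that a read which tolerates a secondary replica prefers the proxy's own IDC are all hypothetical names and policies chosen for the example.

```java
import java.util.List;

/** Illustrative sketch only; these types are hypothetical, not OBProxy internals. */
final class ReplicaRoutingSketch {

    /** One replica of a table's data, as a proxy might cache it. */
    static final class Replica {
        final String server;   // address of the ObServer holding the replica
        final String idc;      // data center the server lives in
        final boolean primary; // true for the primary replica

        Replica(String server, String idc, boolean primary) {
            this.server = server;
            this.idc = idc;
            this.primary = primary;
        }
    }

    /**
     * Picks the server to forward a statement to.
     * Statements that need the primary replica (for example, inserts) go to it.
     * Other statements prefer any replica in the proxy's own IDC (an assumed policy)
     * and fall back to the primary replica.
     */
    static String choose(List<Replica> replicas, boolean needsPrimary, String localIdc) {
        Replica primary = null;
        for (Replica r : replicas) {
            if (r.primary) {
                primary = r;
            }
        }
        if (!needsPrimary) {
            for (Replica r : replicas) {
                if (r.idc.equals(localIdc)) {
                    return r.server; // stay inside the local data center
                }
            }
        }
        if (primary == null) {
            throw new IllegalStateException("no primary replica known");
        }
        return primary.server;
    }
}
```

With the replicas of P1 from the figure, an insert into t1 would call `choose(replicas, true, localIdc)` and get the server in IDC2 that holds the primary replica.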

2. How to Use OBProxy

After deploying an OceanBase cluster (including OBProxy), users can use the database service. Let's use database access through JDBC as an example:

final String URL = "jdbc:mysql://127.0.0.1:2883/test?useSSL=false&useServerPrepStmts=true";

When establishing a connection, users must first initialize the connection information. The preceding URL contains the IP address and port of the database, the name of the accessed database (test), and the connection attributes. The only difference between accessing the database through OBProxy and connecting directly to an ObServer is the IP address and port. Nothing else needs to change, and from then on OBProxy is transparent to users.
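As a minimal sketch of this point, the helper below builds the same URL from its parts and opens a connection with the standard `java.sql.DriverManager` API. The class and method names are hypothetical, and actually connecting assumes a MySQL-compatible JDBC driver on the classpath and a reachable OBProxy.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Illustrative helper; the class and method names are hypothetical. */
final class ObProxyJdbcSketch {

    /** Builds the JDBC URL; only host and port differ between OBProxy and a direct connection. */
    static String url(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?useSSL=false&useServerPrepStmts=true";
    }

    /** Opens a connection through the standard JDBC entry point. */
    static Connection open(String host, int port, String database,
                           String user, String password) throws SQLException {
        return DriverManager.getConnection(url(host, port, database), user, password);
    }
}
```

Calling `ObProxyJdbcSketch.url("127.0.0.1", 2883, "test")` reproduces the URL shown above. Switching between OBProxy and a direct ObServer connection would change only the first two arguments; the rest of the application code stays the same.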

Therefore, OBProxy simplifies database access: users do not need to care about the distributed architecture of the database system. This design has three benefits:

  1. OBProxy is compatible with the MySQL protocol so users can use MySQL standard drivers.
  2. The code for users to access the database does not need to be changed when switching from a MySQL database to an OceanBase database.
  3. OBProxy shields users from the complexity of backend distributed systems, such as machine changes, machine downtime, unit distribution of tenants, and daily merging. This ensures the stability of connections between clients and OBProxy.

3. The Past and Present of OBProxy

Reviewing the history of development can help us better understand why OBProxy is what it is from the perspectives of solution design, business requirements, and historical compatibility. Let's learn about the past and present of OBProxy together.

3.1 The History of OBProxy Development

OBProxy has been designed and developed since 2014 and has a history of nearly eight years. It is widely used in Ant Group, private cloud scenarios, and public cloud scenarios, where it plays an important role in the access path. The following is the development history:

[Figure 2: OBProxy development history]

To sum up, the development history of OBProxy is listed below:

  1. From 2014 to 2018, OBProxy was designed around the database kernel 1.0 architecture. It provided the MySQL protocol proxy, efficient forwarding, connection management, data routing, and disaster recovery management capabilities. The main focus during this period was on MySQL compatibility and adaptation to distributed features. OBProxy was widely used within Ant Group.
  2. From 2018 to 2021, OBProxy explored the cloud-native product form of DBMesh, which supports SideCar deployment, O&M, and management. SOFA ZDAL (SDK) capabilities, such as the sharding capability of the unitized architecture, were also pushed down to OBProxy. The OceanBase database kernel added Oracle mode (supported in commercial versions), and OBProxy added support for features such as the Oracle mode protocol, partition table routing, the PS protocol in distributed scenarios, and SSL link encryption.
  3. From 2021 to now, OBProxy has faced more challenges as the number of OceanBase customers has grown, and it is committed to improving the product to meet their diverse needs. First, OBProxy (together with the OceanBase database kernel) supports more complex functions and continuously optimizes performance. Second, OBProxy has improved its usability to provide a good experience for both developers and O&M personnel. Third, OceanBase actively embraces the public cloud, explores and refines the OBProxy resource pool form, and integrates with other cloud products to reduce costs, increase efficiency, and offer customers more capabilities.

3.2 OBProxy Product Form

For middleware products, there are usually two types, SDK and proxy. Their respective advantages and disadvantages are listed below:

Service Type | Advantage | Limit
SDK | Generally integrated into the business code as a library. Compared with the proxy mode, it has fewer hops, good performance, and a shorter troubleshooting path. | Tightly coupled with the business code, so O&M operations are visible to the business side.
Proxy | Decouples business logic from basic capabilities. Enables faster version iteration and upgrades. Supports drivers in multiple languages. Upgrades and O&M are less visible to the business side. | Adds one or even two hops (with a Server Load Balancer) to the request path, which affects performance. Troubleshooting is longer and more complex. Specialized O&M personnel are required.

Currently, OBProxy is provided in proxy form, and we will also provide an SDK form in the future. The main challenge for OBProxy developers in supporting both the SDK and proxy modes is how to reuse code. The solution is to wrap the underlying capabilities into library interfaces and let business code and OBProxy code call each other through interprocess communication.

4. Detailed Explanation of OBProxy Function Modules and Features

4.1 Function Module

Next, we will interpret the OBProxy function module to help you systematically understand the implementation and functions of OBProxy. The following figure divides the functions of OBProxy into three layers:

[Figure 3: The three layers of OBProxy functions]

4.1.1 Basic Layer

The basic layer implements basic frameworks and basic tool libraries (such as network communication and thread management) and provides support for upper layers.

The network communication library supports TCP protocol, SSL protocol, and RDMA communication and encapsulates easy-to-use interfaces for upper-layer use. The asynchronous event framework completes thread creation, management, task distribution, and scheduling. The basic library encapsulates some basic capabilities and provides easy-to-use interfaces for writing code.

4.1.2 Business Layer

The business layer is the most complex layer of OBProxy and provides the capabilities related to database business:

  1. The database protocol module implements the MySQL mode protocol, the Oracle mode protocol, and a proprietary protocol. Protocol support keeps OceanBase compatible with existing clients, and the proprietary protocol enables more powerful features.
  2. Connection management handles client and server connections and provides advanced capabilities (such as connection maintenance and exception handling).
  3. SQL parsing is used to understand SQL semantics and extract routing key information from SQL (such as table names and partition keys).
  4. Data routing distributes each request to the backend ObServer node that can execute it most efficiently. Accurate routing is particularly critical to performance.
  5. High availability and disaster recovery ensure that OBProxy detects a faulty ObServer in time, or retries on another node after it has selected a faulty one.
  6. Transaction status management tracks the transaction status on a connection. The transaction status affects how OBProxy routes and forwards requests.
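To make the SQL parsing item more concrete, here is a deliberately simplified sketch of extracting a table name as a routing key. It is an assumption-laden toy: a real proxy uses a proper SQL parser, while this example uses a regular expression that handles only simple single-table statements. The class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy routing-key extraction; a stand-in for a real SQL parser. */
final class RoutingKeySketch {

    // Matches the first table name in simple INSERT, UPDATE, DELETE, and SELECT statements.
    private static final Pattern TABLE = Pattern.compile(
            "^\\s*(?:insert\\s+into|update|delete\\s+from|select\\b.*?\\bfrom)"
                    + "\\s+([A-Za-z_][A-Za-z0-9_]*)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the table name, or empty when the statement is not recognized. */
    static Optional<String> tableName(String sql) {
        Matcher m = TABLE.matcher(sql);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

For example, `RoutingKeySketch.tableName("insert into t1 values (1)")` yields `t1`, which a router could then look up in its cached table location information. Statements the pattern does not recognize return an empty result, and a router would need some fallback for them.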

4.1.3 Product Layer

In continuous development, OBProxy productizes some capabilities to provide external services. Its product forms are mainly proxy mode and SDK mode. Sharding is a database and table sharding capability supported by OBProxy in the unitized architecture of Ant Group. We are also exploring more useful features and enriching product functions.

4.2 Execution Process

Now that we understand the functional modules, let's look at how OBProxy processes an SQL request.

[Figure 4: OBProxy execution process for an SQL request]

The execution process is listed below:

  1. The client establishes a TCP connection with the OBProxy. The OBProxy uses epoll (implemented in the network communication library) to process the read and write events of the socket.
  2. Read the byte stream from TCP, save it to the buffer, and parse the MySQL protocol message. Parse the header first and then decide whether to parse the subsequent content.
  3. Read SQL from the message and perform SQL parsing.
  4. Find the table data distribution based on the table name and location cache (table partition information) and select a node with data.
  5. Find the connection to the selected ObServer among the managed database connections and check the status of that ObServer in disaster recovery management.
  6. Use the selected connection to interact with the backend ObServer through the high-performance forwarding framework.
  7. Data received from the ObServer is processed at the protocol layer and returned to the client.
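Step 2 above (parse the header first, then decide whether to parse the rest) can be sketched as follows. The MySQL protocol frames each packet with a 4-byte header, a 3-byte little-endian payload length followed by a 1-byte sequence ID, and a text query is sent as a COM_QUERY command (byte 0x03). The class and method names are hypothetical, and decoding the SQL as UTF-8 is an assumption made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Illustrative MySQL packet framing; not OBProxy's actual parser. */
final class MySqlPacketHeaderSketch {
    static final int HEADER_LENGTH = 4; // 3-byte payload length + 1-byte sequence ID
    static final int COM_QUERY = 0x03;  // command byte for a text SQL query

    /** Payload length: the first three bytes, little-endian. */
    static int payloadLength(byte[] buf) {
        return (buf[0] & 0xFF) | ((buf[1] & 0xFF) << 8) | ((buf[2] & 0xFF) << 16);
    }

    /** Sequence ID: the fourth header byte. */
    static int sequenceId(byte[] buf) {
        return buf[3] & 0xFF;
    }

    /**
     * Returns the SQL text if the first `available` bytes of buf hold one complete
     * COM_QUERY packet; returns empty if more bytes are needed or the command differs.
     */
    static Optional<String> readQuery(byte[] buf, int available) {
        if (available < HEADER_LENGTH) {
            return Optional.empty(); // header not fully received yet
        }
        int length = payloadLength(buf);
        if (length == 0 || available < HEADER_LENGTH + length) {
            return Optional.empty(); // payload not fully received yet
        }
        if ((buf[HEADER_LENGTH] & 0xFF) != COM_QUERY) {
            return Optional.empty(); // some other command; no SQL text to parse
        }
        // Assumption: the SQL text is UTF-8; the real charset depends on the connection.
        return Optional.of(new String(buf, HEADER_LENGTH + 1, length - 1, StandardCharsets.UTF_8));
    }
}
```

Reading the length from the header before touching the payload lets a proxy keep buffering until a whole packet has arrived, and skip SQL parsing entirely for commands that carry no SQL text.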

The preceding steps do not describe disaster recovery management under abnormal conditions; you can refer to the preceding figure for that. In addition to the main request-processing flow, OBProxy runs many background tasks, which are also important. We will introduce them in subsequent articles in this series.

4.3 OBProxy Key Features

We learned about the functional modules of OBProxy in 4.1 and about the main work OBProxy does when executing an SQL statement in 4.2. To sum up, the key features of OBProxy are listed below:

  1. High-Performance Forwarding: OBProxy is an important part of the data access process. It uses a multi-threaded asynchronous framework and a transparent streaming forwarding design to optimize the critical path code and ensure the minimum consumption of machine resources.
  2. Protocol Support: It supports multiple formats, such as MySQL, Oracle-compatible, and proprietary protocols. Currently, the proprietary protocol is being enhanced to achieve more powerful features.
  3. Connection Management: It is important to keep the client connection stable. From the business side, this means that the application does not report connection errors. OBProxy shields the client from backend problems and maintains the stability of the client connection.
  4. Data Routing: Data routing affects performance and high availability. It is closely related to deployment architecture and data distribution and has a great impact on SQL execution. Correct routing is a point of great concern to everyone.
  5. Sharding Capability: It is an important part of the existing Finance Cloud solution. The C language version also has better performance.

5. OBProxy Future Planning

Hopefully, readers have a better understanding of OBProxy after reading this article. What is planned for OBProxy in the future? We will continue to meet the demands of customers and create good products.

OBProxy originated at Ant Group and now serves many more customers. We believe the main future directions include:

  1. Basic Capabilities: OBProxy iterates according to customer requirements and ObServer functions, adapts to the database kernel, and supports new features, while continuously improving the stability and performance of the OBProxy kernel.
  2. Platform Adaptation: OBProxy will support more platforms, such as Kubernetes, Docker, cloud platforms, and ARM, and adapt deeply to each platform so it can use platform capabilities to provide a better product experience.
  3. Ecological Connection: On the one hand, OBProxy integrates with existing open-source projects, for example, by providing monitoring data for SkyWalking. On the other hand, it supports connecting open-source projects to OceanBase databases so the open-source community can use OceanBase databases better.
  4. Productization: OBProxy builds mature solutions around its features to serve customers, and continuously improves its documentation and the ease of use of its features.
  5. Driver: OBProxy adapts to drivers in more languages and provides both proxy and SDK forms to deliver a good product experience.

The future is full of opportunities and challenges. OBProxy will continuously forge forward together with the OceanBase database kernel to provide good products, documents, and services for everyone.

6. Summary

OBProxy is largely invisible to users. However, when it comes to some advanced features and the distributed nature of OceanBase databases, how OBProxy works is an unavoidable topic. Later articles will cover more interesting content, such as data routing, full trace troubleshooting, connection stability, and the high availability of distributed systems. This content will also help you understand the OceanBase database. I hope you can learn and grow through this series of articles!
