An Interpretation of the Source Code of OceanBase (11): Analysis of Location Cache Module

This article explains ObTableScan design and code knowledge and introduces the analysis of the location cache module.

By Zhennan

The tenth article of this series (Table One and Its Service Addressing) introduced the creation of the system tenant's Table 1 and explained the service addressing process related to Table 1. This article analyzes the location cache module.

Location cache is a basic module on the observer. It lets other modules (such as SQL, transaction, and CLOG) obtain and cache the location information of a replica. To obtain that information, location cache depends on the meta tables at all levels, the underlying partition_service, and log_service. The cache is refreshed passively: it is updated only when another module calls into it. All modules on the same observer share one location cache.


The Cache Content of the Location Cache

In an OceanBase cluster, the location information of each replica is recorded in the meta tables. Sending SQL to a meta table to look up the location on every replica access would be too inefficient. Therefore, each observer caches the location information of entity tables locally. The cache is managed by the location cache module and implemented in ObPartitionLocationCache. The main cache content is:

1. Core Location Cache

sys_cache_ caches the location information of system tables, and user_cache_ caches the location information of user tables.

The two are stored in separate data structures so that a large number of user tables cannot squeeze the location information of system tables out of the cache.
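
The separation can be illustrated with a minimal sketch. It is not the actual ObPartitionLocationCache code: the class names BoundedCache and SplitLocationCache, the Location struct, the FIFO eviction policy, and the is_sys_table flag are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-in for a cached partition location.
struct Location {
  std::string leader_addr;
};

// A tiny bounded cache with FIFO eviction; the real cache is more elaborate.
class BoundedCache {
 public:
  explicit BoundedCache(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  void put(uint64_t key, const Location &loc) {
    auto it = map_.find(key);
    if (it != map_.end()) {  // Update in place, no eviction needed.
      it->second = loc;
      return;
    }
    if (map_.size() >= capacity_ && !order_.empty()) {
      map_.erase(order_.front());  // Evict the oldest entry.
      order_.pop_front();
    }
    map_.emplace(key, loc);
    order_.push_back(key);
  }

  bool get(uint64_t key, Location &loc) const {
    auto it = map_.find(key);
    if (it == map_.end()) return false;
    loc = it->second;
    return true;
  }

 private:
  std::size_t capacity_;
  std::unordered_map<uint64_t, Location> map_;
  std::deque<uint64_t> order_;
};

// System and user entries live in separate caches, so a flood of user-table
// entries can only evict other user-table entries.
class SplitLocationCache {
 public:
  SplitLocationCache(std::size_t sys_cap, std::size_t user_cap)
      : sys_cache_(sys_cap), user_cache_(user_cap) {}

  void put(uint64_t table_id, bool is_sys_table, const Location &loc) {
    (is_sys_table ? sys_cache_ : user_cache_).put(table_id, loc);
  }

  bool get(uint64_t table_id, bool is_sys_table, Location &loc) const {
    return (is_sys_table ? sys_cache_ : user_cache_).get(table_id, loc);
  }

 private:
  BoundedCache sys_cache_;
  BoundedCache user_cache_;
};
```

With this layout, inserting any number of user-table entries leaves the system-table entries untouched, which is the property the paragraph above describes.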


2. Leader Cache

  • sys_leader_cache_ caches the leader information of the system tables of the system tenant in the cluster.

This cache was introduced so that, in limited scenarios, the leader of a system tenant table can be obtained without relying on internal tables. On the one hand, it resolves a deadlock in which distributed transaction promotion cannot obtain the system table leader. On the other hand, it slightly speeds up obtaining the leader of a system tenant table.

  • leader_cache_ caches the leader information of user tables.

It was introduced to shorten the lookup path of the nonblock_get_leader() method. Because getting a location from the KVCache structure has a cost, leader_cache_ caches the leader location information directly.
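
The idea of a leader shortcut in front of the full location lookup can be sketched as follows. This is an illustration under stated assumptions, not the OceanBase implementation: the LeaderLookup class, the Replica struct, the plain hash maps standing in for KVCache, and the fast_path_hits() counter are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified replica record.
struct Replica {
  std::string addr;
  bool is_leader;
};

class LeaderLookup {
 public:
  // Store the full replica list; drop any stale leader shortcut for this key.
  void put_location(uint64_t pkey, std::vector<Replica> replicas) {
    assert(!replicas.empty());
    location_cache_[pkey] = std::move(replicas);
    leader_cache_.erase(pkey);
  }

  // Fast path: leader_cache_. Slow path: scan the full location entry.
  bool nonblock_get_leader(uint64_t pkey, std::string &leader) {
    auto hit = leader_cache_.find(pkey);
    if (hit != leader_cache_.end()) {
      leader = hit->second;
      ++fast_path_hits_;
      return true;
    }
    auto it = location_cache_.find(pkey);
    if (it == location_cache_.end()) return false;  // Caller should renew.
    for (const Replica &r : it->second) {
      if (r.is_leader) {
        leader = r.addr;
        leader_cache_[pkey] = r.addr;  // Remember it for the next lookup.
        return true;
      }
    }
    return false;  // Location is known, but no leader is recorded.
  }

  int fast_path_hits() const { return fast_path_hits_; }

 private:
  std::unordered_map<uint64_t, std::vector<Replica>> location_cache_;
  std::unordered_map<uint64_t, std::string> leader_cache_;
  int fast_path_hits_ = 0;
};
```

The first lookup for a partition pays for the full scan; later lookups are answered from the small leader map until the location entry is replaced.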


Capabilities Provided by the Location Cache Module

The location cache module caches, locally on each observer, the location information of the entity tables that have been accessed. Location cache uses a passive refresh mechanism: when another internal module finds that a cached entry is invalid, it must call the refresh interface to refresh the cache.

Corresponding to the cached content, the location cache module can return the location information and the leader of the partition identified by a specific pkey (pgkey). It is mainly used by modules such as SQL, Proxy, storage, transaction, and clog (the latter two focus on leader information). The interfaces are listed below:

// Synchronous interfaces:
int ObPartitionLocationCache::get(const uint64_t table_id,
                                  const int64_t partition_id,
                                  ObPartitionLocation &location,
                                  const int64_t expire_renew_time,  // If this parameter is set to INT64_MAX, it indicates a forced refresh.
                                  bool &is_cache_hit,
                                  const bool auto_update /*= true*/) // The get function has the refresh function.
int ObPartitionLocationCache::get_strong_leader(const common::ObPartitionKey &partition,
                                                common::ObAddr &leader,
                                                const bool force_renew) // The leader is essentially obtained through the get function, which has refresh capability.
// Asynchronous interfaces:
int ObPartitionLocationCache::nonblock_get(const uint64_t table_id,
                                           const int64_t partition_id,
                                           ObPartitionLocation &location,
                                           const int64_t cluster_id)  // Query from the location cache in nonblock mode.
int ObPartitionLocationCache::nonblock_get_strong_leader(const ObPartitionKey &partition, ObAddr &leader) // Check the leader cache first, if not, go to nonblock_get.
int ObPartitionLocationCache::nonblock_renew(const ObPartitionKey &partition,
                                             const int64_t expire_renew_time,
                                             const int64_t specific_cluster_id) // Used together with the two functions above: if an access fails, call this to refresh the location cache. Implemented by ObLocationAsyncUpdateTask.
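
A caller of the nonblocking interfaces typically reads from the cache and, on a miss or failed access, queues a renew instead of waiting. The sketch below imitates that pattern with a mock. MockLocationCache, its Fetcher callback, run_pending_tasks(), and pending_count() are hypothetical names; the simplified signatures do not match the real ones listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>

class MockLocationCache {
 public:
  // Stands in for whatever actually resolves a location (SQL or RPC).
  using Fetcher = std::function<bool(uint64_t, std::string &)>;

  explicit MockLocationCache(Fetcher fetch) : fetch_(std::move(fetch)) {
    assert(fetch_);
  }

  // Never blocks: answers from the local cache only.
  bool nonblock_get(uint64_t pkey, std::string &leader) const {
    auto it = cache_.find(pkey);
    if (it == cache_.end()) return false;
    leader = it->second;
    return true;
  }

  // Never blocks: only queues a renew task, skipping duplicates.
  void nonblock_renew(uint64_t pkey) {
    if (pending_.insert(pkey).second) queue_.push_back(pkey);
  }

  // Would run on a background worker; drains the queue and fills the cache.
  void run_pending_tasks() {
    while (!queue_.empty()) {
      const uint64_t pkey = queue_.front();
      queue_.pop_front();
      pending_.erase(pkey);
      std::string leader;
      if (fetch_(pkey, leader)) cache_[pkey] = leader;
    }
  }

  std::size_t pending_count() const { return queue_.size(); }

 private:
  Fetcher fetch_;
  std::unordered_map<uint64_t, std::string> cache_;
  std::unordered_set<uint64_t> pending_;
  std::deque<uint64_t> queue_;
};
```

The caller's logic is then: call nonblock_get(); if it fails, call nonblock_renew() and retry later, once the background task has refreshed the entry.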

nonblock_renew() is implemented in the form of a task queue:


Tasks are spread over multiple queues with different priorities to speed up recovery in abnormal scenarios:

• pall_root_update_queue_; // __all_core_table, __all_root_table, __all_tenant_gts, __all_gts

• prs_restart_queue_; // rs restart related sys table

• psys_update_queue_; // other sys table in sys tenant

• puser_ha_update_queue_; // __all_dummy, __all_tenant_meta_table

• ptenant_space_update_queue_; // sys table in tenant space

• puser_update_queue_; // user table
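
One way to picture prioritized queues is a dispatcher that always serves the most important non-empty queue first. The sketch below assumes strict priority in the order of the list above. That ordering, the QueueKind enum, the RenewTask struct, and the PriorityTaskQueues class are illustrative assumptions; the real scheduling policy may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Queue classes in the order listed above; a lower value is served first
// (an assumption made for this sketch).
enum QueueKind {
  ALL_ROOT = 0,
  RS_RESTART,
  SYS,
  USER_HA,
  TENANT_SPACE,
  USER,
  QUEUE_KIND_COUNT
};

// Hypothetical, simplified renew task.
struct RenewTask {
  uint64_t table_id;
  int64_t partition_id;
};

class PriorityTaskQueues {
 public:
  void push(QueueKind kind, const RenewTask &task) {
    assert(kind >= 0 && kind < QUEUE_KIND_COUNT);
    queues_[kind].push_back(task);
  }

  // Pops from the highest-priority non-empty queue; false if all are empty.
  bool pop(RenewTask &task, QueueKind &kind) {
    for (std::size_t i = 0; i < queues_.size(); ++i) {
      if (!queues_[i].empty()) {
        task = queues_[i].front();
        queues_[i].pop_front();
        kind = static_cast<QueueKind>(i);
        return true;
      }
    }
    return false;
  }

 private:
  std::array<std::deque<RenewTask>, QUEUE_KIND_COUNT> queues_;
};
```

Under this policy, a renew task for __all_root_table is handled before a backlog of user-table tasks, so the tables that everything else depends on recover first.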

Refresh Mechanism of Location Cache

Main refresh processes:


1. SQL Refresh – The Original Refresh Method

SQL refresh uses SQL statements to query location information in the meta table to refresh the local cache. This refresh process is the same as the reporting process.


SQL refresh depends on the meta table being readable and the reporting process running properly. SQL refresh has a certain delay. For example, when the leader changes and the report is in progress, the SQL refresh returns the old leader. This issue can be resolved by refreshing it again.

2. RPC Refresh – Independent of Meta Table Refresh

Why Is RPC Refresh Proposed?

SQL refresh depends on meta tables, the SQL module, and underlying reports, so the location cache cannot be refreshed this way when the network is abnormal. As shown in the following figure, server A (where the meta table is located) is disconnected from servers D, E, and F, and D issues a refresh request. In this case, SQL refresh cannot be performed. However, the location that D needs can still be determined from the replica location information of the same partition on E and F. RPC refresh reduces the dependency on the meta table and the cost of SQL queries to a certain extent, accelerating cache refresh.


Implementation of RPC Refresh

The central idea is to use the old cache to find the locations of all replicas of the partition in this region. The module then sends an RPC to the partition service on each of those servers to obtain member_info (including the leader, member_list, lower_list, etc.) and compares member_info with the old location information to detect changes of leader and replica type (F->L).

The refresh succeeds only if the member_list is unchanged and the non_paxos members are unchanged.
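
The success condition can be expressed as a small comparison. The following is a sketch under simplifying assumptions: MemberInfo, CachedLocation, and try_rpc_renew() are hypothetical names, addresses are plain strings, and lower_list handling is left out.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical, simplified view of what a replica reports over RPC.
struct MemberInfo {
  std::string leader;
  std::set<std::string> member_list;        // Paxos members
  std::set<std::string> non_paxos_members;  // e.g. read-only replicas
};

// Hypothetical, simplified view of the old cache entry.
struct CachedLocation {
  std::string leader;
  std::set<std::string> member_list;
  std::set<std::string> non_paxos_members;
};

// Returns true and updates the cached leader if RPC refresh is sufficient.
// Returns false, leaving the cache untouched, if membership has changed and
// another refresh path (such as SQL refresh) is needed.
bool try_rpc_renew(CachedLocation &cached, const MemberInfo &info) {
  if (cached.member_list != info.member_list ||
      cached.non_paxos_members != info.non_paxos_members) {
    return false;
  }
  cached.leader = info.leader;
  return true;
}
```

A leader switch inside an unchanged member list is absorbed by the RPC path alone, while any membership change makes the function report failure.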

Advantages and Disadvantages of RPC Refresh

Advantages: The cost is very small, so changes of the leader and of the replicas in the member list are sensed more quickly, which works well for elections without a leader and for leader re-election. Therefore, RPC refresh is given priority.

Disadvantages: To stay efficient, RPC refresh only guarantees the accuracy of the leader, the Paxos member list, and the locations of read-only replicas cascaded directly under the Paxos members of this region. It does not sense changes to read-only replicas cascaded at the second level and above, or to read-only replicas cascaded under other regions.

3. Other Mechanisms

Forced SQL Refresh

  • Aim: Compensate for the fact that RPC refresh cannot perceive changes in remote read-only replicas or in read-only replicas cascaded at level 2 and above.
  • Method:

    • Limit the number of SQL refreshes per second with FORCE_REFRESH_LOCATION_CACHE_THRESHOLD.
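
A per-second limit of this kind can be sketched as a fixed-window counter. The ForceRefreshLimiter class and its try_acquire() method are hypothetical; only the idea of a per-second threshold comes from the text above, and the actual accounting in OceanBase may work differently.

```cpp
#include <cassert>
#include <cstdint>

// Allows at most threshold_per_sec forced refreshes in each one-second window.
class ForceRefreshLimiter {
 public:
  explicit ForceRefreshLimiter(int64_t threshold_per_sec)
      : threshold_(threshold_per_sec), window_sec_(-1), used_(0) {
    assert(threshold_per_sec >= 0);
  }

  // now_us is the current time in microseconds, passed in so that the
  // behavior is deterministic and easy to test.
  bool try_acquire(int64_t now_us) {
    const int64_t sec = now_us / 1000000;
    if (sec != window_sec_) {  // A new one-second window starts.
      window_sec_ = sec;
      used_ = 0;
    }
    if (used_ >= threshold_) return false;  // Budget for this second is spent.
    ++used_;
    return true;
  }

 private:
  int64_t threshold_;
  int64_t window_sec_;
  int64_t used_;
};
```

A refresh request that is denied by the limiter would simply rely on the RPC path and could be forced through SQL in a later window.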

Batch Refresh

  • Aim: Optimize the refresh speed of location cache in RTO scenarios and optimize daily SQL execution.
  • Method:

    • Tasks are classified by partition table type.
    • Tasks are classified by Sys partitions and user partitions.
    • Tasks are classified by tenants.
    • The location_cache for __all_core_table/__all_root_table refreshes separately.

The Location Cache of the Virtual Table

A virtual table has no storage entity. Its content is generated by specific rules at query time. To unify the query logic at the SQL layer, the location cache module provides special location information for queries on virtual tables.

1. Classification of Virtual Tables

From the perspective of distribution, virtual tables can be divided into the following three categories:

  • LOC_DIST_MODE_ONLY_LOCAL: Virtual tables that are only executed locally.
  • LOC_DIST_MODE_DISTRIBUTED: Virtual tables that are executed in a distributed manner (including cluster-level and tenant-level).
  • LOC_DIST_MODE_ONLY_RS: Virtual tables that need to be executed on RS.

2. Obtain the Location of the Virtual Table

// Key functions:
int ObSqlPartitionLocationCache::virtual_get(const uint64_t table_id,const int64_t partition_id,share::ObPartitionLocation &location,const int64_t expire_renew_time,bool &is_cache_hit)

// LOC_DIST_MODE_ONLY_LOCAL: only requires its own address.
int ObSqlPartitionLocationCache::build_local_location(uint64_t table_id,ObPartitionLocation &location)
|-replica_location.server_ = self_addr_; 

// LOC_DIST_MODE_DISTRIBUTED: essentially the server_list of the cluster.
int ObSqlPartitionLocationCache::build_distribute_location(uint64_t table_id, const int64_t partition_id,ObPartitionLocation &location)
|-int ObTaskExecutorCtx::get_addr_by_virtual_partition_id(int64_t partition_id, ObAddr &addr)

// LOC_DIST_MODE_ONLY_RS: essentially the location of RS.
int ObSqlPartitionLocationCache::get(const uint64_t table_id,ObIArray<ObPartitionLocation> &locations,const int64_t expire_renew_time,bool &is_cache_hit,const bool auto_update /*=true*/)
|-int ObPartitionLocationCache::get(const uint64_t table_id,ObIArray<ObPartitionLocation> &locations,const int64_t expire_renew_time,bool &is_cache_hit,const bool auto_update /*=true*/)
| |-int ObPartitionLocationCache::vtable_get(const uint64_t table_id,ObIArray<ObPartitionLocation> &locations,const int64_t expire_renew_time,bool &is_cache_hit)
| | |- int renew_vtable_location(const uint64_t table_id,common::ObSArray<ObPartitionLocation> &locations);

We can summarize the location cache of the virtual table in three sentences.

  • For virtual tables that are executed locally, the location cache module returns the address of the local server.
  • For virtual tables that are executed in a distributed manner, the location cache module returns the server_list of the cluster.
  • For virtual tables that are executed on the RS, the location cache module returns the address of the server where the RS is located.
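
The three rules can be condensed into one dispatch function. This is an illustrative sketch, not the ObSqlPartitionLocationCache code: the ClusterView struct and the virtual_table_location() function are hypothetical, and addresses are plain strings. Only the three mode names come from the article.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum DistMode {
  LOC_DIST_MODE_ONLY_LOCAL,
  LOC_DIST_MODE_DISTRIBUTED,
  LOC_DIST_MODE_ONLY_RS
};

// Hypothetical snapshot of what the local server knows about the cluster.
struct ClusterView {
  std::string self_addr;
  std::string rs_addr;
  std::vector<std::string> server_list;
};

// Returns the servers on which a virtual table of the given mode is read.
std::vector<std::string> virtual_table_location(DistMode mode,
                                                const ClusterView &view) {
  switch (mode) {
    case LOC_DIST_MODE_ONLY_LOCAL:
      return {view.self_addr};   // Only the local server.
    case LOC_DIST_MODE_DISTRIBUTED:
      return view.server_list;   // Every server in the cluster.
    case LOC_DIST_MODE_ONLY_RS:
      return {view.rs_addr};     // Only the server hosting the RS.
  }
  assert(false && "unknown DistMode");
  return {};
}
```

Because every mode yields an ordinary list of addresses, the SQL layer can plan a virtual table scan the same way it plans a scan on an entity table.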

This is the end of the source code interpretation. Thank you.
