MySQL Memory Allocation and Management (Part III)

By Huaxiong Song, from the ApsaraDB RDS for MySQL kernel team

Part I and Part II of the introduction to MySQL memory allocation and management introduce the memory allocation, usage, and management of MySQL. In practice, it is important to control memory usage and reduce the risk of OOM. Part III will interpret the MySQL memory limit feature. The code is based on version 8.0.28.

1. Changes in the Memory Limit Feature

1.1 Added Variables

Four new variables are added to the new memory limit feature. The following table describes the meanings and values of the variables.

variable_name	Description
global_connection_memory_limit	The memory limit for global connections. [1, 18446744073709551615]
connection_memory_limit	The memory limit for a single connection. [1, 18446744073709551615]
connection_memory_chunk_size	The minimum change unit of the memory statistics, which is used to control the update frequency. [1, 10241024512]. Default value: 8912.
global_connection_memory_tracking	The switch to control the enabling and tracing of global memory counters.

MySQL [(none)]> show variables where variable_name in ('global_connection_memory_limit', 'connection_memory_limit', 'connection_memory_chunk_size', 'global_connection_memory_tracking'); # The memory limit variables.
+-----------------------------------+----------------------+
| Variable_name                     | Value                |
+-----------------------------------+----------------------+
| connection_memory_chunk_size      | 8912                 |
| connection_memory_limit           | 18446744073709551615 |
| global_connection_memory_limit    | 18446744073709551615 |
| global_connection_memory_tracking | OFF                  |
+-----------------------------------+----------------------+
MySQL [(none)]> show status like "Global_connection_memory"; # The memory usage. 
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| Global_connection_memory | 0     |
    +--------------------------+-------+

1.2 Changes in the Data Structure

This feature changes some existing data structures, including adding a member object related to memory usage statistics and modifying PFS_thread, PSI_thread_service_v5, and THD.

class PFS_thread {
...
+ THD *m_cnt_thd // THD for updating memory counters.
...
};
// ------------------------------------------------------------- //
struct PSI_thread_service_v5 {
...
+ set_mem_cnt_THD_v1_t set_mem_cnt_THD;
};
typedef void (*set_mem_cnt_THD_v1_t)(THD *thd, THD **backup_thd);
// ------------------------------------------------------------- //
class THD {
...
+ Thd_mem_cnt *mem_cnt;    // The memory counter object.
+ bool enable_mem_cnt();   // Enable the memory counter.
+ void disable_mem_cnt();  // Close the memory counter.
...
};

You can simply construe THD as the context of a connection. m_cnt_thd is the THD object responsible for updating the memory count information. There may exist problems with THD conversion in operations such as group commit, and this member ensures that the memory statistics are correct during conversion.
The interface set_mem_cnt_THD() is a function that assists in the THD conversion. It sets m_thd and m_cnt_thd respectively, and the values of the two are the same in most cases.
Thd_mem_cnt is a memory counter object. During initialization, the mem_cnt member added to the THD structure is a Thd_mem_cnt_noop (null operation counter). In the prepare phase of connection establishment, the mem_cnt member is created as a Thd_mem_cnt_conn (with real counting function) by calling the enable_mem_cnt(). In the THD destructor phase, the disable_mem_cnt() is called to release the counter. Both Thd_mem_cnt_noop and Thd_mem_cnt_conn are subclasses of Thd_mem_cnt.

// Create a Thd_mem_cnt_conn.
thd_prepare_connection()
|   thd->enable_mem_cnt() {
|   | Thd_mem_cnt *tmp_mem_cnt = new Thd_mem_cnt_conn(this);
|   | mem_cnt = tmp_mem_cnt;
|   }
// Release the Thd_mem_cnt_conn.
～THD()
|   THD::release_resources()
|   |   disable_mem_cnt() {
|   |   |  mem_cnt->flush(); // Clear the memory count information of the current THD and deduct the corresponding global data.
|   |   |  delete mem_cnt;
|   |   }

1.3 Adding Data Structures

As mentioned earlier, the memory counter object Thd_mem_cnt is the most important data structure introduced in WL. It is responsible for saving and updating related memory usage information. In the process of memory limitation, it is its subclass Thd_mem_cnt_conn that plays a role. The following is a further introduction to Thd_mem_cnt_conn.

1.3.1 Data Structure

Before introducing Thd_mem_cnt_conn, we first need to know the introduced counting mode. The counting operation and error handling under different modes are different. The combination of multiple modes can be realized by bitwise operation.

enum enum_mem_cnt_mode {
  MEM_CNT_DEFAULT = 0U, // Do not count.
  MEM_CNT_UPDATE_GLOBAL_COUNTER = (1U << 0), // The global information is updated.
  MEM_CNT_GENERATE_ERROR = (1U << 1), // The OOM error message is generated.
  MEM_CNT_GENERATE_LOG_ERROR = (1U << 2) // The OOM error message is generated and written to the log.
};

The key data structure of Thd_mem_cnt_conn is shown in the preceding figure. mem_count, max_conn_mem, and glob_mem_counter respectively refer to the memory requested by the connection, the maximum memory of the connection (this value is not a specified value and will change with mem_count), and the value passed by the connection to the global counter. The following figure shows the connecting and changing process of parameters. mem_count is related to the actual memory usage of the connection. glob_mem_counter grows in units, and the growth is related to the connection_memory_chunk_size parameter.

Q: Why use glob_mem_counter? Can I directly accumulate the current mem_count to the global memory counter?

A: Changes to global count information must be atomistic. Frequent changes will cause contention for global counters and locks and affect concurrency. The parameter connection_memory_chunk_size ensures that the memory data summarized to the global memory counter each time is an integer multiple of chunk_size. In other words, glob_mem_counter = connection_memory_chunk_size n, and it increases by the size of connection_memory_chunk_size m. Passing a sufficient amount of memory counts to the global counter in advance for each connection can reduce the number of changes to the global counter and avoid frequent changes to global data caused by increasing the amount of scattered memory each time. Global data is written only when mem_count > glob_mem_counter, and the glob_mem_counter is added to an integer multiple of connection_memory_chunk_size.

Therefore, the frequency of global counter updates can be controlled by connection_memory_chunk_size. When connection_memory_chunk_size is set to a large value, the memory information data aggregated to the global counter each time will be large, which will be mistaken for OOM risk and cause OOM error in advance. Therefore, the value of connection_memory_chunk_size should not be set too large. However, this value cannot be set too small, otherwise the global counter will be frequently updated.

The mode parameter is a combination of enum_mem_cnt_mode. For example, the mode of a SUPER user is MEM_CNT_UPDATE_GLOBAL_COUNTER when a connection is established, while the mode of a normal user is MEM_CNT_UPDATE_GLOBAL_COUNTER | MEM_CNT_GENERATE_ERROR | MEM_CNT_GENERATE_LOG_ERROR. This judgment bit is used during memory counting to determine whether to generate an error and kill the connection. In other words, SUPER users are not limited by the limit parameter when performing queries, but common users are affected by these parameters.

static void prepare_new_connection_state(THD *thd) {
    ...
    thd->mem_cnt->set_orig_mode(is_admin_conn ? MEM_CNT_UPDATE_GLOBAL_COUNTER // Set the mode based on identity.
                                                : (MEM_CNT_UPDATE_GLOBAL_COUNTER |
                                                   MEM_CNT_GENERATE_ERROR |
                                                   MEM_CNT_GENERATE_LOG_ERROR));
    ...
}

1.3.2 Key Interfaces

alloc_cnt()

The feature of alloc_cnt() is to update or add the memory count information at the connection and global levels. This function is called along with memory requests and mainly involves the following operations:

Modify mem_counter, max_conn_mem, and glob_mem_counter. max_conn_mem is updated with mem_counter and glob_mem_counter is added to the global memory counter in a lazy manner. Delta is interpolated to the global counter only if max_conn_mem is greater than glob_mem_counter. Since access to the global counter requires a large global lock, the number of changes and locks is reduced.
Error messages are generated, including those at the connection and global levels. generate_error sets the THD::KILL_CONNECTION status for THD based on the error message. Then, the connection is killed at the point where the status is detected. After that, the physical memory decreases, THD is destructed (the counter object on the THD is destructed), and the statistics are updated accordingly.

bool Thd_mem_cnt_conn::alloc_cnt(size_t size) {
  mem_counter += size;
  max_conn_mem = std::max(max_conn_mem, mem_counter);
  // The error message at the connection level.
  if (mem_counter > m_thd->variables.conn_mem_limit) {
      (void)generate_error(ER_DA_CONN_LIMIT, m_thd->variables.conn_mem_limit,
                            mem_counter);
  }
  // The following conditions are specified: enabling global update, enabling memory tracking, and the storage capacity exceeding the capacity prepared in advance.
  if ((curr_mode & MEM_CNT_UPDATE_GLOBAL_COUNTER) &&
      m_thd->variables.conn_global_mem_tracking &&
      max_conn_mem > glob_mem_counter) {
    // Control the update frequency of global counters.
    const ulonglong curr_mem =
        (max_conn_mem / m_thd->variables.conn_mem_chunk_size + 1) *
        m_thd->variables.conn_mem_chunk_size;
    ulonglong delta = curr_mem - glob_mem_counter;
    ulonglong global_conn_mem_counter_save;
    ulonglong global_conn_mem_limit_save;
    {
      MUTEX_LOCK(lock, &LOCK_global_conn_mem_limit);
      global_conn_mem_counter += delta;
      global_conn_mem_counter_save = global_conn_mem_counter;
      global_conn_mem_limit_save = global_conn_mem_limit;
    }
    glob_mem_counter = curr_mem;
    max_conn_mem = std::max(max_conn_mem, glob_mem_counter);
      
    // The error message at the global level.
    if (global_conn_mem_counter_save > global_conn_mem_limit_save) {
      (void)generate_error(ER_DA_GLOBAL_CONN_LIMIT, global_conn_mem_limit_save,
                            global_conn_mem_counter_save);
    }
  }
  return true;
}

free_cnt()

Unlike the way statistics are added, this function has a single function, and only performs subtraction on mem_counters at the connection level. How can the global count information be reduced? Global data is modified in the reset() function, which is also used to reduce global resource contention. Obviously, in most cases, the global count information will lag behind the connection count information.

void Thd_mem_cnt_conn::free_cnt(size_t size) {
  mem_counter -= size;
}

reset()

The free_cnt() operation only subtracts the memory count at the connection level. The global count data is updated in the reset() function, which ensures that the current global count is up-to-date. The function mainly involves the following operations:

Reset the mode. Some previous operations may modify the mode of the counter. For example, in the connection preparation phase. reset() ensures that the counter is in the correct mode before the update to avoid operation errors in users with different permissions, such as SUPER and common users.
Update the three pieces of count data. When the glob_mem_counter is> mem_counter, it indicates that the previous free_cnt operation has reduced mem_counter, and the glob_mem_counter and global data are updated. Otherwise, it indicates that there is connection-level memory information that is not added to the global memory statistics, and the difference also needs to be filled. Insufficient memory may also occur during the reset() process, where it is necessary to call the error generation function to report the error message and set the killed flag for THD.

int Thd_mem_cnt_conn::reset() {
  // Reset the mode.
  restore_mode();
  max_conn_mem = mem_counter;
  // Update the count data.
  if (m_thd->variables.conn_global_mem_tracking &&
      (curr_mode & MEM_CNT_UPDATE_GLOBAL_COUNTER)) {
    ulonglong delta;
    ulonglong global_conn_mem_counter_save;
    ulonglong global_conn_mem_limit_save;
    if (glob_mem_counter > mem_counter) {
      delta = glob_mem_counter - mem_counter;
      MUTEX_LOCK(lock, &LOCK_global_conn_mem_limit);
      assert(global_conn_mem_counter >= delta);
      global_conn_mem_counter -= delta;
      global_conn_mem_counter_save = global_conn_mem_counter;
      global_conn_mem_limit_save = global_conn_mem_limit;
    } else {
      delta = mem_counter - glob_mem_counter;
      MUTEX_LOCK(lock, &LOCK_global_conn_mem_limit);
      global_conn_mem_counter += delta;
      global_conn_mem_counter_save = global_conn_mem_counter;
      global_conn_mem_limit_save = global_conn_mem_limit;
    }
    glob_mem_counter = mem_counter;
    if (is_connection_stage &&
        (global_conn_mem_counter_save > global_conn_mem_limit_save))
      return generate_error(ER_DA_GLOBAL_CONN_LIMIT, global_conn_mem_limit_save,
                            global_conn_mem_counter_save);
  }
  if (is_connection_stage && (mem_counter > m_thd->variables.conn_mem_limit))
    return generate_error(ER_DA_CONN_LIMIT, m_thd->variables.conn_mem_limit,
                          mem_counter);
  is_connection_stage = false;
  return 0;
}

flush()

flush() clears the memory count of the current connection and deducts the global memory count. Before deleting the counter object, you must call this function to ensure that the count information is correct.

void Thd_mem_cnt_conn::flush() {
  max_conn_mem = mem_counter = 0;
  if (glob_mem_counter > 0) {
    MUTEX_LOCK(lock, &LOCK_global_conn_mem_limit);
    global_conn_mem_counter -= glob_mem_counter;
  }
  glob_mem_counter = 0;
}

2. Process of Memory Limit

2.1 Execution Process

Taking the simplest handle_connection as an example (non-thread pool model), the following figure shows the memory limit operations corresponding to the connection establishment, statement execution, and connection closure processes.

...
if (thd_prepare_connection(thd))
  handler_manager->inc_aborted_connects();
else {
  while (thd_connection_alive(thd)) {
    if (do_command(thd)) break;
  }
  end_connection(thd);
}
close_connection(thd, 0, false, false);
...

2.2 Key Functions

Operations such as construction and destruction of the counter and update of the count information are introduced earlier. In this section, the processing logic of memory requests is described. The memory involved in connection establishment and query execution is basically performed through the my_malloc() (such as the structure data and sort buffer) and allocate_from() (temporary tables) interfaces. The corresponding release functions are my_free() and deallocate_from(). The processing logic for the counter in the two memory application manners is the same. In this section, the counter operation logic is further described by taking my_malloc() and my_free() as an example.

2.2.1 my_malloc()

my_malloc() involves the following operations:

Build a memory block header, which stores information such as the size, magic, and psi_memory_key.
Call the pfs_memory_alloc_vc() interface in the PSI_thread_service_v5 service to assign a value to the key. The counter is updated in this interface.

void *my_malloc(PSI_memory_key key, size_t size, myf flags) {
  // Call malloc to allocate a memory block that contains the header information.
  my_memory_header *mh;
  size_t raw_size;
  raw_size = PSI_HEADER_SIZE + size;
  mh = (my_memory_header *)my_raw_malloc(raw_size, flags);
  // Initialize the data structure of the header and call the pfs_memory_alloc_vc to assign a value to head->key.
  if (likely(mh != nullptr)) {
    void *user_ptr;
    mh->m_magic = PSI_MEMORY_MAGIC;
    mh->m_size = size;
    // Call the service.
    mh->m_key = PSI_MEMORY_CALL(memory_alloc)(key, raw_size, &mh->m_owner);
    user_ptr = HEADER_TO_USER(mh);
    MEM_MALLOCLIKE_BLOCK(user_ptr, size, 0, (flags & MY_ZEROFILL));
    return user_ptr;
  }
  return nullptr;
}

2.2.2 pfs_memory_alloc_vc()

This function is the entry for adding count data. It involves the following work:

Find the PFS_memory_class corresponding to the key.
Obtain PFS_thread to update the statistics with the counter enabled. PFS_memory_key types enable counter objects whenever the register memory_class logic is executed.
Returns the value of the key that is marked by PSI_MEM_CNT_BIT(1 << 31) if the counter is enabled.

PSI_memory_key pfs_memory_alloc_vc(PSI_memory_key key, size_t size,
                                    PSI_thread **owner) {
  // Obtain the PFS_memory_class corresponding to the key.
  PSI_memory_key result_key = key;
  ...
  PFS_memory_class *klass = find_memory_class(key);
  // Enable the thread monitoring dimension and non-global monitoring mode.
  if (flag_thread_instrumentation && !klass->is_global()) {
    PFS_thread *pfs_thread = my_thread_get_THR_PFS();
    // Determine whether to enable the counter during the PFS_memory_class initialization phase.
    if (klass->has_memory_cnt()) {
      if (pfs_thread->m_cnt_thd != nullptr && pfs_thread->mem_cnt_alloc(size)) // The entry for adding memory information.
        // Flag bit to mark whether the key has been processed by the counter.
        result_key |= PSI_MEM_CNT_BIT;
    }
    // Update statistics.
    ...
    *owner_thread = pfs_thread;
  } else {
    // Update statistics.
    ...
    *owner_thread = nullptr;
  }
  return result_key;
}

2.2.3 my_free() && pfs_memory_free_vc()

Contrary to the preceding two functions, my_free() first calls pfs_memory_free_vc() to release the key, including the deduction and update of the counter information, and then releases the entire memory area, including the header.

Q: Which memory will be counted by the counter?

A: In psi_memory_key.cc, the PSI_FLAG_MEM_COLLECT flag bit introduced by the new feature labels the memory that needs to be limited in the all_server_memory array.

3. A Simple Test of Memory Limit

3.1 Test Preparation

A common user account for RDS_test is created.
Big data records are created.
A smaller connection_memory_limit is set.

create user RDS_test identified by 'RDS_test';
grant select on test.* to RDS_test;
use test;
create table t(id int primary key, c longtext);
insert t values (1, lpad('RDS', 6000000, 'test'));
set global connection_memory_limit=1024 * 1024 * 2;

3.2 Test Content

Executed by a common user:

MySQL [test]> show variables like "connection_memory_limit";
+-----------------------------------+--------------+
| Variable_name          |  Value   |
+-----------------------------------+--------------+
| connection_memory_limit |  2097152 |
+------------------------------------+-------------+
MySQL [test]> select count(c) from t group by c;
ERROR 4082 (HY000): Connection closed. Connection memory limit 2097152 bytes exceeded. Consumed 7079568 bytes.

Executed by a SUPER user:

MySQL [test]> show variables like "connection_memory_limit";
+----------------------------------+------------+
| Variable_name          | Value   |
+----------------------------------+------------+
| connection_memory_limit | 2097152 |
+----------------------------------+-------------+
MySQL [test]> select count(c) from t group by c;
+----------+
| count(c) |
+----------+
|      1 |
+----------+

3.3 Test Result

After introducing this feature, the memory usage of common users is limited. When exceeding the limit value, the connection is directly killed. However, the memory usage of SUPER users is still not limited.

4. Summary and Outlook

The new memory limit feature brought by MySQL 8.0.28 can be summarized as follows:

• For SUPER users and common users, the memory limit differs. The former is not limited and may face OOM errors.

• Global data updates lag behind connection data updates, reducing contention for global locks. During the memory statistics increase phase, connection_memory_chunk_size controls the update frequency. During the memory statistics decrease phase, connection and global information update through free_cnt() and reset(), respectively.

• If a large connection_memory_chunk_size is set, it may easily report an OOM error in advance and, therefore, kill the connection. A small connection_memory_chunk_size may lead to frequent access to the global lock.

• Memory statistics and throttling operations rely on PFS_thread, and counter data updates first pass through this object.

Combining Part I and Part II, it is easy to see that InnoDB's memory is mostly controllable. Most of the memory is controlled by a specified size, and additional memory generated can also be roughly inferred. During MySQL service, many memory losses occur that cannot be accurately measured. If they are not well-controlled, OOM errors may occur. The official introduction of memory usage limits at the connection and global levels provides optimization to reduce OOM risks. However, the OOM issue cannot be entirely avoided and requires further optimization. In the latest MySQL 8.0.31 release notes, the official added some monitoring information to the memory limit. It is believed that new features and improvements will be introduced in the future, and the RDS MySQL kernel team will continue to optimize the OOM problem.

Additionally, other MySQL structures, such as net_buffer, join_buffer, and sort_buffer, occupy a significant amount of memory during operation. Moreover, many temporary memories for recovery and initialization are generated during server startup, requiring further research and discussion.

Community

MySQL Memory Allocation and Management (Part III)

1. Changes in the Memory Limit Feature

1.1 Added Variables

1.2 Changes in the Data Structure

1.3 Adding Data Structures

1.3.1 Data Structure

1.3.2 Key Interfaces

alloc_cnt()

free_cnt()

reset()

flush()

2. Process of Memory Limit

2.1 Execution Process

2.2 Key Functions

2.2.1 my_malloc()

2.2.2 pfs_memory_alloc_vc()

2.2.3 my_free() && pfs_memory_free_vc()

3. A Simple Test of Memory Limit

3.1 Test Preparation

3.2 Test Content

3.3 Test Result

4. Summary and Outlook

References

Read previous post:

Read next post:

ApsaraDB

You may also like

Comments

ApsaraDB

Related Products

ApsaraDB RDS for SQL Server

ApsaraDB RDS for MySQL

Tair

PolarDB for MySQL