Assistant Engineer
Assistant Engineer
  • UID634
  • Fans0
  • Follows0
  • Posts44

[MySQL]MySQL 8.0 version update - performance optimization

More Posted time:Oct 26, 2016 13:54 PM
This article mainly summarizes some performance-related changes made in MySQL 8.0. With the release of new minor versions, this article will be updated until the official GA version is available.
MySQL 8.0.0
WL#9387: InnoDB: Group purging of rows by table ID
The earliest form of this issue was a bug reported by Domas, an engineer with Facebook. InnoDB adopts multi-thread for Undo Purge operations, but its undo allocation policy, the direct weighted round-robin distribution, is not very reasonable. It means that if you delete a large amount of data from a table, the N purge threads may generate a large amount of index lock conflicts (such as indexed page merge and reorganization).
In WL#9387, during undo log parsing, grouped storage is achieved by table_id. In distribution, the records of the same table ID are ensured to be distributed to the same thread. (Reference function trx_purge_attach_undo_recs)
Of course, this also means reasonable single table operations with no conflicts, and multi-thread purge won't be used, which is also a disadvantage.
WL#8423: InnoDB: Remove the buffer pool mutex
This can be regarded as a widely-favored improvement. A patch contributed by Percona (bug#75534) mainly splits the huge buffer pool mutex lock of InnoDB to reduce lock conflicts:

Distribute idle block (buf_LRU_get_free_block):
• Obtain from the free list: buf_pool_t::free_list_mutex
• Evict an idle page from unzip_lru/lru and buf_pool_t::LRU_list_mutex is required
Batch scan of LRU (buf_do_LRU_batch): buf_pool_t::LRU_list_mutex
Batch scan of FLUSH_LIST (buf_do_flush_list_batch): buf_pool_t::flush_list_mutex
Add the dirty page to flush_list(buf_flush_insert_into_flush_list): buf_pool_t::flush_list_mutex
After the dirty page is written back to the disk, remove from the flush list (buf_flush_write_complete): buf_pool_t::flush_state_mutex/flush_list_mutex
Evict page from LRU (buf_LRU_free_page):buf_pool_t::LRU_list_mutex, and buf_pool_t::free_list_mutex(buf_LRU_block_free_non_file_page)
buf_flush_LRU_list_batch uses mutex_enter_nowait to obtain the block lock. If the obtain fails, it indicates it is occupied by other sessions and this block will be ignored.
Some variables are changed from being protected by buf_pool_t::mutex to by memory barrier (os_rmb or os_wmb), as shown in the following functions:

Through the lock splitting, the competition for global big locks is reduced, improving the scalability of the buffer pool. This feature has been available in Percona Server since many years ago, but it was not integrated until the MySQL 8.0 version.
WL#7170: InnoDB buffer estimates for tables and indexes
It is mainly used to provide more accurate information for the optimizer, that is, whether the data is in the disk or memory, so that the optimizer can make more accurate calculation of the cost.
A global object (buf_stat_per_index_t) is added to manage all the indexed page counts.
In order to avoid introduction of new global lock overhead, a lock-free hash structure (“include/ut0lock_free_hash.h) is implemented and used to store the index information. The key value is the index ID. (At present, the index ID is unique. But this rule is subject to changes in future.)
Increase the page count:
1. The page is just read into the memory from the disk (buf_page_io_complete > buf_page_monitor)
2. Create a new page (btr_page_create)
Decrease the page count: When a page is released from the LRU (buf_LRU_block_remove_hashed)
The new information_schema.innodb_cached_indexes is added to print the page count of every index in the memory. The structure is as follows:
mysql> show create table INNODB_CACHED_INDEXES\G
*************************** 1. row ***************************
  `SPACE_ID` int(11) unsigned NOT NULL DEFAULT '0',
  `INDEX_ID` bigint(21) unsigned NOT NULL DEFAULT '0',
  `N_CACHED_PAGES` bigint(21) unsigned NOT NULL DEFAULT '0'
1 row in set (0.00 sec)

### Associate with the table name/index name

    tables.name AS table_name,
    indexes.name AS index_name,
    cached.n_cached_pages AS n_cached_pages
    information_schema.innodb_cached_indexes AS cached,
    information_schema.innodb_sys_indexes AS indexes,
    information_schema.innodb_sys_tables AS tables
    cached.index_id = indexes.index_id
    indexes.table_id = tables.table_id;

Related worklog: WL#7168: API for estimates for how much of table and index data that is in memory buffer
Add an option to dynamically shut down deadlock detection, which is very effective for hotspot updates scenarios.
This optimization is contributed by AliSQL. It mainly optimized the InnoDB Redo scalability. Leveraging the dual buffer mechanism, mtr commit is allowed while a log is being written to the disk.
WL#7093: Optimizer provides InnoDB with a bigger buffer
In order to cut down Btree lock occupation, InnoDB adopts a small cache buffer for data reading. For continuous record scans, InnoDB adopts row caching in some scenarios meeting strict conditions to read eight continuous records (and converts the record format into MySQL Format) and stores the data in the thread-private row_prebuilt_t::fetch_cache. Such a route-finding operation will obtain multiple records. After the server layer processes a record, it gets the data from the cache directly without another route-finding attempt, until all the data in the cache is retrieved.
WL#7093 introduced a new interface. Because the optimizer can estimate the number of rows to be read, it is able to provide a more suitable size of row buffer to the storage engine to store the required data. The performance of a large batch of continuous data scans will greatly benefit from the bigger record buffer.
The optimizer determines whether to enable the record buffer. A new class of Record_buffer is added for management. The size of the record buffer cannot exceed 128KB. It is currently hardcoded and cannot be customized.
The function for judging and allocating the record buffer is set_record_buffer and it is passed to the engine layer through the new API (handler::ha_set_record_buffer).
The buffer itself is engine-independent. It is distributed at the server layer and passed to the engine layer through the handler m_record_buffer.
A new interface is added to judge whether Record buffer is supported. It is only supported by InnoDB at present the following conditions should be met (ref set_record_buffer):
1. The access type is not ref, ref_or_null, index_merge, range, index or ALL.
2. It is not a temporary table.
3. It is not a loose index scan.
4. The judgment is made in the InnoDB engine layer ((row_prebuilt_t::can_prefetch_records))
return select_lock_type == LOCK_NONE // Read-only query
                && !m_no_prefetch   // Allow prefetch
                && !templ_contains_blob // No big columns such as BLOB, TEXT, JSON, and GEOMETRY
                && !templ_contains_fixed_point // Not of the spatial data type DATA_POINT
                && !clust_index_was_generated   // User-defined primary key or unique index (used as private key implicitly) is required
                && !used_in_HANDLER // Not accessing through handler
                && !innodb_api // Not accessing through InnoDB memcached or similar
                && template_type != ROW_MYSQL_DUMMY_TEMPLATE //Not a check table
                && !in_fts_query; // Not full-text index query

In InnoDB, when the record buffer is configured, the record buffer provided by the server layer, instead of row_prebuilt_t::fetch_cache, is used.
Introduction to this improvement by the official blog: http://mysqlserverteam.com/mysql-8-0-faster-batch-record-retrieval/
WL#9250: Split LOCK_thd_list and LOCK_thd_remove mutexes
The worklog aims to improve the performance of short-connection scenarios, but THD list operations may lead to fierce lock competition.
The solution is also old school - the partitioning. The chain table thd_list is partitioned into multiple arrays. At present there are eight partitions and the corresponding locks LOCK_thd_remove and LOCK_thd_list locks are also partitioned.