×
Community Blog Optimizing MySQL Composite Index in InnoDB

Optimizing MySQL Composite Index in InnoDB

In this article, we will understand how to use MySQL Composite index in InnoDB from only one example scenario.

In one of my business scenarios, I needed to query the latest five audit data records.

SELECT `id`, `title`
FROM `th_content`
WHERE `audit_time` < 1541984478
    AND `status` = 'ONLINE'
ORDER BY `audit_time` DESC, `id` DESC
LIMIT 5;

The monitoring data showed that the CPU usage at the time exceeded 100% and I noticed that many similar queries had the status of create sort index after running show processlist.

View the table structure

CREATE TABLE `th_content` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(500) CHARACTER SET utf8 NOT NULL DEFAULT '' COMMENT '内容标题',
  `content` mediumtext CHARACTER SET utf8 NOT NULL COMMENT '正文内容',
  `audit_time` int(11) unsigned NOT NULL DEFAULT '0' COMMENT '审核时间',
  `last_edit_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '最近编辑时间',
  `status` enum('CREATED','CHECKING','IGNORED','ONLINE','OFFLINE') CHARACTER SET utf8 NOT NULL DEFAULT 'CREATED' COMMENT '资讯状态',
  PRIMARY KEY (`id`),
  KEY `idx_at_let` (`audit_time`,`last_edit_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

The index has a Composite index with audit_time on the left side and does not have an index related to status.

Analyze the execution logic of the preceding SQL statement:

  • Find all the primary key IDs that are smaller than the audit time from the Composite index. (If 1 million data records have been audited before this time stamp, the primary key IDs of these 1 million data records will be retrieved from the Composite index.)
  • Sort the 1 million IDs. This sorting operation is to optimize I/O operations in the next step of data retrieval in the table, because primary keys that are close may be retrieved with one disk I/O operation.
  • Retrieve the corresponding 1 million data records in the table and scan the retrieved records to filter rows with the status of "ONLINE".
  • Sort the final query results (if 500,000 rows have the status of ONLINE, continue to sort the 500,000 rows).

Although only five rows are required to be retrieved, in the preceding example that I gave, the query actually scans 1 million rows and sorts 500,000 rows in the database memory due to the large data volume.

Therefore, the query performance is extremely low.

I have drawn the following diagram to show the query process in step 1, where the pink parts indicate data rows that need to be retrieved in the table.

I have created and filled in some data according to the index and storage pattern. If you find any errors, feel free to leave a comment and let me know. I hope that this diagram can help you understand how the Composite index is stored and how to use indexes in queries.

1

Improvement Idea 1

It is not always easy to use indexes in range queries. If a Composite index with audit_time and status is added, what improvements can we obtain?

ALTER TABLE `th_content` ADD INDEX `idx_audit_status` (`audit_time`, `status`);
mysql> explain select `id`, `title` from `th_content` where `audit_time` < 1541984478 and `status` = 'ONLINE' order by `audit_time` desc, `id` desc limit 5;
+----+-------------+------------+-------+------------------------------------------+------------------+---------+------+--------+-------------+
| id | select_type | table      | type  | possible_keys                            | key              | key_len | ref  | rows   | Extra       |
+----+-------------+------------+-------+------------------------------------------+------------------+---------+------+--------+-------------+
|  1 | SIMPLE      | th_content | range | idx_at_ft_pt_let,idx_audit_status        | idx_audit_status | 4       | NULL | 209754 | Using where |
+----+-------------+------------+-------+------------------------------------------+------------------+---------+------+--------+-------------+

Details: Because audit_time is a range query, the index on the second column is unnecessary and only audit_time will be used. Therefore, key_len is 4. In the following Improvement idea 2, where the same two fields are included, key_len is 5.

Let's also analyze the execution process after this index is added:

  • Find the row with the largest audit_time value that is lower than this audit time value from the Composite index.
  • Search for other rows with lower audit_time values than that audit time value. Because < audit_time is a range query and the values of the index on the second column is scattered, we need to further search for matching rows one by one and find all the index rows matching the criterion (status='ONLINE') until we obtain the fifth matching row.
  • Retrieve and query the specific data records in the table.

2

In the preceding diagram, pink indicates rows that meet the requirement of the first index column. Continue the query and three records are found on this leaf node. Move to the left and perform query on the previous leaf node. When five matching rows are found, retrieve the corresponding data records in the table.

Implemented Improvements

Because the index contains status values, it is not required to retrieve corresponding data during the process of filtering rows with the status of "ONLINE". The iops is significantly reduced because only five data rows need to be further retrieved in the table.

Disadvantage of This Index

If fives rows scanned in idx_audit_status all have the status of ONLINE, only five rows need to be scanned.

If only four rows of the first 1 million rows scanned in idx_audit_status have the status of ONLINE, 1 million and 1 rows need to be scanned to obtain the required five rows. The number of rows to be scanned by the index is unknown.

Improvement Idea 2

ALTER TABLE `th_content` DROP INDEX `idx_audit_status`;
ALTER TABLE `th_content` ADD INDEX `idx_status_audit` (`status`, `audit_time`);

3

This allows both sorting and data retrieval in the table to be highly efficient.

0 0 0
Share on

zhoumengkang

3 posts | 0 followers

You may also like

Comments

zhoumengkang

3 posts | 0 followers

Related Products

  • Dedicated Host

    DDH is a solution for security and regulation implementation and flexible resource deployment. It offers dedicated resources in Alibaba Cloud for industries such as government departments, enterprises, and financial institutions.

    Learn More
  • Elasticsearch

    Alibaba Cloud Elasticsearch is based on the open-source Elasticsearch engine and provides commercial X-Pack plug-ins. Designed for scenarios such as search and analytics, Alibaba Cloud Elasticsearch features enterprise-level access control, security monitoring, and automatic updates.

    Learn More
  • Managed Security Service

    Identify vulnerabilities and improve security management of Alibaba Cloud WAF and Anti-DDoS and with a fully managed security service

    Learn More
  • DirectMail

    A simple and efficient email service to help you send transactional notifications and batch email

    Learn More