By Baotiao
Some veteran Database Administrators (DBAs) may remember that in the early days, it was rumored that a single table in MySQL should not exceed 5 million rows. If a single table exceeded 5 million rows, sharding was required. Many DBAs are concerned that after a MySQL table becomes large, the B-tree height will become very large, thereby affecting instance performance.
In fact, a B-tree is a very flat tree. The vast majority of B-trees do not exceed 4 layers. Let us look at the actual situation.
We take the common Sysbench table as an example.
CREATE TABLE `sbtest1` (
`id` int NOT NULL AUTO_INCREMENT,
`k` int NOT NULL DEFAULT '0',
`c` char(120) NOT NULL DEFAULT '',
`pad` char(60) NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `k_1` (`k`)
) ENGINE=InnoDB AUTO_INCREMENT=10958 DEFAULT CHARSET=latin1
In InnoDB, there are mainly two types of pages: leaf pages and non-leaf pages.
The format of a Leaf Page is as follows. Each record is mainly composed of a Record Header and a Record Body. The Record Header is mainly used to coordinate with Data Dictionary (DD) information to interpret the Record Body. The Record Body is the main content of the record.


For a table such as Sysbench in a 16 KB page, the number of rows that can be stored in a Leaf Page is approximately:
(16 × 1024 - 200 (Length of some Page Headers, tail, and Directory slot)) / ((4 + 4 + 120 + 60) row data length + 5 (header of each row of data) + 6 (XID) + 7 (Roll Pointer)) = 78.5
The format of a Non-leaf Page is as follows:

Because the Sysbench primary key ID is an INT of 4 bytes, the number of rows that a 16 KB page can store is:
(16 × 1024 - 200) / (5 (Row data Header) + 4 (Cluster Key) + 4 (Child Page Number)) = 1233
Then, the calculation formula for different heights is as follows:
| Height | Non-leaf pages | Leaf pages | Rows | Size |
|---|---|---|---|---|
| 1 | 0 | 1 | 79 | 16KB |
| 2 | 1 | 1233 | 97407 | 19MB |
| 3 | 1234 | 1520289 | 120102831 | 23GB |
| 4 | 1521523 | 1874516337 | 148086790623 | 27.9TB |
You can see from the above that if it is a table such as Sysbench, the height of the B-tree will not exceed 4 layers even if the single table has 140 billion rows and the data size is 27.9 TB. Therefore, there is no need to worry about the increase in B-tree height after the data volume becomes large.
Here, if the primary key of Sysbench is BIGINT, which is 8 bytes, what is the situation?
The number of record rows that can be stored in a leaf page is:
(16 * 1024 - 200) / ((8 + 4 + 120 + 60) + 13) = 78.9
You can see that this leaf page record number does not change much.
The number of records that a non-leaf page can store changes slightly more:
(16 * 1024 - 200)/(5+8+4) = 952
| Height | Non-leaf pages | Leaf pages | Rows | Size |
|---|---|---|---|---|
| 1 | 0 | 1 | 79 | 16KB |
| 2 | 1 | 952 | 75208 | 15MB |
| 3 | 953 | 906304 | 71598016 | 13.8GB |
| 4 | 907257 | 862801408 | 68161311232 | 12.8TB |
You can see from the above that if the primary key of Sysbench is changed to BIGINT, a 4-layer B-tree can store 60 billion rows, which is approximately 12 TB of data.
If a table such as Sysbench is not representative, let us look at some more complex tables, such as the commonly used log tables in the Software as a Service (SaaS) scenarios within Polarbench (a tool used to simulate database usage scenarios in various industries).
CREATE TABLE `prefix_off_saas_log_10` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`saas_type` varchar(64) DEFAULT NULL,
`saas_currency_code` varchar(3) DEFAULT NULL,
`saas_amount` bigint(20) DEFAULT '0',
`saas_direction` varchar(2) DEFAULT 'NA',
`saas_status` varchar(64) DEFAULT NULL,
`ewallet_ref` varchar(64) DEFAULT NULL,
`merchant_ref` varchar(64) DEFAULT NULL,
`third_party_ref` varchar(64) DEFAULT NULL,
`created_date_time` datetime DEFAULT NULL,
`updated_date_time` datetime DEFAULT NULL,
`version` int(11) DEFAULT NULL,
`saas_date_time` datetime DEFAULT NULL,
`original_saas_ref` varchar(64) DEFAULT NULL,
`source_of_fund` varchar(64) DEFAULT NULL,
`external_saas_type` varchar(64) DEFAULT NULL,
`user_id` varchar(64) DEFAULT NULL,
`merchant_id` varchar(64) DEFAULT NULL,
`merchant_id_ext` varchar(64) DEFAULT NULL,
`mfg_no` varchar(64) DEFAULT NULL,
`rfid_tag_no` varchar(64) DEFAULT NULL,
`admin_fee` bigint(20) DEFAULT NULL,
`ppu_type` varchar(64) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `saas_log_idx01` (`user_id`) USING BTREE,
KEY `saas_log_idx02` (`saas_type`) USING BTREE,
KEY `saas_log_idx03` (`saas_status`) USING BTREE,
KEY `saas_log_idx04` (`merchant_ref`) USING BTREE,
KEY `saas_log_idx05` (`third_party_ref`) USING BTREE,
KEY `saas_log_idx08` (`mfg_no`) USING BTREE,
KEY `saas_log_idx09` (`rfid_tag_no`) USING BTREE,
KEY `saas_log_idx10` (`merchant_id`)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8
Because there are variable-length fields here, but most refs have values, we assume the case where VARCHAR fields are fully used.
When all these fields are added together and the Record Header information is additionally calculated, it is approximately 974 bytes.
Then, the number of records that a Leaf Page can store is (16 × 1024 - 200) / 974 = 16.6
For a Non-Leaf Page, as with the previous Sysbench BIGINT, the number of records that can be stored is 952.
| Height | Non-leaf pages | Leaf pages | Rows | Size |
|---|---|---|---|---|
| 1 | 0 | 1 | 16 | 16KB |
| 2 | 1 | 952 | 15232 | 15MB |
| 3 | 953 | 906304 | 14500864 | 13.8GB |
| 4 | 907257 | 862801408 | 13804822528 | 12.8TB |
You can see that even for a table where a single row is approximately 1 KB, if the primary key is still BIGINT, the height of the B-tree will definitely be within 4 layers if the data is within 10 TB. At the same time, within 4 layers, this table can store approximately 13.8 billion rows.
Therefore, scenarios where MySQL stores billions of rows work well.
MySQL still has a best practice: "It is not recommended to use Universally Unique Identifier (UUID) as the primary key." Let us examine why.
For example, regarding the preceding prefix_off_saas_log_10, if the primary key is changed to a 32-byte UUID, assuming the Leaf Page remains unchanged,
The record number stored in the Non-Leaf Page:
(16 * 1024 - 200)/(5+32+4) = 394
| Height | Non-leaf pages | Leaf pages | Rows | Size |
|---|---|---|---|---|
| 1 | 0 | 1 | 16 | 16KB |
| 2 | 1 | 394 | 6304 | 6MB |
| 3 | 395 | 155236 | 2483776 | 2GB |
| 4 | 155631 | 61162984 | 978607744 | 981GB |
| 5 | 61318615 | 24098215696 | 385571451136 | 386TB |
As can be seen from the preceding Table, if UUID is used as the primary key, then for the same 4-layer B-tree, if BIGINT is used, 13.8 billion rows of data can be stored, whereas using UUID allows storing only 970 million rows of data.
However, even if UUID is incorrectly used as the primary key, the depth of the MySQL B-tree will not exceed 5 layers. 5 layers can store up to 380 billion rows, or 386 TB of data. In fact, this is impossible because a MySQL single table actually supports a maximum of 64 TB.
Overall, in MySQL, there is no need to worry that the increase in B-tree height will impact performance after the data volume increases. For Data within 10 TB, the B-tree height will definitely be within 4 layers. After exceeding 10 TB, it will also stay at 5 layers and will not go higher because a MySQL single table supports a maximum of 64 TB.
PolarDB supports many large table production instances. Large tables of 10+ TB are actually very common. I have also seen actual sharing from many Database Administrator (DBA) friends at major companies previously. In fact, if the business table schema design is reasonable, large tables are fully supported. There is no need to follow legacy guidelines that no longer apply.
CloudJump II: Optimization and Implementation of Cloud Databases in Shared Storage Scenarios
ApsaraDB - November 7, 2024
ApsaraDB - January 8, 2025
Alibaba Cloud Community - May 2, 2024
ApsaraDB - May 14, 2020
ApsaraDB - January 30, 2024
ApsaraDB - November 21, 2023
Best Practices
Follow our step-by-step best practices guides to build your own business case.
Learn More
PolarDB for MySQL
Alibaba Cloud PolarDB for MySQL is a cloud-native relational database service 100% compatible with MySQL.
Learn More
PolarDB for PostgreSQL
Alibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn More
PolarDB for Xscale
Alibaba Cloud PolarDB for Xscale (PolarDB-X) is a cloud-native high-performance distributed database service independently developed by Alibaba Cloud.
Learn MoreMore Posts by ApsaraDB