By Baotiao
Even today, in many teams you can still hear the claim: a MySQL single table should not exceed 5 million rows, or MySQL will have problems. This has almost become an "industry norm."
But if we stop for a moment and ask: does this conclusion describe the upper limit of the database's capabilities, or is it engineering experience from a particular historical period? Many discussions confuse these two things from the very beginning.
I want to state the conclusion up front: in the modern MySQL ecosystem, the 5-million-row limit for a single table has long been obsolete. Moreover, in quite a few business scenarios, a standalone MySQL instance is already very capable. You do not even need to introduce sharding, nor rush to adopt AnalyticDB.
Many people think that "5 million rows per table" is just a rumor, but it actually has a very clear source.
Years ago, Alibaba internally published a specification with significant influence, initially called the "Java Programming Specification" and later better known as the "Alibaba Java Development Manual" (some are accustomed to calling it the "Programming Military Rules"). This specification explicitly made a similar suggestion: the data volume of a single table should not be too large, and splitting should be considered beyond a certain size.
It must be emphasized that this specification is not "database theory," but engineering experience summarized for large-scale Internet businesses under the hardware conditions and MySQL versions of that time. It solved the real problems of its day rather than drawing an eternal red line for database capabilities.
If we turn the clock back more than ten years, that cautious, even conservative, attitude is completely understandable.
First, hardware was very limited. The main storage medium was still the HDD: random I/O took milliseconds, and IOPS (input/output operations per second) were very low. Once a table grew large, the B+Tree height increased, and the extra one or two random I/Os per query were clearly visible in actual latency.
Second, MySQL itself was far less capable than today. In version 5.5 and earlier, Online DDL barely existed. Performing DDL on a large table could easily lock the table: in mild cases it impacted the business; in severe cases it directly caused incidents. From an operations perspective, controlling single-table size was essentially controlling risk.
Third, storage and backup also had to be considered. In that era, disk capacity was limited and the backup window was long. A large table meant longer backup times and a higher probability of failure.
In summary, 5 million rows is not a "calculated" number, but a relatively conservative engineering conclusion reached after stacking up multiple real-world constraints.
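The latency gap between that era and today can be made concrete with a quick comparison. The figures below are rough, era-typical assumptions (roughly 8 ms per HDD seek, tens of microseconds per NVMe read), not measurements:

```python
# Rough, era-typical latency assumptions (illustrative, not measurements):
HDD_SEEK_S = 8e-3      # ~8 ms per random read on a 7200 rpm HDD
NVME_READ_S = 50e-6    # ~50 us per random read on a modern NVMe SSD

EXTRA_IOS = 2          # one or two extra B+Tree levels -> extra random reads

# Added latency per query caused by the extra tree levels, in milliseconds
hdd_extra_ms = EXTRA_IOS * HDD_SEEK_S * 1e3
nvme_extra_ms = EXTRA_IOS * NVME_READ_S * 1e3

print(f"HDD era:  +{hdd_extra_ms:.1f} ms per query")   # +16.0 ms
print(f"NVMe era: +{nvme_extra_ms:.2f} ms per query")  # +0.10 ms
```

On HDDs, two extra random reads added noticeable milliseconds to every query; on NVMe, the same two reads cost about a tenth of a millisecond, which is why the same tree growth stopped being a practical concern.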
A concern that has been raised repeatedly over the years: once a table grows large, will the B+Tree become very tall?
This concern is not logically wrong, but today's discussions often miss a critical fact: even for large tables, the height of an InnoDB B+Tree is still very limited.
We can cite the conclusion of an article that performed a detailed derivation and actual testing of the InnoDB B+Tree structure to explain this.
In that article, the author provides a very clear calculation based on the InnoDB page size, index record size, and fan-out factor: even at tens of millions, hundreds of millions of rows, or more, the height of a typical index B+Tree remains stable at 3 to 4 levels.
This result matters because it directly refutes the intuition that "a large table inevitably means a very tall B+Tree." In other words, traversing from the root node to a leaf node requires only a handful of pointer hops. Even if these accesses actually hit the disk, the number of additional I/O operations is extremely limited.
Combined with the actual latency of today's NVMe SSDs, where a single random I/O already takes only tens of microseconds, the additional cost of one more B+Tree level is almost negligible in the overall query path. This is also why B+Tree height has not become a performance bottleneck in real large-table scenarios.
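The fan-out arithmetic behind the "3 to 4 levels" conclusion can be sketched in a few lines. The record sizes below are illustrative assumptions (a BIGINT key plus page pointer of roughly 13 bytes for internal records, a 100-byte average leaf row), not measurements from the cited article; only the 16 KB default page size is a fixed InnoDB constant:

```python
import math

PAGE_SIZE = 16 * 1024   # InnoDB default page size: 16 KB

# Assumed record sizes (illustrative):
# an internal record holds a key plus a child page pointer,
# e.g. 8-byte BIGINT key + 6-byte page number + overhead ~= 13 bytes.
INTERNAL_RECORD = 13
LEAF_RECORD = 100       # assumed average row size on a leaf page

FANOUT = PAGE_SIZE // INTERNAL_RECORD     # ~1260 children per internal page
ROWS_PER_LEAF = PAGE_SIZE // LEAF_RECORD  # ~163 rows per leaf page

def btree_height(total_rows: int) -> int:
    """Smallest height h such that FANOUT**(h-1) leaf pages hold all rows."""
    leaves = math.ceil(total_rows / ROWS_PER_LEAF)
    height = 1
    while FANOUT ** (height - 1) < leaves:
        height += 1
    return height

for n in (5_000_000, 100_000_000, 10_000_000_000):
    print(f"{n:>14,} rows -> height {btree_height(n)}")
```

Under these assumptions, 5 million and even 100 million rows fit in a 3-level tree, and 10 billion rows still need only 4 levels, which matches the stability the article describes.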
The evolution of MySQL's kernel capabilities solved some of the problems, but what truly made "large tables no longer daunting" is the across-the-board upgrade of hardware.
The popularity of NVMe SSDs and the PCIe bus brought order-of-magnitude improvements in random-I/O latency and throughput. Memory has grown from tens of GB in the past to hundreds of GB or even terabytes today. Under such conditions, many queries in OLTP scenarios are served entirely from memory.
In addition, with Online DDL maturing steadily since MySQL 5.6 and the continued improvement of physical backup tools such as XtraBackup, TB-scale and larger tables have long been routine at the operational level.
And within today's MySQL ecosystem, the underlying hardware keeps improving.
The latest generation of NVMe SSDs, combined with PCIe 5.0, can already deliver tens of GB/s of sequential bandwidth. On the machine the author used for a PolarDB TPC-C benchmark, multiple NVMe SSDs gave the single machine 30+ GB/s of sequential I/O bandwidth. Under such conditions, single-table size is no longer performance-bound by local disk capacity.
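To get a feel for what that bandwidth means, consider a full sequential scan. The 30 GB/s figure is the one quoted above from the author's benchmark machine; the 10 TB table size is an assumption for illustration:

```python
BANDWIDTH_GB_S = 30   # sequential read bandwidth quoted in the text, GB/s
TABLE_SIZE_TB = 10    # assumed single-table size (illustrative)

# Time to read the whole table once, sequentially, at full bandwidth
scan_seconds = TABLE_SIZE_TB * 1024 / BANDWIDTH_GB_S

print(f"Full sequential scan of {TABLE_SIZE_TB} TB "
      f"at {BANDWIDTH_GB_S} GB/s: ~{scan_seconds / 60:.1f} minutes")
```

Even a worst-case full scan of a 10 TB table completes in minutes at this bandwidth, and indexed OLTP access never reads anywhere near the whole table.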
Even if a single machine's local disks cannot hold all the data, you can still grow a single table by adding disks. There is some performance loss in extreme scenarios, but for the vast majority of business workloads this loss is controllable and far smaller than imagined.
As far as the author knows, in real production environments a large number of users already run single tables at the tens-of-TB level, and their businesses run stably. What such systems truly need to watch is often no longer "whether the table is too big," but whether the SQL design, indexing strategy, and overall workload model are reasonable.
A frequently ignored fact: when many teams discuss sharding, they have not actually pushed single-machine MySQL anywhere near its limits.
On modern hardware and in ApsaraDB environments, the OLTP throughput of single-machine MySQL is stronger than many people imagine. Many so-called requirements that "need AnalyticDB" are essentially simple statistics and aggregations, which can be handled entirely inside MySQL with reasonable index and SQL design.
The result is often that a system prematurely introduces sharding, or even AnalyticDB, and complexity climbs rapidly: SQL becomes hard to maintain, transactional consistency problems begin to emerge, and the real performance bottleneck is never actually verified.
The claim that a single table should be limited to 5 million rows is not a rumor; it has a very clear historical background and practical basis. The problem is that it describes the world of the past, not the world of today.
Today, MySQL itself, the underlying hardware, and the overall architecture of cloud databases such as ApsaraDB have all changed fundamentally. For the vast majority of businesses, rather than prematurely introducing sharding and complex data architectures, it is better to first fully exploit the capabilities of standalone MySQL.
In many cases, the best architecture is not paying in advance for the scale you might reach ten years from now, but using the simplest solution that stably solves today's actual problems.