By Wangde and Zhiqian
Image Space is a free service provided by the Taobao Intelligent Image Center that lets merchants store and manage images. Taobao and Tmall users have accumulated an enormous number of images (just imagine how many images merchants and consumers upload to Taobao and Tmall every day), and that number keeps growing at an astonishing rate. As a result, Image Space is under immense pressure in terms of both storage capacity and write performance. The pressure is greatest before every Double 11, when merchants update large numbers of stock-keeping units (SKUs), which causes a sharp increase in image data.
Every day, products and comments on Taobao and Tmall add a large number of new images.
Once, before the Double 11 Shopping Festival, when most Alibaba databases still ran on the InnoDB engine, we were evaluating the potential risks related to the festival. When an Image Space developer asked whether the databases had sufficient disk capacity, we confidently said, "Take it easy. We doubled the storage four months ago." We were soon proved wrong. In less than five months, the accumulated data exceeded what we had estimated for six or seven years. And as daily growth kept soaring, even the expanded storage was quickly becoming insufficient.
The simplest and most direct way to address this challenge is to scale out the storage, which carries little risk. However, this does not solve the problem once and for all. At the current rate of data growth, we would inevitably have to scale out again and again, and doubling the cost every time space runs out is unacceptable.
The other way to address this challenge is to change the storage engine. At that time, Alibaba had just developed a proprietary engine named X-Engine, which focuses on high performance and low cost. B+ tree-based storage engines such as InnoDB waste a considerable amount of space on data pages. In contrast, X-Engine is based on a log-structured merge-tree (LSM tree), which stores data compactly and uses space efficiently. In addition, X-Engine applies prefix compression to further reduce the space used by this already compact data.
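The idea behind prefix compression is that neighboring keys in a sorted block often share a long common prefix, so each key only needs to record how many leading bytes it shares with the previous key, plus the bytes that differ. The following Python sketch shows this generic shared-prefix encoding. It is not X-Engine's actual block format, which this article does not describe, and the image keys are hypothetical. Engines such as LevelDB and RocksDB also store a full key at periodic "restart points" so that readers can binary-search inside a block; the sketch omits them for brevity.

```python
from typing import List, Tuple


def prefix_encode(sorted_keys: List[bytes]) -> List[Tuple[int, bytes]]:
    """Encode each key as (length of prefix shared with previous key, remaining suffix)."""
    encoded: List[Tuple[int, bytes]] = []
    prev = b""
    for key in sorted_keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def prefix_decode(encoded: List[Tuple[int, bytes]]) -> List[bytes]:
    """Rebuild the full keys by reusing the shared prefix of the previous key."""
    keys: List[bytes] = []
    prev = b""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


keys = [b"img:1001:full", b"img:1001:thumb", b"img:1002:full"]  # hypothetical keys, already sorted
encoded = prefix_encode(keys)
print(encoded)  # [(0, b'img:1001:full'), (9, b'thumb'), (7, b'2:full')]
assert prefix_decode(encoded) == keys
```

Only the first key is stored in full; the other two store a short suffix plus a prefix length, which is where the savings come from when keys share long prefixes.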
Because X-Engine never needs to update data blocks in place, it can easily compress them with general-purpose compression algorithms such as zlib, zstd, and Snappy. By default, all data in the lower levels of the LSM tree is compressed. After many comparison tests, X-Engine chose zstd as its default compression algorithm while retaining support for the others. In addition, background compaction continuously deletes invalid entries and reclaims space: in an LSM tree, updates and deletions write new entries instead of modifying existing ones, so the older entries become invalid once they are no longer needed.
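To make the compaction step more concrete, here is a minimal Python sketch of how merging sorted runs reclaims space. It assumes each entry is a hypothetical `(key, sequence number, value)` tuple, where a value of `None` marks a deletion (tombstone) and a higher sequence number means a newer write. For each key, the merge keeps only the newest version; at the bottom level, where no older data can lie underneath, it also drops tombstones. The sketch deliberately ignores snapshots: a real engine must keep older versions that active transactions or snapshots may still read, and it would also compress the output blocks with an algorithm such as zstd before writing them.

```python
import heapq
from typing import Iterable, List, Optional, Tuple

# Hypothetical entry layout: (user key, sequence number, value).
# A value of None marks a deletion (tombstone); a higher sequence number is newer.
Entry = Tuple[bytes, int, Optional[bytes]]


def compact(runs: Iterable[List[Entry]], bottom_level: bool) -> List[Entry]:
    """Merge sorted runs into one run that keeps only the newest version of each key.

    Every input run must be sorted by key ascending, then sequence number descending.
    Snapshot retention is ignored in this sketch.
    """
    # The newest version of each key comes first in the merged stream.
    merged = heapq.merge(*runs, key=lambda e: (e[0], -e[1]))
    output: List[Entry] = []
    last_key: Optional[bytes] = None
    for key, seq, value in merged:
        if key == last_key:
            continue  # Older version shadowed by a newer one: its space is reclaimed.
        last_key = key
        if value is None and bottom_level:
            continue  # No older data exists below the bottom level, so drop the tombstone too.
        output.append((key, seq, value))
    return output


newer_run = [(b"img:1", 5, b"v2"), (b"img:3", 6, None)]  # img:1 updated, img:3 deleted
older_run = [(b"img:1", 1, b"v1"), (b"img:2", 2, b"x"), (b"img:3", 3, b"y")]
print(compact([newer_run, older_run], bottom_level=True))
# [(b'img:1', 5, b'v2'), (b'img:2', 2, b'x')]
```

With `bottom_level=False`, the tombstone for `img:3` is kept so that it can still hide any older copies of that key in lower levels.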
Thanks to these technical features, X-Engine saves an enormous amount of space: after Image Space was migrated from InnoDB to X-Engine, its storage footprint shrank to about one-seventh of the original, as shown in the following figure:
Now, you may wonder how X-Engine performs, given that migrating from InnoDB to X-Engine saves so much on costs.
Image Space is a heavily used application at Taobao and Tmall, so X-Engine could not be adopted unless its performance met the business requirements. On the write side, the lightweight write mechanism of the LSM tree gives X-Engine an advantage, and the group commit and pipelined transaction processing mechanisms further increase write concurrency. On the read side, LSM trees are weaker: the tiered structure and the multiple data versions produced by append-only writes lengthen the query path of read requests. To address this problem, X-Engine has made many optimizations, such as multi-granularity caches (memtable, block cache, and row cache), Bloom filters, and range scan filters (SuRF, SIGMOD'18). These optimizations, together with asynchronous I/O prefetching, effectively speed up point queries and range scans. In short, we strove to make X-Engine a storage engine with balanced read-write performance and outstanding cost-effectiveness. For details on the read-write optimizations of X-Engine, see the article Detailed Explanation of X-Engine SIGMOD.
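The following Python sketch illustrates why a Bloom filter shortens the read path of an LSM tree. A Bloom filter can answer "definitely not present" without reading any data, so a point lookup skips every table whose filter rules out the key and only searches the memtable and the few tables that may contain it. The filter size, the SHA-256-based double hashing, and the dictionary-based "tables" are illustrative assumptions rather than X-Engine's implementation, and the sketch ignores tombstones and caches.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes) -> List[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force an odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, key: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


# Hypothetical on-disk table: a Bloom filter plus the key/value data it summarizes.
Level = Tuple[BloomFilter, Dict[bytes, bytes]]


def build_level(table: Dict[bytes, bytes]) -> Level:
    bloom = BloomFilter()
    for key in table:
        bloom.add(key)
    return bloom, table


def point_lookup(key: bytes, memtable: Dict[bytes, bytes], levels: List[Level]) -> Optional[bytes]:
    """Search the newest data first and skip any level whose filter rules the key out."""
    if key in memtable:
        return memtable[key]
    for bloom, table in levels:  # ordered from newest to oldest
        if not bloom.may_contain(key):
            continue  # Definitely absent: no need to read this table.
        if key in table:  # The filter may give a false positive, so check the data.
            return table[key]
    return None


levels = [build_level({b"img:1": b"new"}), build_level({b"img:1": b"old", b"img:2": b"x"})]
print(point_lookup(b"img:1", {b"img:3": b"m"}, levels))  # b'new'
```

Because a Bloom filter never produces false negatives, skipping a table on a negative answer is always safe; a false positive only costs one unnecessary table read.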
Database administrators (DBAs) and business developers verified that X-Engine fully meets the business requirements for read-write performance and latency. Shortly afterward, the Taobao Image Space databases were migrated to X-Engine, which reduced storage costs.
X-Engine, with its tiered storage architecture, is ideal for business workloads with the following characteristics.