Taobao stores its historical order records in a PolarDB-X cluster that is based on X-Engine. This fixes the known issues caused by the earlier HBase database, reduces storage costs, and allows users to query order records at any time.

Background information

Taobao is a popular online shopping platform in China that is operated by Alibaba Group. Taobao serves hundreds of millions of active users.

The platform processes approximately 100 million transactions for physical and virtual commodities every day. Each transaction involves various phases, such as member information verification, commodity library inquiry, order creation, inventory reduction, discounts, order payment, logistics information update, and payment confirmation. Each phase creates database records and updates their status. A single order therefore requires hundreds of database transactions, and the database cluster as a whole performs tens of billions of transactional read and write operations every day. The database team must keep the database system stable while containing the storage costs of a data volume that grows every day.

Order records are the most critical information in the entire transaction process and are required for order queries and dispute resolution. Therefore, the order records must be permanently stored in databases. Since Taobao was founded in 2003, trillions of database records that are related to orders have been generated, and the records have occupied petabytes of disk space.

The following sections describe how Taobao ensures low latency when users query order records without increasing storage costs.

Architecture evolution

The architecture of the order record databases evolved through four phases as traffic increased.

  • Phase 1

    In this phase, traffic was low, and Taobao used an Oracle database to store all order information. Order creation and historical order queries were performed on the same database.

  • Phase 2

    As the volume of historical order data increased, the Oracle database could no longer meet the performance and capacity requirements at the same time. Therefore, the database was split into an online database and a historical database. Order records that were more than three months old were migrated to the historical database. However, in this phase, users could query only the order records of the previous three months because the historical database contained large volumes of data.

  • Phase 3

    To fix the issues that are related to storage costs and historical order record queries, Taobao migrated historical order records to an HBase database.

    The HBase database used primary tables and index tables. Order details were queried from the primary tables, and order IDs were looked up in the index tables based on the IDs of buyers or sellers. However, order records were not always migrated to the historical database in chronological order, and some types of order records were not migrated at all. As a result, the stored records were not necessarily sorted by time, and the order records in query results were not listed in strict chronological order.

  • Phase 4

    In this phase, the historical database is built on a PolarDB-X cluster that is based on X-Engine. This reduces storage costs and fixes the issue of order records that are not listed in chronological order.
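
The two-step lookup of Phase 3 and its ordering issue can be illustrated with a toy model. The sketch below uses plain Python dictionaries, not the HBase API or its storage layout. The names `primary_table`, `buyer_index`, `migrate`, and `orders_for_buyer` are hypothetical, and the sketch assumes that an index lookup returns order IDs in the order in which they were migrated.

```python
from collections import defaultdict

# Toy stand-ins for the two kinds of tables (plain dicts, not HBase).
primary_table = {}               # order_id -> order details
buyer_index = defaultdict(list)  # buyer_id -> order IDs, in migration order


def migrate(order):
    """Write one order into the primary table and the buyer index."""
    primary_table[order["order_id"]] = order
    buyer_index[order["buyer_id"]].append(order["order_id"])


def orders_for_buyer(buyer_id):
    """Two-step lookup: index table for IDs, then primary table for details."""
    return [primary_table[oid] for oid in buyer_index.get(buyer_id, [])]


# Hypothetical data: orders are migrated out of chronological order.
migrate({"order_id": 2, "buyer_id": "b1", "created": "2019-02-01"})
migrate({"order_id": 1, "buyer_id": "b1", "created": "2019-01-01"})
migrate({"order_id": 3, "buyer_id": "b1", "created": "2019-03-01"})

listed = [o["created"] for o in orders_for_buyer("b1")]
print(listed)                    # follows migration order, not creation time
print(listed == sorted(listed))  # False: the listing is not chronological
```

In this model, the caller would have to sort every result set by creation time to present a correct order history.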

Business pain points

The architecture evolution shows that the business and database teams have faced the following pain points during the 10 years since the historical database was introduced:

  • Storage costs

    A large volume of data is written every day. Low-cost storage is required.

  • Query performance

    Various query types are required, such as queries by time and queries by order type. The database must support secondary indexes that stay consistent with the data and deliver the required performance.

  • Query latency

    The query latency must be low to ensure a good user experience. For example, order records that are more than 90 days old are queried far less often than order records from the previous 90 days, but queries on the older records still require low latency.

Historical order database solution that is based on X-Engine

The transaction order system has been iterated over 10 years on an architecture in which the online and historical databases are separated. Most service code is built around this architecture, so the solution retains it. This avoids the risks of rebuilding and migrating service code. The solution starts by replacing the HBase cluster with a PolarDB-X cluster that is based on X-Engine.

  • The online database is still deployed in a MySQL cluster that is based on the InnoDB storage engine, and stores only order records for the previous 90 days. The data volume is small, which ensures a high cache hit rate and reduces the read and write latency.
  • Order records that were generated more than 90 days ago are migrated from the online database to the historical database by using data synchronization and are then deleted from the online database.
  • The storage engine of the historical database is changed to X-Engine. The database stores all order records that were generated more than 90 days ago and serves the read and write operations on these records.
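
The migration step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the data synchronization tool that Taobao uses: two in-memory SQLite databases stand in for the MySQL online database and the X-Engine historical database, and the `orders` table, its columns, and the `archive_old_orders` function are hypothetical.

```python
import sqlite3
from datetime import date, timedelta

SCHEMA = """
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    buyer_id INTEGER NOT NULL,
    created  TEXT    NOT NULL  -- ISO date, so string order equals time order
)
"""


def archive_old_orders(online, history, today, keep_days=90):
    """Copy orders older than keep_days to history, then delete them online."""
    cutoff = (today - timedelta(days=keep_days)).isoformat()
    rows = online.execute(
        "SELECT order_id, buyer_id, created FROM orders WHERE created < ?",
        (cutoff,),
    ).fetchall()
    # Write to the historical database first, so that a failure between the
    # two steps leaves a duplicate row instead of a lost row.
    with history:
        history.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    with online:
        online.execute("DELETE FROM orders WHERE created < ?", (cutoff,))
    return len(rows)


online = sqlite3.connect(":memory:")
history = sqlite3.connect(":memory:")
for db in (online, history):
    db.execute(SCHEMA)

with online:
    online.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, 7, "2024-01-15"),  # older than 90 days on 2024-06-30
            (2, 7, "2024-03-01"),  # older than 90 days on 2024-06-30
            (3, 7, "2024-06-01"),  # recent, stays in the online database
        ],
    )

moved = archive_old_orders(online, history, today=date(2024, 6, 30))
print(moved, "orders moved to the historical database")
```

The sketch uses `INSERT OR REPLACE` so that the job can be rerun safely after a partial failure. A production job would also migrate in batches rather than in a single pass.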

With the new solution, the storage costs are the same as the storage costs of the HBase database. The historical database is compatible with the online database, and identical indexes can be created on the two databases. This fixes the issue of order records that are not listed in chronological order. In the historical database, hot data is separated from cold data to reduce the read latency.
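
The benefit of identical indexes can be sketched as follows. The example rests on assumptions: SQLite stands in for both databases, the index name `idx_buyer_created` and the helper functions are hypothetical, and the source does not describe how the application combines results from the two databases, so the merge step is only one possible approach.

```python
import sqlite3
from heapq import merge

DDL = [
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, buyer_id INTEGER, created TEXT)",
    # The same secondary index is created on both databases.
    "CREATE INDEX idx_buyer_created ON orders (buyer_id, created)",
]


def open_db(rows):
    """Create an in-memory database with the shared schema and load rows."""
    db = sqlite3.connect(":memory:")
    for stmt in DDL:
        db.execute(stmt)
    db.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    db.commit()
    return db


def buyer_orders(db, buyer_id):
    """Return (created, order_id) pairs for one buyer, newest first."""
    return db.execute(
        "SELECT created, order_id FROM orders "
        "WHERE buyer_id = ? ORDER BY created DESC",
        (buyer_id,),
    ).fetchall()


# Historical rows are inserted out of chronological order on purpose.
history = open_db([(2, 7, "2024-03-01"), (1, 7, "2024-01-15")])
online = open_db([(3, 7, "2024-06-01"), (4, 8, "2024-06-02")])

# Both result sets are already sorted newest first, so a streaming merge
# produces one timeline without sorting all rows again.
timeline = list(merge(buyer_orders(online, 7), buyer_orders(history, 7), reverse=True))
print(timeline)
```

Because the query sorts by an indexed column, the order of the results does not depend on the order in which the rows were migrated.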

Summary

Taobao order records are streamline data: records are frequently accessed right after they are written, and the access frequency decreases over time. X-Engine separates hot data from cold data and is therefore well suited to this access pattern. A single database cluster that is based on X-Engine is sufficient for this scenario.

If a new or existing business needs to store a large number of streamline records and does not separate hot data from cold data at the business layer, we recommend that you use a distributed PolarDB-X cluster that is based on X-Engine. This ensures scalability without increasing storage costs.

X-Engine is available on Alibaba Cloud. You can purchase instances that use X-Engine based on your business requirements. For more information, see Create an ApsaraDB RDS for MySQL instance.