PolarDB for MySQL provides the following editions: Cluster Edition, Single Node, and Archive Database. This topic describes the benefits, architecture, and applicable scenarios of PolarDB of the Archive Database Edition.
Challenges and requirements for archiving historical data
In most cases, new data is read or updated more frequently than historical data. Historical data such as messages or orders generated one year ago is seldom accessed. As your business develops, a large volume of data that is rarely or never accessed accumulates in your database system. This can cause the following issues:
- Historical data and new data are stored in the same database system. This can result in insufficient disk space.
- A large volume of data shares the memory, cache space, and disk IOPS capabilities of the database system. This can deteriorate the database performance.
- The operation to back up a large volume of data requires a long period of time and can fail. Even if the operation is successful, the storage of the backup files is an issue that needs to be solved.
These issues can be resolved by archiving historical data. Historical data can be stored as files by using low-cost storage services, such as Object Storage Service (OSS) or Database Backup (DBS). In real business scenarios, historical data is not completely static. Historical data generated multiple months or years ago may be queried or updated in real time or occasionally. For example, historical data such as historical orders in Taobao or Tmall, historical messages in DingTalk, and historical Cainiao logistics orders can be queried within Alibaba Group.
To resolve the issues related to reads and updates of historical data, a separate database can be used as an archive database that stores only archived data. An archive database must meet the following requirements:
- It must provide a large storage capacity to save online data that is continuously generated. This way, you do not need to worry about the storage capacity.
- It must provide the same interfaces as your online databases. For example, the archive database must support MySQL protocols in the same manner as the online databases. This ensures that your applications can access the online databases and archive database, without the need to modify your code.
- It must be cost-efficient. For example, you can compress data to reduce the consumed disk space and use low-cost storage media to store large volumes of data.
- It must provide read and write capabilities that meet the requirements of low-frequency reads and writes.
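The interface requirement above means that an application can, in principle, reach the archive database by changing only the connection endpoint. The following is a minimal sketch of that idea, not a prescribed design. The endpoint names, the one-year cutoff, and the `pick_endpoint` helper are all hypothetical and chosen only for illustration.

```python
from datetime import date, timedelta

# Hypothetical endpoints, for illustration only.
ONLINE_ENDPOINT = "online-db.example.internal"
ARCHIVE_ENDPOINT = "archive-db.example.internal"

# Assumed policy: records older than this many days live in the archive database.
ARCHIVE_AFTER_DAYS = 365


def pick_endpoint(record_date: date, today: date) -> str:
    """Return the endpoint expected to hold a record created on record_date."""
    cutoff = today - timedelta(days=ARCHIVE_AFTER_DAYS)
    return ARCHIVE_ENDPOINT if record_date < cutoff else ONLINE_ENDPOINT
```

Because both databases are assumed to speak the MySQL protocol, the same driver and the same SQL statements would be used with either endpoint. Only the host passed to the driver differs.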
Although MySQL is the most widely used open source database system in the world, it does not provide a solution that meets all of the preceding requirements. Engines such as TokuDB and MyRocks provide high compression ratios. However, the volume of data that can be stored by using one of these engines is limited by the disk capacity of each physical machine.
Solution: PolarDB of the Archive Database Edition
To address the preceding challenges and meet the requirements to store archived data, PolarDB provides the Archive Database Edition. This edition provides features that are developed based on the following technological innovations and breakthroughs:
- This edition uses X-Engine as the storage engine. X-Engine is developed by Alibaba Cloud and is based on the log-structured merge-tree (LSM tree). By combining the LSM tree with the Zstandard (ZSTD) data compression algorithm, X-Engine achieves a high data compression ratio, which allows you to use archive databases at a low cost. Compared to InnoDB, X-Engine helps you save up to 70% storage space. For more information about X-Engine, see X-Engine overview.
- By default, a cluster of the Archive Database Edition contains one compute node. The compute node is a dedicated node that reduces the costs incurred on PolarProxy and the overheads of synchronizing redo logs. The Archive Database Edition has limits, especially in terms of the compatibility with MySQL, due to the use of X-Engine. For more information, see Limits.
- PolarDB supports online expansion of the storage capacity based on shared distributed storage. PolarDB connects computing resources and storage resources over a high-speed network and transmits data by using the remote direct memory access (RDMA) protocol. This eliminates the bottleneck of I/O performance. X-Engine integrated in PolarDB provides these benefits.
X-Engine is integrated in PolarDB by using the following technological innovations. This enables PolarDB to run in a dual-engine architecture.
- The write-ahead logging (WAL) log streams of X-Engine are combined with the redo log streams of InnoDB. This way, the same log streams and transmission channels are used to support InnoDB and X-Engine. The management logic and the logic of interaction with the shared storage remain unchanged. This architecture can be reused by other engines that are introduced later.
- The I/O module of X-Engine is adapted to Polar File System (PFS) of PolarDB InnoDB. This ensures that InnoDB and X-Engine use the same distributed storage. Backups are accelerated based on the underlying distributed storage.
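The idea of a single log stream shared by two engines can be illustrated with a toy model. The sketch below is not PolarDB code and makes no claim about the actual log format. The `LogRecord` and `SharedLogStream` names, the engine labels, and the payload strings are assumptions made for this example. It only shows records from several engines being appended to one ordered stream and dispatched by engine tag on replay.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LogRecord:
    lsn: int       # position of the record in the shared stream
    engine: str    # engine tag, e.g. "innodb" or "xengine" (labels assumed here)
    payload: str   # opaque change description


class SharedLogStream:
    """Toy append-only log stream shared by several storage engines."""

    def __init__(self) -> None:
        self._records: List[LogRecord] = []

    def append(self, engine: str, payload: str) -> int:
        """Append a record and return its sequence number."""
        lsn = len(self._records) + 1
        self._records.append(LogRecord(lsn, engine, payload))
        return lsn

    def replay(self) -> Dict[str, List[str]]:
        """Dispatch records to their engines, preserving stream order."""
        per_engine: Dict[str, List[str]] = {}
        for record in self._records:
            per_engine.setdefault(record.engine, []).append(record.payload)
        return per_engine


stream = SharedLogStream()
stream.append("innodb", "update page 7")
stream.append("xengine", "put key=a")
stream.append("innodb", "update page 9")
replayed = stream.replay()
```

In this model, adding a third engine requires only a new tag, which mirrors the point that the architecture can be reused by engines introduced later.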
Single compute node architecture of Archive Database
A PolarDB cluster of the Cluster Edition consists of one primary node and at least one read-only node based on shared storage. The primary node can process read and write requests. A read-only node can process only read requests. In scenarios for which the Archive Database Edition is suitable, a large storage capacity is required and the number of read and write requests is small. Therefore, the computing resources of the primary node are sufficient to process all read and write requests and the capabilities offered by read-only nodes are unnecessary. For example, if a cluster has one primary node and one read-only node of the same specifications, 50% of the computing resources are wasted.
The Archive Database Edition reduces storage costs through the data compression capability of X-Engine. It uses only the primary node to provide services, which eliminates the costs of the computing resources offered by read-only nodes. The trade-off is that a PolarDB cluster without read-only nodes requires a longer time to recover in disaster recovery scenarios where the primary node stops providing services. However, Archive Database still ensures 99.95% availability based on the high availability capabilities that are provided by the underlying distributed storage. In most cases, workloads that consist of low-frequency reads and writes do not require high-availability services. Archive databases are suitable for these workloads, in which data is imported to Archive Database asynchronously in batches.
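To make the 99.95% figure concrete, the following sketch converts an availability percentage into a downtime budget. It is plain arithmetic and does not describe how availability is measured or billed. The `allowed_downtime` helper and the 365-day and 30-day periods are assumptions chosen for this example.

```python
HOURS_PER_YEAR = 365 * 24            # assumes a 365-day year
MINUTES_PER_30_DAYS = 30 * 24 * 60   # assumes a 30-day month


def allowed_downtime(availability: float, period: float) -> float:
    """Return the downtime budget, in the same unit as period."""
    return (1.0 - availability) * period


yearly_hours = allowed_downtime(0.9995, HOURS_PER_YEAR)          # about 4.38 hours
monthly_minutes = allowed_downtime(0.9995, MINUTES_PER_30_DAYS)  # about 21.6 minutes
```

Under these assumptions, 99.95% availability corresponds to roughly 4.4 hours of downtime per year, or about 22 minutes per 30-day month.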
Read-only nodes are not provided in the single-node architecture of the archive database. When you perform O&M operations on a node, such as restarting the node after a minor version update, a temporary read-only node that the system deploys is promoted to the primary node. This reduces adverse impacts on reads and writes to the Archive Database Edition.
Benefits
- The Archive Database Edition provides a large storage capacity. Based on the 200 TB storage capacity and the compression capability of X-Engine, a PolarDB cluster of the Archive Database Edition can store more than 500 TB of raw data. The Archive Database Edition uses a serverless architecture so that the storage capacity automatically increases as the data volume increases. This way, you do not need to specify the storage capacity when you purchase the PolarDB cluster. You are charged for the actual storage capacity that you use.
- PolarDB of the Archive Database Edition supports the official MySQL protocols. Compared to other solutions that back up historical data to NoSQL services such as HBase, the Archive Database Edition allows applications to access both online databases and archive databases without the need to modify the code.
- The Archive Database Edition uses the backup capability provided by the underlying distributed storage of PolarDB to back up a large volume of data in a short period. The backup files can be uploaded to and permanently stored in low-cost storage, such as OSS.
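The relationship between the 200 TB storage capacity, the compression savings, and the 500 TB raw data figure can be checked with simple arithmetic. The sketch below assumes that the space saving is measured against the uncompressed (raw) size, which the source does not state explicitly. The function names are hypothetical.

```python
def raw_capacity_tb(physical_tb: float, space_saving: float) -> float:
    """Raw data that fits in physical_tb when compression saves space_saving (0 to 1)."""
    if not 0.0 <= space_saving < 1.0:
        raise ValueError("space_saving must be in [0, 1)")
    return physical_tb / (1.0 - space_saving)


def required_saving(physical_tb: float, raw_tb: float) -> float:
    """Fraction of space that compression must save to fit raw_tb into physical_tb."""
    return 1.0 - physical_tb / raw_tb


best_case_tb = raw_capacity_tb(200, 0.70)   # about 666.7 TB at the "up to 70%" saving
needed_saving = required_saving(200, 500)   # about 0.6, a 60% saving
```

Under this assumption, storing 500 TB of raw data in 200 TB requires a saving of about 60%, which is within the "up to 70%" range quoted for X-Engine. Actual ratios depend on the data being compressed.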
PolarDB of the Archive Database Edition provides a large storage capacity and can be used to store the historical data of multiple services. This ensures centralized storage and management for all historical data. The Archive Database Edition is suitable for the following scenarios:
- PolarDB of the Archive Database Edition is used to store cold data of self-managed databases. The self-managed databases can be MySQL, TiDB, PostgreSQL, SQL Server, or other relational databases.
- PolarDB of the Archive Database Edition is used to store archived data for ApsaraDB RDS for MySQL or PolarDB for MySQL. You can migrate the historical data that is not often accessed to PolarDB for MySQL X-Engine. This way, the storage space of online databases can be released to reduce costs and improve performance.
- PolarDB of the Archive Database Edition is used as a relational database service that provides a large storage capacity. This is applicable to scenarios in which a large volume of data needs to be written but the data is accessed at a low frequency, such as monitoring logs.
You can use Data Transmission Service (DTS) to continuously migrate data from online databases to PolarDB of the Archive Database Edition in real time. You can also use Data Management (DMS) to periodically import online data to PolarDB of the Archive Database Edition.
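The periodic import pattern can be sketched as a batch job that copies rows older than a cutoff into the archive database and then removes them from the online database. This is only an illustration of the pattern, not how DTS or DMS work internally. The example uses two in-memory SQLite databases as stand-ins for the online and archive MySQL databases, and the `orders` table, its columns, and the `archive_old_rows` helper are hypothetical.

```python
import sqlite3


def archive_old_rows(online, archive, cutoff: str, batch_size: int = 2) -> int:
    """Move rows with created_at < cutoff from online to archive, in batches."""
    moved = 0
    while True:
        rows = online.execute(
            "SELECT id, created_at, body FROM orders "
            "WHERE created_at < ? ORDER BY id LIMIT ?",
            (cutoff, batch_size),
        ).fetchall()
        if not rows:
            return moved
        # Write to the archive first so that a failure never loses data.
        # INSERT OR IGNORE is SQLite syntax; MySQL uses INSERT IGNORE.
        archive.executemany(
            "INSERT OR IGNORE INTO orders (id, created_at, body) VALUES (?, ?, ?)",
            rows,
        )
        archive.commit()
        # Only after the archive commit succeeds, delete from the online side.
        online.executemany("DELETE FROM orders WHERE id = ?", [(r[0],) for r in rows])
        online.commit()
        moved += len(rows)


SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, body TEXT)"
online = sqlite3.connect(":memory:")
archive = sqlite3.connect(":memory:")
online.execute(SCHEMA)
archive.execute(SCHEMA)
online.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2022-03-01", "a"),
        (2, "2022-09-15", "b"),
        (3, "2023-02-01", "c"),
        (4, "2024-05-20", "d"),
    ],
)
online.commit()
moved = archive_old_rows(online, archive, cutoff="2023-06-01")
```

Small batches keep each transaction short, which limits the load on the online database while the job runs. Dates are stored as ISO 8601 strings here so that string comparison matches chronological order.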
Supported kernel versions
Only PolarDB for MySQL that runs MySQL 8.0 is supported.
Node specifications and pricing
Archive Database supports seven dedicated node specifications. For more information, see Billable items.
How does the Archive Database Edition ensure service availability and data reliability when only one primary node is used?
The Archive Database Edition is a database service that is designed for a specific purpose, storing archived data, and contains only one compute node. To ensure high service availability and high data reliability, Archive Database relies on technologies such as compute scheduling that completes within seconds and distributed multi-replica storage.