This topic describes the benefits, architectures, and scenarios of PolarDB Archive Database.
Challenges and requirements for archiving historical data
In most cases, new data is read or updated more frequently than historical data. Historical data such as messages or orders generated one year ago is seldom accessed. As your business develops, a large volume of data that is rarely or never accessed accumulates in your database system. This can cause the following issues:
- Historical data and new data are stored in the same database system. This can result in insufficient disk space.
- A large volume of data shares the memory, cache space, and disk IOPS capabilities of the database system. This can deteriorate the database performance.
- The operation to back up a large volume of data requires a long period of time and can fail. Even if the operation is successful, the storage of the backup files is an issue that needs to be solved.
These issues can be resolved by archiving historical data. Historical data can be stored as files by using low-cost storage services, such as Object Storage Service (OSS) or Database Backup (DBS). However, in real business scenarios, historical data is not completely static. Data generated months or years ago may still be queried or updated, either in real time or occasionally. For example, within Alibaba Group, historical orders in Taobao or Tmall, historical messages in DingTalk, and historical Cainiao logistics orders can still be queried.
To resolve the issues related to reads and updates of historical data, a separate database can be used as an archive database that stores only archived data. An archive database must meet the following requirements:
- It must provide a large storage capacity to save online data that is continuously generated. This way, you do not need to worry about the storage capacity.
- It must provide the same interfaces as your online databases. For example, the archive database must support MySQL protocols in the same manner as the online databases. This ensures that your applications can access the online databases and archive database, without the need to modify your code.
- It must be cost-efficient. For example, you can compress data to reduce the consumed disk space and use low-cost storage media to store large volumes of data.
- It must provide read and write capabilities that meet the requirements of low-frequency reads and writes.
Although MySQL is the most widely used open source database system in the world, it does not provide a solution that meets all of the preceding requirements. Engines such as TokuDB and MyRocks provide high compression ratios. However, the volume of data that can be stored by using one of these engines is limited by the disk capacity of each physical machine.
Solution: PolarDB Archive Database Edition
To address the preceding challenges and meet the requirements to store archived data, PolarDB provides the Archive Database Edition. This edition provides features that are developed based on the following technological innovations and breakthroughs:
- This edition uses X-Engine as the storage engine. X-Engine is developed by Alibaba Cloud based on the log-structured merge-tree (LSM tree). It combines the LSM tree with the Zstandard (ZSTD) data compression algorithm to increase the data compression ratio, which allows you to use archive databases at a low cost. Compared to InnoDB, X-Engine helps you save up to 70% of storage space. For more information about X-Engine, see Introduction to X-Engine. Because it uses X-Engine, the Archive Database Edition has limits, especially in terms of compatibility with MySQL. For more information, see Limits.
- PolarDB supports online expansion of the storage capacity based on shared distributed storage. PolarDB connects computing resources and storage resources over a high-speed network and transmits data by using the remote direct memory access (RDMA) protocol. This eliminates the bottleneck of I/O performance. X-Engine integrated in PolarDB provides these benefits.
X-Engine is integrated in PolarDB by using the following technological innovations. This enables PolarDB to run in a dual-engine architecture.
- The write-ahead logging (WAL) log streams of X-Engine are combined with the redo log streams of InnoDB. This way, the same log streams and transmission channels are used to support InnoDB and X-Engine. The management logic and the logic of interaction with the shared storage remain unchanged. This architecture can be reused by other engines that are introduced later.
- The I/O module of X-Engine is adapted to Polar File System (PFS) of PolarDB InnoDB. This ensures that InnoDB and X-Engine use the same distributed storage. Backups are accelerated based on the underlying distributed storage.
Compute node architectures of Archive Database
Archive Database is available in two editions: Archive Database (High Compression Ratio) and Archive Database Cluster Edition. Archive Database (High Compression Ratio) uses the single-node architecture. An Archive Database Cluster Edition cluster consists of a primary node, which processes read and write requests, and at least one read-only node. An Archive Database cluster supports the Dedicated and General-purpose specifications.
- Archive Database (High Compression Ratio)
By default, a cluster of the Archive Database (High Compression Ratio) Edition contains one compute node. The compute node is a dedicated node, which reduces the costs incurred on PolarProxy and the overhead of synchronizing redo logs. Archive Database scenarios typically require a large storage capacity but few reads and writes, so the computing resources of the primary node are rarely used up and the read capability that is provided by read-only nodes is unnecessary. If the specifications of the primary node and read-only nodes are the same, 50% of the computing resources are wasted. The Archive Database Edition already reduces storage costs based on the data compression capability of X-Engine. The Archive Database (High Compression Ratio) Edition additionally uses only the primary node to provide services, which eliminates the costs of computing resources offered by read-only nodes.
A PolarDB cluster that does not have read-only nodes requires a longer time to recover in disaster recovery scenarios where the primary node stops providing services. However, Archive Database (High Compression Ratio) still ensures 99.95% availability based on the high availability capabilities that are provided by the underlying distributed storage. In most cases, workloads with low-frequency reads and writes do not require high-availability services. In these scenarios, data is imported to Archive Database asynchronously in batches, and archive databases are a suitable choice.
Read-only nodes are not provided in the single-node architecture of Archive Database. When you perform O&M operations on a node, such as restarting the node after a minor version upgrade, a temporary read-only node deployed within the system is promoted to the primary node to reduce the impact on reads and writes to the Archive Database. The following figure shows the single-node architecture of Archive Database.
- Archive Database Cluster Edition
An Archive Database Cluster Edition cluster consists of one primary node and at least one read-only node based on shared storage. The primary node can handle read and write requests. A read-only node can handle only read requests. The multi-node architecture of Archive Database Cluster Edition inherits the advantages of Archive Database and also ensures the high availability of PolarDB clusters. When the primary node in a cluster fails, the cluster can automatically fail over to a read-only node. Then, the read-only node serves as the new primary node. This ensures that the service availability is at least 99.99%. The following figure shows the multi-node architecture of Archive Database Cluster Edition.
Benefits
- The Archive Database Edition provides a large storage capacity. Based on the 200 TB storage capacity and the compression capability of X-Engine, a PolarDB cluster of the Archive Database Edition can store more than 500 TB of raw data. The Archive Database Edition uses a serverless architecture so that the storage capacity automatically increases as the data volume increases. This way, you do not need to specify the storage capacity when you purchase the PolarDB cluster. You are charged for the actual storage capacity that you use.
- PolarDB Archive Database Edition supports the official MySQL protocols. Compared to other solutions that back up historical data to NoSQL services such as HBase, the Archive Database Edition allows applications to access both online databases and archive databases without the need to modify the code.
- The Archive Database Edition uses the backup capability provided by the underlying distributed storage of PolarDB to back up a large volume of data in a short period. The backup files can be uploaded to and permanently stored in low-cost storage, such as OSS.
- The multi-node architecture of Archive Database Cluster Edition combines the data compression capabilities of X-Engine, which reduce storage costs, with high availability of clusters. When the primary node in a cluster fails, the cluster can automatically fail over to a read-only node. Then, the read-only node serves as the new primary node. This ensures that the service availability is at least 99.99%.
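The storage figures in the first item of this list can be checked with simple arithmetic. The following Python sketch is an illustration only: the function name is hypothetical, and the space-saving percentages are assumed inputs rather than guaranteed compression results, because the actual ratio depends on your data.

```python
def raw_capacity_tb(physical_tb: float, saving_percent: float) -> float:
    """Return how much raw (uncompressed) data fits into physical_tb
    when compression saves saving_percent of the space."""
    if not 0 <= saving_percent < 100:
        raise ValueError("saving_percent must be in the range [0, 100)")
    # Each raw TB occupies (100 - saving_percent) / 100 TB after compression.
    return physical_tb * 100 / (100 - saving_percent)


# Assumed inputs: 200 TB of physical storage, 60% and 70% space savings.
at_60_percent = raw_capacity_tb(200, 60)
at_70_percent = raw_capacity_tb(200, 70)
print(f"60% saving: {at_60_percent:.0f} TB of raw data")
print(f"70% saving: {at_70_percent:.0f} TB of raw data")
```

Under these assumptions, a 60% saving corresponds to 500 TB of raw data, and the best-case 70% saving corresponds to roughly 667 TB, which is consistent with the claim of more than 500 TB.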
PolarDB Archive Database Edition provides a large storage capacity and can be used to store the historical data of multiple services. This ensures centralized storage and management for all historical data. The Archive Database Edition is suitable for the following scenarios:
- PolarDB Archive Database Edition is used to store cold data of self-managed databases. The self-managed databases can be MySQL, TiDB, PostgreSQL, SQL Server, or other relational databases.
- PolarDB Archive Database Edition is used to store archived data for ApsaraDB RDS for MySQL or PolarDB for MySQL. You can migrate the historical data that is not often accessed to PolarDB for MySQL X-Engine. This way, the storage space of online databases can be released to reduce costs and improve performance.
- PolarDB Archive Database Edition is used as a relational database service that provides a large storage capacity. This is applicable to scenarios in which a large volume of data needs to be written but the data is accessed at a low frequency, such as monitoring logs.
You can use Data Transmission Service (DTS) to continuously migrate data from the online database to PolarDB Archive Database in real time. You can also use Data Management Service (DMS) to periodically import online data to PolarDB Archive Database.
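The periodic import pattern can be sketched in a few lines of Python. This is a minimal illustration and does not describe how DTS or DMS work internally. The table names `orders` and `orders_archive`, the column names, and the function `archive_old_rows` are hypothetical, and an in-memory SQLite database stands in for both the online database and the archive database so that the example runs without a MySQL server.

```python
import sqlite3


def archive_old_rows(conn: sqlite3.Connection, cutoff: str, batch_size: int = 1000) -> int:
    """Move rows created before cutoff from orders to orders_archive.

    Rows are moved in small batches so that each transaction stays short.
    Returns the number of rows moved.
    """
    moved = 0
    while True:
        rows = conn.execute(
            "SELECT id, created_at, payload FROM orders "
            "WHERE created_at < ? ORDER BY id LIMIT ?",
            (cutoff, batch_size),
        ).fetchall()
        if not rows:
            return moved
        with conn:  # one transaction per batch: copy first, then delete
            conn.executemany(
                "INSERT INTO orders_archive (id, created_at, payload) VALUES (?, ?, ?)",
                rows,
            )
            conn.executemany("DELETE FROM orders WHERE id = ?", [(r[0],) for r in rows])
        moved += len(rows)


# Demo with hypothetical data; ISO 8601 date strings compare correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, payload TEXT)")
conn.execute("CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, created_at TEXT, payload TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2022-01-15", "a"), (2, "2022-06-30", "b"), (3, "2024-03-01", "c")],
)
moved = archive_old_rows(conn, cutoff="2023-01-01", batch_size=1)
print(f"rows moved to the archive table: {moved}")
```

In a real deployment the two tables would live in different databases, so the copy and the delete could not share one transaction. A common approach in that case is to delete from the online database only after the insert into the archive database is confirmed.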
Supported kernel versions
Only PolarDB for MySQL that runs MySQL 8.0 is supported.
Node specifications and pricing
Archive Database supports Archive Database (High Compression Ratio) and Archive Database Cluster Edition. For more information, see Specifications of compute nodes.
For more information about billing rules for Archive Database (High Compression Ratio) and Archive Database Cluster Edition, see Billing rules of compute nodes.
How does the Archive Database (High Compression Ratio) Edition ensure service availability and data reliability when only one primary node is used?
The Archive Database (High Compression Ratio) Edition is a database service that is designed for storing archived data and contains only one compute node. Archive Database (High Compression Ratio) uses technologies such as compute scheduling within seconds and distributed multi-replica storage to ensure high service availability and high data reliability.
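To put the availability figures quoted in this topic into perspective, the following Python sketch converts an availability percentage into the maximum downtime per year that it allows. The function name is hypothetical, and the calculation assumes a 365-day year. It is simple arithmetic, not a statement of the service level agreement.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def max_downtime_minutes(availability_percent: float) -> float:
    """Return the downtime per year, in minutes, allowed by an availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be in the range [0, 100]")
    return MINUTES_PER_YEAR * (100.0 - availability_percent) / 100.0


single_node = max_downtime_minutes(99.95)  # Archive Database (High Compression Ratio)
cluster = max_downtime_minutes(99.99)      # Archive Database Cluster Edition
print(f"99.95%: about {single_node:.0f} minutes per year")
print(f"99.99%: about {cluster:.0f} minutes per year")
```

Under these assumptions, 99.95% availability allows roughly 263 minutes of downtime per year, and 99.99% allows roughly 53 minutes. The difference reflects the longer recovery time of a cluster that has no read-only node to fail over to.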