Cost-effective historical data storage in ApsaraDB RDS databases

Last Updated: Oct 20, 2021

Background information

With the growth of the mobile Internet, businesses generate large amounts of data every day. As a business grows, its data volume can increase exponentially, while historical data is accessed less and less often over time. If all of this data is stored in relational databases, several issues can arise.

Challenges:

  • Increase in storage costs: Storage costs are proportional to the data volume. The exponential increase in the data volume results in an exponential increase in storage costs.

  • Decrease in query performance: Query performance decreases significantly after the data stored in a single instance exceeds 1 TB.

  • Complex O&M: If you use sharding to mitigate the performance degradation that is caused by the increase in data volume, you incur substantial O&M and development costs.

Requirements:

  • Controllable storage costs: Historical data should cost one-tenth as much to store as online data.

  • Automatic scaling: Compute and storage capabilities should scale out horizontally and automatically, which removes the O&M burden that sharding brings.

  • Low costs of changing schemas: Schemas can be changed quickly, or dynamic schemas can be used. This way, you do not need to go through time-consuming procedures to change the schemas of archive databases.

  • Low modification costs: SQL can be used to access data.

  • Real-time query requirements: In scenarios such as querying bills and chat history, the response time (RT) of querying historical data must be close to that of querying online data.

  • Data analysis requirements: Although historical data is accessed at a low frequency, mining and analysis of full data are necessary in some scenarios, such as generating Alipay annual bills.

ApsaraDB for Lindorm (Lindorm, previously ApsaraDB for HBase Performance-enhanced Edition) can meet diverse requirements, such as low storage costs, simple O&M, automatic scaling, and stable performance. It can work with relational databases to offer an optimal archive database solution that allows you to query historical data in real time at low costs.

Solution architecture

  • Lindorm Tunnel Service (LTS) provides data synchronization services. You can integrate LTS with relational databases such as MySQL to synchronize full data and incremental data. LTS also provides enterprise-grade data synchronization capabilities, such as multi-table migration, data changes, and the detection of DDL changes. This helps you migrate data easily and efficiently.

  • Lindorm allows you to store large amounts of data at a cost as low as CNY 0.11/GB/month. Lindorm scales automatically and supports the pay-as-you-go billing method. Lindorm also provides distributed processing of multi-model data, which helps you meet data storage requirements in various scenarios. In addition, Lindorm connects seamlessly to open source products in analytics ecosystems, such as Apache Spark, Apache Hive, Apache Flink, and Presto, to meet the needs for complex data analysis and maximize the value of data.
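
As a rough illustration of what the quoted storage price means in practice, the following Python sketch estimates the monthly storage bill for an archive of a given size. The CNY 0.11/GB/month price comes from this document. The tenfold multiplier for online storage is only an assumption derived from the one-tenth cost requirement stated earlier, not a published price, and the sketch assumes 1 TB = 1,024 GB.

```python
from decimal import Decimal

# Price quoted in this document for Lindorm capacity-optimized storage (CNY per GB per month).
LINDORM_PRICE_PER_GB_MONTH = Decimal("0.11")

# Assumption for illustration only: online storage costs about ten times as much.
ASSUMED_ONLINE_PRICE_MULTIPLIER = Decimal("10")


def monthly_storage_cost(size_gb, price_per_gb_month=LINDORM_PRICE_PER_GB_MONTH):
    """Return the monthly storage cost in CNY for size_gb gigabytes."""
    return Decimal(size_gb) * price_per_gb_month


archive_gb = 10 * 1024  # a hypothetical 10 TB archive
archive_cost = monthly_storage_cost(archive_gb)
online_cost = monthly_storage_cost(
    archive_gb,
    LINDORM_PRICE_PER_GB_MONTH * ASSUMED_ONLINE_PRICE_MULTIPLIER,
)

print(f"Archive storage: CNY {archive_cost} per month")
print(f"Assumed online storage: CNY {online_cost} per month")
```

The sketch covers storage only. Compute, synchronization, and network charges are not modeled.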

Benefits

Ease of use

  • You can configure data migration settings in a visualized manner within a few minutes.

  • You can use the solution to synchronize full data and incremental data. This helps you minimize your adoption costs.

  • The solution provides diverse capabilities, such as multi-table migration and data changes, to help you merge tables and change field combinations in an easy manner.

  • Comprehensive monitoring and alerting ensure stable performance for data synchronization.

Ultimate cost-effectiveness

  • Lindorm offers a capacity-optimized specification for storage, with storage costs as low as CNY 0.11/GB/month. Combined with the built-in buffering and acceleration layer, this specification maintains high real-time query performance at minimal cost, which makes it well suited to real-time archive databases.

  • In terms of performance, the Lindorm wide table engine has made various breakthroughs in throughput and latency. Its benchmark performance is seven times that of the open source HBase service. For more information, see Test results. The Lindorm time series engine incorporates a range of innovative architecture designs for high performance. It ranks first in the benchmark list released by the China Academy of Information and Communications Technology and significantly outperforms other time series databases.

  • Lindorm supports automatic separation of cold and hot data. For scenarios in which hot data turns into cold data over time, such as monitoring, chat history, and transaction bills, Lindorm can automatically identify and separate hot data from cold data. After the separation, the hot data is stored in high-performance storage media and the cold data is stored in low-cost storage media. The unit price difference between the two types of storage media is up to 10:1. You do not need to modify your code to read from or write to tables in which the cold data is separated from the hot data. In addition, access to the hot data is accelerated to improve performance.

  • Lindorm supports adaptive compression. The system automatically chooses a compression algorithm, such as dictionary encoding, prefix encoding, delta encoding, or entropy encoding, based on the types and characteristics of data. Compared with general algorithms in the industry, the adaptive compression feature of Lindorm improves the compression ratio by 10% to 30%.
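
To show why the choice of encoding depends on the characteristics of the data, the following Python sketch applies delta encoding to a regularly spaced timestamp column and compares the compressed sizes. It is a minimal illustration of the general technique, not the adaptive compression implementation of Lindorm. `zlib` stands in for a general-purpose compressor, and the function names and sample data are hypothetical.

```python
import struct
import zlib


def delta_encode(values):
    """Keep the first value, then store each value as the difference from its predecessor."""
    if not values:
        return []
    deltas = [values[0]]
    for prev, cur in zip(values, values[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the original values by accumulating the deltas."""
    values = []
    total = 0
    for d in deltas:
        total += d
        values.append(total)
    return values


def compressed_size(values):
    """Pack the integers as little-endian 64-bit values and return the zlib-compressed size in bytes."""
    raw = struct.pack(f"<{len(values)}q", *values)
    return len(zlib.compress(raw))


# Hypothetical sample: one timestamp per minute, as in monitoring data.
timestamps = [1_600_000_000 + 60 * i for i in range(1000)]
deltas = delta_encode(timestamps)

print("compressed raw bytes:  ", compressed_size(timestamps))
print("compressed delta bytes:", compressed_size(deltas))
```

Because the deltas of evenly spaced timestamps are almost all identical, the delta-encoded column compresses far better than the raw column. Data without such regularity may not benefit from delta encoding, which is why an encoding that is chosen per data type can outperform a single fixed algorithm.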

Cloud native and high scalability

  • The compute-storage separation architecture of Lindorm allows you to separately scale compute and storage resources to minimize the waste of resources.

  • Lindorm also provides serverless services that allow you to scale instantly based on your business requirements and use the pay-as-you-go billing method. Lindorm Serverless is built on multi-tenant data isolation, intelligent scheduling, and elastic Infrastructure as a Service (IaaS). It provides an enterprise-grade service level agreement (SLA) guarantee that meets the availability requirements of most internal enterprise services. This reduces the capacity management workload of front-line O&M employees and eliminates the stability risks that are caused by traffic fluctuations.

Multi-model capabilities and data retrieval

  • Lindorm is compatible with major open source standard APIs such as HBase, Phoenix (SQL), and Cassandra (CQL) APIs to minimize modification costs. In addition, it offers a wide range of features, such as global secondary indexes, multi-dimensional retrieval, dynamic columns, and time-to-live (TTL). Lindorm is suitable for various scenarios, such as metadata, orders, bills, profiles, social media, feeds, and logs.

  • Lindorm allows you to enable its search engine with a few clicks. The search engine is compatible with the standard APIs of the open source Solr platform. Lindorm provides capabilities such as full-text indexing, aggregation computing, and complex multi-dimensional queries to accelerate data retrieval and meet your needs for real-time complex analysis.

Big data ecosystem

  • Lindorm can seamlessly connect to open source products such as Apache Spark, Apache Hive, Apache Flink, and Presto in the big data ecosystem. Lindorm supports multiple connection methods, such as calling API operations to access data and reading data from files. Lindorm can meet your needs for analyzing large amounts of data in an easy and efficient manner.

Typical use cases

  • Applications write the transaction records of users to the MySQL databases. LTS synchronizes the data in the MySQL databases to the Lindorm database in real time. Transaction records from the most recent three months still change frequently and are stored in the MySQL databases for queries. Transaction records that were generated more than three months ago are stored in the Lindorm database for queries. The storage costs drop by more than 90% because the historical transaction records are stored in storage that uses the capacity-optimized specification.

  • In some scenarios, you may need to specify complex conditions when you perform real-time data queries. For example, when you run queries, you may combine multiple conditions, such as the time, location, amount, and transaction remarks. The Lindorm search engine offers diverse features, such as full-text indexing, aggregation computing, and complex multi-dimensional queries. These features provide an easy way for you to meet the preceding requirements without the need to modify your service code.

  • LTS synchronizes the bill data in the Lindorm database to an offline computing platform such as Apache Spark or MaxCompute for computing. Operations data and reports are generated for analysis based on your business needs. Then, the operations data is returned to the Lindorm database for real-time queries.
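
The first use case splits queries between the two stores by record age. The following Python sketch shows one way an application might make that routing decision. It is an illustration under stated assumptions: the three-month window is approximated as 90 days, and `choose_store` and the store labels are hypothetical names, not part of any LTS or Lindorm API.

```python
from datetime import datetime, timedelta

# Assumption: "three months" is approximated as a fixed 90-day window.
HOT_WINDOW = timedelta(days=90)


def choose_store(record_time, now):
    """Return which store should serve a query for a record created at record_time.

    Recent records are still changing and stay in MySQL.
    Older records are served from the Lindorm archive.
    """
    if now - record_time <= HOT_WINDOW:
        return "mysql"
    return "lindorm"


now = datetime(2021, 10, 20)
print(choose_store(datetime(2021, 9, 1), now))
print(choose_store(datetime(2021, 1, 1), now))
```

A query that spans the cutoff would need to read from both stores and merge the results, which this sketch does not cover.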

Instructions

Technical analysis of Lindorm: An affordable data storage solution