The rapid development of information technologies has witnessed an increased number of types of data generated in business production for all walks of life. The data includes structured business data, such as metadata, operational data, equipment or system measurement data, and logs. The data also includes semi-structured business operational data, logs, images, and files. When you design an IT architecture for a traditional solution, you must use different storage and analysis technologies to store, query, and analyze multiple types of data. The following figure shows an example of an architecture for a traditional solution.
This kind of technical solutions is a typical technological fragmentation solution. Multiple databases are required to separately process different types of data. This kind of solutions has the following drawbacks:
Numerous complex technical components are involved.
The selection of technologies is complex.
Links for data access and data synchronization are long.
These drawbacks impose high requirements on technical personnel and result in long business rollout cycles, high failure rates, and high maintenance costs. This can cause huge problems for the information system construction. Technological fragmentation also leads to a fragmented technical architecture. This is not conducive to the long-term evolution and development of technical architectures and eventually disables technical architectures from meeting business development requirements. For example, you must configure cross-zone high availability, enable global synchronization, or reduce storage costs to meet the requirements of your ever-growing business. In this case, each technical component must evolve and develop independently. This requires a time-consuming and labor-intensive effort. As the business grows, more types of data are generated. The demand for differentiated processing of different kinds of data increases day by day. This exacerbates data fragmentation.
A major problem in the development of information technologies is that the ever-increasingly diverse workload types result in large numbers of types of data. This requires more complex and expensive storage architectures to meet the requirements for differentiated data processing. The wide application of new information technologies such as 5G, Internet of Things (IoT), and Internet of Vehicles (IoV) further intensifies this problem. To solve this problem, Alibaba Cloud launched ApsaraDB for Lindorm (Lindorm) to meet the unified storage, query, and analysis needs of multi-model data. Compared with traditional solutions, Lindorm can significantly simplify the storage architecture, improve system stability, and reduce construction costs. The following figure shows the Lindorm system.
Lindorm innovatively uses a cloud native multi-model integration architecture in which storage is decoupled from computing. It meets the demands for resource decoupling and auto scaling in the era of cloud computing. Lindorm uses a cloud native storage engine named LindormStore as a unified storage base. Multiple dedicated engines are built based on the base, including the wide table engine, the time series engine, the search engine, and the file engine. Lindorm runs on the multi-model engine system and allows you to use SQL to interact with your Lindorm instances and run associated queries across models. Lindorm is also compatible with the open standards of the following open source software: Apache HBase, Apache Phoenix, Apache Cassandra, OpenTSDB, Apache Solr, and Hadoop Distributed File System (HDFS). This helps you seamlessly migrate existing business workloads to Alibaba Cloud. At the top layer of the architecture, Lindorm uses a unified data stream bus to transmit data and capture data changes between engines in real time. This allows Lindorm to provide multiple capabilities, such as data migration, real-time subscription, data transmission between Lindorm and data lakes, data synchronization from data warehouses to Lindorm, active geo-redundancy, and backup and recovery.
LindormStore is a distributed storage system provided for cloud storage infrastructure. It is compatible with HDFS protocols. LindormStore can run in local disk environments to meet the requirements of some Apsara Stack and exclusive customers. It provides a unified and environment-independent standard API for the multi-model engine system and external computing systems. The following figure shows the architecture.
LindormStore provides the following storage types: performance, standard, and capacity. You can use a combination of storage types based on the cold and hot data in real scenarios. LindormStore can work with the cold and hot data separation capability provided by the multi-model engine system so that you can allocate storage resources for cold and hot data. This helps reduce the storage cost of large amounts of data.
Wide table engine
Lindorm provides a wide table engine named LindormTable for distributed storage of large amounts of semi-structured and structured data. LindormTable is a NoSQL system and can be used to store metadata, orders, bills, user personas, social information, feeds, and logs. It is compatible with the open standards of multiple open source software, such as Apache HBase, Apache Phoenix, and Apache Cassandra. LindormTable is developed based on the idea of automatic sharding, multi-replica storage, and log-structured merge trees (LSMs). It provides query processing capabilities, such as global secondary indexes, multi-dimensional searches, dynamic columns, and Time to Live (TTL). LindormTable can store hundreds of trillions of rows per table, process tens of millions of concurrent requests, support instant responses within milliseconds, and implement cross-zone disaster recovery for strong consistency. This helps meet the business needs of storing and querying large-scale data online. The following figure shows the overall architecture of LindormTable.
LindormTable stores data persistently in LindormStore. The automatic sharding feature distributes the data in tables to multiple servers in the cluster. Each partition can have one primary replica and multiple secondary replicas. The primary and secondary replicas can be loaded in different zones to ensure high availability and strong consistency within the cluster. How data is synchronized between primary and secondary replicas and the read/write mode depend on the consistency level.
Strong consistency: Only the primary replica supports read and write operations. Data is synchronized to the secondary replicas in asynchronous mode. If the primary replica fails, a secondary replica is promoted to be the primary replica. Before the switchover, the system synchronizes the data from the failed primary replica and verifies that the secondary replica contains all the data. Data synchronization is coordinated by the primary node.
Eventual consistency: Both primary and secondary replicas support read and write operations. Data is synchronized between primary and secondary replicas to ensure the eventual consistency of data.
The multi-replica architecture of LindormTable is based on the PACELC theorem. You can specify a consistency level for each table to ensure different availability and performance for different tables. If you specify the eventual consistency level, the server triggers concurrent access to replicas for each read/write request when specific conditions are met. This significantly increases the request success rate and reduces the response latency. The concurrency control mechanism is built on an internal asynchronous framework. Compared with starting multiple threads, this mechanism consumes almost zero additional resources. The trigger conditions of concurrent access can be divided into two types. The first type is the timeout trigger. You can set GlitchTimeout for each request. Each request is first sent to one replica. If no response is returned from the replica within the specified period, the request is sent to all the remaining replicas. The response with the minimum latency is used. The second type is the blacklist policy. The server adds the replicas that do not respond or respond slowly to the blacklist based on mechanisms such as timeout, error throwing, and detection. This allows requests to bypass the nodes that have hardware limits or flaws in software designs. This maximizes service stability and availability. When the system stops responding due to external factors, such as a power outage, a node loses heartbeats 1 to 2 minutes after it becomes unavailable. The multi-replica architecture of LindormTable can reduce the latency to less than 10 seconds. This significantly improves service availability.
The LSM tree of LindormTable is a data structure used to separate hot data from cold data. It enables automatic separation of hot and cold data for tables of customers and maintains transparent queries. LindormTable works with the hot and cold storage management capabilities of LindormStore at the underlying layer to dramatically reduce the overall storage cost of large amounts of data.
The data model provided by LindormTable is a loose table structure that supports multiple data types. Compared with traditional relational models, LindormTable provides predefined column types and also allows you to dynamically add columns without initiating Data Definition Language (DDL) change requests. This helps adapt to the flexible and changeable big data scenarios. LindormTable also supports global secondary indexes and inverted indexes. The system automatically selects the most suitable indexes based on the query conditions to accelerate the execution of boolean queries. LindormTable is suitable for the queries of large amounts of data, such as user personas and bills.
Time series engine
Lindorm provides a time series engine named LindormTS for storing and processing large amounts of time series data. LindormTS is a distributed time series engine and is compatible with the open standards of OpenTSDB. It uses a combination of range partitioning and hash partitioning and the dedicated LSM architecture and file structure based on the characteristics and query methods of time series data. LindormTS provides low-cost storage, downsampling, aggregate computing, high availability, and disaster recovery for large amounts of time series data. It can store and process measurement data and device operational data in scenarios such as IoT and monitoring. The following figure shows the overall architecture of LindormTS.
TSCore is the core part responsible for data structuring in LindormTS. Its overall architecture is similar to the LSM structure. The data is written into Memchunk and then flushed to disks. However, time series data is written by time sequence by nature. Therefore, LindormTS uses a dedicated file structure TSFile to split data by time window. Data is physically and logically separated by time. This significantly reduces the I/O amplification in compaction and makes the implementation of TTL and separation of hot and cold data efficient.
TSCompute is a component responsible for the real-time computing of time series data. It is used to meet the monitoring requirements, such as downsampling, timeline aggregation, and time series forecasting. TSCompute uses Lindorm Streams to subscribe to data and is completely based on in-memory computing. Therefore, it is lightweight and efficient and is suitable for the built-in computing features of LindormTS. To meet some complex analysis requirements, you can connect LindormTS to systems such as Spark and Flink. This way, the system can support more scenarios and adapt to business changes.
Lindorm provides a search engine named LindormSearch for processing large amounts of data. LindormSearch is a distributed search engine and is compatible with the open standards of the open source Solr platform. It can be seamlessly used to store the indexes of the wide table and time series engines to speed up queries. Its overall architecture is the same as LindormTable and is built based on the idea of automatic sharding, multi-replica storage, and Lucene. LindormSearch offers capabilities such as full-text searches, aggregate computing, and complex multi-dimensional queries. It also provides horizontal scaling, write-once read-many, cross-zone disaster recovery, and TTL to meet the demand for efficient retrieval of large amounts of data. The following figure shows the architecture of LindormSearch.
LindormSearch stores data persistently in LindormStore. The automatic sharding feature distributes the data to multiple SearchSevers in the cluster. Each shard has multiple replicas and contains one primary node and multiple read-only nodes. This improves the efficiency of aggregate queries. These replicas share storage to eliminate data redundancy.
In the Lindorm system, LindormSearch can work as an independent model to provide a loose document view for semi-structured and unstructured data. It is suitable for log data analysis and full-text searches. It can also be used to store the indexes of the wide table and time series engines. The process is transparent to users. Some fields in the wide table and time series engines are automatically synchronized to the search engine over an internal data connection. The data model and read and write access remain unified for users. Users are not aware of the search engine. Cross-engine tasks such as data association, consistency, query aggregation, and TTL management are all processed by the system in a simple and transparent way. This gives a full play to multi-model integration.
Lindorm provides a file engine named LindormFile. LindormFile is a distributed file model service based on the lightweight encapsulation of LindormStore. It provides multiple capabilities, such as authentication and throttling protection, and supports connections over multiple networks. This allows external systems to directly access the underlying files of the multi-model engine system. This makes backup archiving, computing, and analysis more efficient. LindormFile also allows you to generate physical files in the underlying data format in offline computing systems. You can use LindormFile to import the files into LindormStore to minimize the impact on online services. In some storage and computing scenarios, you can store non-computed source files in LindormFile and the computing results in LindormTable. You can then integrate Lindorm with large-scale computing engines such as Spark and Data Lake Analytics (DLA) to deliver simple and efficient data lake analytics capabilities.