This topic describes the architecture of Lindorm, including the background information and overall structure.
Background information
Due to the rapid development of information technologies, more types of data are generated by business activities in various industries. The data includes structured business data, such as metadata, operational data, measurement data from devices or systems, and logs. The data also includes semi-structured business data, such as operational data, logs, images, and files. In a traditional solution, when you design an IT architecture, you must use different storage and analysis technologies to store, query, and analyze these types of data. The following figure shows an architecture in a traditional solution.
These technical solutions integrate fragmented technologies to meet the requirements of different scenarios, and multiple databases are required to separately process different types of data. Such solutions have the following drawbacks:
Multiple complex technical components are used.
The selection of technologies is complex.
A long process is required for data access and data synchronization.
These drawbacks impose high requirements on technical personnel and result in long business rollout cycles, high failure rates, and high maintenance costs, which hinders the development of information systems. Technological fragmentation also leads to a fragmented technical architecture. This is not conducive to the evolution of the architecture and eventually prevents it from meeting business development requirements. For example, if you must configure cross-zone high availability, enable global synchronization, or reduce storage costs to meet the requirements of your growing business, each technical component must evolve independently. This requires a long period of time and a large number of personnel. As the business grows, more types of data are generated, and the demand for differentiated processing of different types of data increases day by day. This causes further data fragmentation.
A major issue in the development of information technologies is that various types of data are generated by increasingly diverse business demands. This requires more complex and expensive storage architectures to meet the requirements for differentiated data processing. The adoption of new information technologies such as 5G, IoT, and Internet of Vehicles (IoV) exacerbates this issue. To resolve it, Alibaba Cloud launched Lindorm to meet the unified storage, query, and analysis requirements of multi-model data. Compared with traditional solutions, Lindorm simplifies the storage architecture, improves system stability, and reduces construction costs. The following figure shows the Lindorm system.
Overall architecture
Lindorm innovatively uses a cloud native, multi-model integrated architecture in which storage resources are decoupled from computing resources. The architecture meets the requirements for resource decoupling and auto scaling in the era of cloud computing. Lindorm uses a cloud native distributed file system named LindormDFS as a unified storage base. Multiple dedicated engines are built on top of LindormDFS, including the wide table engine, time series engine, search engine, and streaming engine. On top of the multi-model engine system, Lindorm allows you to use SQL to access your Lindorm instances and perform associated queries across different models. Lindorm is also compatible with the open source standard interfaces of Apache HBase, Apache Cassandra, OpenTSDB, InfluxDB, Apache Kafka, and Hadoop Distributed File System (HDFS). This helps you seamlessly migrate existing business workloads to Alibaba Cloud. At the top layer of the architecture, Lindorm provides Lindorm Tunnel Service (LTS) to transmit data and capture data changes across engines in real time. This way, Lindorm provides capabilities such as data migration, real-time data change tracking, data transmission between Lindorm and data lakes, data synchronization from data warehouses to Lindorm, active geo-redundancy, and backup and recovery.
LindormDFS
LindormDFS is a distributed storage system designed for cloud storage infrastructure. LindormDFS is compatible with HDFS protocols and can also run on local disks to meet the requirements of key customers. LindormDFS provides a unified, environment-independent standard interface for all Lindorm engines and external computing systems. The following figure shows the architecture of LindormDFS.
LindormDFS provides the following storage types: performance, standard, and capacity. You can use a combination of storage types based on the characteristics of cold and hot data in actual business scenarios. LindormDFS can be used together with the cold and hot data separation capability provided by the multi-model engine system. This way, you can allocate storage resources for cold and hot data. This helps you reduce the storage cost of large amounts of data.
In various scenarios such as data computing and analysis, data backup and archiving, and data import, you can use LindormDFS to access the underlying data files of different Lindorm engines from external systems. This way, the efficiency of data read and write operations is significantly improved. For example, you can generate underlying files in the required data format in an offline computing system and directly import the files into LindormDFS to reduce the business downtime caused by data import.
LindormTable
Lindorm provides a wide table engine named LindormTable for distributed storage of large amounts of semi-structured and structured data. LindormTable is a NoSQL database system that can be used to store metadata, orders, bills, user personas, social media information, feeds, and logs. LindormTable is compatible with the standard interfaces of multiple open source systems, such as Apache HBase and Apache Cassandra, and is developed based on automatic sharding, multi-replica storage, and log-structured merge (LSM) trees. LindormTable provides query processing capabilities such as global secondary indexes, multi-dimensional searches, dynamic columns, and time to live (TTL). LindormTable can store hundreds of trillions of rows per table, process tens of millions of concurrent requests, respond within milliseconds, and implement cross-zone disaster recovery with strong consistency. This helps your business meet the requirements for storing and querying large-scale data online. The following figure shows the overall architecture of LindormTable.
LindormTable persistently stores data in LindormDFS. The automatic sharding feature distributes the data in a table across multiple servers in the cluster. Each partition can have one primary replica and multiple secondary replicas, and the replicas can be placed in different zones to ensure high availability and strong consistency within the cluster. How data is synchronized between the primary and secondary replicas and how the replicas serve reads and writes vary based on the consistency level.
Strong consistency: Only the primary replica serves read and write operations. Data is synchronized to the secondary replicas in asynchronous mode. If the primary replica fails, a secondary replica is promoted to primary. Before the failover is performed, the system synchronizes the remaining data from the failed primary replica and verifies that the secondary replica contains all the data. Data synchronization is coordinated by the primary node.
Eventual consistency: Both primary and secondary replicas support read and write operations. Data is synchronized between primary and secondary replicas to ensure the eventual consistency of data.
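The two consistency levels above can be sketched as a client-side routing rule. The following Python sketch is illustrative only; the type and function names are assumptions and are not part of any Lindorm API.

```python
from dataclasses import dataclass, field

@dataclass
class Partition:
    primary: str                              # address of the primary replica
    secondaries: list = field(default_factory=list)

def replicas_for(partition: Partition, consistency: str) -> list:
    """Return the replicas eligible to serve a read or write operation."""
    if consistency == "strong":
        # Strong consistency: only the primary serves reads and writes;
        # data is shipped to secondaries asynchronously in the background.
        return [partition.primary]
    # Eventual consistency: any replica may serve reads and writes, and
    # the replicas converge through background synchronization.
    return [partition.primary] + partition.secondaries

p = Partition(primary="zone-a/node1", secondaries=["zone-b/node2", "zone-c/node3"])
print(replicas_for(p, "strong"))    # only the primary is eligible
print(replicas_for(p, "eventual"))  # every replica is eligible
```

Under eventual consistency, spreading reads and writes across all replicas trades strict ordering for higher availability and lower tail latency.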
The multi-replica architecture of LindormTable is designed based on the PACELC theorem. You can specify a consistency level for each table to achieve a different balance between availability and performance for each table. If you specify the eventual consistency level, the server concurrently accesses multiple replicas for a read or write request when specific conditions are met. This significantly increases the request success rate and reduces latency glitches. The concurrency control mechanism is built on an internal asynchronous framework. Compared with starting multiple threads, this mechanism consumes few additional resources. The conditions that trigger concurrent access can be classified into two types.
The first type is the timeout trigger. You can configure a GlitchTimeout value for each request. Each request is first sent to one replica. If no response is returned within the specified period, the request is sent to all remaining replicas, and the first response to arrive is used.
The second type is the blacklist policy. The server adds the replicas that do not respond or respond slowly to a blacklist based on mechanisms such as timeouts, thrown errors, and probing. This way, requests can bypass nodes that are constrained by hardware limits or software flaws. This helps maximize service stability and availability. If a node abruptly stops responding due to an external factor such as a power outage, its heartbeats are lost only one to two minutes after the node becomes unavailable. The multi-replica architecture of LindormTable reduces the latency caused by such failures and significantly improves service availability.
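The two trigger conditions above can be sketched with Python's asyncio: a glitch timeout that hedges a slow request to the remaining replicas, plus a blacklist that is skipped entirely. The names GLITCH_TIMEOUT, BLACKLIST, and query_replica are assumptions for illustration, not Lindorm internals.

```python
import asyncio

GLITCH_TIMEOUT = 0.05   # seconds to wait before hedging the request
BLACKLIST = {"node3"}   # replicas recently seen timing out or erroring

async def query_replica(node: str, delay: float) -> str:
    await asyncio.sleep(delay)      # stand-in for a network round trip
    return f"response from {node}"

async def hedged_read(replicas: dict) -> str:
    # Blacklisted replicas are never contacted.
    candidates = [n for n in replicas if n not in BLACKLIST]
    first, *rest = candidates
    primary_task = asyncio.create_task(query_replica(first, replicas[first]))
    try:
        # Fast path: the first replica answers within the glitch timeout.
        return await asyncio.wait_for(asyncio.shield(primary_task), GLITCH_TIMEOUT)
    except asyncio.TimeoutError:
        # Slow path: fan out to the remaining replicas, take the first answer.
        tasks = [primary_task] + [
            asyncio.create_task(query_replica(n, replicas[n])) for n in rest
        ]
        done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
        for t in pending:
            t.cancel()
        return done.pop().result()

# node1 is glitching (slow), so the hedged request is answered by node2.
result = asyncio.run(hedged_read({"node1": 0.5, "node2": 0.01, "node3": 0.01}))
print(result)
```

Because the hedge only fires after the timeout, the common case costs a single replica access, while tail-latency requests are capped near the fastest healthy replica.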
The LSM tree of LindormTable is a data structure that naturally separates hot data from cold data. LindormTable uses this structure to automatically separate the hot data and cold data in a table while keeping queries transparent. Together with the hot and cold storage management capabilities of LindormDFS at the underlying layer, this dramatically reduces the overall storage costs of large amounts of data.
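The separation described above can be sketched as a placement rule that demotes rows past an age boundary to low-cost storage. The 30-day boundary, the fixed clock, and the storage labels are illustrative assumptions, not Lindorm configuration values.

```python
COLD_BOUNDARY = 30 * 24 * 3600   # rows older than 30 days count as "cold"
NOW = 1_700_000_000              # fixed clock so the example is deterministic

def storage_class(row_timestamp: int, now: int = NOW) -> str:
    # Recent rows stay on high-performance media; old rows move to cheap storage.
    return "cold" if now - row_timestamp > COLD_BOUNDARY else "hot"

rows = {
    "order:1001": NOW - 3600,              # one hour old   -> hot
    "order:0007": NOW - 90 * 24 * 3600,    # ninety days old -> cold
}
placement = {key: storage_class(ts) for key, ts in rows.items()}
print(placement)   # {'order:1001': 'hot', 'order:0007': 'cold'}
```

In an LSM design, this demotion happens during compaction, so queries see one logical table regardless of where each row physically resides.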
The data model provided by LindormTable is a loose table structure that supports multiple data types. Compared with traditional relational models, LindormTable provides predefined column types and also allows you to dynamically add columns without the need to initiate DDL requests. This helps your business meet the requirements for data flexibility and data modification in big data scenarios. LindormTable also supports global secondary indexes and inverted indexes. The system automatically selects the most suitable index based on the specified query conditions to accelerate the execution of Boolean queries. LindormTable is suitable for scenarios in which large amounts of data need to be queried, such as user persona and billing scenarios.
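The loose, dynamic-column model can be illustrated with a sparse map per row: different rows in the same table carry different column sets, and a new column appears simply by writing it. The row layout below is a conceptual sketch, not Lindorm's on-disk format.

```python
table = {}   # row key -> {column qualifier: value}

def put(row_key: str, **columns):
    # Writing a new column qualifier requires no schema change (no DDL).
    table.setdefault(row_key, {}).update(columns)

# Two rows in the same table can carry different column sets.
put("user:alice", name="Alice", city="Hangzhou")
put("user:bob", name="Bob", vip_level=3)          # a column alice does not have
put("user:alice", last_login="2024-01-01")        # added later, no DDL needed

print(table["user:alice"])
print(table["user:bob"])
```

This sparsity is what makes wide tables a good fit for user-persona and billing data, where attributes vary widely from one entity to the next.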
LindormTSDB
Lindorm provides a time series engine named LindormTSDB to store and process large amounts of time series data. LindormTSDB is a distributed time series engine that is compatible with the standard interfaces of open source OpenTSDB. Based on the characteristics and query patterns of time series data, LindormTSDB uses a combination of range partitioning and hash partitioning together with a dedicated LSM architecture and file structure. LindormTSDB provides low-cost storage, downsampling, aggregate computing, high availability, and disaster recovery for large amounts of time series data. LindormTSDB can store and process measurement data and device operational data in scenarios such as IoT and monitoring. The following figure shows the overall architecture of LindormTSDB.
TSCore is the core component that is used to store data in LindormTSDB. The overall architecture of TSCore is similar to the LSM structure: data is first written to a Memchunk and then flushed to disk. Because time series data is written roughly in time order, LindormTSDB uses a dedicated file structure named TSFile to split data by time window. Data is physically and logically separated by time. This significantly reduces the I/O amplification caused by compactions and makes TTL and hot/cold data separation efficient to implement.
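The time-window split can be sketched as follows: each data point is routed to the file for its window, so an expired window can be dropped whole (TTL) or demoted to cold storage without rewriting other data. The one-hour window size is an assumption for illustration.

```python
from collections import defaultdict

WINDOW = 3600   # seconds per time window (illustrative)

def window_of(timestamp: int) -> int:
    # Align the timestamp down to the start of its window.
    return timestamp - timestamp % WINDOW

files = defaultdict(list)   # window start -> data points in that "file"
for ts, value in [(1000, 21.5), (1500, 21.7), (4000, 22.0)]:
    files[window_of(ts)].append((ts, value))

print(sorted(files))   # two windows were created
print(files[0])        # the first window holds the first two points
```

Because compaction only ever merges data within a window, old windows are never rewritten, which is where the reduction in I/O amplification comes from.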
TSCompute is the component that performs real-time computing on time series data, such as downsampling and timeline aggregation, to meet common monitoring requirements. TSCompute subscribes to data through the Lindorm streaming engine and is completely based on in-memory computing. Therefore, TSCompute is lightweight and efficient and is suitable for the built-in computing features of LindormTSDB. For more complex analysis requirements, you can connect LindormTSDB to systems such as Spark and Flink. This way, the system can support more scenarios and adapt to business changes.
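Downsampling of the kind TSCompute performs can be sketched as grouping raw points into fixed intervals and reducing each group with an aggregate function. The 60-second interval and the mean aggregator are illustrative choices, not LindormTSDB defaults.

```python
from collections import defaultdict
from statistics import mean

def downsample(points, interval=60, agg=mean):
    """Group (timestamp, value) points by interval and aggregate each group."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % interval].append(value)
    return {start: agg(values) for start, values in sorted(buckets.items())}

raw = [(0, 10.0), (30, 20.0), (60, 40.0), (90, 60.0)]
print(downsample(raw))   # {0: 15.0, 60: 50.0}
```

Since each bucket only needs the points of its own interval, this computation fits entirely in memory, which is why the built-in engine can stay lightweight.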
LindormSearch
Lindorm provides a search engine named LindormSearch to retrieve large amounts of data. LindormSearch is a distributed search engine that is compatible with the standard interfaces of Apache Solr. LindormSearch can also be used to store the indexes of LindormTable and LindormTSDB to accelerate queries. The overall architecture of LindormSearch is similar to that of LindormTable and is built based on automatic sharding, multi-replica storage, and Lucene. LindormSearch provides capabilities such as full-text search, aggregate computing, and complex multi-dimensional queries. It also provides horizontal scaling, write-once read-many, cross-zone disaster recovery, and TTL to meet the requirements for efficient retrieval of large amounts of data. The following figure shows the architecture of LindormSearch.
LindormSearch persistently stores data in LindormDFS. The automatic sharding feature distributes the data to multiple SearchServers in the cluster. Each shard has one primary replica and multiple read-only replicas, which improves the efficiency of aggregate queries. The replicas share the underlying storage to eliminate data redundancy.
In the Lindorm system, LindormSearch can be used as an independent model that provides a loose document view for semi-structured and unstructured data. LindormSearch is suitable for log data analysis and full-text search. LindormSearch can also be used to store the indexes of LindormTable and LindormTSDB. This process is transparent to users: some fields in LindormTable and LindormTSDB are automatically synchronized to LindormSearch over an internal data link, while the data model and read and write access remain unified. Users do not need to be aware of the search engine. Cross-engine concerns such as data association, consistency, query aggregation, and TTL management are all handled by the system. This helps users benefit from the integrated multi-model architecture in a simple and efficient manner.
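The core structure behind full-text search engines such as the Lucene-based LindormSearch is the inverted index: a map from each term to the documents that contain it. The toy sketch below shows only that lookup; real engines add analyzers, relevance scoring, and postings compression.

```python
from collections import defaultdict

index = defaultdict(set)   # term -> ids of documents containing the term

def add_document(doc_id: str, text: str):
    # A naive analyzer: lowercase and split on whitespace.
    for term in text.lower().split():
        index[term].add(doc_id)

def search(term: str) -> set:
    return index.get(term.lower(), set())

add_document("log:1", "disk error on node1")
add_document("log:2", "node1 recovered")
print(sorted(search("node1")))   # both log lines mention node1
print(sorted(search("error")))   # only the first one does
```

Storing such an index for selected LindormTable or LindormTSDB fields is what lets multi-dimensional and full-text queries avoid full table scans.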
Lindorm Streaming Engine
The Lindorm streaming engine is developed to process streaming data. This engine provides streaming data storage and lightweight computing capabilities. The Lindorm streaming engine is compatible with the Kafka API and Flink SQL. You can use this engine to process and use streaming data in your business.
The Lindorm streaming engine integrates the streaming storage component and the streaming computing component. The two components are deployed as a whole to process streaming data with high performance in real time. The streaming storage component subscribes to the log data in message services, writes the data to the Lindorm streaming engine, and persistently stores the data in LindormDFS. The streaming storage component is compatible with the APIs of Apache Kafka and can provide high throughput and scalability with low costs. The streaming computing component processes log data from message services in real time. The processing result can be synchronized to LindormTable or LindormTSDB. The streaming computing component is compatible with Flink SQL.
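The storage-then-compute flow described above can be sketched in a few lines: records are appended to a durable log, and a lightweight computation consumes the log and aggregates results before they are written to a downstream table. The list-backed log and the record shape are stand-ins for illustration, not the engine's actual interfaces.

```python
from collections import Counter

stream_log = []   # stand-in for the durable, Kafka-compatible stream storage

def append(record: dict):
    stream_log.append(record)   # storage side: append-only log of records

def count_by_level(log) -> Counter:
    # Compute side: a simple aggregation over the consumed records.
    return Counter(record["level"] for record in log)

for level in ["INFO", "ERROR", "INFO", "WARN", "INFO"]:
    append({"level": level, "msg": "..."})

result = count_by_level(stream_log)
print(result)   # Counter({'INFO': 3, 'ERROR': 1, 'WARN': 1})
```

In the real engine the aggregation would be expressed in Flink SQL and its result synchronized to LindormTable or LindormTSDB rather than held in memory.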
LDPS
Lindorm provides a compute engine named Lindorm Distributed Processing System (LDPS). LDPS is a distributed computing service based on the cloud native architecture. The compute nodes of LDPS run in Alibaba Cloud Serverless Kubernetes (ASK) clusters. LDPS supports the computing models and programming interfaces of Spark Community Edition. LDPS also integrates with LindormDFS and fully uses the underlying storage features and indexing capabilities to complete distributed jobs in an efficient manner. LDPS provides high-performance computing services in scenarios such as data production, interactive analytics, and machine learning. LDPS also provides job management interfaces. You can monitor and maintain Spark jobs in the Spark web UI.