Lindorm: Service architecture

Last Updated: Aug 28, 2022

This topic describes the architecture of Lindorm, including the background information and overall structure.

Background information

With the rapid development of information technologies, businesses in every industry generate a growing number of data types. These include structured business data, such as metadata, operational data, equipment or system measurement data, and logs, as well as semi-structured business operational data, logs, images, and files. In a traditional solution, you must combine different storage and analysis technologies to store, query, and analyze these data types when you design an IT architecture.

Such solutions are a typical case of technological fragmentation: multiple databases are required to separately process different types of data. They have the following drawbacks:

  • Multiple complex technical components are used.

  • The selection of technologies is complex.

  • Links for data access and data synchronization are long.

These drawbacks place high demands on technical personnel and result in long business rollout cycles, high failure rates, and high maintenance costs, all of which hinder the development of information systems. Technological fragmentation also leads to a fragmented technical architecture. This hampers the long-term evolution of the architecture and eventually prevents it from meeting business development requirements. For example, as your business grows, you may need to configure cross-zone high availability, enable global synchronization, or reduce storage costs. In this case, each technical component must be evolved and developed independently, which requires a long period of time and a large number of personnel. As the business grows, more types of data are generated and the demand for differentiated processing of each type increases, which causes even more fragmentation.

A major issue in the development of information technologies is that growing and increasingly diverse business demands generate more and more types of data, which in turn requires more complex and expensive storage architectures for differentiated data processing. The adoption of new information technologies such as 5G, IoT, and Internet of Vehicles (IoV) also contributes to this issue. To resolve this issue, Alibaba Cloud launched Lindorm to meet the unified storage, query, and analysis requirements of multi-model data. Compared with traditional solutions, Lindorm can simplify the storage architecture, improve system stability, and reduce construction costs.

Architecture

Lindorm uses a cloud native multi-model integration architecture in which storage is decoupled from computing. The architecture can meet the requirements for resource decoupling and auto scaling in the era of cloud computing. Lindorm uses a cloud native storage engine named LindormStore as a unified storage base. Multiple dedicated engines are built on this base, including the wide table engine, time series engine, search engine, and file engine. On top of the multi-model engine system, Lindorm allows you to use SQL to interact with your Lindorm instances and run associated queries across different models. Lindorm is also compatible with the standard interfaces of open source projects such as Apache HBase, Apache Phoenix, Apache Cassandra, OpenTSDB, Apache Solr, and Hadoop Distributed File System (HDFS). This can help you seamlessly migrate existing business workloads to Alibaba Cloud. At the top layer of the architecture, Lindorm uses a unified data stream bus to transmit data and capture data changes across engines in real time. This way, Lindorm can provide multiple capabilities, such as data migration, real-time subscription, data transmission between Lindorm and data lakes, data synchronization from data warehouses to Lindorm, active geo-redundancy, and backup and recovery.

Storage engine

LindormStore is a distributed storage system built for cloud storage infrastructure and is compatible with HDFS protocols. LindormStore can also run in local disk environments to meet the requirements of some Apsara Stack users and key customers. LindormStore provides a unified, environment-independent standard interface for the multi-model engine system and external computing systems.

LindormStore provides the following storage types: performance, standard, and capacity. You can use a combination of storage types based on the characteristics of cold and hot data in actual business scenarios. LindormStore can be used together with the cold and hot data separation capability provided by the multi-model engine system. This way, you can allocate storage resources for cold and hot data. This helps reduce the storage cost of large amounts of data.
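
As a rough illustration of combining storage types by data temperature, the following minimal sketch maps the age of a record to one of the three storage types named above. The function name `choose_storage_type` and the age thresholds are hypothetical and are not part of Lindorm; real policies depend on the workload.

```python
from datetime import datetime, timedelta

# Hypothetical age thresholds, chosen only for illustration.
HOT_WINDOW = timedelta(days=7)
WARM_WINDOW = timedelta(days=90)


def choose_storage_type(last_access: datetime, now: datetime) -> str:
    """Map data age to one of the storage types: performance, standard, capacity."""
    age = now - last_access
    if age <= HOT_WINDOW:
        return "performance"   # hot data: recently accessed
    if age <= WARM_WINDOW:
        return "standard"      # warm data
    return "capacity"          # cold data: rarely accessed


now = datetime(2022, 8, 28)
print(choose_storage_type(datetime(2022, 8, 27), now))  # performance
print(choose_storage_type(datetime(2022, 7, 1), now))   # standard
print(choose_storage_type(datetime(2021, 1, 1), now))   # capacity
```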

Wide table engine

Lindorm provides a wide table engine named LindormTable for distributed storage of large amounts of semi-structured and structured data. LindormTable is a NoSQL database system that can be used to store metadata, orders, bills, user personas, social media information, feeds, and logs. It is compatible with the standard interfaces of multiple open source projects, such as Apache HBase, Apache Phoenix, and Apache Cassandra. LindormTable is built on automatic sharding, multi-replica storage, and log-structured merge (LSM) trees. It provides query processing capabilities, such as global secondary indexes, multi-dimensional searches, dynamic columns, and time to live (TTL). LindormTable can store hundreds of trillions of rows per table, process tens of millions of concurrent requests, respond within a few milliseconds, and implement cross-zone disaster recovery with strong consistency. This helps meet the business requirements for storing and querying large-scale data online.

LindormTable persistently stores data in LindormStore. The automatic sharding feature distributes the data in tables to multiple servers in the cluster. Each partition can have one primary replica and multiple secondary replicas. The primary and secondary replicas can be loaded in different zones to ensure high availability and strong consistency in the cluster. How data is synchronized between primary and secondary replicas and the read/write mode vary based on the consistency level.

  • Strong consistency: Only the primary replica supports read and write operations. Data is synchronized to the secondary replicas in asynchronous mode. If the primary replica fails, a secondary replica is promoted to be the primary replica. Before the failover is performed, the system synchronizes the data from the failed primary replica and verifies that the secondary replica contains all the data. Data synchronization is coordinated by the primary node.

  • Eventual consistency: Both primary and secondary replicas support read and write operations. Data is synchronized between primary and secondary replicas to ensure the eventual consistency of data.
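
The two consistency levels above differ mainly in which replicas may serve a request. The following minimal sketch models that routing rule. The `Partition` class, the `Consistency` enum, and the replica names are hypothetical, and the failover method omits the data verification step that the real system performs before promotion.

```python
from dataclasses import dataclass, field
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"
    EVENTUAL = "eventual"


@dataclass
class Partition:
    primary: str
    secondaries: list[str] = field(default_factory=list)

    def targets(self, level: Consistency) -> list[str]:
        """Return the replicas that may serve reads and writes at this level."""
        if level is Consistency.STRONG:
            return [self.primary]                 # only the primary replica
        return [self.primary, *self.secondaries]  # any replica

    def fail_over(self) -> None:
        """Promote the first secondary replica (simplified: no catch-up check)."""
        self.primary = self.secondaries.pop(0)


p = Partition("zone-a/rs1", ["zone-b/rs2", "zone-c/rs3"])
print(p.targets(Consistency.STRONG))    # ['zone-a/rs1']
print(p.targets(Consistency.EVENTUAL))  # ['zone-a/rs1', 'zone-b/rs2', 'zone-c/rs3']
```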

The multi-replica architecture of LindormTable is based on the PACELC theorem. You can specify a consistency level for each table to trade off availability and performance on a per-table basis. If you specify the eventual consistency level, the server triggers concurrent access to replicas for each read/write request when specific conditions are met. This significantly increases the request success rate and reduces response-time glitches (latency spikes). The concurrency control mechanism is built on an internal asynchronous framework. Compared with starting multiple threads, this mechanism consumes almost no additional resources. The trigger conditions for concurrent access can be classified into two types.

  • The first type is the timeout trigger. You can configure GlitchTimeout for each request. Each request is first sent to one replica. If no response is returned from the replica within the specified period, the request is sent to all remaining replicas, and the response that arrives first is used.

  • The second type is the blacklist policy. The server adds replicas that do not respond or that respond slowly to a blacklist based on mechanisms such as timeouts, thrown errors, and detection. This way, requests can bypass nodes that are affected by hardware limits or software design flaws, which helps maximize service stability and availability. When a node stops responding due to external factors, such as a power outage, its heartbeats are typically lost only 1 to 2 minutes after the node becomes unavailable. The multi-replica architecture of LindormTable can reduce this window to less than 10 seconds, which significantly improves service availability.
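
The timeout trigger resembles what is often called a hedged request. The following minimal sketch shows the idea with `asyncio`. The function `hedged_request`, its parameters, the `fake_call` coroutine, and the replica latencies are all hypothetical. The blacklist is passed in as a plain set, whereas in Lindorm the server maintains it. This is not Lindorm's internal asynchronous framework.

```python
import asyncio


async def hedged_request(replicas, call, glitch_timeout, blacklist=frozenset()):
    """Ask one replica first; if it is slow or fails, ask all remaining replicas."""
    candidates = [r for r in replicas if r not in blacklist]  # skip blacklisted replicas
    if not candidates:
        raise RuntimeError("no replica available")
    pending = {asyncio.ensure_future(call(candidates[0]))}
    remaining = candidates[1:]
    timeout = glitch_timeout
    last_error = None
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()  # first successful response wins
                last_error = task.exception()
            # Timeout or error on the first replica: fan out to the rest, once.
            pending |= {asyncio.ensure_future(call(r)) for r in remaining}
            remaining = []
            timeout = None
        raise RuntimeError("all replicas failed") from last_error
    finally:
        for task in pending:
            task.cancel()  # drop requests that are no longer needed


async def fake_call(replica):
    # Hypothetical per-replica latencies in seconds.
    delay = {"r1": 0.5, "r2": 0.05, "r3": 0.2}[replica]
    await asyncio.sleep(delay)
    return f"value from {replica}"


result = asyncio.run(hedged_request(["r1", "r2", "r3"], fake_call, glitch_timeout=0.1))
print(result)  # value from r2
```

In this run, `r1` does not answer within the 0.1-second timeout, so the request is also sent to `r2` and `r3`, and the faster of the two supplies the response.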

LindormTable uses its LSM tree structure to automatically separate hot data from cold data in tables while keeping queries transparent. Combined with the hot and cold storage management capabilities of the underlying LindormStore, this can dramatically reduce the overall storage costs of large amounts of data.

The data model provided by LindormTable is a loose table structure that supports multiple data types. Unlike traditional relational models, LindormTable provides predefined column types and also allows you to dynamically add columns without the need to initiate DDL requests. This helps meet the requirements for flexible and frequently changing data in big data scenarios. LindormTable also supports global secondary indexes and inverted indexes. The system automatically selects the most suitable indexes based on the specified query conditions to accelerate the execution of boolean queries. LindormTable is suitable for scenarios in which large amounts of data need to be queried, such as user persona and billing scenarios.
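
To make the loose table structure more concrete, the following minimal in-memory sketch checks the types of predefined columns and accepts any other column as a dynamic column without a schema change. The `LooseTable` class and the column names are hypothetical and do not reflect the LindormTable API.

```python
class LooseTable:
    """Rows keyed by primary key; predefined columns are typed, others are dynamic."""

    def __init__(self, predefined: dict[str, type]):
        self.predefined = predefined
        self.rows: dict[str, dict[str, object]] = {}

    def put(self, key: str, **columns: object) -> None:
        # Validate only the columns that were declared up front.
        for name, value in columns.items():
            expected = self.predefined.get(name)
            if expected is not None and not isinstance(value, expected):
                raise TypeError(f"column {name!r} expects {expected.__name__}")
        # Undeclared (dynamic) columns are stored as-is: no DDL step is needed.
        self.rows.setdefault(key, {}).update(columns)


table = LooseTable({"user_id": str, "amount": float})
table.put("order#1", user_id="u42", amount=19.9)
table.put("order#1", coupon="SUMMER")  # dynamic column, not declared up front
print(table.rows["order#1"])
# {'user_id': 'u42', 'amount': 19.9, 'coupon': 'SUMMER'}
```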

Time series engine

Lindorm provides a time series engine named LindormTS to store and process large amounts of time series data. LindormTS is a distributed time series engine that is compatible with the standard interfaces of open source OpenTSDB. Based on the characteristics and query patterns of time series data, LindormTS combines range partitioning with hash partitioning and uses a dedicated LSM architecture and file structure. LindormTS provides low-cost storage, downsampling, aggregate computing, high availability, and disaster recovery for large amounts of time series data. LindormTS can store and process measurement data and device operational data in scenarios such as IoT and monitoring.
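
One common way to combine range partitioning and hash partitioning for time series data is to range-partition by time and hash-partition by series key. The sketch below assumes that layout, which this topic does not confirm for LindormTS. The function name `partition_for`, the bucket count, and the range width are hypothetical.

```python
import zlib

NUM_HASH_BUCKETS = 4        # hypothetical number of hash buckets
RANGE_SECONDS = 24 * 3600   # hypothetical width of one time range (one day)


def partition_for(series_key: str, timestamp: int) -> tuple[int, int]:
    """Return (time_range_index, hash_bucket) for one data point."""
    time_range = timestamp // RANGE_SECONDS
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    bucket = zlib.crc32(series_key.encode("utf-8")) % NUM_HASH_BUCKETS
    return time_range, bucket


first = partition_for("cpu.usage{host=web-1}", 1_000)
later = partition_for("cpu.usage{host=web-1}", 90_000)
print(first[0], later[0])    # 0 1  (different time ranges)
print(first[1] == later[1])  # True (same series, same hash bucket)
```

With this layout, the points of one series stay in one hash bucket, while a query over a time interval only needs to touch the matching time ranges.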

TSCore is the core component that organizes data in LindormTS. Its overall architecture is similar to an LSM structure: data is written into Memchunk and then flushed to disks. Because time series data is written in chronological order, LindormTS uses a dedicated file structure named TSFile to split data by time window, so that data is separated by time both physically and logically. This significantly reduces I/O amplification during compaction and makes TTL and the separation of hot and cold data efficient to implement.
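
The benefit of splitting data by time window can be sketched as follows: when every window is a separate unit, expiring old data means dropping whole windows instead of rewriting files. The `WindowedStore` class and the window width are hypothetical. This is an in-memory illustration of the idea, not the TSFile format.

```python
from collections import defaultdict

WINDOW_SECONDS = 3600  # hypothetical window width (one hour)


class WindowedStore:
    def __init__(self):
        # window start time -> list of (timestamp, value) points
        self.windows = defaultdict(list)

    def write(self, timestamp, value):
        start = timestamp - timestamp % WINDOW_SECONDS
        self.windows[start].append((timestamp, value))

    def expire(self, now, ttl_seconds):
        """Drop every window that lies entirely before the TTL cutoff."""
        cutoff = now - ttl_seconds
        expired = [s for s in self.windows if s + WINDOW_SECONDS <= cutoff]
        for start in expired:
            del self.windows[start]  # whole window removed, nothing rewritten
        return expired


store = WindowedStore()
for ts in (10, 3700, 7300):
    store.write(ts, 1.0)
print(store.expire(now=10800, ttl_seconds=3600))  # [0, 3600]
print(sorted(store.windows))                      # [7200]
```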

TSCompute is a component that is used for real-time computing of time series data. TSCompute is used to meet monitoring requirements, such as downsampling, timeline aggregation, and time series forecasting. TSCompute uses Lindorm Streams to subscribe to data and is completely based on in-memory computing. Therefore, TSCompute is lightweight and efficient and is suitable for the built-in computing features of LindormTS. To meet some complex analysis requirements, you can connect LindormTS to systems such as Spark and Flink. This way, the system can support more scenarios and adapt to business changes.
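
Downsampling, one of the computations mentioned above, reduces a series to one aggregated value per fixed interval. The following minimal in-memory sketch shows the operation itself. The `downsample` function is hypothetical and does not use Lindorm Streams or any TSCompute interface.

```python
from collections import defaultdict
from statistics import mean


def downsample(points, interval, aggregate=mean):
    """Group (timestamp, value) pairs into fixed intervals and aggregate each group."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each point to the start of its interval.
        buckets[timestamp - timestamp % interval].append(value)
    return [(start, aggregate(values)) for start, values in sorted(buckets.items())]


points = [(0, 1.0), (20, 3.0), (65, 10.0), (119, 20.0)]
print(downsample(points, 60))       # [(0, 2.0), (60, 15.0)]
print(downsample(points, 60, max))  # [(0, 3.0), (60, 20.0)]
```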

Search engine

Lindorm provides a search engine named LindormSearch to process large amounts of data. LindormSearch is a distributed search engine and is compatible with standard interfaces of Apache Solr. LindormSearch can be used to store the indexes of the wide table engine and time series engine to accelerate queries. The overall architecture of LindormSearch is the same as the overall architecture of LindormTable, and is built based on concepts related to automatic sharding, multi-replica storage, and Lucene. LindormSearch provides capabilities such as full-text searches, aggregate computing, and complex multi-dimensional queries. It also provides horizontal scaling, write-once read-many, cross-zone disaster recovery, and TTL to meet the requirements for efficient retrieval of large amounts of data.

LindormSearch persistently stores data in LindormStore. The automatic sharding feature distributes the data to multiple SearchServers in the cluster. Each shard has multiple replicas and contains one primary node and multiple read-only nodes. This improves the efficiency of aggregate queries. These replicas share storage to eliminate data redundancy.

In the Lindorm system, LindormSearch can be used as an independent model to provide a loose document view for semi-structured and unstructured data. LindormSearch is suitable for log data analysis and full-text searches. LindormSearch can also be used to store the indexes of the wide table engine and time series engine. The process is transparent to users. Some fields in the wide table engine and time series engine are automatically synchronized to the search engine over an internal data link. The data model and read and write access requests remain unified for users. Users are not aware of the search engine. Cross-engine tasks such as data association, consistency, query aggregation, and TTL management are all processed by the system in a simple and transparent manner. This helps users use the integrated multi-model architecture in an efficient manner.

File engine

Lindorm provides a file engine named LindormFile. LindormFile is a distributed file model service based on the lightweight encapsulation of LindormStore. LindormFile provides multiple capabilities, such as authentication and throttling protection, and supports connections over multiple networks. This way, external systems can directly access the underlying files of the multi-model engine system. This makes backup archiving, computing, and analysis more efficient. LindormFile also allows you to generate physical files in the underlying data format in offline computing systems. You can use LindormFile to import the files into LindormStore to minimize the impact on online services. In some storage and computing scenarios, you can store source files that are not computed in LindormFile and the computing results in LindormTable. Then, you can integrate Lindorm with large-scale compute engines such as Spark and Data Lake Analytics (DLA) to provide simple and efficient data lake analytics capabilities.

Compute engine

The compute engine is a distributed computing service built on the cloud native architecture. Compute nodes run in Alibaba Cloud Serverless Kubernetes (ASK) clusters. The compute engine supports the computing models and programming interfaces of Spark Community Edition. It also integrates with the Lindorm storage engine and makes full use of the underlying storage features and indexing capabilities to complete distributed jobs efficiently. The compute engine provides high-performance computing services in scenarios such as data production, interactive analytics, and machine learning. The compute engine also provides job management interfaces. You can monitor and maintain Spark jobs in the Spark Web UI (SparkUI).