
Lindorm: Engines

Last Updated: Mar 28, 2024

Lindorm provides the wide table engine, time series engine, search engine, file engine, compute engine, and streaming engine. Lindorm is compatible with standard APIs of multiple open source software and services, such as Apache HBase, Apache Cassandra, Amazon Simple Storage Service (Amazon S3), OpenTSDB, Apache Solr, Hadoop Distributed File System (HDFS), and Apache Kafka. Lindorm also provides capabilities such as SQL queries, time series data processing, and text-based data query and analysis.

To meet the requirements of dynamic workloads, each engine can separately scale computing resources and storage resources based on your business requirements. The wide table engine and the time series engine provide high concurrency and high throughput.

Select an engine

Different engines are suitable for different scenarios. You can select one or more engines based on your business requirements. The engines supported by Lindorm are described below.


Wide table engine (LindormTable)

Compatibility: Compatible with SQL, the HBase API, Cassandra Query Language (CQL), and the Amazon S3 API.

Scenario: Suitable for managing and analyzing metadata, orders, bills, user personas, social information, feeds, logs, and trajectories.

Description: The wide table engine is used for distributed storage of large amounts of semi-structured and structured data. It supports global secondary indexes, multi-dimensional searches, dynamic columns, and the time-to-live (TTL) feature. The wide table engine can handle tens of millions of concurrent requests and store petabytes of data. It also provides the hot and cold data separation feature. Compared with open source Apache HBase, read/write performance is increased by 2 to 6 times, the 99th percentile (P99) latency is decreased by 90%, the compression ratio is increased by 100%, and the storage cost is decreased by 50%. The wide table engine provides the built-in Lindorm Ganos service for spatial and spatio-temporal data. You can use Lindorm Ganos to query and analyze large amounts of historical trajectory data.

Time series engine (LindormTSDB)

Compatibility: Provides an HTTP API and is compatible with the OpenTSDB API.

Scenario: Suitable for storing and processing time series data, such as device measurement and operational data in IoT and monitoring scenarios.

Description: The time series engine is a distributed storage engine that processes large amounts of time series data and supports SQL queries. It provides a compression algorithm dedicated to time series data, which improves the data compression ratio. The engine allows you to query and aggregate large amounts of time series data by timeline across multiple dimensions, and it also supports downsampling and elastic scaling.
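To make the downsampling feature mentioned above concrete: downsampling aggregates raw points into coarser time windows to cut storage and query cost. The following Python sketch is purely illustrative of the concept (it is not LindormTSDB's implementation or API); the window size and sample points are hypothetical:

```python
# Conceptual sketch of time series downsampling: average raw points into
# fixed-size time windows. Illustrative only; the actual LindormTSDB
# implementation and API differ.
from collections import defaultdict

def downsample_avg(points, window_s):
    """points: list of (unix_ts, value); returns {window_start: average}."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window_s].append(value)
    return {start: sum(vals) / len(vals) for start, vals in buckets.items()}

raw = [(0, 1.0), (10, 3.0), (60, 5.0), (70, 7.0)]
print(downsample_avg(raw, 60))  # {0: 2.0, 60: 6.0}
```

Each raw point lands in the window that contains its timestamp, so four points collapse into two 60-second averages.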

Search engine (LindormSearch)

Compatibility: Compatible with SQL and the Apache Solr API.

Scenario: Suitable for querying large amounts of data, such as logs, text, and documents. For example, you can use the search engine to search logs, bills, and user personas.

Description: LindormSearch is a distributed search engine with an architecture in which storage is decoupled from computing. It can be seamlessly used to store the indexes of the wide table engine and the time series engine to accelerate data retrieval. The search engine provides capabilities such as full-text search, aggregation, and complex multi-dimensional queries. It also supports an architecture that consists of one write replica and multiple read-only replicas, and provides features such as horizontal scaling, cross-zone disaster recovery, and TTL to meet the requirements for efficient retrieval of large amounts of data.
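Full-text search engines are typically backed by an inverted index that maps each term to the documents containing it. The following minimal Python sketch illustrates that idea only; it has nothing to do with LindormSearch's Solr-compatible internals, and the documents and AND-only query semantics are hypothetical:

```python
# Minimal inverted-index sketch: map terms to the set of documents that
# contain them, then answer a query by intersecting those sets.
# Illustrative only; real search engines add tokenization, ranking, etc.
from collections import defaultdict

def build_index(docs):
    """docs: {doc_id: text}; returns {term: set of doc_ids}."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return doc_ids containing every query term (AND semantics)."""
    sets = [index.get(t, set()) for t in query.lower().split()]
    return set.intersection(*sets) if sets else set()

docs = {1: "error in billing service", 2: "user login error", 3: "billing ok"}
idx = build_index(docs)
print(search(idx, "billing error"))  # {1}
```

The intersection step is why multi-term queries narrow results: only documents present in every term's posting set survive.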

File engine (LindormDFS)

Compatibility: Compatible with the HDFS API.

Scenario: Suitable for scenarios in which enterprise-grade data lakes are used for storage, Apache Hadoop is used as a storage base, or historical data is archived and compressed.

Description: The file engine provides cloud-native storage capabilities and is compatible with HDFS communication protocols. You can connect to the file engine directly by using open source HDFS clients and call the HDFS API to use all of its features. You can also seamlessly connect the file engine to open source HDFS ecosystems and cloud computing ecosystems. The file engine is developed and optimized based on HDFS: it can store exabytes of data at a low cost, perform automatic scale-up within minutes, and scale bandwidth horizontally. It is suitable for building enterprise-grade, low-cost data lakes based on HDFS, and its decoupled storage and computing architecture helps reduce overall costs.

Compute engine (LDPS)

Compatibility: Compatible with the Apache Spark API.

Scenario: Suitable for scenarios such as large-scale data production, interactive analytics, machine learning, and graph computing.

Description: The compute engine provides distributed computing services based on a cloud-native architecture. It supports the computing models and programming interfaces of open source Spark Community Edition. The compute engine also integrates with the Lindorm storage engines and uses their underlying data storage and indexing capabilities to complete distributed jobs efficiently.

Streaming engine

Compatibility: Compatible with SQL and the Apache Kafka API.

Scenario: Suitable for scenarios such as IoT data processing, application log processing, logistics aging analysis, travel data processing, and real-time trajectory processing.

Description: The Lindorm streaming engine stores and processes streaming data and provides lightweight computing capabilities. You can use the streaming engine to store streaming data in Lindorm to meet the requirements for the processing and application of streaming data. You can also use the Lindorm Ganos service provided by the wide table engine together with the streaming engine to implement real-time trajectory analysis features, such as electronic geofencing and regional statistics collection.
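Electronic geofencing, mentioned above, boils down to testing whether each incoming trajectory point lies inside a fence. The following Python sketch uses a simple circular fence with a planar distance check; the coordinates and radius are hypothetical, and Lindorm Ganos itself provides far richer spatial and spatio-temporal operators:

```python
# Conceptual geofence check over a stream of trajectory points: keep the
# points that fall inside a circular fence. Illustrative only; Lindorm
# Ganos provides real spatial operators for this kind of analysis.
import math

def inside_fence(point, center, radius_m):
    """Planar approximation: Euclidean distance check, units in meters."""
    dx, dy = point[0] - center[0], point[1] - center[1]
    return math.hypot(dx, dy) <= radius_m

fence_center, fence_radius = (0.0, 0.0), 100.0
stream = [(10.0, 10.0), (300.0, 5.0), (50.0, 80.0)]
alerts = [p for p in stream if inside_fence(p, fence_center, fence_radius)]
print(alerts)  # [(10.0, 10.0), (50.0, 80.0)]
```

In a real deployment the distance test would use geographic coordinates and polygon fences, but the per-point filter shape is the same.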

Select the number and specification of nodes

Lindorm supports the horizontal scale-out of engine nodes. You can add nodes to resolve issues such as high latency, unstable performance, and excessive workloads on individual nodes. However, if a large number of requests access hot data on a single node, adding nodes does not help; you can only upgrade the specification of that node. A node with a higher specification can better handle requests for hot data, provide more stable performance, and avoid excessive workloads or out-of-memory (OOM) issues caused by heavy hot-data traffic.

Therefore, we recommend that you select node specifications based on the requirements of your business. You can upgrade the specifications of nodes in your Lindorm instance in the Lindorm console. For more information, see Change the engine specification of an instance. If you are not sure which node specification to select, or need help upgrading it, contact Lindorm technical support (DingTalk ID: s0s3eg3).

LindormTable

LindormTable nodes support specifications that range from 4 CPU cores and 8 GB of memory to 32 CPU cores and 256 GB of memory. The number of LindormTable nodes in a Lindorm instance can be increased. You can select a node specification based on the number of requests per second and the number of regions on a single node in your business.

Note

If you select Lindorm for Product Type when you create the Lindorm instance, the minimum specification of the LindormTable node contains 4 CPU cores and 16 GB of memory.

We recommend that you select the specification of LindormTable nodes based on the following rules:

  • If the number of requests per second on a single node is less than 1,000 and the number of regions on the node is less than 500, select the specification with 4 CPU cores and 16 GB of memory.

  • If the number of requests per second on a single node is between 1,000 and 20,000, and the number of regions on the node is between 500 and 1,000, select the specification with 8 CPU cores and 32 GB of memory.

  • If the number of requests per second on a single node is greater than 20,000, and the number of regions on the node is greater than 1,000, select the specification with 16 CPU cores and 64 GB of memory.

    Important

    When you select a node specification, you must consider other factors in addition to the number of requests per second and the number of regions on a single node.

    • If you select the node specification for a complex business strictly based on the preceding rules, the business may not run stably and latency may increase. Therefore, if your business meets one of the following conditions, we recommend that you select a node specification higher than the ones described in the preceding rules.

      • The row that may be accessed contains kilobytes or even megabytes of data.

      • Complex filter conditions are specified in SCAN requests.

      • Most data that is accessed by requests is not cached in the memory but is stored on disks.

      • The Lindorm instance contains a large number of tables.

    • If your business provides online services, select a node specification with large memory to cache more data for better query performance.

    • If your business needs to run heavy-load tasks offline, such as MapReduce tasks and Spark tasks, or the TPS and QPS of your business are very high, we recommend that you select a node specification with more CPU cores.

    • If the CPU utilization of the nodes remains 70% or higher, we recommend that you upgrade the node specification.
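The sizing rules above can be encoded as a simple lookup. The following Python sketch is only a starting point under the stated thresholds (requests per second and regions per node); the function name and cascading structure are our own, and the other factors in the note above still apply:

```python
# Rough sketch of the LindormTable node-spec rules described above.
# Thresholds come from the documentation; treat the result as a starting
# point and size up for complex filters, large rows, cold reads, etc.
def suggest_lindormtable_spec(qps_per_node, regions_per_node):
    """Return (cpu_cores, memory_gib) per the documented thresholds."""
    if qps_per_node < 1_000 and regions_per_node < 500:
        return (4, 16)
    if qps_per_node < 20_000 and regions_per_node < 1_000:
        return (8, 32)
    return (16, 64)

print(suggest_lindormtable_spec(500, 100))      # (4, 16)
print(suggest_lindormtable_spec(5_000, 800))    # (8, 32)
print(suggest_lindormtable_spec(50_000, 2_000)) # (16, 64)
```

Workloads that cross either threshold fall through to the next tier, which mirrors the recommendation to favor the higher specification when in doubt.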

LindormTSDB

LindormTSDB nodes support specifications that range from 4 CPU cores and 8 GB of memory to 32 CPU cores and 256 GB of memory. You can select the number and specification of LindormTSDB nodes based on the TPS of your business.

Note

If you select Lindorm for Product Type when you create the Lindorm instance, the minimum specification of the LindormTSDB node contains 4 CPU cores and 16 GB of memory.

We recommend that you select the number and specification of LindormTSDB nodes based on the following rules:

  • If the TPS is less than 1,900,000, you can select three nodes, each with 4 CPU cores and 16 GB of memory.

  • If the TPS is between 1,900,000 and 3,900,000, you can select three nodes, each with 8 CPU cores and 32 GB of memory.

  • If the TPS is between 3,900,000 and 7,800,000, you can select three nodes, each with 16 CPU cores and 64 GB of memory.

  • If the TPS is between 7,800,000 and 11,000,000, you can select three nodes, each with 32 CPU cores and 128 GB of memory.

Note

You can select the number and specification of LindormTSDB nodes based on the preceding rules when you want to maximize the data processing performance in your business. You must also consider other factors when you select the number and specification of LindormTSDB nodes, such as the business model type, the data size of a batch, and the number of concurrent requests. For more information, see Results of write tests and Results of query tests.
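As a quick reference, the TPS tiers above can be sketched as a lookup. The thresholds are from the documentation, the node count is fixed at three per the rules, and the caveats in the note (business model, batch size, concurrency) still apply; the function name is our own:

```python
# Sketch of the LindormTSDB sizing rules above: three nodes, with the
# per-node spec chosen by total TPS. Beyond the documented tiers there
# is no rule in this table, so the function returns None.
def suggest_lindormtsdb_nodes(tps):
    """Return (node_count, cpu_cores, memory_gib), or None above 11,000,000 TPS."""
    tiers = [
        (1_900_000, (3, 4, 16)),
        (3_900_000, (3, 8, 32)),
        (7_800_000, (3, 16, 64)),
        (11_000_000, (3, 32, 128)),
    ]
    for limit, spec in tiers:
        if tps < limit:
            return spec
    return None  # outside the documented tiers; contact technical support

print(suggest_lindormtsdb_nodes(1_000_000))  # (3, 4, 16)
print(suggest_lindormtsdb_nodes(5_000_000))  # (3, 16, 64)
```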

LindormDFS

LindormDFS nodes handle data read and write requests, manage data blocks, and provide support for the HDFS protocol. You can select the number of LindormDFS nodes based on the data size and bandwidth requirements in your business.

  • Data size: Each LindormDFS node supports a storage capacity ranging from 10 TB to 50 TB.

  • Bandwidth: Each LindormDFS node supports a data transmission bandwidth ranging from 100 MB/s to 200 MB/s.
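Combining the two limits above, the required node count is the larger of what the data size demands and what the bandwidth demands. A minimal, deliberately conservative sketch that sizes against each node's lower bounds (10 TB and 100 MB/s); the helper name is our own:

```python
# Estimate a LindormDFS node count from data size and bandwidth needs,
# using the per-node lower bounds stated above (10 TB storage,
# 100 MB/s bandwidth). Conservative, illustrative sketch only.
import math

def estimate_lindormdfs_nodes(data_tb, bandwidth_mbps):
    by_capacity = math.ceil(data_tb / 10)           # at least 10 TB per node
    by_bandwidth = math.ceil(bandwidth_mbps / 100)  # at least 100 MB/s per node
    return max(by_capacity, by_bandwidth, 1)

print(estimate_lindormdfs_nodes(45, 250))  # capacity-bound: 5 nodes
print(estimate_lindormdfs_nodes(8, 600))   # bandwidth-bound: 6 nodes
```

Taking the maximum of the two estimates ensures neither the storage nor the throughput requirement is the bottleneck.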