Released by ELK Geek
This blog aims to provide answers to the following questions:
We will explain why combining relational databases with Elasticsearch is beneficial from both technical and business perspectives.
In relational databases, indexing follows the principle of giving the highest priority to the leftmost alternative. You have to create indexes in advance according to the order required for queries, and cannot arbitrarily combine fields for indexing. For example, when a table contains 30 fields, your relational database, even if it is MongoDB, a popular NoSQL product, will not be able to support queries based on arbitrary field combinations.
Relational databases provide limited support for join queries. In business application systems, the number of correlated tables is usually limited to two or three. In my opinion, any business scenario with three correlated tables should undergo a feasibility evaluation by technical architects because development engineers generally only allow up to two correlated tables. In addition, the data volumes of the correlated tables must be balanced, and correlation across homogeneous database sources is definitely forbidden.
What about correlation across heterogeneous database sources? This is allowed in theory, but impractical in practice. Relational databases generally use the B+Tree structure for index-based queries. This structure has an inherent bottleneck in processing vast amounts of data, leading to sharp performance degradation or even breakdown when processing millions of data records or complex queries.
Although relational databases support synchronization between primary and secondary databases and read/write splitting, their cluster nodes are loosely structured and have weak sensing capabilities. In addition, their silo architectures hinder elastic scaling.
Elasticsearch is built based on the Lucene core library, uses inverted index algorithms, and integrates multiple efficient algorithms. Elasticsearch creates indexes for all fields by default and supports the arbitrary combination of index fields, leading to very efficient queries. Elasticsearch provides more efficient and flexible indexing compared with relational databases. In a business system, many scenarios require generic searches, which are queries based on arbitrary field combinations.
Elasticsearch data models adopt the Free Scheme mode and JavaScript Object Notation (JSON) format. This allows you to flexibly add data fields or set field hierarchies. Different from relational databases that only support multi-table join queries, Elasticsearch supports denormalized correlation, which allows you to integrate table data from multiple businesses into one index by nesting objects.
Elasticsearch is designed with a distributed architecture and replica and sharding mechanisms. This enables its clusters to scale out elastically and handle vast amounts of data queries. Even if an index contains billions of data records, the query response time is less than a second.
Elasticsearch indexes support elastic searches. You can specify one or more indexes for a search. Search results are filtered and merged by Elasticsearch. This facilitates flexible operations in business systems, especially in terms of queries of real-time data and historical data, where both unified query conditions and syntax can be executed.
Elasticsearch supports isolation between data fields and index fields. Data fields and index fields are both enabled by default. In actual business scenarios, many fields in tables do not need to be retrieved. Instead, they are used to store data and facilitate retrieval during queries. Similarly, many retrieval fields do not store raw data, but are used for retrieval and filtering.
Relational databases are positioned as versatile products, with a strong transaction mechanism and average data writing performance and data query capabilities. Elasticsearch is positioned as a query-based analytic product, with a weak transaction mechanism and excellent query performance. DBs and Elasticsearch each have their own advantages and disadvantages and cannot replace each other.
Since we have entered the era of the Internet, mobile Internet, and Internet of things (IoT), the data volumes of business systems have been increasing exponentially. The traditional strategy to cope with this is to adopt sharding, including database and table sharding at the physical layer and portioning at the logical layer. Non-relational databases are used to store non-critical data, while relational databases must be used to store critical business data. For example, customer order data in the logistics and express delivery industry cannot tolerate loss or error, and therefore must be stored in relational databases.
The microservice architecture model is the most popular solution for handling complex business system requirements. Microservices involve both upper-layer application services and underlying data services, which are split and decomposed based on domain models. (Microservices are not discussed in detail in this article.) In this case, an application service is split into dozens, hundreds, or even more microservices of various sizes. Similarly, a data service is split into several microservices of various sizes.
Sharding addresses the requirements for data storage, but imposes new requirements for merge queries. The query conditions in business systems are usually dynamic. It is impossible to fix the query conditions to particular values, and even more impossible to take all the dynamic conditions into account during the design of sharding. In this case, you usually need to select a more powerful query database such as Elasticsearch.
The microservice model addresses single coupling in business systems. However, it greatly increases query complexity. To run a query in a complex system, you need to integrate data services from multiple domains. This is especially true for data services in a core domain that is involved throughout a system, such as the order domain in the logistics and express delivery industry.
Relational databases are used to store business data. They adopt a strong transaction isolation mechanism to prevent loss, obfuscation, and overwriting of data, ensuring timeliness and reliability.
Elasticsearch is used to query business data. Data from different database or table shards, or data from databases or tables in different domains, is merged and synchronized to Elasticsearch indexes. This enables efficient queries.
The combination of DB and Elasticsearch involves problems of synchronization timeliness and data consistency. Solutions to these problems are discussed in the next article.
Now that we have shown the necessity of combining relational databases with Elasticsearch, the following sections describe how data models in business scenarios are mapped in combined scenarios.
1) In one-to-one mapping, each table in a relational database corresponds to one index in Elasticsearch.
2) The relational database acts as a raw data source, and Elasticsearch takes over the query engine role of the database and replaces list query scenarios.
3) Each table adopts a horizontal sharding design, which requires Elasticsearch to support merge queries. As shown in the figure, in an order scenario when average daily orders amount to more than several million in the e-commerce industry, merge queries are required by the backend business system.
4) There are many possible combinations of business query conditions for each table, and the database has a limited index query capability. Therefore, the full-field index query capability of Elasticsearch is needed to replace list query scenarios. For example, in a product search scenario in the e-commerce industry, nearly all of the dozens of basic product fields can be combined for search operations.
1) In one-to-many mapping, one table is mapped to multiple indexes.
2) The table acts as the main object of index A and is mapped in one-to-one mode.
3) The table also acts as the subobject of index B and is embedded to the main object.
4) Based on the microservice architecture design, a business system is divided into multiple subdomains, which are further divided. For example, a business system in the e-commerce industry may be divided into an order domain and a product domain. In this case, an order table needs to be mapped to an order index and a product index.
1) Many-to-many mapping, where multiple tables are mapped to multiple indexes, is highly complex.
2) A medium or large business system is divided into multiple domains. Each domain is further divided into multiple subdomains. Therefore, a network relationship is formed among the domains and business data is interrelated.
3) Databases are inefficient in multi-table join queries and provide limited support for cross-database queries. Therefore, the merge query feature of Elasticsearch is required.
4) Indexes are merged into different index files according to the requirements of domains. Different indexes are applied differently: General-purpose indexes are used in multiple domains, while special ones are used in a single domain.
1) Many-to-many mapping that involves multiple tables from multiple sources represents the most complex mapping relationship. In this case, tables are mapped to indexes in a cross mode.
2) In a large or medium-sized business system, different data storage systems are used according to business scenarios.
3) Businesses may use a wide range of relational databases. For example, one project might use MySQL, while another uses PostgreSQL, and a third uses SQL Server. Therefore, general-purpose queries cannot be implemented in the business system.
4) Business can also use a wide range of non-relational databases are diversified, such as Redis, a key-value database, and MongoDB, a document database. Therefore, general-purpose queries cannot be implemented in the business system.
5) Based on the requirements for generic queries across heterogeneous data sources, Elasticsearch is required to merge data.
Elasticsearch was originally positioned as a search engine product, but was later positioned as a data analytic product in the form of a NoSQL database. Elasticsearch does not support strict transaction isolation. However, by virtue of its advanced features, Elasticsearch can be widely used in application systems, effectively compensating for the shortcomings of relational databases.
This article describes the background and application scenarios of the combined use of DBs and Elasticsearch. It was not my intention to say either of the two is better. Instead, you need to balance DB and Elasticsearch in order to flexibly apply them to application systems based on application and business requirements.
DB: Refers to relational databases and other databases with a strict transaction isolation mechanism, such as MySQL, SQL Server, PostgreSQL, Oracle, and DB2. Top positions in the DB-Engines Ranking are all occupied by relational databases.
Elasticsearch: A popular open-source search engine product. It is also a NoSQL non-relational database without a strict transaction isolation mechanism. Currently Elasticsearch ranks seventh in the DB-Engines Ranking.
Application: In this article, applications refer to business application systems in Online Transaction Processing (OLTP) scenarios, not Online Analytical Processing (OLAP) scenarios. These are business systems that use a large number of transactional database products.
Index: An index is a data retrieval algorithm in relational databases and a virtual data storage space in Elasticsearch, which is similar to a table in a database.
Li Meng is an Elasticsearch Stack user and a certified Elasticsearch engineer. Since his first explorations into Elasticsearch in 2012, he has gained in-depth experience in the development, architecture, and operation and maintenance (O&M) of the Elastic Stack and has carried out a variety of large and mid-sized projects. He provides enterprises with Elastic Stack consulting, training, and tuning services. He has years of practical experience and is an expert in various technical fields, such as big data, machine learning, and system architecture.
Declaration: This article is reproduced with authorization from Li Meng, the original author. The author reserves the right to hold users legally liable in the case of unauthorized use.
Reduce Service Use by 80% by Migrating from MongoDB to Elasticsearch
Combining Elasticsearch with DBs: Real-time Data Synchronization
2,599 posts | 758 followers
FollowAlibaba Clouder - January 7, 2021
Alibaba Clouder - January 7, 2021
Alibaba Clouder - July 5, 2019
Alibaba Clouder - April 17, 2018
Alibaba Cloud Native Community - January 13, 2023
Alibaba Cloud Data Intelligence - June 20, 2024
2,599 posts | 758 followers
FollowApsaraDB for HBase is a NoSQL database engine that is highly optimized and 100% compatible with the community edition of HBase.
Learn MoreAlibaba Cloud provides big data consulting services to help enterprises leverage advanced data technology.
Learn MoreAlibaba Cloud experts provide retailers with a lightweight and customized big data consulting service to help you assess your big data maturity and plan your big data journey.
Learn MoreApsaraDB Dedicated Cluster provided by Alibaba Cloud is a dedicated service for managing databases on the cloud.
Learn MoreMore Posts by Alibaba Clouder