
What Is Elasticsearch and How Does Elasticsearch Work

This article introduces what Elasticsearch is, what it is used for, and how it is deployed and operated on Alibaba Cloud.

What Is Elasticsearch?

Elasticsearch (Elastic Search) is a Lucene-based, distributed, near real-time search and analytics engine. It was first released in 2010 as an open-source product under the Apache License. Widely adopted as an enterprise search engine, it lets you store, query, and analyze large volumes of data in near real time. Elasticsearch is typically used as the underlying engine for applications that need complex queries and high performance.

Over the years, the Elasticsearch ecosystem has evolved into the Elastic Stack, which includes Elasticsearch, Logstash, Kibana, and Beats. Elasticsearch is the search engine, Logstash collects, transforms, and outputs data, and Kibana provides data visualization. According to DB-Engines, Elasticsearch ranks first among search engines, and it is widely used by developers.


What is the use of Elasticsearch?

Elasticsearch (Elastic Search) is suitable for a wide range of usage scenarios, such as full-text search, log analysis, operation and maintenance (O&M), monitoring, and security analysis.

Elasticsearch (Elastic Search) in Full-text search

Elasticsearch provides full-text search, which suits scenarios such as searching e-commerce product catalogs, in-app content, internal enterprise information, and IT systems.

Suppose you run an online store that supports product search. You can use Elasticsearch to store the product catalog and inventory data and to provide product search and automatic recommendations for your customers.
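As a rough illustration of the product search scenario, the sketch below builds a search request body in the Elasticsearch Query DSL. The field names (`title`, `description`, `stock`) and the helper name `build_product_search` are assumptions made for this example, not part of any real catalog schema. The snippet only constructs the JSON body and does not contact a cluster.

```python
import json


def build_product_search(text, in_stock_only=True, size=10):
    """Build a Query DSL body for a full-text product search.

    Field names are hypothetical; adapt them to your own index mapping.
    """
    # Full-text match on two fields, with matches in the title weighted higher.
    must = [{"multi_match": {"query": text, "fields": ["title^2", "description"]}}]

    # Filters narrow the result set without affecting relevance scores.
    filters = []
    if in_stock_only:
        filters.append({"range": {"stock": {"gt": 0}}})

    return {"size": size, "query": {"bool": {"must": must, "filter": filters}}}


body = build_product_search("wireless headphones")
print(json.dumps(body, indent=2))
```

The resulting body could be sent to the `_search` endpoint of an index that holds the catalog. Keeping the stock check in `filter` rather than `must` means it restricts results without influencing ranking.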

Elasticsearch (Elastic Search) in Log analysis

Complex business scenarios generate many kinds of logs, such as Apache logs, system logs, and MySQL logs. In most cases, it is difficult to extract useful data from these logs, yet you still pay to store them. Elasticsearch can ingest data from a variety of common sources through Beats and Logstash, and the integrated Kibana lets you analyze the logs visually.
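To make the log analysis flow more concrete, here is a minimal sketch of the kind of transformation a collection pipeline performs before indexing: it turns one Apache access log line (Common Log Format) into a structured document. In practice Beats or Logstash would do this work. The plain-Python version, the function name `parse_access_log`, and the output field names are assumptions chosen for illustration.

```python
import re
from datetime import datetime

# Pattern for the Apache Common Log Format:
# host ident user [time] "method path protocol" status bytes
LOG_PATTERN = re.compile(
    r'(?P<client_ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_access_log(line):
    """Return a dict ready for indexing, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    doc = match.groupdict()
    doc["status"] = int(doc["status"])
    # Apache writes "-" when no bytes were sent.
    doc["bytes"] = 0 if doc["bytes"] == "-" else int(doc["bytes"])
    # Convert the Apache timestamp to ISO 8601 so it can be mapped as a date.
    parsed = datetime.strptime(doc.pop("timestamp"), "%d/%b/%Y:%H:%M:%S %z")
    doc["@timestamp"] = parsed.isoformat()
    return doc


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
doc = parse_access_log(sample)
print(doc)
```

Once logs are structured like this, fields such as `status` and `@timestamp` can be aggregated and charted in Kibana instead of being searched as raw text.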

Elasticsearch (Elastic Search) in O&M and monitoring

If you deploy Docker containers or databases such as MySQL and MongoDB on an Elastic Compute Service (ECS) instance or physical machine, or you work with complex IoT scenarios, you can use Elasticsearch together with Beats, Logstash, or ElasticFlow to collect all logs centrally in real time and index them. You can then use Kibana to build a visual O&M dashboard from the collected data. The dashboard displays information such as the hostname, IP address, deployment, and color-coded health status of your ECS instances.

Elasticsearch (Elastic Search) in Security analysis

You can use Elasticsearch to search and analyze large volumes of historical logs for efficient security auditing. You can also quickly identify real-time events in the system based on the responses from Elasticsearch, which helps you mitigate risks promptly.

Why is Elasticsearch so fast?

Alibaba Cloud Elasticsearch allows you to deploy an Elasticsearch cluster across zones. In cross-zone deployment, the system automatically selects the zones that have sufficient Elastic Compute Service (ECS) instances. If replica shards are configured and nodes in one zone fail, the nodes in the remaining zones can still provide services without interruption, which significantly improves the availability of the cluster. You can also perform a switchover in the console to isolate the faulty nodes. The system then adds computing resources to the remaining zones to make up for the resources lost in the zone that contains the faulty nodes.

Elasticsearch cluster across zones

If you want to deploy an Elasticsearch cluster across zones, you do not need to specify each zone. The system selects the zones.

Elasticsearch Nodes

  • You must purchase three dedicated master nodes.
  • The numbers of data nodes, warm nodes, and client nodes must each be a multiple of the number of zones.
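The two node rules above can be expressed as a small pre-purchase check. This is only a sketch: the function name `validate_node_counts` and its parameters are hypothetical and do not correspond to any Alibaba Cloud API.

```python
def validate_node_counts(zone_count, data_nodes, warm_nodes=0,
                         client_nodes=0, dedicated_masters=3):
    """Return a list of rule violations for a cross-zone cluster plan.

    An empty list means the plan satisfies both rules stated above.
    """
    problems = []

    # Rule 1: a cross-zone cluster needs three dedicated master nodes.
    if dedicated_masters != 3:
        problems.append("three dedicated master nodes are required")

    # Rule 2: each of these node counts must divide evenly across the zones.
    roles = (("data", data_nodes), ("warm", warm_nodes), ("client", client_nodes))
    for role, count in roles:
        if count % zone_count != 0:
            problems.append(
                f"{role} node count {count} is not a multiple of {zone_count} zones"
            )
    return problems


print(validate_node_counts(zone_count=3, data_nodes=6, client_nodes=3))
print(validate_node_counts(zone_count=3, data_nodes=4))
```

The first call describes a valid plan and returns no violations. The second reports that four data nodes cannot be spread evenly across three zones.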

Elasticsearch Replica shards of indexes

  • If your Elasticsearch cluster is deployed across two zones, configure at least one replica shard for each index. Then, if nodes in one zone fail, the nodes in the remaining zone can continue to provide services.
  • If your Elasticsearch cluster is deployed across three zones, configure at least two replica shards for each index. Then, if nodes in one or two of the zones fail, the nodes in the remaining zones can continue to provide services.
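The replica guidance for two-zone and three-zone clusters can be turned into an index settings body, sketched below. `number_of_replicas` is a standard Elasticsearch index setting. The helper names and the lookup table are assumptions for illustration and cover only the two deployment sizes described here.

```python
# Minimum replica shards per index for each supported zone count,
# taken directly from the two cases described above.
MIN_REPLICAS_BY_ZONE_COUNT = {2: 1, 3: 2}


def min_replicas_for_zones(zone_count):
    """Return the minimum number of replica shards for a cross-zone cluster."""
    if zone_count not in MIN_REPLICAS_BY_ZONE_COUNT:
        raise ValueError(f"no guidance for a cluster across {zone_count} zones")
    return MIN_REPLICAS_BY_ZONE_COUNT[zone_count]


def replica_settings(zone_count):
    """Build an index settings body that sets the minimum replica count."""
    return {"index": {"number_of_replicas": min_replicas_for_zones(zone_count)}}


settings_two_zones = replica_settings(2)
settings_three_zones = replica_settings(3)
print(settings_two_zones, settings_three_zones)
```

A body like this could be applied to an existing index through the update index settings API. Because the replica count can be changed after an index is created, you can adjust it when a cluster moves to a different number of zones.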

Elasticsearch Switchover and recovery

  • If nodes in a zone fail, you can perform a switchover for the zone in the Elasticsearch console. The state of the zone changes from Enabled to Disabled, and the nodes in the zone are removed from your Elasticsearch cluster. Client traffic is then routed to the remaining zones that are in the Enabled state. To ensure that your cluster has sufficient computing resources and that read and write operations on indexes are not affected, Elasticsearch adds nodes, including dedicated master nodes, client nodes, and data nodes, to the remaining Enabled zones. To ensure normal read and write operations after the switchover, we recommend that you configure replica shards for indexes.
  • If the nodes recover, you can perform recovery for the zone in the Elasticsearch console. After the recovery, the state of the zone changes from Disabled to Enabled, and client traffic is routed to all zones that are in the Enabled state. Elasticsearch adds the nodes that were removed during the switchover back to the zone, and then removes the nodes that were added to the remaining zones during the switchover. When Elasticsearch removes data nodes, it first migrates the data on them to other data nodes.
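The switchover and recovery behavior can be pictured as a simple state change per zone. The toy model below tracks only the Enabled and Disabled states and which zones receive client traffic. The class name `ZoneStates` and the zone names are hypothetical, and the model deliberately ignores node provisioning and data migration.

```python
class ZoneStates:
    """Tracks which zones are Enabled and may receive client traffic."""

    def __init__(self, zones):
        self.states = {zone: "Enabled" for zone in zones}

    def enabled(self):
        """Zones that currently receive client traffic."""
        return sorted(z for z, s in self.states.items() if s == "Enabled")

    def switchover(self, zone):
        """Isolate a faulty zone so traffic goes to the remaining zones."""
        if zone not in self.states:
            raise KeyError(zone)
        if self.enabled() == [zone]:
            raise ValueError("cannot disable the only Enabled zone")
        self.states[zone] = "Disabled"

    def recover(self, zone):
        """Bring a repaired zone back so it receives traffic again."""
        if zone not in self.states:
            raise KeyError(zone)
        self.states[zone] = "Enabled"


cluster = ZoneStates(["zone-a", "zone-b", "zone-c"])
cluster.switchover("zone-b")
after_switchover = cluster.enabled()
cluster.recover("zone-b")
after_recovery = cluster.enabled()
print(after_switchover, after_recovery)
```

After the switchover only `zone-a` and `zone-c` receive traffic, and after recovery all three zones do again.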

How Does Elasticsearch Work?

Alibaba Cloud Elasticsearch is deployed in the CIDR blocks of Elastic Compute Service (ECS) instances. Each cluster contains multiple nodes, and each node is an ECS instance, so purchasing Elasticsearch clusters is equivalent to purchasing a number of ECS instances. All ECS instances are deployed in a Virtual Private Cloud (VPC) managed by the system and support disaster recovery across zones, which means that services can easily be deployed in different zones in a region. By configuring IP address mappings between Alibaba Cloud VPCs and your VPCs, you can deploy the nodes of each cluster in different zones.

For disaster recovery, nodes regularly back up snapshots to Object Storage Service (OSS). If data is lost or corrupted, you can quickly restore it from OSS. Ultra disks, solid-state drives (SSDs), and local disks are used for data storage. Alibaba Cloud Elasticsearch has recently improved its kernel to support the separation of storage and computing. An Elasticsearch index is split into shards for storage, and each shard can have multiple replicas, which improve query efficiency at the cost of extra storage space. The replicas create a large amount of redundant data, which results in high storage costs. In addition, the work done to improve query efficiency incurs extra memory overhead when you write data, which slows down writes. To address this, Alibaba Cloud Elasticsearch separates storage from computing in its kernel, which allows multiple replicas of a shard to be mapped to the same physical media. Compared with native Elasticsearch, Alibaba Cloud Elasticsearch reduces storage costs by at least 50%, improves real-time data writing performance by 70%, and improves replica and shard change performance by 99%.
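A back-of-the-envelope calculation shows why replicas drive storage costs and where a saving of at least 50% could come from. The sketch assumes that, with storage and computing separated, all copies of a shard map to a single physical copy. That assumption and the function names are made for illustration only and are not Alibaba Cloud's published method.

```python
def replicated_storage_gb(primary_gb, replicas):
    """Storage needed when every replica keeps its own full copy of the data."""
    return primary_gb * (1 + replicas)


def shared_storage_saving(primary_gb, replicas):
    """Fraction of storage saved if all copies share one physical copy.

    Assumption: shared storage holds the primary data exactly once.
    """
    replicated = replicated_storage_gb(primary_gb, replicas)
    shared = primary_gb
    return (replicated - shared) / replicated


one_replica = shared_storage_saving(primary_gb=100, replicas=1)
two_replicas = shared_storage_saving(primary_gb=100, replicas=2)
print(f"{one_replica:.0%} saved with 1 replica, {two_replicas:.0%} with 2 replicas")
```

Under this assumption, an index with one replica needs half the storage, and the saving grows as more replicas are configured.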


About AWS Elasticsearch (Amazon Elasticsearch Service)

AWS Elasticsearch (Amazon Elasticsearch Service) is a fully managed service that offers easy-to-use Elasticsearch API operations and real-time analytics capabilities. It also provides the availability, scalability, and security required for production workloads. You can use AWS Elasticsearch to deploy, protect, manage, and scale Elasticsearch clusters for scenarios such as log analysis, full-text search, and application monitoring.

About Alibaba Cloud Elasticsearch

Alibaba Cloud Elasticsearch provides a fully managed Elasticsearch service that is compatible with the open-source version. It optimizes kernel performance and provides commercial features (formerly X-Pack) out of the box (OOTB). The service is highly available, elastically scalable, and billed on a pay-as-you-go basis. In the following figure, we compare Alibaba Cloud Elasticsearch and other vendors' Elasticsearch products in terms of reliability, security, and system hosting.

AWS Elasticsearch to Alibaba Cloud Elasticsearch Migration

In China's cloud service market, Alibaba Cloud has become popular among developers due to its convenience and stability. This article is intended for customers who want to migrate data from an Amazon Elasticsearch Service (AWS Elasticsearch) domain to an Alibaba Cloud Elasticsearch cluster. The following figure shows the reference architecture for the migration.


For a step-by-step walkthrough, see the AWS Elasticsearch to Alibaba Cloud Elasticsearch Migration tutorial.

Related Elasticsearch Product:

Alibaba Cloud Elasticsearch

Alibaba Cloud Elasticsearch is a cloud-based service that offers built-in integrations such as Kibana, commercial features, Alibaba Cloud VPC, Cloud Monitor, and Resource Access Management. With pay-as-you-go billing, Alibaba Cloud Elasticsearch costs 30% less than self-built solutions and saves you the hassle of maintaining and scaling your platform.

Related Elasticsearch Learning Course

Elasticsearch Learning Course

This course is associated with An Introduction to Elasticsearch. You must purchase the certification package before you can complete all lessons for a certificate.

Related Elasticsearch Blog

Alibaba Cloud Elasticsearch: What's New and Latest Features

This article discussed various aspects of Alibaba Cloud Elasticsearch. It explored how Alibaba Cloud Elasticsearch compares with similar solutions provided by other vendors. It also explained the solution's output modes, architecture, regions in which the service is available, and detailed log analysis by illustrating an example.

Alibaba Cloud Elasticsearch: Building a "Search for Images by Image" Search Engine in Four Steps Using Vector Indexing

"Search by Image" is a relatively common feature in shopping guide websites, and there are many ways to implement it, such as "Hash fingerprint and Hamming distance calculation" and "feature vector and Milvus". In actual scenarios, however, it is difficult to achieve quickness, precision, and simplicity.

Use DTS for Real-time Data Synchronization between ApsaraDB RDS for MySQL and Alibaba Cloud Elasticsearch

Data Transmission Service (DTS) synchronizes production data from an ApsaraDB RDS for MySQL instance to an Alibaba Cloud Elasticsearch instance in real time after you create a real-time data synchronization task in the DTS console. This article focuses on the supported real-time synchronization types and SQL operations, and then walks through the configuration procedure for synchronization.


Alibaba Clouder
