Elasticsearch Machine Learning is a tool and framework that uses machine learning to analyze Elasticsearch data and make predictions from it. Applying natural language processing (NLP) models in Elasticsearch adds capabilities such as sentiment analysis, entity recognition, text classification, and question answering, which improve the search experience and make Elasticsearch easier to use. This topic describes the applications of Elasticsearch Machine Learning, the use of text embedding models in Elasticsearch, and Elastic Eland.
Background information
Elasticsearch is a Lucene-based search engine. It provides full-text search and extended features such as machine learning. Elasticsearch Machine Learning is mainly used for anomaly detection, predictive analysis, and other analysis of time series data. In Elasticsearch V8.X, the machine learning feature continues to gain capabilities, such as integration with BERT-based models and support for NLP tasks. These improvements let you use Elasticsearch for sentiment analysis, entity recognition, text classification, and question answering.
Application of Elasticsearch Machine Learning
Analysis type | Description | References |
Anomaly detection | Elasticsearch Machine Learning is mainly used to detect anomalous behavior in time series data, such as log files and financial transactions. This is unsupervised learning: Elasticsearch uses a statistical model to detect outliers and unusual patterns in the data. | |
Classification and regression | Elasticsearch Machine Learning can classify structured data and perform regression analysis on it. This is supervised learning, which suits scenarios in which the question is well defined and labeled data is available. | |
NLP | Elasticsearch Machine Learning can be integrated with other NLP and machine learning tools to support tasks such as text classification and entity recognition. Transformer models that use the BERT model structure and the WordPiece tokenization algorithm are supported. Note: The supported frameworks vary by Elasticsearch version, and the frameworks supported by the open source Community edition of Elasticsearch take precedence. In most cases, models trained with a supported framework can be deployed in Elasticsearch by using Elastic Eland. To check whether a model can be deployed, run a compatibility test on the model and the related API operations. | |
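To make the anomaly detection use case more concrete, the following is a minimal sketch of the request body for an anomaly detection job, built as a Python dictionary. It assumes the job is created with the open source Elasticsearch API `PUT _ml/anomaly_detectors/<job_id>`. The helper name `build_anomaly_job`, the field name `response_time`, the time field `@timestamp`, and the 15-minute bucket span are hypothetical values chosen for illustration, not settings taken from this topic.

```python
import json


def build_anomaly_job(field_name, time_field="@timestamp", bucket_span="15m"):
    """Build a request body for PUT _ml/anomaly_detectors/<job_id>.

    All argument values are illustrative assumptions; adapt them to your index.
    """
    return {
        "description": f"Unusual mean of {field_name}",
        "analysis_config": {
            # Width of the time buckets that the statistical model analyzes.
            "bucket_span": bucket_span,
            # One detector that models the mean of a numeric field.
            "detectors": [{"function": "mean", "field_name": field_name}],
        },
        # Tells the job which field holds the timestamp of each document.
        "data_description": {"time_field": time_field},
    }


job = build_anomaly_job("response_time")
print(json.dumps(job, indent=2))
```

The printed JSON can be sent as the request body from Kibana Dev Tools or any HTTP client. Because the learning is unsupervised, the body contains no labels, only the field to model and the time field.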
Application of text embedding models in Elasticsearch
Search is one of the core features of Elasticsearch. Full-text search and analysis depend on the underlying search capability of Elasticsearch, which lets you find the information you need in large amounts of data. Elasticsearch provides multiple types of built-in text analyzers and tokenizers, such as the standard tokenizer, Ngram tokenizer, and Pinyin tokenizer. These analyzers and tokenizers index and analyze data mainly based on the literal form of the text, and lack capabilities such as semantic understanding, context awareness, and disambiguation. To address this limitation, Alibaba Cloud combines Elasticsearch with text embedding models. Text embedding models provide richer, context-based semantic representations and resolve semantic ambiguity, which improves the quality of search and analysis.
Alibaba Cloud allows you to upload third-party text embedding models to Elasticsearch clusters of V7.11 or later and combine the models with Elasticsearch ingest pipelines, so that text is converted into vector data by the model before it is indexed. Alternatively, you can use an external service such as Alibaba Cloud Model Studio to convert text into vector data outside the Elasticsearch cluster and then write the vector data to the cluster. This approach reduces the preprocessing load and resource consumption on the Elasticsearch cluster and makes write and query performance more stable. Different models perform differently across benchmarks and tasks, so select a model based on your business requirements. The following table lists references for the methods that you can use to convert text into vector data.
Tool | References |
Alibaba Cloud Model Studio | |
Elastic Eland |
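The in-cluster method described above, in which an ingest pipeline calls an uploaded model before indexing, can be sketched as follows. This is a minimal illustration, assuming a text embedding model has already been deployed to the cluster and that the pipeline is created with `PUT _ingest/pipeline/<pipeline_id>`. The model ID `my-text-embedding-model`, the source field `content`, and the helper name `build_embedding_pipeline` are hypothetical. The `inference` processor options shown here (`model_id`, `target_field`, `field_map`) follow the form used in the open source Elasticsearch documentation for text embedding; verify them against the version of your cluster.

```python
import json


def build_embedding_pipeline(model_id, source_field, target_field="text_embedding"):
    """Build a request body for PUT _ingest/pipeline/<pipeline_id>.

    The pipeline runs a deployed text embedding model on each document
    before the document is indexed.
    """
    return {
        "description": "Convert text into vector data before indexing",
        "processors": [
            {
                "inference": {
                    "model_id": model_id,
                    # Field under which the inference result is stored.
                    "target_field": target_field,
                    # Map the document field to the input name that the
                    # model expects (assumed here to be "text_field").
                    "field_map": {source_field: "text_field"},
                }
            }
        ],
    }


pipeline = build_embedding_pipeline("my-text-embedding-model", "content")
print(json.dumps(pipeline, indent=2))
```

Documents indexed through such a pipeline carry the generated vector alongside the original text, so the destination index needs a vector field that matches the output dimensions of the model.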
Introduction to Elastic Eland
Elastic Eland is a Python client for Elasticsearch. It provides an integrated workflow that converts a pretrained model from the Hugging Face Transformers library into a TorchScript model, splits the TorchScript model into chunks, and then imports the chunks into Elasticsearch. A TorchScript model can run in an environment that does not have a Python interpreter.
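The chunking step can be illustrated with a short standard-library sketch. Elastic Eland performs this work for you, so the code below is not Eland's implementation. It only shows the idea of splitting a serialized model into parts shaped like the request bodies of the open source Elasticsearch API `PUT _ml/trained_models/<model_id>/definition/<part>`. The helper name `iter_definition_parts`, the 4 MB default chunk size, and the fake model bytes are assumptions made for illustration.

```python
import base64


def iter_definition_parts(model_bytes, chunk_size=4 * 1024 * 1024):
    """Yield (part_number, request_body) pairs for a serialized model.

    Each body carries one base64-encoded chunk together with the total
    length and part count, so the receiver can reassemble the model.
    """
    if not model_bytes:
        raise ValueError("model_bytes must not be empty")
    total_length = len(model_bytes)
    total_parts = -(-total_length // chunk_size)  # ceiling division
    for part in range(total_parts):
        chunk = model_bytes[part * chunk_size:(part + 1) * chunk_size]
        yield part, {
            "definition": base64.b64encode(chunk).decode("ascii"),
            "total_definition_length": total_length,
            "total_parts": total_parts,
        }


# Stand-in for a serialized TorchScript model (10,240 bytes of dummy data).
fake_model = bytes(range(256)) * 40
parts = list(iter_definition_parts(fake_model, chunk_size=4096))

# Reassemble the chunks to confirm that nothing was lost.
restored = b"".join(base64.b64decode(body["definition"]) for _, body in parts)
print(len(parts), restored == fake_model)
```

In practice you would not write this yourself. Eland ships a command-line tool, `eland_import_hub_model`, that performs the conversion, chunking, and upload in a single step.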
Among the open source Elasticsearch editions, only the Platinum and Enterprise editions support uploading models by using Elastic Eland. Alibaba Cloud Elasticsearch is subscribed to the Platinum edition by default, so you can upload models to Alibaba Cloud Elasticsearch directly. You can use Elastic Eland to upload models in online or offline mode. Open source Elasticsearch recommends models from Hugging Face.
Elasticsearch clusters of V7.11 or later support Elastic Eland. For details about the compatibility between Elasticsearch versions and Elastic Eland versions, see the open source Elasticsearch documentation.