PolarSearch

PolarSearch is a high-performance distributed data retrieval and analysis engine provided by PolarDB. It is built on OpenSearch and is compatible with the Elasticsearch and OpenSearch ecosystems. Through APIs or SDKs, it provides millisecond-level full-text retrieval, vector retrieval, and intelligent analysis of multimodal data such as text documents, image features, and logs. You do not need to manually synchronize data from PolarDB to a separate data retrieval platform.

Note

This feature is in grayscale (phased) release. To use it, submit a ticket to have it enabled.

With PolarSearch, you can perform:

Full-text retrieval

curl -X GET "http://<endpoint>:<port>/articles/_search" -H "Content-Type:application/json" -d '
{
  "query": {
    "match": {
      "content": "PolarSearch"
    }
  }
}'

Vector retrieval

curl -X GET "http://<endpoint>:<port>/my-vector-index/_search" -H "Content-Type:application/json" -d '
{
  "size": 2,
  "query": {
    "knn": {
      "vector_field": {
        "vector": [0.1, 0.5, -0.3, 0.8],
        "k": 2
      }
    }
  }
}'
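The vector search above only works against an index that has a vector field of matching dimension. The following is a minimal sketch of creating such an index, assuming PolarSearch accepts the index settings and `knn_vector` mapping used by the OpenSearch k-NN plugin. This is not confirmed here for PolarSearch. `POLARSEARCH_URL` is a hypothetical environment variable standing in for `http://<endpoint>:<port>`.

```shell
# Index name taken from the vector retrieval example above.
INDEX="my-vector-index"

# Index definition in the OpenSearch k-NN style (assumed to apply to PolarSearch).
# The dimension (4) must match the length of the query vector.
MAPPING_BODY='{
  "settings": { "index": { "knn": true } },
  "mappings": {
    "properties": {
      "vector_field": { "type": "knn_vector", "dimension": 4 }
    }
  }
}'

# POLARSEARCH_URL is a hypothetical variable, e.g. http://<endpoint>:<port>.
if [ -n "${POLARSEARCH_URL:-}" ]; then
  curl -X PUT "$POLARSEARCH_URL/$INDEX" -H "Content-Type: application/json" -d "$MAPPING_BODY"
else
  # No endpoint configured: print the body for inspection instead of sending it.
  printf '%s\n' "$MAPPING_BODY"
fi
```

Keeping the request body in a variable makes it easy to review or version the index definition before sending it.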

Technical architecture

PolarSearch builds on the shared distributed storage provided by PolarStore and a cloud-native architecture that decouples compute from storage. It integrates a self-developed intelligent search engine and a distributed computing framework, and is compatible with the Elasticsearch query DSL. PolarSearch supports storage, analysis, and real-time multimodal retrieval of petabytes of heterogeneous data, so you can quickly build high-concurrency, high-availability data search services.

Benefits

Improved efficiency: Eliminates the need to manually build a data synchronization pipeline from MySQL to a search engine. Reduces the processing time of retrieval workloads from minutes to milliseconds and shortens the development cycle by 50%.
Cost optimization: Replaces the conventional "database + file storage + compute engine" architecture, which involves multiple engines and systems. By using PFS, a multi-level distributed shared storage system, it reduces the total cost of ownership (TCO) by 40%.
Business innovation: Combines unstructured data storage and mining with AI vector retrieval to build AI infrastructure such as intelligent recommendation systems, RAG knowledge bases, and agent memory bases.

Use cases

E-commerce content platforms and SaaS services

Fuzzy search, semantic matching, and personalized recommendations for product titles and product pages.
Real-time analysis of keywords and sentiment mining in user comments and user-generated content (UGC).
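As an illustration of fuzzy search on product titles, the sketch below builds a `match` query with `fuzziness` set to `AUTO`, which is standard OpenSearch and Elasticsearch query DSL. The index name `products`, the field `title`, and the `POLARSEARCH_URL` variable are hypothetical and not taken from the PolarSearch documentation.

```shell
# Hypothetical index that stores product documents.
INDEX="products"

# The search terms are deliberately misspelled; "fuzziness": "AUTO" tolerates
# a small edit distance per term, and "operator": "and" requires every term to match.
SEARCH_BODY='{
  "query": {
    "match": {
      "title": {
        "query": "wireles headphnes",
        "fuzziness": "AUTO",
        "operator": "and"
      }
    }
  }
}'

# POLARSEARCH_URL is a hypothetical variable, e.g. http://<endpoint>:<port>.
if [ -n "${POLARSEARCH_URL:-}" ]; then
  curl -X GET "$POLARSEARCH_URL/$INDEX/_search" -H "Content-Type: application/json" -d "$SEARCH_BODY"
else
  # No endpoint configured: print the body for inspection instead of sending it.
  printf '%s\n' "$SEARCH_BODY"
fi
```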

Enterprise RAG knowledge base and document management

Full-text indexing and retrieval for documents in various formats such as PDF and Word.
Vectorized storage of image features for image searches.

Agent memory base and agent data management

Short-term memory for data such as the current conversation context, session information, and temporary variables.
Long-term memory for persistent interaction data such as user preferences, historical query content, and LLM parameters.

Log analysis and service monitoring

Real-time retrieval, statistical aggregation, and anomaly alerting for petabytes of log data.
Association analysis and visualized reports for multi-dimensional log fields.
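A log aggregation of the kind described above might look like the following sketch. It counts ERROR entries per minute over the last hour and lists the most affected services in each bucket, using the standard `date_histogram` and `terms` aggregations from the OpenSearch DSL. The index `app-logs` and the fields `level`, `service`, and `@timestamp` are assumptions, as is the `POLARSEARCH_URL` variable.

```shell
# Hypothetical log index.
INDEX="app-logs"

# "size": 0 skips the matching documents and returns only the aggregation buckets.
# "level" and "service" are assumed to be keyword fields, "@timestamp" a date field.
AGG_BODY='{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "level": "ERROR" } },
        { "range": { "@timestamp": { "gte": "now-1h" } } }
      ]
    }
  },
  "aggs": {
    "errors_per_minute": {
      "date_histogram": { "field": "@timestamp", "fixed_interval": "1m" },
      "aggs": {
        "top_services": { "terms": { "field": "service", "size": 5 } }
      }
    }
  }
}'

# POLARSEARCH_URL is a hypothetical variable, e.g. http://<endpoint>:<port>.
if [ -n "${POLARSEARCH_URL:-}" ]; then
  curl -X GET "$POLARSEARCH_URL/$INDEX/_search" -H "Content-Type: application/json" -d "$AGG_BODY"
else
  # No endpoint configured: print the body for inspection instead of sending it.
  printf '%s\n' "$AGG_BODY"
fi
```

An alerting job could run this query periodically and compare each bucket count against a threshold.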

Internet of Things (IoT) and real-time data streams

Highly concurrent writes and fast retrieval of time series data from IoT devices.
Dynamic aggregation and multi-condition filtering of sensor data streams.

Core features

High availability and scalability

The distributed architecture ensures automatic load balancing and seamless switchover in the event of a single node failure, achieving a service availability of 99.99%.
Online scaling is supported. Storage and computing resources can be scaled as needed to handle hundreds of millions of records.

Intelligent search engine

Supports the creation of inverted secondary indexes for InnoDB primary table data on primary nodes, offering transaction-level visibility.
Full-text index queries on InnoDB primary table data are identified by the optimizer and automatically routed to the search node for retrieval.
Supports a multi-dimensional hybrid index that combines text segmentation, semantic vectorization, and numerical ranges, which improves query performance by more than 10 times.
Provides a built-in enhanced Chinese NLP model that enables advanced features such as synonym expansion, pinyin correction, and intent recognition.

Multimodal data fusion

Supports unified storage and multi-channel fusion retrieval for various data types, including scalar forward indexes, full-text inverted indexes, and vector indexes.
Offers storage, retrieval, and content parsing extensions for large amounts of heterogeneous unstructured data, including images and documents.
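One way to combine scalar, full-text, and vector conditions in a single request is a `bool` query, sketched below in the OpenSearch DSL. Whether PolarSearch's own fusion retrieval works this way is not stated in this document, so treat it as an assumption. The index `products` and the fields `price`, `description`, and `embedding` are hypothetical, as is `POLARSEARCH_URL`.

```shell
# Hypothetical index holding scalar, text, and vector fields.
INDEX="products"

# filter: scalar condition that does not affect scoring.
# should: a full-text clause and a k-NN clause; at least one must match,
# and the scores of the matching clauses are added without normalization.
HYBRID_BODY='{
  "size": 5,
  "query": {
    "bool": {
      "filter": [
        { "range": { "price": { "lte": 100 } } }
      ],
      "should": [
        { "match": { "description": "waterproof hiking jacket" } },
        { "knn": { "embedding": { "vector": [0.1, 0.5, -0.3, 0.8], "k": 5 } } }
      ],
      "minimum_should_match": 1
    }
  }
}'

# POLARSEARCH_URL is a hypothetical variable, e.g. http://<endpoint>:<port>.
if [ -n "${POLARSEARCH_URL:-}" ]; then
  curl -X GET "$POLARSEARCH_URL/$INDEX/_search" -H "Content-Type: application/json" -d "$HYBRID_BODY"
else
  # No endpoint configured: print the body for inspection instead of sending it.
  printf '%s\n' "$HYBRID_BODY"
fi
```

Because text relevance scores and vector similarity scores are on different scales, a production setup would likely need score normalization or reranking on top of this simple sum.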

Real-time retrieval and aggregation analysis

Data becomes retrievable within hundreds of milliseconds after it is written. Operations such as complex conditional filtering, bucket aggregation, and top-K sorting are supported.
Built-in functions are provided for scenarios such as rolling-window calculations over time series data and geofence detection.
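The sketch below expresses geofence filtering with top-K sorting using only standard OpenSearch DSL: a `geo_distance` filter plus `sort` and `size`. It does not use PolarSearch's built-in scenario functions, which are not specified in this document. The index `vehicle-positions` and the fields `location` (assumed to be a `geo_point`), `speed`, and `@timestamp` are hypothetical, as is `POLARSEARCH_URL`.

```shell
# Hypothetical index of device position reports.
INDEX="vehicle-positions"

# Return the 3 fastest vehicles reported within 5 km of a point in the last 15 minutes.
GEO_BODY='{
  "size": 3,
  "query": {
    "bool": {
      "filter": [
        { "geo_distance": { "distance": "5km", "location": { "lat": 30.27, "lon": 120.15 } } },
        { "range": { "@timestamp": { "gte": "now-15m" } } }
      ]
    }
  },
  "sort": [ { "speed": { "order": "desc" } } ]
}'

# POLARSEARCH_URL is a hypothetical variable, e.g. http://<endpoint>:<port>.
if [ -n "${POLARSEARCH_URL:-}" ]; then
  curl -X GET "$POLARSEARCH_URL/$INDEX/_search" -H "Content-Type: application/json" -d "$GEO_BODY"
else
  # No endpoint configured: print the body for inspection instead of sending it.
  printf '%s\n' "$GEO_BODY"
fi
```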