PolarSearch - Elasticsearch-compatible search for PolarDB

PolarSearch is PolarDB's distributed search and analytics engine built on OpenSearch. Compatible with Elasticsearch and OpenSearch ecosystems, it delivers millisecond-level full-text search, vector search, and multimodal data analysis via APIs and SDKs. With the built-in AutoETL feature, PolarDB data is automatically aggregated and synchronized to PolarSearch in real time with no manual data pipelines required.

With PolarSearch, you can perform:

Full-text retrieval

curl -X GET "http://<endpoint>:<port>/articles/_search" -H "Content-Type:application/json" -d '
{
  "query": {
    "match": {
      "content": "PolarSearch"
    }
  }
}'

Vector retrieval

curl -X GET "http://<endpoint>:<port>/my-vector-index/_search" -H "Content-Type:application/json" -d '
{
  "size": 2,
  "query": {
    "knn": {
      "vector_field": {
        "vector": [0.1, 0.5, -0.3, 0.8],
        "k": 2
      }
    }
  }
}'
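
The two requests above can also be built from application code. The following is a minimal sketch using only the Python standard library. It assumes the endpoint accepts unauthenticated HTTP, which a real cluster may not. `build_search_request` is a hypothetical helper and the host name is a placeholder for `<endpoint>:<port>`.

```python
import json
import urllib.request


def build_search_request(base_url: str, index: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a _search request equivalent to the curl calls above."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",
    )


# Placeholder endpoint (assumption); substitute the real <endpoint>:<port>.
BASE_URL = "http://polarsearch.example.internal:9200"

# Full-text retrieval: same body as the first curl example.
full_text = build_search_request(
    BASE_URL, "articles", {"query": {"match": {"content": "PolarSearch"}}}
)

# Vector retrieval: same body as the second curl example.
vector = build_search_request(
    BASE_URL,
    "my-vector-index",
    {
        "size": 2,
        "query": {"knn": {"vector_field": {"vector": [0.1, 0.5, -0.3, 0.8], "k": 2}}},
    },
)

# To run a query against a live cluster:
# with urllib.request.urlopen(full_text) as resp:
#     hits = json.load(resp)["hits"]["hits"]
```

The requests are only constructed here. Sending them is left commented out because it needs a reachable cluster.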

Technical architecture

PolarSearch runs on PolarStore's shared distributed storage with a cloud-native compute-storage decoupled architecture. It integrates a proprietary search engine and distributed computing framework compatible with Elasticsearch DSL, storing, analyzing, and retrieving petabytes of multimodal data in real time.

Benefits

Improved efficiency: Eliminates data sync pipelines from MySQL or PostgreSQL to a search engine, reducing retrieval workload latency from minutes to milliseconds and cutting development time by 50%.
Cost optimization: Replaces the conventional "database + file storage + compute engine" architecture with PFS multi-level distributed shared storage, reducing TCO by 40%.
Business innovation: Powers AI infrastructure such as intelligent recommendations, RAG knowledge bases, and agent memory bases through unstructured data storage, mining, and vector retrieval.

Use cases

E-commerce platforms and SaaS

Fuzzy search, semantic matching, and personalized recommendations for product titles and product pages.
Real-time keyword analysis and sentiment mining in user-generated content (UGC).
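
As an illustration of fuzzy search on product titles, the sketch below builds a typo-tolerant `match` query body. It assumes PolarSearch accepts the standard OpenSearch `fuzziness` option. The `title` field and the helper name `fuzzy_title_query` are hypothetical.

```python
import json


def fuzzy_title_query(text: str, size: int = 10) -> dict:
    """Build a match query on a product title that tolerates small typos."""
    return {
        "size": size,
        "query": {
            "match": {
                "title": {  # hypothetical field name
                    "query": text,
                    "fuzziness": "AUTO",  # allowed edit distance depends on term length
                    "operator": "and",  # every term must match, allowing for typos
                }
            }
        },
    }


# Body for POST/GET /<index>/_search; the misspellings are intentional.
print(json.dumps(fuzzy_title_query("wireles hedphones"), indent=2))
```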

RAG knowledge bases and document management

Full-text indexing and retrieval across PDF, Word, and other document formats.
Vector-based image feature storage for visual search.

Agent memory and data management

Short-term memory for conversation context, session state, and temporary variables.
Long-term memory for user preferences, historical query content, and LLM parameters.

Log analysis and service monitoring

Real-time retrieval, statistical aggregation, and anomaly alerts across petabytes of log data.
Association analysis and visual reports across multi-dimensional log fields.
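
Statistical aggregation over logs might be expressed as a `date_histogram` with a nested `terms` aggregation. The sketch below assumes log documents with an `@timestamp` date field and a `level` keyword field. Both field names and the helper `log_levels_over_time` are illustrative.

```python
def log_levels_over_time(start: str, end: str, interval: str = "1m") -> dict:
    """Count log entries per time bucket, split by log level."""
    return {
        "size": 0,  # return aggregation buckets only, no individual hits
        "query": {
            "bool": {
                "filter": [{"range": {"@timestamp": {"gte": start, "lt": end}}}]
            }
        },
        "aggs": {
            "per_interval": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": interval},
                "aggs": {"by_level": {"terms": {"field": "level", "size": 5}}},
            }
        },
    }


body = log_levels_over_time("2024-01-01T00:00:00Z", "2024-01-01T01:00:00Z", "5m")
```

An alerting job could poll this query and compare the `ERROR` bucket counts against a threshold. The threshold logic itself is outside the scope of this sketch.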

IoT and real-time data streams

High-throughput ingestion and fast retrieval of time-series data from IoT devices.
Dynamic aggregation and multi-condition filtering of sensor data.
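
For high-throughput ingestion, OpenSearch-compatible engines usually accept batched writes through the `_bulk` API, which takes newline-delimited JSON. The sketch below assumes PolarSearch exposes that API unchanged. The index name, the field names (`device_id`, `ts`, `temperature`), and both helper functions are hypothetical.

```python
import json
from typing import Iterable


def to_bulk_body(index: str, readings: Iterable[dict]) -> str:
    """Serialize readings as NDJSON: one action line, then one document line."""
    lines = []
    for doc in readings:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk format requires every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


def sensor_filter_query(device_id: str, since: str, min_temp: float) -> dict:
    """Multi-condition filter plus an average over the matching readings."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"device_id": device_id}},
                    {"range": {"ts": {"gte": since}}},
                    {"range": {"temperature": {"gte": min_temp}}},
                ]
            }
        },
        "aggs": {"avg_temperature": {"avg": {"field": "temperature"}}},
    }


readings = [
    {"device_id": "sensor-1", "ts": "2024-01-01T00:00:00Z", "temperature": 21.5},
    {"device_id": "sensor-1", "ts": "2024-01-01T00:00:10Z", "temperature": 21.7},
]
bulk_body = to_bulk_body("sensor-readings", readings)  # POST to /_bulk
query_body = sensor_filter_query("sensor-1", "2024-01-01T00:00:00Z", 21.0)
```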

Core features

High availability and scalability

Automatic load balancing and seamless failover on single-node failure, achieving 99.99% service availability.
Online scaling of storage and compute to handle hundreds of millions of records.

Intelligent search engine

The optimizer identifies full-text index queries on InnoDB primary table data and routes them to search nodes.
Mixed indexes combining text segmentation, semantic vectorization, and numerical range improve query performance by over 10x.
Built-in Chinese NLP model with synonym expansion, pinyin correction, and intent recognition.
Multiple tokenization plugins with support for uploading and updating custom dictionaries.
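
To make the idea of combining text, vector, and numeric-range conditions concrete, the sketch below composes all three in one `bool` query. It assumes PolarSearch accepts a `knn` clause inside `bool`, as the OpenSearch k-NN plugin does. It does not show how PolarSearch's mixed indexes work internally. The field names `title`, `embedding`, and `price` and the helper `hybrid_query` are hypothetical.

```python
from typing import List


def hybrid_query(text: str, vector: List[float], min_price: float,
                 max_price: float, k: int = 10) -> dict:
    """Combine a full-text match, a k-NN clause, and a numeric range filter."""
    return {
        "size": k,
        "query": {
            "bool": {
                "must": [
                    {"match": {"title": text}},  # full-text condition
                    {"knn": {"embedding": {"vector": vector, "k": k}}},  # vector condition
                ],
                # Filters do not affect scoring; they only restrict the result set.
                "filter": [{"range": {"price": {"gte": min_price, "lte": max_price}}}],
            }
        },
    }


body = hybrid_query("running shoes", [0.1, 0.5, -0.3, 0.8], 50, 150, k=5)
```

With this shape the range filter is applied to the nearest-neighbor candidates, so a query can return fewer than `k` hits when few candidates fall in the price range.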

Multimodal data fusion

Unified storage and retrieval across scalar forward indexes, full-text inverted indexes, and vector indexes.
Extends storage, retrieval, and content parsing for unstructured data such as images and documents.
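
One way to picture unified storage is a single index whose mapping declares scalar, full-text, and vector fields side by side. The sketch below follows OpenSearch mapping conventions (`keyword`, `float`, `text`, `knn_vector`) and assumes PolarSearch accepts them as-is. The field names and the helper `multimodal_index_body` are illustrative.

```python
def multimodal_index_body(dimension: int) -> dict:
    """Index definition with scalar, full-text, and vector fields in one mapping."""
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    return {
        "settings": {"index": {"knn": True}},  # enable k-NN search on this index
        "mappings": {
            "properties": {
                "sku": {"type": "keyword"},  # scalar, exact match
                "price": {"type": "float"},  # scalar, range queries
                "description": {"type": "text"},  # full-text inverted index
                "embedding": {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


# Body for PUT /<index>; the dimension must match the embedding model's output size.
index_body = multimodal_index_body(4)
```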

Real-time retrieval and analytics

Data becomes searchable within hundreds of milliseconds of being written. Supports complex filtering, bucket statistics, and Top K sorting.
Built-in functions for time-series rolling window calculations and geofence detection.
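
Top K sorting combined with a circular geofence can be approximated with standard query DSL. The sketch below uses the OpenSearch `geo_distance` query and a `sort` clause. It is an assumption about how such a request might look, not a description of PolarSearch's built-in geofence functions. The `location` field (a `geo_point`), the `speed` field, and the helper `top_k_in_geofence` are hypothetical.

```python
def top_k_in_geofence(lat: float, lon: float, radius: str, k: int) -> dict:
    """Return the k fastest objects located within `radius` of a center point."""
    return {
        "size": k,  # Top K
        "query": {
            "bool": {
                "filter": [
                    {
                        "geo_distance": {
                            "distance": radius,  # e.g. "2km"
                            "location": {"lat": lat, "lon": lon},
                        }
                    }
                ]
            }
        },
        "sort": [{"speed": {"order": "desc"}}],
    }


body = top_k_in_geofence(31.23, 121.47, "2km", 3)
```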

Automated data aggregation and real-time sync (AutoETL)

Aggregates data from single or multiple PolarDB sources and syncs it to PolarSearch in real time.