PolarVector vector search engine - PolarDB for MySQL

AISearch for PolarDB for MySQL integrates vector similarity search into the database kernel, enabling efficient similarity searches on vectors generated from unstructured data such as text, images, and audio. You can build AI applications, such as semantic search, recommendation, and image retrieval, within your PolarDB cluster without a separate vector database or data synchronization pipeline. PolarDB supports two AISearch protocols: a fully compatible MySQL protocol, and the OpenSearch protocol (PolarSearch), which is compatible with the mainstream search ecosystem.

Key concepts

Vector embedding: Converts unstructured data (text, images) into numerical vectors using a pre-trained model. These vectors capture semantic meaning for similarity comparison.
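Similarity search (k-NN): Finds the k vectors in a dataset that are closest to a query vector. Closeness is measured with a distance or similarity metric, such as Euclidean (L2) distance, inner product, or cosine similarity; a shorter distance (or a higher similarity score) indicates greater semantic similarity.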
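Vector index: A data structure that narrows the search scope based on how vectors are distributed, so that a query is compared against a subset of candidates instead of every stored vector. This reduces query latency from seconds to milliseconds while maintaining high recall. Without an index, every query requires a brute-force comparison against the whole dataset, which is impractical for large datasets.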
Comparison of vector index algorithms: PolarDB supports multiple vector index algorithms. The two most common are HNSW and IVF, with different trade-offs:
- HNSW (Hierarchical Navigable Small World): Graph-based index with high recall and low latency, but higher memory overhead. Best when the dataset fits in memory and precision is critical.
- IVF (Inverted File): Clustering-based index that partitions vectors into clusters and, at query time, searches only the clusters closest to the query. Uses less memory than HNSW and suits large datasets in memory-constrained environments, with slightly lower precision.
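To make the difference between brute-force k-NN and an index concrete, the following is a minimal pure-Python sketch of the general IVF idea, not PolarDB's implementation or API. `exact_knn` compares the query with every stored vector, while the hypothetical `ToyIVFIndex` groups vectors with k-means and scans only the `n_probe` clusters whose centroids are nearest the query. It assumes L2 distance and random toy data; all names and parameters are illustrative.

```python
import math
import random

def l2(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_knn(data, query, k):
    """Brute-force k-NN: compare the query with every stored vector."""
    return sorted(range(len(data)), key=lambda i: l2(data[i], query))[:k]

def kmeans(data, n_clusters, iters=10, seed=0):
    """Plain k-means that returns the cluster centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, n_clusters)]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for v in data:
            nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
            groups[nearest].append(v)
        for c, members in enumerate(groups):
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

class ToyIVFIndex:
    """Hypothetical toy IVF index: one inverted list of vector IDs per cluster."""

    def __init__(self, data, n_lists):
        self.data = data
        self.centroids = kmeans(data, n_lists)
        self.lists = [[] for _ in self.centroids]
        for i, v in enumerate(data):
            self.lists[self._nearest_lists(v, 1)[0]].append(i)

    def _nearest_lists(self, v, n):
        """IDs of the n clusters whose centroids are closest to v."""
        order = sorted(range(len(self.centroids)), key=lambda c: l2(v, self.centroids[c]))
        return order[:n]

    def search(self, query, k, n_probe):
        """Scan only the n_probe nearest clusters; may return fewer than k IDs."""
        candidates = [i for c in self._nearest_lists(query, n_probe) for i in self.lists[c]]
        candidates.sort(key=lambda i: l2(self.data[i], query))
        return candidates[:k]

rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(1000)]
query = [rng.random() for _ in range(8)]
index = ToyIVFIndex(data, n_lists=20)
exact = exact_knn(data, query, k=10)
approx = index.search(query, k=10, n_probe=3)
print("recall@10:", len(set(exact) & set(approx)) / 10)
```

Raising `n_probe` scans more clusters, which raises recall at the cost of latency; when `n_probe` equals the number of clusters, every vector is scanned and the result matches the exact search.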

Key advantages

The PolarDB vector engine combines relational database usability with dedicated vector database performance.

Feature	Traditional relational database	Dedicated vector database	PolarDB vector engine
SQL support	Supported	Not supported	Supported
Vector search	Poor performance	High performance	High performance
Learning curve	Low	High	Low
Ecosystem compatibility	Rich	Limited	Dual ecosystem
Scalability	Limited	Good	Excellent
O&M complexity	Simple	Complex	Simple

Core advantages:

All-in-one solution: Manage business and vector data in a single PolarDB cluster. No separate vector database is needed, which simplifies the architecture and lowers O&M costs.
Enterprise-grade reliability: Full ACID transaction support for data consistency, with high availability and automatic failover.
Seamless LLM integration: Built-in Qwen LLM inference simplifies AI application development.

Capabilities and performance

Search performance

Latency: P99 < 10 ms, P95 < 5 ms.
Throughput: > 10,000 QPS per node.
Precision: Recall > 99%.
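These figures rely on standard search metrics. The following minimal sketch shows how they are commonly computed, assuming that recall means recall@k against exact brute-force results and that percentiles use the nearest-rank method; the document does not state PolarDB's benchmark methodology, and the latency values below are made up for illustration.

```python
import math

def recall_at_k(exact_ids, approx_ids):
    """Fraction of the true k nearest neighbors that the approximate search returned."""
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)

def percentile(latencies_ms, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples at or below it."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical per-query latencies in milliseconds.
latencies = [2.1, 3.4, 1.8, 4.9, 2.7, 9.5, 3.0, 2.2, 4.1, 2.5]
print(percentile(latencies, 50), percentile(latencies, 95))  # 2.7 9.5
print(recall_at_k([1, 2, 3, 4], [1, 2, 3, 9]))  # 0.75
```

With only a handful of samples, tail percentiles such as P95 and P99 collapse to the slowest query, so they are meaningful only over large query sets.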

Scalability

Data scale: Petabytes of vector data; billions of vectors searchable.
Concurrency: Tens of thousands of concurrent queries.
Cluster size: Scales dynamically to hundreds of nodes with intelligent data sharding.

Resource efficiency

Storage compression: Reduces vector data storage by more than 50%.
Memory usage: Supports terabyte-scale graph indexes through layered caching.
CPU utilization: Exceeds 80% via multi-core parallelism.

Use cases

PolarDB offers two protocols. Choose based on your technology stack and requirements.

Aspect	MySQL protocol	OpenSearch protocol
Access method	Standard SQL	RESTful API (compatible with Elasticsearch/OpenSearch)
Key advantages	Integration with business data: Add vector columns to existing tables with full ACID transaction support and a low learning curve.	Hybrid search capabilities: Combine vectors, full-text search, and scalars in a single query, backed by a mature ecosystem.
Underlying dependency	Depends on In-Memory Column Index (IMCI). Runs on IMCI read-only nodes to isolate analytical from transactional workloads.	Runs on an independent search node (PolarSearch) to provide a search engine-like service.
Data synchronization	Not required. Data written to the primary database is automatically visible to the IMCI read-only nodes.	Not required. Data written to the primary database is automatically visible to the search node.

Intelligent Q&A and customer service bots

Business challenge: Keyword matching fails to capture user intent, leading to inaccurate answers.
Solution: Convert knowledge base Q&A pairs into vectors. When a user asks a question, convert it to a vector and use similarity search to find the most relevant answers.
Protocol recommendation:
- MySQL protocol: Best for basic semantic matching with quick integration into existing MySQL applications.
- OpenSearch protocol: Best for complex filtering with keywords, categories, or other criteria using hybrid search.
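For the hybrid case, the following sketch builds the request bodies, assuming PolarSearch accepts the standard OpenSearch k-NN plugin syntax: a `knn_vector` field mapping and a `knn` query with a `filter` clause. The index name `faq_index`, the fields `embedding`, `category`, and `answer`, the dimension, and the vector values are all hypothetical; in practice, the vectors come from your embedding model. The code only builds JSON; sending it requires an HTTP client pointed at your PolarSearch endpoint.

```python
import json

# Hypothetical index definition using OpenSearch k-NN plugin mapping syntax.
create_index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 4,  # must match the embedding model's output size
                "method": {"name": "hnsw", "space_type": "l2", "engine": "lucene"},
            },
            "category": {"type": "keyword"},
            "answer": {"type": "text"},
        }
    },
}

def hybrid_knn_query(query_vector, k, category):
    """k-NN query restricted to documents whose category matches (filter inside knn)."""
    return {
        "size": k,
        "query": {
            "knn": {
                "embedding": {
                    "vector": query_vector,
                    "k": k,
                    "filter": {"term": {"category": category}},
                }
            }
        },
    }

body = hybrid_knn_query([0.12, 0.80, 0.33, 0.05], k=5, category="billing")
# Intended use: PUT /faq_index with create_index_body, then POST /faq_index/_search with body.
print(json.dumps(body, indent=2))
```

Placing the `filter` inside the `knn` clause asks the engine to apply the category restriction during the vector search rather than discarding non-matching hits afterward, so up to `k` matching results can still be returned.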

Personalized recommendation systems

Business challenge: Recommending products or content based on user behavior (views, clicks, purchases).
Solution: Represent users and items (products, articles, videos) as vectors. Calculate similarity to retrieve candidates, then pass them to a ranking model.
Protocol recommendation:
- OpenSearch protocol: IVF indexing and Product Quantization (PQ) handle large-scale retrieval while controlling memory costs.
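To show why PQ controls memory, the following is a minimal pure-Python sketch of the general technique, not PolarDB's implementation. Each vector is split into `m` sub-vectors; each sub-vector is replaced by the ID of its nearest centroid in a per-subspace codebook; and query distances are approximated from the codebooks. The class `ToyPQ`, its parameters, and the random data are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=10, seed=0):
    """Plain k-means that returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            groups[min(range(k), key=lambda c: sq_dist(p, centroids[c]))].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

class ToyPQ:
    """Hypothetical toy product quantizer with one codebook per subspace."""

    def __init__(self, data, m, n_centroids):
        assert n_centroids <= 256, "each sub-code is stored in one byte"
        self.m = m
        self.sub_dim = len(data[0]) // m  # assumes the dimension is divisible by m
        self.codebooks = [kmeans([self._split(v)[j] for v in data], n_centroids, seed=j)
                          for j in range(m)]

    def _split(self, v):
        return [v[j * self.sub_dim:(j + 1) * self.sub_dim] for j in range(self.m)]

    def encode(self, v):
        """Replace each sub-vector with the index of its nearest codebook centroid."""
        return bytes(min(range(len(book)), key=lambda c: sq_dist(sub, book[c]))
                     for sub, book in zip(self._split(v), self.codebooks))

    def approx_sq_dist(self, query, code):
        """Asymmetric distance: exact query sub-vectors vs. centroids named by the code."""
        return sum(sq_dist(sub, book[c])
                   for sub, book, c in zip(self._split(query), self.codebooks, code))

rng = random.Random(7)
dim, m = 16, 4
data = [[rng.random() for _ in range(dim)] for _ in range(500)]
pq = ToyPQ(data, m=m, n_centroids=16)
codes = [pq.encode(v) for v in data]
query = [rng.random() for _ in range(dim)]
top5 = sorted(range(len(data)), key=lambda i: pq.approx_sq_dist(query, codes[i]))[:5]
print(len(codes[0]), "bytes per code vs", dim * 4, "bytes as float32")  # 4 bytes per code vs 64 bytes as float32
```

In this toy setting, each 16-dimensional float32 vector shrinks from 64 bytes to a 4-byte code. The trade-off is that distances, and therefore rankings, become approximate. Combined with IVF, only the codes in the probed clusters need to be scanned.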

Search by image and multi-modal retrieval

Business challenge: Finding similar images by uploading an image, or finding images from text descriptions.
Solution: Store image and text vectors in PolarDB. Convert any input (image or text) to a query vector for similarity search.
Protocol recommendation:
- MySQL protocol: Essential when you need transactional consistency between image vectors and business data (product IDs, prices).

FAQ

What are the advantages of the PolarDB vector engine compared to a dedicated vector database, such as Milvus?

Integrated architecture and lower total cost of ownership (TCO).

No data synchronization required: Vector and business data reside in one system, eliminating the data pipelines of traditional "MySQL + vector database" architectures.
Simplified technology stack: No separate database system needed, lowering development, O&M, and training costs.
Transactional consistency: The MySQL protocol lets you complete vector and business data operations in a single transaction.

Why does AISearch with the MySQL protocol require IMCI read-only nodes?

AISearch runs compute-intensive analytical queries. Running them on IMCI read-only nodes separates them from the OLTP workloads that typically run on the primary node, which provides both performance optimization and workload isolation:

Performance optimization: IMCI read-only nodes use columnar storage and parallel computing to accelerate vector queries.
Workload isolation: Physical isolation prevents vector queries from affecting the stability and response time of core services.