AISearch for PolarDB for MySQL integrates vector similarity search into the database kernel, enabling efficient similarity searches on vectors generated from unstructured data, such as text, images, and audio. Build AI applications — semantic search, recommendations, and image retrieval — within your PolarDB cluster without a separate vector database or data sync pipeline. PolarDB supports two AISearch protocols: a fully compatible MySQL protocol and the OpenSearch protocol (PolarSearch), which is compatible with the mainstream search ecosystem.
Key concepts
- Vector embedding: Converts unstructured data (text, images) into numerical vectors using a pre-trained model. These vectors capture semantic meaning for similarity comparison.
- Similarity search (k-NN): Finds the k vectors closest to a query vector in a dataset. Closeness is measured with a distance metric such as Euclidean distance or cosine distance; a smaller distance indicates greater semantic similarity.
- Vector index: A data structure that narrows the search scope based on vector distribution, reducing query latency from seconds to milliseconds while maintaining high recall. Required to avoid brute-force comparison in large datasets.
- Comparison of vector index algorithms: PolarDB supports multiple vector index algorithms. The two most common are HNSW and IVF, with different trade-offs:
  - HNSW (Hierarchical Navigable Small World): Graph-based index with high recall and low latency, but higher memory overhead. Best when the dataset fits in memory and precision is critical.
  - IVF (Inverted File): Clustering-based index with lower memory usage. Suited for large datasets in memory-constrained environments, with slightly lower precision than HNSW.
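The difference between brute-force k-NN and an index that narrows the search scope can be sketched in a few lines of plain Python. This is an illustration of the general IVF idea only, not PolarDB's implementation: the function names, the toy k-means build, and the parameters `n_lists` and `nprobe` are all assumptions chosen for the example.

```python
import math
import random


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exact_knn(query, vectors, k):
    """Brute-force k-NN: score every vector, return the ids of the k nearest."""
    return sorted(range(len(vectors)), key=lambda i: l2(query, vectors[i]))[:k]


def assign(vectors, centroids):
    """Put each vector id into the inverted list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for i, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(i)
    return lists


def build_ivf(vectors, n_lists, iters=5, seed=0):
    """Toy IVF build: a few k-means rounds, then the final inverted lists."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_lists)]
    dim = len(vectors[0])
    for _ in range(iters):
        for c, members in enumerate(assign(vectors, centroids)):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [
                    sum(vectors[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
    return centroids, assign(vectors, centroids)


def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    candidates.sort(key=lambda i: l2(query, vectors[i]))
    return candidates[:k]


# Synthetic data, for illustration only.
rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(500)]
query = [rng.random() for _ in range(8)]

truth = exact_knn(query, data, k=10)
centroids, lists = build_ivf(data, n_lists=16)
approx = ivf_search(query, data, centroids, lists, k=10, nprobe=4)
recall = len(set(truth) & set(approx)) / len(truth)
print(f"recall@10 with nprobe=4: {recall:.2f}")
```

Raising `nprobe` scans more lists, which increases recall and latency together; probing every list reproduces the exact result. HNSW makes a similar trade-off with a graph traversal instead of cluster lists.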
Key advantages
The PolarDB vector engine combines relational database usability with dedicated vector database performance.
| Feature | Traditional relational database | Dedicated vector database | PolarDB vector engine |
| --- | --- | --- | --- |
| SQL support | Supported | Not supported | Supported |
| Vector search | Poor performance | High performance | High performance |
| Learning curve | Low | High | Low |
| Ecosystem compatibility | Rich | Limited | Dual ecosystem |
| Scalability | Limited | Good | Excellent |
| O&M complexity | Simple | Complex | Simple |
Core advantages:
- All-in-one solution: Manage business and vector data in a single PolarDB cluster. No separate vector database is needed, which simplifies the architecture and lowers O&M costs.
- Enterprise-grade reliability: Full ACID transaction support for data consistency, with high availability and automatic failover.
- Seamless LLM integration: Built-in Qwen LLM inference simplifies AI application development.
Capabilities and performance
Search performance
- Latency: P99 < 10 ms, P95 < 5 ms.
- Throughput: > 10,000 QPS per node.
- Precision: Recall > 99%.
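For readers who want to check figures like these against their own workload, the sketch below shows one common way to compute P95/P99 latency and recall@k. The nearest-rank percentile definition, the function names, and the sample latencies are assumptions made for illustration; the source does not state how its numbers were measured.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the approximate search also returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Hypothetical per-query latencies in milliseconds (100 queries).
latencies_ms = [2.0] * 90 + [4.0] * 6 + [8.0] * 3 + [20.0]

p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(f"P95 = {p95} ms, P99 = {p99} ms")

# Hypothetical result ids: the index missed one of the four true neighbors.
print(f"recall@4 = {recall_at_k([1, 2, 3, 9], [1, 2, 3, 4])}")
```

In this sample a single 20 ms outlier does not affect P99, which is why tail percentiles are usually reported together with the sample size.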
Scalability
- Data scale: Petabytes of vector data; billions of vectors searchable.
- Concurrency: Tens of thousands of concurrent queries.
- Cluster size: Scales dynamically to hundreds of nodes with intelligent data sharding.
Resource efficiency
- Storage compression: Over 50% vector data compression.
- Memory usage: Supports terabyte-scale graph indexes through layered caching.
- CPU utilization: Exceeds 80% through multi-core parallelism.
Use cases
PolarDB offers two protocols. Choose based on your technology stack and requirements.
| Aspect | MySQL protocol | OpenSearch protocol |
| --- | --- | --- |
| Access method | Standard SQL | RESTful API (compatible with Elasticsearch/OpenSearch) |
| Key advantages | Integration with business data: Add vector columns to existing tables with full ACID transaction support and a low learning curve. | Hybrid search capabilities: Combine vectors, full-text search, and scalars in a single query, backed by a mature ecosystem. |
| Underlying dependency | Depends on In-Memory Column Index (IMCI). Runs on IMCI read-only nodes to isolate analytical from transactional workloads. | Runs on an independent search node (PolarSearch) to provide a search engine-like service. |
| Data synchronization | Not required. Data written to the primary database is automatically visible to the IMCI read-only nodes. | Not required. Data written to the primary database is automatically visible to the search node. |
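To make the hybrid search idea in the OpenSearch protocol column more concrete, the sketch below builds a request body in the open-source OpenSearch query DSL that combines a k-NN clause, a full-text clause, and a scalar filter. The field names `embedding`, `content`, and `category` are hypothetical, and whether PolarSearch accepts this body unchanged is an assumption based on the stated OpenSearch compatibility, so check the PolarSearch reference before relying on it.

```python
import json


def hybrid_knn_body(query_vector, keywords, category, k=10):
    """Build a search body that mixes vector, full-text, and scalar conditions."""
    return {
        "size": k,
        "query": {
            "bool": {
                "must": [
                    # Vector clause: nearest neighbors on a hypothetical vector field.
                    {"knn": {"embedding": {"vector": query_vector, "k": k}}},
                    # Full-text clause on a hypothetical text field.
                    {"match": {"content": keywords}},
                ],
                # Scalar filter on a hypothetical keyword field.
                "filter": [{"term": {"category": category}}],
            }
        },
    }


body = hybrid_knn_body([0.12, -0.03, 0.88], "refund policy", "billing", k=5)
print(json.dumps(body, indent=2))
```

With a standard OpenSearch client, a body like this would typically be sent as a POST request to the `_search` endpoint of the target index.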
Intelligent Q&A and customer service bots
- Business challenge: Keyword matching fails to capture user intent, leading to inaccurate answers.
- Solution: Convert knowledge base Q&A pairs into vectors. When a user asks a question, convert it to a vector and use similarity search to find the most relevant answers.
- Protocol recommendation:
  - MySQL protocol: Best for basic semantic matching with quick integration into existing MySQL applications.
  - OpenSearch protocol: Best for complex filtering with keywords, categories, or other criteria using hybrid search.
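The Q&A flow described above (embed the knowledge base, embed the question, return the nearest entries) can be sketched end to end in plain Python. Everything here is illustrative: `toy_embed` is a bag-of-words stand-in for a real pre-trained embedding model, the knowledge base entries are invented, and the in-memory list plays the role that a vector column and vector index would play in PolarDB.

```python
import math

# Hypothetical knowledge base of (question, answer) pairs.
knowledge_base = [
    ("How do I reset my password?", "Open Settings, choose Security, then select Reset password."),
    ("How do I request a refund?", "Submit a refund request from the Orders page."),
    ("How do I change my shipping address?", "Edit the address under Account, Addresses."),
]


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (t.strip(".,?!") for t in text.lower().split())
    return [t for t in tokens if t]


# Vocabulary built from the stored questions; one vector dimension per word.
vocab = {}
for question, _ in knowledge_base:
    for token in tokenize(question):
        vocab.setdefault(token, len(vocab))


def toy_embed(text):
    """Stand-in for an embedding model: unit-length bag-of-words vector."""
    vec = [0.0] * len(vocab)
    for token in tokenize(text):
        if token in vocab:  # words outside the vocabulary are ignored
            vec[vocab[token]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity; inputs are already unit-length, so this is a dot product."""
    return sum(x * y for x, y in zip(a, b))


# "Index" step: embed every stored question once.
index = [(toy_embed(q), q, a) for q, a in knowledge_base]


def answer(question, k=1):
    """Embed the user question and return the k most similar Q&A pairs."""
    qv = toy_embed(question)
    ranked = sorted(index, key=lambda row: cosine(qv, row[0]), reverse=True)
    return [(q, a) for _, q, a in ranked[:k]]


print(answer("I want to reset my password")[0][1])
```

A real embedding model would also match paraphrases that share no words with the stored question, which is the gap in keyword matching that this use case addresses.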
Personalized recommendation systems
- Business challenge: Recommending products or content based on user behavior (views, clicks, purchases).
- Solution: Represent users and items (products, articles, videos) as vectors. Calculate similarity to retrieve candidates, then pass them to a ranking model.
- Protocol recommendation:
  - OpenSearch protocol: IVF indexing and Product Quantization (PQ) handle large-scale retrieval while controlling memory costs.
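A back-of-the-envelope estimate shows why Product Quantization helps control memory at recommendation scale. The sketch assumes float32 source vectors and standard PQ, where each vector is split into `m` sub-vectors and each sub-vector is stored as an `nbits` code. The item count, dimension, and PQ parameters are hypothetical, and the estimate ignores ids and IVF list overhead.

```python
def raw_bytes(n, dim, bytes_per_value=4):
    """Memory for n uncompressed vectors of the given dimension (float32 by default)."""
    return n * dim * bytes_per_value


def pq_bytes(n, dim, m, nbits=8, bytes_per_value=4):
    """Memory for PQ codes plus codebooks: m sub-quantizers with 2**nbits centroids each."""
    if dim % m != 0:
        raise ValueError("dim must be divisible by m")
    code_bytes = (n * m * nbits + 7) // 8  # one nbits code per sub-vector
    codebook_bytes = m * (2 ** nbits) * (dim // m) * bytes_per_value
    return code_bytes + codebook_bytes


# Hypothetical catalog: 100 million items with 768-dimensional embeddings.
n_items, dim = 100_000_000, 768
raw = raw_bytes(n_items, dim)
compressed = pq_bytes(n_items, dim, m=96, nbits=8)

print(f"raw float32: {raw / 1e9:.1f} GB")
print(f"PQ (m=96, 8 bits): {compressed / 1e9:.1f} GB")
print(f"reduction: about {raw / compressed:.0f}x")
```

The trade-off is that distances are computed on approximated vectors, which is one reason retrieved candidates are usually passed to a ranking model afterward.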
Search by image and multi-modal retrieval
- Business challenge: Finding similar images by uploading an image, or finding images from text descriptions.
- Solution: Store image and text vectors in PolarDB. Convert any input (image or text) to a query vector for similarity search.
- Protocol recommendation:
  - MySQL protocol: Essential when you need transactional consistency between image vectors and business data (product IDs, prices).