AnalyticDB for PostgreSQL provides Retrieval-Augmented Generation (RAG) as a managed service. It handles document ingestion, embedding, hybrid search, and reranking so you can build RAG applications without managing the underlying infrastructure.
At a glance:
Ingest documents in 10+ formats — PDF, DOCX, images with OCR, and more — with automatic chunking and embedding.
Run hybrid search (dense vectors, sparse vectors, and full-text search in a single query) to maximize recall.
Rerank results with BAAI General Embedding (BGE) models or large language models (LLMs) before the generation step.
Integrate with LlamaIndex and LangChain, or call the APIs directly from Python, Java, Go, Node.js, PHP, or C#.
Store all document data in your own AnalyticDB for PostgreSQL instance with disk encryption, SSL encryption, and namespace-level multi-tenancy isolation.
How the service works
The RAG service consists of three modules:
Augmented data processing — Documents are parsed by format-specific extractors and split into chunks. Each chunk is converted into vectors (dense, sparse, or multimodal) and stored in your instance.
Augmented semantic search — Incoming queries are analyzed semantically and matched against stored chunks using hybrid search, two-way retrieval, or fusion query, depending on your configuration.
Augmented retrieval — Retrieved chunks are scored and reranked using fine-grained ranking algorithms to improve relevance and diversity before being returned to the generation step.
Key benefits
Quality of augmented generation — By grounding language model output in retrieved search results, the service keeps generated content accurate and close to the underlying data, improving the quality and credibility of answers.
Large-scale knowledge fusion — The service draws on a variety of data sources, such as enterprise knowledge bases and public web resources, so generated content covers a wider range of knowledge and meets the information requirements of different scenarios.
Flexible APIs — Easy-to-use APIs integrate into existing systems without requiring deep AI expertise, enabling scenarios such as intelligent chatbots, automatic report generation, and content creation.
Continuous optimization and learning — The service continuously learns from new data and customer feedback and automatically adjusts its optimization policies, maintaining answer quality and user experience as requirements and environments change.
Key capabilities
Compatibility
Connect to the RAG service using the method that fits your stack:
Direct API access — Read and write vector data, upload pre-split documents, or upload raw documents for automatic processing.
SDK support — Python, Java, Go, Node.js, PHP, and C#.
RAG framework support — LlamaIndex and LangChain.
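For direct API access, requests are ordinary JSON bodies. The sketch below builds such a body for uploading a raw document for automatic processing; the field names other than `ChunkSize` and `ChunkOverlap` (which are documented chunking parameters) are illustrative assumptions, not the actual API schema — consult the API reference for the real request format.

```python
import json


def build_upload_request(collection: str, namespace: str, file_name: str,
                         chunk_size: int = 200, chunk_overlap: int = 40) -> str:
    """Assemble a hypothetical document-upload request body as JSON."""
    body = {
        "Collection": collection,      # illustrative field name
        "Namespace": namespace,        # illustrative field name
        "FileName": file_name,         # illustrative field name
        "ChunkSize": chunk_size,       # documented chunking parameter
        "ChunkOverlap": chunk_overlap, # documented chunking parameter
    }
    return json.dumps(body)
```

The same body shape would be produced for you by the language SDKs; the raw-JSON view is shown only to make the request structure concrete.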
Document processing
The service extracts text using format-specific extractors, then splits each document into chunks for embedding.
Supported input formats:
| Format | Extractor |
|---|---|
| PNG, JPG, JPEG, BMP | Optical Character Recognition (OCR) |
| Scanned PDF | OCR (image-with-text metadata is added automatically) |
| PDF (non-scanned) | Python bindings such as PyMuPDF |
| HTML, MARKDOWN, JSON, CSV, DOCX, PPTX, TXT | Text extractor |
Chunking options:
Control chunk size and overlap with the `ChunkSize` and `ChunkOverlap` parameters to keep each chunk within the embedding model's token limits.
Specify custom separators to split text at logical boundaries.
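To make the two chunking controls concrete, here is a simplified sketch of how size/overlap and separator-based splitting typically behave. This is an illustration of the concepts, not the service's internal chunker:

```python
def chunk_text(text: str, chunk_size: int = 200, chunk_overlap: int = 40) -> list[str]:
    """Sliding-window chunking: each chunk starts chunk_size - chunk_overlap
    characters after the previous one, so neighbouring chunks share context."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def chunk_by_separator(text: str, separator: str = "\n\n",
                       chunk_size: int = 200) -> list[str]:
    """Split at a logical boundary (e.g. paragraph breaks), then pack
    consecutive pieces together without exceeding chunk_size."""
    chunks: list[str] = []
    current = ""
    for part in text.split(separator):
        if current and len(current) + len(separator) + len(part) > chunk_size:
            chunks.append(current)
            current = part
        else:
            current = current + separator + part if current else part
    if current:
        chunks.append(current)
    return chunks
```

Overlap trades some storage and embedding cost for robustness: a sentence that straddles a chunk boundary still appears whole in at least one chunk.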
Embedding
Text embedding models: M3E, Text2Vec, Tongyi
Supported dimensions: 512, 768, 1024, 1536
Multimodal image embedding models: CLIP, Tongyi
Supported dimensions: 512, 640, 768, 1024, 1536
Search
Basic search modes:
| Mode | Description |
|---|---|
| Hybrid search | Searches dense vectors and sparse vectors simultaneously |
| Two-way retrieval | Runs full-text search and vector search simultaneously |
| Fusion query | Filters by conditions first, then performs vector search |
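The fusion-query mode (filter first, then vector search) can be sketched as follows. Cosine similarity stands in for whatever distance metric the instance is configured with, and the in-memory list stands in for the stored collection:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def fusion_query(docs, query_vec, predicate, top_k=3):
    """Step 1: apply the scalar filter (e.g. metadata conditions).
    Step 2: rank the surviving chunks by vector similarity."""
    candidates = [d for d in docs if predicate(d)]
    candidates.sort(key=lambda d: cosine(d["vec"], query_vec), reverse=True)
    return candidates[:top_k]
```

Filtering before the vector search shrinks the candidate set, which is why fusion query is the natural mode when queries always carry structured conditions (tenant, date range, document type).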
Multi-way ranking algorithms:
| Algorithm | How it ranks |
|---|---|
| Reciprocal rank fusion (RRF) | Uses result positions, not scores |
| Weight | Uses scores, not positions |
| Cascaded | Uses full-text search as a filter, then runs top-K vector search |
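A minimal sketch of the first two fusion algorithms, assuming the standard RRF formula (score = Σ 1/(k + rank), with k = 60 as a common default — not necessarily the service's constant):

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: combine ranked ID lists using positions only."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def weighted_fusion(score_maps: list[dict], weights: list[float]) -> list[str]:
    """Weight-based fusion: combine per-retriever scores with fixed weights."""
    scores: dict[str, float] = {}
    for smap, w in zip(score_maps, weights):
        for doc_id, s in smap.items():
            scores[doc_id] = scores.get(doc_id, 0.0) + w * s
    return sorted(scores, key=scores.get, reverse=True)
```

RRF needs no score normalization because it ignores scores entirely, which makes it robust when the full-text and vector retrievers score on incompatible scales; weight-based fusion gives finer control when the scores are comparable.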
Augmented search:
Fine-grained ranking — Retrieves more than top-K chunks, then scores and reranks them using BGE models or LLMs.
Window retrieval — Returns several chunks before and after each matched chunk to preserve context that chunking may have split across boundaries.
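Window retrieval amounts to expanding each hit into a neighbourhood of adjacent chunks. A sketch of the idea (indices stand in for the service's internal chunk ordering):

```python
def window_retrieve(chunks: list[str], hit_indices: list[int],
                    window: int = 1) -> list[str]:
    """For each matched chunk, also return `window` neighbours on each side,
    deduplicated and in document order."""
    keep: set[int] = set()
    for i in hit_indices:
        keep.update(range(max(0, i - window), min(len(chunks), i + window + 1)))
    return [chunks[i] for i in sorted(keep)]
```

This recovers context that chunking may have split across boundaries: an answer whose supporting sentence landed in the chunk just before or after the matched one is still passed to the generation step.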
Security
| Area | Details |
|---|---|
| Data privacy | Document data is stored in your AnalyticDB for PostgreSQL instance. Protect it with disk encryption, SSL encryption, and IP address whitelists. You can destroy the data or disable access to it at any time. |
| Multi-tenancy isolation | Use namespaces — similar to database schemas — to isolate document collections within a single instance across organizational boundaries. |
| Authentication | Resource Access Management (RAM) authentication and instance username/password authentication. |
Use cases
Intelligent customer service — Generate professional, personalized answers by grounding responses in your enterprise knowledge base.
Content creation — Produce articles, news summaries, and product descriptions based on preset themes or styles.
Knowledge management — Automatically organize and summarize documents to surface easy-to-learn knowledge points and accelerate team knowledge sharing.
Education and training — Generate custom teaching materials — exercises, case studies — tailored to student needs and course content.
Patent search — Apply text processing optimizers to patent documents for high-quality patent similarity search.