Building a retrieval-augmented generation (RAG) application requires moving data from source databases into vector stores—a pipeline that typically involves custom extraction code, file staging, chunking logic, and embedding calls. AI data preparation is a feature of Data Transmission Service (DTS) that handles this entire pipeline, connecting directly to your source database and delivering vectorized data to a vector database or data lakehouse without intermediate storage or manual exports.
Use AI data preparation to power enterprise knowledge bases, AI-assisted content creation, and intelligent customer service systems.
How it works
AI data preparation runs a four-stage ingestion pipeline from your source database to the destination vector store:
- Connect: DTS connects directly to the source database and pulls both full and incremental data. No file exports or manual uploads are required.
- Parse: Raw data, whether unstructured documents or structured relational tables, is parsed into a processable format.
- Chunk: Each document is split into segments sized for the embedding model.
- Embed: Each chunk is converted into a vector embedding and written to the destination vector database.
As the source data changes, DTS keeps the destination in sync through incremental updates.
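The four stages above can be illustrated with a minimal, self-contained sketch. It is not the DTS implementation: the function names (`parse`, `chunk`, `embed`, `run_pipeline`), the fixed-size character chunking, the in-memory dictionary standing in for a vector database, and the hash-based "embedding" are all assumptions made for illustration. A real pipeline would call an embedding model and write to the destination vector database.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Chunk:
    doc_id: str
    index: int
    text: str


def parse(row: dict) -> str:
    """Parse stage (hypothetical): flatten a structured row into plain text."""
    return "\n".join(f"{key}: {value}" for key, value in row.items() if key != "id")


def chunk(doc_id: str, text: str, size: int = 200, overlap: int = 20) -> list[Chunk]:
    """Chunk stage (hypothetical): fixed-size character windows with overlap."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks: list[Chunk] = []
    start = 0
    while start < len(text):
        chunks.append(Chunk(doc_id, len(chunks), text[start:start + size]))
        if start + size >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks


def embed(text: str, dim: int = 8) -> list[float]:
    """Embed stage stand-in: a deterministic hash-based vector, NOT a real model."""
    if not 1 <= dim <= 32:
        raise ValueError("dim must be between 1 and 32 for a SHA-256 digest")
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]


def run_pipeline(rows: Iterable[dict]) -> dict[tuple[str, int], list[float]]:
    """Run parse -> chunk -> embed and collect vectors in an in-memory store."""
    store: dict[tuple[str, int], list[float]] = {}
    for row in rows:  # Connect stage: rows would be read from the source database
        doc_id = str(row["id"])
        for piece in chunk(doc_id, parse(row)):
            store[(piece.doc_id, piece.index)] = embed(piece.text)
    return store


if __name__ == "__main__":
    sample_rows = [{"id": 1, "title": "Returns", "body": "Items can be returned. " * 30}]
    vectors = run_pipeline(sample_rows)
    print(f"stored {len(vectors)} chunk vectors")
```

Keying each vector by `(doc_id, index)` is one simple way to trace a chunk back to its source row. The key layout that DTS actually uses is not described in this document.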
Use cases
- Enterprise knowledge bases: Continuously ingest and update company documents, wikis, and structured records so your RAG application always retrieves current information.
- AI-assisted content creation: Feed a mixed corpus of unstructured documents and structured data into a unified retrieval index.
- Intelligent customer service: Connect support documentation and CRM records directly to a RAG pipeline without building custom ingestion code.
Supported data flows
Data preparation tasks
| Source | Destination |
|---|---|
| MySQL | AnalyticDB for PostgreSQL |
RAGFlow knowledge base
| Vector database | Configuration guide | Tutorials |
|---|---|---|
| AnalyticDB for PostgreSQL | Build and use a DTS RAGFlow knowledge base | |
| Lindorm | | |
Supported regions
- Data preparation tasks: See List of supported regions.
- RAGFlow knowledge bases: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), and China (Hong Kong).
Limitations
Data preparation tasks
- Cross-region tasks are not supported.
- Create the required table schemas in the destination database before starting a task.
- Overwriting existing data in the destination database is not supported.
RAGFlow knowledge bases
- Only the virtual private cloud (VPC) network type is supported.
- The VPC, vector database, and OSS bucket must be in the same region.
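The RAGFlow constraints can be checked before you create a knowledge base. The following is a minimal pre-flight sketch, not a DTS API: the function `check_ragflow_placement` and its parameters are hypothetical, and the region IDs are assumed to be the standard identifiers for the regions listed under Supported regions.

```python
# Assumed region IDs for China (Hangzhou), (Shanghai), (Beijing),
# (Shenzhen), and (Hong Kong).
SUPPORTED_REGIONS = {
    "cn-hangzhou",
    "cn-shanghai",
    "cn-beijing",
    "cn-shenzhen",
    "cn-hongkong",
}


def check_ragflow_placement(
    network_type: str,
    vpc_region: str,
    vector_db_region: str,
    oss_bucket_region: str,
) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: list[str] = []

    # Only the VPC network type is supported.
    if network_type.upper() != "VPC":
        problems.append(f"network type {network_type!r} is not VPC")

    # The VPC, vector database, and OSS bucket must share one region.
    regions = {vpc_region, vector_db_region, oss_bucket_region}
    if len(regions) != 1:
        problems.append(f"resources span multiple regions: {sorted(regions)}")

    # Every region in use must be on the supported list.
    unsupported = sorted(regions - SUPPORTED_REGIONS)
    if unsupported:
        problems.append(f"unsupported regions: {unsupported}")

    return problems


if __name__ == "__main__":
    issues = check_ragflow_placement("VPC", "cn-hangzhou", "cn-hangzhou", "cn-beijing")
    for issue in issues:
        print(issue)
```

Returning a list of problems rather than raising on the first failure lets a caller report every misconfiguration in one pass.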
Billing
For pricing details, see AI data preparation billing methods.