
Data Transmission Service: What is AI data preparation

Last Updated: Apr 02, 2026

Building a retrieval-augmented generation (RAG) application requires moving data from source databases into vector stores—a pipeline that typically involves custom extraction code, file staging, chunking logic, and embedding calls. AI data preparation is a feature of Data Transmission Service (DTS) that handles this entire pipeline, connecting directly to your source database and delivering vectorized data to a vector database or data lakehouse without intermediate storage or manual exports.

Use AI data preparation to power enterprise knowledge bases, AI-assisted content creation, and intelligent customer service systems.

How it works

AI data preparation runs a four-stage ingestion pipeline from your source database to the destination vector store:

  1. Connect: DTS connects directly to the source database and pulls both full data (the existing rows) and incremental data (ongoing changes). No file exports or manual uploads are required.

  2. Parse: Raw data, whether unstructured documents or structured relational tables, is parsed into a format that the chunking stage can split.

  3. Chunk: Each document is split into segments sized for the embedding model.

  4. Embed: Each chunk is converted into a vector embedding and written to the destination vector database.
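The Parse, Chunk, and Embed stages can be illustrated with a minimal Python sketch. This is not how DTS implements the pipeline, and several parts are assumptions made for illustration:

- The in-memory `rows` list stands in for data that the Connect stage pulls from MySQL.
- `parse_row`, `chunk_text`, `embed`, and `prepare` are hypothetical names.
- The fixed-size character chunker with overlap is one common chunking strategy, chosen only as an example.
- The hashed bag-of-words vector is a toy stand-in for a real embedding model.

```python
from __future__ import annotations

import hashlib
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    source_key: str      # primary key of the row the chunk came from
    index: int           # position of the chunk within that row's text
    text: str
    vector: list[float]


def parse_row(row: dict) -> str:
    """Parse: flatten one relational row into 'column: value' lines."""
    return "\n".join(f"{col}: {val}" for col, val in row.items())


def chunk_text(text: str, size: int, overlap: int) -> list[str]:
    """Chunk: fixed-size character windows; adjacent windows share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("require 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str, dim: int = 8) -> list[float]:
    """Embed (toy): hashed bag-of-words counts, L2-normalized.
    A real pipeline would call an embedding model here instead."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def prepare(rows: list[dict], key_column: str, size: int = 200, overlap: int = 50) -> list[Chunk]:
    """Run parse -> chunk -> embed over rows already pulled from the source."""
    chunks = []
    for row in rows:
        for i, piece in enumerate(chunk_text(parse_row(row), size, overlap)):
            chunks.append(Chunk(str(row[key_column]), i, piece, embed(piece)))
    return chunks


# Hypothetical row standing in for a MySQL record.
rows = [{"id": 1, "title": "Reset password", "body": "Open Settings and choose Security."}]
chunks = prepare(rows, key_column="id", size=40, overlap=10)
for c in chunks:
    print(c.source_key, c.index, repr(c.text))
```

The overlap between adjacent chunks keeps text that straddles a chunk boundary retrievable from either chunk. In practice, the window size is chosen to fit the input limit of the embedding model in use.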

As the source data changes, DTS keeps the destination in sync through incremental updates.
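Incremental sync can be pictured as applying ordered change events to the destination. This is an illustrative sketch only, and it does not describe how DTS captures or applies changes. It assumes the following:

- Every chunk is tied to the primary key of the row it came from.
- An update replaces all of that row's chunks, and a delete removes them.
- `ChangeEvent`, `VectorIndex`, and `split_every_10` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Literal


@dataclass
class ChangeEvent:
    op: Literal["insert", "update", "delete"]
    key: str        # primary key of the changed source row
    text: str = ""  # parsed row content; unused for deletes


class VectorIndex:
    """In-memory stand-in for the destination table, keyed by source primary key."""

    def __init__(self, to_chunks: Callable[[str], list[str]]):
        self.to_chunks = to_chunks            # derives chunks from row text
        self.rows: dict[str, list[str]] = {}  # source key -> chunks derived from it

    def apply(self, event: ChangeEvent) -> None:
        if event.op == "delete":
            self.rows.pop(event.key, None)
        elif event.op in ("insert", "update"):
            # Re-derive every chunk for the row so that no stale chunk from an
            # earlier version of the row survives the update.
            self.rows[event.key] = self.to_chunks(event.text)
        else:
            raise ValueError(f"unknown operation: {event.op}")


def split_every_10(text: str) -> list[str]:
    """Hypothetical chunker: 10-character windows, no overlap."""
    return [text[i:i + 10] for i in range(0, len(text), 10)]


index = VectorIndex(split_every_10)
for event in [
    ChangeEvent("insert", "42", "refund policy: 30 days"),
    ChangeEvent("update", "42", "refund policy: 14 days"),
    ChangeEvent("insert", "7", "shipping"),
    ChangeEvent("delete", "7"),
]:
    index.apply(event)
print(index.rows)
```

Replacing a row's chunks as a unit avoids leftover chunks when an edit shortens a document, because a shorter text produces fewer chunks than the version it replaces.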

Use cases

  • Enterprise knowledge bases: Continuously ingest and update company documents, wikis, and structured records so your RAG application always retrieves current information.

  • AI-assisted content creation: Feed a mixed corpus of unstructured documents and structured data into a unified retrieval index.

  • Intelligent customer service: Connect support documentation and CRM records directly to a RAG pipeline without building custom ingestion code.

Supported data flows

Data preparation tasks

  • Source: MySQL; destination: AnalyticDB for PostgreSQL.

RAGFlow knowledge base

  • Supported vector databases: AnalyticDB for PostgreSQL and Lindorm.

  • For AnalyticDB for PostgreSQL, see Build and use a DTS RAGFlow knowledge base.

Supported regions

  • Data preparation tasks: See List of supported regions.

  • RAGFlow knowledge bases: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), and China (Hong Kong).

Limitations

Data preparation tasks

  • Cross-region tasks are not supported.

  • Create the required table schemas in the destination database before starting a task.

  • Overwriting existing data in the destination database is not supported.

RAGFlow knowledge bases

  • Only the virtual private cloud (VPC) network type is supported.

  • The VPC, vector database, and Object Storage Service (OSS) bucket must be in the same region.
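The limitations above can be expressed as a pre-flight check that runs before a task or knowledge base is created. This is a hypothetical sketch, not a DTS API. `TaskPlan`, `RagflowPlan`, `check_task`, and `check_ragflow` are illustrative names, and the region values reuse the region names listed under Supported regions.

```python
from __future__ import annotations

from dataclasses import dataclass

# Regions listed above for RAGFlow knowledge bases.
RAGFLOW_REGIONS = {"China (Hangzhou)", "China (Shanghai)", "China (Beijing)",
                   "China (Shenzhen)", "China (Hong Kong)"}


@dataclass
class TaskPlan:                     # hypothetical description of a planned task
    source_region: str
    destination_region: str
    destination_tables_exist: bool  # schemas must exist before the task starts


@dataclass
class RagflowPlan:                  # hypothetical description of a knowledge base setup
    network_type: str               # only "VPC" is supported
    vpc_region: str
    vector_db_region: str
    oss_bucket_region: str


def check_task(plan: TaskPlan) -> list[str]:
    """Return the data preparation task limitations that this plan violates."""
    problems = []
    if plan.source_region != plan.destination_region:
        problems.append("cross-region tasks are not supported")
    if not plan.destination_tables_exist:
        problems.append("create the destination table schemas first")
    return problems


def check_ragflow(plan: RagflowPlan) -> list[str]:
    """Return the RAGFlow knowledge base limitations that this plan violates."""
    problems = []
    if plan.network_type != "VPC":
        problems.append("only the VPC network type is supported")
    regions = {plan.vpc_region, plan.vector_db_region, plan.oss_bucket_region}
    if len(regions) != 1:
        problems.append("VPC, vector database, and OSS bucket must share a region")
    elif not regions <= RAGFLOW_REGIONS:
        problems.append("region does not support RAGFlow knowledge bases")
    return problems


print(check_task(TaskPlan("China (Hangzhou)", "China (Beijing)", False)))
print(check_ragflow(RagflowPlan("VPC", "China (Shanghai)", "China (Shanghai)", "China (Shanghai)")))
```

Returning a list of problems, rather than stopping at the first one, reports every violated limitation in a single pass.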

Billing

For pricing details, see AI data preparation billing methods.