This topic describes how to use the GraphRAG feature of AnalyticDB for PostgreSQL to generate high-quality Q&A pairs.
Traditional method for generating Q&A pairs
Modern Q&A-based intelligent customer service systems rely on Q&A knowledge bases for their conversational capabilities, but traditional methods of building these knowledge bases have significant limitations. Knowledge bases are typically built from the historical response records of human customer service agents: high-frequency, high-quality Q&A pairs are periodically selected and added to the knowledge base. This experience-driven approach has the following problems:
Knowledge updates lag, creating a reactive response mechanism. When product documentation or business rules change, the knowledge base is updated only after user questions expose the gap. This reactive cycle reduces the timeliness and efficiency of the knowledge base and can lead to customer misunderstandings or service risks caused by inconsistent information.
Content quality is inconsistent and requires manual screening. The quality of human responses varies with each agent's knowledge and communication style, so significant manual post-processing and quality control are needed before content meets the standards for inclusion in the knowledge base. This greatly increases maintenance costs.
Effectiveness is limited during the cold-start phase and requires frequent human intervention. In the initial stage of building a knowledge base, the system's service capability is weak because high-quality Q&A data is scarce. Human agents must frequently step into user interactions, which creates a heavy workload and slows the deployment and adoption of the intelligent customer service system.
Generate Q&A pairs with LLMs
To address the inefficiency and inconsistent quality of traditional manual methods for building Q&A knowledge bases, the industry has adopted large language model (LLM) technology to generate Q&A pairs in batches. In practice, however, workflows built on low-code development platforms such as Dify exhibit several systematic defects when generating Q&A pairs. These automated document-extraction processes primarily face the following problems:
Inconsistent generation quality and poor attention to detail. Real-world tests show that the quality of Q&A pairs generated by Dify workflows is inconsistent. When processing complex or highly technical documents, the process often overlooks implicit knowledge points and semantic details in the text. This results in Q&A pairs that lack accuracy and completeness and fail to meet the precise service needs of business scenarios.
Weak cross-document knowledge integration. Dify workflows only support information extraction from single documents. They cannot link and merge knowledge from multiple documents or build a knowledge graph network with semantic connections. As a result, the generated Q&A pairs are limited to a local context and lack a global perspective. This restricts the comprehensive reasoning and generalization capabilities of the Q&A system.
Prompts rely on manual tuning, which limits automation. To improve generation quality, prompts require frequent manual adjustment and optimization. This dependence on human intervention raises the barrier to entry, lowers the overall automation level of the process, and limits large-scale deployment and continuous iteration.
Generate Q&A pairs with GraphRAG
The GraphRAG service is a rapidly deployable retrieval-augmented generation (RAG) solution from AnalyticDB for PostgreSQL. It deeply integrates knowledge graph capabilities. Compared to traditional vector-based RAG methods, GraphRAG has significant advantages in complex relationship modeling, multi-hop reasoning, and knowledge association across multiple documents.
The overall service flow is divided into three core stages:
Indexing: Uses knowledge extraction models to extract knowledge from documents, generate knowledge graphs, and save them to the graph analysis engine.
Retrieval: Uses knowledge extraction models to extract keywords from queries and traverses the graph in the AnalyticDB for PostgreSQL graph analysis engine to find related subgraphs.
Generation: Submits the query and related subgraph context to the large language model to generate results.
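The following minimal sketch illustrates these three stages end to end. The endpoint paths and payload fields are hypothetical placeholders, not the documented AnalyticDB for PostgreSQL GraphRAG API; substitute the actual interface of your service application.

```python
# Minimal sketch of the three-stage GraphRAG flow. The endpoint paths and
# payload fields are HYPOTHETICAL placeholders, not the documented
# AnalyticDB for PostgreSQL GraphRAG API.
import requests

BASE_URL = "http://<graphrag-service-host>"  # placeholder address

# 1. Indexing: upload a document; the service extracts a knowledge graph
#    and stores it in the graph analysis engine.
with open("sales_report.pdf", "rb") as f:
    requests.post(f"{BASE_URL}/index/documents", files={"file": f})

# 2. Retrieval: the service extracts keywords from the query and
#    traverses the graph to find related subgraphs.
query = "How does the sales performance report support performance evaluation?"
subgraph_context = requests.post(
    f"{BASE_URL}/retrieve", json={"query": query}
).json()

# 3. Generation: the query plus the subgraph context is submitted to the
#    LLM, which produces the final answer.
answer = requests.post(
    f"{BASE_URL}/generate", json={"query": query, "context": subgraph_context}
).json()
print(answer)
```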
Prerequisites
You have set up a GraphRAG service application.
You have uploaded the relevant documents to the GraphRAG application.
Generate high-quality queries
The GraphRAG service automatically extracts content from user-uploaded documents into vector representations and a structured knowledge graph. This data is stored in the AnalyticDB for PostgreSQL graph analytics engine. In this system, users can obtain high-quality answers by simply entering natural language questions (queries). Therefore, the core task of building high-quality Q&A samples has shifted from "how to generate answers" to "how to generate high-quality queries".
Because AnalyticDB for PostgreSQL integrates knowledge graph information extracted from multiple documents and has cross-document relationship modeling capabilities, you only need to provide the correct semantic guidance. This activates the entity and relationship network in the graph to generate high-quality queries that are contextually relevant and span across documents.
Based on this feature, Alibaba Cloud proposes a "meta-query" method. This method uses instructions to guide the large language model to automatically generate diverse and semantically rich queries from multiple documents and functional modules.
Meta-query example
In the dialog box on the Retrieval page of the GraphRAG application, you can enter the following text to generate queries.
Based on the content of Document 1, Document 2, and Document 3, and from the perspectives of Module 1, Module 2, and Module 3, extract 50 high-quality questions. These questions should address various issues that users might have when using this product.
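If you maintain many document sets, you can assemble this meta-query programmatically. The helper below is an illustrative utility, not part of the GraphRAG service: it simply templates the instruction shown above, with the document and module names taken from the example as placeholders.

```python
# Illustrative helper (not part of the GraphRAG service) that assembles
# the meta-query above from document and module names. The wording
# mirrors the example; adjust the counts and phrasing for your corpus.
def _join(names):
    # Renders ["A", "B", "C"] as "A, B, and C".
    return names[0] if len(names) == 1 else ", ".join(names[:-1]) + ", and " + names[-1]

def build_meta_query(documents, modules, n_questions=50):
    return (
        f"Based on the content of {_join(documents)}, and from the "
        f"perspectives of {_join(modules)}, extract {n_questions} "
        "high-quality questions. These questions should address various "
        "issues that users might have when using this product."
    )

meta_query = build_meta_query(
    ["Document 1", "Document 2", "Document 3"],
    ["Module 1", "Module 2", "Module 3"],
)
print(meta_query)  # paste the output into the Retrieval page dialog box
```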
Generate high-quality answers
You can submit the generated queries one by one using the retrieval feature of the AnalyticDB for PostgreSQL GraphRAG application. The system uses its combined vector and knowledge graph capabilities to return high-quality, structured answers that span across documents. After you confirm that the generated Q&A pairs meet the quality requirements, you can import them into the knowledge base. This completes the automated construction of the Q&A data.
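The submission step can be scripted. The sketch below assumes a hypothetical HTTP retrieval endpoint and response field; replace them with the actual retrieval interface of your GraphRAG application. It writes the resulting Q&A pairs to a CSV file for the manual quality review described above.

```python
# Sketch of batch answer generation. The `ask` helper assumes a
# HYPOTHETICAL HTTP retrieval endpoint; replace it with the actual
# retrieval interface of your GraphRAG application.
import csv
import requests

BASE_URL = "http://<graphrag-service-host>"  # placeholder address

def ask(query: str) -> str:
    resp = requests.post(f"{BASE_URL}/retrieve", json={"query": query})
    resp.raise_for_status()
    return resp.json()["answer"]  # assumed response field

with open("generated_queries.txt") as f:
    queries = [line.strip() for line in f if line.strip()]

# Collect query/answer pairs in a CSV file for manual review before
# importing them into the knowledge base.
with open("qa_pairs.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["query", "answer"])
    for q in queries:
        writer.writerow([q, ask(q)])
```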
Business case
The following compares actual output from a leading e-commerce customer that generated Q&A pairs with both a Dify workflow and the AnalyticDB for PostgreSQL GraphRAG feature. The comparison highlights the differences between the two methods in answer quality, information completeness, and cross-document understanding.
Generated queries
| Queries generated by the Dify workflow | Queries generated by GraphRAG |
| --- | --- |
| Questions about product names:<br>Questions about main features:<br>Questions about feature paths:<br>Questions about scenarios:<br>Questions about key metrics:<br>Questions about data definitions:<br>Questions about after-sales service:<br>Questions about sales performance:<br>Other related questions: | Comprehensive questions: |
A comparison of the queries generated by Dify and GraphRAG shows the following:
The queries generated by Dify are mostly simple and limited in scope. They usually focus on a single feature or document keyword. Users can find the answers directly in the document by searching for the keywords.
The queries generated by GraphRAG are more complex and practical. They cover knowledge dimensions from local to global and are closer to how real users ask questions. These queries often involve cross-document understanding and multi-hop reasoning, which demonstrates stronger semantic expression and practical value.
Generated answers
| Answer generated by the Dify workflow | Answer generated by GraphRAG |
| --- | --- |
| How can I generate periodic performance reports for review and assessment?<br><br>The performance report module in the sales performance report provides reports for different periods, such as daily, weekly, and monthly. These reports are divided into four dimensions: summary, platform/distribution, store/distributor, and store group. The monthly report also includes performance goals and completion progress, which allows operations staff to quickly generate periodic performance reports. | How does the sales performance report support sales team performance evaluation?<br><br>The sales performance report is an important tool for sales team performance evaluation. It provides multi-dimensional data analytics and goal monitoring features to help teams efficiently evaluate sales performance, identify problems, and optimize strategies. The sales performance report supports sales team performance evaluation in the following main ways.<br><br>Summary<br>The sales performance report provides sales teams with a comprehensive and efficient performance evaluation tool. It includes features such as goal setting, multi-dimensional data analytics, trend analysis, refund and after-sales assessment, quick queries, and real-time data support. These features not only help teams accurately identify the reasons for performance growth or decline but also guide them in optimizing operational strategies to improve overall sales performance.<br><br>References<br>[KG] New Version of Sales Performance Report.pdf<br>[KG] Data Portal.pdf<br>[DC] New Version of Sales Performance Report.pdf<br>[KG] relation.txt<br>[KG] Real-time Product Analysis.pdf |
For the same query input, the answer from the Dify workflow is limited to quoting and repeating content from a single document. It lacks contextual understanding and information integration capabilities. In contrast, GraphRAG uses its knowledge graph and vector search mechanisms to effectively integrate related knowledge from multiple documents. It outputs a more logical and information-dense answer, which significantly improves the quality and practicality of the Q&A result.
References
For more information, see Using the GraphRAG service.