AnalyticDB: Generate high-quality Q&A pairs using GraphRAG

Last Updated: Dec 18, 2025

This topic describes how to use the GraphRAG feature of AnalyticDB for PostgreSQL to generate high-quality Q&A pairs.

Traditional method for generating Q&A pairs

Modern Q&A-based intelligent customer service systems rely on Q&A knowledge bases to build their conversational capabilities. However, traditional methods for building these knowledge bases have significant limitations. Existing knowledge bases are typically built from the historical response records of human customer service agents. High-frequency, high-quality Q&A pairs are periodically selected and added to the knowledge base. This experience-driven approach has the following problems:

  • Knowledge updates lag behind changes. When product documentation or business rules change, the knowledge base is updated only after users ask related questions. This reactive cycle reduces the timeliness and efficiency of the knowledge base, and the resulting inconsistent information can lead to customer misunderstandings or service risks.

  • Content quality is inconsistent and requires manual screening. The quality of human responses varies because it is affected by an agent's individual knowledge and communication style. This requires significant manual effort for post-processing and quality control to meet the standards for inclusion in the knowledge base. This greatly increases maintenance costs.

  • Effectiveness is limited during the cold-start phase and requires frequent human intervention. In the initial stage of building a knowledge base, the system's service capability is weak because of a lack of high-quality Q&A data. Human agents must frequently intervene during user interactions. This creates a heavy workload and limits the rapid deployment and use of the intelligent customer service system.

Generate Q&A pairs with LLMs

To address the inefficiency and inconsistent quality of traditional manual methods for building Q&A knowledge bases, the industry has adopted large language models (LLMs) to generate Q&A pairs in batches. In practice, however, workflows built on low-code development platforms such as Dify show several systematic defects when generating Q&A pairs. Testing shows that current automated processes for extracting information from documents primarily face the following problems:

  • Inconsistent generation quality and poor attention to detail. Real-world tests show that the quality of Q&A pairs generated by Dify workflows is inconsistent. When processing complex or highly technical documents, the process often overlooks implicit knowledge points and semantic details in the text. This results in Q&A pairs that lack accuracy and completeness and fail to meet the precise service needs of business scenarios.

  • Weak cross-document knowledge integration. Dify workflows only support information extraction from single documents. They cannot link and merge knowledge from multiple documents or build a knowledge graph network with semantic connections. As a result, the generated Q&A pairs are limited to a local context and lack a global perspective. This restricts the comprehensive reasoning and generalization capabilities of the Q&A system.

  • Prompts rely on manual tuning, which limits automation. Improving generation quality requires frequent prompt adjustment and optimization, which depends heavily on manual intervention. This raises the barrier to entry, lowers the overall automation level of the process, and makes large-scale deployment and continuous iteration less feasible.

Generate Q&A pairs with GraphRAG

The GraphRAG service is a rapidly deployable retrieval-augmented generation (RAG) solution from AnalyticDB for PostgreSQL. It deeply integrates knowledge graph capabilities. Compared to traditional vector-based RAG methods, GraphRAG has significant advantages in complex relationship modeling, multi-hop reasoning, and knowledge association across multiple documents.

The overall service flow is divided into three core stages:

  • Indexing: Uses knowledge extraction models to extract knowledge from documents, generate knowledge graphs, and save them to the graph analysis engine.

  • Retrieval: Uses knowledge extraction models to extract keywords from queries, then traverses the graph in the AnalyticDB for PostgreSQL graph analysis engine to find related subgraphs.

  • Generation: Submits the query and related subgraph context to the large language model to generate results.
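The three stages can be illustrated with a toy, in-memory sketch. This is not the AnalyticDB for PostgreSQL API: the triples, the function names (`build_index`, `retrieve`, `build_prompt`), and the substring-based keyword matching are all assumptions made for illustration. In the real service, knowledge extraction models and the graph analysis engine do this work.

```python
from collections import defaultdict

# Hypothetical triples standing in for the output of a knowledge extraction model.
TRIPLES = [
    ("sales performance report", "contains", "performance monitoring module"),
    ("performance monitoring module", "tracks", "annual goal"),
    ("real-time data portal", "links to", "sales performance report"),
]


def build_index(triples):
    """Indexing: store every edge under both of its endpoints."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((head, relation, tail))
        graph[tail].append((head, relation, tail))
    return graph


def retrieve(graph, query, hops=2):
    """Retrieval: find entities named in the query, then expand `hops` steps."""
    text = query.lower()
    frontier = {entity for entity in graph if entity in text}
    seen, edges = set(frontier), []
    for _ in range(hops):
        next_frontier = set()
        for entity in sorted(frontier):  # sorted for a deterministic edge order
            for edge in graph[entity]:
                if edge not in edges:
                    edges.append(edge)
                for node in (edge[0], edge[2]):
                    if node not in seen:
                        seen.add(node)
                        next_frontier.add(node)
        frontier = next_frontier
    return edges


def build_prompt(query, edges):
    """Generation: assemble the subgraph context that would be sent to an LLM."""
    facts = "\n".join(f"- {h} {r} {t}" for h, r, t in edges)
    return f"Context:\n{facts}\n\nQuestion: {query}"


if __name__ == "__main__":
    question = "How do I set an annual goal?"
    subgraph = retrieve(build_index(TRIPLES), question)
    print(build_prompt(question, subgraph))
```

The `hops` parameter is what gives the sketch a multi-hop flavor: a question that mentions only one entity still pulls in facts about neighboring entities, which may originate from other documents.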

Prerequisites

Generate high-quality queries

The GraphRAG service automatically extracts content from user-uploaded documents into vector representations and a structured knowledge graph. This data is stored in the AnalyticDB for PostgreSQL graph analysis engine. In this system, users can obtain high-quality answers by simply entering natural language questions (queries). Therefore, the core task of building high-quality Q&A samples shifts from "how to generate answers" to "how to generate high-quality queries".

Because AnalyticDB for PostgreSQL integrates knowledge graph information extracted from multiple documents and can model relationships across documents, you only need to provide the correct semantic guidance. This guidance activates the entity and relationship network in the graph to generate high-quality queries that are contextually relevant and span multiple documents.

Based on this feature, Alibaba Cloud proposes a "meta-query" method. This method uses instructions to guide the large language model to automatically generate diverse and semantically rich queries from multiple documents and functional modules.

Meta-query example

In the dialog box on the Retrieval page of the GraphRAG application, you can enter the following text to generate queries.

Based on the content of Document 1, Document 2, and Document 3, and from the perspectives of Module 1, Module 2, and Module 3, extract 50 high-quality questions. These questions should address various issues that users might have when using this product.
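If you generate meta-queries for many document sets, a small helper can fill in the template above. The following is a minimal sketch; the function name `build_meta_query` and its parameters are hypothetical and are not part of the GraphRAG service. It only builds the text that you then paste into the dialog box.

```python
def build_meta_query(documents, modules, count=50):
    """Fill the meta-query template with document and module names."""
    if not documents or not modules:
        raise ValueError("at least one document and one module are required")

    def join(items):
        # Join names as "A", "A and B", or "A, B, and C".
        items = list(items)
        if len(items) == 1:
            return items[0]
        if len(items) == 2:
            return f"{items[0]} and {items[1]}"
        return ", ".join(items[:-1]) + f", and {items[-1]}"

    return (
        f"Based on the content of {join(documents)}, "
        f"and from the perspectives of {join(modules)}, "
        f"extract {count} high-quality questions. "
        "These questions should address various issues that users might have "
        "when using this product."
    )


if __name__ == "__main__":
    print(build_meta_query(
        ["Document 1", "Document 2", "Document 3"],
        ["Module 1", "Module 2", "Module 3"],
    ))
```

With the three placeholder documents and modules shown, the helper reproduces the example text above. Replace the placeholders with your own document and module names.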

Generate high-quality answers

You can submit the generated queries one by one using the retrieval feature of the AnalyticDB for PostgreSQL GraphRAG application. The system uses its combined vector and knowledge graph capabilities to return high-quality, structured answers that span across documents. After you confirm that the generated Q&A pairs meet the quality requirements, you can import them into the knowledge base. This completes the automated construction of the Q&A data.
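The submit-and-collect loop described above could be scripted along the following lines. This is a sketch under stated assumptions: `ask` is a hypothetical callable that you supply to wrap whatever retrieval access you have, and the minimum-length check is only a crude pre-filter. It does not replace the manual quality confirmation before import.

```python
import json


def collect_qa_pairs(queries, ask, min_answer_chars=40):
    """Submit queries one by one and keep non-trivial, de-duplicated pairs.

    `ask` is a hypothetical callable (question -> answer text) supplied by
    the caller; it is an assumption, not an AnalyticDB API.
    """
    pairs, seen = [], set()
    for query in queries:
        question = query.strip()
        key = question.lower()
        if not question or key in seen:
            continue  # skip blanks and case-insensitive duplicates
        seen.add(key)
        answer = ask(question).strip()
        if len(answer) >= min_answer_chars:
            pairs.append({"question": question, "answer": answer})
    return pairs


def to_jsonl(pairs):
    """Serialize pairs as JSON Lines so they are easy to review and import."""
    return "\n".join(json.dumps(pair, ensure_ascii=False) for pair in pairs)


if __name__ == "__main__":
    def stub_ask(question):
        # Stand-in for a real call to the retrieval feature.
        return "Open the report, apply filters, and select the export option."

    print(to_jsonl(collect_qa_pairs(["How do I export detailed data?"], stub_ask)))
```

Passing `ask` in as a parameter keeps the loop independent of how the retrieval feature is actually reached, and makes it easy to test with a stub as shown.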

Business case

The following is a comparison of the actual output from a leading e-commerce customer that generated Q&A pairs using a Dify workflow and the AnalyticDB for PostgreSQL GraphRAG feature. This comparison highlights the differences between the two methods in terms of answer quality, information completeness, and cross-document understanding.

Generated queries

Queries generated by the Dify workflow

  • What are the main functional modules of the sales performance report?

  • What order types are excluded by default from the sales performance report data?

  • How do I set annual goals and sales promotion goals?

  • How do I monitor goal completion progress?

  • How do I identify the specific reasons for performance fluctuations?

  • What are the generation rules for periodic reports (daily, weekly, monthly)?

  • How do I export detailed data?

  • How do I customize the data display fields?

  • How are data permissions controlled for different user roles?

  • What are the definitions of the core metrics?

  • How do I switch the data view (platform/distributor/store group)?

  • How do I view data on a mobile device?

  • Can goals be broken down for each team?

  • What is period-over-period progress?

  • How do I view the average shipping time?

  • How do I view the real-time data portal?

  • How do I view the active stores in a store group?

Queries generated by GraphRAG

Questions about product names:

  • What is the real-time data portal?

  • What are the main features of the sales performance report?

  • What are the main features of the data intelligence mobile application?

  • What products are included in global management?

  • What are the application scenarios for the real-time data dashboard?

Questions about main features:

  • How does the real-time data portal help users view sales dynamics?

  • How does the sales performance report support the sales team's performance evaluation?

  • How does the mobile application improve the flexibility of data access?

  • What are the specific features of the performance monitoring module?

  • What key metrics are displayed on the core metric cards?

Questions about feature paths:

  • How do I navigate from the sales performance report to the real-time data portal?

  • How can users select a specific store or distributor in the sales performance report?

  • How does configuring statistical rules affect the data displayed in the sales performance report?

  • How does the product manual help users understand and use the sales performance report?

  • How do I use the feedback feature to suggest improvements?

Questions about scenarios:

  • How can a boss or operations manager use the sales performance report to set sales goals?

  • How can operations staff analyze the causes of performance fluctuations using the performance overview module?

  • How are daily, weekly, and monthly reports used for internal reporting and performance evaluation?

  • How does setting sales goals for store groups help merchants with data classification?

  • How does the view switching feature enhance the flexibility of using the report?

Questions about key metrics:

  • How is net sales calculated?

  • What is the core function of the sales amount?

  • What customer purchasing behaviors are reflected in the number of sales orders?

  • What orders are excluded by default from the core metric cards?

  • What data is used for the comparative analysis of performance completion progress?

Questions about data definitions:

  • In the default definitions, how are special orders and statistically excluded orders handled?

  • How does splitting or not splitting combo products affect the statistical results?

  • What is the difference between the confirmation status of an ERP after-sales order and the after-sales status of a platform order?

  • Why might the data in the real-time data portal be inconsistent with the order page?

  • Are canceled orders included in the statistics of the real-time data portal?

Questions about after-sales service:

  • What are the specific features of real-time after-sales alerts?

  • How does after-sales analysis for best-selling products help merchants optimize after-sales service?

  • How does analyzing the reasons for after-sales issues with products help reduce the return rate?

  • How does analyzing the reasons for after-sales issues by channel improve the quality of after-sales service?

  • How do real-time logistics alerts help merchants deal with logistics problems?

Questions about sales performance:

  • How does the performance monitoring module help merchants set sales goals?

  • From what dimensions does the performance overview module display detailed data?

  • How does the performance report module generate reports for different periods?

  • How can the store/distributor dimension be used for fine-grained operational analysis?

  • How do permission scopes control the data content visible to RAM users?

Other related questions:

  • How does the feedback feature enhance the user experience?

  • How does the quick query feature improve user efficiency?

  • How does the annual goal card display annual sales performance?

  • How does the performance trend graph display this year's and last year's performance data?

  • How does switching views affect the data summary on the annual goal card?

Comprehensive questions:

  • Why can't some accounts see the content of the performance monitoring section?

  • Why does the data seen by different accounts vary?

  • Are entity codes based on regular product codes or sub-product statistics?

  • Why can't I see order data from stall-based sales in my self-operated store data?

  • How does the real-time data portal ensure data security and accuracy?

A comparison of the queries generated by Dify and GraphRAG shows the following:

  • The queries generated by Dify are mostly simple and limited in scope. They usually focus on a single feature or document keyword. Users can find the answers directly in the document by searching for the keywords.

  • The queries generated by GraphRAG are more complex and practical. They cover knowledge dimensions from local to global and are closer to how real users ask questions. These queries often involve cross-document understanding and multi-hop reasoning, which demonstrates stronger semantic expression and practical value.

Generated answers

Answer generated by the Dify workflow

How can I generate periodic performance reports for review and assessment?

The performance report module in the sales performance report provides reports for different periods, such as daily, weekly, and monthly. These reports are divided into four dimensions: summary, platform/distribution, store/distributor, and store group. The monthly report also includes performance goals and completion progress, which allows operations staff to quickly generate periodic performance reports.

Answer generated by GraphRAG

How does the sales performance report support sales team performance evaluation?

The sales performance report is an important tool for sales team performance evaluation. It provides multi-dimensional data analytics and goal monitoring features to help teams efficiently evaluate sales performance, identify problems, and optimize strategies. The sales performance report supports sales team performance evaluation in the following main ways.

  1. Goal setting and progress monitoring.

    The sales performance report allows teams to set annual, monthly, and sales promotion goals, and break them down into store groups or individual stores. This way, sales teams can clearly understand the goal completion status for each team or store.

    Annual and monthly goals: Sales teams can set goals based on sales amount or net sales amount and monitor completion progress in real time. For example, if a team has not met its goal, the system displays the gap between the current performance and the goal, which makes it easier to adjust strategies promptly.

    Sales promotion goals: Before major promotional events (such as 6.18, 11.11, and 12.12), teams can set promotion goals in advance and use the report to track the comparison between actual performance and the goal.

  2. Multi-dimensional data analytics.

    The sales performance report provides detailed data metrics from multiple dimensions to help teams deeply analyze the reasons for performance changes.

    Platform/distribution dimension: The report provides sales data for platforms and distributors, which makes it easy for teams to identify which channels are performing well or have shortcomings.

    Store/distributor dimension: By analyzing the performance of specific stores or distributors, teams can identify growth points and problem areas.

    Product dimension: The report can be further broken down to the product level. This helps teams understand which products are driving performance growth and which are dragging down overall performance.

  3. Trend analysis and historical comparison.

    The sales performance report provides a more intuitive perspective for performance evaluation through trend graphs and historical data comparisons.

    Daily/monthly trend graphs: Teams can choose to view daily performance trends for the current month versus the previous month, or monthly performance trends for this year versus last year, to identify periodic fluctuation patterns.

    Period-over-period and year-over-year analysis: The report calculates the period-over-period and year-over-year change rates for key metrics such as net sales amount and sales amount. This helps teams evaluate short-term performance changes and long-term growth trends.

  4. Refund and after-sales analysis.

    The sales performance report also integrates refund and after-sales data to help teams comprehensively evaluate sales performance.

    Refund rate analysis: The report provides statistics on the proportion of refunds-only before shipping, refunds-only after shipping, and returns with refunds, which helps teams identify stages with high refund rates.

    Payment refund amount and quantity: By analyzing the amount and number of confirmed refunds among successfully paid orders, teams can assess the impact of refunds on net sales and optimize their operational strategies.

  5. Quick query and export features.

    The sales performance report supports flexible query and export features, which makes it easy for teams to quickly obtain the data they need.

    Quick query: Teams can save frequently used query conditions based on different views (such as platform/store or distributor) to improve work efficiency.

    Data export: The report supports exporting detailed records, with a single export limit of 30,000 records, which is convenient for teams to perform batch analytics or reporting.

  6. Real-time data support.

    The sales performance report is used in conjunction with the real-time data portal to provide teams with the latest sales data support.

    Today's overview: Teams can obtain key metrics for the day, such as sales amount, net sales amount, and refund amount, through the real-time data portal to stay on top of business dynamics.

    Performance monitoring section: This section displays the performance completion rate for each store group, helping teams track sales progress in real time.

Summary

The sales performance report provides sales teams with a comprehensive and efficient performance evaluation tool. It includes features such as goal setting, multi-dimensional data analytics, trend analysis, refund and after-sales assessment, quick queries, and real-time data support. These features not only help teams accurately identify the reasons for performance growth or decline but also guide them in optimizing operational strategies to improve overall sales performance.

References

[KG] New Version of Sales Performance Report.pdf

[KG] Data Portal.pdf

[DC] New Version of Sales Performance Report.pdf

[KG] relation.txt

[KG] Real-time Product Analysis.pdf

For the same query input, the answer from the Dify workflow is limited to quoting and repeating content from a single document. It lacks contextual understanding and information integration capabilities. In contrast, GraphRAG uses its knowledge graph and vector search mechanisms to effectively integrate related knowledge from multiple documents. It outputs a more logical and information-dense answer, which significantly improves the quality and practicality of the Q&A result.

References

For more information, see Using the GraphRAG service.