
AnalyticDB:Using the GraphRAG service

Last Updated:Dec 18, 2025

Traditional Retrieval-Augmented Generation (RAG) is a powerful technique, but it has limitations. It often struggles with complex questions that require understanding the relationships between different pieces of information. This is because standard RAG relies on finding text chunks that are merely similar to a query, not necessarily logically connected. GraphRAG enhances this process by building a knowledge graph from your documents. This allows your AI application to perform more sophisticated reasoning, understand complex relationships, and deliver more accurate and contextually aware answers.

How GraphRAG works

GraphRAG introduces a knowledge graph into the RAG pipeline, transforming the process into three main stages:

  • Indexing: Uses knowledge extraction models to extract knowledge from documents, generate knowledge graphs, and save them to the graph analysis engine.

  • Retrieval: Uses knowledge extraction models to extract keywords from queries, traverses subgraphs in the AnalyticDB for PostgreSQL graph analysis engine, and searches for related subgraphs.

  • Generation: Submits the query and related subgraph context to the large language model to generate results.

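Assuming the adbpg_graphrag extension is installed (see the steps below), the three stages map onto three SQL calls. The following is a minimal sketch; the API key and document content are placeholders:

```sql
-- Indexing: configure the service, then upload a document to build the graph
SELECT adbpg_graphrag.initialize($${"llm_model": "qwen-max-2025-01-25", "llm_api_key": "sk-****************"}$$);
SELECT adbpg_graphrag.upload('example.txt', 'Example document content.');

-- Retrieval + generation: ask a question; the answer is grounded in the graph
SELECT adbpg_graphrag.query('What does the example document describe?');
```

Each of these calls is described in detail in the steps that follow.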

Version limitations

  • AnalyticDB for PostgreSQL 7.0 instances with kernel version 7.2.1.3 or later.

  • Versions 7.3.0.0 and 7.3.1.0 do not support the adbpg_graphrag plugin.

Note

You can view the minor version on the Basic Information page of an instance in the AnalyticDB for PostgreSQL console. If your instance does not meet the required versions, update the minor version of the instance.

Step 1: Install the required extensions

The plpython3u, age, and adbpg_graphrag extensions are automatically installed on AnalyticDB for PostgreSQL version 7.0 instances with kernel version 7.2.1.4 or later. In that case, you only need to add ag_catalog to the search_path as described below.

  1. Verify the plpython3u extension. This extension is usually installed by default. To verify, connect to your database and run:

    SELECT * FROM pg_extension WHERE extname = 'plpython3u';

    The following result indicates that the plpython3u extension is successfully installed. If no result is returned, the extension is not installed in the public schema of the specified database.

      oid  |  extname   | extowner | extnamespace | extrelocatable | extversion | extconfig | extcondition 
    -------+------------+----------+--------------+----------------+------------+-----------+--------------
     14674 | plpython3u |       10 |           11 | f              | 1.0        |           | 
    (1 row)
  2. Install and configure the age extension.

    1. Add the ag_catalog schema to the search path specified by the search_path parameter to simplify queries. You can use one of the following methods:

      • Session-level configuration.

        SET search_path TO public, ag_catalog;
      • Database-level permanent configuration.

        ALTER DATABASE <database_name> SET search_path TO public, ag_catalog;
    2. (Optional) Use the initial account or a privileged account that has the RDS_SUPERUSER permission to grant other users access to the ag_catalog schema.

      GRANT USAGE ON SCHEMA ag_catalog TO <username>;
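To confirm that the setting took effect, you can check the current search path with the standard PostgreSQL SHOW command. Note that ALTER DATABASE ... SET applies only to new connections, so reconnect before checking if you used the database-level method:

```sql
SHOW search_path;
-- The result should include ag_catalog, for example: public, ag_catalog
```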
  3. To install the adbpg_graphrag extension, contact technical support for assistance.

    The adbpg_graphrag extension depends on the plpython3u and age extensions. Before you install it, make sure these dependencies are installed.
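Before moving on, you can confirm that all three extensions are present with a single query against the pg_extension system catalog:

```sql
SELECT extname, extversion
FROM pg_extension
WHERE extname IN ('plpython3u', 'age', 'adbpg_graphrag')
ORDER BY extname;
```

If any of the three names is missing from the result, revisit the corresponding installation step.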

Step 2: Initialize the GraphRAG service

Before you can use the service, you must initialize it with your configuration settings. You need to do this only once per session; the configuration remains in effect until the server restarts.

Syntax

SELECT adbpg_graphrag.initialize(config json);

Parameters

The adbpg_graphrag.initialize function takes a JSON value as its input parameter to configure the GraphRAG service. The configuration parameters are described as follows:

  • llm_model: The large language model used by GraphRAG. Default: qwen-max-2025-01-25.

  • llm_api_key: The API key required to call the large language model.

  • llm_url: The API endpoint of the large language model. By default, the API of Alibaba Cloud Model Studio is used.

  • embedding_model: The embedding model used by GraphRAG. Default: text-embedding-v3.

  • language: The default language used by GraphRAG. Valid values: English (default) and Simplified Chinese.

  • entity_types: The types of entity nodes to extract when parsing documents to generate knowledge graphs.

  • relationship_types: The types of relationship edges to extract when parsing documents to generate knowledge graphs.

  • first_node_content: In decision tree mode (tree), the response of the initial node. Default: "Hello, how can I help you?"

  • end_node_content: In decision tree mode (tree), the response of the final node. Default: "Do you have any other questions?"

  • global_distance_threshold: In decision tree mode (tree), the threshold for choosing between local and global search results. Default: 0.1. If the similarity score of the global search result exceeds that of the local search result by more than this threshold, the global search result is selected.

Example

SELECT adbpg_graphrag.initialize(
	$$
	{
		"llm_model" : "qwen-max-2025-01-25",
		"llm_api_key" : "sk-****************",
		"llm_url" : "https://dashscope.aliyuncs.com/compatible-mode/v1",
		"embedding_model" : "text-embedding-v3",
		"language" : "Simplified Chinese",
		"entity_types" : ["Business Scenario", "Product", "Feature", "Usage Logic", "Data Metric", "Data Caliber"],
		"relationship_types": ["Contains", "Belongs to", "Uses", "Depends on", "Associates with", "Solves"]
	}
	$$
);

Step 3: Upload documents to build the knowledge graph

Upload your documents one by one. The service will process the text to extract entities and relationships, populating the knowledge graph.

Syntax

SELECT adbpg_graphrag.upload(filename text, context text);

Parameters

The adbpg_graphrag.upload function is used to upload file content. After uploading a file, GraphRAG performs a series of tasks such as text splitting, vector generation, and knowledge graph extraction.

  • filename: The name of the file.

  • context: The content of the file.

Example

SELECT adbpg_graphrag.upload('ProductInfo.txt', 'The Xiaomi customer service system can provide question answering, leave application, knowledge base search, and other functions.');
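If you have several documents, upload them in sequence; each call triggers text splitting, vector generation, and graph extraction for that file. A sketch with hypothetical file names and content:

```sql
SELECT adbpg_graphrag.upload('FAQ.txt', 'Q: How do I submit a leave application? A: Open the customer service system and choose Leave Application.');
SELECT adbpg_graphrag.upload('ReleaseNotes.txt', 'The latest release adds knowledge base search to the customer service system.');
```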

Step 4: Query your knowledge graph

Once your documents are uploaded, you can start asking questions. You can specify different query modes to control the retrieval strategy.

Syntax

SELECT adbpg_graphrag.query(query_str text);
SELECT adbpg_graphrag.query(query_str text, query_mode text);

Parameters

  • query_str: The question to ask.

  • query_mode: The query mode. GraphRAG supports the following query modes:

    • mix (default): Uses both vector matching and the knowledge graph to obtain relevant knowledge for the large language model to reference.

    • bypass: Skips vector and knowledge graph retrieval and passes the question directly to the large language model.

    • naive: Uses only vector retrieval to obtain relevant knowledge for the large language model to reference.

    • local: Uses only entity nodes in the knowledge graph to obtain relevant knowledge for the large language model to reference.

    • global: Uses only relationship edges in the knowledge graph to obtain relevant knowledge for the large language model to reference.

    • hybrid: Uses both entity nodes and relationship edges in the knowledge graph to obtain relevant knowledge for the large language model to reference.

    • tree: Decision tree mode. Stores knowledge as a Q&A tree and answers queries by traversing it.

Example

SELECT adbpg_graphrag.query('What functions does Xiaomi have?', 'hybrid');
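Because the mode is simply the second argument, it is easy to compare retrieval strategies on the same question. A sketch using the document uploaded in Step 3:

```sql
-- No retrieval: the LLM answers from its own knowledge
SELECT adbpg_graphrag.query('What functions does the Xiaomi customer service system provide?', 'bypass');

-- Vector retrieval only
SELECT adbpg_graphrag.query('What functions does the Xiaomi customer service system provide?', 'naive');

-- Graph retrieval over both entity nodes and relationship edges
SELECT adbpg_graphrag.query('What functions does the Xiaomi customer service system provide?', 'hybrid');
```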

Usage tips

Choosing the right LLM

When initializing the service, the llm_model you choose involves a trade-off between quality and speed. Based on our testing:

  • For the highest quality answers: Use qwen-max. It provides the most accurate and nuanced responses.

  • For the fastest response times: Use qwen3-32b. It's ideal for applications where low latency is a priority.

  • For a balance of quality and speed: Use qwen-plus-latest.
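To switch models, call adbpg_graphrag.initialize again with a different llm_model, assuming you are starting a new session (initialization is per session, as noted in Step 2). A sketch of a low-latency configuration; the API key is a placeholder:

```sql
-- Low-latency configuration; other parameters keep their defaults
SELECT adbpg_graphrag.initialize(
	$$
	{
		"llm_model" : "qwen3-32b",
		"llm_api_key" : "sk-****************",
		"embedding_model" : "text-embedding-v3"
	}
	$$
);
```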

Processing documents with images

The GraphRAG service processes text content and has limited support for embedded images.

Best practice: Before uploading, use a multimodal LLM (such as qwen-vl-max) to convert any images in your documents into detailed text descriptions. Then include these descriptions in the content that you upload to GraphRAG.

Here is an example prompt you can use to generate the text descriptions:

You are a professional image reading assistant, responsible for generating a detailed interpretation and comprehensive summary of the image provided below.
---Requirements---
1. If the image has distinct modules, extract as much detailed information as possible for each module.
2. Based on the extracted information, generate a summary for each module.
3. If the image contains charts, summarize the detailed information available in the charts.
4. If the image contains meaningful icons, such as flag icons, pictograms, metaphorical icons, tool icons, and hybrid icons, also extract detailed information from these icons.
5. Based on the extracted information, summarize and provide effective global information.
---Notes---
Write in the third person and include module names to provide complete context.
*Identify the main language in the image* (the language that accounts for more than 80% of the text), and use *the main language of the image* as the output language.
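Once the multimodal model returns a description, append it to the document text before uploading, so the image content is indexed alongside the prose. A sketch with a hypothetical file name and description string:

```sql
SELECT adbpg_graphrag.upload(
	'ArchitectureDoc.txt',
	'The system consists of a gateway, a retrieval service, and a graph store. [Figure description] The architecture diagram shows the gateway forwarding queries to the retrieval service, which reads from the graph store.'
);
```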