
AnalyticDB:ChatWithKnowledgeBaseStream

Last Updated:Mar 21, 2026

This operation combines a knowledge base with a large language model (LLM) to provide intelligent Q&A. You can call the streaming API by using server-sent events (SSE) or the Java async SDK.
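As an illustration of consuming an SSE stream, the following minimal sketch collects the `data:` payload of each event from an iterable of text lines. It assumes that each event carries one JSON object shaped like the response schema on this page; that payload shape, the helper name `iter_sse_data`, and the sample lines are assumptions for illustration, not taken from the service.

```python
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each server-sent event in a stream of lines."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
            continue
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "data":
            data_parts.append(value)
    if data_parts:
        # Lenient choice: flush a final event that has no trailing blank line.
        yield "\n".join(data_parts)


# Hypothetical sample stream; real payloads may differ.
sample = [
    'data: {"Status": "success", "Message": "Successful"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"Status": "success"}\n',
    "\n",
]
events = [json.loads(payload) for payload in iter_sse_data(sample)]
print(len(events))
```

In practice, the lines would come from a streaming HTTP response rather than a list.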

Operation description

This API interacts with a Large Language Model (LLM) to generate answers grounded in content from specified Knowledge Bases. You can customize the request by configuring parameters for the Database Instance, Knowledge Retrieval, and Model Inference. The API includes a default System Prompt Template but also supports a custom one.

  • DBInstanceId: Required. The ID of the Database Instance.

  • KnowledgeParams: Optional. Parameters for Knowledge Retrieval, including the content to retrieve and the Merge Strategy.

  • ModelParams: Required. Parameters for Model Inference, including the message list and the model name.

  • PromptParams: Optional. A custom System Prompt Template.
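The parameters above can be assembled into a request body as in the following sketch. The field names come from the request parameter table on this page; the helper `build_request`, its arguments, and the idea of sending the body as plain JSON are assumptions for illustration, because the SDK may serialize nested parameters differently.

```python
import json


def build_request(db_instance_id, question, model="qwen-plus",
                  collection=None, namespace_password=None):
    """Build a request body with the required fields and optional retrieval."""
    if not db_instance_id:
        raise ValueError("DBInstanceId is required")
    body = {
        "DBInstanceId": db_instance_id,
        "ModelParams": {
            "Model": model,
            "Messages": [{"Role": "user", "Content": question}],
        },
    }
    if collection is not None:
        # Each SourceCollection entry needs a collection name and a password.
        if not namespace_password:
            raise ValueError("NamespacePassword is required for a collection")
        body["KnowledgeParams"] = {
            "SourceCollection": [
                {"Collection": collection,
                 "NamespacePassword": namespace_password}
            ]
        }
    return body


# Hypothetical values for illustration only.
request_body = build_request("gp-xxxxxxxxx", "What is AnalyticDB?",
                             collection="docs", namespace_password="pw")
print(json.dumps(request_body, indent=2))
```

Omitting `collection` produces a body without KnowledgeParams.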

Try it now

Try this API in OpenAPI Explorer, which requires no manual signing. A successful call automatically generates SDK code that matches your parameters. You can download the code, which has built-in credential security, for local use.


RAM authorization

The table below describes the authorization required to call this API. You can define it in a Resource Access Management (RAM) policy. The table's columns are detailed below:

  • Action: The action that you can use in the Action element of a RAM policy statement to grant permission to perform the operation.

  • API: The API that you can call to perform the action.

  • Access level: The predefined level of access granted for each API. Valid values: create, list, get, update, and delete.

  • Resource type: The type of the resource that supports authorization to perform the action. It indicates if the action supports resource-level permission. The specified resource must be compatible with the action. Otherwise, the policy will be ineffective.

    • For APIs with resource-level permissions, required resource types are marked with an asterisk (*). Specify the corresponding Alibaba Cloud Resource Name (ARN) in the Resource element of the policy.

    • For APIs without resource-level permissions, it is shown as All Resources. Use an asterisk (*) in the Resource element of the policy.

  • Condition key: The condition keys defined by the service. The key allows for granular control, applying to either actions alone or actions associated with specific resources. In addition to service-specific condition keys, Alibaba Cloud provides a set of common condition keys applicable across all RAM-supported services.

  • Dependent action: The dependent actions required to run the action. To complete the action, the RAM user or the RAM role must have the permissions to perform all dependent actions.

Action | Access level | Resource type | Condition key | Dependent action

gpdb:ChatWithKnowledgeBaseStream | get | *DBInstance acs:gpdb:{#regionId}:{#accountId}:dbinstance/{#DBInstanceId} | None | None
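As a sketch of how this authorization could be written as a RAM policy, the following code fills in the resource ARN pattern shown above and prints a policy document that allows the action on one instance. The region, account ID, and instance ID are hypothetical placeholders, and the helper name `build_policy` is an assumption for illustration.

```python
import json


def build_policy(region_id, account_id, db_instance_id):
    """Return a RAM policy that allows the action on a single instance."""
    resource = (
        f"acs:gpdb:{region_id}:{account_id}:dbinstance/{db_instance_id}"
    )
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "gpdb:ChatWithKnowledgeBaseStream",
                "Resource": resource,
            }
        ],
    }


# Placeholder identifiers; replace them with your own values.
policy = build_policy("cn-hangzhou", "1234567890123456", "gp-xxxxxxxxx")
print(json.dumps(policy, indent=2))
```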

Request syntax

POST / HTTP/1.1

Request parameters

Parameter

Type

Required

Description

Example

DBInstanceId

string

Yes

The ID of the instance.

Note

You can call the DescribeDBInstances operation to query the IDs of all AnalyticDB for PostgreSQL instances in a specific region.

gp-xxxxxxxxx

RegionId

string

No

The region ID of the instance.

cn-hangzhou

KnowledgeParams

object

No

The parameters for knowledge retrieval. If this parameter is not specified, the model answers without knowledge retrieval (chat only).

MergeMethod

string

No

The method for merging the results from multiple knowledge bases. Default value: RRF.

  • RRF

  • Weight

"RRF"

MergeMethodArgs

object

No

The parameters for merging the results from multiple knowledge bases.

Rrf

object

No

The parameters that can be configured when MergeMethod is set to RRF.

K

integer

No

The constant k in the RRF scoring formula 1/(k + rank_i). The value must be an integer greater than 1.

60
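To make the role of k concrete, the following sketch merges ranked result lists by summing 1/(k + rank_i) per document. It is an illustration of Reciprocal Rank Fusion in general, not the service's implementation; ranks starting at 1, the tie-break by document ID, and the function name `rrf_merge` are assumptions.

```python
from collections import defaultdict


def rrf_merge(ranked_lists, k=60, top_k=10):
    """Fuse ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Assumption: the best result in each list has rank 1.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is stable.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_k]


# Two hypothetical knowledge bases returning overlapping documents.
merged = rrf_merge([["a", "b", "c"], ["b", "a", "d"]], k=60)
for doc_id, score in merged:
    print(doc_id, round(score, 5))
```

A larger k flattens the difference between adjacent ranks, so no single list dominates the merged order.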

Weight

object

No

The parameters that can be configured when MergeMethod is set to Weight.

Weights

array

No

The weight array of each SourceCollection.

number

No

The weight of each SourceCollection.

0.01

RerankFactor

number

No

The reranking factor. If this parameter is not empty, the vector retrieval results are reranked. Valid values: (1, 5].

Note
  • If documents are sparsely chunked, reranking is slow.

  • We recommend that the number of reranked documents, which is calculated by using the ceil(TopK × RerankFactor) formula, does not exceed 50.

5.0

SourceCollection

array<object>

Yes

The knowledge base.

array<object>

No

The knowledge base.

Collection

string

Yes

The name of the collection to be recalled.

cloud_index_adb_50943_prod

Namespace

string

No

The namespace.

Note

You can call the ListNamespaces operation to view the list.

ddstar_vector

NamespacePassword

string

Yes

The password of the namespace.

Note

This parameter is specified when you call the CreateNamespace operation.

namespacePassword

QueryParams

object

No

The parameters related to the retrieval of the knowledge base.

Filter

string

No

The filter conditions for the data to be retrieved, in the format of a SQL WHERE clause.

method_id='e41695f0-2851-40ac-b21d-dd337b60d71c'

GraphEnhance

boolean

No

Specifies whether to enable knowledge graph enhancement. Default value: false.

true

GraphSearchArgs

object

No

The parameters for knowledge graph retrieval.

GraphTopK

integer

No

The number of top entities and relationship edges to be returned. Default value: 60.

60

HybridSearch

string

No

The dual-retrieval algorithm. By default, this parameter is empty, and the system directly compares and sorts the scores of vector retrieval and full-text retrieval.

Valid values:

  • RRF: Reciprocal Rank Fusion. A parameter k is used to control the fusion effect. For more information, see the HybridSearchArgs parameter.

  • Weight: weighted sort. A parameter alpha is used to control the score ratio of vector retrieval and full-text retrieval, and then the scores are sorted. For more information, see the HybridSearchArgs parameter.

  • Cascaded: performs full-text retrieval first, and then performs vector retrieval based on the full-text retrieval results.

Cascaded

HybridSearchArgs

object

No

The algorithm parameters for dual-retrieval. RRF and Weight are supported.

  • RRF: specifies the constant k in the 1/(k+rank_i) algorithm for calculating scores. The value must be a positive integer greater than 1. Format:

{ 
   "RRF": {
    "k": 60
   }
}
  • Weight: The calculation formula is alpha * vector_score + (1-alpha) * text_score. The alpha parameter indicates the score ratio of vector retrieval to full-text retrieval. The value of this parameter ranges from 0 to 1. 0 indicates that only full-text retrieval is performed, and 1 indicates that only vector retrieval is performed.

{ 
   "Weight": {
    "alpha": 0.5
   }
}

any

No

{"RRF":{"k":60}}
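The Weight formula described for HybridSearchArgs can be checked with a few lines of code. This sketch applies alpha * vector_score + (1 - alpha) * text_score as stated above; the function name `weighted_score` and the sample scores are assumptions for illustration, and how the service normalizes the two scores is not covered here.

```python
def weighted_score(vector_score, text_score, alpha=0.5):
    """Blend a vector score and a full-text score using the alpha weight."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * vector_score + (1.0 - alpha) * text_score


# alpha = 1 keeps only the vector score; alpha = 0 keeps only the text score.
print(weighted_score(0.8, 0.2, alpha=1.0))
print(weighted_score(0.8, 0.2, alpha=0.0))
print(weighted_score(0.8, 0.2, alpha=0.5))
```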

Metrics

string

No

The distance metric used when building the vector index. Valid values:

  • l2: Euclidean distance.

  • ip: inner product distance.

  • cosine: cosine similarity.

cosine

RecallWindow

array

No

The recall window. When this parameter is not empty, the context of the returned retrieval results is added. The format is a two-element array [A, B]. -10 <= A <= 0 and 0 <= B <= 10.

Note
  • This parameter is recommended when documents are fragmented and retrieval may lose contextual information.

  • Reranking takes precedence over windowing, which means that the results are first reranked and then windowed.

integer

No

An element of the recall window array: A (the first element, -10 <= A <= 0) or B (the second element, 0 <= B <= 10).

[-1,1]
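The following sketch shows one way to read a recall window [A, B]: for a matched chunk at a given position, it returns the positions of the neighboring chunks that would be added as context. It assumes that A and B count chunks before and after the match within the same document, which this page does not state; the function name `window_indices` is also an assumption.

```python
def window_indices(hit_index, recall_window, total_chunks):
    """Return chunk positions covered by a recall window around a match."""
    a, b = recall_window
    if not (-10 <= a <= 0 and 0 <= b <= 10):
        raise ValueError("RecallWindow must satisfy -10 <= A <= 0 <= B <= 10")
    # Clamp the window to the first and last chunk of the document.
    start = max(0, hit_index + a)
    end = min(total_chunks - 1, hit_index + b)
    return list(range(start, end + 1))


# A match at position 5 with the example window [-1, 1].
print(window_indices(5, [-1, 1], total_chunks=10))
```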

RerankFactor

number

No

The reranking factor. If this parameter is not empty, the vector retrieval results are reranked. Valid values: (1, 5].

Note
  • If documents are sparsely chunked, reranking is slow.

  • We recommend that the number of reranked documents, which is calculated by using the ceil(TopK × RerankFactor) formula, does not exceed 50.

2.0
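The recommendation above can be checked before a request is sent. This sketch computes ceil(TopK × RerankFactor) and compares it with the suggested limit of 50; the function names and the choice to return a boolean instead of raising an error are assumptions for illustration.

```python
import math

RECOMMENDED_MAX_RERANKED = 50  # limit suggested in the note above


def reranked_document_count(top_k, rerank_factor):
    """Return the number of documents that are reranked."""
    if not 1 < rerank_factor <= 5:
        raise ValueError("RerankFactor must be in the range (1, 5]")
    return math.ceil(top_k * rerank_factor)


def within_recommendation(top_k, rerank_factor):
    """Check the reranked document count against the recommended limit."""
    return reranked_document_count(top_k, rerank_factor) <= RECOMMENDED_MAX_RERANKED


print(reranked_document_count(10, 2.0))
print(within_recommendation(10, 2.0))
print(within_recommendation(101, 2.0))
```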

TopK

integer

No

Specifies the number of top results to be returned.

101

UseFullTextRetrieval

boolean

No

Specifies whether to use full-text retrieval (dual-retrieval). Default value: false, which means that only vector retrieval is used.

true

TopK

integer

No

After the results are retrieved from multiple vector sets and merged, this parameter specifies the number of top results to be returned.

10

PromptParams

string

No

The system prompt template, which must include {{ text_chunks }}, {{ user_system_prompt }}, {{ graph_entities }}, and {{ graph_relations }}. If this parameter is not specified, this part does not take effect.

"Answer the question based on the following knowledge: {{ text_chunks }}"

ModelParams

object

Yes

The large language model (LLM) call parameter object.

MaxTokens

integer

No

The maximum number of tokens to be generated.

8192

Messages

array<object>

Yes

The message list.

object

No

The message list.

Content

string

No

The content of the message.

You are a helpful assistant.

Role

string

No

The role of the message. Valid values:

  • system

  • user

  • assistant

user

Model

string

Yes

The name of the large model to be used. For more information about the options, see Bailian Documentation.

qwen-plus

N

integer

No

The number of candidate replies to be generated.

1

PresencePenalty

number

No

The presence penalty. Value range: [-2.0, 2.0].

1.0

Seed

integer

No

The random seed.

42

Stop

array

No

The list of stop words.

string

No

The stop word.

"\n"

Temperature

number

No

The sampling temperature. Value range: (0, 2).

0.6

Tools

array<object>

No

The list of tools.

array<object>

No

The details of a tool.

Function

object

No

The function information.

Description

string

No

The description of the function tool.

Get the weather.

Name

string

No

The name of the function tool.

get_weather

Parameters

any

No

The function parameter JSON Schema.

{"type": "object", ...}

TopP

number

No

The probability threshold for nucleus sampling. Value range: (0, 1).

0.9

IncludeKnowledgeBaseResults

boolean

No

Specifies whether to return the retrieval results. Default value: false.

false

Response elements

Element

Type

Description

Example

object

The response schema.

RequestId

string

The ID of the request.

ABB39CC3-4488-4857-905D-2E4A051D0521

MultiCollectionRecallResult

object

The recall results from multiple knowledge bases.

Entities

array

Details of the entities.

string

Details of an entity.

{'entities': []}

Matches

array<object>

A list of retrieved items.

array<object>

A retrieved item.

Content

string

The document content.

ADBPG vector database.

FileName

string

The file name.

a14b0221-e3f2-4cf2-96cd-b3c293510770.jpg

FileURL

string

The public URL of the retrieved file. This URL is valid for two hours by default.

You can customize this duration with the UrlExpiration parameter.

http://dailyshort-sh.oss-cn-shanghai.aliyuncs.com/vod-8efba5/f06147795c6c71f080605420848d0302/0ca34d5743a84bf7c68f489a60715dac-ld.mp4

Id

string

The unique ID of the retrieved data entry.

273e3fc7-8f56-4167-a1bb-d35d2f3b9043

LoaderMetadata

any

The metadata generated when the document is loaded.

{"page":1}

Metadata

object

The custom metadata associated with the data entry.

any

RerankScore

number

The reranking score.

0.12

RetrievalSource

integer

The retrieval method used for this item. Valid values: 1 for Vector Search, 2 for Full-Text Search, and 3 for Hybrid Recall.

1

Score

number

The similarity score of this data entry, which is determined by the distance metric (l2, ip, or cosine) specified during index creation.

10

Vector

array

The vector data.

number

A single value in the vector array.

[]

Relations

array

The names of the relations.

string

Details of a relationship edge.

{'relations': []}

RequestId

string

The ID of the request.

ABB39CC3-4488-4857-905D-2E4A051D0521

Status

string

The status of the recall operation. Valid values:

  • success: The operation succeeded.

  • fail: The operation failed.

success

Tokens

integer

The number of tokens consumed.

42

Usage

object

Usage statistics for the document embedding process.

EmbeddingTokens

integer

The number of tokens used for vectorization.

Note

A token is the smallest unit of text that is processed. It can be a word, a phrase, a punctuation mark, or a character.

158

ChatCompletion

object

The response from the model.

Choices

array<object>

A list of generated completion choices.

array<object>

A generated completion choice.

FinishReason

string

The reason the model stopped generating output.

finish

Index

integer

The index of the response choice.

0

Message

object

The message object returned by the model.

Content

string

The message content.

The weather in Hangzhou is sunny.

Role

string

The role of the message author. Valid values:

  • system

  • user

  • assistant

user

ToolCalls

array<object>

The tool calls generated by the model for the client to invoke.

array<object>

A single tool call generated by the model.

Id

string

The ID of the tool call.

"chatcmpl-c1bebafa-cc48-44e2-88c6-1a3572952f8e"

Function

object

The function that the model chose to invoke.

Arguments

string

The arguments to pass to the function.

{"city":"hangzhou"}

Name

string

The name of the function to invoke.

"get_weather"

Index

integer

The index of the tool definition in the tools array from the original request.

1

ReasoningContent

string

The content of the model's Chain of Thought (CoT).

Logical reasoning process

Created

integer

The Unix timestamp (in seconds) when the response was created.

1758529748

Id

string

The ID of the response.

273e3fc7-8f56-4167-a1bb-d35d2f3b9043

Model

string

The model that generated the response.

qwen-plus

Usage

object

Usage statistics for the completion request.

CompletionTokens

integer

The number of tokens in the generated response.

42

PromptTokens

integer

The number of tokens in the input prompt.

42

PromptTokensDetails

object

Details about the prompt tokens.

CachedTokens

integer

The number of prompt tokens that were retrieved from the cache.

24

TotalTokens

integer

The total number of tokens used in the request, including both prompt and completion tokens.

84

Message

string

A summary message for the operation.

Successful

Status

string

The status of the operation. Valid values:

  • success: Indicates the operation succeeded.

  • fail: Indicates the operation failed.

success

Examples

Success response

JSON format

{
  "RequestId": "ABB39CC3-4488-4857-905D-2E4A051D0521",
  "MultiCollectionRecallResult": {
    "Entities": [
      "{'entities': []}"
    ],
    "Matches": [
      {
        "Content": "ADBPG vector database.",
        "FileName": "a14b0221-e3f2-4cf2-96cd-b3c293510770.jpg",
        "FileURL": "http://dailyshort-sh.oss-cn-shanghai.aliyuncs.com/vod-8efba5/f06147795c6c71f080605420848d0302/0ca34d5743a84bf7c68f489a60715dac-ld.mp4",
        "Id": "273e3fc7-8f56-4167-a1bb-d35d2f3b9043",
        "LoaderMetadata": "{\"page\":1}\n",
        "Metadata": {
          "key": ""
        },
        "RerankScore": 0.12,
        "RetrievalSource": 1,
        "Score": 10,
        "Vector": [
          0
        ]
      }
    ],
    "Relations": [
      "{'relations': []}"
    ],
    "RequestId": "ABB39CC3-4488-4857-905D-2E4A051D0521",
    "Status": "success",
    "Tokens": 42,
    "Usage": {
      "EmbeddingTokens": 158
    }
  },
  "ChatCompletion": {
    "Choices": [
      {
        "FinishReason": "finish",
        "Index": 0,
        "Message": {
          "Content": "The weather in Hangzhou is sunny.",
          "Role": "user",
          "ToolCalls": [
            {
              "Id": "chatcmpl-c1bebafa-cc48-44e2-88c6-1a3572952f8e",
              "Function": {
                "Arguments": "{\"city\":\"hangzhou\"}",
                "Name": "get_weather"
              },
              "Index": 1
            }
          ],
          "ReasoningContent": "Logical reasoning process"
        }
      }
    ],
    "Created": 1758529748,
    "Id": "273e3fc7-8f56-4167-a1bb-d35d2f3b9043",
    "Model": "qwen-plus",
    "Usage": {
      "CompletionTokens": 42,
      "PromptTokens": 42,
      "PromptTokensDetails": {
        "CachedTokens": 24
      },
      "TotalTokens": 84
    }
  },
  "Message": "Successful",
  "Status": "success"
}
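A response shaped like the example above can be reduced to the answer text and its source files with a short helper. This is a sketch under the assumption that the payload has already been parsed into a dictionary and follows the schema on this page; the helper name `extract_answer` and the trimmed sample payload are illustrative, and retrieval results may be absent unless IncludeKnowledgeBaseResults is set to true.

```python
def extract_answer(response):
    """Return the first answer text and the file names of retrieved items."""
    completion = response.get("ChatCompletion") or {}
    choices = completion.get("Choices") or []
    answer = ""
    if choices:
        message = choices[0].get("Message") or {}
        answer = message.get("Content", "")
    recall = response.get("MultiCollectionRecallResult") or {}
    matches = recall.get("Matches") or []
    sources = [match.get("FileName") for match in matches]
    return answer, sources


# Trimmed, hypothetical payload that follows the response schema.
sample_response = {
    "Status": "success",
    "MultiCollectionRecallResult": {
        "Matches": [{"FileName": "guide.pdf", "Score": 10}],
    },
    "ChatCompletion": {
        "Choices": [
            {"Index": 0, "Message": {"Role": "assistant", "Content": "Sunny."}}
        ]
    },
}
answer, sources = extract_answer(sample_response)
print(answer, sources)
```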

Error codes

See Error Codes for a complete list.

Release notes

See Release Notes for a complete list.