UploadDocumentAsync - AnalyticDB - Alibaba Cloud Documentation Center

Asynchronous Document Upload

Operation description

The server loads and chunks a document based on the file extension, performs vectorization by using the embedding model that is specified when you call the CreateDocumentCollection operation, and then writes the document to the specified document collection. This operation supports multi-modal embedding for various formats of text and images.
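To make the idea of chunking with overlap concrete, here is a minimal sketch of a fixed-size sliding window. It is an illustration only, not the server's splitter: it assumes character-based sizes (this topic does not state the unit) and requires the overlap to be strictly smaller than the chunk size so that the window always advances. The function name chunk_with_overlap is hypothetical.

```python
from typing import List


def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into character windows that share chunk_overlap characters.

    Illustrative only: the real service splits by separators and splitter type.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        # Assumption of this sketch: overlap must be smaller than the chunk size.
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


# Consecutive chunks repeat the last two characters of the previous chunk.
print(chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
```

Each chunk begins with the tail of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when the overlap is large enough.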

Related operations:

You can call the GetUploadDocumentJob operation to query the progress and result of a document upload job.
You can call the CancelUploadDocumentJob operation to cancel a document upload job.
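Because the upload is asynchronous, a client typically polls the job until it leaves the queue. The sketch below shows one way to structure such a loop. It deliberately does not call any SDK: fetch_status is a hypothetical callable that you would implement around GetUploadDocumentJob, and the only state names used are Pending and Running, which this topic mentions. Any other value is treated as terminal, which is an assumption.

```python
import time
from typing import Callable

# State names mentioned in this topic; all other values are assumed to be terminal.
IN_PROGRESS_STATES = {"Pending", "Running"}


def wait_for_job(
    fetch_status: Callable[[], str],
    interval_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status until the job is no longer Pending or Running.

    fetch_status is a hypothetical wrapper around GetUploadDocumentJob that
    returns the current job state as a string.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status not in IN_PROGRESS_STATES:
            return status  # terminal state, as reported by the caller's wrapper
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout_seconds} seconds")
        sleep(interval_seconds)
```

If the timeout is reached, the caller can decide whether to keep waiting or to cancel the job with CancelUploadDocumentJob.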

Note

After a document upload request is submitted, the request is queued for processing. Each Resource Access Management (RAM) user or Alibaba Cloud account can have up to 20 documents in the Pending or Running state at a time.
A text document can be split into up to 100,000 chunks.
If a document collection uses the OnePeace model, each RAM user or Alibaba Cloud account can upload and query up to 10,000 images.

Debugging

You can call this operation directly in OpenAPI Explorer, which calculates the request signature for you. After a successful call, OpenAPI Explorer can automatically generate SDK code samples.

Authorization information

The following table shows the authorization information for this API operation. You can use this information in the Action element of a policy to grant a RAM user or RAM role the permissions to call this operation. The columns are described as follows:

Operation: the value that you can use in the Action element to specify the operation on a resource.
Access level: the access level of each operation. The levels are read, write, and list.
Resource type: the type of the resource on which you can authorize the RAM user or the RAM role to perform the operation. Take note of the following items:
- Required resource types are marked with an asterisk (*) prefix.
- If the permissions cannot be granted at the resource level, All Resources is used in the Resource type column of the operation.
Condition Key: the condition key that is defined by the cloud service.
Associated operation: other operations that the RAM user or the RAM role must also have permissions to perform in order to complete this operation.

Operation	Access level	Resource type	Condition key	Associated operation
gpdb:UploadDocumentAsync	create	*Document `acs:gpdb:{#regionId}:{#accountId}:document/{#DBInstanceId}`	none	none
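As an illustration of how the Action value and resource pattern above fit into a policy, the following sketch builds a RAM policy document that allows UploadDocumentAsync on one instance. The helper name build_upload_policy and the account ID and instance ID values are placeholders. Treat the result as a starting point and review it against your own access requirements.

```python
import json
from typing import Any, Dict


def build_upload_policy(region_id: str, account_id: str, db_instance_id: str) -> Dict[str, Any]:
    """Return a RAM policy document that allows gpdb:UploadDocumentAsync on one instance."""
    # Resource pattern from the authorization table:
    # acs:gpdb:{#regionId}:{#accountId}:document/{#DBInstanceId}
    resource = f"acs:gpdb:{region_id}:{account_id}:document/{db_instance_id}"
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "gpdb:UploadDocumentAsync",
                "Resource": resource,
            }
        ],
    }


# Placeholder values for illustration only.
policy = build_upload_policy("cn-hangzhou", "1234567890123456", "gp-example")
print(json.dumps(policy, indent=2))
```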

Request parameters

Parameter	Type	Required	Description	Example
DBInstanceId	string	Yes	The ID of an instance that has vector engine optimization enabled. You can call the DescribeDBInstances operation to query the details of all AnalyticDB for PostgreSQL instances in a region, including their instance IDs.	gp-bp12ga6v69h86****
Collection	string	Yes	The name of the document collection. Note: The collection is created by the CreateDocumentCollection operation. You can call the ListDocumentCollections operation to list existing document collections.	document
Namespace	string	No	The name of the namespace. Default value: public. You can call the CreateNamespace operation to create a namespace and the ListNamespaces operation to list namespaces.	mynamespace
NamespacePassword	string	Yes	The password of the namespace. Note: This value is specified when you call the CreateNamespace operation.	testpassword
RegionId	string	Yes	The region ID of the instance.	cn-hangzhou
FileName	string	Yes	The name of the file to upload. Note: Documents: .json, .md, and .pdf. Images: .bmp, .jpg, .jpeg, .png, and .tiff. Compressed packages (the package file name must include an extension): .tar, .gz, and .zip.	mydoc.txt
FileUrl	string	Yes	The publicly accessible URL of the document. Note: We recommend that you call this operation by using an SDK, which provides the UploadDocumentAsyncAdvance method for uploading local files directly. If the URL points to a compressed package of images, the package can contain up to 100 images.	https://xx/mydoc.txt
Metadata	object	No	The metadata. The value of this parameter must be the same as the Metadata parameter that is specified when you call the CreateDocumentCollection operation.
	any	No	The metadata. The value of this parameter must be the same as the Metadata parameter that is specified when you call the CreateDocumentCollection operation.	{"title":"mytitle","page":1}
ChunkSize	integer	No	The size of each chunk when the document is split into smaller parts. Maximum value: 2048.	250
ChunkOverlap	integer	No	The amount of data shared between consecutive chunks. The value cannot be greater than the value of the ChunkSize parameter. Note: Overlap prevents the loss of context that truncation can cause. For example, when you upload a long text, you can retain some overlapping text between consecutive chunks so that each chunk keeps enough surrounding context.	50
Separators	array	No	The separators that are used to split large amounts of data. Note: This parameter strongly affects the chunking results and works together with the splitter that is specified by the TextSplitterName parameter. In most cases, you do not need to specify this parameter. The server assigns separators based on the value of the TextSplitterName parameter.
	string	No	The separator.	.
DryRun	boolean	No	Specifies whether to perform only document understanding and chunking, without vectorization and storage. Default value: false. Note: You can set this parameter to true to check the chunking results and tune the chunking parameters before you perform the actual upload.	false
ZhTitleEnhance	boolean	No	Specifies whether to enable title enhancement. Note: Title enhancement identifies title text, marks it in the metadata, and then combines text with its upper-level title to enrich the text.	false
TextSplitterName	string	No	The name of the text splitter. Valid values: ChineseRecursiveTextSplitter: inherits from RecursiveCharacterTextSplitter and by default uses the separators `["\n\n", "\n", "。|！|？", "\.\s|\!\s|\?\s", "；|;\s", "，|,\s"]`, which are matched as regular expressions. RecursiveCharacterTextSplitter: uses the separators `["\n\n", "\n", " ", ""]` by default. This splitter supports splitting code in languages such as C++, Go, Java, JS, PHP, Proto, Python, RST, Ruby, Rust, Scala, Swift, Markdown, LaTeX, HTML, Sol, and C Sharp. SpacyTextSplitter: uses the separator `\n\n` by default and relies on the spaCy en_core_web_sm model, which can produce better splitting results. MarkdownHeaderTextSplitter: splits text by headers in the `[("#", "head1"), ("##", "head2"), ("###", "head3"), ("####", "head4")]` format. This splitter works well with Markdown text. LLMSplitter: uses a large language model (LLM) to split text. The default model is qwen3-8b. Currently, this splitter works only when DocumentLoaderName is set to ADBPGLoader.	ChineseRecursiveTextSplitter
DocumentLoaderName	string	No	The document loader that is used to process the file. If you omit this parameter, the server selects a loader based on the file extension. Valid values: UnstructuredHTMLLoader: .html; UnstructuredMarkdownLoader: .md; PyMuPDFLoader: .pdf; PyPDFLoader: .pdf; RapidOCRPDFLoader: .pdf; PDFWithImageRefLoader: .pdf (with the text-image association feature); JSONLoader: .json; CSVLoader: .csv; RapidOCRLoader: .png, .jpg, .jpeg, and .bmp; UnstructuredFileLoader: .eml, .msg, .rst, .txt, .docx, .epub, .odt, .pptx, and .tsv; ADBPGLoader (free of charge for the first 3,000 pages): .pdf, .doc, .docx, .ppt, .pptx, .xls, .xlsx, .xlsm, .csv, .txt, .jpg, .jpeg, .png, .bmp, .gif, .md, .html, .epub, .mobi, and .rtf.	PyMuPDFLoader
VlEnhance	boolean	No	Specifies whether to enable VL-enhanced content recognition for complex documents. Default value: false. Note: We recommend that you enable VL-enhanced content recognition for documents with complex or irregular layouts. Enabling it increases document processing time, and images in the documents can no longer be stored or recalled.	false
SplitterModel	string	No	The model that is used to split text. This parameter takes effect only when DocumentLoaderName is set to ADBPGLoader and TextSplitterName is set to LLMSplitter. Default value: qwen3-8b. Note: Supported models: qwq-plus, qwq-plus-latest, qwen-max, qwen-max-latest, qwen-plus, qwen-plus-latest, qwen-turbo, qwen-turbo-latest, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b, qwq-32b, qwen2.5-14b-instruct-1m, qwen2.5-7b-instruct-1m, qwen2.5-72b-Instruct, qwen2.5-32b-Instruct, qwen2.5-14b-Instruct, qwen2.5-7b-Instruct, qwen2.5-3b-instruct, qwen2.5-1.5b-instruct, and qwen2.5-0.5b-instruct.	qwen3-8b
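The constraints in the table above can be checked on the client before a request is sent. The following sketch assembles the request parameters as a plain dictionary and validates the documented limits (ChunkSize at most 2048, ChunkOverlap not greater than ChunkSize). It does not call any SDK. The function name build_upload_request is hypothetical, and the requirement that FileName carries an extension is a client-side assumption based on the server choosing a loader by file extension.

```python
import os
from typing import Any, Dict, Optional

MAX_CHUNK_SIZE = 2048  # maximum ChunkSize stated in the parameter table


def build_upload_request(
    db_instance_id: str,
    collection: str,
    namespace_password: str,
    region_id: str,
    file_name: str,
    file_url: str,
    namespace: Optional[str] = None,
    chunk_size: Optional[int] = None,
    chunk_overlap: Optional[int] = None,
    dry_run: bool = False,
) -> Dict[str, Any]:
    """Validate documented limits and return the request parameters as a dict."""
    if not os.path.splitext(file_name)[1]:
        # Assumption: the loader is chosen by extension, so require one here.
        raise ValueError("FileName should include an extension, for example .pdf")
    if chunk_size is not None and not 0 < chunk_size <= MAX_CHUNK_SIZE:
        raise ValueError(f"ChunkSize must be between 1 and {MAX_CHUNK_SIZE}")
    if chunk_overlap is not None:
        if chunk_overlap < 0:
            raise ValueError("ChunkOverlap must not be negative")
        if chunk_size is not None and chunk_overlap > chunk_size:
            raise ValueError("ChunkOverlap must not be greater than ChunkSize")

    params: Dict[str, Any] = {
        "DBInstanceId": db_instance_id,
        "Collection": collection,
        "NamespacePassword": namespace_password,
        "RegionId": region_id,
        "FileName": file_name,
        "FileUrl": file_url,
        "DryRun": dry_run,
    }
    # Optional parameters are sent only when set, so server defaults apply otherwise.
    optional = {"Namespace": namespace, "ChunkSize": chunk_size, "ChunkOverlap": chunk_overlap}
    params.update({key: value for key, value in optional.items() if value is not None})
    return params
```

Setting dry_run=True on a first call is a convenient way to inspect the chunking results before committing to vectorization and storage.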

Response parameters

Parameter	Type	Description	Example
	object
RequestId	string	The request ID.	ABB39CC3-4488-4857-905D-2E4A051D0521
Message	string	The returned message.	success
Status	string	The execution status of the request. Valid values: success: The request succeeded. fail: The request failed.	success
JobId	string	The job ID.	231460f8-75dc-405e-a669-0c5204887e91

Examples

Sample success responses

JSON format

{
  "RequestId": "ABB39CC3-4488-4857-905D-2E4A051D0521",
  "Message": "success",
  "Status": "success",
  "JobId": "231460f8-75dc-405e-a669-0c5204887e91"
}
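A caller usually needs only the JobId from this response, provided that Status is success. The sketch below parses a response body of the shape shown above and raises an error otherwise. The function name extract_job_id is hypothetical, and the error handling is one possible choice rather than prescribed behavior.

```python
import json


def extract_job_id(response_body: str) -> str:
    """Return JobId from an UploadDocumentAsync response, or raise if the call failed."""
    payload = json.loads(response_body)
    if payload.get("Status") != "success":
        raise RuntimeError(
            f"Upload request failed: {payload.get('Message')} "
            f"(RequestId={payload.get('RequestId')})"
        )
    job_id = payload.get("JobId")
    if not job_id:
        raise RuntimeError("Response does not contain a JobId")
    return job_id


sample = """{
  "RequestId": "ABB39CC3-4488-4857-905D-2E4A051D0521",
  "Message": "success",
  "Status": "success",
  "JobId": "231460f8-75dc-405e-a669-0c5204887e91"
}"""
print(extract_job_id(sample))
```

The returned job ID is the value to pass to GetUploadDocumentJob or CancelUploadDocumentJob.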

Error codes

For a list of error codes, see Service error codes.