
AnalyticDB:UploadDocumentAsync

Last Updated: Nov 05, 2025

Asynchronous Document Upload

Operation description

The server loads and chunks a document based on the file extension, performs vectorization by using the embedding model that is specified when you call the CreateDocumentCollection operation, and then writes the document to the specified document collection. This operation supports multi-modal embedding for various formats of text and images.

Related operations:

  • You can call the GetUploadDocumentJob operation to query the progress and result of a document upload job.
  • You can call the CancelUploadDocumentJob operation to cancel a document upload job.
Note
  • After a document upload request is submitted, the request is queued for processing. Each Resource Access Management (RAM) user or Alibaba Cloud account can have up to 20 documents in the Pending or Running state at a time.

  • A text document can be split into up to 100,000 chunks.

  • If a document collection uses the OnePeace model, each RAM user or Alibaba Cloud account can upload and query up to 10,000 images.
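
The submit-then-poll flow described above can be sketched as follows. This is a minimal illustration rather than SDK code: `fetch_status` is a hypothetical callable that stands in for a GetUploadDocumentJob call, and the sketch relies only on Pending and Running being the in-progress states named in this topic. Any other status is treated as final.

```python
import time
from typing import Callable

# In-progress states named in this topic; any other status is treated as final.
ACTIVE_STATES = {"Pending", "Running"}


def wait_for_job(
    job_id: str,
    fetch_status: Callable[[str], str],  # hypothetical wrapper around GetUploadDocumentJob
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
) -> str:
    """Poll until the job leaves the Pending/Running states, then return its status."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status(job_id)
        if status not in ACTIVE_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} is still {status} after {timeout_s}s")
        time.sleep(interval_s)
```

If the timeout is reached, the caller could cancel the job with CancelUploadDocumentJob so that it no longer counts toward the limit of 20 queued documents.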

Debugging

You can call this operation directly in OpenAPI Explorer, which saves you from calculating signatures. After a successful call, OpenAPI Explorer can automatically generate SDK code samples.

Authorization information

The following table shows the authorization information for this operation. You can use this information in the Action element of a policy to grant a RAM user or RAM role the permissions to call this operation. The columns are described as follows:

  • Operation: the value that you can use in the Action element to specify the operation on a resource.
  • Access level: the access level of each operation. The levels are read, write, and list.
  • Resource type: the type of the resource on which you can authorize the RAM user or the RAM role to perform the operation. Take note of the following items:
    • Mandatory resource types are marked with an asterisk (*) prefix.
    • If the permissions cannot be granted at the resource level, All Resources is used in the Resource type column of the operation.
  • Condition Key: the condition key that is defined by the cloud service.
  • Associated operation: other operations that the RAM user or the RAM role must have permissions to perform in order to complete the operation.
| Operation | Access level | Resource type | Condition key | Associated operation |
| --- | --- | --- | --- | --- |
| gpdb:UploadDocumentAsync | create | *Document acs:gpdb:{#regionId}:{#accountId}:document/{#DBInstanceId} | none | none |
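
As an illustration of how the operation name and resource pattern above fit into a policy, the following sketch builds a policy document in Python. It assumes the usual RAM policy layout (Version, Statement, Effect, Action, Resource). The helper name `build_upload_policy` and the account ID are placeholders, not values from this topic.

```python
import json


def build_upload_policy(region_id: str, account_id: str, instance_id: str) -> dict:
    """Return a policy document that allows gpdb:UploadDocumentAsync on one instance."""
    # Resource pattern taken from the authorization table in this topic.
    resource = f"acs:gpdb:{region_id}:{account_id}:document/{instance_id}"
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["gpdb:UploadDocumentAsync"],
                "Resource": [resource],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    policy = build_upload_policy("cn-hangzhou", "1234567890123456", "gp-bp12ga6v69h86****")
    print(json.dumps(policy, indent=2))
```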

Request parameters

Parameter | Type | Required | Description | Example
DBInstanceId | string | Yes

The ID of the instance for which vector engine optimization is enabled. You can call the DescribeDBInstances operation to query the details of all AnalyticDB for PostgreSQL instances in a region, including the instance IDs.

Example: gp-bp12ga6v69h86****
Collection | string | Yes

The name of the document collection.

Note The document collection is created by calling the CreateDocumentCollection operation. You can call the ListDocumentCollections operation to query the document collections that already exist.

Example: document
Namespace | string | No

The name of the namespace. Default value: public. You can call the CreateNamespace operation to create a namespace and the ListNamespaces operation to query existing namespaces.

Example: mynamespace
NamespacePassword | string | Yes

The password of the namespace.

Note This value is specified when you call the CreateNamespace operation.

Example: testpassword
RegionId | string | Yes

The region ID of the instance.

Example: cn-hangzhou
FileName | string | Yes

The name of the file to upload, including its extension.

Note Supported file types:
  • Documents: .json, .md, and .pdf.

  • Images: .bmp, .jpg, .jpeg, .png, and .tiff.

  • Compressed packages: .tar, .gz, and .zip. The package file name must contain the extension.

Example: mydoc.txt
FileUrl | string | Yes

The publicly accessible URL of the document.

Note
  • We recommend that you call this operation by using an SDK. The SDK provides the UploadDocumentAsyncAdvance method, which uploads local files directly.

  • If the URL points to an image archive, the archive can contain up to 100 images.

Example: https://xx/mydoc.txt
Metadata | object | No

The metadata. The value of this parameter must be the same as the Metadata parameter that is specified when you call the CreateDocumentCollection operation. Each value in the object is of type any and is optional.

Example: {"title":"mytitle","page":1}
ChunkSize | integer | No

The size of each chunk when a large document is split into smaller parts. Maximum value: 2048.

Example: 250
ChunkOverlap | integer | No

The amount of data that is shared between consecutive chunks. The value cannot be greater than the value of the ChunkSize parameter.

Note This parameter prevents the loss of context that data truncation can cause. For example, when you upload a long text, keeping some overlapping text between consecutive chunks preserves the context across chunk boundaries.

Example: 50
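
To make the interaction between ChunkSize and ChunkOverlap concrete, here is a minimal sliding-window sketch. It is not the server's implementation. It measures sizes in characters, which is an assumption because this topic does not state the unit. It also requires the overlap to be strictly smaller than the chunk size so that the window always advances.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if not 0 < chunk_size <= 2048:
        raise ValueError("chunk_size must be between 1 and 2048")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(chunk_with_overlap("abcdefghij", 4, 2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, so text cut at a chunk boundary still appears whole in at least one chunk.
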
Separators | array | No

The separators that are used to split large amounts of data.

Note
  • This parameter largely determines the chunking result. It depends on the splitter that is specified by the TextSplitterName parameter.

  • In most cases, you do not need to specify this parameter. The server assigns separators based on the value of the TextSplitterName parameter.

Array element | string | No

The separator.

Example: .
DryRun | boolean | No

Specifies whether to perform only document understanding and chunking, without vectorization or storage. Default value: false.

Note You can set this parameter to true to check the chunking result, and then adjust the chunking parameters if needed.

Example: false
ZhTitleEnhance | boolean | No

Specifies whether to enable title enhancement.

Note Title enhancement determines which text is a title, marks it in the metadata, and then combines text with its upper-level title to enrich the text.

Example: false
TextSplitterName | string | No

The name of the text splitter. Valid values:

  • ChineseRecursiveTextSplitter: Inherits from RecursiveCharacterTextSplitter and uses the delimiters ["\n\n", "\n", "。 |! |?", "\.\s|\! \s|\?\s", ";|;\s", ",|,\s"] by default, matching text with regular expressions.
  • RecursiveCharacterTextSplitter: Uses the delimiters ["\n\n", "\n", " ", ""] by default. The splitter supports splitting code in languages such as C++, Go, Java, JS, PHP, Proto, Python, RST, Ruby, Rust, Scala, Swift, Markdown, LaTeX, HTML, Sol, and C#.
  • SpacyTextSplitter: Uses the delimiter \n\n by default and relies on the spaCy en_core_web_sm model, which can achieve better splitting results.
  • MarkdownHeaderTextSplitter: Splits text by headers in the [("#", "head1"), ("##", "head2"), ("###", "head3"), ("####", "head4")] format. This splitter works well with Markdown text.
  • LLMSplitter: Uses an LLM to split text. The default model is qwen3-8b. Currently, this splitter works only when ADBPGLoader is selected.

Example: ChineseRecursiveTextSplitter
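
The separator fallback behind the recursive splitters listed above can be sketched in a few lines. This is a simplified illustration and not the server's code: it tries each separator in order and falls back to fixed-width slicing at the empty-string separator. It does not merge small pieces back together or apply ChunkOverlap, which a production splitter would typically do. The function name `recursive_split` is hypothetical.

```python
def recursive_split(text: str, separators: list[str], chunk_size: int) -> list[str]:
    """Split on the first separator, then recurse into pieces that are still too long."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # No usable separator left: cut into fixed-width slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    head, rest = separators[0], separators[1:]
    pieces: list[str] = []
    for part in text.split(head):
        pieces.extend(recursive_split(part, rest, chunk_size))
    return pieces


# Default delimiters listed for RecursiveCharacterTextSplitter in this topic.
DEFAULT_SEPARATORS = ["\n\n", "\n", " ", ""]

print(recursive_split("aa bb\n\ncc", DEFAULT_SEPARATORS, 3))  # ['aa', 'bb', 'cc']
```

Coarser separators such as paragraph breaks are tried first, so chunks tend to follow the natural structure of the text.
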
DocumentLoaderName | string | No

The document loader that is used to process the file. If you omit this parameter, the system automatically selects a loader based on the file extension. Valid values:

  • UnstructuredHTMLLoader: .html
  • UnstructuredMarkdownLoader: .md
  • PyMuPDFLoader: .pdf
  • PyPDFLoader: .pdf
  • RapidOCRPDFLoader: .pdf
  • PDFWithImageRefLoader: .pdf (with the text-image association feature)
  • JSONLoader: .json
  • CSVLoader: .csv
  • RapidOCRLoader: .png, .jpg, .jpeg, and .bmp
  • UnstructuredFileLoader: .eml, .msg, .rst, .txt, .docx, .epub, .odt, .pptx, and .tsv
  • ADBPGLoader (free of charge for the first 3,000 pages): .pdf, .doc, .docx, .ppt, .pptx, .xls, .xlsx, .xlsm, .csv, .txt, .jpg, .jpeg, .png, .bmp, .gif, .md, .html, .epub, .mobi, and .rtf

Example: PyMuPDFLoader
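
Because several loaders accept the same extension, it can help to look up which loader names are valid for a file before you set DocumentLoaderName. The sketch below encodes the list above as a lookup table. The helper name `loaders_for` is hypothetical, and the sketch does not predict which loader the server chooses when the parameter is omitted.

```python
from pathlib import PurePosixPath

# Loader-to-extension mapping copied from the valid values in this topic.
LOADER_EXTENSIONS = {
    "UnstructuredHTMLLoader": {".html"},
    "UnstructuredMarkdownLoader": {".md"},
    "PyMuPDFLoader": {".pdf"},
    "PyPDFLoader": {".pdf"},
    "RapidOCRPDFLoader": {".pdf"},
    "PDFWithImageRefLoader": {".pdf"},
    "JSONLoader": {".json"},
    "CSVLoader": {".csv"},
    "RapidOCRLoader": {".png", ".jpg", ".jpeg", ".bmp"},
    "UnstructuredFileLoader": {
        ".eml", ".msg", ".rst", ".txt", ".docx", ".epub", ".odt", ".pptx", ".tsv",
    },
    "ADBPGLoader": {
        ".pdf", ".doc", ".docx", ".ppt", ".pptx", ".xls", ".xlsx", ".xlsm", ".csv",
        ".txt", ".jpg", ".jpeg", ".png", ".bmp", ".gif", ".md", ".html", ".epub",
        ".mobi", ".rtf",
    },
}


def loaders_for(file_name: str) -> list[str]:
    """Return the loader names whose documented extensions match file_name."""
    ext = PurePosixPath(file_name).suffix.lower()
    return [name for name, exts in LOADER_EXTENSIONS.items() if ext in exts]


print(loaders_for("mydoc.txt"))  # ['UnstructuredFileLoader', 'ADBPGLoader']
```
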
VlEnhance | boolean | No

Specifies whether to enable VL-enhanced content recognition for complex documents. Default value: false.

Note
  • For complex documents with confusing typesetting and formatting, we recommend that you enable VL-enhanced content recognition.

  • Document processing time is longer after VL-enhanced content recognition is enabled.

  • After VL-enhanced content recognition is enabled, images in documents cannot be stored or recalled.

Example: false
SplitterModel | string | No

The model that is used to split text. This parameter takes effect only when DocumentLoaderName is set to ADBPGLoader and TextSplitterName is set to LLMSplitter. Default value: qwen3-8b.

Note Supported splitting models: qwq-plus, qwq-plus-latest, qwen-max, qwen-max-latest, qwen-plus, qwen-plus-latest, qwen-turbo, qwen-turbo-latest, qwen3-235b-a22b, qwen3-32b, qwen3-30b-a3b, qwen3-14b, qwen3-8b, qwen3-4b, qwen3-1.7b, qwen3-0.6b, qwq-32b, qwen2.5-14b-instruct-1m, qwen2.5-7b-instruct-1m, qwen2.5-72b-Instruct, qwen2.5-32b-Instruct, qwen2.5-14b-Instruct, qwen2.5-7b-Instruct, qwen2.5-3b-instruct, qwen2.5-1.5b-instruct, qwen2.5-0.5b-instruct.

Example: qwen3-8b
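
The constraints stated across the request parameters can be checked on the client before a request is sent. The following sketch assembles a plain parameter dictionary and validates only the rules written in this topic. It is not the SDK request class, and the function name `build_upload_params` is hypothetical.

```python
from typing import Any


def build_upload_params(
    db_instance_id: str,
    collection: str,
    namespace_password: str,
    region_id: str,
    file_name: str,
    file_url: str,
    **optional: Any,  # optional parameters, keyed by their API names (for example ChunkSize)
) -> dict[str, Any]:
    """Assemble request parameters and enforce the limits described in this topic."""
    params: dict[str, Any] = {
        "DBInstanceId": db_instance_id,
        "Collection": collection,
        "NamespacePassword": namespace_password,
        "RegionId": region_id,
        "FileName": file_name,
        "FileUrl": file_url,
    }
    params.update({k: v for k, v in optional.items() if v is not None})

    chunk_size = params.get("ChunkSize")
    chunk_overlap = params.get("ChunkOverlap")
    if chunk_size is not None and chunk_size > 2048:
        raise ValueError("ChunkSize cannot exceed 2048")
    if chunk_size is not None and chunk_overlap is not None and chunk_overlap > chunk_size:
        raise ValueError("ChunkOverlap cannot be greater than ChunkSize")

    uses_llm_splitter = params.get("TextSplitterName") == "LLMSplitter"
    uses_adbpg_loader = params.get("DocumentLoaderName") == "ADBPGLoader"
    if uses_llm_splitter and not uses_adbpg_loader:
        raise ValueError("LLMSplitter works only with ADBPGLoader")
    if "SplitterModel" in params and not (uses_llm_splitter and uses_adbpg_loader):
        raise ValueError("SplitterModel requires ADBPGLoader and LLMSplitter")
    return params
```

A first call with `DryRun=True` returns the chunking result without vectorization or storage, which makes it a cheap way to tune ChunkSize and ChunkOverlap before the real upload.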

Response parameters

Parameter | Type | Description | Example

The response is an object that contains the following fields.

RequestId | string

The request ID.

Example: ABB39CC3-4488-4857-905D-2E4A051D0521

Message | string

The returned message.

Example: success

Status | string

The execution status of the operation. Valid values:

  • success: The operation succeeded.
  • fail: The operation failed.

Example: success

JobId | string

The job ID.

Example: 231460f8-75dc-405e-a669-0c5204887e91

Examples

Sample success responses

JSON format

{
  "RequestId": "ABB39CC3-4488-4857-905D-2E4A051D0521",
  "Message": "success",
  "Status": "success",
  "JobId": "231460f8-75dc-405e-a669-0c5204887e91"
}
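
A caller usually needs only the JobId from this response, and only when Status is success. The following sketch parses a response body shaped like the sample above. The function name `extract_job_id` is hypothetical, and how an SDK exposes the response body may differ.

```python
import json


def extract_job_id(response_body: str) -> str:
    """Return JobId from a JSON response body, or raise if Status is not success."""
    data = json.loads(response_body)
    if data.get("Status") != "success":
        raise RuntimeError(
            f"upload request failed: {data.get('Message')} "
            f"(RequestId={data.get('RequestId')})"
        )
    return data["JobId"]


sample = """{
  "RequestId": "ABB39CC3-4488-4857-905D-2E4A051D0521",
  "Message": "success",
  "Status": "success",
  "JobId": "231460f8-75dc-405e-a669-0c5204887e91"
}"""
print(extract_job_id(sample))  # 231460f8-75dc-405e-a669-0c5204887e91
```

The returned job ID is the value to pass to GetUploadDocumentJob or CancelUploadDocumentJob.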

Error codes

For a list of error codes, see Service error codes.