Use the following API operations to manage documents in a knowledge base: import documents, query status, list documents, update metadata, and delete documents.
Supported document formats
PDF: .pdf
Word: .doc, .docx
Excel: .xls, .xlsx
PowerPoint: .ppt, .pptx
Plain text: .txt
Markdown: .md
Document status lifecycle
After a document is uploaded, it transitions through the following statuses before it becomes searchable:
Status | Description | Actions |
Pending | The task is queued for processing. | Query status, Delete |
Indexing | The system is parsing, chunking, and vectorizing the document. | Query status, Delete |
Completed | Indexing is complete. The document is now searchable. | Search, Update metadata, Delete, View chunks |
Failed | Indexing failed. | View failure reason, Delete, Re-upload |
| The system is deleting the document and its associated chunks. | Wait for the deletion to complete. |
A document cannot be searched while its status is Pending or Indexing. You must wait for the status to change to Completed before the document becomes searchable.
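The rule above can be expressed as a small client-side guard. The following is a minimal sketch, assuming the status strings are spelled exactly as in this section (Pending, Indexing, Completed, Failed). The helper names is_searchable and is_in_progress are hypothetical and are not part of the SDK.

```python
# Hypothetical helpers, not part of the SDK.
# Status names are taken from this section and assumed to be case-sensitive.
SEARCHABLE_STATUSES = {"Completed"}
IN_PROGRESS_STATUSES = {"Pending", "Indexing"}


def is_searchable(status: str) -> bool:
    """Return True only when the document can be searched."""
    return status in SEARCHABLE_STATUSES


def is_in_progress(status: str) -> bool:
    """Return True while the document is still queued or being indexed."""
    return status in IN_PROGRESS_STATUSES


for status in ("Pending", "Indexing", "Completed", "Failed"):
    print(status, is_searchable(status), is_in_progress(status))
```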
Add documents
Import a document into the knowledge base. The system automatically completes parsing, chunking, embedding vectorization, and index building. Uploading a document with the same ossKey overwrites the existing document.
The SDK provides three methods for importing documents:
Method | SDK method | Description |
Upload local file | upload_documents | Specify the path to a local file. The SDK automatically uploads it to OSS and then adds it to the knowledge base. |
Add OSS file | add_documents | Specify the path to an existing OSS file. |
Batch import from OSS directory | add_documents | Specify an OSS directory path. The system recursively scans and adds all files in the directory. |
Request parameters
Parameter | Type | Description |
knowledgeBaseName | string | The name of the knowledge base. Required. |
subspace | string | The name of the subspace. The maximum length is 128 characters. Required if subspaces are enabled for the knowledge base. |
documents | list<object> | A list of documents. Required. A single request can include up to 10 documents, and each file must not exceed 50 MB. Note: To request an increase in these limits, submit a ticket or contact technical support through the Tablestore technical exchange group (36165029092). |
filePath | string | The local file path. Required when using upload_documents. |
ossKey | string | The path to an OSS file or directory. The length must be between 1 and 256 characters. Required when using add_documents. |
metadata | object | The document metadata. It must conform to the metadata schema defined for the knowledge base. |
inclusionFilters | list<string> | Inclusion filter based on file name patterns, such as *.pdf. |
exclusionFilters | list<string> | Exclusion filter based on file name patterns, such as *draft*. |
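The limits in the table can be checked on the client before a request is sent, which avoids a round trip for requests that would be rejected. The following is a minimal sketch, not part of the SDK: validate_import_request is a hypothetical helper, the limits are copied from the table above, and 50 MB is assumed to mean 50 × 1024 × 1024 bytes.

```python
import os

# Limits copied from the request parameter table.
MAX_DOCUMENTS_PER_REQUEST = 10
MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_SUBSPACE_LENGTH = 128
MAX_OSS_KEY_LENGTH = 256


def validate_import_request(request, check_files=True):
    """Hypothetical helper: return a list of problems (empty if none were found)."""
    problems = []
    if not request.get("knowledgeBaseName"):
        problems.append("knowledgeBaseName is required")

    subspace = request.get("subspace")
    if subspace is not None and len(subspace) > MAX_SUBSPACE_LENGTH:
        problems.append("subspace exceeds 128 characters")

    documents = request.get("documents") or []
    if not documents:
        problems.append("documents is required")
    if len(documents) > MAX_DOCUMENTS_PER_REQUEST:
        problems.append("more than 10 documents in one request")

    for index, doc in enumerate(documents):
        oss_key = doc.get("ossKey")
        if oss_key is not None and not 1 <= len(oss_key) <= MAX_OSS_KEY_LENGTH:
            problems.append(f"documents[{index}].ossKey must be 1 to 256 characters")
        file_path = doc.get("filePath")
        # Only check the size of local files that actually exist.
        if check_files and file_path is not None and os.path.isfile(file_path):
            if os.path.getsize(file_path) > MAX_FILE_SIZE_BYTES:
                problems.append(f"documents[{index}].filePath exceeds 50 MB")
    return problems


request = {
    "knowledgeBaseName": "product_docs_kb",
    "documents": [{"ossKey": "oss://example-bucket/docs/product_manual.pdf"}],
}
print(validate_import_request(request))
```

The server remains the source of truth for these limits, so a check like this complements the per-document status check on the response rather than replacing it.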
Code examples
Upload a local file
Specify the path to a local file. The SDK automatically handles the two-step process: uploading the file to OSS and then adding it to the knowledge base.
When you use upload_documents, you must provide both oss_endpoint and oss_bucket_name when initializing the AgentStorageClient. Otherwise, a ValueError is raised.
resp = client.upload_documents({
    "knowledgeBaseName": "product_docs_kb",
    "documents": [
        {
            "filePath": "/home/user/docs/product_manual.pdf",
            "metadata": {"author": "Jane Doe", "category": "Product Manual"}
        },
        {
            "filePath": "/home/user/docs/faq.docx",
            "metadata": {"author": "John Doe", "category": "FAQ"}
        }
    ]
})
Add an OSS file
If the file already exists in OSS, specify its ossKey directly.
resp = client.add_documents({
    "knowledgeBaseName": "product_docs_kb",
    "documents": [
        {
            "ossKey": "oss://example-bucket/docs/product_manual.pdf",
            "metadata": {"author": "Jane Doe"}
        }
    ]
})
Batch import from an OSS directory
Specify an OSS directory path. The system recursively scans all files within the directory. You can use inclusionFilters and exclusionFilters to filter files based on name patterns.
resp = client.add_documents({
    "knowledgeBaseName": "product_docs_kb",
    "documents": [
        {
            "ossKey": "oss://example-bucket/docs/",
            "inclusionFilters": ["*.pdf", "*.docx"],
            "exclusionFilters": ["*draft*"]
        }
    ]
})
Response
Response fields
Field | Type | Description |
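documentDetails | list<object> | The processing result for each document. |
docId | string | The document ID. |
ossKey | string | The OSS path of the document. |
status | string | The result of the import request for the document: succeed or failed. |
failureReason | string | The reason for the failure. This field is present only if the status is failed. |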
Response examples
{
  "code": "SUCCESS",
  "data": {
    "documentDetails": [
      {"docId": "fc6ed97f-...", "status": "succeed", "ossKey": "oss://example-bucket/docs/product_manual.pdf"},
      {"docId": "940f2c5c-...", "status": "succeed", "ossKey": "oss://example-bucket/docs/faq.docx"}
    ]
  },
  "message": "succeed"
}
Example of a partial failure response (the HTTP status code is still 200 and the code is still SUCCESS):
{
  "code": "SUCCESS",
  "data": {
    "documentDetails": [
      {"status": "failed", "failureReason": "Metadata field 'date' date string format is not supported", "ossKey": "oss://..."},
      {"status": "succeed", "ossKey": "oss://...", "docId": "940f2c5c-..."}
    ]
  },
  "message": "succeed"
}
Usage notes
A 200 OK HTTP response with code: SUCCESS does not guarantee that all documents were processed successfully. You must check the status field for each document in the documentDetails array.
status: "succeed" indicates that the upload task is received, not that indexing is complete. The document can be retrieved only after the document status changes to Completed.
If subspace is enabled for the knowledge base, you must pass the subspace parameter. Otherwise, an INVALID_PARAMETER error is returned.
You must use a supported metadata date format, such as yyyy-MM-dd HH:mm:ss, because an unsupported format will result in a failed document status.
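The per-document check and the date format can be handled with two small helpers. The following is a minimal sketch that assumes the response is a dictionary shaped like the response examples above. The names split_results and format_metadata_date are hypothetical and are not part of the SDK.

```python
from datetime import datetime


def split_results(resp):
    """Hypothetical helper: separate accepted documents from rejected ones."""
    details = resp.get("data", {}).get("documentDetails", [])
    succeeded = [d for d in details if d.get("status") == "succeed"]
    failed = [d for d in details if d.get("status") != "succeed"]
    return succeeded, failed


def format_metadata_date(value: datetime) -> str:
    """Render a datetime in the yyyy-MM-dd HH:mm:ss format named above."""
    return value.strftime("%Y-%m-%d %H:%M:%S")


# Response shaped like the partial failure example in this section.
resp = {
    "code": "SUCCESS",
    "data": {
        "documentDetails": [
            {"status": "failed", "failureReason": "Metadata field 'date' date string format is not supported", "ossKey": "oss://..."},
            {"status": "succeed", "ossKey": "oss://...", "docId": "940f2c5c-..."},
        ]
    },
    "message": "succeed",
}
succeeded, failed = split_results(resp)
for detail in failed:
    print(f"Rejected {detail['ossKey']}: {detail.get('failureReason')}")
```

Documents in the succeeded list have only been accepted for processing, so their docId values still need to be polled until the status is Completed.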
Check indexing status
Document upload is an asynchronous process. After a document is uploaded, it must be processed before it can be searched. We recommend using a polling strategy with exponential backoff to check if indexing is complete.
import time

def wait_for_document(client, kb_name, doc_id, max_interval=30):
    """Polls the document status with exponential backoff until indexing is complete."""
    interval = 3
    while True:
        resp = client.get_document({
            "knowledgeBaseName": kb_name,
            "docId": doc_id
        })
        status = resp["data"][0]["status"]
        if status == "Completed":
            print(f"Indexing complete. Number of chunks: {resp['data'][0].get('chunkNum', 'N/A')}")
            return resp
        elif status == "Failed":
            raise Exception(f"Indexing failed: {resp['data'][0].get('failedDetails')}")
        print(f"Current status: {status}, retrying in {interval}s...")
        time.sleep(interval)
        interval = min(interval * 2, max_interval)
The processing time depends on the size, type, and number of files. Small files are typically processed in a few seconds, while large files or batch imports may take several minutes.
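To make the backoff schedule concrete, the following sketch generates the same sequence of wait times that wait_for_document uses (start at 3 seconds, double each time, cap at max_interval). The names backoff_intervals and total_wait are hypothetical helpers and are not part of the SDK.

```python
from itertools import islice


def backoff_intervals(initial=3, factor=2, max_interval=30):
    """Yield wait times in seconds: initial, then multiplied by factor, capped at max_interval."""
    interval = initial
    while True:
        yield interval
        interval = min(interval * factor, max_interval)


def total_wait(max_polls, **kwargs):
    """Return the total time slept if polling stops after max_polls attempts."""
    return sum(islice(backoff_intervals(**kwargs), max_polls))


print(list(islice(backoff_intervals(), 6)))
print(total_wait(6))
```

Because wait_for_document loops until the status is Completed or Failed, one possible design choice is to bound the number of polls with a schedule like this and raise a timeout error once the budget is spent.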
Query a document
Call the get_document method to retrieve details for a specific document, including its processing status, number of chunks, and metadata.
Request parameters
Parameter | Type | Description |
knowledgeBaseName | string | The name of the knowledge base. Required. |
subspace | string | The name of the subspace. Required if subspaces are enabled for the knowledge base. |
docId | string | The document ID. You must specify either this parameter or ossKey. |
ossKey | string | The OSS file path. You must specify either this parameter or docId. |
Code example
resp = client.get_document({
    "knowledgeBaseName": "product_docs_kb",
    "docId": "fc6ed97f-..."
})
doc = resp["data"][0]
print(f"Status: {doc['status']}, Number of chunks: {doc.get('chunkNum', 'N/A')}")
Response
Field | Type | Description |
docId | string | The document ID. |
ossKey | string | The OSS path. |
subspace | string | The subspace. |
chunkNum | int | The number of chunks. |
status | string | The document status, for example Pending, Indexing, Completed, or Failed. |
| int | The creation timestamp. |
| int | The update timestamp. |
| string | The eTag of the document. |
failedDetails | string | The reason for the failure. This field is present only if the status is Failed. |
metadata | object | The document metadata. |
Usage notes
If the same ossKey is created, deleted, and then created again, get_document may return multiple records, including historical records. To identify the valid document, check the status field and use the record with a Completed status.
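Selecting the valid record can be done with a short helper. The following is a minimal sketch, assuming resp["data"] is a list of record dictionaries as in the code example above. The name pick_valid_record is hypothetical and is not part of the SDK.

```python
def pick_valid_record(records):
    """Hypothetical helper: return the first record with a Completed status, or None."""
    for record in records:
        if record.get("status") == "Completed":
            return record
    return None


# Illustrative records only; the field values are made up for this sketch.
records = [
    {"docId": "old-record", "status": "Failed"},
    {"docId": "current-record", "status": "Completed", "chunkNum": 12},
]
valid = pick_valid_record(records)
print(valid["docId"] if valid else "No completed record yet")
```

With a real response, the call would be pick_valid_record(resp["data"]) instead of indexing resp["data"][0] directly.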
List documents
Call the list_documents method to retrieve a paginated list of documents in a knowledge base.
Request parameters
Parameter | Type | Description |
knowledgeBaseName | string | The name of the knowledge base. Required. |
subspace | list<string> | A list of subspace names. You can specify up to 10 subspaces. Required if subspaces are enabled for the knowledge base. |
maxResults | int | The number of results to return. The default value is 10, and the maximum is 1000. |
| string | The pagination token for retrieving the next page of results. |
Code example
resp = client.list_documents({
    "knowledgeBaseName": "product_docs_kb",
    "maxResults": 20
})
for doc in resp["data"]["documentDetails"]:
    print(f"[{doc['status']}] {doc['ossKey']} (Number of chunks: {doc.get('chunkNum', '-')})")
Usage notes
The subspace parameter supports a list of up to 10 values. If you exceed this limit, an error is returned.
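To read every page, the pagination token from one response is passed into the next request. The following is a sketch under a stated assumption: this section does not show the name of the token field, so nextToken is a placeholder for both the request parameter and the response field, and it is exposed as the token_field argument so it can be changed. The name iter_documents is a hypothetical helper and is not part of the SDK.

```python
def iter_documents(client, kb_name, page_size=100, token_field="nextToken"):
    """Hypothetical helper: yield documents from all pages of list_documents.

    token_field is an ASSUMED field name; replace it with the name
    used by the actual API.
    """
    request = {"knowledgeBaseName": kb_name, "maxResults": page_size}
    while True:
        resp = client.list_documents(request)
        data = resp["data"]
        yield from data.get("documentDetails", [])
        token = data.get(token_field)
        if not token:
            break  # no token means this was the last page
        request[token_field] = token


class FakeClient:
    """In-memory stand-in for the real client, used only to demonstrate the loop."""

    def __init__(self):
        self.pages = {None: (["a.pdf", "b.pdf"], "t1"), "t1": (["c.pdf"], None)}

    def list_documents(self, request):
        names, next_token = self.pages[request.get("nextToken")]
        data = {"documentDetails": [{"ossKey": name} for name in names]}
        if next_token:
            data["nextToken"] = next_token
        return {"data": data}


for doc in iter_documents(FakeClient(), "product_docs_kb"):
    print(doc["ossKey"])
```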
Update document metadata
Call the update_document method to update the metadata of a specific document.
You can only update the metadata for documents that are in the Completed status. Calling this method for documents in any other status returns an error.
Request parameters
Parameter | Type | Description |
knowledgeBaseName | string | The name of the knowledge base. Required. |
subspace | string | The name of the subspace. Required if subspaces are enabled for the knowledge base. |
ossKey | string | The OSS path of the document. You must specify either this parameter or docId. |
docId | string | The document ID. You must specify either this parameter or ossKey. |
metadata | map | The new metadata. Required. |
Code example
resp = client.update_document({
    "knowledgeBaseName": "product_docs_kb",
    "docId": "fc6ed97f-...",
    "metadata": {"author": "Jane Doe", "category": "Technical Docs", "version": 2}
})
print(f"Update status: {resp['data']['updateStatus']}")  # UPDATED or NO_OP
Response
Field | Type | Description |
docId | string | The document ID. |
ossKey | string | The OSS path. |
| long | The update timestamp. |
updateStatus | string | The result of the update: UPDATED or NO_OP. |
Usage notes
Metadata updates are overwrite operations. The new metadata you provide completely replaces the existing metadata. If you only want to update a single field, you must include all other existing fields in the request.
Passing "metadata": null will clear all metadata.
If the metadata field is not specified, the original value is retained.
Limitations: The total size of the metadata (keys and values) cannot exceed 4 KB. The maximum number of fields is 200.
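Because an update replaces the whole metadata object, changing one field safely means merging the change into the existing metadata first. The following is a minimal sketch. The names merged_metadata and check_metadata_limits are hypothetical helpers, and the size calculation (UTF-8 bytes of each key plus the string form of each value) is an assumption, since this section does not say how the 4 KB total is measured.

```python
def merged_metadata(existing, changes):
    """Hypothetical helper: return a full metadata object with changes applied."""
    merged = dict(existing or {})
    merged.update(changes)
    return merged


def check_metadata_limits(metadata, max_fields=200, max_bytes=4 * 1024):
    """Raise ValueError if the metadata exceeds the documented limits.

    ASSUMPTION: size is approximated as the UTF-8 length of each key
    plus the UTF-8 length of str(value).
    """
    if len(metadata) > max_fields:
        raise ValueError(f"metadata has {len(metadata)} fields; the maximum is {max_fields}")
    size = sum(
        len(str(key).encode("utf-8")) + len(str(value).encode("utf-8"))
        for key, value in metadata.items()
    )
    if size > max_bytes:
        raise ValueError(f"metadata is about {size} bytes; the maximum is {max_bytes}")
    return size


existing = {"author": "Jane Doe", "category": "Product Manual"}
new_metadata = merged_metadata(existing, {"category": "Technical Docs"})
check_metadata_limits(new_metadata)
print(new_metadata)
```

In practice, the existing value would come from the metadata field of a get_document record, and new_metadata would be passed as the metadata parameter of update_document.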
Delete documents
Call the delete_documents method to delete specified documents and all their associated chunks.
Request parameters
Parameter | Type | Description |
knowledgeBaseName | string | The name of the knowledge base. Required. |
subspace | string | The name of the subspace. Required if subspaces are enabled for the knowledge base. |
documents | list<object> | A list of documents to delete. Required. |
docId | string | The document ID. You must specify either this parameter or ossKey. |
ossKey | string | The OSS path. You must specify either this parameter or docId. |
Code example
resp = client.delete_documents({
    "knowledgeBaseName": "product_docs_kb",
    "documents": [
        {"docId": "fc6ed97f-..."},
        {"ossKey": "oss://example-bucket/docs/faq.docx"}
    ]
})
# Check the deletion result for each document
for detail in resp["data"]["documentDetails"]:
    print(f"{detail['ossKey']}: {detail['status']}")
Usage notes
As with AddDocuments, you must check the status of each document in documentDetails individually to confirm the deletion result.
Related documents
To view and manage document chunks, see Chunk management.