Tablestore: Document management

Last Updated: May 13, 2026

Use the following API operations to manage documents in a knowledge base: import documents, query status, list documents, update metadata, and delete documents.

Supported document formats

  • PDF: .pdf

  • Word: .doc, .docx

  • Excel: .xls, .xlsx

  • PowerPoint: .ppt, .pptx

  • Plain text: .txt

  • Markdown: .md
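
A pre-upload check against this list can catch unsupported files before a request is sent. The following is a minimal sketch: `SUPPORTED_EXTENSIONS` and `is_supported_format` are illustrative names, not part of the SDK, and the case-insensitive comparison is an assumption about how the service treats extensions.

```python
from pathlib import PurePosixPath

# Extensions taken from the supported-formats list above.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx", ".txt", ".md",
}


def is_supported_format(path: str) -> bool:
    """Return True if the path ends in one of the listed extensions.

    Hypothetical helper; comparing case-insensitively is an assumption.
    """
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```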

Document status lifecycle

After a document is uploaded, it transitions through the following statuses before it becomes searchable:

| Status | Description | Available actions |
| --- | --- | --- |
| Pending | The task is queued for processing. | Query status, Delete |
| Indexing | The system is parsing, chunking, and vectorizing the document. | Query status, Delete |
| Completed | Indexing is complete. The document is now searchable. | Search, Update metadata, Delete, View chunks |
| Failed | Indexing failed. | View failure reason, Delete, Re-upload |
| Deleting | The system is deleting the document and its associated chunks. | Wait for the deletion to complete. |

Note

A document cannot be searched while its status is Pending or Indexing. You must wait for the status to change to Completed before the document becomes searchable.
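
Client code that gates operations on the document status could encode the lifecycle as small predicates. This is a sketch only: the function names are hypothetical, and the sets simply restate which statuses the lifecycle description associates with searching, metadata updates, and deletion.

```python
# Statuses in which each operation is listed as available.
SEARCHABLE_STATUSES = {"Completed"}
METADATA_UPDATABLE_STATUSES = {"Completed"}
DELETABLE_STATUSES = {"Pending", "Indexing", "Completed", "Failed"}


def can_search(status: str) -> bool:
    """Only Completed documents are searchable."""
    return status in SEARCHABLE_STATUSES


def can_update_metadata(status: str) -> bool:
    """Metadata updates are listed only for Completed documents."""
    return status in METADATA_UPDATABLE_STATUSES


def can_delete(status: str) -> bool:
    """Every status except Deleting lists Delete as an action."""
    return status in DELETABLE_STATUSES
```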

Add documents

Import a document into the knowledge base. The system automatically parses and chunks the document, generates embeddings, and builds the index. Uploading a document with an ossKey that already exists overwrites the existing document.

The SDK supports three ways to import documents, using two methods:

| Import method | SDK method | Description |
| --- | --- | --- |
| Upload local file | upload_documents() | Specify the path to a local file. The SDK automatically uploads it to OSS and then adds it to the knowledge base. |
| Add OSS file | add_documents() | Specify the path to an existing OSS file. |
| Batch import from OSS directory | add_documents() | Specify an OSS directory path. The system recursively scans and adds all files in the directory. |

Request parameters

| Parameter | Type | Description |
| --- | --- | --- |
| knowledgeBaseName | string | The name of the knowledge base. Required. |
| subspace | string | The name of the subspace. The maximum length is 128 characters. Required if subspaces are enabled for the knowledge base. |
| documents | list<object> | A list of documents. Required. A single request can include up to 10 documents, and each file must not exceed 50 MB. |
| documents[].filePath | string | The local file path. Required when using upload_documents. |
| documents[].ossKey | string | The path to an OSS file or directory. The length must be between 1 and 256 characters. Required when using add_documents. |
| documents[].metadata | object | The document metadata. It must conform to the metadata schema defined for the knowledge base. |
| documents[].inclusionFilters | list<string> | Inclusion filters applied when scanning an OSS directory. The * wildcard is supported at the beginning and end of a pattern (for example, *.pdf). |
| documents[].exclusionFilters | list<string> | Exclusion filters applied when scanning an OSS directory. The * wildcard is supported at the beginning and end of a pattern (for example, *draft*). |

Note

To request an increase in the limits on the number of documents per request or the file size, submit a ticket or contact technical support by joining the Tablestore technical exchange group (36165029092).
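
Because a single request accepts at most 10 documents, larger imports have to be split on the client side. The sketch below shows one way to do that and to pre-check the ossKey length limit. `batch_documents` and `is_valid_oss_key` are hypothetical helpers, not SDK functions.

```python
from typing import Iterator

MAX_DOCS_PER_REQUEST = 10   # per-request limit from the parameter table
MAX_OSS_KEY_LENGTH = 256    # ossKey length limit from the parameter table


def batch_documents(documents: list[dict],
                    size: int = MAX_DOCS_PER_REQUEST) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` document entries."""
    for start in range(0, len(documents), size):
        yield documents[start:start + size]


def is_valid_oss_key(oss_key: str) -> bool:
    """Check the documented 1 to 256 character length range."""
    return 1 <= len(oss_key) <= MAX_OSS_KEY_LENGTH
```

Each yielded batch can then be passed as the documents value of a separate add_documents or upload_documents request.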

Code examples

Upload a local file

Specify the path to a local file. The SDK automatically handles the two-step process: uploading the file to OSS and then adding it to the knowledge base.

Note

When you use upload_documents, you must provide both oss_endpoint and oss_bucket_name when initializing the AgentStorageClient. Otherwise, a ValueError is raised.

resp = client.upload_documents({
    "knowledgeBaseName": "product_docs_kb",
    "documents": [
        {
            "filePath": "/home/user/docs/product_manual.pdf",
            "metadata": {"author": "Jane Doe", "category": "Product Manual"}
        },
        {
            "filePath": "/home/user/docs/faq.docx",
            "metadata": {"author": "John Doe", "category": "FAQ"}
        }
    ]
})

Add an OSS file

If the file already exists in OSS, specify its ossKey directly.

resp = client.add_documents({
    "knowledgeBaseName": "product_docs_kb",
    "documents": [
        {
            "ossKey": "oss://example-bucket/docs/product_manual.pdf",
            "metadata": {"author": "Jane Doe"}
        }
    ]
})

Batch import from an OSS directory

Specify an OSS directory path. The system recursively scans all files within the directory. You can use inclusionFilters and exclusionFilters to filter files based on name patterns.

resp = client.add_documents({
    "knowledgeBaseName": "product_docs_kb",
    "documents": [
        {
            "ossKey": "oss://example-bucket/docs/",
            "inclusionFilters": ["*.pdf", "*.docx"],
            "exclusionFilters": ["*draft*"]
        }
    ]
})
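
To preview locally which keys a pair of filters might select, the patterns can be approximated with Python's fnmatch module. This is only an approximation under several assumptions that the service does not confirm here: patterns are matched against the file name rather than the full key, matching is case-sensitive, and an exclusion match overrides an inclusion match. Note also that fnmatch accepts `?` and `[...]`, which the service may not. `preview_selection` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from posixpath import basename


def preview_selection(keys, inclusion_filters=None, exclusion_filters=None):
    """Return the keys that pass the filters under the stated assumptions."""
    inclusion_filters = inclusion_filters or []
    exclusion_filters = exclusion_filters or []
    selected = []
    for key in keys:
        name = basename(key)  # ASSUMPTION: filters apply to the file name
        # No inclusion filters means every file is a candidate.
        included = (not inclusion_filters
                    or any(fnmatchcase(name, p) for p in inclusion_filters))
        # ASSUMPTION: an exclusion match wins over an inclusion match.
        excluded = any(fnmatchcase(name, p) for p in exclusion_filters)
        if included and not excluded:
            selected.append(key)
    return selected
```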

Response

Response fields

| Field | Type | Description |
| --- | --- | --- |
| documentDetails | list<object> | The processing result for each document. |
| documentDetails[].docId | string | The document ID. |
| documentDetails[].ossKey | string | The OSS path of the document. |
| documentDetails[].status | string | The result for this document: succeed or failed. |
| documentDetails[].failureReason | string | The reason for the failure. This field is present only if the status is failed. |

Response examples

{
  "code": "SUCCESS",
  "data": {
    "documentDetails": [
      {"docId": "fc6ed97f-...", "status": "succeed", "ossKey": "oss://example-bucket/docs/product_manual.pdf"},
      {"docId": "940f2c5c-...", "status": "succeed", "ossKey": "oss://example-bucket/docs/faq.docx"}
    ]
  },
  "message": "succeed"
}

Example of a partial failure response (the HTTP status code is still 200 and the code is still SUCCESS):

{
  "code": "SUCCESS",
  "data": {
    "documentDetails": [
      {"status": "failed", "failureReason": "Metadata field 'date' date string format is not supported", "ossKey": "oss://..."},
      {"status": "succeed", "ossKey": "oss://...", "docId": "940f2c5c-..."}
    ]
  },
  "message": "succeed"
}

Usage notes

  • A 200 OK HTTP response with code: SUCCESS does not guarantee that every document was accepted. Check the status field of each entry in the documentDetails array.

  • status: "succeed" means that the import task was accepted, not that indexing is complete. The document becomes searchable only after its document status changes to Completed.

  • If subspaces are enabled for the knowledge base, you must pass the subspace parameter. Otherwise, an INVALID_PARAMETER error is returned.

  • Date values in metadata must use a supported format, such as yyyy-MM-dd HH:mm:ss. A date in an unsupported format causes that document's status to be failed.
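
Since a successful response can still contain failed entries, callers typically separate the two groups before continuing. A minimal sketch, assuming the response shape shown in the examples above; `split_results` is a hypothetical helper, not an SDK method.

```python
def split_results(resp: dict) -> tuple[list[dict], list[dict]]:
    """Split documentDetails into (succeeded, failed) entries."""
    details = resp.get("data", {}).get("documentDetails", [])
    succeeded = [d for d in details if d.get("status") == "succeed"]
    # Anything that is not explicitly "succeed" is treated as a failure.
    failed = [d for d in details if d.get("status") != "succeed"]
    return succeeded, failed
```

The docId values of the succeeded entries are the ones worth polling for indexing status; the failureReason of each failed entry explains what to fix before re-submitting.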

Check indexing status

Document upload is an asynchronous process. After a document is uploaded, it must be processed before it can be searched. We recommend using a polling strategy with exponential backoff to check if indexing is complete.

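import time

def wait_for_document(client, kb_name, doc_id, max_interval=30, timeout=600):
    """Poll the document status with exponential backoff until indexing completes.

    Raises RuntimeError if indexing fails, or TimeoutError once `timeout`
    seconds would be exceeded (a client-side safeguard so the loop cannot run forever).
    """
    interval = 3  # initial delay in seconds; doubles each round up to max_interval
    deadline = time.monotonic() + timeout
    while True:
        resp = client.get_document({
            "knowledgeBaseName": kb_name,
            "docId": doc_id
        })
        doc = resp["data"][0]
        status = doc["status"]
        if status == "Completed":
            print(f"Indexing complete. Number of chunks: {doc.get('chunkNum', 'N/A')}")
            return resp
        if status == "Failed":
            raise RuntimeError(f"Indexing failed: {doc.get('failedDetails')}")
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"Document {doc_id} is still {status} after {timeout}s")
        print(f"Current status: {status}, retrying in {interval}s...")
        time.sleep(interval)
        interval = min(interval * 2, max_interval)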

The processing time depends on the size, type, and number of files. Small files are typically processed in a few seconds, while large files or batch imports may take several minutes.
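
To make the backoff schedule concrete, the delays used by a loop that starts at 3 seconds, doubles each round, and caps at 30 seconds can be generated on their own. `backoff_intervals` is an illustrative helper, not part of the SDK.

```python
from itertools import islice


def backoff_intervals(initial=3, max_interval=30):
    """Yield an endless sequence of delays: doubling, capped at max_interval."""
    interval = initial
    while True:
        yield interval
        interval = min(interval * 2, max_interval)


# First six delays in seconds.
print(list(islice(backoff_intervals(), 6)))  # [3, 6, 12, 24, 30, 30]
```

With these defaults the first six polls span 105 seconds in total, which covers the "few seconds" case quickly while keeping the request rate low for documents that take minutes.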

Query a document

Call the get_document method to retrieve details for a specific document, including its processing status, number of chunks, and metadata.

Request parameters

| Parameter | Type | Description |
| --- | --- | --- |
| knowledgeBaseName | string | The name of the knowledge base. Required. |
| subspace | string | The name of the subspace. Required if subspaces are enabled for the knowledge base. |
| docId | string | The document ID. You must specify either this parameter or ossKey. |
| ossKey | string | The OSS file path. You must specify either this parameter or docId. |

Code example

resp = client.get_document({
    "knowledgeBaseName": "product_docs_kb",
    "docId": "fc6ed97f-..."
})

doc = resp["data"][0]
print(f"Status: {doc['status']}, Number of chunks: {doc.get('chunkNum', 'N/A')}")

Response

| Field | Type | Description |
| --- | --- | --- |
| docId | string | The document ID. |
| ossKey | string | The OSS path. |
| subspace | string | The subspace. |
| chunkNum | int | The number of chunks. |
| status | string | The document status: Pending, Indexing, Completed, Failed, or Deleting. |
| createdAt | int | The creation timestamp. |
| updatedAt | int | The update timestamp. |
| eTag | string | The eTag of the document. |
| failedDetails | string | The reason for the failure. This field is present only if the status is Failed. |
| metadata | object | The document metadata. |

Usage notes

If the same ossKey is created, deleted, and then created again, get_document may return multiple records, including historical records. To identify the valid document, check the status field and use the record with a Completed status.
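
Selecting the valid record from a multi-record result can be sketched as follows. `pick_completed_record` is a hypothetical helper, and breaking ties by the largest updatedAt when more than one record is Completed is an assumption, not documented behavior.

```python
from typing import Optional


def pick_completed_record(records: list[dict]) -> Optional[dict]:
    """Return the Completed record from get_document data, or None."""
    completed = [r for r in records if r.get("status") == "Completed"]
    if not completed:
        return None
    # ASSUMPTION: if several records are Completed, prefer the most recently updated.
    return max(completed, key=lambda r: r.get("updatedAt", 0))
```

It would be called as `pick_completed_record(resp["data"])` on a get_document response.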

List documents

Call the list_documents method to retrieve a paginated list of documents in a knowledge base.

Request parameters

| Parameter | Type | Description |
| --- | --- | --- |
| knowledgeBaseName | string | The name of the knowledge base. Required. |
| subspace | list<string> | A list of subspace names. You can specify up to 10 subspaces. Required if subspaces are enabled for the knowledge base. |
| maxResults | int | The number of results to return. The default value is 10, and the maximum is 1000. |
| nextToken | string | The pagination token for retrieving the next page of results. |

Code example

resp = client.list_documents({
    "knowledgeBaseName": "product_docs_kb",
    "maxResults": 20
})

for doc in resp["data"]["documentDetails"]:
    print(f"[{doc['status']}] {doc['ossKey']} (Number of chunks: {doc.get('chunkNum', '-')})")

Usage notes

The subspace parameter supports a list of up to 10 values. If you exceed this limit, an error is returned.
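
Walking through every page with nextToken might look like the sketch below. The request parameters come from the table above, but the location of the token in the response is an assumption: this code reads it from `resp["data"]["nextToken"]` and treats a missing or empty value as the last page. `iter_documents` is a hypothetical helper.

```python
def iter_documents(client, kb_name, page_size=100):
    """Yield every document entry by following nextToken across pages."""
    request = {"knowledgeBaseName": kb_name, "maxResults": page_size}
    while True:
        resp = client.list_documents(request)
        data = resp["data"]
        yield from data.get("documentDetails", [])
        # ASSUMPTION: the next-page token is returned as data["nextToken"]
        # and is absent or empty on the final page.
        token = data.get("nextToken")
        if not token:
            break
        request = {**request, "nextToken": token}
```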

Update document metadata

Call the update_document method to update the metadata of a specific document.

Note

You can only update the metadata for documents that are in the Completed status. Calling this method for documents in any other status returns an error.

Request parameters

| Parameter | Type | Description |
| --- | --- | --- |
| knowledgeBaseName | string | The name of the knowledge base. Required. |
| subspace | string | The name of the subspace. Required if subspaces are enabled for the knowledge base. |
| ossKey | string | The OSS path of the document. You must specify either this parameter or docId. |
| docId | string | The document ID. You must specify either this parameter or ossKey. |
| metadata | map | The new metadata. Required. |

Code example

resp = client.update_document({
    "knowledgeBaseName": "product_docs_kb",
    "docId": "fc6ed97f-...",
    "metadata": {"author": "Jane Doe", "category": "Technical Docs", "version": 2}
})

print(f"Update status: {resp['data']['updateStatus']}")  # UPDATED or NO_OP

Response

| Field | Type | Description |
| --- | --- | --- |
| docId | string | The document ID. |
| ossKey | string | The OSS path. |
| updatedAt | long | The update timestamp. |
| updateStatus | string | The result of the update: NO_OP or UPDATED. |

Usage notes

  • Metadata updates are overwrite operations. The new metadata you provide completely replaces the existing metadata. If you only want to update a single field, you must include all other existing fields in the request.

  • Passing "metadata": null clears all metadata.

  • If the metadata field is not specified, the original value is retained.

  • Limits: The total size of the metadata (keys and values) cannot exceed 4 KB, and the metadata can contain at most 200 fields.
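
Because an update replaces the whole metadata object, changing one field means sending the existing fields along with it. The sketch below merges the changes into the current metadata and checks the documented limits before sending. `merged_metadata` is a hypothetical helper, and the way it measures size (UTF-8 bytes of each key plus the JSON encoding of each value) is an assumption about how the 4 KB limit is counted.

```python
import json

MAX_METADATA_BYTES = 4 * 1024   # documented 4 KB limit
MAX_METADATA_FIELDS = 200       # documented field-count limit


def merged_metadata(existing: dict, changes: dict) -> dict:
    """Return existing metadata overlaid with changes, validated against limits."""
    merged = {**existing, **changes}
    if len(merged) > MAX_METADATA_FIELDS:
        raise ValueError(f"metadata has {len(merged)} fields (limit {MAX_METADATA_FIELDS})")
    # ASSUMPTION: size is counted as UTF-8 bytes of keys plus JSON-encoded values.
    size = sum(
        len(str(key).encode("utf-8"))
        + len(json.dumps(value, ensure_ascii=False).encode("utf-8"))
        for key, value in merged.items()
    )
    if size > MAX_METADATA_BYTES:
        raise ValueError(f"metadata is about {size} bytes (limit {MAX_METADATA_BYTES})")
    return merged
```

A typical flow would read the current metadata with get_document, pass it through this helper together with the changed fields, and send the result as the metadata value of update_document.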

Delete documents

Call the delete_documents method to delete specified documents and all their associated chunks.

Request parameters

| Parameter | Type | Description |
| --- | --- | --- |
| knowledgeBaseName | string | The name of the knowledge base. Required. |
| subspace | string | The name of the subspace. Required if subspaces are enabled for the knowledge base. |
| documents | list<object> | A list of documents to delete. Required. |
| documents[].docId | string | The document ID. You must specify either this parameter or ossKey. |
| documents[].ossKey | string | The OSS path. You must specify either this parameter or docId. |

Code example

resp = client.delete_documents({
    "knowledgeBaseName": "product_docs_kb",
    "documents": [
        {"docId": "fc6ed97f-..."},
        {"ossKey": "oss://example-bucket/docs/faq.docx"}
    ]
})

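# Check the deletion result for each document.
# Assumption: an entry may omit ossKey (the add_documents failure example omits docId),
# so fall back to docId instead of indexing directly.
for detail in resp["data"]["documentDetails"]:
    print(f"{detail.get('ossKey') or detail.get('docId')}: {detail['status']}")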

Usage notes

As with add_documents, a successful response does not mean that every deletion succeeded. Check the status of each document in documentDetails individually.
