
Intelligent Media Management: ExtractDocumentText

Last Updated: Nov 20, 2025
This topic is generated by a machine translation engine without any human intervention. ALIBABA CLOUD DOES NOT GUARANTEE THE ACCURACY OF MACHINE TRANSLATED CONTENT. To request a human-translated version of this topic or provide feedback on this translation, please include it in the feedback form.

Extracts text from a document.

Operation description

  • Before you call this operation, make sure that you understand the billing methods and pricing of Intelligent Media Management (IMM).

  • Before you call this operation, make sure that a project is available in the current region. For more information, see Project Management.

  • Common Word, Excel, PowerPoint (PPT), PDF, and TXT documents are supported.

  • The source file cannot exceed 200 MB, and the extracted plain text cannot exceed 2 MB (about 600,000 Chinese characters).
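The two size limits above can be checked on the client before a call is made. The following is a minimal sketch, not part of any IMM SDK. It assumes that 1 MB means 1024 × 1024 bytes and that the 2 MB text limit is measured on the UTF-8 encoding of the text. The service may count differently. The helper names are hypothetical.

```python
# Minimal pre-flight check against the documented size limits.
# Assumption: 1 MB = 1024 * 1024 bytes, and the 2 MB text limit is measured
# on the UTF-8 encoding of the extracted text; the service may count differently.
MB = 1024 * 1024
MAX_SOURCE_BYTES = 200 * MB  # limit on the source document
MAX_TEXT_BYTES = 2 * MB      # limit on the extracted plain text


def source_within_limit(size_bytes: int) -> bool:
    """Return True if a source file of size_bytes is within 200 MB."""
    return 0 <= size_bytes <= MAX_SOURCE_BYTES


def text_within_limit(text: str) -> bool:
    """Return True if text, encoded as UTF-8, fits within 2 MB."""
    return len(text.encode("utf-8")) <= MAX_TEXT_BYTES
```

Most Chinese characters take 3 bytes in UTF-8. Under that assumption, 2 MB (2,097,152 bytes) holds at most about 699,000 of them, so the documented figure of roughly 600,000 leaves some headroom.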

Notice If the document has a complex format or contains a large amount of text, the call may time out. In this case, we recommend that you call the CreateOfficeConversionTask operation and set the output format to txt to achieve a similar result.
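The recommendation in the notice amounts to a fallback: try the synchronous extraction first, and switch to a txt conversion task when it times out. The sketch below shows only that control flow. Both callables are hypothetical wrappers that you would write around your own SDK client. Modeling the timeout as Python's built-in `TimeoutError` is an assumption, because real SDKs raise their own error types.

```python
from typing import Callable, Tuple


def extract_or_convert(
    extract_text: Callable[[], str],
    convert_to_txt: Callable[[], str],
) -> Tuple[str, str]:
    """Try ExtractDocumentText first and fall back to a txt conversion.

    Returns (operation_used, text). Both callables are hypothetical wrappers:
      - extract_text calls ExtractDocumentText and returns DocumentText;
      - convert_to_txt runs a CreateOfficeConversionTask job with txt output,
        waits for it to finish, and returns the converted text.
    Treating the timeout as TimeoutError is an assumption; map your SDK's
    actual timeout error to it inside extract_text.
    """
    try:
        return "ExtractDocumentText", extract_text()
    except TimeoutError:
        # Complex or very large documents may time out; use the
        # conversion-task path recommended in the notice instead.
        return "CreateOfficeConversionTask", convert_to_txt()
```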

Debugging

You can call this operation in OpenAPI Explorer, so you do not need to calculate signatures yourself. After a successful call, OpenAPI Explorer can automatically generate SDK sample code.

Authorization information

The following table shows the authorization information corresponding to the API. The authorization information can be used in the Action policy element to grant a RAM user or RAM role the permissions to call this API operation. Description:

  • Operation: the value that you can use in the Action element to specify the operation on a resource.
  • Access level: the access level of each operation. The levels are read, write, and list.
  • Resource type: the type of the resource on which you can authorize the RAM user or the RAM role to perform the operation. Take note of the following items:
    • Required resource types are marked with an asterisk (*).
    • If the permissions cannot be granted at the resource level, All Resources is used in the Resource type column of the operation.
  • Condition Key: the condition key that is defined by the cloud service.
  • Associated operation: other operations that the RAM user or the RAM role must have permissions to perform to complete the operation. To complete the operation, the RAM user or the RAM role must have the permissions to perform the associated operations.
Operation | Access level | Resource type | Condition key | Associated operation
imm:ExtractDocumentText | none | *Project (acs:imm:{#regionId}:{#accountId}:project/{#ProjectName}) | none | none
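As an illustration, a custom RAM policy that grants only this operation on a single project can be built from the resource pattern listed for the operation. The sketch uses the standard RAM policy structure (`Version` "1" and a `Statement` list with `Effect`, `Action`, and `Resource`). The function name and the region, account ID, and project values are placeholders, not real identifiers.

```python
import json


def build_extract_policy(region_id: str, account_id: str, project_name: str) -> dict:
    """Build a RAM policy that allows imm:ExtractDocumentText on one project.

    The Resource value follows the pattern listed for this operation:
    acs:imm:{#regionId}:{#accountId}:project/{#ProjectName}
    """
    resource = f"acs:imm:{region_id}:{account_id}:project/{project_name}"
    return {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["imm:ExtractDocumentText"],
                "Resource": [resource],
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder region, account ID, and project name; replace with your own.
    policy = build_extract_policy("cn-hangzhou", "1234567890123456", "immtest")
    print(json.dumps(policy, indent=2))
```

No associated operation is listed, so in this sketch a single `Action` entry is enough.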

Request parameters

ProjectName (string, required)

The project name. For information about how to obtain it, see Creating a Project.

Example: immtest
SourceURI (string, required)

The storage address of the source data.

The OSS address uses the format oss://${Bucket}/${Object}. ${Bucket} is the name of an OSS bucket in the same region as the current project. ${Object} is the full path of the file, including the file extension.

Notice Currently, only HTTP protocol addresses are supported.

Example: oss://test-bucket/test-object
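A SourceURI can be composed from a bucket name and an object path, following the oss://${Bucket}/${Object} rule described above. This is a minimal sketch with hypothetical helper names. It cannot verify that the bucket is in the same region as the project. It strips a leading slash because OSS object names do not start with "/".

```python
from typing import Tuple


def build_source_uri(bucket: str, object_key: str) -> str:
    """Compose a SourceURI in the oss://${Bucket}/${Object} form.

    The bucket must be in the same region as the IMM project; that cannot be
    checked locally, so it is only noted here.
    """
    key = object_key.lstrip("/")  # OSS object names do not start with "/"
    if not bucket or "/" in bucket or not key:
        raise ValueError("expected a bucket name and a non-empty object path")
    return f"oss://{bucket}/{key}"


def split_source_uri(uri: str) -> Tuple[str, str]:
    """Split an oss:// SourceURI back into (bucket, object_key)."""
    prefix = "oss://"
    if not uri.startswith(prefix):
        raise ValueError("not an oss:// address")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket or not key:
        raise ValueError("missing bucket or object path")
    return bucket, key
```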
SourceType (string, optional)

The file type (suffix) of the source data. By default, the type is determined from the suffix of the input object. If the object has no suffix, set this parameter. Valid values:

  • Word documents: doc, docx, wps, wpss, docm, dotm, dot, dotx, html
  • Presentation documents (PPT): pptx, ppt, pot, potx, pps, ppsx, dps, dpt, pptm, potm, ppsm, dpss
  • Spreadsheet documents (Excel): xls, xlt, et, ett, xlsx, xltx, csv, xlsb, xlsm, xltm, ets
  • PDF documents: pdf

Example: docx
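The rule for SourceType can be expressed as a small decision helper. If the object path has a suffix, the parameter is omitted so that the service infers the type. Otherwise, an explicit value from the list above is required. The sketch is illustrative and its names are hypothetical. Lowercasing the declared value is an assumption. TXT is mentioned in the operation description but does not appear in the SourceType list, so the sketch does not accept it as an explicit value.

```python
import posixpath
from typing import Optional

# SourceType values listed in this topic, grouped by document category.
SOURCE_TYPES = {
    "Word": {"doc", "docx", "wps", "wpss", "docm", "dotm", "dot", "dotx", "html"},
    "Presentation": {"pptx", "ppt", "pot", "potx", "pps", "ppsx", "dps", "dpt",
                     "pptm", "potm", "ppsm", "dpss"},
    "Spreadsheet": {"xls", "xlt", "et", "ett", "xlsx", "xltx", "csv", "xlsb",
                    "xlsm", "xltm", "ets"},
    "PDF": {"pdf"},
}
ALL_SOURCE_TYPES = set().union(*SOURCE_TYPES.values())


def suffix_of(object_key: str) -> str:
    """Return the lowercase suffix of an object path, without the dot."""
    return posixpath.splitext(object_key)[1].lstrip(".").lower()


def source_type_param(object_key: str, declared: Optional[str] = None) -> Optional[str]:
    """Return the SourceType value to send, or None to let the service infer it.

    - An explicitly declared type is validated against the listed values.
    - Otherwise, if the object path has a suffix, SourceType is omitted.
    - An object without a suffix needs an explicit SourceType.
    """
    if declared is not None:
        value = declared.lower()
        if value not in ALL_SOURCE_TYPES:
            raise ValueError(f"unsupported SourceType: {declared}")
        return value
    if suffix_of(object_key):
        return None
    raise ValueError("object has no suffix; set SourceType explicitly")
```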
CredentialConfig (CredentialConfig, optional)

The chained authorization configuration. Leave this parameter empty unless you have special requirements. For more information, see Using Chain Authorization to Access Other Entity Resources.
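The request parameters above can be collected in one place before they are passed to an SDK or an HTTP client. The sketch below builds only the operation-specific parameters. Common parameters such as the signature are assumed to be added by your SDK or by OpenAPI Explorer. The function name is hypothetical. CredentialConfig is omitted, as this topic recommends when there are no special requirements.

```python
from typing import Dict, Optional


def extract_document_text_params(
    project_name: str,
    source_uri: str,
    source_type: Optional[str] = None,
) -> Dict[str, str]:
    """Collect the operation-specific parameters for an ExtractDocumentText call.

    ProjectName and SourceURI are required; SourceType is sent only when set.
    CredentialConfig is left out, as recommended when there are no special
    requirements. Common parameters (such as the signature) are not built here.
    """
    if not project_name:
        raise ValueError("ProjectName is required")
    if not source_uri:
        raise ValueError("SourceURI is required")
    params = {"ProjectName": project_name, "SourceURI": source_uri}
    if source_type:
        params["SourceType"] = source_type
    return params
```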

Response parameters

The response body is an object that contains the following fields.

RequestId (string)

The request ID.

Example: 94D6F994-E298-037E-8E8B-0090F27*****
DocumentText (string)

The text extracted from the document.

Example: 测试内容。 (Chinese for "Test content.")

Examples

Sample success responses

JSON format

{
  "RequestId": "94D6F994-E298-037E-8E8B-0090F27*****",
  "DocumentText": "测试内容。"
}
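A client only needs the two documented fields from a successful response. The sketch below parses the sample success response shown above with Python's standard `json` module. The function name is hypothetical. Keeping RequestId alongside the text is a common practice for troubleshooting and is not a requirement of the API.

```python
import json
from typing import Tuple

# The sample success response from this topic.
SAMPLE_RESPONSE = """
{
  "RequestId": "94D6F994-E298-037E-8E8B-0090F27*****",
  "DocumentText": "测试内容。"
}
"""


def parse_extract_response(body: str) -> Tuple[str, str]:
    """Return (RequestId, DocumentText) from an ExtractDocumentText response body."""
    data = json.loads(body)
    return data["RequestId"], data["DocumentText"]


request_id, text = parse_extract_response(SAMPLE_RESPONSE)
```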

Error codes

For a list of error codes, see Service error codes.

Change history

Change time | Summary of changes
2023-12-13 | The request parameters of the API changed.