The document format conversion feature of Intelligent Media Management (IMM) enables you to process various document types. You can convert documents to your desired output format and save the converted documents to a specified Object Storage Service (OSS) path.
Scenarios
Online preview optimization: Users upload documents in various formats, such as PDF, Word, Excel, and PPT, to OSS. To allow users to preview these documents directly in web or mobile applications without downloading them, you can call the document conversion API provided by the IMM service. This converts the documents into images suitable for online display.
Cross-platform compatibility: Different devices and operating systems support different file formats. The document conversion service ensures that all users can view documents smoothly, regardless of the device they use.
Billing
Using the document format conversion service incurs the following fees:
Document processing fees: These fees are charged based on the number of calls. For more information, see Document processing.
Traffic fees: You are charged for outbound Internet traffic based on the size of the processed files. For more information, see Traffic fees.
Features
Supported conversion types
The following table describes the supported conversion types for document format conversion.
Input document type | Output document type | Description |
Word, Excel, PPT | PDF | Generates a PDF file that consists of images. |
Word, Excel, PPT, PDF | PNG, JPEG | None |
Word, Excel, PPT | TXT | None |
JPEG | None |
Supported input file types
File type | File extension |
Word | doc, docx, wps, wpss, docm, dotm, dot, dotx, html |
PPT | pptx, ppt, pot, potx, pps, ppsx, dps, dpt, pptm, potm, ppsm, dpss |
Excel | xls, xlt, et, ett, xlsx, xltx, csv, xlsb, xlsm, xltm, ets |
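A client can check a file extension against the table above before it submits a conversion request. The following Python sketch is illustrative only. The `classify_input` helper and the dictionary layout are hypothetical and are not part of the IMM API. It covers only the extensions listed in this table, so a PDF input, which appears in the conversion types table but not here, returns None.

```python
import os

# Extensions copied from the table above; the dictionary layout is an illustrative choice.
SUPPORTED_INPUT_EXTENSIONS = {
    "Word": {"doc", "docx", "wps", "wpss", "docm", "dotm", "dot", "dotx", "html"},
    "PPT": {"pptx", "ppt", "pot", "potx", "pps", "ppsx", "dps", "dpt",
            "pptm", "potm", "ppsm", "dpss"},
    "Excel": {"xls", "xlt", "et", "ett", "xlsx", "xltx", "csv", "xlsb",
              "xlsm", "xltm", "ets"},
}


def classify_input(path):
    """Return 'Word', 'PPT', or 'Excel' for a listed extension, otherwise None."""
    # Take the text after the last dot and compare it case-insensitively.
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    for file_type, extensions in SUPPORTED_INPUT_EXTENSIONS.items():
        if ext in extensions:
            return file_type
    return None
```

For example, `classify_input("oss://test-bucket/report.DOCX")` returns `"Word"` because the comparison ignores case.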
Notes
The output content varies based on the output document format that you specify in the request. For examples of output content, see Output content examples.
For JPEG and PNG output formats, the output content differs based on whether the input document is a spreadsheet (Excel).
If the input document is a spreadsheet (Excel), a folder is created for each sheet. Then, multiple files are generated based on the preview page size.
If the input document is not a spreadsheet, such as a Word or PPT document, one file is generated for each page of the document.
For PDF and TXT output formats, only one file is generated, regardless of whether the input document is a spreadsheet.
You can use the TargetURI parameter to specify the output path. This parameter supports variable rendering. For more information, see TargetURI template. You can also use the TargetURIPrefix parameter to specify the prefix of the output path. For information about the default output path, see Output content examples.
You can convert DOC or DOCX files only to the PDF, image, or TXT format. Conversion between the DOC and DOCX formats is not supported.
Usage notes
The time required for document format conversion depends on factors such as the document size, number of pages, and word count. In typical cases, a request is completed in seconds. However, large files or files with many pages and words may take up to tens of seconds. To reduce application wait times, IMM provides the asynchronous CreateOfficeConversionTask API operation for document format conversion.
After a task starts, its information is saved for only seven days. Use the following methods to promptly retrieve task information:
Call the GetTask or ListTasks operation and use the returned TaskId to view the task information.
Activate Message Service (MNS) in the same region as IMM and configure a subscription to promptly receive task information notifications. For the asynchronous notification message format, see Asynchronous notification message format. For more information about the MNS software development kit (SDK), see Step 4: Receive and delete messages.
Activate RocketMQ in the same region as IMM. Then, create a RocketMQ 4.0 instance, topic, and group to promptly receive task information notifications. For the asynchronous notification message format, see Asynchronous notification message format. For more information about how to use RocketMQ, see Use an SDK for HTTP to send and receive normal messages.
Activate and connect to EventBridge in the same region as IMM to receive task information notifications in real time. For more information, see Intelligent Media Management (IMM) events.
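If you choose the polling method rather than notifications, the wait loop can be sketched as follows. This is a minimal Python sketch under stated assumptions, not an IMM SDK example. `get_task` is a hypothetical callable that you supply to wrap the real GetTask call and return a dictionary with a "Status" key. The "Running" status value is also an assumption. Only "Succeeded" appears in this document.

```python
import time


def wait_for_task(get_task, task_id, interval_seconds=2.0, timeout_seconds=120.0):
    """Poll get_task(task_id) until the task is no longer running.

    get_task is a hypothetical wrapper around the GetTask operation that
    returns a dict such as {"TaskId": "...", "Status": "Succeeded"}.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        task = get_task(task_id)
        # "Running" is an assumed in-progress status value.
        if task.get("Status") != "Running":
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"task {task_id} did not finish within {timeout_seconds} seconds"
            )
        time.sleep(interval_seconds)
```

Because typical requests finish in seconds and large files may take tens of seconds, a short polling interval with a timeout of a few minutes is a reasonable starting point.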
If the converted output looks less sharp than the source document, adjust the ImageDPI parameter when you call the CreateOfficeConversionTask operation. A larger ImageDPI value produces a clearer image.
Output content examples
The following examples show the output content when the output path prefix TargetURIPrefix is set to the OSS path oss://test-bucket/target/ in the request.
If the input document is a spreadsheet, the output path is in the following format.
oss://test-bucket/target/{sheetname}_{sheetindex}_{sheetsubindex}.{autoext}
If the input document is not a spreadsheet, the output path is in the following format.
oss://test-bucket/target/{index}.{autoext}
The following table describes the related parameters.
Variable | Description | Example value |
index | The output index. The value starts from 1. If the input file is a Word, PDF, or PPT file, this variable indicates the page number. If the input file is an Excel file, this variable indicates sheetindex_sheetsubindex. sheetindex: The index of the sheet. The value starts from 1. sheetsubindex: The index of the image for the current sheet. The value starts from 1. A sheet may be converted into multiple images. | 6_12 |
sheetname | If the input file is an Excel file, this variable indicates the name of the sheet. | sheet1 |
autoext | The extension of the output file. | jpg |
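The two default path patterns above can be reproduced locally, for example to predict where output files will appear. The following Python sketch only mirrors the documented patterns. The function names are hypothetical, and the service itself renders the actual paths.

```python
def page_output_path(prefix, index, autoext):
    """Build the path for a non-spreadsheet input: {prefix}{index}.{autoext}."""
    if index < 1:
        raise ValueError("index starts from 1")
    return f"{prefix}{index}.{autoext}"


def sheet_output_path(prefix, sheetname, sheetindex, sheetsubindex, autoext):
    """Build the path for a spreadsheet input:
    {prefix}{sheetname}_{sheetindex}_{sheetsubindex}.{autoext}."""
    if sheetindex < 1 or sheetsubindex < 1:
        raise ValueError("sheetindex and sheetsubindex start from 1")
    return f"{prefix}{sheetname}_{sheetindex}_{sheetsubindex}.{autoext}"


prefix = "oss://test-bucket/target/"
print(page_output_path(prefix, 3, "jpg"))
print(sheet_output_path(prefix, "sheet1", 6, 12, "jpg"))
```

With the example values from the table, the second call produces oss://test-bucket/target/sheet1_6_12.jpg.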
Asynchronous notification message format
If you set the message callback parameter in the request, the asynchronous notification contains the number of converted files (TargetFileCount) and custom information (UserData).
To use asynchronous message notifications, set the Notification message callback parameter when you initiate a request.
The parameters in the returned message are described in the following example:
{
  "ProjectName": "immtest", // The name of the project.
  "DatasetName": "", // The name of the dataset.
  "RequestId": "A1DA7436-768B-061D-833C-****", // The request ID.
  "StartTime": "2023-01-04T05:03:40.928Z", // The start time of the task.
  "EndTime": "2023-01-04T05:03:41.444Z", // The end time of the task.
  "UserData": "test", // The custom information.
  "TaskType": "OfficeConversion", // The task type.
  "TaskId": "OfficeConversion-ed315cab-7736-4ad8-8c56-****", // The ID of the conversion task.
  "Status": "Succeeded", // The status of the conversion.
  "Code": "", // An empty value indicates that the task is successful.
  "Message": "", // The error message if the task failed.
  "TargetFileCount": 5 // The number of converted files.
}
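A consumer of these notifications typically extracts the task ID, the status, and the file count. The following Python sketch shows one way to do this. It assumes that the message body arrives as a plain JSON string without the explanatory comments shown above. The messaging service that you use may wrap or encode the body, which this sketch does not handle. The `summarize_notification` helper is hypothetical.

```python
import json


def summarize_notification(raw_message):
    """Extract the fields most callers need from a notification body.

    Assumes raw_message is a JSON string with the fields shown above.
    """
    message = json.loads(raw_message)
    succeeded = message.get("Status") == "Succeeded"
    return {
        "task_id": message.get("TaskId"),
        "succeeded": succeeded,
        # Report the converted file count only for successful tasks.
        "target_file_count": message.get("TargetFileCount", 0) if succeeded else 0,
        # Code and Message carry error details when the task failed.
        "error": None if succeeded else (message.get("Code"), message.get("Message")),
        "user_data": message.get("UserData"),
    }


sample = json.dumps({
    "TaskId": "OfficeConversion-ed315cab-7736-4ad8-8c56-****",
    "Status": "Succeeded",
    "Code": "",
    "Message": "",
    "UserData": "test",
    "TargetFileCount": 5,
})
print(summarize_notification(sample))
```

Returning UserData alongside the result lets the caller match a notification to the request that set it.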