Intelligent Media Management: Convert document formats

Last Updated: Mar 28, 2025

The document format conversion feature of Intelligent Media Management (IMM) supports a variety of document formats. You can use it to convert a document into another format and save the converted document to a specified Object Storage Service (OSS) path, where it is available for downstream use.

Scenarios

  • Online preview optimization: Users upload documents in various formats, such as PDF, Word, Excel, and PPT, to OSS. To let users preview these documents directly in web or mobile applications, without downloading them and opening them locally, you can call the CreateOfficeConversionTask operation of IMM to convert the documents into images that are suitable for online preview.

  • Cross-platform compatibility: Different devices and operating systems support different file formats. The document conversion feature of IMM is integrated with OSS, so users can view documents smoothly regardless of the device they use.

Feature description

Mappings between the source and destination document formats

The following list describes the supported mappings between source and destination document formats in document format conversion.

  • Word, Excel, or PPT to PDF: The documents are converted into PDF files.

  • Word, Excel, PPT, or PDF to PNG or JPEG.

  • Word, Excel, PPT, or PDF to TXT.

  • JPEG to PDF.
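The mappings above can be checked before a task is submitted, so that an unsupported combination is rejected locally instead of by the service. The following is a minimal sketch; the lowercase format labels and the `is_supported` helper are hypothetical names chosen for illustration, not values or functions of the IMM API.

```python
# Minimal sketch: the source-to-destination mappings as a lookup table.
# The lowercase format labels are informal names for this example,
# not values of any IMM API parameter.
SUPPORTED_TARGETS: dict[str, frozenset[str]] = {
    "word": frozenset({"pdf", "png", "jpeg", "txt"}),
    "excel": frozenset({"pdf", "png", "jpeg", "txt"}),
    "ppt": frozenset({"pdf", "png", "jpeg", "txt"}),
    "pdf": frozenset({"png", "jpeg", "txt"}),
    "jpeg": frozenset({"pdf"}),
}


def is_supported(source: str, target: str) -> bool:
    """Return True if the mappings list a conversion from source to target."""
    return target.lower() in SUPPORTED_TARGETS.get(source.lower(), frozenset())


if __name__ == "__main__":
    print(is_supported("Excel", "PDF"))  # True
    print(is_supported("PDF", "PDF"))    # False: PDF to PDF is not listed
    print(is_supported("JPEG", "PNG"))   # False: JPEG converts only to PDF
```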

Supported source document formats and filename extensions

  • Word: doc, docx, wps, wpss, docm, dotm, dot, dotx, and html

  • PPT: pptx, ppt, pot, potx, pps, ppsx, dps, dpt, pptm, potm, ppsm, and dpss

  • Excel: xls, xlt, et, ett, xlsx, xltx, csv, xlsb, xlsm, xltm, and ets

  • PDF: pdf
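The extension lists above can be used to classify an OSS object before choosing a conversion. The sketch below assumes that the filename extension is a reliable indicator of the document format; the `source_family` helper and its family labels are hypothetical. JPEG is not included because it is not listed among the supported source extensions.

```python
from pathlib import PurePosixPath
from typing import Optional

# Extension lists copied from the supported source formats.
# The family labels ("word", "ppt", ...) are informal names for this sketch.
EXTENSIONS = {
    "word": {"doc", "docx", "wps", "wpss", "docm", "dotm", "dot", "dotx", "html"},
    "ppt": {"pptx", "ppt", "pot", "potx", "pps", "ppsx", "dps", "dpt",
            "pptm", "potm", "ppsm", "dpss"},
    "excel": {"xls", "xlt", "et", "ett", "xlsx", "xltx", "csv", "xlsb",
              "xlsm", "xltm", "ets"},
    "pdf": {"pdf"},
}

# Invert the table once so that each lookup is a single dict access.
EXT_TO_FAMILY = {ext: family for family, exts in EXTENSIONS.items() for ext in exts}


def source_family(object_key: str) -> Optional[str]:
    """Classify an OSS object key by its extension; None if it is not listed."""
    suffix = PurePosixPath(object_key).suffix.lower().lstrip(".")
    return EXT_TO_FAMILY.get(suffix)


if __name__ == "__main__":
    print(source_family("oss://test-bucket/docs/report.XLSX"))  # excel
    print(source_family("slides.pptx"))                         # ppt
    print(source_family("notes.md"))                            # None
```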

Note

  • The OSS path that is returned for a request of document format conversion varies based on the specified destination document format. For more information about the OSS path, see Sample responses.

  • If you set the destination document format to JPEG or PNG, the output depends on the source document format:

    • If the source document is an Excel document, a folder is created for each sheet in the workbook. The preview size of a sheet determines the number of images in the corresponding folder.

    • If the source document is a Word, PPT, or PDF document, one image is generated for each page of the source document.

  • If you set the destination document format to PDF or TXT, only one output document is generated regardless of the source document format.

  • You can use the TargetURI template to specify the URL format of the output document. Variables are supported in the template. For more information, see TargetURI template. You can also use the TargetURIPrefix parameter to specify the URL prefix of the output document. For more information about the default URL format of output documents, see Sample responses.
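The rules in the note above determine how many output files one conversion produces. The following sketch encodes them; `expected_output_count` is a hypothetical helper, and `page_count` is an input that you would have to obtain yourself, because the count for Excel-to-image conversions depends on the preview size of each sheet and cannot be predicted from this note alone.

```python
from typing import Optional

IMAGE_TARGETS = {"png", "jpeg", "jpg"}
PAGED_SOURCES = {"word", "ppt", "pdf"}


def expected_output_count(source: str, target: str, page_count: int) -> Optional[int]:
    """Estimate how many output files one conversion produces.

    Returns None when the count cannot be known in advance (Excel to images,
    where the preview size of each sheet decides the number of images).
    Does not check whether the conversion itself is supported.
    """
    source, target = source.lower(), target.lower()
    if target in {"pdf", "txt"}:
        return 1  # a single output document regardless of the source format
    if target in IMAGE_TARGETS:
        if source == "excel":
            return None
        if source in PAGED_SOURCES:
            return page_count  # one image per page
    raise ValueError(f"no output rule for {source} -> {target}")
```

For example, `expected_output_count("word", "png", 12)` returns 12, and `expected_output_count("excel", "pdf", 3)` returns 1.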

Usage notes

The time required to convert a document depends on factors such as the document size, the number of pages, and the number of words. Most conversions complete within a few seconds, but large documents and documents with many pages or words can take longer. To avoid blocking while a conversion runs, IMM provides CreateOfficeConversionTask as an asynchronous operation: it creates a document format conversion task and returns a task ID, and the conversion itself runs in the background.
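Because the operation is asynchronous, a client that does not subscribe to notifications has to poll for the result. The following is a minimal polling sketch under stated assumptions: `fetch_status` is a hypothetical callable that you would implement by calling the GetTask operation through your SDK, and the terminal state names "Succeeded" and "Failed" are assumptions to verify against the GetTask response reference.

```python
import time
from typing import Callable, Iterable


def wait_for_task(
    fetch_status: Callable[[str], str],
    task_id: str,
    terminal_states: Iterable[str] = ("Succeeded", "Failed"),  # assumed names
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status(task_id) until it returns a terminal state.

    fetch_status is hypothetical; in practice it would wrap a GetTask call.
    Raises TimeoutError if no terminal state is reached within timeout_s.
    """
    terminal = set(terminal_states)
    deadline = clock() + timeout_s
    while True:
        status = fetch_status(task_id)
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout_s}s")
        sleep(interval_s)


if __name__ == "__main__":
    # Fake status source standing in for real GetTask calls.
    states = iter(["Running", "Running", "Succeeded"])
    print(wait_for_task(lambda _tid: next(states), "example-task-id", interval_s=0.01))
    # prints Succeeded
```

The `sleep` and `clock` parameters are injectable so that the loop can be tested without real waiting. For many concurrent tasks, the notification-based options below avoid repeated polling.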

Important

After the task starts, the conversion task information is retained for seven days. You can use one of the following methods to query task information:

  • Use the TaskId value returned by the CreateOfficeConversionTask operation to call the GetTask operation, or call the ListTasks operation, to query task information.

  • Activate Message Service (MNS) in the region in which IMM is activated and configure an MNS subscription to receive task information notifications promptly. For more information, see Asynchronous message examples. For more information about the MNS SDK, see Step 4: Receive and delete the message.

  • Activate ApsaraMQ for RocketMQ in the region in which IMM is activated and create an ApsaraMQ for RocketMQ 4.0 instance. Create a topic and a group that can be used to obtain task information notifications in a timely manner. For more information, see Asynchronous message examples. For more information about how to use ApsaraMQ for RocketMQ, see Use HTTP client SDKs to send and subscribe to normal messages.

  • Activate EventBridge in the region in which IMM is activated and access EventBridge to obtain task information notifications in a timely manner. For more information, see IMM events.

If the converted output is less sharp than the source document, you can increase the value of the ImageDPI parameter when you call the CreateOfficeConversionTask operation to raise the output resolution. The larger the value of the ImageDPI parameter, the clearer the image.
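To see why a larger ImageDPI value yields a sharper image, consider the usual meaning of DPI: a page of a given physical size is rendered at that many pixels per inch, so pixel dimensions grow linearly with DPI. The sketch below assumes this standard relationship; IMM's exact rendering rules are not described here, and `rendered_size_px` is a hypothetical helper.

```python
MM_PER_INCH = 25.4


def rendered_size_px(width_mm: float, height_mm: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page of the given physical size rendered at dpi."""
    return (round(width_mm / MM_PER_INCH * dpi), round(height_mm / MM_PER_INCH * dpi))


if __name__ == "__main__":
    for dpi in (96, 300):
        print(dpi, rendered_size_px(210, 297, dpi))  # A4 page, 210 mm x 297 mm
    # 96 (794, 1123)
    # 300 (2480, 3508)
```

Raising the DPI from 96 to 300 gives roughly ten times as many pixels per page, which also means larger output files.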

Sample responses

In this example, the TargetURIPrefix parameter is set to oss://test-bucket/target/, which is the URL prefix of the output documents. The OSS path of each output document depends on the source document format.

  • If the source document format is Excel, the following OSS path is returned:

    oss://test-bucket/target/{sheetname}_{sheetindex}_{sheetsubindex}.{autoext}

  • If the source document format is not Excel, the following OSS path is returned:

    oss://test-bucket/target/{index}.{autoext}

The following variables are used in the OSS paths:

  • index: the sequence number of the output document, starting from 1.

    • If the source document is a Word, PDF, or PPT document, index is the page number.

    • If the source document is an Excel document, the sheetindex and sheetsubindex variables are used instead, in the form sheetindex_sheetsubindex:

      • sheetindex: the sheet number, starting from 1.

      • sheetsubindex: the image number within the sheet, starting from 1. A sheet may be converted into multiple images, and sheetsubindex identifies each of them.

    Example: 6_12

  • sheetname: the name of the sheet. This variable is used only when the source document is an Excel document. Example: sheet1

  • autoext: the filename extension of the output document. Example: jpg
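The path patterns and variables above make it possible to predict where each output document will be written, for example to read the results back from OSS after the task completes. The following sketch reproduces the default patterns from the TargetURIPrefix example; `expected_output_uri` is a hypothetical helper that only builds strings and does not call IMM or OSS. The prefix is used verbatim, so it is assumed to end with "/".

```python
from typing import Optional


def expected_output_uri(
    prefix: str,
    autoext: str,
    *,
    index: Optional[int] = None,
    sheetname: Optional[str] = None,
    sheetindex: Optional[int] = None,
    sheetsubindex: Optional[int] = None,
) -> str:
    """Build an output path that follows the default patterns:

    Excel source:  {prefix}{sheetname}_{sheetindex}_{sheetsubindex}.{autoext}
    Other sources: {prefix}{index}.{autoext}
    """
    if sheetname is not None:
        if sheetindex is None or sheetsubindex is None:
            raise ValueError("Excel outputs need sheetindex and sheetsubindex")
        if sheetindex < 1 or sheetsubindex < 1:
            raise ValueError("sheetindex and sheetsubindex start from 1")
        stem = f"{sheetname}_{sheetindex}_{sheetsubindex}"
    elif index is not None:
        if index < 1:
            raise ValueError("index starts from 1")
        stem = str(index)
    else:
        raise ValueError("pass index (non-Excel) or the sheet variables (Excel)")
    return f"{prefix}{stem}.{autoext}"


if __name__ == "__main__":
    prefix = "oss://test-bucket/target/"
    print(expected_output_uri(prefix, "jpg", sheetname="sheet1", sheetindex=6, sheetsubindex=12))
    # oss://test-bucket/target/sheet1_6_12.jpg
    print(expected_output_uri(prefix, "png", index=3))
    # oss://test-bucket/target/3.png
```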

References