Object Storage Service: Document conversion

Last Updated: Feb 04, 2025

The document conversion feature converts a document into a format that suits your application and saves the converted document to an Object Storage Service (OSS) bucket for later use.

Scenarios

  • Optimized document preview: Document conversion allows you to convert documents stored in OSS, such as PDF, Word, Excel, and PowerPoint, into an image format, which is more suitable for online preview. Viewers can preview the documents in web browsers or on mobile applications, without the need to download them.

  • Cross-platform compatibility: A specific document format may not be universally supported across devices or operating systems. The document conversion feature enables cross-platform support for various document formats.

Prerequisites

  • You have applied in Quota Center for the data processing capabilities of the new version of Intelligent Media Management (IMM) that are based on the GET and POST methods, and the application has been approved.

  • The OSS bucket that contains the document to be converted is bound to an IMM project. You can bind a bucket to an IMM project in the OSS console or by using the IMM API.

  • The role is granted permission to write data to the bucket. For more information, see Example 1: Authorize a RAM user to fully control a bucket.

Usage notes

  • Document conversion supports only asynchronous processing (x-oss-async-process).

  • Anonymous access will be denied.

  • You must have the required permissions to use the feature. For more information, see permissions.

Parameters

Action: doc/convert

The following table describes the parameters for document conversion.

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| target | string | Yes | The format of the output object. Valid values: pdf, png, jpg, and txt. Note: Only Word and PowerPoint documents can be converted to the TXT format. |
| source | string | No | The type of the source document. By default, the extension of the document name is used. If the document name does not contain an extension, you can specify a value for this parameter. Valid values: Word documents (doc, docx, wps, wpss, docm, dotm, dot, dotx, and html); presentation documents (pptx, ppt, pot, potx, pps, ppsx, dps, dpt, pptm, potm, ppsm, and dpss); table documents (xls, xlt, et, ett, xlsx, xltx, csv, xlsb, xlsm, xltm, and ets); PDF documents (pdf). |
| pages | string | No | The page numbers to convert. For example, a value of 1,2,4-10 specifies that page 1, page 2, and pages 4 to 10 are converted. |
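The parameter rules above can be checked on the client before a request is sent. The following Python sketch is illustrative only and is not part of any OSS SDK: the helper names resolve_source, check_target, and expand_pages are hypothetical, lowercasing the extension is an assumption, and html is treated as a Word type because the table lists it in that group.

```python
import posixpath

# Source types copied from the parameter table, grouped by document family.
WORD = {"doc", "docx", "wps", "wpss", "docm", "dotm", "dot", "dotx", "html"}
PRESENTATION = {"pptx", "ppt", "pot", "potx", "pps", "ppsx",
                "dps", "dpt", "pptm", "potm", "ppsm", "dpss"}
TABLE = {"xls", "xlt", "et", "ett", "xlsx", "xltx",
         "csv", "xlsb", "xlsm", "xltm", "ets"}
PDF = {"pdf"}
TARGETS = {"pdf", "png", "jpg", "txt"}


def resolve_source(object_key, source=None):
    """Return the explicit source type, else the extension of the object key."""
    if source is None:
        # Assumption: the extension is compared case-insensitively.
        source = posixpath.splitext(object_key)[1].lstrip(".").lower()
    if source not in WORD | PRESENTATION | TABLE | PDF:
        raise ValueError(f"unsupported source type: {source!r}")
    return source


def check_target(target, source):
    """Reject targets that the table does not allow for this source type."""
    if target not in TARGETS:
        raise ValueError(f"unsupported target format: {target!r}")
    if target == "txt" and source not in WORD | PRESENTATION:
        raise ValueError("only Word and PowerPoint documents can be converted to txt")


def expand_pages(pages):
    """Expand a pages value such as '1,2,4-10' into a sorted list of page numbers."""
    result = set()
    for part in pages.split(","):
        first, sep, last = part.strip().partition("-")
        start = int(first)
        end = int(last) if sep else start
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        result.update(range(start, end + 1))
    return sorted(result)
```

For instance, resolve_source("example.docx") yields docx, and expand_pages("1,2,4-10") yields pages 1, 2, and 4 through 10. The service performs its own validation, so these checks only catch obvious mistakes early.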

You may also need to use the sys/saveas and notify parameters when you convert a document. For more information, see sys/saveas and Message notification.

Use the RESTful API

Conversion example

  • Before conversion

    • Document format: DOCX

    • Document name: example.docx

  • Processing method: Convert the document format

  • After conversion

    • Object format: PNG

    • Storage path: oss://test-bucket/doc_images/{index}.png

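      • b_dGVzdC1idWNrZXQ: Save the output object to a bucket named test-bucket. The Base64-encoded value of test-bucket is dGVzdC1idWNrZXQ=; the request uses the URL-safe value with the trailing padding character (=) removed.

      • o_ZG9jX2ltYWdlcy97aW5kZXh9LnBuZw: Store the output objects in the doc_images directory and name each object after its page number, which is specified by the index variable. The Base64-encoded value of doc_images/{index}.png is ZG9jX2ltYWdlcy97aW5kZXh9LnBuZw==; the padding is removed in the request.

    • Conversion completion notification: A message is sent to a topic named doc_images in Simple Message Queue (SMQ). In the request, the topic is specified as topic_ZG9jX2ltYWdlcw, where ZG9jX2ltYWdlcw is the Base64-encoded value of doc_images without padding.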
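The encoded values in this example can be reproduced with a few lines of Python. This is a minimal sketch that assumes URL-safe Base64 with the trailing padding removed, which is what the sample request uses; b64url and build_process are hypothetical helper names, not SDK functions.

```python
import base64


def b64url(value):
    """URL-safe Base64 of a string, without trailing '=' padding."""
    encoded = base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")
    return encoded.rstrip("=")


def build_process(target, source, bucket, object_key, topic):
    """Assemble an x-oss-async-process value from its three parts."""
    convert = f"doc/convert,target_{target},source_{source}"
    saveas = f"sys/saveas,b_{b64url(bucket)},o_{b64url(object_key)}"
    notify = f"notify,topic_{b64url(topic)}"
    # The conversion action is joined to sys/saveas with '|', and notify follows after '/'.
    return f"{convert}|{saveas}/{notify}"


process = build_process(
    target="png",
    source="docx",
    bucket="test-bucket",
    object_key="doc_images/{index}.png",
    topic="doc_images",
)
print(process)
```

The printed string is the x-oss-async-process value for this conversion example: the bucket, object name, and topic each appear as a Base64 segment after b_, o_, and topic_.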

Sample request

// Convert the example.docx object to PNG images. 
POST /example.docx?x-oss-async-process HTTP/1.1
Host: doc-demo.oss-cn-hangzhou.aliyuncs.com
Date: Fri, 28 Oct 2022 06:40:10 GMT
Authorization: SignatureValue

x-oss-async-process=doc/convert,target_png,source_docx|sys/saveas,b_dGVzdC1idWNrZXQ,o_ZG9jX2ltYWdlcy97aW5kZXh9LnBuZw/notify,topic_ZG9jX2ltYWdlcw
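When debugging a request like this one, it can help to decode the Base64 segments back into readable names. The Python sketch below is an illustration rather than an official tool: decode_process is a hypothetical helper, and it assumes the b_, o_, and topic_ fields are URL-safe Base64 values whose padding may have been stripped.

```python
import base64
import re


def b64url_decode(value):
    """Decode URL-safe Base64, restoring any '=' padding that was stripped."""
    padded = value + "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


def decode_process(process):
    """Return the bucket, object name, and topic found in a process value."""
    fields = {}
    for name, prefix in (("bucket", "b_"), ("object", "o_"), ("topic", "topic_")):
        # A field starts at the beginning of the value or after ',', '/', or '|'.
        match = re.search(rf"(?:^|[,/|]){prefix}([A-Za-z0-9_=-]+)", process)
        if match:
            fields[name] = b64url_decode(match.group(1))
    return fields


value = ("doc/convert,target_png,source_docx"
         "|sys/saveas,b_dGVzdC1idWNrZXQ,o_ZG9jX2ltYWdlcy97aW5kZXh9LnBuZw"
         "/notify,topic_ZG9jX2ltYWdlcw")
print(decode_process(value))
```

For the sample request, the decoded fields are the test-bucket bucket, the doc_images/{index}.png object name, and the doc_images topic.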

Use OSS SDKs

Only OSS SDKs for Java, Python, and Go support document conversion.

Java

OSS SDK for Java V3.17.4 or later is required.

import com.aliyun.oss.ClientBuilderConfiguration;
import com.aliyun.oss.OSS;
import com.aliyun.oss.OSSClientBuilder;
import com.aliyun.oss.common.auth.CredentialsProviderFactory;
import com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider;
import com.aliyun.oss.common.comm.SignVersion;
import com.aliyun.oss.model.AsyncProcessObjectRequest;
import com.aliyun.oss.model.AsyncProcessObjectResult;
import com.aliyuncs.exceptions.ClientException;

import java.util.Base64;

public class Demo1 {
    public static void main(String[] args) throws ClientException {
        // Specify the endpoint of the region in which the bucket is located. 
        String endpoint = "https://oss-cn-hangzhou.aliyuncs.com";
        // Specify the ID of the Alibaba Cloud region in which the bucket is located. Example: cn-hangzhou. 
        String region = "cn-hangzhou";
        // Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
        EnvironmentVariableCredentialsProvider credentialsProvider = CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
        // Specify the name of the bucket. 
        String bucketName = "examplebucket";
        // Specify the name of the output object. 
        String targetKey = "dest.png";
        // Specify the name of the source document. 
        String sourceKey = "src.docx";

        // Create an OSSClient instance. 
        ClientBuilderConfiguration clientBuilderConfiguration = new ClientBuilderConfiguration();
        clientBuilderConfiguration.setSignatureVersion(SignVersion.V4);
        OSS ossClient = OSSClientBuilder.create()
                .endpoint(endpoint)
                .credentialsProvider(credentialsProvider)
                .clientConfiguration(clientBuilderConfiguration)
                .region(region)
                .build();

        try {
            // Define the document conversion parameters. 
            String style = "doc/convert,target_png,source_docx";
            // Build the asynchronous processing instruction. The bucket name and the output object name are URL-safe Base64-encoded, with padding removed. 
            String bucketEncoded = Base64.getUrlEncoder().withoutPadding().encodeToString(bucketName.getBytes());
            String targetEncoded = Base64.getUrlEncoder().withoutPadding().encodeToString(targetKey.getBytes());
            // QXVkaW9Db252ZXJ0 is the Base64-encoded value of the topic name AudioConvert. Replace it with the encoded name of your topic. 
            String process = String.format("%s|sys/saveas,b_%s,o_%s/notify,topic_QXVkaW9Db252ZXJ0", style, bucketEncoded, targetEncoded);
            // Create an AsyncProcessObjectRequest object. 
            AsyncProcessObjectRequest request = new AsyncProcessObjectRequest(bucketName, sourceKey, process);
            // Execute the asynchronous processing task. 
            AsyncProcessObjectResult response = ossClient.asyncProcessObject(request);
            System.out.println("EventId: " + response.getEventId());
            System.out.println("RequestId: " + response.getRequestId());
            System.out.println("TaskId: " + response.getTaskId());

        } finally {
            // Shut down the OSSClient instance. 
            ossClient.shutdown();
        }
    }
}

Python

OSS SDK for Python V2.18.4 or later is required.

# -*- coding: utf-8 -*-
import base64
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider

def main():
    # Obtain the temporary access credentials from the environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
    auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
    # Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com. 
    endpoint = 'https://oss-cn-hangzhou.aliyuncs.com'
    # Specify the ID of the Alibaba Cloud region in which the bucket is located. Example: cn-hangzhou. 
    region = 'cn-hangzhou'

    # Specify the name of the bucket. Example: examplebucket. 
    bucket = oss2.Bucket(auth, endpoint, 'examplebucket', region=region)

    # Specify the name of the source document. 
    source_key = 'src.docx'

    # Specify the name of the output object. 
    target_key = 'dest.png'

    # Define the document conversion parameters. 
    style = 'doc/convert,target_png,source_docx'

    # Build the processing instruction. The bucket name and the output object name are URL-safe Base64-encoded, with padding removed. 
    bucket_name_encoded = base64.urlsafe_b64encode('examplebucket'.encode()).decode().rstrip('=')
    target_key_encoded = base64.urlsafe_b64encode(target_key.encode()).decode().rstrip('=')
    # QXVkaW9Db252ZXJ0 is the Base64-encoded value of the topic name AudioConvert. Replace it with the encoded name of your topic. 
    process = f"{style}|sys/saveas,b_{bucket_name_encoded},o_{target_key_encoded}/notify,topic_QXVkaW9Db252ZXJ0"

    try:
        # Execute the asynchronous processing task. 
        result = bucket.async_process_object(source_key, process)
        print(f"EventId: {result.event_id}")
        print(f"RequestId: {result.request_id}")
        print(f"TaskId: {result.task_id}")
    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    main()

Go

OSS SDK for Go V3.0.2 or later is required.

package main

import (
    "encoding/base64"
    "fmt"
    "log"
    "os"

    "github.com/aliyun/aliyun-oss-go-sdk/oss"
)

func main() {
    // Obtain temporary access credentials from the environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured. 
    provider, err := oss.NewEnvironmentVariableCredentialsProvider()
    if err != nil {
        fmt.Println("Error:", err)
        os.Exit(-1)
    }
    // Create an OSSClient instance. 
    // Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com.  
    // Specify the ID of the Alibaba Cloud region in which the bucket is located. Example: cn-hangzhou. 
    client, err := oss.New("https://oss-cn-hangzhou.aliyuncs.com", "", "", oss.SetCredentialsProvider(&provider), oss.AuthVersion(oss.AuthV4), oss.Region("cn-hangzhou"))
    if err != nil {
        fmt.Println("Error:", err)
        os.Exit(-1)
    }
    // Specify the name of the bucket. Example: examplebucket. 
    bucketName := "examplebucket"

    bucket, err := client.Bucket(bucketName)
    if err != nil {
        fmt.Println("Error:", err)
        os.Exit(-1)
    }

    // Specify the name of the source document. 
    sourceKey := "src.docx"
    // Specify the name of the output object. 
    targetKey := "dest.png"

    // Define the document conversion parameters.
    style := "doc/convert,target_png,source_docx"

    // Build the processing instruction. The bucket name and the output object name are URL-safe Base64-encoded.
    // RawURLEncoding omits the trailing padding (=), as in the sample request.
    bucketNameEncoded := base64.RawURLEncoding.EncodeToString([]byte(bucketName))
    targetKeyEncoded := base64.RawURLEncoding.EncodeToString([]byte(targetKey))
    // QXVkaW9Db252ZXJ0 is the Base64-encoded value of the topic name AudioConvert. Replace it with the encoded name of your topic.
    process := fmt.Sprintf("%s|sys/saveas,b_%s,o_%s/notify,topic_QXVkaW9Db252ZXJ0", style, bucketNameEncoded, targetKeyEncoded)

    // Run the asynchronous processing task. 
    result, err := bucket.AsyncProcessObject(sourceKey, process)
    if err != nil {
        log.Fatalf("Failed to async process object: %s", err)
    }

    fmt.Printf("EventId: %s\n", result.EventId)
    fmt.Printf("RequestId: %s\n", result.RequestId)
    fmt.Printf("TaskId: %s\n", result.TaskId)
}

FAQ

Can I convert a specific worksheet within an Excel workbook?

No. The document conversion feature always converts all worksheets in an Excel workbook. To convert only a specific worksheet, call the CreateOfficeConversionTask operation of IMM and use the SheetIndex parameter to specify the worksheet.