The document conversion feature lets you convert various types of documents to a target format and save the results to a specified Object Storage Service (OSS) path.
Use cases
Optimize online previews: After you upload documents such as PDF, Word, Excel, or PPT files to OSS, you can call the document conversion API to convert the documents into images. This lets users preview the documents directly in a web browser or mobile app without downloading them.
Ensure cross-platform compatibility: Document conversion ensures that users can view documents smoothly across different devices and operating systems.
Supported input file types
File type | File extension |
Word | doc, docx, wps, wpss, docm, dotm, dot, dotx |
PPT | pptx, ppt, pot, potx, pps, ppsx, dps, dpt, pptm, potm, ppsm, dpss |
Excel | xls, xlt, et, ett, xlsx, xltx, csv, xlsb, xlsm, xltm, ets |
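Before submitting a conversion task, it can help to check on the client side whether an object's extension appears in the table above. The following is a minimal sketch: the `SUPPORTED_EXTENSIONS` mapping and the `document_type` helper are hypothetical names, not part of any SDK, and lowercasing the extension is a client-side assumption rather than documented service behavior.

```python
import posixpath

# Extensions copied from the table above, grouped by file type.
SUPPORTED_EXTENSIONS = {
    "Word": {"doc", "docx", "wps", "wpss", "docm", "dotm", "dot", "dotx"},
    "PPT": {"pptx", "ppt", "pot", "potx", "pps", "ppsx", "dps", "dpt",
            "pptm", "potm", "ppsm", "dpss"},
    "Excel": {"xls", "xlt", "et", "ett", "xlsx", "xltx", "csv", "xlsb",
              "xlsm", "xltm", "ets"},
}


def document_type(object_key):
    """Return the file type for an object key, or None if the extension is not listed."""
    _, ext = posixpath.splitext(object_key)
    # Assumption: compare extensions case-insensitively on the client side.
    ext = ext.lstrip(".").lower()
    for file_type, extensions in SUPPORTED_EXTENSIONS.items():
        if ext in extensions:
            return file_type
    return None


if __name__ == "__main__":
    for key in ("reports/summary.docx", "slides.pptx", "photo.png"):
        print(key, "->", document_type(key))
```

A `None` result only means the extension is missing from the table; the service remains the authority on what it accepts.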
How to use
Prerequisites
You have created a bucket in Object Storage Service (OSS), uploaded the source document to the bucket, and associated the bucket with an Intelligent Media Management (IMM) project. The IMM project must be in the same region as the bucket.
You have the required permissions for IMM to process the document.
Convert documents
You can use an SDK to call the document conversion API and save the converted file to a specified bucket. Asynchronous processing for document conversion is supported only by the SDKs for Java, Python, and Go.
Java
This example requires OSS SDK for Java version 3.17.4 or later.
import com.aliyun.oss.ClientBuilderConfiguration;
import com.aliyun.oss.OSS;
import com.aliyun.oss.OSSClientBuilder;
import com.aliyun.oss.common.auth.CredentialsProviderFactory;
import com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider;
import com.aliyun.oss.common.comm.SignVersion;
import com.aliyun.oss.model.AsyncProcessObjectRequest;
import com.aliyun.oss.model.AsyncProcessObjectResult;
import com.aliyuncs.exceptions.ClientException;
import java.util.Base64;
public class Demo1 {
    public static void main(String[] args) throws ClientException {
        // Set endpoint to the endpoint of the region where the bucket is located.
        String endpoint = "https://oss-cn-hangzhou.aliyuncs.com";
        // Set region to the region ID, for example, cn-hangzhou.
        String region = "cn-hangzhou";
        // Obtain access credentials from environment variables. Before running this sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
        EnvironmentVariableCredentialsProvider credentialsProvider = CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
        // Specify the bucket name.
        String bucketName = "examplebucket";
        // Specify the name of the destination file.
        String targetKey = "dest.png";
        // Specify the name of the source document.
        String sourceKey = "src.docx";

        // Create an OSSClient instance.
        // When the OSSClient instance is no longer needed, call the shutdown method to release its resources.
        ClientBuilderConfiguration clientBuilderConfiguration = new ClientBuilderConfiguration();
        clientBuilderConfiguration.setSignatureVersion(SignVersion.V4);
        OSS ossClient = OSSClientBuilder.create()
                .endpoint(endpoint)
                .credentialsProvider(credentialsProvider)
                .clientConfiguration(clientBuilderConfiguration)
                .region(region)
                .build();

        try {
            // Build the string for document processing styles and conversion parameters.
            String style = "doc/convert,target_png,source_docx";
            // Build the asynchronous processing instruction.
            String bucketEncoded = Base64.getUrlEncoder().withoutPadding().encodeToString(bucketName.getBytes());
            String targetEncoded = Base64.getUrlEncoder().withoutPadding().encodeToString(targetKey.getBytes());
            String process = String.format("%s|sys/saveas,b_%s,o_%s", style, bucketEncoded, targetEncoded);
            // Create an AsyncProcessObjectRequest object.
            AsyncProcessObjectRequest request = new AsyncProcessObjectRequest(bucketName, sourceKey, process);
            // Initiate the asynchronous task.
            AsyncProcessObjectResult response = ossClient.asyncProcessObject(request);
            System.out.println("EventId: " + response.getEventId());
            System.out.println("RequestId: " + response.getRequestId());
            System.out.println("TaskId: " + response.getTaskId());
        } finally {
            // Shut down the OSSClient instance.
            ossClient.shutdown();
        }
    }
}
Python
This example requires OSS SDK for Python version 2.18.4 or later.
# -*- coding: utf-8 -*-
import base64

import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider


def main():
    # Obtain access credentials from environment variables. Before running this sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
    auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
    # Set endpoint to the endpoint of the region where the bucket is located. For example, for China (Hangzhou), set the endpoint to https://oss-cn-hangzhou.aliyuncs.com.
    endpoint = 'https://oss-cn-hangzhou.aliyuncs.com'
    # Specify the general-purpose Alibaba Cloud region ID, for example, cn-hangzhou.
    region = 'cn-hangzhou'
    # Specify the bucket name, for example, examplebucket.
    bucket_name = 'examplebucket'
    bucket = oss2.Bucket(auth, endpoint, bucket_name, region=region)
    # Specify the name of the source document.
    source_key = 'src.docx'
    # Specify the name of the destination file.
    target_key = 'dest.png'
    # Build the string for document processing styles and conversion parameters.
    style = 'doc/convert,target_png,source_docx'
    # Build the processing instruction, including the save path and the Base64-encoded bucket name and destination file name.
    bucket_name_encoded = base64.urlsafe_b64encode(bucket_name.encode()).decode().rstrip('=')
    target_key_encoded = base64.urlsafe_b64encode(target_key.encode()).decode().rstrip('=')
    process = f"{style}|sys/saveas,b_{bucket_name_encoded},o_{target_key_encoded}"
    try:
        # Initiate the asynchronous task.
        result = bucket.async_process_object(source_key, process)
        print(f"EventId: {result.event_id}")
        print(f"RequestId: {result.request_id}")
        print(f"TaskId: {result.task_id}")
    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    main()
Go
This example requires OSS SDK for Go version 3.0.2 or later.
package main

import (
	"encoding/base64"
	"fmt"
	"log"
	"os"

	"github.com/aliyun/aliyun-oss-go-sdk/oss"
)

func main() {
	// Obtain access credentials from environment variables. Before running this sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
	provider, err := oss.NewEnvironmentVariableCredentialsProvider()
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Create an OSSClient instance.
	// Set the endpoint to the one for your bucket's region. For the China (Hangzhou) region, for example, use https://oss-cn-hangzhou.aliyuncs.com. Set the endpoint based on your actual region.
	// Specify the general-purpose Alibaba Cloud region ID, for example, cn-hangzhou.
	client, err := oss.New("https://oss-cn-hangzhou.aliyuncs.com", "", "", oss.SetCredentialsProvider(&provider), oss.AuthVersion(oss.AuthV4), oss.Region("cn-hangzhou"))
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Specify the bucket name, for example, examplebucket.
	bucketName := "examplebucket"
	bucket, err := client.Bucket(bucketName)
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Specify the name of the source document.
	sourceKey := "src.docx"
	// Specify the name of the destination file.
	targetKey := "dest.png"
	// Build the string for document processing styles and conversion parameters.
	style := "doc/convert,target_png,source_docx"
	// Build the processing instruction, including the save path and the Base64-encoded bucket name and destination file name.
	bucketNameEncoded := base64.URLEncoding.EncodeToString([]byte(bucketName))
	targetKeyEncoded := base64.URLEncoding.EncodeToString([]byte(targetKey))
	process := fmt.Sprintf("%s|sys/saveas,b_%v,o_%v", style, bucketNameEncoded, targetKeyEncoded)
	// Initiate the asynchronous task.
	result, err := bucket.AsyncProcessObject(sourceKey, process)
	if err != nil {
		log.Fatalf("Failed to async process object: %s", err)
	}
	fmt.Printf("EventId: %s\n", result.EventId)
	fmt.Printf("RequestId: %s\n", result.RequestId)
	fmt.Printf("TaskId: %s\n", result.TaskId)
}
Parameters
Action: doc/convert
The following table describes the parameters.
Parameter | Type | Required | Description |
target | string | Yes | The target file format. Valid values:
|
source | string | No | The source file format. By default, the format is determined by the object's file extension. Valid values:
|
pages | string | No | The page numbers to convert. For example, |
You must use the sys/saveas parameter to save the converted document to a specified bucket. For more information, see sys/saveas. If you need to receive the processing results of the conversion task, use the notify parameter. For more information, see message notification.
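The way the doc/convert parameters and the sys/saveas destination combine into one instruction string can be sketched as a small helper. This is an illustration only: `urlsafe_b64` and `build_process` are hypothetical names, the padding-free URL-safe Base64 mirrors the Java and Python samples above, and the `pages_<value>` form is an assumption that `pages` follows the same name_value pattern as `target` and `source`.

```python
import base64


def urlsafe_b64(text):
    """URL-safe Base64 without trailing '=' padding, as in the Java and Python samples."""
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii").rstrip("=")


def build_process(target, dest_bucket, dest_key, source=None, pages=None):
    """Assemble 'doc/convert,...|sys/saveas,b_<bucket>,o_<object>'."""
    parts = ["doc/convert", f"target_{target}"]
    if source is not None:
        parts.append(f"source_{source}")
    if pages is not None:
        # Assumption: pages uses the same name_value form as target and source.
        parts.append(f"pages_{pages}")
    convert = ",".join(parts)
    saveas = f"sys/saveas,b_{urlsafe_b64(dest_bucket)},o_{urlsafe_b64(dest_key)}"
    return f"{convert}|{saveas}"


if __name__ == "__main__":
    print(build_process("png", "examplebucket", "dest.png", source="docx"))
```

The resulting string can be passed as the process argument of the asynchronous processing call shown in the SDK samples.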
Advanced scenarios
Document conversion tasks are submitted as asynchronous requests. This means the immediate response does not contain the final result of the conversion, such as whether it succeeded or failed. To get the result, configure event notifications by using Simple Message Queue (SMQ) (formerly MNS). This provides instant notifications upon task completion, eliminating the need to poll for status.
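As a rough sketch of how a notification topic could be attached to an instruction, the helper below appends a notify segment in the same shape as the sample request in the API reference section of this document (`/notify,topic_<Base64 topic name>`). The function name `with_notify` is hypothetical, and stripping the Base64 padding is an assumption taken from that sample; see the message notification documentation for the authoritative format.

```python
import base64


def with_notify(process, topic_name):
    """Append '/notify,topic_<encoded topic>' to an existing processing instruction."""
    # Assumption: the topic name is URL-safe Base64 without '=' padding,
    # matching the sample request in this document.
    topic = base64.urlsafe_b64encode(topic_name.encode("utf-8")).decode("ascii").rstrip("=")
    return f"{process}/notify,topic_{topic}"


if __name__ == "__main__":
    base = "doc/convert,target_png,source_docx|sys/saveas,b_dGVzdC1idWNrZXQ=,o_ZG9jX2ltYWdlcy97aW5kZXh9LnBuZw=="
    print(with_notify(base, "test-topic"))
```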
Event notifications
API reference
The SDK methods are built on the RESTful API. For advanced customization, you can call the RESTful API directly. This requires you to manually calculate the signature in the Authorization header. For instructions, see Signature version 4 (recommended).
Convert documents
Before conversion
Document format: DOCX
Document name: example.docx
After conversion
File format: PNG
Storage path: oss://test-bucket/doc_images/{index}.png
b_dGVzdC1idWNrZXQ=: Saves the output to the bucket named test-bucket after the conversion is complete (dGVzdC1idWNrZXQ= is the Base64-encoded value of test-bucket).
o_ZG9jX2ltYWdlcy97aW5kZXh9LnBuZw==: Saves the objects to the doc_images directory. The {index} variable is replaced with the page number of example.docx to form each image filename (ZG9jX2ltYWdlcy97aW5kZXh9LnBuZw== is the Base64-encoded value of doc_images/{index}.png).
Conversion completion notification: A message is sent to a Simple Message Queue (SMQ) (formerly MNS) topic named test-topic.
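The encoded values in this example can be checked by decoding them. The sketch below uses only the Python standard library; `decode_urlsafe` is a hypothetical helper name. It restores any stripped `=` padding before decoding, because the bucket and object values here keep their padding while the topic value in the sample request does not.

```python
import base64


def decode_urlsafe(value):
    """Decode URL-safe Base64, restoring any '=' padding that was stripped."""
    padded = value + "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


if __name__ == "__main__":
    # Values taken from the example: bucket, object key, and topic name.
    for encoded in ("dGVzdC1idWNrZXQ=", "ZG9jX2ltYWdlcy97aW5kZXh9LnBuZw==", "dGVzdC10b3BpYw"):
        print(encoded, "->", decode_urlsafe(encoded))
```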
Sample request
// Convert the example.docx file to PNG images.
POST /example.docx?x-oss-async-process HTTP/1.1
Host: doc-demo.oss-cn-hangzhou.aliyuncs.com
Date: Fri, 28 Oct 2022 06:40:10 GMT
Authorization: SignatureValue

x-oss-async-process=doc/convert,target_png,source_docx|sys/saveas,b_dGVzdC1idWNrZXQ=,o_ZG9jX2ltYWdlcy97aW5kZXh9LnBuZw==/notify,topic_dGVzdC10b3BpYw
Usage notes
Document conversion supports only asynchronous processing (x-oss-async-process).
Anonymous access is not supported.
The maximum source file size is 200 MB. This limit is fixed.
FAQ
Can I convert only a specific worksheet in an Excel file?
No. OSS document conversion converts all worksheets in an Excel workbook. To convert a specific worksheet, call the IMM CreateOfficeConversionTask operation and set the SheetIndex parameter.
Related documents
For more information about document conversion, see document conversion.