The document format conversion feature converts a document into another format that suits your application and saves the converted document to an Object Storage Service (OSS) bucket for later use.
Scenarios
Optimized document preview: Document conversion allows you to convert documents stored in OSS, such as PDF, Word, Excel, and PowerPoint, into an image format, which is more suitable for online preview. Viewers can preview the documents in web browsers or on mobile applications, without the need to download them.
Cross-platform compatibility: A specific document format may not be universally supported across devices or operating systems. Converting such a document into a more widely supported format lets users open it across devices and operating systems.
Prerequisites
You have applied in Quota Center for the GET- and POST-based data processing capabilities of the new version of Intelligent Media Management (IMM), and the application has been approved.
The OSS bucket that contains the document to be converted is bound to an IMM project. You can bind a bucket to an IMM project in the OSS console or by using the IMM API.
For more information about how to bind a bucket to an IMM project in the OSS console, see Step 1: Bind a bucket to a project.
For more information about how to bind a bucket to an IMM project by using the IMM API, see AttachOSSBucket.
The permission of writing data to the bucket is granted to the role. For more information, see Example 1: Authorize a RAM user to fully control a bucket.
Usage notes
Document conversion supports only asynchronous processing (x-oss-async-process).
Anonymous access will be denied.
You must have the required permissions to use the feature. For more information, see permissions.
Parameters
Action: doc/convert
The following table describes the parameters for document conversion.
Parameter | Type | Required | Description |
target | string | Yes | The format of the output object. Valid values:
|
source | string | No | The type of the source document. By default, the extension of the document name is used. If the document name does not contain an extension, you can specify a value for this parameter. Valid values:
|
pages | string | No | The numbers of pages to be converted. For example, a value of |
You may also need to use the sys/saveas and notify parameters when you convert a document. For more information, see sys/saveas and Message notification.
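The way these parameters combine into a single processing instruction can be sketched as follows. This is a minimal illustration, not an official helper: the function names build_process and urlsafe_b64 are hypothetical, the instruction layout is inferred from the sample request in this topic, and the pages parameter is left out.

```python
import base64
from typing import Optional


def urlsafe_b64(value: str) -> str:
    """URL-safe Base64-encode a value and drop the trailing '=' padding."""
    encoded = base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")
    return encoded.rstrip("=")


def build_process(target: str, save_bucket: str, save_object: str,
                  source: Optional[str] = None,
                  topic: Optional[str] = None) -> str:
    """Assemble a doc/convert instruction (hypothetical helper).

    The layout follows the sample request in this topic:
    doc/convert,...|sys/saveas,b_...,o_.../notify,topic_...
    """
    convert = ["doc/convert", f"target_{target}"]
    if source:
        # source is optional; by default the object name extension is used.
        convert.append(f"source_{source}")
    process = ",".join(convert)
    # The bucket name and the object name must be Base64-encoded.
    process += f"|sys/saveas,b_{urlsafe_b64(save_bucket)},o_{urlsafe_b64(save_object)}"
    if topic:
        # The topic name is Base64-encoded in the same way.
        process += f"/notify,topic_{urlsafe_b64(topic)}"
    return process


if __name__ == "__main__":
    print(build_process("png", "test-bucket", "doc_images/{index}.png",
                        source="docx", topic="doc_images"))
```

With the values of the conversion example in this topic (test-bucket, doc_images/{index}.png, and the doc_images topic), the helper reproduces the instruction used in the sample request.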
Use the RESTful API
Conversion example
Before conversion
Document format: DOCX
Document name: example.docx
Processing method: Convert the document format
After conversion
Object format: PNG
Storage path: oss://test-bucket/doc_images/{index}.png
b_dGVzdC1idWNrZXQ: Save the output object to a bucket named test-bucket. dGVzdC1idWNrZXQ= is the URL-safe Base64-encoded value of test-bucket; the sample request omits the trailing = padding.
o_ZG9jX2ltYWdlcy97aW5kZXh9LnBuZw: Store the output objects in the doc_images directory and name each object after its page number by using the {index} variable. ZG9jX2ltYWdlcy97aW5kZXh9LnBuZw== is the URL-safe Base64-encoded value of doc_images/{index}.png; the sample request omits the trailing == padding.
Conversion completion notification: Send a message to a topic named doc_images in Simple Message Queue (SMQ). In the sample request, topic_ZG9jX2ltYWdlcw carries the Base64-encoded topic name.
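The Base64 values in this example can be checked by decoding them. The following is a minimal sketch; decode_value is a hypothetical helper name, and the padding is restored before decoding because the encoded values may appear without the trailing = characters.

```python
import base64


def decode_value(encoded: str) -> str:
    """Decode a URL-safe Base64 value, with or without trailing '=' padding."""
    # Restore the padding so that the length is a multiple of 4.
    padded = encoded + "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


if __name__ == "__main__":
    # Encoded values taken from the conversion example in this topic.
    values = {
        "bucket": "dGVzdC1idWNrZXQ",
        "object": "ZG9jX2ltYWdlcy97aW5kZXh9LnBuZw",
        "topic": "ZG9jX2ltYWdlcw",
    }
    for name, value in values.items():
        print(f"{name}: {decode_value(value)}")
```

Decoding a value before you submit a task is a quick way to catch a bucket name or object path that was encoded incorrectly.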
Sample request
// Convert the example.docx object to PNG images.
POST /example.docx?x-oss-async-process HTTP/1.1
Host: doc-demo.oss-cn-hangzhou.aliyuncs.com
Date: Fri, 28 Oct 2022 06:40:10 GMT
Authorization: SignatureValue

x-oss-async-process=doc/convert,target_png,source_docx|sys/saveas,b_dGVzdC1idWNrZXQ,o_ZG9jX2ltYWdlcy97aW5kZXh9LnBuZw/notify,topic_ZG9jX2ltYWdlcw
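The shape of this request can be sketched with the Python standard library. This is an illustration under stated assumptions only: build_unsigned_request is a hypothetical helper, HTTPS is assumed, and no Authorization header is computed. Because anonymous access is denied, the request would be rejected if sent as is; signing is normally left to an OSS SDK.

```python
import urllib.request
from email.utils import formatdate
from urllib.parse import quote


def build_unsigned_request(host: str, object_key: str,
                           process: str) -> urllib.request.Request:
    """Build the POST request for an asynchronous processing task.

    Hypothetical helper: the Authorization header is NOT set, so this
    request only illustrates the URL, the Date header, and the body.
    """
    # The x-oss-async-process query parameter selects asynchronous processing.
    url = f"https://{host}/{quote(object_key)}?x-oss-async-process"
    # The processing instruction is carried in the request body.
    body = f"x-oss-async-process={process}".encode("utf-8")
    request = urllib.request.Request(url, data=body, method="POST")
    request.add_header("Date", formatdate(usegmt=True))
    return request


if __name__ == "__main__":
    req = build_unsigned_request(
        "doc-demo.oss-cn-hangzhou.aliyuncs.com",
        "example.docx",
        "doc/convert,target_png,source_docx",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

The object is only constructed here, not sent, so the sketch can be run without network access or credentials.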
Use OSS SDKs
Only OSS SDKs for Java, Python, and Go support document conversion.
Java
OSS SDK for Java V3.17.4 or later is required.
import com.aliyun.oss.ClientBuilderConfiguration;
import com.aliyun.oss.OSS;
import com.aliyun.oss.OSSClientBuilder;
import com.aliyun.oss.common.auth.CredentialsProviderFactory;
import com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider;
import com.aliyun.oss.common.comm.SignVersion;
import com.aliyun.oss.model.AsyncProcessObjectRequest;
import com.aliyun.oss.model.AsyncProcessObjectResult;
import com.aliyuncs.exceptions.ClientException;

import java.util.Base64;

public class Demo1 {
    public static void main(String[] args) throws ClientException {
        // Specify the endpoint of the region in which the bucket is located.
        String endpoint = "https://oss-cn-hangzhou.aliyuncs.com";
        // Specify the ID of the Alibaba Cloud region in which the bucket is located. Example: cn-hangzhou.
        String region = "cn-hangzhou";
        // Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
        EnvironmentVariableCredentialsProvider credentialsProvider = CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
        // Specify the name of the bucket.
        String bucketName = "examplebucket";
        // Specify the name of the output object.
        String targetKey = "dest.png";
        // Specify the name of the source document.
        String sourceKey = "src.docx";

        // Create an OSSClient instance.
        ClientBuilderConfiguration clientBuilderConfiguration = new ClientBuilderConfiguration();
        clientBuilderConfiguration.setSignatureVersion(SignVersion.V4);
        OSS ossClient = OSSClientBuilder.create()
                .endpoint(endpoint)
                .credentialsProvider(credentialsProvider)
                .clientConfiguration(clientBuilderConfiguration)
                .region(region)
                .build();

        try {
            // Create a style variable of the string type to store document conversion parameters.
            String style = String.format("doc/convert,target_png,source_docx");
            // Create an asynchronous processing instruction, in which the name of the bucket and the name of the output object are Base64-encoded.
            String bucketEncoded = Base64.getUrlEncoder().withoutPadding().encodeToString(bucketName.getBytes());
            String targetEncoded = Base64.getUrlEncoder().withoutPadding().encodeToString(targetKey.getBytes());
            // QXVkaW9Db252ZXJ0 is the Base64-encoded value of the topic name AudioConvert. Replace it with the encoded name of your own topic.
            String process = String.format("%s|sys/saveas,b_%s,o_%s/notify,topic_QXVkaW9Db252ZXJ0", style, bucketEncoded, targetEncoded);
            // Create an AsyncProcessObjectRequest object.
            AsyncProcessObjectRequest request = new AsyncProcessObjectRequest(bucketName, sourceKey, process);
            // Execute the asynchronous processing task.
            AsyncProcessObjectResult response = ossClient.asyncProcessObject(request);
            System.out.println("EventId: " + response.getEventId());
            System.out.println("RequestId: " + response.getRequestId());
            System.out.println("TaskId: " + response.getTaskId());
        } finally {
            // Shut down the OSSClient instance.
            ossClient.shutdown();
        }
    }
}
Python
OSS SDK for Python V2.18.4 or later is required.
# -*- coding: utf-8 -*-
import base64

import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider


def main():
    # Obtain access credentials from the environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
    auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
    # Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com.
    endpoint = 'https://oss-cn-hangzhou.aliyuncs.com'
    # Specify the ID of the Alibaba Cloud region in which the bucket is located. Example: cn-hangzhou.
    region = 'cn-hangzhou'
    # Specify the name of the bucket. Example: examplebucket.
    bucket = oss2.Bucket(auth, endpoint, 'examplebucket', region=region)
    # Specify the name of the source document.
    source_key = 'src.docx'
    # Specify the name of the output object.
    target_key = 'dest.png'
    # Create a style variable of the string type to store document conversion parameters.
    style = 'doc/convert,target_png,source_docx'
    # Create a processing instruction, in which the name of the bucket and the name of the output object are Base64-encoded.
    bucket_name_encoded = base64.urlsafe_b64encode('examplebucket'.encode()).decode().rstrip('=')
    target_key_encoded = base64.urlsafe_b64encode(target_key.encode()).decode().rstrip('=')
    # QXVkaW9Db252ZXJ0 is the Base64-encoded value of the topic name AudioConvert. Replace it with the encoded name of your own topic.
    process = f"{style}|sys/saveas,b_{bucket_name_encoded},o_{target_key_encoded}/notify,topic_QXVkaW9Db252ZXJ0"
    try:
        # Execute the asynchronous processing task.
        result = bucket.async_process_object(source_key, process)
        print(f"EventId: {result.event_id}")
        print(f"RequestId: {result.request_id}")
        print(f"TaskId: {result.task_id}")
    except Exception as e:
        print(f"Error: {e}")


if __name__ == "__main__":
    main()
Go
OSS SDK for Go V3.0.2 or later is required.
package main

import (
	"encoding/base64"
	"fmt"
	"log"
	"os"

	"github.com/aliyun/aliyun-oss-go-sdk/oss"
)

func main() {
	// Obtain access credentials from the environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
	provider, err := oss.NewEnvironmentVariableCredentialsProvider()
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Create an OSSClient instance.
	// Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com.
	// Specify the ID of the Alibaba Cloud region in which the bucket is located. Example: cn-hangzhou.
	client, err := oss.New("https://oss-cn-hangzhou.aliyuncs.com", "", "", oss.SetCredentialsProvider(&provider), oss.AuthVersion(oss.AuthV4), oss.Region("cn-hangzhou"))
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Specify the name of the bucket. Example: examplebucket.
	bucketName := "examplebucket"
	bucket, err := client.Bucket(bucketName)
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Specify the name of the source document.
	sourceKey := "src.docx"
	// Specify the name of the output object.
	targetKey := "dest.png"
	// Create a style variable of the string type to store document conversion parameters.
	style := "doc/convert,target_png,source_docx"
	// Create a processing instruction, in which the name of the bucket and the name of the output object are Base64-encoded.
	// RawURLEncoding omits the trailing = padding, which matches the Java and Python samples and the sample request.
	bucketNameEncoded := base64.RawURLEncoding.EncodeToString([]byte(bucketName))
	targetKeyEncoded := base64.RawURLEncoding.EncodeToString([]byte(targetKey))
	// QXVkaW9Db252ZXJ0 is the Base64-encoded value of the topic name AudioConvert. Replace it with the encoded name of your own topic.
	process := fmt.Sprintf("%s|sys/saveas,b_%v,o_%v/notify,topic_QXVkaW9Db252ZXJ0", style, bucketNameEncoded, targetKeyEncoded)
	// Run the asynchronous processing task.
	result, err := bucket.AsyncProcessObject(sourceKey, process)
	if err != nil {
		log.Fatalf("Failed to async process object: %s", err)
	}
	fmt.Printf("EventId: %s\n", result.EventId)
	fmt.Printf("RequestId: %s\n", result.RequestId)
	fmt.Printf("TaskId: %s\n", result.TaskId)
}
FAQ
Can I convert a specific worksheet within an Excel workbook?
No. The document conversion feature always converts all worksheets in an Excel workbook. To convert only a specific worksheet, call the CreateOfficeConversionTask operation of IMM and use the SheetIndex parameter to specify the worksheet.