The data indexing feature of Object Storage Service (OSS) allows you to efficiently collect statistics, such as the number and size of objects, from a large number of objects. Compared with the ListObjects operation, data indexing significantly improves statistical efficiency and simplifies the process, which makes it well suited to large-scale data statistics.
Benefits
Enterprise A stores 200 million objects, which are classified by business prefix, and 1.8 million directories in a bucket named mybucket in the China (Guangzhou) region. After the enterprise uses the data indexing feature, the time required to collect object statistics is reduced by approximately 83%.
| Item | Traditional method | Data indexing |
| --- | --- | --- |
| Duration | 2 hours per day | 20 minutes per day |
| Complexity | If a directory contains more than 1,000 objects, you must call the ListObjects operation multiple times. | You need to call the DoMetaQuery operation only once for each directory. |
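To illustrate the Complexity row above, the following minimal Python sketch shows what the traditional method typically looks like: the statistics are accumulated client-side while ListObjects is called repeatedly, 1,000 objects at a time. The endpoint, bucket name, and prefix are placeholders, not values from this example.
# -*- coding: utf-8 -*-
# Minimal sketch of the traditional method, shown for comparison only.
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
bucket = oss2.Bucket(auth, "https://oss-cn-guangzhou.aliyuncs.com", "examplebucket", region="cn-guangzhou")
count = 0
total_size = 0
# ObjectIterator issues ListObjects calls repeatedly and returns at most 1,000 keys per call.
for obj in oss2.ObjectIterator(bucket, prefix="a/b", max_keys=1000):
    count += 1
    total_size += obj.size
print(f"count: {count}, total size: {total_size}, average size: {total_size / max(count, 1)}")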
Process
The following figure shows how to use the data indexing feature for large-scale data statistics.
Process:
Enable the data indexing feature: OSS automatically creates an index table that contains object metadata, user metadata, and object tags.
Search for objects and collect statistics: You must specify the search conditions and call the DoMetaQuery operation.
OSS then returns statistics, such as the number, total size, and average size of the objects that meet the search conditions, for your analysis.
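The search conditions and the statistics to collect are passed to DoMetaQuery as a structured query and a list of aggregation operations. The following sketch only illustrates their shape with placeholder values; the complete, runnable SDK examples are provided in Step 1 and Step 2 below.
# Illustration of the two inputs to DoMetaQuery; see Step 2 for complete examples.
query = {  # Search conditions, combined with a logical AND.
    "Operation": "and",
    "SubQueries": [
        {"Field": "Filename", "Value": "a/b", "Operation": "match"},
        {"Field": "OSSStorageClass", "Value": "Standard", "Operation": "eq"},
    ],
}
aggregations = [  # Statistics to compute over the matched objects.
    {"Field": "Size", "Operation": "sum"},
    {"Field": "Size", "Operation": "count"},
    {"Field": "Size", "Operation": "average"},
]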
Use data indexing to search for objects that meet specific conditions in large volumes of data
Step 1: Enable the data indexing feature
Use the OSS console
Log on to the OSS console.
In the left-side navigation pane, click Buckets. On the Buckets page, click the name of the bucket for which you want to enable the data indexing feature.
In the left-side navigation tree, choose .
On the Data Indexing page, click Enable Now.
In the Data Indexing dialog box, select MetaSearch and click Enable.
Use OSS SDKs
Currently, only OSS SDK for Java, OSS SDK for Python, and OSS SDK for Go allow you to use MetaSearch to query objects that meet specific conditions. The following examples enable the data indexing feature by using OSS SDK for Java, OSS SDK for Python, and OSS SDK for Go, in that order.
import com.aliyun.oss.*;
import com.aliyun.oss.common.auth.CredentialsProviderFactory;
import com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider;
import com.aliyun.oss.common.comm.SignVersion;
public class Demo {
// In this example, the endpoint of the China (Guangzhou) region is used. Specify your actual endpoint.
private static String endpoint = "https://oss-cn-guangzhou.aliyuncs.com";
// Specify the name of the bucket. Example: examplebucket.
private static String bucketName = "examplebucket";
public static void main(String[] args) throws com.aliyuncs.exceptions.ClientException {
// Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
EnvironmentVariableCredentialsProvider credentialsProvider = CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
// Specify the region in which the bucket is located. For example, if the bucket is located in the China (Guangzhou) region, set the region to cn-guangzhou.
String region = "cn-guangzhou";
// Create an OSSClient instance.
ClientBuilderConfiguration clientBuilderConfiguration = new ClientBuilderConfiguration();
clientBuilderConfiguration.setSignatureVersion(SignVersion.V4);
OSS ossClient = OSSClientBuilder.create()
.endpoint(endpoint)
.credentialsProvider(credentialsProvider)
.clientConfiguration(clientBuilderConfiguration)
.region(region)
.build();
try {
// Enable the data indexing feature.
ossClient.openMetaQuery(bucketName);
} catch (OSSException oe) {
System.out.println("Error Message:" + oe.getErrorMessage());
System.out.println("Error Code:" + oe.getErrorCode());
System.out.println("Request ID:" + oe.getRequestId());
System.out.println("Host ID:" + oe.getHostId());
} catch (ClientException ce) {
System.out.println("Error Message: " + ce.getMessage());
} finally {
// Shut down the OSSClient instance.
ossClient.shutdown();
}
}
}
# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Guangzhou) region, set the endpoint to https://oss-cn-guangzhou.aliyuncs.com.
endpoint = "https://oss-cn-guangzhou.aliyuncs.com"
# Specify the region of the bucket. Example: cn-guangzhou. This parameter is required if you use the V4 signature algorithm.
region = "cn-guangzhou"
# Replace examplebucket with the name of the bucket.
bucket = oss2.Bucket(auth, endpoint, "examplebucket", region=region)
# Enable the data indexing feature.
bucket.open_bucket_meta_query()
package main
import (
"context"
"flag"
"log"
"github.com/aliyun/alibabacloud-oss-go-sdk-v2/oss"
"github.com/aliyun/alibabacloud-oss-go-sdk-v2/oss/credentials"
)
var (
region string // Specify a variable to store the region information obtained from the command line.
bucketName string // Specify a variable to store the bucket name obtained from the command line.
)
// The init function is executed before the main function to initialize the program.
func init() {
// Use a command line parameter to specify the region.
flag.StringVar(&region, "region", "", "The region in which the bucket is located.")
// Use a command line parameter to specify the bucket name.
flag.StringVar(&bucketName, "bucket", "", "The name of the bucket.")
}
func main() {
flag.Parse() // Parse the command line parameters.
// Check whether the bucket name is specified. If the bucket name is not specified, return the default parameters and exit the program.
if len(bucketName) == 0 {
flag.PrintDefaults()
log.Fatalf("invalid parameters, bucket name required") // Record the error and exit the program.
}
// Check whether the region is specified. If the region is not specified, return the default parameters and exit the program.
if len(region) == 0 {
flag.PrintDefaults()
log.Fatalf("invalid parameters, region required") // Record the error and exit the program.
}
// Create and configure a client and use environment variables to pass the credential provider.
cfg := oss.LoadDefaultConfig().
WithCredentialsProvider(credentials.NewEnvironmentVariableCredentialsProvider()).
WithRegion(region)
client := oss.NewClient(cfg) // Use the client configurations to create a new OSSClient instance.
// Create an OpenMetaQuery request to enable the metadata management feature for a specific bucket.
request := &oss.OpenMetaQueryRequest{
Bucket: oss.Ptr(bucketName), // Specify the name of the bucket.
}
result, err := client.OpenMetaQuery(context.TODO(), request) // Execute the request to enable the metadata management feature for the bucket.
if err != nil {
log.Fatalf("failed to open meta query %v", err) // If an error occurs, record the error message and exit the program.
}
log.Printf("open meta query result:%#v\n", result) // Display the results of the request.
}
Step 2: Search for objects and collect statistics
Use the OSS console
Search condition settings
In the left-side navigation tree, choose .
Set Storage Class to Standard and ACL to Private.
Select Fuzzy match from the drop-down list and enter a/b in the box for the Object Name parameter.
Search result settings
Sort the objects in descending order based on the last modified time of the objects: set Object Sort Order to Descending and select Last Modified Time from the Sorted By drop-down list.
Calculate the total size and average size of the objects: add two items for the Data Aggregation parameter. For the first item, select Object Size from the Output drop-down list and Sum from the By drop-down list. For the second item, select Object Size from the Output drop-down list and Average from the By drop-down list.
Calculate the number of objects: select Storage Class from the Output drop-down list and Group Count from the By drop-down list.
Click Query Now.
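The Group Count setting in the console corresponds to a group aggregation in the DoMetaQuery API. The following Python sketch shows how that setting could be expressed with the AggregationsRequest class used in Step 2; the "group" operation name is an assumption based on the DoMetaQuery API reference and is not used in the SDK examples below.
# Hedged sketch: group objects by storage class and count each group, mirroring the console's Group Count setting.
from oss2.models import AggregationsRequest
group_by_storage_class = AggregationsRequest(field="OSSStorageClass", operation="group")  # Assumed operation name.
# Add this object to the aggregations list that is passed to the MetaQuery request in the examples below.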
Use OSS SDKs
The following examples search for objects and collect statistics by using OSS SDK for Java, OSS SDK for Python, and OSS SDK for Go, in that order.
import com.aliyun.oss.*;
import com.aliyun.oss.common.auth.CredentialsProviderFactory;
import com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider;
import com.aliyun.oss.common.comm.SignVersion;
import com.aliyun.oss.model.*;
import java.util.ArrayList;
import java.util.List;
public class Demo {
// In this example, the endpoint of the China (Guangzhou) region is used. Specify your actual endpoint.
private static String endpoint = "https://oss-cn-guangzhou.aliyuncs.com";
// Specify the name of the bucket. Example: examplebucket.
private static String bucketName = "examplebucket";
public static void main(String[] args) throws Exception {
// Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
EnvironmentVariableCredentialsProvider credentialsProvider = CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
// Specify the region in which the bucket is located. For example, if the bucket is located in the China (Guangzhou) region, set the region to cn-guangzhou.
String region = "cn-guangzhou";
// Create an OSSClient instance.
ClientBuilderConfiguration clientBuilderConfiguration = new ClientBuilderConfiguration();
clientBuilderConfiguration.setSignatureVersion(SignVersion.V4);
OSS ossClient = OSSClientBuilder.create()
.endpoint(endpoint)
.credentialsProvider(credentialsProvider)
.clientConfiguration(clientBuilderConfiguration)
.region(region)
.build();
try {
// Set maxResults to 20 to allow up to 20 objects to be returned.
int maxResults = 20;
// Specify the search conditions: set the prefix to a/b, the storage class to Standard, and the access control list (ACL) to private.
// Use "and" to connect multiple search conditions in the query statement.
String query = "{\n" +
" \"Operation\": \"and\",\n" +
" \"SubQueries\": [\n" +
" {\n" +
" \"Field\": \"Filename\",\n" +
" \"Value\": \"a/b\",\n" +
" \"Operation\": \"match\"\n" +
" },\n" +
" {\n" +
" \"Field\": \"OSSStorageClass\",\n" +
" \"Value\": \"Standard\",\n" +
" \"Operation\": \"eq\"\n" +
" },\n" +
" {\n" +
" \"Field\": \"ObjectACL\",\n" +
" \"Value\": \"private\",\n" +
" \"Operation\": \"eq\"\n" +
" }\n" +
" ]\n" +
"}";
String sort = "FileModifiedTime";// Sort the objects by their last modified time.
// Create an aggregation operation instance to calculate the total size, the average size, and the number of the objects.
Aggregation aggregationRequest1 = new Aggregation();
aggregationRequest1.setField("Size");// Set the aggregation field to Size.
aggregationRequest1.setOperation("sum");// Calculate the total size of all objects.
Aggregation aggregationRequest2 = new Aggregation();
aggregationRequest2.setField("Size");// Set the aggregation field to Size.
aggregationRequest2.setOperation("count");// Calculate the number of the objects.
Aggregation aggregationRequest3 = new Aggregation();
aggregationRequest3.setField("Size");// Set the aggregation field to Size.
aggregationRequest3.setOperation("average");// Calculate the average size of the objects.
// Add all aggregation requests to a list.
Aggregations aggregations = new Aggregations();
List<Aggregation> aggregationList = new ArrayList<>();
aggregationList.add(aggregationRequest1);// Add the sum aggregation request.
aggregationList.add(aggregationRequest2);// Add the count aggregation request.
aggregationList.add(aggregationRequest3);// Add the average aggregation request.
aggregations.setAggregation(aggregationList);// Add all aggregation operations to the Aggregations object.
// Create a DoMetaQuery request. Specify the bucket name, maximum number of returned objects, search conditions, and sorting method.
DoMetaQueryRequest doMetaQueryRequest = new DoMetaQueryRequest(bucketName, maxResults, query, sort);
// Add all aggregation operations to the DoMetaQuery request.
doMetaQueryRequest.setAggregations(aggregations);
// Set the sorting method to DESC.
doMetaQueryRequest.setOrder(SortOrder.DESC);
// Execute the DoMetaQuery request to obtain the results.
DoMetaQueryResult doMetaQueryResult = ossClient.doMetaQuery(doMetaQueryRequest);
// Verify the results.
if (doMetaQueryResult.getFiles() != null) {
// If the object list is not empty, traverse and display the object information.
for (ObjectFile file : doMetaQueryResult.getFiles().getFile()) {
System.out.println("Filename: " + file.getFilename()); // The name of the object.
System.out.println("ETag: " + file.getETag());// The ETag of the object.
System.out.println("ObjectACL: " + file.getObjectACL()); // The ACL of the object.
System.out.println("OssObjectType: " + file.getOssObjectType());// The type of the object.
System.out.println("OssStorageClass: " + file.getOssStorageClass());// The storage class of the object.
System.out.println("TaggingCount: " + file.getOssTaggingCount()); // The number of tags.
if (file.getOssTagging() != null) {
// Display the tags.
for (Tagging tag : file.getOssTagging().getTagging()) {
System.out.println("Key: " + tag.getKey());
System.out.println("Value: " + tag.getValue());
}
}
if (file.getOssUserMeta() != null) {
// Display user metadata.
for (UserMeta meta : file.getOssUserMeta().getUserMeta()) {
System.out.println("Key: " + meta.getKey());
System.out.println("Value: " + meta.getValue());
}
}
}
} else if (doMetaQueryResult.getAggregations() != null) {
// If the aggregation results exist, traverse and display the aggregation information.
for (Aggregation aggre : doMetaQueryResult.getAggregations().getAggregation()) {
System.out.println("Field: " + aggre.getField());// The aggregation field.
System.out.println("Operation: " + aggre.getOperation()); // The aggregation operation.
System.out.println("Value: " + aggre.getValue());// The value of the aggregation operation.
if (aggre.getGroups() != null && aggre.getGroups().getGroup().size() > 0) {
// Query the value of the aggregation operation by group.
System.out.println("Groups value: " + aggre.getGroups().getGroup().get(0).getValue());
// Query the total number of the aggregation operations by group.
System.out.println("Groups count: " + aggre.getGroups().getGroup().get(0).getCount());
}
}
} else {
System.out.println("NextToken: " + doMetaQueryResult.getNextToken());
}
} catch (OSSException oe) {
// Capture OSS exceptions and display related information.
System.out.println("Error Message:" + oe.getErrorMessage());
System.out.println("Error Code:" + oe.getErrorCode());
System.out.println("Request ID:" + oe.getRequestId());
System.out.println("Host ID:" + oe.getHostId());
} catch (ClientException ce) {
// Capture client exceptions and display error messages.
System.out.println("Error Message: " + ce.getMessage());
} finally {
// Shut down the OSSClient instance.
ossClient.shutdown();
}
}
}
# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
from oss2.models import MetaQuery, AggregationsRequest
import json
# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Guangzhou) region, set the endpoint to https://oss-cn-guangzhou.aliyuncs.com.
endpoint = "https://oss-cn-guangzhou.aliyuncs.com"
# Specify the region of the bucket. Example: cn-guangzhou. This parameter is required if you use the V4 signature algorithm.
region = "cn-guangzhou"
# Specify the name of the bucket. Example: examplebucket.
bucket = oss2.Bucket(auth, endpoint, "examplebucket", region=region)
# Specify the search conditions: set the prefix to a/b, the storage class to Standard, and the ACL to private.
query = {
"Operation": "and",
"SubQueries": [
{"Field": "Filename", "Value": "a/b", "Operation": "match"},
{"Field": "OSSStorageClass", "Value": "Standard", "Operation": "eq"},
{"Field": "ObjectACL", "Value": "private", "Operation": "eq"}
]
}
# Convert the format of the search conditions to JSON.
query_json = json.dumps(query)
# Create an aggregation operation instance to calculate the total size, the average size, and the number of the objects.
aggregations = [
AggregationsRequest(field="Size", operation="sum"), # Calculate the total size of all objects.
AggregationsRequest(field="Size", operation="count"), # Calculate the number of the objects.
AggregationsRequest(field="Size", operation="average") # Calculate the average size of the objects.
]
# Create a DoMetaQuery request. Specify the search conditions, maximum number of returned objects, sorting method, and aggregation operations.
do_meta_query_request = MetaQuery(
max_results=20, # Return up to 20 objects.
query=query_json, # Specify search conditions.
sort="FileModifiedTime", # Sort the objects by last modified time.
order="desc", # Sort the objects in descending order.
aggregations=aggregations # Specify aggregation operations.
)
# Execute the DoMetaQuery request to obtain the results.
result = bucket.do_bucket_meta_query(do_meta_query_request)
# Display the information of the objects that meet the search conditions in the results.
if result.files:
    for file in result.files:
        print(f"Filename: {file.file_name}")  # Display the name of the object.
        print(f"ETag: {file.etag}")  # Display the ETag of the object.
        print(f"ObjectACL: {file.object_acl}")  # Display the ACL of the object.
        print(f"OssObjectType: {file.oss_object_type}")  # Display the type of the object.
        print(f"OssStorageClass: {file.oss_storage_class}")  # Display the storage class of the object.
        print(f"TaggingCount: {file.oss_tagging_count}")  # Display the number of tags of the object.
        # Display all tags of the object.
        if file.oss_tagging:
            for tag in file.oss_tagging:
                print(f"Key: {tag.key}")  # Display the key of the tag.
                print(f"Value: {tag.value}")  # Display the value of the tag.
        # Display user metadata.
        if file.oss_user_meta:
            for meta in file.oss_user_meta:
                print(f"Key: {meta.key}")  # Display the key of a piece of user metadata.
                print(f"Value: {meta.value}")  # Display the value of a piece of user metadata.
# Display the results of the aggregation operations.
if result.aggregations:
    for aggre in result.aggregations:
        print(f"Field: {aggre.field}")  # Display the aggregation operation field.
        print(f"Operation: {aggre.operation}")  # Display the aggregation operation type, such as sum, count, and average.
        print(f"Value: {aggre.value}")  # Display the value of the aggregation operation.
package main
import (
"context"
"encoding/json"
"flag"
"fmt"
"log"
"github.com/aliyun/alibabacloud-oss-go-sdk-v2/oss"
"github.com/aliyun/alibabacloud-oss-go-sdk-v2/oss/credentials"
)
var (
region string // Specify a variable to store the region information obtained from the command line.
bucketName string // Specify a variable to store the bucket name obtained from the command line.
)
// The init function is executed before the main function to initialize the program.
func init() {
// Use a command line parameter to specify the region. By default, the parameter is an empty string.
flag.StringVar(&region, "region", "", "The region in which the bucket is located.")
// Use a command line parameter to specify the bucket name. By default, the parameter is an empty string.
flag.StringVar(&bucketName, "bucket", "", "The name of the bucket.")
}
func main() {
flag.Parse() // Parse the command line parameters.
// Check whether the bucket name is specified. If the bucket name is not specified, return the default parameters and exit the program.
if len(bucketName) == 0 {
flag.PrintDefaults()
log.Fatalf("invalid parameters, bucket name required")
}
// Check whether the region is specified. If the region is not specified, return the default parameters and exit the program.
if len(region) == 0 {
flag.PrintDefaults()
log.Fatalf("invalid parameters, region required")
}
// Create and configure a client and use environment variables to pass the credential provider and the region.
cfg := oss.LoadDefaultConfig().
WithCredentialsProvider(credentials.NewEnvironmentVariableCredentialsProvider()).
WithRegion(region)
client := oss.NewClient(cfg) // Use the client configurations to create a new OSSClient instance.
// Specify the search conditions: set the prefix to a/b, the storage class to Standard, and the ACL to private.
query := map[string]interface{}{
"Operation": "and",
"SubQueries": []map[string]interface{}{
{"Field": "Filename", "Value": "a/b", "Operation": "match"},
{"Field": "OSSStorageClass", "Value": "Standard", "Operation": "eq"},
{"Field": "ObjectACL", "Value": "private", "Operation": "eq"},
},
}
// Convert the format of the search conditions to JSON.
queryJSON, err := json.Marshal(query)
if err != nil {
log.Fatalf("failed to marshal query %v", err)
}
// Create an aggregation operation instance to calculate the total size, the average size, and the number of the objects.
aggregations := []oss.MetaQueryAggregation{
{Field: oss.Ptr("Size"), Operation: oss.Ptr("sum")}, // Calculate the total size of all objects.
{Field: oss.Ptr("Size"), Operation: oss.Ptr("count")}, // Calculate the number of the objects.
{Field: oss.Ptr("Size"), Operation: oss.Ptr("average")}, // Calculate the average size of the objects.
}
// Create a DoMetaQuery request to query the objects that meet specific conditions.
request := &oss.DoMetaQueryRequest{
Bucket: oss.Ptr(bucketName), // Specify the name of the bucket.
MetaQuery: &oss.MetaQuery{
MaxResults: oss.Ptr(int64(20)), // Set the maximum number of objects to return to 20.
Query: oss.Ptr(string(queryJSON)), // Specify the search conditions in JSON format.
Sort: oss.Ptr("FileModifiedTime"), // Sort the objects by their last modified time.
Order: oss.MetaQueryOrderDesc, // Sort the objects in descending order.
Aggregations: &oss.MetaQueryAggregations{
Aggregations: aggregations}, // Create an aggregation operation instance to calculate the total size, the average size, and the number of the objects.
},
}
result, err := client.DoMetaQuery(context.TODO(), request) // Execute the request to query the objects that meet the preceding conditions.
if err != nil {
log.Fatalf("failed to do meta query %v", err)
}
// Display the token used to query data on the next page.
fmt.Printf("NextToken:%s\n", *result.NextToken)
// Traverse the returned results and display the details of each object.
for _, file := range result.Files {
fmt.Printf("File name: %s\n", *file.Filename)
fmt.Printf("size: %d\n", file.Size)
fmt.Printf("File Modified Time:%s\n", *file.FileModifiedTime)
fmt.Printf("Oss Object Type:%s\n", *file.OSSObjectType)
fmt.Printf("Oss Storage Class:%s\n", *file.OSSStorageClass)
fmt.Printf("Object ACL:%s\n", *file.ObjectACL)
fmt.Printf("ETag:%s\n", *file.ETag)
fmt.Printf("Oss CRC64:%s\n", *file.OSSCRC64)
if file.OSSTaggingCount != nil {
fmt.Printf("Oss Tagging Count:%d\n", *file.OSSTaggingCount)
}
// Display the tags of the object.
for _, tagging := range file.OSSTagging {
fmt.Printf("Oss Tagging Key:%s\n", *tagging.Key)
fmt.Printf("Oss Tagging Value:%s\n", *tagging.Value)
}
// Display the user metadata.
for _, userMeta := range file.OSSUserMeta {
fmt.Printf("Oss User Meta Key:%s\n", *userMeta.Key)
fmt.Printf("Oss User Meta Key Value:%s\n", *userMeta.Value)
}
}
// Display the results of the aggregation operations.
for _, aggregation := range result.Aggregations {
fmt.Printf("Aggregation Field:%s\n", *aggregation.Field)
fmt.Printf("Aggregation Operation:%s\n", *aggregation.Operation)
fmt.Printf("Aggregation Value:%f\n", *aggregation.Value)
}
}
Step 3: Verify the results
Use the OSS console
The Query Results page shows that the total size of 100 Standard objects that meet the search conditions is 19.53 MB, and the average size of each object is about 200 KB.
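As a quick sanity check of these numbers (a minimal sketch that assumes the Size aggregation values are returned in bytes):
# 19.53 MB spread across 100 objects is roughly 200 KB per object.
total_size_bytes = 19.53 * 1024 * 1024  # value of the sum aggregation on Size
object_count = 100                      # value of the count aggregation
print(total_size_bytes / object_count / 1024)  # prints about 200 (KB per object)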
Use OSS SDKs
The following figure shows that the total size of 100 Standard objects that meet the search conditions is 19.53 MB, and the average size of each object is about 200 KB.

References
If your business requires a high level of customization, you can directly call RESTful APIs. To directly call an API, you must include the signature calculation in your code. For more information, see (Recommended) Include a V4 signature and DoMetaQuery.