Object Storage Service: Using OSS data indexing for large-scale data statistics

Last Updated: Jun 20, 2026

With OSS data indexing, you can efficiently compute statistics, such as object counts and object sizes, across a massive number of objects. Compared with the traditional method of listing objects by calling the ListObjects operation, data indexing is significantly faster and simpler, which makes it well suited to large-scale data aggregation.

Benefits

A company stores 200 million objects, organized by business prefix into 1.8 million directories, in a bucket named mybucket in the China (Guangzhou) region. With OSS data indexing, the company reduced the time required for daily object aggregation by 83%, from 2 hours to 20 minutes.

| Item | Traditional method (ListObjects) | OSS data indexing (DoMetaQuery) |
|---|---|---|
| Duration | Daily aggregation takes 2 hours. | Daily aggregation takes 20 minutes. |
| Complexity | For directories that contain more than 1,000 objects, you must call the ListObjects operation multiple times. | You need to call the DoMetaQuery operation only once for each directory. |
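The figures above can be checked with simple arithmetic. The sketch below computes the percentage reduction from 2 hours to 20 minutes and the number of ListObjects calls that one directory needs, assuming that each ListObjects call returns at most 1,000 objects (the threshold stated for the traditional method). The function names are illustrative only and are not part of any OSS SDK.

```python
import math

# Assumed maximum number of objects returned by one ListObjects call,
# based on the 1,000-object threshold mentioned above.
LIST_PAGE_SIZE = 1000


def reduction_percent(before_minutes: float, after_minutes: float) -> float:
    """Return the percentage by which a duration shrank."""
    return (before_minutes - after_minutes) / before_minutes * 100


def list_objects_calls(object_count: int) -> int:
    """Return how many ListObjects pages are needed to enumerate one directory."""
    # Even an empty directory takes one request to find out that it is empty.
    return max(1, math.ceil(object_count / LIST_PAGE_SIZE))


# Daily aggregation: 120 minutes with ListObjects versus 20 minutes with data indexing.
print(f"{reduction_percent(120, 20):.1f}%")  # 83.3%
# A directory with 2,500 objects needs three ListObjects calls, versus one DoMetaQuery call.
print(list_objects_calls(2500))  # 3
```

Multiplied across 1.8 million directories, the per-directory difference in request count is what drives the gap in total duration.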

Overview


This process involves the following steps:

  1. Enable data indexing: OSS automatically creates an index table that includes object metadata, custom metadata, and object tags.

  2. Initiate a query and aggregation: Set the query conditions and aggregation operations, and then call the DoMetaQuery operation. OSS runs the query against the index table instead of listing objects page by page.

Finally, OSS returns statistics for the matching objects, such as the total count, total size, and average size.
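To make these statistics concrete, the following sketch computes the same kinds of values (sum, count, and average of object sizes, plus a count per storage class) locally over a small, made-up list of objects. It only illustrates what the aggregations mean: with data indexing, OSS computes them on the server from the index table. The `ObjectRecord` type and `aggregate` function are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ObjectRecord:
    """Hypothetical stand-in for one row of the index table."""
    key: str
    size: int  # Object size in bytes.
    storage_class: str


def aggregate(records):
    """Compute sum, count, and average of sizes, and a count per storage class."""
    total = sum(r.size for r in records)
    count = len(records)
    average = total / count if count else 0.0
    by_class = Counter(r.storage_class for r in records)
    return {"sum": total, "count": count, "average": average, "group_by_class": dict(by_class)}


records = [
    ObjectRecord("a/b/report1.txt", 100, "Standard"),
    ObjectRecord("a/b/report2.txt", 300, "Standard"),
    ObjectRecord("a/b/archive.log", 200, "IA"),
]
print(aggregate(records))
# {'sum': 600, 'count': 3, 'average': 200.0, 'group_by_class': {'Standard': 2, 'IA': 1}}
```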

Quick start

Step 1: Enable data indexing

OSS console

  1. Log on to the OSS console.

  2. In the left-side navigation pane, click Buckets. On the Buckets page, click the name of the destination bucket.

  3. In the left-side navigation pane, choose Object Management > Data Indexing.

  4. On the Data Indexing page, if you are using this feature for the first time, follow the on-screen instructions to grant permissions to the AliyunMetaQueryDefaultRole role. This allows the OSS service to manage data in your bucket. After you grant the permissions, click Enable data indexing.

  5. Select MetaSearch and click Enable.

OSS SDK

Only the OSS SDKs for Java, Python, and Go support the MetaSearch feature for querying objects that meet specified conditions.

import com.aliyun.oss.*;
import com.aliyun.oss.common.auth.CredentialsProviderFactory;
import com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider;
import com.aliyun.oss.common.comm.SignVersion;
public class Demo {
    // Use your actual endpoint. This example uses the endpoint for the China (Guangzhou) region.
    private static String endpoint = "https://oss-cn-guangzhou.aliyuncs.com";
    // Specify your bucket name, for example, "examplebucket".
    private static String bucketName = "examplebucket";
    public static void main(String[] args) throws com.aliyuncs.exceptions.ClientException {
        // Obtain access credentials from environment variables. Before running the code, ensure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are set.
        EnvironmentVariableCredentialsProvider credentialsProvider = CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
        // Specify the region where the bucket is located. For example, if the bucket is in the China (Guangzhou) region, set the region to cn-guangzhou.
        String region = "cn-guangzhou";
        // Create an OSSClient instance.
        // When the OSSClient instance is no longer needed, call the shutdown method to release its resources.
        ClientBuilderConfiguration clientBuilderConfiguration = new ClientBuilderConfiguration();
        clientBuilderConfiguration.setSignatureVersion(SignVersion.V4);
        OSS ossClient = OSSClientBuilder.create()
                .endpoint(endpoint)
                .credentialsProvider(credentialsProvider)
                .clientConfiguration(clientBuilderConfiguration)
                .region(region)
                .build();
        try {
            // Enable the data indexing feature.
            ossClient.openMetaQuery(bucketName);
        } catch (OSSException oe) {
            System.out.println("Error Message:" + oe.getErrorMessage());
            System.out.println("Error Code:" + oe.getErrorCode());
            System.out.println("Request ID:" + oe.getRequestId());
            System.out.println("Host ID:" + oe.getHostId());
        } catch (ClientException ce) {
            System.out.println("Error Message: " + ce.getMessage());
        } finally {
            // Shut down the OSSClient instance.
            ossClient.shutdown();
        }
    }
}
# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
# Obtain access credentials from environment variables. Ensure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are set before running the code.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Specify the endpoint for the region where your bucket is located. For example, if the bucket is in the China (Guangzhou) region, set the endpoint to https://oss-cn-guangzhou.aliyuncs.com.
endpoint = "https://oss-cn-guangzhou.aliyuncs.com"
# Specify the region that corresponds to the endpoint. Example: cn-guangzhou. This parameter is required if you use Signature Version 4.
region = "cn-guangzhou"
# Specify your bucket name, for example, "examplebucket".
bucket = oss2.Bucket(auth, endpoint, "examplebucket", region=region)
# Enable the data indexing feature.
bucket.open_bucket_meta_query()
package main
import (
	"context"
	"flag"
	"log"

	"github.com/aliyun/alibabacloud-oss-go-sdk-v2/oss"
	"github.com/aliyun/alibabacloud-oss-go-sdk-v2/oss/credentials"
)
var (
	region     string // Stores the region from the command-line flag.
	bucketName string // Stores the bucket name from the command-line flag.
)
// The init function is executed before main to initialize the program.
func init() {
	// Add a command-line flag for the region.
	flag.StringVar(&region, "region", "", "The region in which the bucket is located.")
	// Add a command-line flag for the bucket name.
	flag.StringVar(&bucketName, "bucket", "", "The name of the bucket.")
}
func main() {
	flag.Parse() // Parse the command-line flags.
	// Check if the bucket name is provided. If not, print flags and exit.
	if len(bucketName) == 0 {
		flag.PrintDefaults()
		log.Fatalf("invalid parameters, bucket name required")
	}
	// Check if the region is provided. If not, print flags and exit.
	if len(region) == 0 {
		flag.PrintDefaults()
		log.Fatalf("invalid parameters, region required")
	}
	// Create a client configuration that uses an environment variable credentials provider.
	cfg := oss.LoadDefaultConfig().
		WithCredentialsProvider(credentials.NewEnvironmentVariableCredentialsProvider()).
		WithRegion(region)
	client := oss.NewClient(cfg) // Create an OSS client from the configuration.
	// Build the request to enable data indexing for the bucket.
	request := &oss.OpenMetaQueryRequest{
		Bucket: oss.Ptr(bucketName), // Specify the target bucket.
	}
	result, err := client.OpenMetaQuery(context.TODO(), request) // Execute the request to enable data indexing.
	if err != nil {
		log.Fatalf("failed to open meta query %v", err)
	}
	log.Printf("open meta query result:%#v\n", result) // Print the result.
}
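The Java and Python samples set the endpoint and the region separately, and the two must refer to the same region. A minimal sketch, assuming the public endpoint naming pattern `oss-<region ID>.aliyuncs.com` (as in `https://oss-cn-guangzhou.aliyuncs.com` above), derives the endpoint from the region so that the two values cannot drift apart. The helper name is hypothetical, and internal or accelerated endpoints, which use different host names, are not covered.

```python
def public_endpoint(region: str) -> str:
    """Build the public OSS endpoint for a region ID such as 'cn-guangzhou'."""
    region = region.strip()
    if not region:
        raise ValueError("region is required")
    # Assumes the public endpoint naming pattern oss-<region>.aliyuncs.com.
    return f"https://oss-{region}.aliyuncs.com"


region = "cn-guangzhou"
endpoint = public_endpoint(region)
print(endpoint)  # https://oss-cn-guangzhou.aliyuncs.com
```

In the Python sample, the two values would then be passed together, as in `oss2.Bucket(auth, endpoint, "examplebucket", region=region)`.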

Step 2: Initiate a query and aggregation

OSS console

Set query conditions

  1. In the left-side navigation pane, choose Object Management > Data Indexing.

  2. For Storage Class, select Standard. For ACL, select private.

  3. Use Fuzzy Match for the object prefix and enter a/b.

Configure output results

  1. Sort the results in descending order by Last Modified At.

  2. Calculate the Sum and Average of the filtered object sizes.

  3. Use Group Count by Storage Class to count the number of objects.

After you set the query conditions and configure the output results, click Query Now.
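The console settings above correspond to the JSON that the SDK samples pass in the `Query` parameter of DoMetaQuery: each condition becomes a subquery, and the subqueries are joined with the `and` operator. The sketch below builds that JSON string, assuming that Fuzzy Match maps to the `match` operation and that exact selections map to `eq`, as in the SDK samples. The `build_and_query` helper is hypothetical.

```python
import json


def build_and_query(conditions):
    """Join (field, operation, value) conditions with the 'and' operator."""
    sub_queries = [
        {"Field": field, "Value": value, "Operation": operation}
        for field, operation, value in conditions
    ]
    return json.dumps({"Operation": "and", "SubQueries": sub_queries})


query = build_and_query([
    ("Filename", "match", "a/b"),           # Fuzzy Match on the object name.
    ("OSSStorageClass", "eq", "Standard"),  # Storage Class: Standard.
    ("ObjectACL", "eq", "private"),         # ACL: private.
])
print(query)
```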

OSS SDK

import com.aliyun.oss.*;
import com.aliyun.oss.common.auth.CredentialsProviderFactory;
import com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider;
import com.aliyun.oss.common.comm.SignVersion;
import com.aliyun.oss.model.*;
import java.util.ArrayList;
import java.util.List;
public class Demo {
    // The endpoint of the China (Guangzhou) region is used as an example. Replace it with the actual endpoint.
    private static String endpoint = "https://oss-cn-guangzhou.aliyuncs.com";
    // Specify the bucket name. Example: examplebucket.
    private static String bucketName = "examplebucket";
    public static void main(String[] args) throws Exception {
        // Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
        EnvironmentVariableCredentialsProvider credentialsProvider = CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
        // Specify the region where the bucket is located. For example, if the bucket is in the China (Guangzhou) region, set the region to cn-guangzhou.
        String region = "cn-guangzhou";
        // Create an OSSClient instance.
        // When the OSSClient instance is no longer needed, call the shutdown method to release its resources.
        ClientBuilderConfiguration clientBuilderConfiguration = new ClientBuilderConfiguration();
        clientBuilderConfiguration.setSignatureVersion(SignVersion.V4);
        OSS ossClient = OSSClientBuilder.create()
                .endpoint(endpoint)
                .credentialsProvider(credentialsProvider)
                .clientConfiguration(clientBuilderConfiguration)
                .region(region)
                .build();
        try {
            // Set the maximum number of objects to return.
            int maxResults = 20;
            // Set the query conditions: filename matches "a/b", storage class is "Standard", and ACL is "private".
            // The query uses the "and" operator to connect subqueries.
            String query = "{\n" +
                    "  \"Operation\": \"and\",\n" +
                    "  \"SubQueries\": [\n" +
                    "    {\n" +
                    "      \"Field\": \"Filename\",\n" +
                    "      \"Value\": \"a/b\",\n" +
                    "      \"Operation\": \"match\"\n" +
                    "    },\n" +
                    "    {\n" +
                    "      \"Field\": \"OSSStorageClass\",\n" +
                    "      \"Value\": \"Standard\",\n" +
                    "      \"Operation\": \"eq\"\n" +
                    "    },\n" +
                    "    {\n" +
                    "      \"Field\": \"ObjectACL\",\n" +
                    "      \"Value\": \"private\",\n" +
                    "      \"Operation\": \"eq\"\n" +
                    "    }\n" +
                    "  ]\n" +
                    "}";
            String sort = "FileModifiedTime";// Sort by last modified time.
            // Create aggregations to calculate the sum, count, and average of object sizes.
            Aggregation aggregationRequest1 = new Aggregation();
            aggregationRequest1.setField("Size");// Aggregate by size.
            aggregationRequest1.setOperation("sum");// Calculate the sum.
            Aggregation aggregationRequest2 = new Aggregation();
            aggregationRequest2.setField("Size");// Aggregate by size.
            aggregationRequest2.setOperation("count");// Calculate the count.
            Aggregation aggregationRequest3 = new Aggregation();
            aggregationRequest3.setField("Size");// Aggregate by size.
            aggregationRequest3.setOperation("average");// Calculate the average.
            // Add the aggregation requests to a list.
            Aggregations aggregations = new Aggregations();
            List<Aggregation> aggregationList = new ArrayList<>();
            aggregationList.add(aggregationRequest1);// Add the sum aggregation.
            aggregationList.add(aggregationRequest2);// Add the count aggregation.
            aggregationList.add(aggregationRequest3);// Add the average aggregation.
            aggregations.setAggregation(aggregationList);// Set the aggregations for the request.
            // Create a DoMetaQueryRequest.
            DoMetaQueryRequest doMetaQueryRequest = new DoMetaQueryRequest(bucketName, maxResults, query, sort);
            // Add the aggregations to the request.
            doMetaQueryRequest.setAggregations(aggregations);
            // Set the sort order to descending.
            doMetaQueryRequest.setOrder(SortOrder.DESC);
            // Execute the meta query.
            DoMetaQueryResult doMetaQueryResult = ossClient.doMetaQuery(doMetaQueryRequest);
            // Process the query results.
            if (doMetaQueryResult.getFiles() != null) {
                // If files are returned, iterate and print their information.
                for (ObjectFile file : doMetaQueryResult.getFiles().getFile()) {
                    System.out.println("Filename: " + file.getFilename()); // Filename
                    System.out.println("ETag: " + file.getETag());// ETag
                    System.out.println("ObjectACL: " + file.getObjectACL()); // ACL
                    System.out.println("OssObjectType: " + file.getOssObjectType());// Object type
                    System.out.println("OssStorageClass: " + file.getOssStorageClass());// Storage class
                    System.out.println("TaggingCount: " + file.getOssTaggingCount()); // Tag count
                    if (file.getOssTagging() != null) {
                        // Print object tags.
                        for (Tagging tag : file.getOssTagging().getTagging()) {
                            System.out.println("Key: " + tag.getKey());
                            System.out.println("Value: " + tag.getValue());
                        }
                    }
                    if (file.getOssUserMeta() != null) {
                        // Print user metadata.
                        for (UserMeta meta : file.getOssUserMeta().getUserMeta()) {
                            System.out.println("Key: " + meta.getKey());
                            System.out.println("Value: " + meta.getValue());
                        }
                    }
                }
            }
            if (doMetaQueryResult.getAggregations() != null) {
                // If aggregations are returned, iterate and print the results.
                for (Aggregation aggre : doMetaQueryResult.getAggregations().getAggregation()) {
                    System.out.println("Field: " + aggre.getField());// Aggregation field
                    System.out.println("Operation: " + aggre.getOperation()); // Aggregation operation
                    System.out.println("Value: " + aggre.getValue());// Aggregation result value
                    if (aggre.getGroups() != null && aggre.getGroups().getGroup() != null) {
                        // Print the value and count of every group in a grouped aggregation.
                        for (int i = 0; i < aggre.getGroups().getGroup().size(); i++) {
                            System.out.println("Groups value: " + aggre.getGroups().getGroup().get(i).getValue());
                            System.out.println("Groups count: " + aggre.getGroups().getGroup().get(i).getCount());
                        }
                    }
                }
            }
            // Print the token for retrieving the next page of results, if any.
            System.out.println("NextToken: " + doMetaQueryResult.getNextToken());
        } catch (OSSException oe) {
            // Catch OSS exceptions.
            System.out.println("Error Message:" + oe.getErrorMessage());
            System.out.println("Error Code:" + oe.getErrorCode());
            System.out.println("Request ID:" + oe.getRequestId());
            System.out.println("Host ID:" + oe.getHostId());
        } catch (ClientException ce) {
            // Catch and print client exceptions.
            System.out.println("Error Message: " + ce.getMessage());
        } finally {
            // Shut down the OSSClient instance.
            ossClient.shutdown();
        }
    }
}
# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
from oss2.models import MetaQuery, AggregationsRequest
import json
# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())
# Specify the endpoint of the region where the bucket is located. For example, if the bucket is in the China (Guangzhou) region, set the endpoint to https://oss-cn-guangzhou.aliyuncs.com.
endpoint = "https://oss-cn-guangzhou.aliyuncs.com"
# Specify the region that corresponds to the endpoint. Example: cn-guangzhou. This parameter is required if you use Signature Version 4.
region = "cn-guangzhou"
# Specify the bucket name. Example: examplebucket.
bucket = oss2.Bucket(auth, endpoint, "examplebucket", region=region)
# Query conditions: filename matches "a/b", storage class is "Standard", and ACL is "private".
query = {
    "Operation": "and",
    "SubQueries": [
        {"Field": "Filename", "Value": "a/b", "Operation": "match"},
        {"Field": "OSSStorageClass", "Value": "Standard", "Operation": "eq"},
        {"Field": "ObjectACL", "Value": "private", "Operation": "eq"}
    ]
}
# Convert the query to a JSON string.
query_json = json.dumps(query)
# Create aggregations to calculate the sum, count, and average of object sizes.
aggregations = [
    AggregationsRequest(field="Size", operation="sum"),  # Calculate the sum of object sizes.
    AggregationsRequest(field="Size", operation="count"),  # Calculate the number of objects.
    AggregationsRequest(field="Size", operation="average")  # Calculate the average of object sizes.
]
# Create a MetaQuery request, specifying the query conditions, max results, sort field and order, and aggregations.
do_meta_query_request = MetaQuery(
    max_results=20,  # Return a maximum of 20 objects.
    query=query_json,  # Set the query conditions.
    sort="FileModifiedTime",  # Sort by the last modified time.
    order="desc",  # Sort in descending order.
    aggregations=aggregations  # Set the aggregation operations.
)
# Execute the meta query.
result = bucket.do_bucket_meta_query(do_meta_query_request)
# Print information for matching objects.
if result.files:
    for file in result.files:
        print(f"Filename: {file.file_name}")  # Print the filename.
        print(f"ETag: {file.etag}")  # Print the ETag.
        print(f"ObjectACL: {file.object_acl}")  # Print the access control list (ACL).
        print(f"OssObjectType: {file.oss_object_type}")  # Print the OSS object type.
        print(f"OssStorageClass: {file.oss_storage_class}")  # Print the storage class.
        print(f"TaggingCount: {file.oss_tagging_count}")  # Print the tag count.
        # Print all tags of the object.
        if file.oss_tagging:
            for tag in file.oss_tagging:
                print(f"Key: {tag.key}")  # Print the tag key.
                print(f"Value: {tag.value}")  # Print the tag value.
        # Print the user metadata of the object.
        if file.oss_user_meta:
            for meta in file.oss_user_meta:
                print(f"Key: {meta.key}")  # Print the user metadata key.
                print(f"Value: {meta.value}")  # Print the user metadata value.
# Print the aggregation results.
if result.aggregations:
    for aggre in result.aggregations:
        print(f"Field: {aggre.field}")  # Print the aggregation field.
        print(f"Operation: {aggre.operation}")  # Print the aggregation operation type (such as sum, count, average).
        print(f"Value: {aggre.value}")  # Print the aggregation result value.
package main
import (
	"context"
	"encoding/json"
	"flag"
	"fmt"
	"log"
	"github.com/aliyun/alibabacloud-oss-go-sdk-v2/oss"
	"github.com/aliyun/alibabacloud-oss-go-sdk-v2/oss/credentials"
)
var (
	region     string // Stores the region from the command-line flag.
	bucketName string // Stores the bucket name from the command-line flag.
)
// The init function is executed before main to initialize the program.
func init() {
	// Add a command-line flag for the region.
	flag.StringVar(&region, "region", "", "The region in which the bucket is located.")
	// Add a command-line flag for the bucket name.
	flag.StringVar(&bucketName, "bucket", "", "The name of the bucket.")
}
func main() {
	flag.Parse() // Parse the command-line flags.
	// Check if the bucket name is provided. If not, print flags and exit.
	if len(bucketName) == 0 {
		flag.PrintDefaults()
		log.Fatalf("invalid parameters, bucket name required")
	}
	// Check if the region is provided. If not, print flags and exit.
	if len(region) == 0 {
		flag.PrintDefaults()
		log.Fatalf("invalid parameters, region required")
	}
	// Create a client configuration that uses an environment variable credentials provider and the specified region.
	cfg := oss.LoadDefaultConfig().
		WithCredentialsProvider(credentials.NewEnvironmentVariableCredentialsProvider()).
		WithRegion(region)
	client := oss.NewClient(cfg) // Create an OSS client from the configuration.
	// Build query conditions: filename matches "a/b", storage class is "Standard", and ACL is "private".
	query := map[string]interface{}{
		"Operation": "and",
		"SubQueries": []map[string]interface{}{
			{"Field": "Filename", "Value": "a/b", "Operation": "match"},
			{"Field": "OSSStorageClass", "Value": "Standard", "Operation": "eq"},
			{"Field": "ObjectACL", "Value": "private", "Operation": "eq"},
		},
	}
	// Marshal the query to a JSON string.
	queryJSON, err := json.Marshal(query)
	if err != nil {
		log.Fatalf("failed to marshal query %v", err)
	}
	// Create aggregations to calculate the sum, count, and average of object sizes.
	aggregations := []oss.MetaQueryAggregation{
		{Field: oss.Ptr("Size"), Operation: oss.Ptr("sum")},     // Calculate the sum.
		{Field: oss.Ptr("Size"), Operation: oss.Ptr("count")},   // Calculate the count.
		{Field: oss.Ptr("Size"), Operation: oss.Ptr("average")}, // Calculate the average.
	}
	// Build the DoMetaQuery request.
	request := &oss.DoMetaQueryRequest{
		Bucket: oss.Ptr(bucketName), // Specify the bucket to query.
		MetaQuery: &oss.MetaQuery{
			MaxResults: oss.Ptr(int64(20)),              // Maximum results to return: 20.
			Query:      oss.Ptr(string(queryJSON)),      // Set the query conditions.
			Sort:       oss.Ptr("FileModifiedTime"),     // Sort by last modified time.
			Order:      oss.Ptr(oss.MetaQueryOrderDesc), // Sort in descending order.
			Aggregations: &oss.MetaQueryAggregations{
				Aggregations: aggregations}, // Set the aggregation operations.
		},
	}
	result, err := client.DoMetaQuery(context.TODO(), request) // Send the meta query request.
	if err != nil {
		log.Fatalf("failed to do meta query %v", err)
	}
	// Print the NextToken for pagination. It is nil when the response does not include one.
	if result.NextToken != nil {
		fmt.Printf("NextToken:%s\n", *result.NextToken)
	}
	// Iterate through the results and print file details.
	for _, file := range result.Files {
		fmt.Printf("File name: %s\n", *file.Filename)
		fmt.Printf("size: %d\n", file.Size)
		fmt.Printf("File Modified Time:%s\n", *file.FileModifiedTime)
		fmt.Printf("Oss Object Type:%s\n", *file.OSSObjectType)
		fmt.Printf("Oss Storage Class:%s\n", *file.OSSStorageClass)
		fmt.Printf("Object ACL:%s\n", *file.ObjectACL)
		fmt.Printf("ETag:%s\n", *file.ETag)
		fmt.Printf("Oss CRC64:%s\n", *file.OSSCRC64)
		if file.OSSTaggingCount != nil {
			fmt.Printf("Oss Tagging Count:%d\n", *file.OSSTaggingCount)
		}
		// Print the object's tag information.
		for _, tagging := range file.OSSTagging {
			fmt.Printf("Oss Tagging Key:%s\n", *tagging.Key)
			fmt.Printf("Oss Tagging Value:%s\n", *tagging.Value)
		}
		// Print the user-defined metadata.
		for _, userMeta := range file.OSSUserMeta {
			fmt.Printf("Oss User Meta Key:%s\n", *userMeta.Key)
			fmt.Printf("Oss User Meta Key Value:%s\n", *userMeta.Value)
		}
	}
	// Print the aggregation results.
	for _, aggregation := range result.Aggregations {
		fmt.Printf("Aggregation Field:%s\n", *aggregation.Field)
		fmt.Printf("Aggregation Operation:%s\n", *aggregation.Operation)
		// Guard against a missing value to avoid a nil pointer dereference.
		if aggregation.Value != nil {
			fmt.Printf("Aggregation Value:%f\n", *aggregation.Value)
		}
	}
}
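The samples return at most `MaxResults` objects per call (20 here) together with a `NextToken`. A minimal sketch of a pagination loop follows. `fetch_page` is a hypothetical callable that stands in for one DoMetaQuery call: it takes the previous token (None for the first page) and returns `(files, next_token)`. How the token is passed back in a real SDK request is an assumption to confirm in the SDK reference; in this sketch, a missing or empty token ends the loop.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical signature: previous NextToken in, (files on this page, next token) out.
FetchPage = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def iterate_files(fetch_page: FetchPage) -> Iterator[str]:
    """Yield files from every page until NextToken is empty."""
    token: Optional[str] = None
    while True:
        files, token = fetch_page(token)
        yield from files
        if not token:  # None or "" means this was the last page.
            break


# Fake pages used to exercise the loop without calling OSS.
pages = {None: (["a/b/1.txt", "a/b/2.txt"], "t1"), "t1": (["a/b/3.txt"], "")}
print(list(iterate_files(lambda token: pages[token])))
# ['a/b/1.txt', 'a/b/2.txt', 'a/b/3.txt']
```

Using a generator keeps memory flat when a query matches many objects, because each page is consumed before the next one is requested.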

Step 3: Verify the results

OSS console

The 100 matching Standard objects have a total size of 19.53 MB (20,480,000 bytes), and each object is 200 KB (204,800 bytes) on average.

The query results page contains two sections: Object Data Aggregation Results and File List. The aggregation results section shows the sum and average of object sizes, and a group count by storage class. The file list section displays detailed information for each object, such as its name (for example, a/b/report9.txt), size, object type, storage class, ACL (private), and last modified time.

OSS SDK

The 100 matching Standard objects have a total size of 19.53 MB (20,480,000 bytes), and each object is 200 KB (204,800 bytes) on average. The SDK prints the following aggregation results:

Field: Size
Operation: sum
Value: 2.048E7
Field: Size
Operation: count
Value: 100.0
Field: Size
Operation: average
Value: 204800.0
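The aggregation values above are consistent with each other and with the summary sentence: the average equals the sum divided by the count, and the byte values convert to 19.53 MB and 200 KB when 1 MB is taken as 1,048,576 bytes and 1 KB as 1,024 bytes (the binary units that the summary appears to use). The check below reproduces that arithmetic.

```python
size_sum = 2.048e7        # Value of the sum aggregation, in bytes.
size_count = 100.0        # Value of the count aggregation.
size_average = 204800.0   # Value of the average aggregation, in bytes.

# The average reported by OSS should equal sum / count.
assert size_sum / size_count == size_average

total_mb = size_sum / (1024 * 1024)  # Bytes to MB (binary).
average_kb = size_average / 1024     # Bytes to KB (binary).
print(f"Total: {total_mb:.2f} MB, average: {average_kb:.0f} KB")
# Total: 19.53 MB, average: 200 KB
```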

References

  • For advanced customization, you can make REST API requests directly. This requires manually calculating the request signature. For more information, see Signature Version 4 and DoMetaQuery.