Object Storage Service: Use OSS data indexing for large-scale data statistics

Last Updated: Mar 20, 2026

OSS data indexing lets you query and aggregate statistics across hundreds of millions of objects in a single API call — no repeated ListObjects pagination required.

In this tutorial, you will:

  1. Enable data indexing on a bucket using the console or an SDK.

  2. Run a query with filters (object name, storage class, and access control list (ACL)) and aggregation operations (total size, average size, and object count).

  3. Verify that the results match the expected output.

How it works

Enabling data indexing on a bucket causes OSS to build an index table from the bucket's object metadata, user metadata, and object tags. Once indexed, you call the DoMetaQuery operation with a JSON query expression and aggregation definitions. OSS evaluates the query against the index and returns matching objects along with aggregated statistics — without scanning the bucket object by object.
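As a minimal sketch of what a JSON query expression and its aggregation definitions look like, the snippet below builds the expression used later in this tutorial with the standard library only. The helper names build_query and build_aggregations are hypothetical and are not part of any OSS SDK; the aggregation dictionaries only illustrate the field/operation pairs, not an SDK request type.

```python
import json


def build_query(name_pattern, storage_class, acl):
    """Return the tutorial's query expression as a JSON string."""
    query = {
        "Operation": "and",
        "SubQueries": [
            {"Field": "Filename", "Value": name_pattern, "Operation": "match"},
            {"Field": "OSSStorageClass", "Value": storage_class, "Operation": "eq"},
            {"Field": "ObjectACL", "Value": acl, "Operation": "eq"},
        ],
    }
    return json.dumps(query)


def build_aggregations():
    """Return the field/operation pairs used by the SDK examples."""
    return [
        {"Field": "Size", "Operation": "sum"},
        {"Field": "Size", "Operation": "count"},
        {"Field": "Size", "Operation": "average"},
    ]


expression = build_query("a/b", "Standard", "private")
print(expression)
print(build_aggregations())
```

The resulting string is what the SDK examples pass as the query argument of DoMetaQuery.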

Performance comparison (Enterprise A: 200 million objects, 1.8 million directories in bucket mybucket in the China (Guangzhou) region):

Metric | ListObjects (traditional) | Data indexing
Time per day | 2 hours | 20 minutes
API calls per directory (>1,000 objects) | Multiple ListObjects calls | One DoMetaQuery call

Data indexing reduced statistics collection time by 83%.
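The 83% figure follows from the two durations in the comparison (2 hours versus 20 minutes). A short worked calculation, with a hypothetical helper name:

```python
def reduction_percent(before_minutes, after_minutes):
    """Percentage of time saved when a job drops from before to after."""
    return (before_minutes - after_minutes) / before_minutes * 100


# 2 hours = 120 minutes with ListObjects, 20 minutes with data indexing.
print(round(reduction_percent(120, 20)))  # 83
```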

Prerequisites

Before you begin, ensure that you have:

  • An OSS bucket with objects already uploaded

  • The OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables configured with credentials that have permission to call OpenMetaQuery and DoMetaQuery

  • (SDK only) OSS SDK for Java, Python, or Go installed — these are the only SDKs that support the data indexing (MetaSearch) feature
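Before running the SDK examples, it can help to confirm that both credential variables are actually set. The check below is a small sketch using only the standard library; missing_credentials is a hypothetical helper, not an SDK function, and it does not verify that the credentials are valid or have the required permissions.

```python
import os

REQUIRED_VARS = ("OSS_ACCESS_KEY_ID", "OSS_ACCESS_KEY_SECRET")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Credential variables are set.")
```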

Step 1: Enable data indexing

Use the OSS console

  1. Log on to the OSS console.

  2. In the left-side navigation pane, click Buckets. On the Buckets page, click the name of the bucket for which you want to enable data indexing.

  3. In the left-side navigation tree, choose Object Management > Data Indexing.

  4. On the Data Indexing page, click Enable Now.

  5. In the Data Indexing dialog box, select MetaSearch and click Enable.

Use OSS SDKs

All three examples call OpenMetaQuery (Java/Go) or open_bucket_meta_query (Python) to enable the feature. Credentials are read from environment variables.

Java

import com.aliyun.oss.*;
import com.aliyun.oss.common.auth.CredentialsProviderFactory;
import com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider;
import com.aliyun.oss.common.comm.SignVersion;

public class Demo {

    // Replace with your actual endpoint.
    private static String endpoint = "https://oss-cn-guangzhou.aliyuncs.com";
    // Replace with your bucket name.
    private static String bucketName = "examplebucket";

    public static void main(String[] args) throws com.aliyuncs.exceptions.ClientException {
        // Credentials are read from OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET.
        EnvironmentVariableCredentialsProvider credentialsProvider =
            CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
        String region = "cn-guangzhou";

        ClientBuilderConfiguration clientBuilderConfiguration = new ClientBuilderConfiguration();
        clientBuilderConfiguration.setSignatureVersion(SignVersion.V4);
        OSS ossClient = OSSClientBuilder.create()
                .endpoint(endpoint)
                .credentialsProvider(credentialsProvider)
                .clientConfiguration(clientBuilderConfiguration)
                .region(region)
                .build();

        try {
            ossClient.openMetaQuery(bucketName);
        } catch (OSSException oe) {
            System.out.println("Error Message:" + oe.getErrorMessage());
            System.out.println("Error Code:"    + oe.getErrorCode());
            System.out.println("Request ID:"    + oe.getRequestId());
            System.out.println("Host ID:"       + oe.getHostId());
        } catch (ClientException ce) {
            System.out.println("Error Message: " + ce.getMessage());
        } finally {
            ossClient.shutdown();
        }
    }
}

Python

# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider

# Credentials are read from OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())

endpoint = "https://oss-cn-guangzhou.aliyuncs.com"
region = "cn-guangzhou"  # Required for V4 signature.

bucket = oss2.Bucket(auth, endpoint, "examplebucket", region=region)

bucket.open_bucket_meta_query()

Go

package main

import (
    "context"
    "flag"
    "log"

    "github.com/aliyun/alibabacloud-oss-go-sdk-v2/oss"
    "github.com/aliyun/alibabacloud-oss-go-sdk-v2/oss/credentials"
)

var (
    region     string
    bucketName string
)

func init() {
    flag.StringVar(&region, "region", "", "The region in which the bucket is located.")
    flag.StringVar(&bucketName, "bucket", "", "The name of the bucket.")
}

func main() {
    flag.Parse()

    if len(bucketName) == 0 {
        flag.PrintDefaults()
        log.Fatalf("invalid parameters, bucket name required")
    }
    if len(region) == 0 {
        flag.PrintDefaults()
        log.Fatalf("invalid parameters, region required")
    }

    // Credentials are read from OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET.
    cfg := oss.LoadDefaultConfig().
        WithCredentialsProvider(credentials.NewEnvironmentVariableCredentialsProvider()).
        WithRegion(region)

    client := oss.NewClient(cfg)

    request := &oss.OpenMetaQueryRequest{
        Bucket: oss.Ptr(bucketName),
    }
    result, err := client.OpenMetaQuery(context.TODO(), request)
    if err != nil {
        log.Fatalf("failed to open meta query %v", err)
    }

    log.Printf("open meta query result:%#v\n", result)
}

Step 2: Search for objects and collect statistics

The query in this tutorial finds objects that match all three conditions:

Field | Condition | Value
Filename | Fuzzy match | a/b
OSSStorageClass | Equals | Standard
ObjectACL | Equals | private

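Results are sorted by last modified time in descending order. Three aggregations are computed: total size (sum), average size (average), and object count. The console steps count objects with a group count on storage class, while the SDK examples use a count aggregation on the Size field.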
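To make the three conditions and the aggregations concrete, the sketch below applies them locally to a few hypothetical records. It only illustrates the semantics: in practice OSS evaluates the query against the index, and plain substring containment is used here as an assumed stand-in for the fuzzy match operator.

```python
# Hypothetical sample records; real metadata comes from the OSS index.
objects = [
    {"Filename": "a/b/1.txt", "OSSStorageClass": "Standard", "ObjectACL": "private", "Size": 100},
    {"Filename": "a/b/2.txt", "OSSStorageClass": "Standard", "ObjectACL": "private", "Size": 300},
    {"Filename": "a/b/3.txt", "OSSStorageClass": "IA", "ObjectACL": "private", "Size": 500},
    {"Filename": "c/d/4.txt", "OSSStorageClass": "Standard", "ObjectACL": "private", "Size": 700},
]


def matches(obj):
    """All three conditions must hold (top-level "and")."""
    # Assumption: substring containment approximates the fuzzy "match" operator.
    return (
        "a/b" in obj["Filename"]
        and obj["OSSStorageClass"] == "Standard"
        and obj["ObjectACL"] == "private"
    )


def aggregate(records):
    """Compute sum, average, and count over the Size field."""
    sizes = [record["Size"] for record in records]
    total = sum(sizes)
    count = len(sizes)
    average = total / count if count else 0.0
    return {"sum": total, "average": average, "count": count}


selected = [obj for obj in objects if matches(obj)]
stats = aggregate(selected)
print(stats)  # {'sum': 400, 'average': 200.0, 'count': 2}
```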

Use the OSS console

  1. In the left-side navigation tree, choose Object Management > Data Indexing.

  2. Set Storage Class to Standard and ACL to Private.

  3. For the Object Name parameter, select Fuzzy match from the drop-down list and enter a/b.

  4. Set Object Sort Order to Descending and select Last Modified Time from the Sorted By drop-down list.

  5. Under Data Aggregation, add the following items:

    • Output: Object Size, By: Sum — calculates the total size.

    • Output: Object Size, By: Average — calculates the average size.

    • Output: Storage Class, By: Group Count — counts objects per storage class.

  6. Click Query Now.

Use OSS SDKs

All three examples call DoMetaQuery with the same JSON query expression, sort field (FileModifiedTime), sort order (desc), and aggregation definitions.

Java

import com.aliyun.oss.*;
import com.aliyun.oss.common.auth.CredentialsProviderFactory;
import com.aliyun.oss.common.auth.EnvironmentVariableCredentialsProvider;
import com.aliyun.oss.common.comm.SignVersion;
import com.aliyun.oss.model.*;

import java.util.ArrayList;
import java.util.List;

public class Demo {

    private static String endpoint   = "https://oss-cn-guangzhou.aliyuncs.com";
    private static String bucketName = "examplebucket";

    public static void main(String[] args) throws Exception {

        EnvironmentVariableCredentialsProvider credentialsProvider =
            CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
        String region = "cn-guangzhou";

        ClientBuilderConfiguration clientBuilderConfiguration = new ClientBuilderConfiguration();
        clientBuilderConfiguration.setSignatureVersion(SignVersion.V4);
        OSS ossClient = OSSClientBuilder.create()
                .endpoint(endpoint)
                .credentialsProvider(credentialsProvider)
                .clientConfiguration(clientBuilderConfiguration)
                .region(region)
                .build();

        try {
            int maxResults = 20;

            // Query: objects whose name fuzzy-matches a/b, storage class Standard, ACL private.
            String query = "{"
                + "\"Operation\":\"and\","
                + "\"SubQueries\":["
                + "  {\"Field\":\"Filename\",\"Value\":\"a/b\",\"Operation\":\"match\"},"
                + "  {\"Field\":\"OSSStorageClass\",\"Value\":\"Standard\",\"Operation\":\"eq\"},"
                + "  {\"Field\":\"ObjectACL\",\"Value\":\"private\",\"Operation\":\"eq\"}"
                + "]}";

            String sort = "FileModifiedTime";

            // Aggregations: total size, object count, and average size.
            Aggregation aggSum = new Aggregation();
            aggSum.setField("Size");
            aggSum.setOperation("sum");

            Aggregation aggCount = new Aggregation();
            aggCount.setField("Size");
            aggCount.setOperation("count");

            Aggregation aggAvg = new Aggregation();
            aggAvg.setField("Size");
            aggAvg.setOperation("average");

            Aggregations aggregations = new Aggregations();
            List<Aggregation> aggregationList = new ArrayList<>();
            aggregationList.add(aggSum);
            aggregationList.add(aggCount);
            aggregationList.add(aggAvg);
            aggregations.setAggregation(aggregationList);

            DoMetaQueryRequest doMetaQueryRequest =
                new DoMetaQueryRequest(bucketName, maxResults, query, sort);
            doMetaQueryRequest.setAggregations(aggregations);
            doMetaQueryRequest.setOrder(SortOrder.DESC);

            DoMetaQueryResult doMetaQueryResult = ossClient.doMetaQuery(doMetaQueryRequest);

            // Print matching objects, if any.
            if (doMetaQueryResult.getFiles() != null) {
                for (ObjectFile file : doMetaQueryResult.getFiles().getFile()) {
                    System.out.println("Filename: "      + file.getFilename());
                    System.out.println("ETag: "          + file.getETag());
                    System.out.println("ObjectACL: "     + file.getObjectACL());
                    System.out.println("OssObjectType: " + file.getOssObjectType());
                    System.out.println("OssStorageClass: " + file.getOssStorageClass());
                    System.out.println("TaggingCount: "  + file.getOssTaggingCount());
                    if (file.getOssTagging() != null) {
                        for (Tagging tag : file.getOssTagging().getTagging()) {
                            System.out.println("Key: "   + tag.getKey());
                            System.out.println("Value: " + tag.getValue());
                        }
                    }
                    if (file.getOssUserMeta() != null) {
                        for (UserMeta meta : file.getOssUserMeta().getUserMeta()) {
                            System.out.println("Key: "   + meta.getKey());
                            System.out.println("Value: " + meta.getValue());
                        }
                    }
                }
            }

            // Print aggregation results.
            if (doMetaQueryResult.getAggregations() != null) {
                for (Aggregation aggre : doMetaQueryResult.getAggregations().getAggregation()) {
                    System.out.println("Field: "     + aggre.getField());
                    System.out.println("Operation: " + aggre.getOperation());
                    System.out.println("Value: "     + aggre.getValue());
                    if (aggre.getGroups() != null && aggre.getGroups().getGroup().size() > 0) {
                        System.out.println("Groups value: " + aggre.getGroups().getGroup().get(0).getValue());
                        System.out.println("Groups count: " + aggre.getGroups().getGroup().get(0).getCount());
                    }
                }
            }

            // NextToken is printed regardless of whether aggregations were returned.
            System.out.println("NextToken: " + doMetaQueryResult.getNextToken());

        } catch (OSSException oe) {
            System.out.println("Error Message:" + oe.getErrorMessage());
            System.out.println("Error Code:"    + oe.getErrorCode());
            System.out.println("Request ID:"    + oe.getRequestId());
            System.out.println("Host ID:"       + oe.getHostId());
        } catch (ClientException ce) {
            System.out.println("Error Message: " + ce.getMessage());
        } finally {
            ossClient.shutdown();
        }
    }
}

Python

# -*- coding: utf-8 -*-
import oss2
import json
from oss2.credentials import EnvironmentVariableCredentialsProvider
from oss2.models import MetaQuery, AggregationsRequest

# Credentials are read from OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET.
auth = oss2.ProviderAuthV4(EnvironmentVariableCredentialsProvider())

endpoint = "https://oss-cn-guangzhou.aliyuncs.com"
region = "cn-guangzhou"  # Required for V4 signature.

bucket = oss2.Bucket(auth, endpoint, "examplebucket", region=region)

# Query: objects whose name fuzzy-matches a/b, storage class Standard, ACL private.
query = {
    "Operation": "and",
    "SubQueries": [
        {"Field": "Filename",       "Value": "a/b",      "Operation": "match"},
        {"Field": "OSSStorageClass","Value": "Standard", "Operation": "eq"},
        {"Field": "ObjectACL",      "Value": "private",  "Operation": "eq"},
    ]
}
query_json = json.dumps(query)

# Aggregations: total size, object count, and average size.
aggregations = [
    AggregationsRequest(field="Size", operation="sum"),
    AggregationsRequest(field="Size", operation="count"),
    AggregationsRequest(field="Size", operation="average"),
]

do_meta_query_request = MetaQuery(
    max_results=20,
    query=query_json,
    sort="FileModifiedTime",
    order="desc",
    aggregations=aggregations,
)

result = bucket.do_bucket_meta_query(do_meta_query_request)

# Print matching objects, if any.
if result.files:
    for file in result.files:
        print(f"Filename: {file.file_name}")
        print(f"ETag: {file.etag}")
        print(f"ObjectACL: {file.object_acl}")
        print(f"OssObjectType: {file.oss_object_type}")
        print(f"TaggingCount: {file.oss_tagging_count}")
        if file.oss_tagging:
            for tag in file.oss_tagging:
                print(f"Key: {tag.key}")
                print(f"Value: {tag.value}")
        if file.oss_user_meta:
            for meta in file.oss_user_meta:
                print(f"Key: {meta.key}")
                print(f"Value: {meta.value}")

# Print aggregation results.
if result.aggregations:
    for aggre in result.aggregations:
        print(f"Field: {aggre.field}")
        print(f"Operation: {aggre.operation}")
        print(f"Value: {aggre.value}")

Go

package main

import (
    "context"
    "encoding/json"
    "flag"
    "fmt"
    "log"

    "github.com/aliyun/alibabacloud-oss-go-sdk-v2/oss"
    "github.com/aliyun/alibabacloud-oss-go-sdk-v2/oss/credentials"
)

var (
    region     string
    bucketName string
)

func init() {
    flag.StringVar(&region, "region", "", "The region in which the bucket is located.")
    flag.StringVar(&bucketName, "bucket", "", "The name of the bucket.")
}

func main() {
    flag.Parse()

    if len(bucketName) == 0 {
        flag.PrintDefaults()
        log.Fatalf("invalid parameters, bucket name required")
    }
    if len(region) == 0 {
        flag.PrintDefaults()
        log.Fatalf("invalid parameters, region required")
    }

    // Credentials are read from OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET.
    cfg := oss.LoadDefaultConfig().
        WithCredentialsProvider(credentials.NewEnvironmentVariableCredentialsProvider()).
        WithRegion(region)

    client := oss.NewClient(cfg)

    // Query: objects whose name fuzzy-matches a/b, storage class Standard, ACL private.
    query := map[string]interface{}{
        "Operation": "and",
        "SubQueries": []map[string]interface{}{
            {"Field": "Filename",        "Value": "a/b",      "Operation": "match"},
            {"Field": "OSSStorageClass", "Value": "Standard", "Operation": "eq"},
            {"Field": "ObjectACL",       "Value": "private",  "Operation": "eq"},
        },
    }
    queryJSON, err := json.Marshal(query)
    if err != nil {
        log.Fatalf("failed to marshal query %v", err)
    }

    // Aggregations: total size, object count, and average size.
    aggregations := []oss.MetaQueryAggregation{
        {Field: oss.Ptr("Size"), Operation: oss.Ptr("sum")},
        {Field: oss.Ptr("Size"), Operation: oss.Ptr("count")},
        {Field: oss.Ptr("Size"), Operation: oss.Ptr("average")},
    }

    request := &oss.DoMetaQueryRequest{
        Bucket: oss.Ptr(bucketName),
        MetaQuery: &oss.MetaQuery{
            MaxResults: oss.Ptr(int64(20)),
            Query:      oss.Ptr(string(queryJSON)),
            Sort:       oss.Ptr("FileModifiedTime"),
            Order:      oss.MetaQueryOrderDesc,
            Aggregations: &oss.MetaQueryAggregations{
                Aggregations: aggregations,
            },
        },
    }

    result, err := client.DoMetaQuery(context.TODO(), request)
    if err != nil {
        log.Fatalf("failed to do meta query %v", err)
    }

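    // oss.ToString returns "" for a nil pointer, so a missing token does not cause a panic.
    fmt.Printf("NextToken: %s\n", oss.ToString(result.NextToken))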

    // Print matching objects.
    for _, file := range result.Files {
        fmt.Printf("File name: %s\n",         *file.Filename)
        fmt.Printf("Size: %d\n",               file.Size)
        fmt.Printf("File modified time: %s\n", *file.FileModifiedTime)
        fmt.Printf("Oss object type: %s\n",    *file.OSSObjectType)
        fmt.Printf("Oss storage class: %s\n",  *file.OSSStorageClass)
        fmt.Printf("Object ACL: %s\n",         *file.ObjectACL)
        fmt.Printf("ETag: %s\n",               *file.ETag)
        fmt.Printf("Oss CRC64: %s\n",          *file.OSSCRC64)
        if file.OSSTaggingCount != nil {
            fmt.Printf("Oss tagging count: %d\n", *file.OSSTaggingCount)
        }
        for _, tagging := range file.OSSTagging {
            fmt.Printf("Oss tagging key: %s\n",   *tagging.Key)
            fmt.Printf("Oss tagging value: %s\n", *tagging.Value)
        }
        for _, userMeta := range file.OSSUserMeta {
            fmt.Printf("Oss user meta key: %s\n",   *userMeta.Key)
            fmt.Printf("Oss user meta value: %s\n", *userMeta.Value)
        }
    }

    // Print aggregation results.
    for _, aggregation := range result.Aggregations {
        fmt.Printf("Aggregation field: %s\n",     *aggregation.Field)
        fmt.Printf("Aggregation operation: %s\n", *aggregation.Operation)
        fmt.Printf("Aggregation value: %f\n",     *aggregation.Value)
    }
}
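The SDK examples above request at most 20 results per call, so a query that matches more objects returns a NextToken for the following page. The loop below is a sketch of how paging could be driven. run_query is a hypothetical callable that performs one DoMetaQuery call with the given token and returns (files, next_token); fake_run_query is a stand-in used only to show the loop working without a network call.

```python
def collect_all_files(run_query):
    """Call run_query(next_token) until no further token is returned."""
    files = []
    next_token = ""
    while True:
        page, next_token = run_query(next_token)
        files.extend(page)
        if not next_token:
            return files


def fake_run_query(next_token):
    """Stand-in for a real DoMetaQuery call: two pages of object names."""
    pages = {
        "": (["a/b/1", "a/b/2"], "t1"),
        "t1": (["a/b/3"], ""),
    }
    return pages[next_token]


print(collect_all_files(fake_run_query))  # ['a/b/1', 'a/b/2', 'a/b/3']
```

With a real client, run_query would wrap the SDK call and pass the token through the request's next-token parameter.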

Verify the results

The query returns 100 Standard objects matching the conditions. The aggregation output shows:

Aggregation | Value | Output field
Total size (sum) | 19.53 MB | Field: Size / Operation: sum / Value: <bytes>
Average size per object | ~200 KB | Field: Size / Operation: average / Value: <bytes>
Object count | 100 | SDK (count): Field: Size / Operation: count / Value: 100. Console (group count): Groups count: 100

Console: The Query Results page displays these statistics directly.

SDK: In the aggregation output block, match each result line to the table above. The sum entry gives the total size in bytes, the average entry gives the mean size in bytes, and the count entry gives the number of matching objects.
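Because the SDK reports sizes in bytes while the table uses MB and KB, a small conversion helps when comparing the two. In the sketch below the byte values are hypothetical, chosen to be consistent with the table, and the conversion assumes binary units (1 MB = 1024 × 1024 bytes).

```python
def to_mb(num_bytes):
    """Convert bytes to MB, assuming 1 MB = 1024 * 1024 bytes."""
    return num_bytes / (1024 * 1024)


def to_kb(num_bytes):
    """Convert bytes to KB, assuming 1 KB = 1024 bytes."""
    return num_bytes / 1024


# Hypothetical aggregation values as returned in bytes.
total_bytes = 20_480_000
object_count = 100
average_bytes = total_bytes / object_count

print(f"Total size: {to_mb(total_bytes):.2f} MB")      # Total size: 19.53 MB
print(f"Average size: {to_kb(average_bytes):.0f} KB")  # Average size: 200 KB
```

A quick consistency check is that the sum divided by the count equals the reported average.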

Query results showing total size, average size, and object count

Queryable fields reference

Use the following fields to build custom queries with DoMetaQuery. Combine multiple fields using "Operation": "and" or "Operation": "or" at the top level.

Field | Type | Supported operators | Example value
Filename | String | match (fuzzy), eq (exact) | a/b
OSSStorageClass | String | eq | Standard, IA, Archive
ObjectACL | String | eq | private, public-read
Size | Integer | eq, gt, gte, lt, lte | 1048576
FileModifiedTime | Timestamp | eq, gt, gte, lt, lte | 2024-01-01T00:00:00Z

For the full list of supported fields, operators, and aggregation types, see DoMetaQuery.
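As an illustration of combining fields from the table, the sketch below builds a custom expression for objects larger than 1 MB that were modified on or after 2024-01-01. The helpers condition and combine are hypothetical, the operator sets are copied from the table above rather than from the full API reference, and sending every value as a string is an assumption.

```python
import json

# Operators per field, copied from the reference table above.
SUPPORTED = {
    "Filename": {"match", "eq"},
    "OSSStorageClass": {"eq"},
    "ObjectACL": {"eq"},
    "Size": {"eq", "gt", "gte", "lt", "lte"},
    "FileModifiedTime": {"eq", "gt", "gte", "lt", "lte"},
}


def condition(field, operation, value):
    """Build one sub-query, rejecting operators not listed for the field."""
    if operation not in SUPPORTED.get(field, set()):
        raise ValueError(f"{operation!r} is not listed for field {field!r}")
    # Assumption: values are sent as strings.
    return {"Field": field, "Value": str(value), "Operation": operation}


def combine(operation, *conditions):
    """Join sub-queries with a top-level "and" or "or"."""
    if operation not in ("and", "or"):
        raise ValueError("operation must be 'and' or 'or'")
    return {"Operation": operation, "SubQueries": list(conditions)}


query = combine(
    "and",
    condition("Size", "gt", 1048576),
    condition("FileModifiedTime", "gte", "2024-01-01T00:00:00Z"),
)
print(json.dumps(query))
```

Validating operators locally catches typos before the request is sent, though the service remains the authority on what it accepts.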

What's next

  • To build custom queries using additional queryable fields (such as Size, FileModifiedTime, or object tags), call the DoMetaQuery API directly.

  • If you need to sign requests manually, see (Recommended) Include a V4 signature.