Object Storage Service (OSS) provides the data indexing feature, which indexes the metadata of objects so that you can use the metadata as index conditions to query objects. Data indexing helps you understand and manage your data structures, and makes it easier to query, collect statistics on, and manage objects.
Scenarios
To meet data audit or data supervision requirements, you may need to find specific objects in an OSS bucket that stores hundreds of millions of objects. Each object carries a large amount of metadata, such as its name, ETag value, storage class, size, tags, and last modified time. Data indexing lets you combine simple query conditions with data aggregation methods based on your business requirements, which improves query performance.
Usage notes
Supported regions
The data indexing feature is supported only in the China (Hangzhou) and Australia (Sydney) regions.
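Because of this restriction, the endpoint that your SDK or API calls use must belong to one of these two regions. The following Python sketch is a hypothetical helper (not part of any OSS SDK) that maps the supported regions to their public endpoints and rejects every other region. The region IDs cn-hangzhou and ap-southeast-2 and the Sydney endpoint are assumptions to verify against the OSS regions and endpoints list.

```python
# Hypothetical helper, not part of any OSS SDK.
# Assumption: region IDs and public endpoints of the two regions that support data indexing.
SUPPORTED_ENDPOINTS = {
    "cn-hangzhou": "https://oss-cn-hangzhou.aliyuncs.com",        # China (Hangzhou)
    "ap-southeast-2": "https://oss-ap-southeast-2.aliyuncs.com",  # Australia (Sydney)
}


def data_indexing_endpoint(region_id: str) -> str:
    """Return the endpoint for region_id, or raise ValueError if data indexing is unsupported there."""
    try:
        return SUPPORTED_ENDPOINTS[region_id]
    except KeyError:
        raise ValueError(f"data indexing is not supported in region {region_id!r}") from None


print(data_indexing_endpoint("cn-hangzhou"))
```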
Billing
During the public preview, you are not charged for metadata management. For more information about billable items of the data indexing feature, see Data indexing fees.
Time required for indexing
When you enable metadata management, OSS creates an index for the objects in the bucket. The time required to create the index is proportional to the number of objects in the bucket: the more objects the bucket contains, the longer indexing takes.
Multipart upload
If a bucket contains objects that are uploaded by using multipart upload, the query results include only complete objects, which are objects whose parts have been combined by calling the CompleteMultipartUpload operation. Parts uploaded by multipart upload tasks that have been initiated but are neither completed nor canceled are not included in the query results.
Use the OSS console
Log on to the OSS console.
In the left-side navigation pane, click Buckets. On the Buckets page, find and click the desired bucket.
In the left-side navigation tree, choose Object Management > Data Indexing.
On the Data Indexing page, turn on Metadata Management.
The time required for metadata management to take effect varies based on the number of objects in the bucket.
Specify basic conditions to filter objects.
In the Basic Filtering Conditions section, specify the basic filtering conditions based on your business requirements. The following table describes the basic filtering conditions.
Filtering condition
Description
Storage Class
By default, the following OSS storage classes are selected: Standard, IA, Archive, and Cold Archive. You can select storage classes as filtering conditions based on your business requirements.
ACL (Access Control List)
By default, the following access control lists (ACLs) supported by OSS are selected: Inherited from Bucket, Private, Public Read, and Public Read/Write. You can select ACLs as filtering conditions based on your business requirements.
File Name
Valid values: Fuzzy Match and Equal To. For example, to return an object named exampleobject.txt in the query results, use one of the following methods:
Select Equal To and enter the full object name: exampleobject.txt.
Select Fuzzy Match and enter part of the object name, such as the prefix example or the suffix .txt.
Important: Fuzzy match returns all objects whose names contain the specified characters. For example, if you enter test, both localfolder/test/.example.jpg and localfolder/test.jpg match the condition and are displayed in the query results.
Upload Type
By default, the following upload types supported by OSS are selected. You can select upload types as filtering conditions based on your business requirements.
Normal: returns objects uploaded by using simple upload in the query results.
Multipart: returns objects uploaded by using multipart upload in the query results.
Appendable: returns objects uploaded by using append upload in the query results.
Symlink: returns symbolic links in the query results.
Last Modified At
You can specify Start Date and End Date for Last Modified At. The start and end times are accurate to the second.
Object Size
The following operators can be used to specify object size in KB: Equal To, Greater Than, Greater Than or Equal To, Less Than, and Less Than or Equal To.
Object Versions
You can query only the current versions of objects.
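In the API and SDKs, filtering conditions like these are passed as a JSON Query string; the SDK samples in this topic use {"Field": "Size","Value": "1048576","Operation": "lt"}. The following Python sketch converts an Object Size condition entered in KB into such a condition in bytes and combines several conditions with a logical AND. Only the Size field and the lt and gt operations appear in this topic's samples; the eq, gte, and lte operation names, the OSSStorageClass field name, and the and/SubQueries nesting are assumptions to check against the DoMetaQuery API reference.

```python
import json

# Assumed mapping from console operators to query operation names.
# "lt" and "gt" appear in the SDK samples; the others are assumptions.
SIZE_OPERATORS = {
    "Equal To": "eq",
    "Greater Than": "gt",
    "Greater Than or Equal To": "gte",
    "Less Than": "lt",
    "Less Than or Equal To": "lte",
}


def size_condition(operator: str, size_kb: int) -> dict:
    """Convert a console Object Size condition (in KB) into a query condition (in bytes)."""
    return {
        "Field": "Size",
        "Value": str(size_kb * 1024),  # the SDK samples express Size in bytes, as a string
        "Operation": SIZE_OPERATORS[operator],
    }


def build_query(*conditions: dict) -> str:
    """Return a single condition as-is, or join several with an (assumed) "and" operation."""
    if len(conditions) == 1:
        return json.dumps(conditions[0])
    return json.dumps({"Operation": "and", "SubQueries": list(conditions)})


# Assumption: field name and value format for a storage class condition.
storage_class = {"Field": "OSSStorageClass", "Value": "Standard", "Operation": "eq"}
print(build_query(size_condition("Less Than", 1024), storage_class))
```

With a single Less Than 1024 KB condition, build_query produces the same condition as the SDK samples, because 1024 KB equals 1,048,576 bytes.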
Optional: Specify other conditions to filter objects.
If you want to sort objects in the query results or use tags to filter objects, click Show more filtering conditions.
Specify the sorting order
In the Object Sort Order section, select Last Modified At, File Name, or Object Size as the sorting condition and select Ascending or Descending as the sorting order.
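The sort selection corresponds to the Sort and Order request parameters; the SDK samples in this topic pass Size with asc. The following Python sketch maps the console options to those values and applies the same ordering to hypothetical records locally. The FileModifiedTime and Filename field names are assumptions inferred from the result fields printed by the SDK samples.

```python
from operator import itemgetter

# Assumed mapping from console sort options to Sort values.
# "Size" appears in the SDK samples; the other two names are assumptions.
SORT_FIELDS = {
    "Last Modified At": "FileModifiedTime",
    "File Name": "Filename",
    "Object Size": "Size",
}


def sort_params(option: str, ascending: bool = True) -> dict:
    """Return the Sort and Order values for a console sort selection."""
    return {"Sort": SORT_FIELDS[option], "Order": "asc" if ascending else "desc"}


def sort_locally(records: list, params: dict) -> list:
    """Order already-fetched records (dicts keyed by field name) the same way."""
    return sorted(records, key=itemgetter(params["Sort"]), reverse=params["Order"] == "desc")


# Hypothetical records for illustration.
records = [
    {"Filename": "b.txt", "Size": 300},
    {"Filename": "a.txt", "Size": 100},
    {"Filename": "c.txt", "Size": 200},
]
print([r["Filename"] for r in sort_locally(records, sort_params("Object Size", ascending=False))])
```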
Specify tag-based filtering conditions
In the Tag-based Filtering Conditions section, specify ETags or object tags to filter the objects that are displayed in the query results.
ETags: Only exact match is supported. To specify multiple ETags, enter one ETag per line.
In the Object Tags field, specify the tags in the form of key-value pairs. The keys and values of object tags are case-sensitive. For more information about object tags, see Object tagging.
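The following Python sketch illustrates these matching rules on hypothetical in-memory records: ETags entered one per line are compared exactly, and tag keys and values are compared case-sensitively. It assumes that an object matches if its ETag equals any one of the entered ETags and it carries all of the specified tags. This topic does not state how OSS combines multiple ETags and tags, so treat that combination as an assumption.

```python
def parse_etags(text: str) -> set:
    """Split ETags entered one per line into a set, ignoring blank lines."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def matches(obj: dict, etags: set, tags: dict) -> bool:
    """Check the ETag and tag conditions against one object.

    Assumption: any listed ETag may match, and every specified tag must be present.
    Comparisons are exact, so "Alice" and "alice" are different tag values.
    """
    if etags and obj["etag"] not in etags:
        return False
    return all(obj["tags"].get(key) == value for key, value in tags.items())


# Hypothetical objects with placeholder ETags.
objects = [
    {"name": "a.jpg", "etag": "E1", "tags": {"owner": "alice"}},
    {"name": "b.jpg", "etag": "E2", "tags": {"owner": "Alice"}},
    {"name": "c.jpg", "etag": "E3", "tags": {"owner": "alice"}},
]
etags = parse_etags("E1\nE2\n")
print([o["name"] for o in objects if matches(o, etags, {"owner": "alice"})])
```

Only a.jpg qualifies: b.jpg fails the case-sensitive tag comparison, and the ETag of c.jpg is not in the list.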
Specify the methods that you want to use to aggregate object data
If you want to classify the query results and collect statistics on each category, you can specify data aggregation methods. For example, you can specify data aggregation methods to collect statistics on the sizes of all objects and obtain the number of distinct storage classes of objects in the query results.
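OSS computes aggregations on the server. To show what the example aggregations return, the following Python sketch computes them locally over hypothetical records: the total size (sum), the maximum size (max, the operation used in the Java sample below), and the number of distinct storage classes (distinct). Apart from max, the operation names are assumptions to check against the DoMetaQuery API reference.

```python
from collections import Counter

# Hypothetical query results; only the fields used below are included.
records = [
    {"Filename": "a.jpg", "Size": 100, "OSSStorageClass": "Standard"},
    {"Filename": "b.log", "Size": 250, "OSSStorageClass": "IA"},
    {"Filename": "c.jpg", "Size": 50, "OSSStorageClass": "Standard"},
]

# Local stand-ins for aggregation operations; names other than "max" are assumptions.
OPERATIONS = {
    "sum": sum,
    "max": max,
    "min": min,
    "count": len,
    "distinct": lambda values: len(set(values)),       # number of distinct values
    "group": lambda values: dict(Counter(values)),     # count per distinct value
}


def aggregate(rows, field, operation):
    """Apply one aggregation operation to the values of field across rows."""
    values = [row[field] for row in rows]
    return OPERATIONS[operation](values)


print("Total size:", aggregate(records, "Size", "sum"))
print("Largest object:", aggregate(records, "Size", "max"))
print("Distinct storage classes:", aggregate(records, "OSSStorageClass", "distinct"))
```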
Use OSS SDKs
Only OSS SDK for Java, OSS SDK for Python, and OSS SDK for Go support querying objects by using the data indexing feature. Before you use data indexing, you must enable metadata management for the bucket. For more sample code for data indexing, see Overview.
Java
import com.aliyun.oss.ClientException;
import com.aliyun.oss.OSS;
import com.aliyun.oss.common.auth.*;
import com.aliyun.oss.OSSClientBuilder;
import com.aliyun.oss.OSSException;
import com.aliyun.oss.model.*;
import java.util.ArrayList;
import java.util.List;

public class Demo {
    // In this example, the endpoint of the China (Hangzhou) region is used. Specify your actual endpoint.
    private static String endpoint = "https://oss-cn-hangzhou.aliyuncs.com";
    // Specify the name of the bucket. Example: examplebucket.
    private static String bucketName = "examplebucket";

    public static void main(String[] args) throws Exception {
        // We recommend that you do not save access credentials in the project code. Otherwise, the credentials may be leaked and the security of all resources in your account may be compromised. In this example, access credentials are obtained from environment variables. Configure the environment variables before you run the sample code.
        EnvironmentVariableCredentialsProvider credentialsProvider = CredentialsProviderFactory.newEnvironmentVariableCredentialsProvider();
        // Create an OSSClient instance.
        OSS ossClient = new OSSClientBuilder().build(endpoint, credentialsProvider);
        try {
            // Query objects that are smaller than 1,048,576 bytes (1 MB), return up to 20 objects at a time, and sort the objects by size in ascending order.
            int maxResults = 20;
            String query = "{\"Field\": \"Size\",\"Value\": \"1048576\",\"Operation\": \"lt\"}";
            String sort = "Size";
            DoMetaQueryRequest doMetaQueryRequest = new DoMetaQueryRequest(bucketName, maxResults, query, sort);
            doMetaQueryRequest.setOrder(SortOrder.ASC);

            // Aggregate the Size field of the matching objects. max returns the maximum value.
            AggregationRequest aggregationRequest = new AggregationRequest();
            aggregationRequest.setField("Size");
            aggregationRequest.setOperation("max");
            List<AggregationRequest> aggregations = new ArrayList<AggregationRequest>();
            aggregations.add(aggregationRequest);
            doMetaQueryRequest.setAggregations(aggregations);

            DoMetaQueryResult doMetaQueryResult = ossClient.doMetaQuery(doMetaQueryRequest);
            // Print information about each returned object.
            if (doMetaQueryResult.getFiles() != null) {
                for (ObjectFile file : doMetaQueryResult.getFiles()) {
                    System.out.println("Filename: " + file.getFilename());
                    // The ETag value that identifies the content of the object.
                    System.out.println("ETag: " + file.getETag());
                    // The access control list (ACL) of the object.
                    System.out.println("ObjectACL: " + file.getObjectACL());
                    // The type of the object.
                    System.out.println("OssObjectType: " + file.getOssObjectType());
                    // The storage class of the object.
                    System.out.println("OssStorageClass: " + file.getOssStorageClass());
                    // The number of tags attached to the object.
                    System.out.println("TaggingCount: " + file.getOssTaggingCount());
                    if (file.getOssTaggings() != null) {
                        for (Tag tag : file.getOssTaggings()) {
                            System.out.println("Key: " + tag.getKey());
                            System.out.println("Value: " + tag.getValue());
                        }
                    }
                    if (file.getOssUserMetas() != null) {
                        for (UserMeta meta : file.getOssUserMetas()) {
                            System.out.println("Key: " + meta.getKey());
                            System.out.println("Value: " + meta.getValue());
                        }
                    }
                }
            }
            // Print the aggregation results.
            if (doMetaQueryResult.getAggregations() != null) {
                for (Aggregation aggre : doMetaQueryResult.getAggregations()) {
                    // The name of the aggregation field.
                    System.out.println("Field: " + aggre.getField());
                    // The aggregation operator.
                    System.out.println("Operation: " + aggre.getOperation());
                    // The result of the aggregate operation.
                    System.out.println("Value: " + aggre.getValue());
                    if (aggre.getGroups() != null && aggre.getGroups().size() > 0) {
                        // The value and count of the first group returned by grouping and aggregation.
                        System.out.println("Groups value: " + aggre.getGroups().get(0).getValue());
                        System.out.println("Groups count: " + aggre.getGroups().get(0).getCount());
                    }
                }
            }
            // A non-empty NextToken indicates that more objects match the query than were returned.
            System.out.println("NextToken: " + doMetaQueryResult.getNextToken());
        } catch (OSSException oe) {
            System.out.println("Error Message:" + oe.getErrorMessage());
            System.out.println("Error Code:" + oe.getErrorCode());
            System.out.println("Request ID:" + oe.getRequestId());
            System.out.println("Host ID:" + oe.getHostId());
        } catch (ClientException ce) {
            System.out.println("Error Message: " + ce.getMessage());
        } finally {
            // Shut down the OSSClient instance.
            ossClient.shutdown();
        }
    }
}
Python
# -*- coding: utf-8 -*-
import oss2
from oss2.credentials import EnvironmentVariableCredentialsProvider
from oss2.models import MetaQuery

# Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
auth = oss2.ProviderAuth(EnvironmentVariableCredentialsProvider())
# In this example, the endpoint of the China (Hangzhou) region is used. Specify your actual endpoint.
# Specify the name of the bucket. Example: examplebucket.
bucket = oss2.Bucket(auth, 'https://oss-cn-hangzhou.aliyuncs.com', 'examplebucket')

# Query objects that are smaller than 1 MB (1,048,576 bytes), return up to 10 objects at a time, and sort the objects by size in ascending order.
do_meta_query_request = MetaQuery(max_results=10, query='{"Field": "Size","Value": "1048576","Operation": "lt"}', sort='Size', order='asc')
result = bucket.do_bucket_meta_query(do_meta_query_request)

# Display information about every returned object. Nothing is printed if no object matches the query.
for file_info in result.files:
    # The object name.
    print(file_info.file_name)
    # The ETag of the object.
    print(file_info.etag)
    # The type of the object.
    print(file_info.oss_object_type)
    # The storage class of the object.
    print(file_info.oss_storage_class)
    # The 64-bit CRC value of the object.
    print(file_info.oss_crc64)
    # The access control list (ACL) of the object.
    print(file_info.object_acl)
Go
package main

import (
	"fmt"
	"os"

	"github.com/aliyun/aliyun-oss-go-sdk/oss"
)

func main() {
	// Obtain access credentials from environment variables. Before you run the sample code, make sure that the OSS_ACCESS_KEY_ID and OSS_ACCESS_KEY_SECRET environment variables are configured.
	provider, err := oss.NewEnvironmentVariableCredentialsProvider()
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Create an OSSClient instance.
	// Specify the endpoint of the region in which the bucket is located. For example, if the bucket is located in the China (Hangzhou) region, set the endpoint to https://oss-cn-hangzhou.aliyuncs.com. Specify your actual endpoint.
	client, err := oss.New("yourEndpoint", "", "", oss.SetCredentialsProvider(&provider))
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// Query objects that are larger than 30 bytes, return up to 10 objects at a time, and sort the objects by size in ascending order.
	query := oss.MetaQuery{
		NextToken:  "",
		MaxResults: 10,
		Query:      `{"Field": "Size","Value": "30","Operation": "gt"}`,
		Sort:       "Size",
		Order:      "asc",
	}
	// Specify the name of the bucket. Example: examplebucket.
	result, err := client.DoMetaQuery("examplebucket", query)
	if err != nil {
		fmt.Println("Error:", err)
		os.Exit(-1)
	}
	// A non-empty NextToken indicates that more objects match the query than were returned.
	fmt.Printf("NextToken: %s\n", result.NextToken)
	for _, file := range result.Files {
		fmt.Printf("File name: %s\n", file.Filename)
		fmt.Printf("Size: %d\n", file.Size)
		fmt.Printf("File Modified Time: %s\n", file.FileModifiedTime)
		fmt.Printf("Oss Object Type: %s\n", file.OssObjectType)
		fmt.Printf("Oss Storage Class: %s\n", file.OssStorageClass)
		fmt.Printf("Object ACL: %s\n", file.ObjectACL)
		fmt.Printf("ETag: %s\n", file.ETag)
		fmt.Printf("Oss CRC64: %s\n", file.OssCRC64)
		fmt.Printf("Oss Tagging Count: %d\n", file.OssTaggingCount)
		for _, tagging := range file.OssTagging {
			fmt.Printf("Oss Tagging Key: %s\n", tagging.Key)
			fmt.Printf("Oss Tagging Value: %s\n", tagging.Value)
		}
		for _, userMeta := range file.OssUserMeta {
			fmt.Printf("Oss User Meta Key: %s\n", userMeta.Key)
			fmt.Printf("Oss User Meta Value: %s\n", userMeta.Value)
		}
	}
}
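The Java and Go samples print NextToken but fetch only one page of results. The following Python sketch shows a pagination loop that keeps requesting pages until NextToken is empty. To stay runnable without credentials, fetch_page is a hypothetical callable backed by stub data. With OSS SDK for Python, it might wrap bucket.do_bucket_meta_query(MetaQuery(next_token=token, ...)) and return result.files together with the result's next token; the next_token attribute name on the result is an assumption.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# A page is (records, next_token); an empty or None token means there are no more pages.
Page = Tuple[List[dict], Optional[str]]


def iter_all_results(fetch_page: Callable[[str], Page]) -> Iterator[dict]:
    """Yield every record by passing each NextToken to the following request."""
    token = ""
    while True:
        records, token = fetch_page(token)
        yield from records
        if not token:
            break


# Hypothetical stub standing in for a real DoMetaQuery call.
PAGES = {
    "": ([{"Filename": "a"}, {"Filename": "b"}], "t1"),
    "t1": ([{"Filename": "c"}], ""),
}


def fake_fetch(token: str) -> Page:
    return PAGES[token]


print([r["Filename"] for r in iter_all_results(fake_fetch)])
```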
Use RESTful APIs
If your business requires a high level of customization, you can call the RESTful API directly. In this case, you must calculate the request signature in your code. For more information, see DoMetaQuery.
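The following Python sketch uses only the standard library to show the pieces you would compute for a DoMetaQuery request: an XML request body, its Content-MD5 header, and an Authorization header built with the OSS V1 signature (HMAC-SHA1 over the verb, Content-MD5, Content-Type, Date, canonicalized OSS headers, and canonicalized resource). The XML element names, the canonical resource string, and the credentials are assumptions or placeholders. OSS also supports newer signature versions, so check DoMetaQuery and the signature documentation for the exact format. The sketch does not send the request.

```python
import base64
import hashlib
import hmac
import xml.etree.ElementTree as ET
from email.utils import formatdate


def build_meta_query_body(query: str, max_results: int = 10, sort: str = "Size",
                          order: str = "asc", next_token: str = "") -> bytes:
    """Serialize query parameters as an XML body (element names are assumptions)."""
    root = ET.Element("MetaQuery")
    if next_token:
        ET.SubElement(root, "NextToken").text = next_token
    ET.SubElement(root, "MaxResults").text = str(max_results)
    ET.SubElement(root, "Query").text = query
    ET.SubElement(root, "Sort").text = sort
    ET.SubElement(root, "Order").text = order
    return ET.tostring(root, encoding="utf-8")


def content_md5(body: bytes) -> str:
    """Base64-encoded MD5 digest of the request body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode()


def sign_v1(secret: str, verb: str, md5: str, content_type: str, date: str,
            canonical_resource: str, canonical_oss_headers: str = "") -> str:
    """Base64-encoded HMAC-SHA1 over the V1 string-to-sign."""
    string_to_sign = (f"{verb}\n{md5}\n{content_type}\n{date}\n"
                      f"{canonical_oss_headers}{canonical_resource}")
    digest = hmac.new(secret.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# Placeholder credentials; load real ones from environment variables instead.
ACCESS_KEY_ID = "yourAccessKeyId"
ACCESS_KEY_SECRET = "yourAccessKeySecret"
# Assumption: canonical resource for POST /?metaQuery&comp=query on examplebucket,
# with sub-resources sorted by name. Verify against the DoMetaQuery reference.
RESOURCE = "/examplebucket/?comp=query&metaQuery"

body = build_meta_query_body('{"Field": "Size","Value": "1048576","Operation": "lt"}')
md5 = content_md5(body)
date = formatdate(usegmt=True)  # HTTP date in GMT
signature = sign_v1(ACCESS_KEY_SECRET, "POST", md5, "application/xml", date, RESOURCE)
headers = {
    "Date": date,
    "Content-MD5": md5,
    "Content-Type": "application/xml",
    "Authorization": f"OSS {ACCESS_KEY_ID}:{signature}",
}
print(headers["Authorization"])
```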
FAQ
When hundreds of millions of objects are stored in a bucket, why does it take a long time to create data indexes?
OSS creates indexes at a rate of approximately 600 objects per second. You can estimate the time required to create indexes based on the number of objects in the bucket.
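A rough estimate follows directly from that rate: for example, 100,000,000 objects / 600 objects per second is about 166,667 seconds, or roughly 46 hours. The following Python sketch performs the same calculation. The 600 objects per second figure is the approximate rate stated above, so treat the result as an order-of-magnitude estimate.

```python
INDEXING_RATE = 600  # objects per second; approximate rate stated in this FAQ


def estimate_indexing_seconds(object_count: int, rate: float = INDEXING_RATE) -> float:
    """Approximate time, in seconds, to create indexes for object_count objects."""
    return object_count / rate


for count in (1_000_000, 100_000_000, 500_000_000):
    hours = estimate_indexing_seconds(count) / 3600
    print(f"{count:>11,} objects: about {hours:,.1f} hours")
```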