The metadata and data discovery module for Object Storage Service (OSS) gives you intelligent control over large volumes of objects—so you can find, categorize, and analyze them without building your own retrieval system. The module covers four capabilities: file metadata management, multi-dimensional data indexing, storage inventory export, and file query.
Use cases
Static website performance optimization
When hosting static websites on OSS, slow loading or broken content-type detection often traces back to missing or misconfigured metadata. The Manage file metadata feature lets you set precise Cache-Control policies, assign the correct Content-Type to each file, and configure Content-Disposition to control how browsers handle downloads. Correct metadata configuration reduces unnecessary traffic and CDN back-to-origin costs.
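As a rough illustration of the metadata described above, the following sketch derives the three headers for a file before upload. The helper name build_static_headers, the one-day cache lifetime, and the rule that archives and PDFs download as attachments are assumptions made for this example, not OSS defaults.

```python
import mimetypes
from pathlib import PurePosixPath

# Hypothetical policy: file types that should download rather than render inline.
ATTACHMENT_SUFFIXES = {".zip", ".pdf"}


def build_static_headers(object_key: str, max_age_seconds: int = 86400) -> dict:
    """Return HTTP metadata headers for a static-site object (illustrative only)."""
    content_type, _ = mimetypes.guess_type(object_key)
    headers = {
        # Fall back to a generic binary type when the extension is unknown.
        "Content-Type": content_type or "application/octet-stream",
        "Cache-Control": f"public, max-age={max_age_seconds}",
    }
    path = PurePosixPath(object_key)
    if path.suffix.lower() in ATTACHMENT_SUFFIXES:
        # Ask the browser to save the file under its original name.
        headers["Content-Disposition"] = f'attachment; filename="{path.name}"'
    return headers


if __name__ == "__main__":
    print(build_static_headers("site/index.html"))
    print(build_static_headers("downloads/report.zip"))
```

The resulting dictionary could be passed as request headers when uploading or updating an object with the SDK of your choice.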
Intelligent management of multimedia content
OSS frequently stores thousands of image, video, and audio files where filename-based search falls short. The AISearch feature lets you search using natural language descriptions—for example, "spring cherry blossoms", "sunset at the beach", or "meeting recording"—and returns semantically relevant results without relying on exact filenames or tags.
Corporate data compliance audits
Finance, healthcare, and government organizations need to audit stored data regularly to meet regulatory requirements. The scalar retrieval feature lets you filter objects by metadata conditions—such as creation time, storage class, access permissions, or custom tags—and export the results to generate audit reports automatically. This replaces error-prone manual traversal.
Storage cost analysis and optimization
As object counts grow, understanding what consumes storage and where savings are possible becomes difficult. The bucket inventory feature generates scheduled reports listing object names, sizes, storage classes, and encryption statuses across your entire bucket. Use these reports to identify objects that haven't been accessed recently, then apply lifecycle rules to transition them to lower-cost storage classes or remove them entirely.
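To make the cost analysis concrete, here is a minimal sketch that totals object sizes per storage class and lists unencrypted objects from an inventory report. The column names and sample rows are hypothetical; a real report's layout depends on the fields you configure for the inventory.

```python
import csv
import io
from collections import defaultdict

# Hypothetical inventory excerpt; real column names depend on your inventory configuration.
SAMPLE_INVENTORY = """\
key,size,storage_class,is_encrypted
logs/2023/app.log,1048576,Standard,true
images/banner.png,204800,Standard,false
backups/db.tar,5368709120,IA,true
"""


def bytes_by_storage_class(report_text: str) -> dict:
    """Sum object sizes (in bytes) for each storage class in the report."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(report_text)):
        totals[row["storage_class"]] += int(row["size"])
    return dict(totals)


def unencrypted_keys(report_text: str) -> list:
    """Return the keys of objects that the report marks as not encrypted."""
    return [
        row["key"]
        for row in csv.DictReader(io.StringIO(report_text))
        if row["is_encrypted"].lower() != "true"
    ]


if __name__ == "__main__":
    print(bytes_by_storage_class(SAMPLE_INVENTORY))
    print(unencrypted_keys(SAMPLE_INVENTORY))
```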
Large-scale data analysis
If you store structured data files such as CSV or JSON logs in OSS, the Query files feature lets you run standard SQL queries directly against individual objects in the cloud—without downloading them for local processing. It supports WHERE filters and aggregate functions, and returns only the data that matches your criteria. This suits log analysis, data validation, and report generation.
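The kind of query this describes, a WHERE filter combined with aggregate functions, can be sketched locally. The example below does not call OSS: it loads a small, made-up CSV log into an in-memory SQLite table as a stand-in, so the table name, column names, and SQL dialect are assumptions and may differ from what the Query files feature accepts.

```python
import csv
import io
import sqlite3

# Made-up log excerpt used only for this illustration.
SAMPLE_LOG_CSV = """\
timestamp,status,latency_ms
2024-01-01T00:00:00Z,200,12
2024-01-01T00:00:01Z,500,230
2024-01-01T00:00:02Z,200,15
2024-01-01T00:00:03Z,500,310
"""

# Count server errors and average their latency.
ERROR_SUMMARY_SQL = "SELECT COUNT(*), AVG(latency_ms) FROM log WHERE status >= 500"


def run_local_query(csv_text: str, sql: str) -> list:
    """Load CSV rows into an in-memory SQLite table named 'log' and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (timestamp TEXT, status INTEGER, latency_ms INTEGER)")
    rows = [
        (r["timestamp"], int(r["status"]), int(r["latency_ms"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", rows)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(run_local_query(SAMPLE_LOG_CSV, ERROR_SUMMARY_SQL))
```

Only the summary row comes back rather than the full file, which is the benefit the feature offers when the filtering runs in the cloud.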
Key concepts
Object metadata types
Every object in OSS carries two kinds of metadata:
Standard HTTP properties: headers such as Content-Type, Cache-Control, and Content-Disposition, which control how clients access and handle the file.
User-defined metadata: custom key-value pairs whose keys must begin with the x-oss-meta- prefix, used to record the business properties and purpose of a file.
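The prefix rule for user-defined metadata can be sketched as a small helper that turns business properties into header names. The function name and the choice to lowercase keys are assumptions for this example; HTTP header names are case-insensitive, so lowercasing is only a normalization convenience.

```python
USER_META_PREFIX = "x-oss-meta-"


def to_user_metadata_headers(properties: dict) -> dict:
    """Prefix each property name with x-oss-meta- unless it already has the prefix."""
    headers = {}
    for name, value in properties.items():
        key = name.lower()
        if not key.startswith(USER_META_PREFIX):
            key = USER_META_PREFIX + key
        # Metadata values travel as header strings.
        headers[key] = str(value)
    return headers


if __name__ == "__main__":
    print(to_user_metadata_headers({"Department": "finance", "x-oss-meta-project": "audit"}))
```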
Data indexing mechanism
OSS data indexing automatically builds an index table from object metadata, enabling queries across massive volumes of objects in seconds. There are two retrieval modes:
Scalar retrieval — filters objects by metadata attributes such as creation time, storage class, access permissions, or custom tags. Best for structured queries, audits, and cost analysis. Supports larger object volumes and is more cost-effective.
AISearch — matches objects by content semantics using vector search. Best for multimedia similarity retrieval, semantic content discovery, and natural language queries.
Both modes can be active simultaneously. Choose based on the query type.
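The difference between the two modes can be illustrated with a toy in-memory model. This is a conceptual sketch, not the OSS implementation: the ObjectRecord type, the sample records, and the three-dimensional vectors standing in for content embeddings are all invented for the example.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class ObjectRecord:
    key: str
    storage_class: str
    created: date
    embedding: tuple  # toy 3-dimensional vector standing in for a content embedding


RECORDS = [
    ObjectRecord("photos/cherry.jpg", "Standard", date(2024, 4, 2), (0.9, 0.1, 0.0)),
    ObjectRecord("photos/beach.jpg", "IA", date(2023, 8, 15), (0.1, 0.9, 0.1)),
    ObjectRecord("audio/meeting.mp3", "IA", date(2022, 1, 10), (0.0, 0.1, 0.9)),
]


def scalar_filter(records, storage_class, created_before):
    """Scalar-style retrieval: exact conditions on metadata attributes."""
    return [
        r.key
        for r in records
        if r.storage_class == storage_class and r.created < created_before
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_rank(records, query_embedding, top_k=1):
    """Vector-style retrieval: rank objects by similarity to the query vector."""
    ranked = sorted(
        records,
        key=lambda r: cosine_similarity(r.embedding, query_embedding),
        reverse=True,
    )
    return [r.key for r in ranked[:top_k]]


if __name__ == "__main__":
    print(scalar_filter(RECORDS, "IA", date(2023, 1, 1)))
    print(semantic_rank(RECORDS, (0.8, 0.2, 0.0)))
```

The scalar filter either matches a record or does not, while the vector ranking always returns the closest candidates, which is why the two modes suit different query types.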
Bucket inventory
The bucket inventory feature generates scheduled reports for all objects in a bucket, including names, sizes, storage classes, and encryption statuses. For buckets with a massive number of objects, this is more efficient and cost-effective than traversing the objects one by one with the ListObjects operation.
FAQ
How do I choose between scalar retrieval and AISearch?
Base your choice on what you're searching for:
Use scalar retrieval when you know the metadata attributes to filter on—creation date ranges, specific storage classes, or custom tag values. It handles larger volumes and costs less.
Use AISearch when you want to find objects by meaning or visual similarity, without needing exact metadata values.
For workflows that mix structured auditing and content discovery, enable both and use each where it fits.
What advantages does data indexing have over using ListObjects directly?
The ListObjects API requires sequential traversal, which becomes slow and resource-intensive at scale. OSS data indexing maintains a prebuilt index table, so queries can return results in seconds even for buckets with massive numbers of objects. It also provides capabilities that ListObjects lacks (semantic search via AISearch and metadata-based filtering via scalar retrieval) without custom development or an external search service.
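The contrast between sequential traversal and a prebuilt index can be sketched with a toy model. Nothing here calls OSS: the object list, the page size, and the dictionary-based index are assumptions chosen to show the idea, and the real index table is certainly more sophisticated.

```python
from collections import defaultdict

# Invented object listing used only for this illustration.
OBJECTS = [
    {"key": "a.csv", "storage_class": "Standard"},
    {"key": "b.csv", "storage_class": "IA"},
    {"key": "c.csv", "storage_class": "IA"},
]


def list_pages(objects, page_size):
    """Yield objects page by page, mimicking a paginated listing call."""
    for start in range(0, len(objects), page_size):
        yield objects[start:start + page_size]


def find_by_scan(objects, storage_class, page_size=2):
    """Sequential approach: every page must be fetched and inspected."""
    matches = []
    for page in list_pages(objects, page_size):
        matches.extend(o["key"] for o in page if o["storage_class"] == storage_class)
    return matches


def build_index(objects):
    """Index approach: group keys by attribute once, then answer lookups directly."""
    index = defaultdict(list)
    for o in objects:
        index[o["storage_class"]].append(o["key"])
    return index


if __name__ == "__main__":
    print(find_by_scan(OBJECTS, "IA"))
    print(build_index(OBJECTS)["IA"])
```

Both approaches return the same keys, but the scan repeats the full traversal for every query, whereas the index pays the traversal cost once and serves later lookups directly.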
How do I ensure stability and security when using these features in production?
All features are validated in large-scale production environments and support high-concurrency access and large-scale data processing. To operate them safely in production:
Set access permissions to restrict who can query object metadata and read inventory reports.
Monitor index creation progress and query performance to catch anomalies early.
Back up important metadata configurations and inventory data regularly.
Use Resource Access Management (RAM) for access control and Virtual Private Cloud (VPC) network isolation to secure access to OSS.
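As one possible way to combine access control with network isolation, the sketch below builds a RAM-style policy document that allows reading inventory reports only from a given VPC. The bucket name, report prefix, and VPC ID are placeholders, and the action name and condition key reflect our understanding of the RAM policy format; confirm them against the RAM and OSS documentation before use.

```python
import json


def build_inventory_reader_policy(bucket: str, report_prefix: str, vpc_id: str) -> str:
    """Return a JSON policy allowing report reads from one VPC (illustrative only)."""
    policy = {
        "Version": "1",
        "Statement": [
            {
                "Effect": "Allow",
                # Read-only access; assumed sufficient for downloading report files.
                "Action": ["oss:GetObject"],
                # Limit the grant to objects under the report prefix.
                "Resource": [f"acs:oss:*:*:{bucket}/{report_prefix}*"],
                # Assumed condition key restricting requests to one VPC.
                "Condition": {"StringEquals": {"acs:SourceVpc": [vpc_id]}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    # All three arguments are placeholder values.
    print(build_inventory_reader_policy("example-bucket", "inventory-reports/", "vpc-example"))
```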
For feature-specific guidance, see the user guide for each capability.