E-MapReduce: Enable and configure OSS storage analysis

Last Updated: Aug 26, 2025

EMR Doctor supports data analysis for Object Storage Service (OSS). The OSS storage analysis feature shows the usage and health of your OSS resources so that you can better manage the data stored in OSS.

Background information

OSS provides an inventory feature. After you configure this feature, OSS periodically generates manifest files for a bucket. These files contain storage information about the objects in the bucket, such as their number and size. EMR Doctor uses the latest manifest file in your bucket to analyze data usage, health status, and the association with Hive storage.

To use the OSS storage analysis feature in EMR Doctor, you must first enable the inventory feature for your bucket. For more information about the inventory feature, see Bucket inventory.

Precautions

You are charged for using the OSS inventory feature. For more information, see Bucket inventory.

Enable the OSS inventory feature

Follow these steps in the OSS console to enable the inventory feature for the bucket that you want to analyze. If your cluster uses multiple OSS buckets and you want storage analysis for all of them, repeat the steps for each bucket.

  1. Log on to the OSS console.

  2. In the left-side navigation pane, click Buckets. On the Buckets page, find and click the desired bucket.

  3. In the left-side navigation pane, choose Data Management > Bucket Inventory.

  4. On the Bucket Inventory page, click Create Inventory.

  5. In the Set Inventory Report Rule panel, configure the parameters. For more information, see Bucket inventory.

    Important
    • Make sure that the Inventory Storage Bucket is the same bucket for which you are enabling the OSS inventory feature.

    • If you store more than 10 billion files in OSS, set Export Cycle Of Inventory Report to Weekly. Otherwise, set the export cycle to Daily.

    • Make sure that Inventory Content > Optional Fields includes Object Size and Storage Class.

  6. Select I Understand And Agree To Authorize Alibaba Cloud OSS To Access Bucket Resources, and then click OK.
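The three rules in the Important note of step 5 can be expressed as a small pre-flight check. The following is a minimal sketch, not part of EMR Doctor or the OSS SDK: the `InventoryRule` record, its field names, and the `check_rule` helper are hypothetical and only mirror the values that you choose in the console panel.

```python
from dataclasses import dataclass, field

# Threshold taken from the note in step 5: more than 10 billion files -> Weekly.
FILE_COUNT_THRESHOLD = 10_000_000_000
REQUIRED_FIELDS = {"Object Size", "Storage Class"}


@dataclass
class InventoryRule:
    """Hypothetical record of the values chosen in the console panel."""
    source_bucket: str
    storage_bucket: str
    export_cycle: str  # "Daily" or "Weekly"
    optional_fields: set = field(default_factory=set)


def recommended_cycle(file_count: int) -> str:
    """Return the export cycle that the note recommends for this file count."""
    return "Weekly" if file_count > FILE_COUNT_THRESHOLD else "Daily"


def check_rule(rule: InventoryRule, file_count: int) -> list:
    """Return a list of problems; an empty list means all three rules hold."""
    problems = []
    if rule.storage_bucket != rule.source_bucket:
        problems.append("Inventory Storage Bucket must be the bucket being inventoried")
    expected = recommended_cycle(file_count)
    if rule.export_cycle != expected:
        problems.append(f"export cycle should be {expected}")
    missing = REQUIRED_FIELDS - rule.optional_fields
    if missing:
        problems.append("missing optional fields: " + ", ".join(sorted(missing)))
    return problems
```

An empty list from `check_rule` means that the chosen values satisfy the note. The file count passed in is your own estimate, because the sketch does not query OSS.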

Configure OSS storage analysis

OSS storage analysis depends on the manifest files generated by the inventory feature. You need to configure the following parameters on the configuration page of the TAIHAODOCTOR service in the EMR console. For more information about the operations and other configurations, see EMR Doctor configuration guide.

Configuration item          Description

collect.oss.bucket          The name of the OSS bucket to be analyzed.

collect.oss.manifest.dir    The directory of the manifest file, in the format inventory_path/inventory_bucket/inventory_name. For the values, see the inventory list on the Bucket Inventory page that is described in the Enable the OSS inventory feature section.

  • inventory_path: The storage path for the inventory report that you configured in the previous step.

  • inventory_bucket: The inventory storage bucket. This is the name of the OSS bucket to be analyzed.

  • inventory_name: The inventory name that you configured in the previous step.

For example, if you set inventory_path to reports, inventory_bucket to my-bucket, and inventory_name to my-inventory, the directory of the manifest file (collect.oss.manifest.dir) is reports/my-bucket/my-inventory.
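The way the three components combine can be sketched in a few lines of Python. The `manifest_dir` function is a hypothetical helper for illustration, not an EMR Doctor API. Trimming leading and trailing slashes is an assumption made here to avoid doubled separators.

```python
def manifest_dir(inventory_path: str, inventory_bucket: str, inventory_name: str) -> str:
    """Join the three components into inventory_path/inventory_bucket/inventory_name."""
    parts = [inventory_path, inventory_bucket, inventory_name]
    # Assumption: surrounding slashes are not significant, so trim them.
    cleaned = [part.strip("/") for part in parts]
    if not all(cleaned):
        raise ValueError("all three components must be non-empty")
    return "/".join(cleaned)


print(manifest_dir("reports", "my-bucket", "my-inventory"))
# prints: reports/my-bucket/my-inventory
```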

Important

If your cluster uses multiple buckets and you have enabled the inventory feature for each bucket, list all bucket names in collect.oss.bucket and the corresponding manifest file directories in collect.oss.manifest.dir. Separate the items with commas (,). Make sure that the order of the bucket names matches the order of the manifest file directories.

Example for a single bucket

This example uses the my-bucket bucket. The final storage analysis configuration is as follows.

collect.oss.bucket:   my-bucket
collect.oss.manifest.dir:   reports/my-bucket/my-inventory

Example for multiple buckets

This example uses the my-bucket1 and my-bucket2 buckets. The final storage analysis configuration is as follows.

collect.oss.bucket:   my-bucket1,my-bucket2
collect.oss.manifest.dir:   reports1/my-bucket1/my-inventory1,reports2/my-bucket2/my-inventory2
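Because the two values are paired by position, a swapped or missing entry is easy to overlook. The following sketch builds both values from a list of pairs and checks an existing configuration. The `build_config` and `check_config` helpers are hypothetical and not part of EMR Doctor. The check that the bucket name appears as a path segment of its directory is a heuristic that follows from the inventory_path/inventory_bucket/inventory_name format.

```python
def build_config(pairs):
    """Build both values from (bucket, manifest_dir) tuples, preserving order."""
    return {
        "collect.oss.bucket": ",".join(bucket for bucket, _ in pairs),
        "collect.oss.manifest.dir": ",".join(path for _, path in pairs),
    }


def check_config(config):
    """Return problems found when pairing buckets with directories by position."""
    buckets = [b.strip() for b in config["collect.oss.bucket"].split(",")]
    dirs = [d.strip() for d in config["collect.oss.manifest.dir"].split(",")]
    if len(buckets) != len(dirs):
        return [f"{len(buckets)} bucket(s) but {len(dirs)} manifest directories"]
    problems = []
    for bucket, path in zip(buckets, dirs):
        # Heuristic: inventory_bucket should appear as one segment of the path.
        if bucket not in path.strip("/").split("/"):
            problems.append(f"{path} does not look like a directory for {bucket}")
    return problems


config = build_config([
    ("my-bucket1", "reports1/my-bucket1/my-inventory1"),
    ("my-bucket2", "reports2/my-bucket2/my-inventory2"),
])
for key, value in config.items():
    print(f"{key}:   {value}")
```

Building the values from pairs keeps each bucket next to its directory, so the order cannot drift when a bucket is added or removed.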