E-MapReduce: Enable and configure the storage analysis feature

Last Updated: Feb 27, 2025

EMR Doctor can analyze data stored in OSS. Enabling the storage analysis feature gives you deeper insight into the usage and health of your OSS storage resources, which supports more effective data governance.

Background information

OSS offers a bucket inventory feature that, when enabled, allows OSS to periodically generate inventory lists for a bucket. These lists contain details about the objects, including their number and size. EMR Doctor utilizes these inventory lists to assess the usage and health of data within the bucket, along with its relationship to Hive storage resources.

To use the storage analysis feature, you must first activate the bucket inventory feature for a bucket. For more information, see the referenced document.

Precautions

Enabling the bucket inventory feature may result in additional costs. For details, see the referenced document.

Enable the bucket inventory feature

To analyze the storage resources in an OSS bucket, follow these steps in the OSS console to enable the bucket inventory feature for that bucket. If your cluster uses multiple OSS buckets, repeat the steps for each bucket that you want to analyze.

  1. Log on to the OSS console.

  2. In the left-side navigation pane, click Buckets. On the Buckets page, find and click the desired bucket.

  3. In the left-side navigation pane, select Data Management > Bucket Inventory.

  4. On the Bucket Inventory page, click Create Inventory.

  5. In the Set Inventory Report Rule panel, configure the necessary parameters. For more information, see the referenced document.

    Important
    • Make sure that the Inventory Bucket is the same as the bucket for which you are enabling the inventory feature.

    • If the bucket stores a very large number of files (over 10 billion), consider setting the Inventory Report Export Cycle to weekly. For fewer files, a daily cycle may suffice.

    • Ensure that the Optional Information For Inventory Content includes both Object Size and Storage Class.

  6. Select I Acknowledge And Agree To Grant Alibaba Cloud OSS Service Permission To Access Bucket Resources, and then click OK.
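The three Important notes in step 5 can be checked before you submit the rule. The following is a minimal sketch, not part of any OSS SDK: the `InventoryRule` class, the `check_inventory_rule` function, and the field labels are hypothetical names chosen to mirror the console labels above.

```python
from dataclasses import dataclass, field

# Threshold from the note above: over 10 billion files suggests a weekly cycle.
LARGE_BUCKET_FILE_COUNT = 10_000_000_000

# Optional inventory fields that the notes above require (console labels).
REQUIRED_FIELDS = {"Object Size", "Storage Class"}


@dataclass
class InventoryRule:
    """Hypothetical, simplified view of an inventory rule (not an SDK type)."""
    source_bucket: str      # bucket for which the inventory is enabled
    inventory_bucket: str   # bucket that receives the inventory lists
    export_cycle: str       # "daily" or "weekly"
    optional_fields: set = field(default_factory=set)


def check_inventory_rule(rule, approx_file_count=None):
    """Return a list of problems; an empty list means the rule follows the notes."""
    problems = []
    if rule.inventory_bucket != rule.source_bucket:
        problems.append("Inventory Bucket must be the same as the analyzed bucket")
    missing = REQUIRED_FIELDS - set(rule.optional_fields)
    if missing:
        problems.append("missing optional fields: " + ", ".join(sorted(missing)))
    if (
        approx_file_count is not None
        and approx_file_count > LARGE_BUCKET_FILE_COUNT
        and rule.export_cycle != "weekly"
    ):
        problems.append("consider a weekly export cycle for this many files")
    return problems


if __name__ == "__main__":
    rule = InventoryRule(
        source_bucket="my-bucket",
        inventory_bucket="my-bucket",
        export_cycle="daily",
        optional_fields={"Object Size", "Storage Class"},
    )
    print(check_inventory_rule(rule))
```

The check only inspects values that you pass in; it does not read the rule from OSS.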

Configure the storage analysis feature

The storage analysis feature depends on the inventory lists created by the bucket inventory feature. Configure the necessary parameters on the configuration page of the TAIHAODOCTOR service in the EMR console. For detailed steps and additional configurations, see EMR Doctor configuration instructions.

Configuration item: collect.oss.bucket
    Description: The name of the OSS bucket to be analyzed.

Configuration item: collect.oss.manifest.dir
    Description: The directory in which the generated inventory lists are stored. The format is: inventory_path/inventory_bucket/inventory_name. For more information, see the Bucket Inventory list in Enable the bucket inventory feature.

  • inventory_path is the inventory report storage path you configured in the previous step.

  • inventory_bucket is the inventory bucket, which is the name of the OSS bucket to be analyzed.

  • inventory_name is the inventory name you configured in the previous step.

For example, suppose the inventory for your OSS bucket is configured with the inventory report storage path (inventory_path) reports, the OSS bucket to be analyzed (inventory_bucket) my-bucket, and the inventory name (inventory_name) my-inventory. The directory where the inventory lists are stored (collect.oss.manifest.dir) is then reports/my-bucket/my-inventory.
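The way the three parts combine can be sketched in a few lines of Python. The helper name `build_manifest_dir` is hypothetical, and trimming stray slashes from each part is an assumption of this sketch rather than documented EMR Doctor behavior.

```python
def build_manifest_dir(inventory_path, inventory_bucket, inventory_name):
    """Join the parts as inventory_path/inventory_bucket/inventory_name."""
    parts = [inventory_path, inventory_bucket, inventory_name]
    # Assumption: surrounding whitespace and slashes are not meaningful.
    cleaned = [part.strip().strip("/") for part in parts]
    if not all(cleaned):
        raise ValueError("inventory_path, inventory_bucket and inventory_name are all required")
    return "/".join(cleaned)


if __name__ == "__main__":
    print(build_manifest_dir("reports", "my-bucket", "my-inventory"))
    # prints: reports/my-bucket/my-inventory
```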

Important

If your cluster uses multiple buckets and you have enabled the inventory feature for each, list the bucket names in collect.oss.bucket and the corresponding inventory directories in collect.oss.manifest.dir, separated by commas. Make sure the order of the bucket names matches the order of the inventory directories.

Single bucket configuration example

For a bucket named my-bucket, the storage analysis configuration would be as follows.

collect.oss.bucket:   my-bucket
collect.oss.manifest.dir:   reports/my-bucket/my-inventory

Multiple bucket configuration example

For buckets named my-bucket1 and my-bucket2, the storage analysis configuration would be as follows.

collect.oss.bucket:   my-bucket1,my-bucket2
collect.oss.manifest.dir:   reports1/my-bucket1/my-inventory1,reports2/my-bucket2/my-inventory2
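Because the two values must list buckets and directories in the same order, it can help to generate both from a single list of pairs. The following is a minimal sketch; the function name `build_storage_analysis_config` is hypothetical, and the result still has to be entered on the TAIHAODOCTOR configuration page in the EMR console.

```python
def build_storage_analysis_config(inventories):
    """Build both configuration values from (bucket, manifest_dir) pairs.

    Keeping each bucket next to its directory in one pair guarantees
    that the two comma-separated lists stay in the same order.
    """
    if not inventories:
        raise ValueError("at least one (bucket, manifest_dir) pair is required")
    buckets = []
    manifest_dirs = []
    for bucket, manifest_dir in inventories:
        # A comma inside a value would corrupt the comma-separated list.
        if "," in bucket or "," in manifest_dir:
            raise ValueError("bucket names and directories must not contain commas")
        buckets.append(bucket)
        manifest_dirs.append(manifest_dir)
    return {
        "collect.oss.bucket": ",".join(buckets),
        "collect.oss.manifest.dir": ",".join(manifest_dirs),
    }


if __name__ == "__main__":
    config = build_storage_analysis_config([
        ("my-bucket1", "reports1/my-bucket1/my-inventory1"),
        ("my-bucket2", "reports2/my-bucket2/my-inventory2"),
    ])
    for key, value in config.items():
        print(f"{key}: {value}")
```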