EMR Doctor supports data analysis for Object Storage Service (OSS). The OSS storage analysis feature shows the usage and health of your OSS resources so that you can better manage the data stored in OSS.
Background information
OSS provides an inventory feature. After you configure this feature, OSS periodically generates manifest files for a bucket. These files contain storage information about the objects in the bucket, such as their number and size. EMR Doctor uses the latest manifest file in your bucket to analyze data usage, health status, and the association with Hive storage.
To use the OSS storage analysis feature in EMR Doctor, you must first enable the inventory feature for your bucket. For more information about the inventory feature, see Bucket inventory.
Precautions
You are charged for using the OSS inventory feature. For more information, see Bucket inventory.
Enable the OSS inventory feature
If your cluster uses multiple OSS buckets and you want to obtain storage analysis for all of them, follow these steps to enable the inventory feature for each bucket in the OSS console.
Log on to the OSS console.
In the left-side navigation pane, click Buckets. On the Buckets page, find and click the desired bucket.
In the left-side navigation pane, go to the Bucket Inventory page.
On the Bucket Inventory page, click Create Inventory.
In the Set Inventory Report Rule panel, configure the parameters. For more information, see Bucket inventory.
Important: Make sure that the Inventory Storage Bucket is the same bucket for which you are enabling the OSS inventory feature.
If you store more than 10 billion files in OSS, set Export Cycle Of Inventory Report to Weekly. Otherwise, set the export cycle to Daily.
Make sure that the fields exported in the inventory report include Object Size and Storage Class.
Select I Understand And Agree To Authorize Alibaba Cloud OSS To Access Bucket Resources, and then click OK.
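The export-cycle rule in the steps above can be expressed as a simple threshold check. The following is a minimal sketch, not part of EMR Doctor or OSS: the function name export_cycle and the constant WEEKLY_THRESHOLD are hypothetical, and the object count is assumed to come from your own source.

```python
# Hypothetical helper: picks the export cycle described in the steps above.
WEEKLY_THRESHOLD = 10_000_000_000  # 10 billion files


def export_cycle(object_count: int) -> str:
    """Return "Weekly" for more than 10 billion files, otherwise "Daily"."""
    if object_count < 0:
        raise ValueError("object_count must not be negative")
    return "Weekly" if object_count > WEEKLY_THRESHOLD else "Daily"


print(export_cycle(2_500_000))        # a small bucket
print(export_cycle(12_000_000_000))   # above the 10 billion threshold
```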
Configure OSS storage analysis
OSS storage analysis depends on the manifest files generated by the inventory feature. You need to configure the following parameters on the configuration page of the TAIHAODOCTOR service in the EMR console. For more information about the operations and other configurations, see EMR Doctor configuration guide.
Configuration item | Description |
collect.oss.bucket | The name of the OSS bucket to be analyzed. |
collect.oss.manifest.dir | The directory of the manifest file. The format is {inventory_path}/{inventory_bucket}/{inventory_name}. |
For example, if you set inventory_path to reports, inventory_bucket to my-bucket, and inventory_name to my-inventory, the directory of the manifest file (collect.oss.manifest.dir) is reports/my-bucket/my-inventory.
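The way the three inventory settings combine into the manifest directory can be sketched in a few lines of Python. This is an illustration only: the helper name manifest_dir is hypothetical, and the sketch assumes that all three values are non-empty and that leading or trailing slashes should be ignored.

```python
# Hypothetical helper: joins the inventory settings into the value
# expected by collect.oss.manifest.dir.
def manifest_dir(inventory_path: str, inventory_bucket: str, inventory_name: str) -> str:
    parts = [inventory_path, inventory_bucket, inventory_name]
    # Strip stray slashes so "reports/" and "reports" give the same result.
    cleaned = [part.strip("/") for part in parts]
    if not all(cleaned):
        # Assumption of this sketch: every component must be set.
        raise ValueError("inventory_path, inventory_bucket, and inventory_name must be non-empty")
    return "/".join(cleaned)


print(manifest_dir("reports", "my-bucket", "my-inventory"))
# prints: reports/my-bucket/my-inventory
```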
If your cluster uses multiple buckets and you have enabled the inventory feature for each bucket, you can add the bucket names and their corresponding manifest file directories to the configuration items. Separate the items with commas (,). Make sure that the order of the bucket names matches the order of the manifest file directories.
Example for a single bucket
This example uses the my-bucket bucket. The final storage analysis configuration is as follows.
collect.oss.bucket: my-bucket
collect.oss.manifest.dir: reports/my-bucket/my-inventory
Example for multiple buckets
This example uses the my-bucket1 and my-bucket2 buckets. The final storage analysis configuration is as follows.
collect.oss.bucket: my-bucket1,my-bucket2
collect.oss.manifest.dir: reports1/my-bucket1/my-inventory1,reports2/my-bucket2/my-inventory2
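Because the bucket names and manifest directories must appear in the same order, it can help to generate both values from a single list of pairs. The following is a minimal sketch under that assumption. The function name build_oss_config is hypothetical and is not part of EMR Doctor; only the two configuration keys come from this topic.

```python
# Hypothetical helper: builds both comma-separated values from one ordered
# list of (bucket, manifest directory) pairs so the order cannot drift apart.
def build_oss_config(pairs):
    if not pairs:
        raise ValueError("at least one bucket is required")
    for bucket, directory in pairs:
        if not bucket or not directory:
            raise ValueError("bucket names and manifest directories must be non-empty")
        if "," in bucket or "," in directory:
            # Commas separate the items, so they cannot appear inside a value.
            raise ValueError("values must not contain commas")
    return {
        "collect.oss.bucket": ",".join(bucket for bucket, _ in pairs),
        "collect.oss.manifest.dir": ",".join(directory for _, directory in pairs),
    }


config = build_oss_config([
    ("my-bucket1", "reports1/my-bucket1/my-inventory1"),
    ("my-bucket2", "reports2/my-bucket2/my-inventory2"),
])
for key, value in config.items():
    print(f"{key}: {value}")
```

Run as written, the sketch prints the same two lines as the multi-bucket example above.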