OSS provides the bucket inventory feature to regularly export information about the objects in a bucket and store that information as an object in a specified bucket. This feature helps you understand the status of objects in your buckets and makes workflows and big data tasks simpler and faster.

After you enable the bucket inventory feature, OSS scans objects in your buckets on a daily or weekly basis, generates an inventory list in the CSV format, and stores the list as an object in a specified bucket. You can specify object metadata to be exported to the inventory list, such as object size and encryption status.
Note: Only object metadata is scanned and exported to inventory lists. Object content is not scanned.

Implementation modes

Implementation mode    Description
Console                User-friendly and intuitive web application
ossutil                High-performance command-line tool

Function

After you create an inventory for a bucket, OSS generates inventory lists at the frequency specified in the inventory and stores the lists and related data in the following folder structure:
- dst_bucket/
 - destination-prefix/
   - src_bucket/
     - inventory_id/
       - YYYY-MM-DDTHH-MMZ/
         - manifest.json
         - manifest.checksum
       - data/
         - 745a29e3-bfaa-490d-9109-47086afcc8f2.csv.gz
       - hive/
         - dt=YYYY-MM-DDTHH-MMZ/
           - symlink.txt
  • destination-prefix/: The folder generated based on the inventory list prefix specified in the inventory. If no prefix is specified, this folder level is omitted and inventory lists and related data are stored under dst_bucket/src_bucket/….
  • src_bucket/: The generated folder corresponding to the bucket for which the inventory is configured.
  • inventory_id/: The generated folder corresponding to the name of the inventory.
  • YYYY-MM-DDTHH-MMZ/: The generated folder corresponding to the GMT timestamp when the bucket was scanned. Example: 2020-05-17T16-00Z. The manifest.json and manifest.checksum files are stored in this folder.
  • data/: The folder that stores inventory lists.
  • hive/: The folder that stores the symlink.txt file. By using this file, you can use Apache Hive to process the data in the inventory lists.
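
The folder layout above can be turned into object keys programmatically. The following is a minimal sketch, not an official helper: the function names manifest_key and parse_scan_folder and their parameters are hypothetical and chosen for illustration only. It composes the key of manifest.json from the folder levels described above and parses a scan folder name such as 2020-05-17T16-00Z into a UTC datetime.

```python
from datetime import datetime, timezone


def manifest_key(src_bucket, inventory_id, scan_folder, destination_prefix=""):
    """Build the key of manifest.json inside the destination bucket.

    Hypothetical helper: joins the folder levels described above and
    skips the destination-prefix level when no prefix is configured.
    """
    parts = [
        destination_prefix.strip("/"),
        src_bucket,
        inventory_id,
        scan_folder.strip("/"),
        "manifest.json",
    ]
    return "/".join(part for part in parts if part)


def parse_scan_folder(name):
    """Parse a folder name such as '2020-05-17T16-00Z' into an aware UTC datetime."""
    parsed = datetime.strptime(name.rstrip("/"), "%Y-%m-%dT%H-%MZ")
    return parsed.replace(tzinfo=timezone.utc)
```

For example, manifest_key("src_bucket", "inventory_id", "2020-05-17T16-00Z", "destination-prefix/") yields destination-prefix/src_bucket/inventory_id/2020-05-17T16-00Z/manifest.json, and omitting the prefix drops the first folder level.
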
After an inventory is created for a bucket, the following files are generated based on the inventory:
  • Inventory lists
    Inventory lists contain the exported object information and are stored in the data folder. You can check the fileSchema field in manifest.json to see which field columns an inventory list includes. A complete inventory list includes the following field columns from left to right: Bucket, Key, VersionId, IsLatest, IsDeleteMarker, Size, LastModifiedDate, ETag, StorageClass, IsMultipartUploaded, EncryptionStatus.
    Inventory list fields
    Field                Description
    Bucket               The name of the bucket for which the inventory is created.
    Key                  The name of the object in the bucket. The object name is URL-encoded. You must decode the object name before you can view it.
    VersionId            The version ID of the object. If versioning is enabled for the bucket, you can choose to export either only the current version or all versions of objects when you configure inventories for the bucket. If you choose to export only the current version of objects, this field column is not included in inventory lists. For more information about versioning, see Overview.
    IsLatest             If an object has multiple versions and this version is the latest version, the value of this field is True. Otherwise, the value of this field is False. If you choose to export only the current version of objects, this field column is not included in inventory lists.
    IsDeleteMarker       If an object has multiple versions and this version is a delete marker, the value of this field is True. Otherwise, the value of this field is False. If you choose to export only the current version of objects, this field column is not included in inventory lists.
    Size                 The size of the object.
    LastModifiedDate     The time when the object was last modified.
    ETag                 The ETag of the object. It is generated when the object is created to identify the content of the object.
    • For an object created by using PutObject, the ETag of the object is the MD5 hash of the object content.
    • For an object created by using other methods, the ETag of the object is the UUID of the object content.
    StorageClass         The storage class of the object.
    IsMultipartUploaded  Indicates whether the object was created by using multipart upload. If so, the value of this field is True. Otherwise, the value of this field is False.
    EncryptionStatus     The encryption status of the object. If the object is encrypted, the value of this field is True. Otherwise, the value of this field is False.
  • Manifest files
    Manifest files include the following two files:
    • manifest.json: stores the metadata of inventory lists and related information, including the following fields:
      • creationTimestamp: A timestamp in the UNIX time format. It indicates the time when OSS starts to scan objects in the bucket to generate an inventory list.
      • destinationBucket: The bucket that stores the inventory lists.
      • fileFormat: The format of the inventory lists.
      • fileSchema: The field columns included in the inventory lists.
      • files: The name, size, and MD5 hash of the inventory lists.
      • sourceBucket: The bucket for which the inventory is configured.
      • version: The version of the inventory lists.
    • manifest.checksum: stores the MD5 hash of manifest.json.
  • symlink.txt

    This file is stored in the hive folder and indicates the location of inventory lists. It is compatible with Apache Hive and allows Apache Hive to find inventory lists and related files automatically.
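
The files described above can be processed with the Python standard library alone. The following is a minimal sketch under stated assumptions rather than a reference implementation: it assumes that the fileSchema value is a comma-separated string of column names, that inventory lists are gzip-compressed CSV files (as the .csv.gz name suggests), and that manifest.checksum holds a hexadecimal MD5 string. The function names are hypothetical, and downloading the objects from OSS is left out.

```python
import csv
import gzip
import hashlib
import io
from urllib.parse import unquote


def manifest_matches_checksum(manifest_bytes, checksum_text):
    """Compare the MD5 of manifest.json with the content of manifest.checksum.

    Assumption: the checksum file stores a hex-encoded MD5 string.
    """
    actual = hashlib.md5(manifest_bytes).hexdigest()
    return actual == checksum_text.strip().lower()


def read_inventory_rows(gz_bytes, file_schema):
    """Yield one dict per object listed in a gzip-compressed CSV inventory list.

    Assumption: file_schema is a comma-separated string of column names,
    in the same left-to-right order as the CSV columns.
    """
    columns = [name.strip() for name in file_schema.split(",")]
    with gzip.open(io.BytesIO(gz_bytes), mode="rt", encoding="utf-8", newline="") as fh:
        for values in csv.reader(fh):
            row = dict(zip(columns, values))
            if "Key" in row:
                # Object names are URL-encoded in inventory lists.
                row["Key"] = unquote(row["Key"])
            yield row
```

Reading the column names from fileSchema instead of hard-coding them keeps the parser correct when optional columns such as VersionId are not exported.
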

Notes

  • You can configure a maximum of 1,000 inventories for a bucket. A maximum of 10 inventories can be configured and displayed in the OSS console.
  • The inventory list must be stored in a bucket within the same region as the bucket for which the inventory is configured.
  • If the number of objects in your bucket is more than one billion, we recommend that you generate inventory lists on a weekly basis.
  • If no objects are stored in the bucket for which the inventory is configured, or no objects match the prefix specified in the inventory, inventory lists are not generated.
  • To use API operations, SDKs, or ossutil to configure inventories for a bucket, you must create a RAM role that has read permissions on all objects in the bucket and write permissions on the destination bucket that stores inventory lists. For more information about how to configure RAM roles, see Create a RAM role for a trusted Alibaba Cloud service.
  • When you configure inventories in the OSS console, a role named AliyunOSSRole is generated. If you use a RAM user to configure inventories, the RAM user must have the permission to call the role.
  • After an inventory is configured for a bucket, OSS keeps generating inventory lists based on that inventory until the inventory is deleted. We recommend that you promptly delete inventory lists that are no longer needed to save storage space.

FAQ

How can I quickly know whether an inventory list was generated?

If the number of objects in your bucket is large, it will take some time for OSS to generate inventory lists. If you want to be notified immediately after an inventory list is generated, we recommend that you configure an event notification rule. You can configure a notification rule for PutObject events for the destination bucket that stores generated inventory lists. This way, when an inventory list is generated, you will be notified right away. For more information about how to configure event notification, see Configure event notification.