Intelligent Media Management: Batch delete metadata in a dataset

Last Updated: Jun 12, 2025

Over time, a large amount of expired or useless metadata may accumulate in your dataset, such as historical logs, temporary files, or metadata for original files that have been deleted. This redundant data occupies storage space and reduces retrieval efficiency. This topic describes how to batch delete file metadata that is no longer needed in a dataset to optimize resource management.

Prerequisites

  • An AccessKey pair is created. For more information, see Create an AccessKey pair.

  • Object Storage Service (OSS) is activated, a bucket is created, and objects are uploaded to the bucket. For more information, see Upload objects.

  • Intelligent Media Management (IMM) is activated. For more information, see Activate IMM.

  • A project is created in the IMM console. For more information, see Create a project.

    Note
    • You can also call the CreateProject API operation to create a project.

    • You can call the ListProjects API operation to list all projects in a specified region.

  • A metadata index is created for files based on your business scenarios. For more information, see Create a metadata index.

Considerations

  • Permission management: Make sure that the AccessKey pair you use has the permissions to operate on the target dataset.

  • Exception handling: The sample script performs only minimal exception handling. We recommend that you improve the exception handling logic before you use the script in a production environment.

  • Data security: Batch delete operations are irreversible. Before you perform a batch delete operation, confirm the operation scope to prevent accidentally deleting important data.
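The exception handling recommendation above could be addressed, for example, by retrying a failed call a few times before giving up. The following is a minimal sketch, not part of IMM SDK for Python: the helper name call_with_retry and its parameters are hypothetical, and it retries on any exception, whereas production code would typically retry only on transient errors such as throttling or timeouts.

```python
import time


def call_with_retry(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func() and retry with exponential backoff if it raises.

    func:       a zero-argument callable (hypothetical wrapper around an API call).
    attempts:   total number of tries before the last exception is re-raised.
    base_delay: delay in seconds before the first retry; doubles on each retry.
    sleep:      injected so that the delay can be replaced in tests.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            # Give up after the final attempt and surface the original error.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

With the script in this topic, a call such as call_with_retry(lambda: tool.batch_delete_file_meta(project_name, dataset_name, uri_list)) would retry the delete request, provided that the wrapped method raises on failure.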

Procedure

Important

The IMM console does not support batch deleting metadata in a dataset. To batch delete metadata in a dataset, follow the instructions in this topic.

Step 1: Install IMM SDK for Python

  1. Environment requirements

    Make sure that your Python version is v3.7 or later. You can run the following command to check the current Python version:

    python --version
  2. Installation method

    Use the pip command to install a specific version (v4.6.2) of IMM SDK for Python:

    pip install alibabacloud_imm20200930==4.6.2
  3. For more information, see Intelligent Media Management.
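The version requirement in this step can also be checked from inside a script, so that the script stops early on an unsupported interpreter. The following is a small sketch that uses only the standard library; the function name python_is_supported is hypothetical, and the minimum version (3, 7) is taken from the requirement stated above.

```python
import sys


def python_is_supported(version_info=sys.version_info, minimum=(3, 7)):
    """Return True if the (major, minor) version is at least the minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)
```

Calling python_is_supported() without arguments checks the interpreter that runs the script.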

Step 2: Configure environment variables

After you create an AccessKey pair, you need to configure the ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables. For more information, see Create an AccessKey pair and Configure environment variables.

Important

The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. We recommend that you use the AccessKey pair of a RAM user to call API operations or perform routine O&M.
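Before you run the script, it can help to confirm that both environment variables are visible to Python. The following is a minimal sketch that uses only the standard library; the function name missing_credentials is hypothetical, and the variable names are the two named in this step.

```python
import os

# The two environment variables that this topic asks you to configure.
REQUIRED_VARS = (
    "ALIBABA_CLOUD_ACCESS_KEY_ID",
    "ALIBABA_CLOUD_ACCESS_KEY_SECRET",
)


def missing_credentials(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

If missing_credentials() returns a non-empty list, the script in Step 3 would fail with a KeyError when it reads os.environ, so the variables should be configured first.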

Step 3: Run the batch metadata deletion script

Perform the following operations to configure and execute the code to batch delete metadata in a dataset.

  1. Save the code file.

    Save the following example code as a file named delete_dataset_file_meta.py.

    # -*- coding: utf-8 -*-
    
    import os
    
    from alibabacloud_imm20200930.client import Client as imm20200930Client
    from alibabacloud_tea_openapi import models as open_api_models
    from alibabacloud_imm20200930 import models as imm_20200930_models
    from alibabacloud_tea_util import models as util_models
    from alibabacloud_tea_util.client import Client as UtilClient
    
    
    class DeleteDatasetFileMeta:
        """
        Delete metadata of files in the dataset.
        """
    
        def __init__(self, endpoint):
            self.endpoint = endpoint
    
        def create_client(self):
            """
            Use credentials to initialize the Client
            @return: Client
            @throws Exception
            """
            config = open_api_models.Config(
                # Required. Make sure that the ALIBABA_CLOUD_ACCESS_KEY_ID environment variable is configured.
                access_key_id=os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'],
                # Required. Make sure that the ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variable is configured.
                access_key_secret=os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET']
            )
            # Specify the IMM endpoint. For more information about endpoints, visit https://api.aliyun.com/product/imm
            config.endpoint = self.endpoint
            return imm20200930Client(config)
    
        def simple_query(self, project_name, dataset_name, max_result=100, next_token=None):
            """
            Call the SimpleQuery API operation to query files in the dataset.
            :param project_name:
            :param dataset_name:
            :param max_result:
            :param next_token:
            :return:
            """
            client = self.create_client()
            simple_query_request = imm_20200930_models.SimpleQueryRequest(
                project_name=project_name,
                dataset_name=dataset_name,
                next_token=next_token,
                max_results=max_result,
                with_fields=["URI"]
            )
            runtime = util_models.RuntimeOptions()
            try:
                response = client.simple_query_with_options(simple_query_request, runtime)
                return response.body.to_map()
            except Exception as error:
                # Handle exceptions with caution based on your actual business scenario, and do not ignore exceptions in your project. The error message displayed in this example is for reference only.
                # Error message. Fall back to str(error) for exceptions that have no message attribute.
                print(getattr(error, "message", str(error)))
                # The URL of the corresponding error diagnosis page, if the exception carries one.
                print((getattr(error, "data", None) or {}).get("Recommend"))
                # Re-raise so that the caller does not continue with a None response.
                raise
    
        def batch_delete_file_meta(self, project_name, dataset_name, uri_list):
            """
            BatchDeleteFileMeta Batch delete metadata of multiple files in the dataset.
            :param project_name:
            :param dataset_name:
            :param uri_list:
            :return:
            """
            if not uri_list:
                return
            client = self.create_client()
            batch_delete_file_meta_request = imm_20200930_models.BatchDeleteFileMetaRequest(
                project_name=project_name,
                dataset_name=dataset_name,
                uris=uri_list
            )
            runtime = util_models.RuntimeOptions()
            try:
                response = client.batch_delete_file_meta_with_options(
                    batch_delete_file_meta_request, runtime)
                return response.body.to_map()
            except Exception as error:
                # Handle exceptions with caution based on your actual business scenario, and do not ignore exceptions in your project. The error message displayed in this example is for reference only.
                # Error message. Fall back to str(error) for exceptions that have no message attribute.
                print(getattr(error, "message", str(error)))
                # Re-raise so that a failed batch is not reported as deleted.
                raise
    
        @staticmethod
        def main(endpoint, project_name, dataset_name):
            """Delete metadata in the dataset"""
            tool = DeleteDatasetFileMeta(endpoint)
            next_token = None
            while True:
                # Call SimpleQuery to loop through files in the dataset.
                simple_query_response = tool.simple_query(
                    project_name, dataset_name, max_result=100, next_token=next_token)
                next_token = simple_query_response.get("NextToken")
                uri_list = [x.get("URI") for x in simple_query_response.get("Files", list())]
                # Batch delete
                delete_response = tool.batch_delete_file_meta(project_name, dataset_name, uri_list)
                print(f"Deleted files: {uri_list}, response: {delete_response}")
                if not next_token:
                    break
    
            print("Deletion completed")
    
    
    if __name__ == "__main__":
        """
        1. You need to install IMM SDK for Python. For more information, visit: https://next.api.aliyun.com/api-tools/sdk/imm?spm=a2c4g.11186623.0.0.5e9952feu0Zm3a&version=2020-09-30&language=python-tea&tab=primer-doc.
        2. The script calls IMM SDK for Python by using the AccessKey ID and AccessKey secret. Make sure that the ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables are configured in your code execution environment.
        3. Replace the endpoint, project_name, and dataset_name configurations with your actual values.
        """
        # Specify the IMM endpoint. For more information about endpoints, visit https://api.aliyun.com/product/imm
        endpoint = 'imm.cn-hangzhou.aliyuncs.com'
        # IMM project name.
        project_name = "test-project"
        # Dataset name.
        dataset_name = "test-dataset"
        DeleteDatasetFileMeta.main(endpoint, project_name, dataset_name)
  2. Modify the configuration parameters.

    Replace the endpoint, project_name, and dataset_name configurations with your actual values, and save the file.

    # Specify the IMM endpoint. For more information about endpoints, visit https://api.aliyun.com/product/imm
    endpoint = 'imm.cn-hangzhou.aliyuncs.com'
    # IMM project name.
    project_name = "test-project"
    # Dataset name.
    dataset_name = "test-dataset"
  3. Run the following command to batch delete metadata in the dataset.

    python delete_dataset_file_meta.py
  4. View the result.

    During script execution, the script prints the URIs of each batch of deleted files together with the response of the delete request. After all batches are processed, the script prints "Deletion completed".
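Because batch delete operations are irreversible, you may want to list the URIs that would be deleted before you run the script. The following is a minimal dry-run sketch, not part of IMM SDK for Python. The function names iter_uri_batches and dry_run and the prefix parameter are hypothetical. The sketch assumes that each page is a dictionary with the Files and NextToken keys, as used by the main method of the script above, and it takes the query function as an argument so that it can be tried without calling the API.

```python
from typing import Callable, Dict, Iterator, List, Optional


def iter_uri_batches(query_page: Callable[[Optional[str]], Dict]) -> Iterator[List[str]]:
    """Yield one list of URIs per page until no NextToken is returned.

    query_page takes the current token (None for the first page) and returns
    a dictionary with optional "Files" and "NextToken" keys.
    """
    next_token = None
    while True:
        page = query_page(next_token) or {}
        uris = [item["URI"] for item in page.get("Files", []) if item.get("URI")]
        if uris:
            yield uris
        next_token = page.get("NextToken")
        if not next_token:
            break


def dry_run(query_page: Callable[[Optional[str]], Dict], prefix: str = "") -> List[str]:
    """Return the URIs that start with prefix, without deleting anything."""
    return [
        uri
        for batch in iter_uri_batches(query_page)
        for uri in batch
        if uri.startswith(prefix)
    ]
```

With the script above, passing lambda token: tool.simple_query(project_name, dataset_name, next_token=token) as query_page would list the URIs in the dataset. Review the returned list, and run the deletion only after you confirm the scope.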