Over time, a dataset can accumulate a large amount of expired or useless metadata, such as historical logs, temporary files, or metadata for original files that have since been deleted. This redundant data occupies storage space and reduces retrieval efficiency. This topic describes how to batch delete file metadata that is no longer needed in a dataset to optimize resource management.
Prerequisites
An AccessKey pair is created. For more information, see Create an AccessKey pair.
Object Storage Service (OSS) is activated, a bucket is created, and objects are uploaded to the bucket. For more information, see Upload objects.
Intelligent Media Management (IMM) is activated. For more information, see Activate IMM.
A project is created in the IMM console. For more information, see Create a project.
Note: You can also call the CreateProject API operation to create a project.
You can call the ListProjects API operation to list all projects in a specified region.
A metadata index is created for files based on your business scenarios. For more information, see Create a metadata index.
Considerations
Permission management: Make sure that the AccessKey pair you use has the permissions to operate on the target dataset.
Exception handling: The script simply displays exceptions. We recommend that you improve the exception handling logic in a production environment.
Data security: Batch delete operations are irreversible. Before you perform a batch delete operation, confirm the operation scope to prevent accidentally deleting important data.
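One way to confirm the operation scope before deleting anything is to restrict the candidate URIs to a single directory and review the result first. The following is a minimal sketch of that idea, not part of the IMM SDK: the helper name select_uris_in_scope, the bucket name, and the sample URIs are all hypothetical.

```python
from typing import Iterable, List


def select_uris_in_scope(uris: Iterable[str], allowed_prefix: str) -> List[str]:
    """Return only the URIs under allowed_prefix; all other URIs are dropped."""
    # Require a trailing slash so that 'oss://bucket/tmp/' cannot match 'oss://bucket/tmp2/'.
    if not allowed_prefix.startswith("oss://") or not allowed_prefix.endswith("/"):
        raise ValueError("allowed_prefix must look like 'oss://bucket/dir/'")
    return [uri for uri in uris if uri and uri.startswith(allowed_prefix)]


# Hypothetical URIs for illustration only.
candidates = [
    "oss://examplebucket/tmp/a.jpg",
    "oss://examplebucket/archive/b.jpg",
    "oss://examplebucket/tmp/c.jpg",
]
in_scope = select_uris_in_scope(candidates, "oss://examplebucket/tmp/")
print(f"{len(in_scope)} of {len(candidates)} URIs would be deleted: {in_scope}")
```

Printing the filtered list and reviewing it before passing it to a delete call gives you a chance to catch an overly broad scope while the operation can still be cancelled.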
Procedure
The IMM console does not provide the feature to batch delete metadata in a dataset. To batch delete metadata in a dataset, follow the instructions in this topic.
Step 1: Install IMM SDK for Python
Environment requirements
Make sure that your Python version is v3.7 or later. You can run the following command to check the current Python version:
python --version

Installation method
Use the pip command to install a specific version (v4.6.2) of IMM SDK for Python:

pip install alibabacloud_imm20200930==4.6.2

For more information, see Intelligent Media Management.
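If you prefer to verify the environment from Python rather than from the shell, a check along the following lines may help. This is a sketch only: the helper names python_is_supported and sdk_is_importable are made up for illustration, and the check confirms only that the package can be imported, not that version 4.6.2 is the one installed.

```python
import sys

MIN_PYTHON = (3, 7)  # Minimum Python version stated in this topic.


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def sdk_is_importable() -> bool:
    """Return True if the IMM SDK package used by the script can be imported."""
    try:
        import alibabacloud_imm20200930  # noqa: F401
    except ImportError:
        return False
    return True


print("Python version supported:", python_is_supported())
print("IMM SDK importable:", sdk_is_importable())
```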
Step 2: Configure environment variables
After you create an AccessKey pair, you need to configure the ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables. For more information, see Create an AccessKey pair and Configure environment variables.
The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. We recommend that you use the AccessKey pair of a RAM user to call API operations or perform routine O&M.
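Because the script reads both variables with os.environ[...], a missing variable surfaces as a KeyError at client creation. A small pre-flight check, sketched below, can report the problem more clearly. The helper name missing_credentials is hypothetical; the variable names are the ones used in this topic.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = ("ALIBABA_CLOUD_ACCESS_KEY_ID", "ALIBABA_CLOUD_ACCESS_KEY_SECRET")


def missing_credentials(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("Both AccessKey environment variables are set.")
```

The check prints only variable names, never their values, so it is safe to run in shared logs.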
Step 3: Run the batch metadata deletion script
Perform the following operations to configure and execute the code to batch delete metadata in a dataset.
Save the code file.
Save the following example code as a file named delete_dataset_file_meta.py.

# -*- coding: utf-8 -*-
import os

from alibabacloud_imm20200930.client import Client as imm20200930Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_imm20200930 import models as imm_20200930_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class DeleteDatasetFileMeta:
    """
    Delete metadata of files in the dataset.
    """

    def __init__(self, endpoint):
        self.endpoint = endpoint

    def create_client(self):
        """
        Use credentials to initialize the Client
        @return: Client
        @throws Exception
        """
        config = open_api_models.Config(
            # Required. Make sure that the ALIBABA_CLOUD_ACCESS_KEY_ID environment variable is configured.
            access_key_id=os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'],
            # Required. Make sure that the ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variable is configured.
            access_key_secret=os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET']
        )
        # Specify the IMM endpoint. For more information about endpoints, visit https://api.aliyun.com/product/imm
        config.endpoint = self.endpoint
        return imm20200930Client(config)

    def simple_query(self, project_name, dataset_name, max_result=100, next_token=None):
        """
        Call the SimpleQuery API operation to query files in the dataset.
        :param project_name: name of the IMM project
        :param dataset_name: name of the dataset
        :param max_result: maximum number of files returned per call
        :param next_token: pagination token from the previous call, or None for the first page
        :return: the response body as a dict, or None if the call failed
        """
        client = self.create_client()
        simple_query_request = imm_20200930_models.SimpleQueryRequest(
            project_name=project_name,
            dataset_name=dataset_name,
            next_token=next_token,
            max_results=max_result,
            with_fields=["URI"]
        )
        runtime = util_models.RuntimeOptions()
        try:
            response = client.simple_query_with_options(simple_query_request, runtime)
            return response.body.to_map()
        except Exception as error:
            # Handle exceptions with caution based on your actual business scenario, and do not ignore exceptions in your project. The error message displayed in this example is for reference only.
            # Error message. Not every exception type has a message attribute, so fall back to str(error).
            message = getattr(error, "message", str(error))
            print(message)
            # The URL of the corresponding error diagnosis page, if the exception carries one.
            data = getattr(error, "data", None) or {}
            print(data.get("Recommend"))
            UtilClient.assert_as_string(message)

    def batch_delete_file_meta(self, project_name, dataset_name, uri_list):
        """
        Call the BatchDeleteFileMeta API operation to batch delete metadata of multiple files in the dataset.
        :param project_name: name of the IMM project
        :param dataset_name: name of the dataset
        :param uri_list: URIs of the files whose metadata is to be deleted
        :return: the response body as a dict, or None if uri_list is empty or the call failed
        """
        if not uri_list:
            return
        client = self.create_client()
        batch_delete_file_meta_request = imm_20200930_models.BatchDeleteFileMetaRequest(
            project_name=project_name,
            dataset_name=dataset_name,
            uris=uri_list
        )
        runtime = util_models.RuntimeOptions()
        try:
            response = client.batch_delete_file_meta_with_options(
                batch_delete_file_meta_request, runtime)
            return response.body.to_map()
        except Exception as error:
            # Handle exceptions with caution based on your actual business scenario, and do not ignore exceptions in your project. The error message displayed in this example is for reference only.
            # Error message
            message = getattr(error, "message", str(error))
            print(message)
            UtilClient.assert_as_string(message)

    @staticmethod
    def main(endpoint, project_name, dataset_name):
        """Delete metadata in the dataset"""
        tool = DeleteDatasetFileMeta(endpoint)
        next_token = None
        while True:
            # Call SimpleQuery to loop through files in the dataset.
            simple_query_response = tool.simple_query(
                project_name, dataset_name, max_result=100, next_token=next_token)
            if simple_query_response is None:
                # The query failed and the error was already printed. Stop instead of failing on None.
                print("Query failed. Stopping before deletion completed.")
                return
            next_token = simple_query_response.get("NextToken")
            uri_list = [x.get("URI") for x in simple_query_response.get("Files", list())]
            # Batch delete
            delete_response = tool.batch_delete_file_meta(project_name, dataset_name, uri_list)
            print(f"Deleted files: {uri_list}, response: {delete_response}")
            if not next_token:
                break
        print("Deletion completed")


if __name__ == "__main__":
    """
    1. You need to install IMM SDK for Python. For more information, visit: https://next.api.aliyun.com/api-tools/sdk/imm?spm=a2c4g.11186623.0.0.5e9952feu0Zm3a&version=2020-09-30&language=python-tea&tab=primer-doc.
    2. The script calls IMM SDK for Python by using the AccessKey ID and AccessKey secret. Make sure that the ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables are configured in your code execution environment.
    3. Replace the endpoint, project_name, and dataset_name configurations with your actual values.
    """
    # Specify the IMM endpoint. For more information about endpoints, visit https://api.aliyun.com/product/imm
    endpoint = 'imm.cn-hangzhou.aliyuncs.com'
    # IMM project name.
    project_name = "test-project"
    # Dataset name.
    dataset_name = "test-dataset"
    DeleteDatasetFileMeta.main(endpoint, project_name, dataset_name)

Modify the configuration parameters.
Replace the endpoint, project_name, and dataset_name configurations with your actual values, and save the file.

# Specify the IMM endpoint. For more information about endpoints, visit https://api.aliyun.com/product/imm
endpoint = 'imm.cn-hangzhou.aliyuncs.com'
# IMM project name.
project_name = "test-project"
# Dataset name.
dataset_name = "test-dataset"

Run the following command to batch delete metadata in the dataset.
python delete_dataset_file_meta.py

Result demonstration.
During script execution, the script prints one line per batch in the form "Deleted files: <URI list>, response: <response>", followed by "Deletion completed" after the last batch.
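The query-and-paginate loop in the script can also be exercised without calling IMM, which is useful for checking the NextToken handling before touching real data. The sketch below assumes only the response shape the script already relies on (a Files list whose items carry a URI, plus an optional NextToken). The names iter_uri_batches and fake_query are hypothetical, and fake_query stands in for the SimpleQuery call.

```python
from typing import Callable, Dict, Iterator, List, Optional

# A function that takes a pagination token (or None) and returns one response page.
QueryPage = Callable[[Optional[str]], Optional[Dict]]


def iter_uri_batches(query_page: QueryPage) -> Iterator[List[str]]:
    """Yield one list of URIs per page until no NextToken is returned."""
    next_token = None
    while True:
        page = query_page(next_token) or {}
        uris = [f.get("URI") for f in page.get("Files", []) if f.get("URI")]
        if uris:
            yield uris
        next_token = page.get("NextToken")
        if not next_token:
            break


def fake_query(next_token: Optional[str]) -> Dict:
    """Stand-in for SimpleQuery that serves two hard-coded pages."""
    pages = {
        None: {"Files": [{"URI": "oss://examplebucket/a.jpg"}], "NextToken": "t1"},
        "t1": {"Files": [{"URI": "oss://examplebucket/b.jpg"}]},
    }
    return pages[next_token]


batches = list(iter_uri_batches(fake_query))
print(batches)
```

Because the generator is lazy, a caller that deletes each batch inside a for loop keeps the same order as the script: query a page, delete it, then query the next page.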
