After you create a dataset in Intelligent Media Management (IMM), you can create a metadata index for files that are stored on services such as Object Storage Service (OSS) and Drive and Photo Service. Metadata indexing allows you to efficiently manage and retrieve media files. This topic describes how to create and manage a metadata index to accelerate file searching, filtering, and management.
Prerequisites
A dataset is created. For more information, see Create a dataset.
Overview
Metadata indexing structures and indexes key information about media files, including but not limited to file titles, authors, keywords, creation dates, sizes, formats, and resolutions. With a metadata index, you can efficiently retrieve, filter, and manage media files by keyword, attribute, and other media information.
Usage
You can have a metadata index automatically created for all objects in an OSS bucket, or manually create a metadata index for specified data in OSS or Drive and Photo Service.
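As a rough illustration of this choice, the following sketch suggests an operation for a list of OSS URIs. The helper names classify_oss_uri and suggest_operation are hypothetical and are not part of the IMM SDK. The sketch handles only oss:// URIs (Drive and Photo Service is not covered), and the mapping simply mirrors the operations named in this topic.

```python
from typing import List
from urllib.parse import urlsplit


def classify_oss_uri(uri: str) -> str:
    """Return 'bucket' for oss://bucket and 'object' for oss://bucket/key."""
    parts = urlsplit(uri)
    if parts.scheme != "oss" or not parts.netloc:
        raise ValueError(f"not an OSS URI: {uri!r}")
    # A bucket-level URI has an empty path (or only a trailing slash).
    return "object" if parts.path.strip("/") else "bucket"


def suggest_operation(uris: List[str]) -> str:
    """Suggest which operation from this topic fits the given URIs (hypothetical helper)."""
    if not uris:
        raise ValueError("no URIs given")
    kinds = {classify_oss_uri(uri) for uri in uris}
    if kinds == {"bucket"}:
        # Whole buckets are indexed automatically through a binding.
        return "CreateBinding"
    if kinds == {"object"}:
        # Specified objects are indexed manually, one at a time or in a batch.
        return "IndexFileMeta" if len(uris) == 1 else "BatchIndexFileMeta"
    raise ValueError("mix of bucket-level and object-level URIs")


if __name__ == "__main__":
    print(suggest_operation(["oss://test-bucket"]))
    print(suggest_operation(["oss://test-bucket/test-object1.jpg"]))
```

Keeping the decision in one small function makes it easy to reject malformed URIs before any billable call is made.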
Automatically create a metadata index for all objects in an OSS bucket
To automatically create a metadata index for all objects in an OSS bucket, call the CreateBinding operation to bind the OSS bucket to a dataset, or add a data source to the dataset in the IMM console. After the binding is established, IMM performs a full scan of all existing data in the bucket, extracts metadata, and creates a metadata index. After the initial full scan, IMM continuously monitors the bucket for incremental data, extracts its metadata, and indexes it.
After the binding is established, IMM scans both existing and incremental data in the bucket, and the metadata collection fee is directly proportional to the number of objects in the bucket. For more information, see Billable items. If you want to try out metadata indexing on an OSS bucket, we recommend that you use a bucket that contains a small number of objects and select a workflow template cautiously to avoid unexpected fees.
Call the API operation
The following example creates a metadata index in the "test-dataset" dataset of the "test-project" project for all objects in the "test-bucket" bucket:
Call the CreateBinding operation to bind the dataset to the bucket.
Sample request
{ "ProjectName": "test-project", "URI": "oss://test-bucket", "DatasetName": "test-dataset" }
Sample response
{ "Binding": { "Phase": "", "ProjectName": "test-project", "DatasetName": "test-dataset", "State": "Ready", "CreateTime": "2022-07-06T07:03:28.054762739+08:00", "UpdateTime": "2022-07-06T07:03:28.054762739+08:00", "URI": "oss://test-bucket" }, "RequestId": "090D2AC5-8450-0AA8-A1B1-****" }
Complete sample code (IMM SDK for Python)
# -*- coding: utf-8 -*-
import os

from alibabacloud_imm20200930.client import Client as imm20200930Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_imm20200930 import models as imm_20200930_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client(
        access_key_id: str,
        access_key_secret: str,
    ) -> imm20200930Client:
        """
        Use your AccessKey ID and AccessKey secret to initialize the client.
        @param access_key_id:
        @param access_key_secret:
        @return: Client
        @throws Exception
        """
        config = open_api_models.Config(
            access_key_id=access_key_id,
            access_key_secret=access_key_secret
        )
        # Specify the IMM endpoint.
        config.endpoint = f'imm.cn-beijing.aliyuncs.com'
        return imm20200930Client(config)

    @staticmethod
    def main() -> None:
        # The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. We recommend that you use a RAM user to call API operations or perform routine O&M.
        # For security reasons, we recommend that you do not embed your AccessKey pair in your project code.
        # In this example, the AccessKey pair is obtained from the environment variables to implement identity verification for API access. For information about how to configure environment variables, visit https://www.alibabacloud.com/help/en/imm/developer-reference/configure-environment-variables.
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        create_binding_request = imm_20200930_models.CreateBindingRequest(
            # Specify the name of the IMM project.
            project_name='test-project',
            # Specify the name of the dataset.
            dataset_name='test-dataset',
            # Specify the URI of the bucket.
            uri='oss://test-bucket'
        )
        runtime = util_models.RuntimeOptions()
        try:
            # Print the response of the API operation.
            response = client.create_binding_with_options(create_binding_request, runtime)
            print(response.body.to_map())
        except Exception as error:
            # Print the error message if necessary.
            UtilClient.assert_as_string(error.message)
            print(error)


if __name__ == '__main__':
    Sample.main()
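The sample code reads the AccessKey pair with os.getenv, which returns None when a variable is not set. The following minimal sketch shows one way to fail early with a clear message instead. The helper name load_access_key is hypothetical and not part of the IMM SDK. The variable names AccessKeyId and AccessKeySecret are the ones used in the sample code.

```python
import os
from typing import Mapping, Tuple


def load_access_key(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (AccessKey ID, AccessKey secret) or raise if either is missing."""
    names = ("AccessKeyId", "AccessKeySecret")
    # Treat unset and empty variables the same way.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env["AccessKeyId"], env["AccessKeySecret"]


if __name__ == "__main__":
    try:
        access_key_id, _ = load_access_key()
        print("AccessKey ID loaded:", access_key_id[:4] + "****")
    except RuntimeError as error:
        print(error)
```

The returned pair can be passed to Sample.create_client in place of the two os.getenv calls.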
(Optional) Call the GetBinding operation to query the binding.
Sample request
{ "ProjectName": "test-project", "URI": "oss://test-bucket", "DatasetName": "test-dataset" }
Sample response
{ "Binding": { "Phase": "IncrementalScanning", "ProjectName": "test-project", "DatasetName": "test-dataset", "State": "Running", "CreateTime": "2022-07-06T07:04:05.105182822+08:00", "UpdateTime": "2022-07-06T07:04:13.302084076+08:00", "URI": "oss://test-bucket" }, "RequestId": "B5A9F54B-6C54-03C9-B011-****" }
Note: If the value of the Phase field is IncrementalScanning, IMM has created a metadata index for all existing objects in the bucket and is scanning for incremental objects.
If the value of the State field is Running, the binding is in progress.
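The Phase check described in this note can be turned into a simple polling loop, sketched below under stated assumptions. The names full_scan_finished and wait_for_full_scan are hypothetical, and fetch_binding is a caller-supplied function that is assumed to return the Binding object of a GetBinding response as a dictionary. The interval and attempt count are arbitrary illustrative values.

```python
import time
from typing import Callable, Dict


def full_scan_finished(binding: Dict[str, str]) -> bool:
    """True once the binding reports the IncrementalScanning phase."""
    return binding.get("Phase") == "IncrementalScanning"


def wait_for_full_scan(
    fetch_binding: Callable[[], Dict[str, str]],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Poll fetch_binding until the full scan is finished or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        binding = fetch_binding()
        if full_scan_finished(binding):
            return binding
        # Do not sleep after the last unsuccessful check.
        if attempt < max_attempts:
            sleep(interval_seconds)
    raise TimeoutError(f"full scan not finished after {max_attempts} checks")


if __name__ == "__main__":
    # Values taken from the sample responses in this topic.
    replies = iter([
        {"Phase": "", "State": "Ready"},
        {"Phase": "IncrementalScanning", "State": "Running"},
    ])
    print(wait_for_full_scan(lambda: next(replies), interval_seconds=0.0))
```

In practice, fetch_binding might wrap the GetBinding call from the sample code and return the Binding entry of response.body.to_map(), assuming that the map mirrors the JSON sample response.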
Complete sample code (IMM SDK for Python 1.27.3)
# -*- coding: utf-8 -*-
import os

from alibabacloud_imm20200930.client import Client as imm20200930Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_imm20200930 import models as imm_20200930_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client(
        access_key_id: str,
        access_key_secret: str,
    ) -> imm20200930Client:
        """
        Use your AccessKey ID and AccessKey secret to initialize the client.
        @param access_key_id:
        @param access_key_secret:
        @return: Client
        @throws Exception
        """
        config = open_api_models.Config(
            access_key_id=access_key_id,
            access_key_secret=access_key_secret
        )
        # Specify the IMM endpoint.
        config.endpoint = f'imm.cn-beijing.aliyuncs.com'
        return imm20200930Client(config)

    @staticmethod
    def main() -> None:
        # The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. We recommend that you use a RAM user to call API operations or perform routine O&M.
        # For security reasons, we recommend that you do not embed your AccessKey pair in your project code.
        # In this example, the AccessKey pair is obtained from the environment variables to implement identity verification for API access. For information about how to configure environment variables, visit https://www.alibabacloud.com/help/en/imm/developer-reference/configure-environment-variables.
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        get_binding_request = imm_20200930_models.GetBindingRequest(
            # Specify the name of the IMM project.
            project_name='test-project',
            # Specify the name of the dataset.
            dataset_name='test-dataset',
            # Specify the URI of the bucket.
            uri='oss://test-bucket'
        )
        runtime = util_models.RuntimeOptions()
        try:
            # Print the response of the API operation.
            response = client.get_binding_with_options(get_binding_request, runtime)
            print(response.body.to_map())
        except Exception as error:
            # Print the error message if necessary.
            UtilClient.assert_as_string(error.message)
            print(error)


if __name__ == '__main__':
    Sample.main()
Use the IMM console
Open the project in which the dataset was created, and go to Data management and indexing > Dataset List. In the dataset list, find the dataset that you created (test-dataset in this example).
Click the name of the dataset. On the dataset details page, click Data access. On the Data source tab, click New Data Source.
In the New Data Source dialog box, select a bucket that you want to bind to the dataset, and click OK.
Note: After the data source is added, IMM extracts metadata from all existing objects in the bucket, continuously monitors the bucket for incremental objects, and extracts metadata from the incremental objects that it detects. Metadata extraction incurs a fee. For more information, see Billing overview. We recommend that you first test metadata indexing on a bucket that contains a small number of objects.
Manually create a metadata index for the specified data
Use the IMM API
To manually create a metadata index for the specified data in an OSS bucket or Drive and Photo Service, call the BatchIndexFileMeta or IndexFileMeta operation.
Call the BatchIndexFileMeta operation
The following sample code creates a metadata index in the "test-dataset" dataset of the "test-project" project for the "oss://test-bucket/test-object1.jpg" and "oss://test-bucket/test-object2.jpg" OSS objects:
Sample request
{ "ProjectName": "test-project", "DatasetName": "test-dataset", "Files": [ { "URI": "oss://test-bucket/test-object1.jpg", "CustomLabels": { "category": "Persons" } }, { "URI": "oss://test-bucket/test-object2.jpg", "CustomLabels": { "category": "Pets" } } ], "Notification": { "MNS": { "TopicName": "test-topic" } } }
Sample response
{ "RequestId": "0D4CB096-EB44-02D6-A4E9-****", "EventId": "16C-1KoeYbdckkiOObpyzc****" }
Sample Simple Message Queue message (For more information about Simple Message Queue SDKs, see Step 4: Receive and delete the message)
{ "ProjectName": "test-project", "DatasetName": "test-dataset", "RequestId": "658FFD57-B495-07C0-B24B-B64CC52993CB", "StartTime": "2022-07-06T07:18:18.664770352+08:00", "EndTime": "2022-07-06T07:18:20.762465221+08:00", "Success": true, "Message": "", "Files": [ { "URI": "oss://test-bucket/test-object1.jpg", "CustomLabels": { "category": "Persons" }, "Error": "" }, { "URI": "oss://test-bucket/test-object2.jpg", "CustomLabels": { "category": "Pets" }, "Error": "" } ] }
Note: If the value of the Success field is true, the metadata index is created.
The Files element contains the URI and error information of each object. If the value of the Error field is empty, the object is indexed.
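The two checks in this note can be applied to a received message as follows. This is a minimal sketch, assuming that the message body has already been decoded to the JSON text shown in the sample message. The function name summarize_index_message is hypothetical and not part of any SDK.

```python
import json
from typing import Dict, Tuple


def summarize_index_message(body: str) -> Tuple[bool, Dict[str, str]]:
    """Return (all_indexed, {uri: error}) for one task-result message."""
    message = json.loads(body)
    # Keep only the files whose Error field is non-empty.
    failures = {
        item["URI"]: item["Error"]
        for item in message.get("Files", [])
        if item.get("Error")
    }
    # The task counts as fully indexed only if Success is true and no file failed.
    all_indexed = bool(message.get("Success")) and not failures
    return all_indexed, failures


if __name__ == "__main__":
    # Shortened form of the sample message in this topic.
    sample = """{"Success": true, "Message": "", "Files": [
        {"URI": "oss://test-bucket/test-object1.jpg", "Error": ""},
        {"URI": "oss://test-bucket/test-object2.jpg", "Error": ""}]}"""
    all_indexed, failures = summarize_index_message(sample)
    print(all_indexed, failures)
```

Returning the failed URIs together with their error text makes it straightforward to retry only the objects that were not indexed.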
Complete sample code (IMM SDK for Python)
# -*- coding: utf-8 -*-
# This file is auto-generated, don't edit it. Thanks.
import sys
import os

from typing import List

from alibabacloud_imm20200930.client import Client as imm20200930Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_imm20200930 import models as imm_20200930_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client(
        access_key_id: str,
        access_key_secret: str,
    ) -> imm20200930Client:
        """
        Use your AccessKey ID and AccessKey secret to initialize the client.
        @param access_key_id:
        @param access_key_secret:
        @return: Client
        @throws Exception
        """
        config = open_api_models.Config(
            access_key_id=access_key_id,
            access_key_secret=access_key_secret
        )
        # Specify the IMM endpoint.
        config.endpoint = f'imm.cn-beijing.aliyuncs.com'
        return imm20200930Client(config)

    @staticmethod
    def main(
        args: List[str],
    ) -> None:
        # The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. We recommend that you use a RAM user to call API operations or perform routine O&M.
        # For security reasons, we recommend that you do not embed your AccessKey pair in your project code.
        # In this example, the AccessKey pair is obtained from the environment variables to implement identity verification for API access. For information about how to configure environment variables, visit https://www.alibabacloud.com/help/en/imm/developer-reference/configure-environment-variables.
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        notification_mns = imm_20200930_models.MNS(
            topic_name='test-topic'
        )
        notification = imm_20200930_models.Notification(
            mns=notification_mns
        )
        input_file_0custom_labels = {
            'category': 'Persons'
        }
        input_file_0 = imm_20200930_models.InputFile(
            uri='oss://test-bucket/test-object1.jpg',
            custom_labels=input_file_0custom_labels
        )
        input_file_1custom_labels = {
            'category': 'Pets'
        }
        input_file_1 = imm_20200930_models.InputFile(
            uri='oss://test-bucket/test-object2.jpg',
            custom_labels=input_file_1custom_labels
        )
        batch_index_file_meta_request = imm_20200930_models.BatchIndexFileMetaRequest(
            project_name='test-project',
            dataset_name='test-dataset',
            files=[
                input_file_0,
                input_file_1
            ],
            notification=notification
        )
        runtime = util_models.RuntimeOptions()
        try:
            # Write your code to print the response of the API operation if necessary.
            client.batch_index_file_meta_with_options(batch_index_file_meta_request, runtime)
        except Exception as error:
            # Print the error message if necessary.
            UtilClient.assert_as_string(error.message)

    @staticmethod
    async def main_async(
        args: List[str],
    ) -> None:
        # The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. We recommend that you use a RAM user to call API operations or perform routine O&M.
        # For security reasons, we recommend that you do not embed your AccessKey pair in your project code.
        # In this example, the AccessKey pair is obtained from the environment variables to implement identity verification for API access. For information about how to configure environment variables, visit https://www.alibabacloud.com/help/en/imm/developer-reference/configure-environment-variables.
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        notification_mns = imm_20200930_models.MNS(
            topic_name='test-topic'
        )
        notification = imm_20200930_models.Notification(
            mns=notification_mns
        )
        input_file_0custom_labels = {
            'category': 'Persons'
        }
        input_file_0 = imm_20200930_models.InputFile(
            uri='oss://test-bucket/test-object1.jpg',
            custom_labels=input_file_0custom_labels
        )
        input_file_1custom_labels = {
            'category': 'Pets'
        }
        input_file_1 = imm_20200930_models.InputFile(
            uri='oss://test-bucket/test-object2.jpg',
            custom_labels=input_file_1custom_labels
        )
        batch_index_file_meta_request = imm_20200930_models.BatchIndexFileMetaRequest(
            project_name='test-project',
            dataset_name='test-dataset',
            files=[
                input_file_0,
                input_file_1
            ],
            notification=notification
        )
        runtime = util_models.RuntimeOptions()
        try:
            # Write your code to print the response of the API operation if necessary.
            await client.batch_index_file_meta_with_options_async(batch_index_file_meta_request, runtime)
        except Exception as error:
            # Print the error message if necessary.
            UtilClient.assert_as_string(error.message)


if __name__ == '__main__':
    Sample.main(sys.argv[1:])
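When many objects need to be indexed, the request body shown in the sample request can be generated rather than written by hand. The following sketch builds one request body per chunk of files. The function name build_batch_requests is hypothetical, and batch_size is a caller-supplied assumption: this topic does not state how many files a single BatchIndexFileMeta call accepts, so check the API reference before you choose a value.

```python
from typing import Dict, Iterator, List, Optional


def build_batch_requests(
    project_name: str,
    dataset_name: str,
    labels_by_uri: Dict[str, Dict[str, str]],
    batch_size: int,
    topic_name: Optional[str] = None,
) -> Iterator[dict]:
    """Yield BatchIndexFileMeta-style request bodies, batch_size files at a time."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    files: List[dict] = [
        {"URI": uri, "CustomLabels": labels}
        for uri, labels in labels_by_uri.items()
    ]
    for start in range(0, len(files), batch_size):
        request = {
            "ProjectName": project_name,
            "DatasetName": dataset_name,
            "Files": files[start:start + batch_size],
        }
        # Attach the notification topic only if one was given.
        if topic_name:
            request["Notification"] = {"MNS": {"TopicName": topic_name}}
        yield request


if __name__ == "__main__":
    labels = {
        "oss://test-bucket/test-object1.jpg": {"category": "Persons"},
        "oss://test-bucket/test-object2.jpg": {"category": "Pets"},
    }
    for body in build_batch_requests(
        "test-project", "test-dataset", labels, batch_size=2, topic_name="test-topic"
    ):
        print(body)
```

Each yielded dictionary has the same fields as the sample request and could be mapped onto the InputFile and BatchIndexFileMetaRequest models used in the sample code.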
Call the IndexFileMeta operation
The following sample code creates a metadata index in the "test-dataset" dataset of the "test-project" project for the "oss://test-bucket/test-object1.jpg" OSS object:
Sample request
{ "ProjectName": "test-project", "DatasetName": "test-dataset", "File": { "URI": "oss://test-bucket/test-object1.jpg", "CustomLabels": { "category": "Persons" } }, "Notification": { "MNS": { "TopicName": "test-topic" } } }
Sample response
{ "RequestId": "5AA694AD-3D10-0B6A-85B2-****", "EventId": "17C-1Kofq1mlJxRYF7vAGF****" }
Sample Simple Message Queue message (For more information about Simple Message Queue SDKs, see Step 4: Receive and delete the message)
{ "ProjectName": "test-project", "DatasetName": "test-dataset", "RequestId": "658FFD57-B495-07C0-B24B-B64CC52993CB", "StartTime": "2022-07-06T07:18:18.664770352+08:00", "EndTime": "2022-07-06T07:18:20.762465221+08:00", "Success": true, "Message": "", "Files": [ { "URI": "oss://test-bucket/test-object1.jpg", "CustomLabels": { "category": "Persons" }, "Error": "" } ] }
Note: If the value of the Success field is true, the metadata index is created.
The Files element contains the URI and error information of each object. If the value of the Error field is empty, the object is indexed.
Complete sample code (IMM SDK for Python)
# -*- coding: utf-8 -*-
# This file is auto-generated, don't edit it. Thanks.
import sys
import os

from typing import List

from alibabacloud_imm20200930.client import Client as imm20200930Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_imm20200930 import models as imm_20200930_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client(
        access_key_id: str,
        access_key_secret: str,
    ) -> imm20200930Client:
        """
        Use your AccessKey ID and AccessKey secret to initialize the client.
        @param access_key_id:
        @param access_key_secret:
        @return: Client
        @throws Exception
        """
        config = open_api_models.Config(
            access_key_id=access_key_id,
            access_key_secret=access_key_secret
        )
        # Specify the IMM endpoint.
        config.endpoint = f'imm.cn-beijing.aliyuncs.com'
        return imm20200930Client(config)

    @staticmethod
    def main(
        args: List[str],
    ) -> None:
        # The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. We recommend that you use a RAM user to call API operations or perform routine O&M.
        # For security reasons, we recommend that you do not embed your AccessKey pair in your project code.
        # In this example, the AccessKey pair is obtained from the environment variables to implement identity verification for API access. For information about how to configure environment variables, visit https://www.alibabacloud.com/help/en/imm/developer-reference/configure-environment-variables.
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        notification_mns = imm_20200930_models.MNS(
            topic_name='test-topic'
        )
        notification = imm_20200930_models.Notification(
            mns=notification_mns
        )
        input_file_custom_labels = {
            'category': 'Persons'
        }
        input_file = imm_20200930_models.InputFile(
            uri='oss://test-bucket/test-object1.jpg',
            custom_labels=input_file_custom_labels
        )
        index_file_meta_request = imm_20200930_models.IndexFileMetaRequest(
            project_name='test-project',
            dataset_name='test-dataset',
            file=input_file,
            notification=notification
        )
        runtime = util_models.RuntimeOptions()
        try:
            # Write your code to print the response of the API operation if necessary.
            client.index_file_meta_with_options(index_file_meta_request, runtime)
        except Exception as error:
            # Print the error message if necessary.
            UtilClient.assert_as_string(error.message)

    @staticmethod
    async def main_async(
        args: List[str],
    ) -> None:
        # The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. We recommend that you use a RAM user to call API operations or perform routine O&M.
        # For security reasons, we recommend that you do not embed your AccessKey pair in your project code.
        # In this example, the AccessKey pair is obtained from the environment variables to implement identity verification for API access. For information about how to configure environment variables, visit https://www.alibabacloud.com/help/en/imm/developer-reference/configure-environment-variables.
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        notification_mns = imm_20200930_models.MNS(
            topic_name='test-topic'
        )
        notification = imm_20200930_models.Notification(
            mns=notification_mns
        )
        input_file_custom_labels = {
            'category': 'Persons'
        }
        input_file = imm_20200930_models.InputFile(
            uri='oss://test-bucket/test-object1.jpg',
            custom_labels=input_file_custom_labels
        )
        index_file_meta_request = imm_20200930_models.IndexFileMetaRequest(
            project_name='test-project',
            dataset_name='test-dataset',
            file=input_file,
            notification=notification
        )
        runtime = util_models.RuntimeOptions()
        try:
            # Write your code to print the response of the API operation if necessary.
            await client.index_file_meta_with_options_async(index_file_meta_request, runtime)
        except Exception as error:
            # Print the error message if necessary.
            UtilClient.assert_as_string(error.message)


if __name__ == '__main__':
    Sample.main(sys.argv[1:])
Use the IMM console
You can use the IMM console to index the metadata of specified objects in an OSS bucket into the dataset. IMM runs the workflow as asynchronous tasks to extract and index the metadata of the objects that you add, so you can configure notification settings to receive task information. For information about how to specify a Simple Message Queue topic for receiving task execution results, see Asynchronous message examples.
Open the project in which the dataset was created, and go to Data management and indexing > Dataset List. In the dataset list, find the dataset that you created (test-dataset in this example).
Click the name of the dataset. On the dataset details page, click Data access. Click Batch Add.
On the Batch Add tab, click Add File to Dataset. In the Add File to Dataset panel, specify a Simple Message Queue topic, click Select File to select the objects whose metadata you want to index, and click OK.