Intelligent Media Management: Create a dataset

Last Updated: Apr 22, 2025

A dataset is a container for metadata in Intelligent Media Management (IMM). This topic describes how to create a dataset.

Usage notes

  • Searches across datasets are not supported. Therefore, we recommend that you store related files in the same dataset and unrelated files in different datasets.

  • The number of datasets in a project cannot exceed the specified upper limit.

  • The number of files in a dataset cannot exceed the file limit of the dataset. The total number of files across all datasets in a project cannot exceed the file limit of the project.

  • The number of Object Storage Service (OSS) buckets mapped to a dataset cannot exceed the mapped-bucket limit of the dataset. The total number of OSS buckets mapped to all datasets in a project cannot exceed the mapped-bucket limit of the project.

  • When you create a metadata index for a dataset in a project, the workflow template of the dataset takes precedence over the workflow template of the project. If the workflow template of the dataset is empty, the workflow template of the project is used. For more information about workflow templates, see Workflow templates and operators.
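
The precedence rule in the last usage note can be sketched as a small helper. This is only an illustration of the rule as described above, not IMM code: the function name effective_template_id and the placeholder project template ID are hypothetical.

```python
from typing import Optional


def effective_template_id(dataset_template: Optional[str],
                          project_template: Optional[str]) -> Optional[str]:
    """Return the workflow template that applies when a metadata index is created.

    Hypothetical helper that mirrors the documented rule: a non-empty dataset
    template takes precedence; otherwise the project template is used.
    """
    if dataset_template:
        return dataset_template
    return project_template or None


# The dataset template wins when it is set.
print(effective_template_id("Official:ImageManagement", "project-template-id"))
# The project template is used when the dataset template is empty.
print(effective_template_id("", "project-template-id"))
```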

Prerequisites

  • An AccessKey pair is created and obtained. For more information, see Create an AccessKey pair.

  • OSS is activated, a bucket is created, and objects are uploaded to the bucket. For more information, see Upload objects.

  • IMM is activated. For more information, see Activate IMM.

  • A project is created in the IMM console. For more information about how to create a project by using the IMM console, see Create a project.

    Note
    • You can also call the CreateProject operation to create a project. For more information, see CreateProject.

    • You can call the ListProjects operation to query existing projects in a specific region. For more information, see ListProjects.

Examples

Create a dataset

The following sample code calls the CreateDataset operation to create a dataset named test-dataset in the test-project project. The dataset has the description "Dataset 1" and uses the Official:ImageManagement workflow template.

  • Sample request

    {
        "ProjectName": "test-project",
        "DatasetName": "test-dataset",
        "Description": "Dataset 1",
        "TemplateId": "Official:ImageManagement"
    }

  • Sample response

    {
        "RequestId": "9AB4BD43-C4E5-06AA-A7AB-****",
        "Dataset": {
            "FileCount": 0,
            "BindCount": 0,
            "ProjectName": "test-project",
            "CreateTime": "2022-07-05T10:43:32.429344821+08:00",
            "DatasetMaxTotalFileSize": 90000000000000000,
            "DatasetMaxRelationCount": 100000000000,
            "DatasetMaxFileCount": 100000000,
            "DatasetName": "test-dataset",
            "DatasetMaxBindCount": 10,
            "UpdateTime": "2022-07-05T10:43:32.429344821+08:00",
            "DatasetMaxEntityCount": 10000000000,
            "TotalFileSize": 0,
            "TemplateId": "Official:ImageManagement"
        }
    }

  • Complete sample code (for IMM SDK for Python V1.27.3)

    # -*- coding: utf-8 -*-
    
    import os
    from alibabacloud_imm20200930.client import Client as imm20200930Client
    from alibabacloud_tea_openapi import models as open_api_models
    from alibabacloud_imm20200930 import models as imm_20200930_models
    from alibabacloud_tea_util import models as util_models
    from alibabacloud_tea_util.client import Client as UtilClient
    
    
    class Sample:
        def __init__(self):
            pass
    
        @staticmethod
        def create_client(
            access_key_id: str,
            access_key_secret: str,
        ) -> imm20200930Client:
            """
            Use your AccessKey ID and AccessKey secret to initialize the client. 
            @param access_key_id:
            @param access_key_secret:
            @return: Client
            @throws Exception
            """
            config = open_api_models.Config(
                access_key_id=access_key_id,
                access_key_secret=access_key_secret
            )
            # Specify the IMM endpoint. 
            config.endpoint = 'imm.cn-shenzhen.aliyuncs.com'
            return imm20200930Client(config)
    
        @staticmethod
        def main() -> None:
            # The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. We recommend that you use a RAM user to call API operations or perform routine O&M. 
            # For security reasons, we recommend that you do not embed your AccessKey pair in your project code. 
            # In this example, the AccessKey pair is obtained from the environment variables to implement identity verification for API access. For information about how to configure environment variables, visit https://www.alibabacloud.com/help/en/imm/developer-reference/configure-environment-variables. 
            imm_access_key_id = os.getenv("AccessKeyId")
            imm_access_key_secret = os.getenv("AccessKeySecret")
            client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
            create_dataset_request = imm_20200930_models.CreateDatasetRequest(
                project_name='test-project',
                dataset_name='test-dataset',
                description='Dataset 1',
                template_id='Official:ImageManagement'
            )
            runtime = util_models.RuntimeOptions()
            try:
                # Print the response of the API operation. 
                response = client.create_dataset_with_options(create_dataset_request, runtime)
                print(response.body.to_map())
            except Exception as error:
                # Print the error message. Not every exception has a `message` attribute, so fall back to str(error). 
                message = getattr(error, 'message', None) or str(error)
                print(message)
    
    
    if __name__ == '__main__':
        Sample.main()
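
In the sample above, os.getenv returns None when an environment variable is not set, and the failure only surfaces later when the request is sent. The following is a minimal sketch of an early check, assuming the same AccessKeyId and AccessKeySecret variable names as the sample. The helper name load_credentials is hypothetical and is not part of the IMM SDK.

```python
import os
from typing import Mapping, Tuple


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (AccessKey ID, AccessKey secret), or raise if either is unset.

    Hypothetical helper: `env` defaults to the process environment and can be
    replaced with a plain dict for testing.
    """
    names = ("AccessKeyId", "AccessKeySecret")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["AccessKeyId"], env["AccessKeySecret"]
```

The returned pair can then be passed to Sample.create_client in place of the two os.getenv calls.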

Query dataset information

The following sample code calls the GetDataset operation to query information about the test-dataset dataset in the test-project project.

  • Sample request

    {
        "ProjectName": "test-project",
        "DatasetName": "test-dataset"
    }

  • Sample response

    {
        "RequestId": "9AB4BD43-C4E5-06AA-E4B2-****",
        "Dataset": {
            "FileCount": 0,
            "BindCount": 0,
            "ProjectName": "test-project",
            "CreateTime": "2022-07-05T10:43:32.429344821+08:00",
            "DatasetMaxTotalFileSize": 90000000000000000,
            "DatasetMaxRelationCount": 100000000000,
            "DatasetMaxFileCount": 100000000,
            "DatasetName": "test-dataset",
            "DatasetMaxBindCount": 10,
            "UpdateTime": "2022-07-05T10:43:32.429344821+08:00",
            "DatasetMaxEntityCount": 10000000000,
            "TotalFileSize": 0,
            "TemplateId": "Official:ImageManagement"
        }
    }

  • Complete sample code (for IMM SDK for Python V1.27.3)

    # -*- coding: utf-8 -*-
    
    import os
    from alibabacloud_imm20200930.client import Client as imm20200930Client
    from alibabacloud_tea_openapi import models as open_api_models
    from alibabacloud_imm20200930 import models as imm_20200930_models
    from alibabacloud_tea_util import models as util_models
    from alibabacloud_tea_util.client import Client as UtilClient
    
    
    class Sample:
        def __init__(self):
            pass
    
        @staticmethod
        def create_client(
            access_key_id: str,
            access_key_secret: str,
        ) -> imm20200930Client:
            """
            Use your AccessKey ID and AccessKey secret to initialize the client. 
            @param access_key_id:
            @param access_key_secret:
            @return: Client
            @throws Exception
            """
            config = open_api_models.Config(
                access_key_id=access_key_id,
                access_key_secret=access_key_secret
            )
            # Specify the IMM endpoint. 
            config.endpoint = 'imm.cn-shenzhen.aliyuncs.com'
            return imm20200930Client(config)
    
        @staticmethod
        def main() -> None:
            # The AccessKey pair of an Alibaba Cloud account has permissions on all API operations. We recommend that you use a RAM user to call API operations or perform routine O&M. 
            # For security reasons, we recommend that you do not embed your AccessKey pair in your project code. 
            # In this example, the AccessKey pair is obtained from the environment variables to implement identity verification for API access. For information about how to configure environment variables, visit https://www.alibabacloud.com/help/en/imm/developer-reference/configure-environment-variables. 
            imm_access_key_id = os.getenv("AccessKeyId")
            imm_access_key_secret = os.getenv("AccessKeySecret")
            client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
            get_dataset_request = imm_20200930_models.GetDatasetRequest(
                # Specify the name of the IMM project. 
                project_name='test-project',
                # Specify the name of the dataset. 
                dataset_name='test-dataset',
                # Specify that the operation does not return statistics such as the number of files and file size. 
                with_statistics=False
            )
            runtime = util_models.RuntimeOptions()
            try:
                # Print the response of the API operation. 
                response = client.get_dataset_with_options(get_dataset_request, runtime)
                print(response.body.to_map())
            except Exception as error:
                # Print the error message. Not every exception has a `message` attribute, so fall back to str(error). 
                message = getattr(error, 'message', None) or str(error)
                print(message)
    
    
    if __name__ == '__main__':
        Sample.main()
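
The Dataset object in the sample responses pairs each usage field with a limit field, which makes it possible to estimate how much headroom a dataset has against the limits described in the usage notes. The sketch below works on a plain dictionary shaped like the sample response. The helper name remaining_quota is hypothetical, and the pairing of fields (for example, BindCount with DatasetMaxBindCount) is an assumption based on the field names.

```python
from typing import Any, Dict, Optional

# Assumed pairing of usage fields and limit fields, based on the sample response.
QUOTA_FIELDS = {
    "files": ("FileCount", "DatasetMaxFileCount"),
    "bindings": ("BindCount", "DatasetMaxBindCount"),
    "total_file_size": ("TotalFileSize", "DatasetMaxTotalFileSize"),
}


def remaining_quota(dataset: Dict[str, Any]) -> Dict[str, Optional[int]]:
    """Return limit minus usage for each quota, or None if a field is missing."""
    result: Dict[str, Optional[int]] = {}
    for label, (used_key, limit_key) in QUOTA_FIELDS.items():
        used = dataset.get(used_key)
        limit = dataset.get(limit_key)
        # Statistics may be omitted from a response, so report unknown values as None.
        result[label] = None if used is None or limit is None else limit - used
    return result


# Values copied from the sample response.
sample_dataset = {
    "FileCount": 0,
    "BindCount": 0,
    "TotalFileSize": 0,
    "DatasetMaxFileCount": 100000000,
    "DatasetMaxBindCount": 10,
    "DatasetMaxTotalFileSize": 90000000000000000,
}
print(remaining_quota(sample_dataset))
```

With the SDK sample above, the same helper could be applied to response.body.to_map()["Dataset"], assuming the map has the same shape as the sample response.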