Image clustering analyzes the visual content of images and groups similar ones into clusters. This helps you find, manage, and deduplicate images in large collections. A common use case is sorting photos taken in continuous shooting mode, where many nearly identical images need to be grouped.
Scenarios
Image clustering supports the following use cases:
Online storage and photo album management: Automatically create clusters of similar images for more personalized album organization. For example, group vacation photos by scene or subject without manual sorting.
Image deduplication: Identify and deduplicate similar images across your applications or albums to reduce storage usage and costs.
Prerequisites
Before you create an image clustering task, complete the following steps:
Create an index based on the metadata in your application scenario. For more information, see Create a metadata index.
Use the Official:ImageManagement workflow template when you create the dataset.
Create an image clustering task
Call the CreateSimilarImageClusteringTask operation to create an asynchronous clustering task that incrementally processes the files in the specified dataset and groups similar images into clusters.
The following example creates an image clustering task for the test-dataset dataset in the test-project project.
Image clustering incurs API call fees. For more information, see Billable items.
Sample request
{
"ProjectName": "test-project",
"DatasetName": "test-dataset"
}
Sample response
{
"TaskId": "SimilarImageClustering-3b4ce06c-f19e-43ba-8ae9-29a4ba617eac",
"RequestId": "0FA88E7A-85C8-5016-8182-80FA2A711D29",
"EventId": "3BF-1mc8MI8FsJWMMgJhDO6O98mepq1"
}
Sample code
# -*- coding: utf-8 -*-
# This file is auto-generated, don't edit it. Thanks.
import os
import sys
from typing import List

from alibabacloud_imm20200930.client import Client as imm20200930Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_imm20200930 import models as imm_20200930_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client(
        access_key_id: str,
        access_key_secret: str,
    ) -> imm20200930Client:
        """
        Use your AccessKey ID and AccessKey secret to initialize the client.
        @param access_key_id:
        @param access_key_secret:
        @return: Client
        @throws Exception
        """
        config = open_api_models.Config(
            # (Required) Specify your AccessKey ID.
            access_key_id=access_key_id,
            # (Required) Specify your AccessKey secret.
            access_key_secret=access_key_secret
        )
        # Specify the IMM endpoint. For more information about endpoints, visit https://api.aliyun.com/product/imm.
        config.endpoint = 'imm.cn-beijing.aliyuncs.com'
        return imm20200930Client(config)

    @staticmethod
    def main(
        args: List[str],
    ) -> None:
        # Make sure that the ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables are configured.
        # If the project code is leaked, the AccessKey pair may be leaked and the security of resources within your account may be compromised. The following lines show how to obtain an AccessKey pair from the environment variables and use the AccessKey pair to call API operations. We recommend that you use STS access credentials for higher security. For more information, visit https://www.alibabacloud.com/help/document_detail/378659.html.
        client = Sample.create_client(os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'], os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET'])
        create_similar_image_clustering_task_request = imm_20200930_models.CreateSimilarImageClusteringTaskRequest(
            project_name='test-project',
            dataset_name='test-dataset'
        )
        runtime = util_models.RuntimeOptions()
        try:
            # Write your code to print the response of the API operation if necessary.
            client.create_similar_image_clustering_task_with_options(create_similar_image_clustering_task_request, runtime)
        except Exception as error:
            # Handle exceptions with caution in your actual business scenario and never ignore exceptions in your project. In this example, error messages are printed to the console.
            # Print error messages.
            print(error.message)
            # Show the URL for troubleshooting.
            print(error.data.get("Recommend"))
            UtilClient.assert_as_string(error.message)

    @staticmethod
    async def main_async(
        args: List[str],
    ) -> None:
        # Make sure that the ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables are configured.
        # If the project code is leaked, the AccessKey pair may be leaked and the security of resources within your account may be compromised. The following lines show how to obtain an AccessKey pair from the environment variables and use the AccessKey pair to call API operations. We recommend that you use STS access credentials for higher security. For more information, visit https://www.alibabacloud.com/help/document_detail/378659.html.
        client = Sample.create_client(os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'], os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET'])
        create_similar_image_clustering_task_request = imm_20200930_models.CreateSimilarImageClusteringTaskRequest(
            project_name='test-project',
            dataset_name='test-dataset'
        )
        runtime = util_models.RuntimeOptions()
        try:
            # Write your code to print the response of the API operation if necessary.
            await client.create_similar_image_clustering_task_with_options_async(create_similar_image_clustering_task_request, runtime)
        except Exception as error:
            # Handle exceptions with caution in your actual business scenario and never ignore exceptions in your project. In this example, error messages are printed to the console.
            # Print error messages.
            print(error.message)
            # Show the URL for troubleshooting.
            print(error.data.get("Recommend"))
            UtilClient.assert_as_string(error.message)


if __name__ == '__main__':
    Sample.main(sys.argv[1:])
Query image clusters
After the clustering task completes, call the QuerySimilarImageClusters operation to query the image clusters.
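Because the clustering task is asynchronous, a caller typically has to wait for it to finish before querying clusters. The following is a minimal polling sketch, not part of the official sample. The wait_for_task helper and its get_status callable are hypothetical names: get_status stands in for whatever status lookup you use (for example, a wrapper around a task query operation, which this topic does not show), and the status strings "Succeeded" and "Failed" are assumptions that you should check against the API reference.

```python
import time
from typing import Callable


def wait_for_task(
    get_status: Callable[[], str],
    interval_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal status or the timeout elapses.

    get_status is a caller-supplied function (hypothetical) that returns the
    current task status as a string.
    """
    # Assumed terminal status values; verify them against the API reference.
    terminal_statuses = {"Succeeded", "Failed"}
    waited = 0.0
    while True:
        status = get_status()
        if status in terminal_statuses:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"task is still {status!r} after {waited:.0f} seconds")
        # Wait before the next status check; sleep is injectable for testing.
        sleep(interval_seconds)
        waited += interval_seconds
```

Passing the status lookup in as a callable keeps the loop independent of any particular SDK call, so the same helper works with the synchronous client or with a stub in tests.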
The following example queries image clusters in the test-dataset dataset of the test-project project.
Sample request
{
"ProjectName": "test-project",
"DatasetName": "test-dataset"
}
Sample response
{
"SimilarImageClusters": [
{
"ObjectId": "SimilarImageCluster-e5cdfdad-c02a-4093-aa58-400ff2e4520b",
"CreateTime": "2024-03-07T14:57:13.047481088+08:00",
"UpdateTime": "2024-03-07T14:57:13.047481088+08:00",
"Files": [
{
"ImageScore": 0.749,
"URI": "oss://test-ivanivan/p637447.jpeg"
},
{
"ImageScore": 0.749,
"URI": "oss://test-ivanivan/p637448.jpeg"
}
]
},
{
"ObjectId": "SimilarImageCluster-3350bbcf-a044-42f2-bedc-57eede4d476f",
"CreateTime": "2024-03-07T14:57:12.955958016+08:00",
"UpdateTime": "2024-03-07T14:57:12.955958016+08:00",
"Files": [
{
"ImageScore": 0.736,
"URI": "oss://test-ivanivan/hanhong.png"
},
{
"ImageScore": 0.736,
"URI": "oss://test-ivanivan/hanhong2.png"
}
]
},
{
"ObjectId": "SimilarImageCluster-4c239671-5504-4910-90f6-03cd863f686e",
"CreateTime": "2024-03-07T14:57:12.886128896+08:00",
"UpdateTime": "2024-03-07T14:57:12.886128896+08:00",
"Files": [
{
"ImageScore": 0.692,
"URI": "oss://test-ivanivan/dir1/mp4_png.png"
},
{
"ImageScore": 0.67,
"URI": "oss://test-ivanivan/dir1/demo.gif"
}
]
},
{
"ObjectId": "SimilarImageCluster-e77ac1ad-44b4-49d5-baa7-ad871efd0503",
"CreateTime": "2024-03-07T14:57:12.817118976+08:00",
"UpdateTime": "2024-03-07T14:57:12.817118976+08:00",
"Files": [
{
"ImageScore": 0.717,
"URI": "oss://test-ivanivan/OIP-C.jpeg"
},
{
"ImageScore": 0.717,
"URI": "oss://test-ivanivan/OIP-C1.jpeg"
}
]
},
{
"ObjectId": "SimilarImageCluster-315751c6-5b69-43b4-8c37-00e7ad2ec0e6",
"CreateTime": "2024-03-07T14:57:12.745981952+08:00",
"UpdateTime": "2024-03-07T14:57:12.745981952+08:00",
"Files": [
{
"ImageScore": 0.714,
"URI": "oss://test-ivanivan/A6.jpg"
},
{
"ImageScore": 0.709,
"URI": "oss://test-ivanivan/A4 (1).jpg"
}
]
},
{
"ObjectId": "SimilarImageCluster-140d3e92-7e67-4b9d-8066-3aea778e5898",
"CreateTime": "2024-03-07T14:57:12.65400192+08:00",
"UpdateTime": "2024-03-07T14:57:12.65400192+08:00",
"Files": [
{
"ImageScore": 0.709,
"URI": "oss://test-ivanivan/A1 (1).jpg"
},
{
"ImageScore": 0.709,
"URI": "oss://test-ivanivan/A2 (1).jpg"
}
]
}
],
"RequestId": "5830FFD2-C2E5-5431-9180-EBBACCC2FECE",
"NextToken": ""
}
Sample code
# -*- coding: utf-8 -*-
# This file is auto-generated, don't edit it. Thanks.
import os
import sys
from typing import List

from alibabacloud_imm20200930.client import Client as imm20200930Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_imm20200930 import models as imm_20200930_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client(
        access_key_id: str,
        access_key_secret: str,
    ) -> imm20200930Client:
        """
        Use your AccessKey ID and AccessKey secret to initialize the client.
        @param access_key_id:
        @param access_key_secret:
        @return: Client
        @throws Exception
        """
        config = open_api_models.Config(
            # (Required) Specify your AccessKey ID.
            access_key_id=access_key_id,
            # (Required) Specify your AccessKey secret.
            access_key_secret=access_key_secret
        )
        # Specify the IMM endpoint. For more information about endpoints, visit https://api.aliyun.com/product/imm.
        config.endpoint = 'imm.cn-beijing.aliyuncs.com'
        return imm20200930Client(config)

    @staticmethod
    def main(
        args: List[str],
    ) -> None:
        # Make sure that the ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables are configured.
        # If the project code is leaked, the AccessKey pair may be leaked and the security of resources within your account may be compromised. The following lines show how to obtain an AccessKey pair from the environment variables and use the AccessKey pair to call API operations. We recommend that you use STS access credentials for higher security. For more information, visit https://www.alibabacloud.com/help/document_detail/378659.html.
        client = Sample.create_client(os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'], os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET'])
        query_similar_image_clusters_request = imm_20200930_models.QuerySimilarImageClustersRequest(
            dataset_name='test-dataset',
            project_name='test-project'
        )
        runtime = util_models.RuntimeOptions()
        try:
            # Write your code to print the response of the API operation if necessary.
            client.query_similar_image_clusters_with_options(query_similar_image_clusters_request, runtime)
        except Exception as error:
            # Handle exceptions with caution in your actual business scenario and never ignore exceptions in your project. In this example, error messages are printed to the console.
            # Print error messages.
            print(error.message)
            # Show the URL for troubleshooting.
            print(error.data.get("Recommend"))
            UtilClient.assert_as_string(error.message)

    @staticmethod
    async def main_async(
        args: List[str],
    ) -> None:
        # Make sure that the ALIBABA_CLOUD_ACCESS_KEY_ID and ALIBABA_CLOUD_ACCESS_KEY_SECRET environment variables are configured.
        # If the project code is leaked, the AccessKey pair may be leaked and the security of resources within your account may be compromised. The following lines show how to obtain an AccessKey pair from the environment variables and use the AccessKey pair to call API operations. We recommend that you use STS access credentials for higher security. For more information, visit https://www.alibabacloud.com/help/document_detail/378659.html.
        client = Sample.create_client(os.environ['ALIBABA_CLOUD_ACCESS_KEY_ID'], os.environ['ALIBABA_CLOUD_ACCESS_KEY_SECRET'])
        query_similar_image_clusters_request = imm_20200930_models.QuerySimilarImageClustersRequest(
            dataset_name='test-dataset',
            project_name='test-project'
        )
        runtime = util_models.RuntimeOptions()
        try:
            # Write your code to print the response of the API operation if necessary.
            await client.query_similar_image_clusters_with_options_async(query_similar_image_clusters_request, runtime)
        except Exception as error:
            # Handle exceptions with caution in your actual business scenario and never ignore exceptions in your project. In this example, error messages are printed to the console.
            # Print error messages.
            print(error.message)
            # Show the URL for troubleshooting.
            print(error.data.get("Recommend"))
            UtilClient.assert_as_string(error.message)


if __name__ == '__main__':
    Sample.main(sys.argv[1:])
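Process the query results
The sample response for QuerySimilarImageClusters contains a NextToken field and a list of clusters, each with scored files. The following sketch, which is not part of the official sample, shows one way to walk all result pages and choose one image to keep per cluster for deduplication. It works on plain dictionaries shaped like the sample response. The names iter_clusters, split_keep_and_duplicates, and fetch_page are hypothetical. The sketch assumes that an empty NextToken marks the last page, that the token can be passed back in the next request, and that a higher ImageScore indicates the preferred image; verify these assumptions against the API reference.

```python
from typing import Callable, Iterator, List, Tuple


def iter_clusters(fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every cluster, following NextToken until it comes back empty.

    fetch_page is a caller-supplied function (hypothetical) that takes a
    token ("" for the first page) and returns one response as a dict.
    """
    token = ""
    while True:
        page = fetch_page(token)
        yield from page.get("SimilarImageClusters", [])
        # Assumption: an empty or missing NextToken means there are no more pages.
        token = page.get("NextToken") or ""
        if not token:
            break


def split_keep_and_duplicates(cluster: dict) -> Tuple[str, List[str]]:
    """Return (URI to keep, URIs of the other files) for one cluster.

    Assumption: the file with the highest ImageScore is the one to keep.
    Ties keep the file that appears first in the response.
    """
    files = cluster.get("Files", [])
    if not files:
        raise ValueError("cluster contains no files")
    ranked = sorted(files, key=lambda f: f["ImageScore"], reverse=True)
    return ranked[0]["URI"], [f["URI"] for f in ranked[1:]]
```

In practice, fetch_page would wrap the QuerySimilarImageClusters call shown above and convert the response body to a dictionary. Review the duplicate list before deleting anything, because clustered images are similar, not necessarily identical.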