Intelligent Media Management (IMM) is designed to process large amounts of data efficiently. It applies AI-driven analysis capabilities based on consistent standards to meet data analysis and processing requirements in specific scenarios and to implement device-cloud integration. The IMM service architecture consists of three layers: the processing engine layer, the metadata management layer, and the scenario-based encapsulation layer.
The architecture sits between downstream infrastructure that it depends on and upstream applications that it serves.
In the downstream direction, IMM securely accesses unstructured data (for example, images and videos) stored in Alibaba Cloud storage services such as Object Storage Service (OSS) and Apsara File Storage NAS and extracts information from the unstructured data.
In the upstream direction, IMM provides encapsulated scenario-specific capabilities that can be added to image and video applications such as online storage, cloud albums, social galleries, and home surveillance systems.

Processing engine layer
IMM uses a distributed computing framework that deploys computing resources near the regions of Alibaba Cloud storage services and supports both asynchronous batch processing and real-time synchronous processing. After an IMM project is associated with resources in Alibaba Cloud storage services, such as a directory prefix or an object in OSS, IMM automatically processes the data by using its data analysis and processing algorithms. The processing engine layer provides the following features:
Document format conversion
Converts documents in 48 formats, including Microsoft Office document formats, to JPG, PNG, PDF, TXT, and VECTOR formats. This feature can be used in many document processing scenarios, such as the preview of documents in online storage services.
Content recognition
Recognizes information such as scenes, objects, and events in images and automatically labels images. This feature can be used in scenarios such as image content moderation and image retrieval.
Face detection
Detects faces in images, along with face attributes such as age, gender, and mood. This feature can be used in scenarios such as album classification.
QR code recognition
Detects QR codes in images and reads the information stored in them. This feature can be used in scenarios such as image content moderation.
Human body detection
Detects the positions of people in images and includes a confidence level for each detection in the result. This feature can be used in scenarios such as abnormal behavior detection.
Face search
Searches for the top N faces that are most similar to the face in the specified image, with results arranged in descending order of similarity. This feature can be used in scenarios such as member management, album classification, and person search.
Face comparison
Compares the largest face in each of two images to measure face similarity. This feature can be used in scenarios such as identity verification.
Blind watermarking
Adds an image or text blind watermark to an image. The blind watermark stays invisible until you extract it by using the blind watermark decoding feature of IMM. This feature can be used in scenarios such as image copyright protection.
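The ranking behavior described under Face search (return the top N most similar faces in descending order of similarity) can be illustrated with a minimal sketch. This is not the IMM algorithm or API. It assumes each face has already been reduced to a numeric feature vector and uses cosine similarity as a stand-in measure. The names cosine_similarity, top_n_similar_faces, and the sample gallery are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_n_similar_faces(
    query: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    n: int,
) -> List[Tuple[str, float]]:
    """Score every gallery face against the query and keep the n best matches.

    The result is sorted in descending order of similarity, mirroring the
    ordering that the Face search description promises.
    """
    scored = [(face_id, cosine_similarity(query, vec)) for face_id, vec in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical feature vectors; real face features have far more dimensions.
gallery = {
    "face-a": [1.0, 0.0],
    "face-b": [0.0, 1.0],
    "face-c": [1.0, 1.0],
}
results = top_n_similar_faces([1.0, 0.0], gallery, n=2)
```

In this toy gallery, face-a is identical to the query and face-c is the next closest, so those two are returned in that order. A production service would typically use an approximate nearest-neighbor index instead of scoring every face.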

Metadata management layer
Based on the capabilities of the processing engine layer and in-depth scene analysis, IMM encapsulates scene metadata management capabilities so that developers can build applications without having to maintain metadata index databases themselves. IMM supports the following metadata indexes:
Indexing for face clustering
Create a metadata index for face clustering to retrieve similar faces. This type of index is useful for scenarios such as people albums in online storage, stranger detection in home surveillance systems, and customer management in the new retail industry.
Indexing for label grouping
Create a metadata index for labels to search for images by label. This type of index is useful for label detection in various scenarios, including scene-specific albums in online storage, pet tracking within surveillance systems, and the identification of pornographic images.
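To make the idea of a label index concrete, the following sketch shows one common way to support "search for images by label": an inverted index that maps each label to the set of images carrying it. It is an illustration only and does not reflect how IMM stores metadata. The LabelIndex class, its methods, and the sample oss:// paths are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class LabelIndex:
    """A toy inverted index from label to image URIs."""

    def __init__(self) -> None:
        self._images_by_label: Dict[str, Set[str]] = defaultdict(set)

    def add(self, image_uri: str, labels: Iterable[str]) -> None:
        """Register an image under each of its labels (case-insensitive)."""
        for label in labels:
            self._images_by_label[label.lower()].add(image_uri)

    def search(self, label: str) -> List[str]:
        """Return the URIs of all images tagged with the label, sorted for stable output."""
        # Use .get so that a lookup for an unknown label does not create an empty entry.
        return sorted(self._images_by_label.get(label.lower(), set()))


# Hypothetical images and labels, as a content recognition step might produce them.
index = LabelIndex()
index.add("oss://example-bucket/photos/1.jpg", ["Beach", "Dog"])
index.add("oss://example-bucket/photos/2.jpg", ["Dog", "Park"])
index.add("oss://example-bucket/photos/3.jpg", ["Beach"])

dog_images = index.search("dog")
```

Here the search for "dog" returns the first two images. Keeping such an index current as images are added, relabeled, or deleted is the maintenance work that a managed metadata layer takes off the application.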

Scenario-based encapsulation layer
IMM encapsulates the capabilities of the processing engine layer and the metadata management layer so that they can be easily added to your applications. You can apply these AI-driven capabilities to extend the functionality of your own scenarios. IMM supports the following scenarios:
Document scenarios
IMM provides document format conversion and preview that you can use to implement intelligent document management capabilities.
Image scenarios
IMM integrates AI-driven capabilities such as content recognition and face detection that you can use to implement intelligent image management capabilities.