Build a Multimodal Image Search Pipeline with Smart Tagging - Platform for AI

Use large multimodal models to label and index image datasets, then search and filter data using metadata for model training.

Overview

Multimodal data management preprocesses image data with large multimodal models and embedding models. Intelligent labeling and semantic indexing generate rich metadata, which you can search and filter to quickly identify data subsets for specific scenarios and then use for data labeling and model training. PAI datasets provide a full set of OpenAPIs for easy integration into custom platforms.

Limitations

PAI multimodal data management has the following limitations:

Supported regions: Hangzhou, Shanghai, Shenzhen, Ulanqab, Beijing, Guangzhou, Singapore, Germany, US (Virginia), China (Hong Kong), Tokyo, Jakarta, US (Silicon Valley), Kuala Lumpur, and Seoul.
Storage type: Only Object Storage Service (OSS) is supported.
File types: Only image files are supported. Supported formats include JPG, JPEG, PNG, GIF, BMP, TIFF, and WEBP.
Number of files: A single dataset version supports up to 1,000,000 files. Contact your PAI PDSA for higher limits.
Models:
- Labeling model: Qwen-VL-Max or Qwen-VL-Plus on Alibaba Cloud Model Studio.
- Indexing model: Multimodal embedding models from Alibaba Cloud Model Studio (such as tongyi-embedding-vision-plus) or GME models from PAI Model Gallery. Deploy these models on PAI-EAS.
Metadata storage:
- Metadata: Stored securely in PAI’s built-in metadatabase.
- Embedding vectors: Stored in one of the following custom vector databases:
  - Elasticsearch (Vector Enhanced Edition, version 8.17.0 or later)
  - OpenSearch (Vector Search Edition)
  - Milvus (version 2.4 or later)
  - Hologres (version 4.0.9 or later)
  - Lindorm (Vector Engine Edition)
Dataset processing mode: Intelligent labeling and semantic indexing tasks support both full and incremental modes.

Workflow

PAI multimodal data management usage instructions

Prerequisites

Enable PAI and configure workspace

Use your root account to enable PAI and create a workspace. Go to the PAI console. In the upper-left corner, select your region. Then click one-click authorization to enable the service.
Grant permissions to your operation account. Skip this step if using your root account. For RAM users, assign the workspace administrator role. For details, see Manage Workspaces > Member Role Configuration.

Activate Model Studio and create API key

Enable Alibaba Cloud Model Studio and create an API key. For instructions, see Get an API Key.

Create vector database

Create vector database instance

Multimodal dataset management supports the following Alibaba Cloud vector databases:

Elasticsearch (Vector Enhanced Edition, version 8.17.0 or later)
OpenSearch (Vector Search Edition)
Milvus (version 2.4 or later)
Hologres (version 4.0.9 or later)
Lindorm (Vector Engine Edition)

For instructions on creating vector database instances, see corresponding product documentation.

Network and whitelist configuration

Public network access

If your vector database instance has a public endpoint, add the IP addresses below to the public access whitelist. Multimodal data management can then access the instance over the Internet. For Elasticsearch, see Configure Public or Private Network Access Whitelist.

Region	IP list
Hangzhou	47.110.230.142, 47.98.189.92
Shanghai	47.117.86.159, 106.14.192.90
Shenzhen	47.106.88.217, 39.108.12.110
Ulanqab	8.130.24.177, 8.130.82.15
Beijing	39.107.234.20, 182.92.58.94

Private network access

Submit a ticket to request access.

Create vector index table (optional)

The system can create an index table automatically. Skip this step if you do not need a custom index table.

In some vector databases, an index table is called a collection or index.

Index table structure definition (Follow this structure):

Table Schema Definition

{
    "id":"text",                    // Primary key ID. Required in OpenSearch. Default in other databases.
    "index_set_id": "keyword",      // Index set ID. Must be indexed.
    "file_meta_id": "text",         // File metadata ID.
    "dataset_id": "text",           // Dataset ID.
    "dataset_version": "text",      // Dataset version.
    "uri": "text",                  // OSS URI of the file.
    "file_vector": {                // Vector field.
        "type": "float",            // Vector type: float.
        "dims": 1536,               // Vector dimensions. Customize as needed.
        "similarity": "DotProduct"  // Similarity algorithm: cosine or dot product.
    }
}

This topic uses Elasticsearch as an example. The following Python code shows how to create a semantic index table. For other vector databases, see their documentation.

Example code: Create a semantic index table in Elasticsearch

from elasticsearch import Elasticsearch

# 1. Connect to your Alibaba Cloud Elasticsearch instance.
# Note:
# (1) Use Python 3.9 or later: python3 -V
# (2) Install Elasticsearch client version 8.x: pip show elasticsearch
# (3) If using a VPC endpoint, ensure your client and Elasticsearch instance are in the same VPC.
#     Otherwise, use the public endpoint and add your client's public IP to Elasticsearch whitelist.
# Default username is elastic.
es_client = Elasticsearch(
    hosts=["http://es-cn-l4p***5z.elasticsearch.aliyuncs.com:9200"],
    basic_auth=("{userName}", "{password}"),
)

# 2. Define the index name and mapping. HNSW is used by default.
index_name = "dataset_embed_test"
index_mapping = {
    "settings": {
        "number_of_shards": 1,          # Number of shards.
        "number_of_replicas": 1         # Number of replicas.
    },
    "mappings": {
        "properties": {
            "index_set_id": {
                "type": "keyword"
            },
            "uri": {
                "type": "text"
            },
            "file_meta_id": {
                "type": "text"
            },
            "dataset_id": {
                "type": "text"
            },
            "dataset_version": {
                "type": "text"  
            },
            "file_vector": {
                "type": "dense_vector",  # Define file_vector as dense vector.
                "dims": 1536,  # Vector dimensions: 1536.
                "similarity": "dot_product"  # Similarity method: dot product.
            }
        }
    }
}

# 3. Create the index.
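if not es_client.indices.exists(index=index_name):
    # Pass settings and mappings as keyword arguments; the `body` parameter is deprecated in the 8.x client.
    es_client.indices.create(
        index=index_name,
        settings=index_mapping["settings"],
        mappings=index_mapping["mappings"],
    )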
    print(f"Index {index_name} created successfully!")
else:
    print(f"Index {index_name} already exists. No action taken.")

# 4. View the index mapping (optional).
# indexes = es_client.indices.get(index=index_name)
# print(indexes)
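
Elasticsearch's dot_product similarity expects unit-length float vectors. If you write embeddings into a custom index yourself, outside PAI's indexing job, you may need to normalize them first. The following is a minimal sketch under that assumption; `l2_normalize` is a hypothetical helper written for illustration, not part of the Elasticsearch client or any PAI SDK.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit L2 length (hypothetical helper for illustration)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


# Toy 2-dimensional vector; real embeddings would have 1536 dimensions here.
unit = l2_normalize([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```

With cosine similarity the index does not require unit-length vectors, so this step can be skipped in that case.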

Create dataset

In your PAI workspace, click AI Asset Management > Datasets > Create Dataset to open the dataset configuration page.
Configure dataset parameters. Key parameters are listed below. Use defaults for others.
1. Storage: Object Storage Service (OSS).
2. Type: Premium.
3. Content Type: Image.
4. OSS Path: Select the OSS path where your dataset is stored. If no dataset exists, download the sample dataset retrieval_demo_data, upload it to OSS, and try multimodal data management.
Note
Importing files or folders only records the path in the system. It does not copy the data.

Then click OK to create the dataset.

Create connections

Create connection for intelligent labeling

In your PAI workspace, click AI Asset Management > Connection > Model Service > Create Connection to open the connection creation page.
Select Alibaba Cloud Model Studio Service and configure the API key.
After successful creation, find your new connection in the list.

Create connection for semantic indexing

If you use Alibaba Cloud Model Studio’s semantic indexing service, skip this deployment step. Otherwise, in the left menu, click Model Gallery, then find and deploy the GME multimodal retrieval model. This creates an EAS service. Deployment takes about five minutes and is complete when the status shows Running.

Important
Stop and delete the service when no longer needed to avoid charges.
In your PAI workspace, click AI Asset Management > Connection > Model Service > Create Connection to open the connection creation page.
Configure the model connection based on whether you use Alibaba Cloud Model Studio’s semantic indexing model or your own EAS-deployed model.
Use Alibaba Cloud Model Studio’s semantic indexing model
- Connection Type: Select General Multimodal Embedding Model Service.
- Service Provider: Select Third-party service model.
- Model Name: tongyi-embedding-vision-plus.
- base_url: https://dashscope.aliyuncs.com/api/v1/services/embeddings/multimodal-embedding/multimodal-embedding
- api_key: Get your API key from Get an API Key and enter it here.
Use your own EAS-deployed semantic indexing model
- Connection Type: Select General Multimodal Embedding Model Service.
- Service Provider: Select PAI-EAS Model Service.
- EAS Service: Select the GME multimodal retrieval model you just deployed. If the service is not under your current account, choose a third-party model service.
After successful creation, find your new connection in the list.

Create vector database connection

In the left menu, click AI Asset Management > Connection > Database > Create Connection to open the connection creation page.

Multimodal search supports Milvus, Lindorm, OpenSearch, Elasticsearch, and Hologres. This example uses Elasticsearch. Select Elasticsearch and configure uri, username, and password. For details, see Create a Database Connection.

Connection format examples for each vector database:

Milvus

uri: http://xxx.milvus.aliyuncs.com:19530 
database: {your_data_base} 
token: root:{password}

OpenSearch

uri: http://xxxx.ha.aliyuncs.com
username: {username} 
password: {password}

Hologres

host: xxxx.hologres.aliyuncs.com
database: {your_data_base} 
port: {port}
access_key_id: {password}

Elasticsearch

uri: http://xxxx.elasticsearch.aliyuncs.com:9200
username: {username} 
password: {password}

Lindorm

uri: xxxx.lindorm.aliyuncs.com:{port}
username: {username} 
password: root:{password}

After successful creation, find your new connection in the list.

Create intelligent labeling job

Create intelligent label definition

In the left menu, click AI Asset Management > Datasets > Intelligent Tag Definition > Create Intelligent Tag Definition to open the label configuration page. Example configuration:

Guide Prompt: As an experienced driver with many years on highways and city roads, you know how to handle common driving scenarios.

Tag Definition:

Autonomous driving label example

{
    "Reflective tape": "Usually yellow or black-and-yellow striped. Attached to permanent obstacles like corners to warn drivers. Strip-shaped—not traffic cones, wheel locks, or water barrels!",
    "Wheel lock": "Also called a parking lock. Prevents unauthorized parking when raised. Always specify if raised or lowered. Raised if there is a frame.",
    "Lit construction vehicle": "Has two arrow-shaped lights, lit. Not present otherwise.",
    "Overturned vehicle": "Vehicle lying on its side.",
    "Fallen water barrel": "A plastic barrier used to divide roads or block traffic. Usually red and wall-shaped. Common on highways, city roads, and overpasses. Larger than cones and flat. Specify if fallen.",
    "Fallen traffic cone": "Also called a traffic cone or snow cone. Cone-shaped temporary road marker. Not rod- or flat-shaped. To check if fallen, see if the base touches the ground.",
    "Charging parking spot": "Near a wall with visible charging equipment or labeled 'new energy vehicle'. Found only in parking lots (indoor or outdoor). Wheel locks are unrelated.",
    "Speed bump": "Usually yellow or black-and-yellow. Narrow ridge across the road to slow vehicles. Never in parking spots.",
    "Deceleration lane markings": "Fishbone-style dashed lines on both sides of the lane, inside solid lines.",
    "Ramp": "Large curved highway segment. Usually on the right side of main highways. Only confirm at toll plazas.",
    "Ground shadow": "Clear shadows on the ground.",
    "Cloudy": "Only if sky is visible and clearly cloudy.",
    "Glare from headlights": "Headlights appear as streaks instead of points—common at night or in rain.",
    "Left-turn, right-turn, U-turn arrows": "White (or sometimes yellow) arrows painted on lanes—not green-white highway signs. Only count clear center-lane arrows. Right-turn: clockwise from base to tip. Left-turn: counterclockwise. U-turn: U-shaped.",
    "Crosswalk": "White parallel stripes on roads or parking lots, for pedestrians. Never on highways, ramps, or tunnels.",
    "Overexposure": "Camera overexposed due to direct sunlight—daytime only.",
    "Motor vehicle": "Any other motorized vehicle in view.",
    "Lane merge/diverge": "Where multiple lanes become one—or one splits into many.",
    "Intersection": "Road intersection without lane markings inside it.",
    "No-parking sign": "Sign hanging or standing on ground with 'no parking' text or circle-with-P-and-slash symbol.",
    "Lane markings": "Road lane lines—especially blurry ones.",
    "Stones or tires on road": "Obstacles blocking traffic.",
    "Tunnel": "Watch for tunnel entrances and exits.",
    "Wet road in rain": "Road surface wet and slippery in rain.",
    "Non-motorized vehicle": "Bicycles, e-bikes, wheelchairs, unicycles, shopping carts—parked or moving."
  }
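
Before pasting a large tag definition into the console, it can help to check it locally. The sketch below assumes the definition is a single JSON object that maps each tag name to a non-empty description, as in the example above. The checks and the `validate_tag_definition` name are illustrative assumptions, not PAI's own validation rules.

```python
import json


def validate_tag_definition(raw):
    """Parse a tag-definition JSON string and return a list of problems found."""
    try:
        tags = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(tags, dict):
        return ["top level must be a JSON object mapping tag name to description"]

    problems = []
    for name, description in tags.items():
        if not name.strip():
            problems.append("empty tag name")
        if not isinstance(description, str) or not description.strip():
            problems.append(f"tag {name!r} has no description")
    return problems


sample = '{"Tunnel": "Watch for tunnel entrances and exits.", "Speed bump": ""}'
issues = validate_tag_definition(sample)
print(issues)  # ["tag 'Speed bump' has no description"]
```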

Create offline intelligent labeling job

Click Custom Dataset. Click the dataset name to open its details page. Then click Dataset jobs.
On the jobs page, click Create job > Smart tag and configure job parameters.
- Dataset Version: Select the version to label, such as v1.
- Labeling Model Connection: Select your Alibaba Cloud Model Studio model connection.
- Smart Labeling Model: Supports Qwen-VL-Max and Qwen-VL-Plus.
- Max Concurrency: Set based on your EAS model service specs. Suggested maximum per GPU: 5.
- Intelligent Tag Definition: Select the definition you just created.
- Labeling Mode: Choose Increment or Full.
After successful creation, find your labeling job in the list. To monitor or stop it, click the link on the right side of the list.

Note
The first run builds metadata. Wait patiently—it may take time.

Create semantic indexing job

Click the dataset name to open its details page. In the Index Configuration section, click Edit.
Configure the index library.
- Index Model Connection: Select the index model connection created in Create connection for semantic indexing.
- Index Database Connection: Select the index database connection created in Create vector database connection.
- Index Database Table: Enter the index table name created in Create Vector Index Table (Optional): dataset_embed_test.
Click Save > Refresh Now. This starts a semantic indexing job for the selected dataset version. It updates semantic indexes for all files in that version. To view job details, click Semantic Indexing Task in the top-right corner of the dataset details page.

Note
The first run builds metadata. Wait patiently—it may take time.

If you cancel instead of clicking Refresh Now, create the job manually:

On the dataset details page, click Dataset jobs to go to the jobs page.

Click Create job > Semantic Indexing. Configure the dataset version. Set the maximum number of concurrent jobs based on your EAS model service specifications, with a recommended maximum of 5 per GPU. Then click OK to create the job.

Preview data

After intelligent labeling and semantic indexing jobs finish, click View Data on the dataset details page to preview images in that version.
In the View Data page, preview images. Switch between Gallery View and List View.
Click an image to view it full-size and see its labels.
Click the checkbox in the top-left corner of a thumbnail to select it. Hold Shift and click checkboxes to select multiple rows.

Search data (combined search)

In the left toolbar of the View Data page, use Index Retrieval and Search by Tag. Press Enter or click Search.
Index Retrieval: Text keyword search. Matches keywords against image index vectors. In Advanced Settings, set top-k and score threshold.
Index Retrieval: Search by image. Upload a local image or select one from OSS. Matches against image index vectors. In Advanced Settings, set top-k and score threshold.
Search by Tag: Matches keywords against image labels. Use logic: Include Any of Following (OR), Include All Following (AND), or Exclude Any of Following (NOT).
Metadata: Search by filename, storage path, or last modified time.

All search conditions use AND logic.
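
The tag conditions can be read as set operations: "include any" needs at least one listed tag, "include all" needs every listed tag, and "exclude any" rejects an image that carries any listed tag. The conditions are then combined with AND. The sketch below only illustrates that logic on in-memory data. `matches_tags` and the sample file names are hypothetical and say nothing about how PAI evaluates searches internally.

```python
def matches_tags(image_tags, include_any=(), include_all=(), exclude_any=()):
    """Return True if an image's tags satisfy all three conditions (ANDed together)."""
    tags = set(image_tags)
    if include_any and not tags & set(include_any):
        return False  # none of the "any" tags is present
    if include_all and not set(include_all) <= tags:
        return False  # at least one required tag is missing
    if exclude_any and tags & set(exclude_any):
        return False  # an excluded tag is present
    return True


images = {
    "a.jpg": ["Tunnel", "Motor vehicle"],
    "b.jpg": ["Crosswalk", "Motor vehicle"],
    "c.jpg": ["Tunnel", "Wet road in rain"],
}
hits = [
    name
    for name, tags in images.items()
    if matches_tags(
        tags,
        include_any=["Tunnel", "Crosswalk"],
        include_all=["Motor vehicle"],
        exclude_any=["Wet road in rain"],
    )
]
print(hits)  # ['a.jpg', 'b.jpg']
```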

Advanced data search (DSL)

Advanced search supports DSL search. DSL is a domain-specific language for complex queries. It supports grouping, Boolean logic (AND/OR/NOT), range comparisons (>, >=, <, <=), attribute existence (HAS/NOT HAS), token matching (:), and exact matching (=). For syntax details, see List Dataset File Metadata.

Export search results

Note

This step exports search results as a file list index for later model training or data analytics.

After searching, click Export Results at the bottom of the page. Two export options are available:

Export as file

Click Export as file. On the config page, set export content and target OSS directory. Click OK.
To track progress, click AI Asset Management > Job > Dataset jobs.
Use the exported result. After export, mount the result file and original dataset to your training environment (such as DLC or DSW instances). Then use code to read the index file and load target files for model training or analysis.
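
Reading the exported index in a training environment usually means turning each OSS URI into a path under the mount point. The sketch below is built on assumptions that are not confirmed by this topic: that the exported file is JSON Lines, that each line has a `uri` field holding an `oss://bucket/key` value, and that the dataset's OSS prefix is mounted at `mount_root`. Check the real export format and adjust the field name before use. Both function names are hypothetical.

```python
import json
import os
from urllib.parse import urlparse


def oss_uri_to_local(uri, mount_root, dataset_prefix=""):
    """Map oss://bucket/key to a path under mount_root (assumed mount layout)."""
    parsed = urlparse(uri)
    if parsed.scheme != "oss":
        raise ValueError(f"not an OSS URI: {uri}")
    key = parsed.path.lstrip("/")
    # Drop the part of the key that corresponds to the mounted dataset root.
    if dataset_prefix and key.startswith(dataset_prefix):
        key = key[len(dataset_prefix):].lstrip("/")
    return os.path.join(mount_root, key)


def iter_local_paths(index_lines, mount_root, dataset_prefix=""):
    """Yield one local path per non-empty line of the (assumed) JSONL index."""
    for line in index_lines:
        line = line.strip()
        if line:
            yield oss_uri_to_local(json.loads(line)["uri"], mount_root, dataset_prefix)


# Hypothetical index content and mount point, for illustration only.
lines = ['{"uri": "oss://my-bucket/datasets/demo/images/image1.jpg"}']
paths = list(iter_local_paths(lines, "/mnt/data", "datasets/demo/"))
print(paths)
```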

Export to logical dataset version

Export a search result from a premium dataset to a version of a logical dataset. Later, use the dataset SDK to access that version.

Click Export to logical dataset version. Select a target logical dataset and click Confirm.

If no logical dataset is available, create one:
Create a logical dataset. In the left menu, click AI Asset Management > Datasets > Create Dataset. Configure key parameters below. Adjust others as needed:
- Dataset Type: Select Logical.
- Metadata OSS path: Select an exported OSS path.
- Import method: Select Import later.
Click OK to create the dataset.
Use the logical dataset. After the import job finishes, the target logical dataset contains the exported metadata. Load and use it with the SDK. See the dataset details page for SDK usage instructions.

Install the SDK with:
```
pip install https://pai-sdk.oss-cn-shanghai.aliyuncs.com/dataset/pai_dataset_sdk-1.0.0-py3-none-any.whl
```

Custom semantic indexing model (optional)

Fine-tune a custom semantic retrieval model. After deploying it on EAS, create a model connection using the steps in Create connection for semantic indexing. Then use it in multimodal data management.

Prepare data

This topic provides a sample dataset retrieval_demo_data. Click to download.

Data format requirements

Each data sample is one JSON line in dataset.jsonl. Include these fields:

image_id: Unique identifier for the image (e.g., filename or ID).
tags: List of text labels for the image. Must be a string array.

Example format:

{
    "image_id": "c909f3df-ac4074ed",
    "tags": ["silver sedan", "white SUV", "city street", "snow", "night"]
}

File organization

Put all image files in an images folder. Place dataset.jsonl in the same directory as the images folder.

Directory example:

├── images
│   ├── image1.jpg
│   ├── image2.jpg
│   └── image3.jpg
└── dataset.jsonl

Important

Use the exact filename dataset.jsonl. Do not rename the images folder.
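
A quick local check of the layout and field types can catch formatting mistakes before a training job starts. The sketch below follows the requirements stated above: a dataset.jsonl file next to an images folder, a string `image_id`, and `tags` as a string array. The `check_dataset_dir` helper is hypothetical. It does not check that image files exist, because the mapping from `image_id` to a file name is not specified here.

```python
import json
import os
import tempfile


def check_dataset_dir(root):
    """Return a list of problems found in a fine-tuning data directory."""
    jsonl_path = os.path.join(root, "dataset.jsonl")
    if not os.path.isfile(jsonl_path):
        return ["dataset.jsonl is missing"]
    if not os.path.isdir(os.path.join(root, "images")):
        return ["images folder is missing"]

    problems = []
    with open(jsonl_path, encoding="utf-8") as fin:
        for line_no, line in enumerate(fin, start=1):
            if not line.strip():
                continue
            try:
                sample = json.loads(line)
            except json.JSONDecodeError:
                problems.append(f"line {line_no}: not valid JSON")
                continue
            if not isinstance(sample, dict):
                problems.append(f"line {line_no}: expected a JSON object")
                continue
            if not isinstance(sample.get("image_id"), str):
                problems.append(f"line {line_no}: image_id must be a string")
            tags = sample.get("tags")
            if not (isinstance(tags, list) and all(isinstance(t, str) for t in tags)):
                problems.append(f"line {line_no}: tags must be a list of strings")
    return problems


# Build a throwaway directory with one valid and one invalid sample.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "images"))
    with open(os.path.join(root, "dataset.jsonl"), "w", encoding="utf-8") as fout:
        fout.write(json.dumps({"image_id": "image1", "tags": ["snow", "night"]}) + "\n")
        fout.write(json.dumps({"image_id": "image2", "tags": "snow"}) + "\n")
    problems = check_dataset_dir(root)
print(problems)  # ['line 2: tags must be a list of strings']
```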

Train model

In Model Gallery, find retrieval-related models. Choose one based on size and compute resource needs.

Model	VRAM for fine-tuning (bs=4)	Fine-tuning speed (4×A800, samples/sec)	Deploy VRAM	Vector dimensions
GME-2B	14 GB	16.331	5 GB	1536
GME-7B	35 GB	13.868	16 GB	3584

As an example, train the GME-2B model. Click Train. Enter the data path (default is the sample data path). Enter the model output path. Then start training.

Deploy model

After training, click Deploy in the training job to deploy the fine-tuned model.

To deploy the original GME model instead, click Deploy in the Model Gallery tab.

After deployment, get the EAS Endpoint and Token.

Call model service

Input parameters

Name	Type	Required	Example	Description
model	String	Yes	pai-multimodal-embedding-v1	Model type. Supports custom models and base model version updates.
input.contents	list(dict) or list(str)		input = [{'text': text}]; input = [xxx, xxx, xxx, ...]; input = [{'text': text}, {'image': f"data:image/{image_format};base64,{image64}"}]	Content to embed. Supports text and image only.

Output parameters

Name	Type	Example	Description
status_code	Integer	200	HTTP status code. 200: Success 204: Partial success 400: Failure
message	list(str)	['Invalid input data: must be a list of strings or dict']	Error message
output	dict	See next table	Embedding result

DashScope returns {'output': {'embeddings': list(dict), 'usage': xxx, 'request_id': xxx}} (ignore 'usage' and 'request_id').

Each element in embeddings includes these keys (errors go to message):

Name	Type	Example	Description
index	Integer	0	Data ID of the input item
embedding	List[Float]	[0.0391846, 0.0518188, ..., -0.0329895, 0.0251465] (1536 dimensions)	Embedded vector
type	String	"text"	Type of the embedded input, as shown in the output example

Call example code

import base64
import json

import requests

ENCODING = 'utf-8'

hosts = 'EAS URL'
head = {
    'Authorization': 'EAS TOKEN'
}

def encode_image_to_base64(image_path):
    """
    Encode an image file to Base64 string
    """
    with open(image_path, "rb") as image_file:
        # Read binary data
        image_data = image_file.read()
        # Encode to Base64 string
        base64_encoded = base64.b64encode(image_data).decode('utf-8')
    
    return base64_encoded

if __name__=='__main__':
    image_path = "path_to_your_image"
    text = 'prompt'

    image_format = 'jpg'
    input_data = []
    
    image64 = encode_image_to_base64(image_path)
    input_data.append({'image': f"data:image/{image_format};base64,{image64}"})

    input_data.append({'text': text})

    datas = json.dumps({
        'input': {
            'contents': input_data
        }
    })
    r = requests.post(hosts, data=datas, headers=head)
    data = json.loads(r.content.decode('utf-8'))

    if data['status_code']==200:
        if len(data['message'])!=0:
            print('Part failed for the following reasons.')
            print(data['message'])

        for result_item in data['output']['embeddings']:
            print('The following succeed.')
            print('index', result_item['index'])
            print('type', result_item['type'])
            print('embedding', len(result_item['embedding']))
    else:
        print('Processed fail')
        print(data['message'])

Output example:

{
    "status_code": 200,
    "message": "",
    "output": {
        "embeddings": [
            {
                "index": 0,
                "embedding": [
                    -0.020782470703125,
                    -0.01399993896484375,
                    -0.0229949951171875,
                    ...
                ],
                "type": "text"
            }
        ]
    }
}
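
The response handling can be factored into a small helper. The sketch below assumes only the response shape shown in the output example and, like the call example, treats status code 200 as success with any partial failures reported in `message`. `split_response` is a hypothetical name, and the sample response is shortened to three values for illustration.

```python
def split_response(response):
    """Return (embeddings, errors) from a response shaped like the output example."""
    if response.get("status_code") != 200:
        raise RuntimeError(f"request failed: {response.get('message')}")

    # `message` may be an empty string or a list of error strings.
    message = response.get("message") or []
    errors = [message] if isinstance(message, str) else list(message)

    # Key each result by its index so it can be matched back to the input item.
    embeddings = {
        item["index"]: (item["type"], item["embedding"])
        for item in response["output"]["embeddings"]
    }
    return embeddings, errors


sample = {
    "status_code": 200,
    "message": "",
    "output": {
        "embeddings": [
            {"index": 0, "embedding": [-0.02, -0.01, -0.02], "type": "text"}
        ]
    },
}
embeddings, errors = split_response(sample)
print(embeddings[0][0], len(embeddings[0][1]), errors)  # text 3 []
```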

Evaluate model

Results on our sample data (using the evaluation file):

Model	Metric	Precision of original model	Precision after 1 epoch fine-tuning
gme2b	Precision@1	0.3542	0.4271
gme2b	Precision@5	0.5280	0.6480
gme2b	Precision@10	0.5923	0.7308
gme2b	Precision@50	0.5800	0.7331
gme2b	Precision@100	0.5792	0.7404
gme7b	Precision@1	0.3958	0.4375
gme7b	Precision@5	0.5920	0.6680
gme7b	Precision@10	0.6667	0.7590
gme7b	Precision@50	0.6517	0.7683
gme7b	Precision@100	0.6415	0.7723

Model evaluation script example

import base64
import json
import os
import requests
import numpy as np
import torch
from tqdm import tqdm
from collections import defaultdict


# Constants
ENCODING = 'utf-8'
HOST_URL = 'http://1xxxxxxxx4.cn-xxx.pai-eas.aliyuncs.com/api/xxx'
AUTH_HEADER = {'Authorization': 'ZTg*********Mw=='}

def encode_image_to_base64(image_path):
    """Encode an image file to Base64 string"""
    with open(image_path, "rb") as image_file:
        image_data = image_file.read()
        base64_encoded = base64.b64encode(image_data).decode(ENCODING)
    return base64_encoded


def load_image_features(feature_file):
    print("Begin to load image features...")
    image_ids, image_feats = [], []
    with open(feature_file, "r") as fin:
        for line in tqdm(fin):
            obj = json.loads(line.strip())
            image_ids.append(obj['image_id'])
            image_feats.append(obj['feature'])
    image_feats_array = np.array(image_feats, dtype=np.float32)
    print("Finished loading image features.")
    return image_ids, image_feats_array


def precision_at_k(predictions, gts, k):
    """
    Calculate precision at K.
    
    :param predictions: [(image_id, similarity_score), ...]
    :param gts: set of ground truth image_ids
    :param k: int, top K results
    :return: float, precision
    """
    if len(predictions) > k:
        predictions = predictions[:k]
    
    predicted_ids = {p[0] for p in predictions}
    relevant_and_retrieved = predicted_ids.intersection(gts)
    precision = len(relevant_and_retrieved) / k
    return precision


def main():
    root_dir = '/mnt/data/retrieval/data/'
    data_dir = os.path.join(root_dir, 'images')
    tag_file = os.path.join(root_dir, 'meta/test.jsonl')
    model_type = 'finetune_gme7b_final'
    save_feature_file = os.path.join(root_dir, 'features', f'features_{model_type}_eas.jsonl')
    final_result_log = os.path.join(root_dir, 'results', f'retrieval_{model_type}_log_eas.txt')
    final_result = os.path.join(root_dir, 'results', f'retrieval_{model_type}_log_eas.jsonl')

    os.makedirs(os.path.join(root_dir, 'features'), exist_ok=True)
    os.makedirs(os.path.join(root_dir, 'results'), exist_ok=True)

    tag_dict = defaultdict(list)
    gt_image_ids = []
    with open(tag_file, 'r') as f:
        lines = f.readlines()
        for line in lines:
            data = json.loads(line.strip())
            gt_image_ids.append(data['image_id'])
            img_id = data['image_id'].split('.')[0]
            for caption in data['tags']:
                tag_dict[caption.strip()].append(img_id)

    print('Total tags:', len(tag_dict.keys()))

    prefix = ''
    texts = [prefix + text for text in tag_dict.keys()]
    images = [os.path.join(data_dir, i+'.jpg') for i in gt_image_ids]
    print('Total images:', len(images))

    encode_images = True
    if encode_images:
        with open(save_feature_file, "w") as fout:
            for image_path in tqdm(images):
                image_id = os.path.basename(image_path).split('.')[0]
                image64 = encode_image_to_base64(image_path)
                input_data = [{'image': f"data:image/jpg;base64,{image64}"}]

                datas = json.dumps({'input': {'contents': input_data}})
                r = requests.post(HOST_URL, data=datas, headers=AUTH_HEADER)

                data = json.loads(r.content.decode(ENCODING))
                if data['status_code'] == 200:
                    if len(data['message']) != 0:
                        print('Part failed:', data['message'])
                    for result_item in data['output']['embeddings']:
                        fout.write(json.dumps({"image_id": image_id, "feature": result_item['embedding']}) + "\n")
                else:
                    print('Processed fail:', data['message'])

    image_ids, image_feats_array = load_image_features(save_feature_file)

    top_k_list = [1, 5, 10, 50, 100]
    top_k_list_precision  = [[] for _ in top_k_list]

    with open(final_result, 'w') as f_w, open(final_result_log, 'w') as f:
        for tag in tqdm(texts):
            datas = json.dumps({'input': {'contents': [{'text': tag}]}})
            r = requests.post(HOST_URL, data=datas, headers=AUTH_HEADER)
            data = json.loads(r.content.decode(ENCODING))

            if data['status_code'] == 200:
                if len(data['message']) != 0:
                    print('Part failed:', data['message'])

                for result_item in data['output']['embeddings']:
                    text_feat_tensor = result_item['embedding']
                    idx = 0
                    score_tuples = []
                    batch_size = 128
                    while idx < len(image_ids):
                        img_feats_tensor = torch.from_numpy(image_feats_array[idx:min(idx + batch_size, len(image_ids))]).cuda()
                        batch_scores = torch.from_numpy(np.array(text_feat_tensor)).cuda().float() @ img_feats_tensor.t()
                        for image_id, score in zip(image_ids[idx:min(idx + batch_size, len(image_ids))], batch_scores.squeeze(0).tolist()):
                            score_tuples.append((image_id, score))
                        idx += batch_size
                    
                    predictions = sorted(score_tuples, key=lambda x: x[1], reverse=True)
            else:
                print('Processed fail:', data['message'])
                # Skip this tag: the request produced no predictions to score.
                continue

            gts = tag_dict[tag.replace(prefix, '')]

            # Write result
            predictions_tmp = predictions[:10]
            result_dict = {'tag': tag, 'gts': gts, 'preds': [pred[0] for pred in predictions_tmp]}
            # Write one JSON object per line so the output stays valid JSON Lines.
            f_w.write(json.dumps(result_dict, ensure_ascii=False) + '\n')

            for top_k_id, k in enumerate(top_k_list):
                need_exit = False

                if k > len(gts):
                    k = len(gts)
                    need_exit = True

                prec = precision_at_k(predictions, gts, k)

                f.write(f'Tag {tag}, Len(GT) {len(gts)}, Precision@{k} {prec:.4f} \n')
                f.flush()

                if need_exit:
                    break
                else:
                    top_k_list_precision[top_k_id].append(prec)
                    
    for idx, k in enumerate(top_k_list):
        print(f'Precision@{k} {np.mean(top_k_list_precision[idx]):.4f}')


if __name__ == "__main__":
    main()
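
To make the metric concrete, here is a toy calculation of Precision@K on a hand-made ranking. It is a simplified sketch of the same idea as `precision_at_k` in the script above; the image IDs are invented for illustration and no model service is called.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked IDs that are in the ground-truth set."""
    top_k = ranked_ids[:k]
    hits = sum(1 for image_id in top_k if image_id in relevant_ids)
    return hits / k


# Ranking returned for one tag, best match first (made-up IDs).
ranked = ["img3", "img1", "img7", "img2", "img9"]
# Images that actually carry the tag.
relevant = {"img1", "img2", "img4"}

scores = {k: round(precision_at_k(ranked, relevant, k), 4) for k in (1, 3, 5)}
print(scores)  # {1: 0.0, 3: 0.3333, 5: 0.4}
```

The script caps K at the number of ground-truth images for a tag and stops at that point for the tag, so a tag with few matching images does not contribute to the averages at larger K.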

Use model

After deploying your fine-tuned embedding model on EAS, create a model connection using the steps in Create connection for semantic indexing. Then use it in multimodal data management.