
Platform For AI: Multimodal data management and usage

Last Updated: Mar 11, 2026

Use large multimodal models to label and index image datasets, then search and filter data using metadata for model training.

Overview

Multimodal data management uses large multimodal models and embedding models to preprocess image data through intelligent labeling and semantic indexing, generating rich metadata. You can then search and filter the data using this metadata to quickly identify subsets for specific scenarios, and use those subsets for data labeling and model training. PAI datasets provide a full set of OpenAPIs for easy integration into custom platforms. The service architecture is shown below:

image

Limitations

PAI multimodal data management has the following limitations:

  • Supported regions: Hangzhou, Shanghai, Shenzhen, Ulanqab, Beijing, Guangzhou, Singapore, Germany, US (Virginia), China (Hong Kong), Tokyo, Jakarta, US (Silicon Valley), Kuala Lumpur, and Seoul.

  • Storage type: Only Object Storage Service (OSS) is supported.

  • File types: Only image files are supported. Supported formats include JPG, JPEG, PNG, GIF, BMP, TIFF, and WEBP.

  • Number of files: A single dataset version supports up to 1,000,000 files. Contact your PAI PDSA for higher limits.

  • Models:

    • Labeling model: Qwen-VL-Max or Qwen-VL-Plus on Alibaba Cloud Model Studio.

    • Indexing model: Multimodal embedding models from Alibaba Cloud Model Studio (such as tongyi-embedding-vision-plus) or GME models from PAI Model Gallery. Deploy these models on PAI-EAS.

  • Metadata storage:

    • Metadata: Stored securely in PAI’s built-in metadatabase.

    • Embedding vectors: Stored in one of the following custom vector databases:

      • Elasticsearch (Vector Enhanced Edition, version 8.17.0 or later)

      • OpenSearch (Vector Search Edition)

      • Milvus (version 2.4 or later)

      • Hologres (version 4.0.9 or later)

      • Lindorm (Vector Engine Edition)

  • Dataset processing mode: Intelligent labeling and semantic indexing tasks support both full and incremental modes.
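Before importing data, you can screen a local folder against the file-type limits above. A minimal sketch (the folder path is a placeholder; `SUPPORTED` mirrors the formats listed in this topic):

```python
# Hypothetical pre-upload check: keep only files whose extension is one of
# the image formats accepted by multimodal data management.
from pathlib import Path

SUPPORTED = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff", ".webp"}

def supported_images(folder):
    """Return files under `folder` whose extension is an accepted image format."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED
    )
```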

Workflow

Workflow of PAI multimodal data management

Prerequisites

Enable PAI and configure workspace

  1. Use your root account to enable PAI and create a workspace. Go to the PAI console. In the upper-left corner, select your region. Then click one-click authorization to enable the service.

  2. Grant permissions to your operation account. Skip this step if using your root account. For RAM users, assign the workspace administrator role. For details, see Manage Workspaces > Member Role Configuration.

Activate Model Studio and create API key

Enable Alibaba Cloud Model Studio and create an API key. For instructions, see Get an API Key.

Create vector database

Create vector database instance

Multimodal dataset management supports the following Alibaba Cloud vector databases:

  • Elasticsearch (Vector Enhanced Edition, version 8.17.0 or later)

  • OpenSearch (Vector Search Edition)

  • Milvus (version 2.4 or later)

  • Hologres (version 4.0.9 or later)

  • Lindorm (Vector Engine Edition)

For instructions on creating vector database instances, see the corresponding product documentation.

Network and whitelist configuration

  • Public network access

    If your vector database instance has a public endpoint, add the IP addresses below to the public access whitelist. Multimodal data management can then access the instance over the Internet. For Elasticsearch, see Configure Public or Private Network Access Whitelist.

    Region      IP list

    Hangzhou    47.110.230.142, 47.98.189.92

    Shanghai    47.117.86.159, 106.14.192.90

    Shenzhen    47.106.88.217, 39.108.12.110

    Ulanqab     8.130.24.177, 8.130.82.15

    Beijing     39.107.234.20, 182.92.58.94

  • Private network access

    Submit a ticket to request access.

Create vector index table (optional)

The system can create an index table automatically. Skip this step if you do not need a custom index table.

In some vector databases, an index table is called a collection or index.

Index table structure definition (Follow this structure):

Table Schema Definition

{
    "id":"text",                    // Primary key ID. Required in OpenSearch. Default in other databases.
    "index_set_id": "keyword",      // Index set ID. Must be indexed.
    "file_meta_id": "text",         // File metadata ID.
    "dataset_id": "text",           // Dataset ID.
    "dataset_version": "text",      // Dataset version.
    "uri": "text",                  // OSS URI of the file.
    "file_vector": {                // Vector field.
        "type": "float",            // Vector type: float.
        "dims": 1536,               // Vector dimensions. Customize as needed.
        "similarity": "DotProduct"  // Similarity algorithm: cosine or dot product.
    }
}

This topic uses Elasticsearch as an example. The following Python code shows how to create a semantic index table. For other vector databases, see their documentation.

Example code: Create a semantic index table in Elasticsearch

from elasticsearch import Elasticsearch

# 1. Connect to your Alibaba Cloud Elasticsearch instance.
# Note:
# (1) Use Python 3.9 or later: python3 -V
# (2) Install Elasticsearch client version 8.x: pip show elasticsearch
# (3) If using a VPC endpoint, ensure your client and Elasticsearch instance are in the same VPC.
#     Otherwise, use the public endpoint and add your client's public IP to Elasticsearch whitelist.
# Default username is elastic.
es_client = Elasticsearch(
    hosts=["http://es-cn-l4p***5z.elasticsearch.aliyuncs.com:9200"],
    basic_auth=("{userName}", "{password}"),
)

# 2. Define the index name and mapping. HNSW is used by default.
index_name = "dataset_embed_test"
index_mapping = {
    "settings": {
        "number_of_shards": 1,          # Number of shards.
        "number_of_replicas": 1         # Number of replicas.
    },
    "mappings": {
        "properties": {
            "index_set_id": {
                "type": "keyword"
            },
            "uri": {
                "type": "text"
            },
            "file_meta_id": {
                "type": "text"
            },
            "dataset_id": {
                "type": "text"
            },
            "dataset_version": {
                "type": "text"  
            },
            "file_vector": {
                "type": "dense_vector",  # Define file_vector as dense vector.
                "dims": 1536,  # Vector dimensions: 1536.
                "similarity": "dot_product"  # Similarity method: dot product.
            }
        }
    }
}

# 3. Create the index.
if not es_client.indices.exists(index=index_name):
    es_client.indices.create(index=index_name, body=index_mapping)
    print(f"Index {index_name} created successfully!")
else:
    print(f"Index {index_name} already exists. No action taken.")

# 4. View the index mapping (optional).
# indexes = es_client.indices.get(index=index_name)
# print(indexes)
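Once indexing jobs populate the table, the same client can run approximate kNN queries against it. A minimal sketch of an Elasticsearch 8.x request body (the query vector is a placeholder that would normally come from your embedding model, and "my-index-set" is an assumed index set ID):

```python
# Hypothetical kNN search body for the dataset_embed_test index created above.
query_vector = [0.0] * 1536  # placeholder 1536-dimensional embedding

knn_query = {
    "knn": {
        "field": "file_vector",          # dense-vector field from the mapping
        "query_vector": query_vector,
        "k": 10,                         # number of nearest neighbors to return
        "num_candidates": 100,           # candidates examined per shard
        "filter": {                      # restrict the search to one index set
            "term": {"index_set_id": "my-index-set"}
        },
    },
    "_source": ["uri", "dataset_id", "dataset_version"],
}

# Requires a live connection, for example:
# results = es_client.search(index="dataset_embed_test", body=knn_query)
# for hit in results["hits"]["hits"]:
#     print(hit["_source"]["uri"], hit["_score"])
```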

Create dataset

  1. In your PAI workspace, click AI Asset Management > Datasets > Create Dataset to open the dataset configuration page.

    image

  2. Configure dataset parameters. Key parameters are listed below. Use defaults for others.

    1. Storage: Object Storage Service (OSS).

    2. Type: Premium.

    3. Content Type: Image.

    4. OSS Path: Select the OSS path where your dataset is stored. If no dataset exists, download the sample dataset retrieval_demo_data, upload it to OSS, and try multimodal data management.

    Note

    Importing files or folders only records the path in the system. It does not copy the data.

    image

    Then click OK to create the dataset.

Create connections

Create connection for intelligent labeling

  1. In your PAI workspace, click AI Asset Management > Connection > Model Service > Create Connection to open the connection creation page.

    image

  2. Select Alibaba Cloud Model Studio Service and configure the API key.

    image

  3. After successful creation, find your new connection in the list.

    image

Create connection for semantic indexing

  1. If you use Alibaba Cloud Model Studio’s semantic indexing service, skip this step. Otherwise, click Model Gallery in the left menu, then find and deploy the GME multimodal retrieval model. This creates an EAS service. Deployment takes about five minutes. When the status shows Running, deployment is complete.

    Important

    Stop and delete the service when no longer needed to avoid charges.

    image

  2. In your PAI workspace, click AI Asset Management > Connection > Model Service > Create Connection to open the connection creation page.

  3. Configure the model connection based on whether you use Alibaba Cloud Model Studio’s semantic indexing model or your own EAS-deployed model.

    Use Alibaba Cloud Model Studio’s semantic indexing model

    • Connection Type: Select General Multimodal Embedding Model Service.

    • Service Provider: Select Third-party service model.

    • Model Name: tongyi-embedding-vision-plus.

    • base_url: https://dashscope.aliyuncs.com/api/v1/services/embeddings/multimodal-embedding/multimodal-embedding

    • api_key: Get your API key from Get an API Key and enter it here.

    image

    Use your own EAS-deployed semantic indexing model

    • Connection Type: Select General Multimodal Embedding Model Service.

    • Service Provider: Select PAI-EAS Model Service.

    • EAS Service: Select the GME multimodal retrieval model you just deployed. If the service is not under your current account, choose a third-party model service.

    image

    image

  4. After successful creation, find your new connection in the list.

    image

Create vector database connection

  1. In the left menu, click AI Asset Management > Connection > Database > Create Connection to open the connection creation page.

    image

  2. Multimodal search supports Milvus, Lindorm, OpenSearch, Elasticsearch, and Hologres. This example uses Elasticsearch. Select Elasticsearch and configure uri, username, and password. For details, see Create a Database Connection.

    image

    Connection format examples for each vector database:

    Milvus

    uri: http://xxx.milvus.aliyuncs.com:19530 
    database: {your_data_base} 
    token: root:{password}

    OpenSearch

    uri: http://xxxx.ha.aliyuncs.com
    username: {username} 
    password: {password}

    Hologres

    host: xxxx.hologres.aliyuncs.com
    database: {your_data_base} 
    port: {port}
    access_key_id={password}

    Elasticsearch

    uri: http://xxxx.elasticsearch.aliyuncs.com:9200
    username: {username} 
    password: {password}

    Lindorm

    uri: xxxx.lindorm.aliyuncs.com:{port}
    username: {username} 
    password: root:{password}
  3. After successful creation, find your new connection in the list.

    image

Create intelligent labeling job

Create intelligent label definition

In the left menu, click AI Asset Management > Datasets > Intelligent Tag Definition > Create Intelligent Tag Definition to open the label configuration page. Example configuration:

  • Guide Prompt: As an experienced driver with many years on highways and city roads, you know how to handle common driving scenarios.

  • Tag Definition:

    Autonomous driving label example

    {
        "Reflective tape": "Usually yellow or black-and-yellow striped. Attached to permanent obstacles like corners to warn drivers. Strip-shaped—not traffic cones, wheel locks, or water barrels!",
        "Wheel lock": "Also called a parking lock. Prevents unauthorized parking when raised. Always specify if raised or lowered. Raised if there is a frame.",
        "Lit construction vehicle": "Has two arrow-shaped lights, lit. Not present otherwise.",
        "Overturned vehicle": "Vehicle lying on its side.",
        "Fallen water barrel": "A plastic barrier used to divide roads or block traffic. Usually red and wall-shaped. Common on highways, city roads, and overpasses. Larger than cones and flat. Specify if fallen.",
        "Fallen traffic cone": "Also called a traffic cone or snow cone. Cone-shaped temporary road marker. Not rod- or flat-shaped. To check if fallen, see if the base touches the ground.",
        "Charging parking spot": "Near a wall with visible charging equipment or labeled 'new energy vehicle'. Found only in parking lots (indoor or outdoor). Wheel locks are unrelated.",
        "Speed bump": "Usually yellow or black-and-yellow. Narrow ridge across the road to slow vehicles. Never in parking spots.",
        "Deceleration lane markings": "Fishbone-style dashed lines on both sides of the lane, inside solid lines.",
        "Ramp": "Large curved highway segment. Usually on the right side of main highways. Only confirm at toll plazas.",
        "Ground shadow": "Clear shadows on the ground.",
        "Cloudy": "Only if sky is visible and clearly cloudy.",
        "Glare from headlights": "Headlights appear as streaks instead of points—common at night or in rain.",
        "Left-turn, right-turn, U-turn arrows": "White (or sometimes yellow) arrows painted on lanes—not green-white highway signs. Only count clear center-lane arrows. Right-turn: clockwise from base to tip. Left-turn: counterclockwise. U-turn: U-shaped.",
        "Crosswalk": "White parallel stripes on roads or parking lots, for pedestrians. Never on highways, ramps, or tunnels.",
        "Overexposure": "Camera overexposed due to direct sunlight—daytime only.",
        "Motor vehicle": "Any other motorized vehicle in view.",
        "Lane merge/diverge": "Where multiple lanes become one—or one splits into many.",
        "Intersection": "Road intersection without lane markings inside it.",
        "No-parking sign": "Sign hanging or standing on ground with 'no parking' text or circle-with-P-and-slash symbol.",
        "Lane markings": "Road lane lines—especially blurry ones.",
        "Stones or tires on road": "Obstacles blocking traffic.",
        "Tunnel": "Watch for tunnel entrances and exits.",
        "Wet road in rain": "Road surface wet and slippery in rain.",
        "Non-motorized vehicle": "Bicycles, e-bikes, wheelchairs, unicycles, shopping carts—parked or moving."
      }

Create offline intelligent labeling job

  1. Click Custom Dataset. Click the dataset name to open its details page. Then click Dataset jobs.

    image

  2. On the jobs page, click Create job > Smart tag and configure job parameters.

    image

    • Dataset Version: Select the version to label, such as v1.

    • Labeling Model Connection: Select your Alibaba Cloud Model Studio model connection.

    • Smart Labeling Model: Supports Qwen-VL-Max and Qwen-VL-Plus.

    • Max Concurrency: Set based on your EAS model service specs. Suggested maximum per GPU: 5.

    • Intelligent Tag Definition: Select the definition you just created.

    • Labeling Mode: Choose Increment or Full.

  3. After successful creation, find your labeling job in the list. To monitor or stop it, click the link on the right side of the list.

    Note

    The first run builds metadata and may take some time.

Create semantic indexing job

  1. Click the dataset name to open its details page. In the Index Configuration section, click Edit.

    image

  2. Configure the index library.

    • Index Model Connection: Select the model connection created in Create connection for semantic indexing.

    • Index Database Connection: Select the database connection created in Create vector database connection.

    • Index Database Table: Enter the index table name created in Create Vector Index Table (Optional): dataset_embed_test.

    Click Save > Refresh Now. This starts a semantic indexing job for the selected dataset version. It updates semantic indexes for all files in that version. To view job details, click Semantic Indexing Task in the top-right corner of the dataset details page.

    Note

    The first run builds metadata and may take some time.

    If you cancel instead of clicking Refresh Now, create the job manually:

    On the dataset details page, click Dataset jobs to go to the jobs page.

    image

    Click Create job > Semantic Indexing. Configure the dataset version. Set the maximum number of concurrent jobs based on your EAS model service specifications, with a recommended maximum of 5 per GPU. Then click OK to create the job.

    image

Preview data

  1. After intelligent labeling and semantic indexing jobs finish, click View Data on the dataset details page to preview images in that version.

    image.png

  2. In the View Data page, preview images. Switch between Gallery View and List View.

    image.png

    image.png

  3. Click an image to view it full-size and see its labels.

    image.png

  4. Click the checkbox in the top-left corner of a thumbnail to select it. Hold Shift and click checkboxes to select multiple rows.

    image.png

Search data (combined search)

  1. In the left toolbar of the View Data page, use Index Retrieval and Search by Tag. Press Enter or click Search.

  2. Index Retrieval: Text keyword search. Matches keywords against image index vectors. In Advanced Settings, set top-k and score threshold.

    image

  3. Index Retrieval: Search by image. Upload a local image or select one from OSS. Matches against image index vectors. In Advanced Settings, set top-k and score threshold.

    image

  4. Search by Tag: Matches keywords against image labels. Use logic: Include Any of Following (OR), Include All Following (AND), or Exclude Any of Following (NOT).

    image

  5. Metadata: Search by filename, storage path, or last modified time.

    image

    All search conditions use AND logic.
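The score threshold in Advanced Settings cuts off low-similarity matches. As a toy illustration (not PAI's implementation): with the dot_product similarity configured for the index, a higher inner product between normalized embeddings means a closer semantic match.

```python
# Toy vectors standing in for query and image embeddings; a real system uses
# the 1536-dimensional vectors produced by the embedding model.
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

query = normalize([0.2, 0.1, 0.9])
close_image = normalize([0.19, 0.12, 0.88])   # semantically similar
far_image = normalize([0.9, 0.1, 0.1])        # semantically different

scores = {"close": dot(query, close_image), "far": dot(query, far_image)}
threshold = 0.5
hits = [name for name, s in scores.items() if s >= threshold]  # only "close" passes
```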

Advanced data search (DSL)

Advanced search supports DSL search. DSL is a domain-specific language for complex queries. It supports grouping, Boolean logic (AND/OR/NOT), range comparisons (>, >=, <, <=), attribute existence (HAS/NOT HAS), token matching (:), and exact matching (=). For syntax details, see List Dataset File Metadata.
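For illustration only, queries combining these operators might look like the following. The field and tag names here are hypothetical; see List Dataset File Metadata for the exact grammar.

```
(tags : "tunnel" OR tags : "ramp") AND NOT tags : "crosswalk"
HAS tags AND size >= 1024
filename = "image1.jpg"
```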

image

Export search results

Note

This step exports search results as a file list index for later model training or data analytics.

After searching, click Export Results at the bottom of the page. Two export options are available:

image

Export as file

  1. Click Export as file. On the configuration page, set the export content and target OSS directory, then click OK.

    image.png

  2. To track progress, click AI Asset Management > Job > Dataset jobs.

  3. Use the exported result. After export, mount the result file and original dataset to your training environment (such as DLC or DSW instances). Then use code to read the index file and load target files for model training or analysis.
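As a sketch of that last step: assuming the exported index is a JSON Lines file in which each record carries the file's OSS `uri` (verify against your actual export format before relying on this), reading it might look like:

```python
# Hypothetical reader for an exported result file (JSON Lines, one record per line).
import json

def load_exported_uris(manifest_path):
    """Yield the `uri` of every record in a JSON Lines export manifest."""
    with open(manifest_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            yield record["uri"]
```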

Export to logical dataset version

Export a search result from an advanced dataset to a version of a logical dataset. Later, use the dataset SDK to access that version.

  1. Click Export to logical dataset version. Select a target logical dataset and click Confirm.

    image.png

    If no logical dataset is available, create one as follows:

    Create a logical dataset

    Create a logical dataset. In the left menu, click AI Asset Management > Dataset > Create Dataset. Configure key parameters below. Adjust others as needed:

    • Dataset Type: Select Logical.

    • Metadata OSS path: Select an exported OSS path.

    • Import method: Select Import later.

    Click OK to create the dataset.

  2. Use the logical dataset. After the import job finishes, the target logical dataset contains the exported metadata. Load and use it with the SDK. See the dataset details page for SDK usage instructions.

    image

    image

    Install the SDK with:

    pip install https://pai-sdk.oss-cn-shanghai.aliyuncs.com/dataset/pai_dataset_sdk-1.0.0-py3-none-any.whl

Custom semantic indexing model (optional)

Fine-tune a custom semantic retrieval model. After deploying it on EAS, create a model connection by following Create connection for semantic indexing. Then use it in multimodal data management.

Prepare data

This topic provides a sample dataset retrieval_demo_data. Click to download.

Data format requirements

Each data sample is one JSON line in dataset.jsonl. Include these fields:

  • image_id: Unique identifier for the image (e.g., filename or ID).

  • tags: List of text labels for the image. Must be a string array.

Example format:

{
    "image_id": "c909f3df-ac4074ed",
    "tags": ["silver sedan", "white SUV", "city street", "snow", "night"]
}

File organization

Put all image files in an images folder. Place dataset.jsonl in the same directory as the images folder.

Directory example:

├── images
│   ├── image1.jpg
│   ├── image2.jpg
│   └── image3.jpg
└── dataset.jsonl

Important

Use the exact filename dataset.jsonl. Do not rename the images folder.
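A minimal consistency check for the layout described above: every image_id in dataset.jsonl should have a matching file in images/. The `<image_id>.jpg` file-naming rule is an assumption here; adjust it to match your actual image extensions.

```python
# Hypothetical validator for the fine-tuning data layout (dataset.jsonl + images/).
import json
import os

def check_dataset(root):
    """Return the image_ids in dataset.jsonl that have no file in images/."""
    missing = []
    with open(os.path.join(root, "dataset.jsonl"), "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            sample = json.loads(line)
            assert isinstance(sample["tags"], list), "tags must be a string array"
            # Assumed naming convention: images/<image_id>.jpg
            if not os.path.exists(os.path.join(root, "images", sample["image_id"] + ".jpg")):
                missing.append(sample["image_id"])
    return missing
```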

Train model

  1. In Model Gallery, find retrieval-related models. Choose one based on size and compute resource needs.

    image

    Model     VRAM for fine-tuning (bs=4)    Fine-tuning speed (4×A800, samples/sec)    VRAM for deployment    Vector dimensions

    GME-2B    14 GB                          16.331                                     5 GB                   1536

    GME-7B    35 GB                          13.868                                     16 GB                  3584

  2. As an example, train the GME-2B model. Click Train. Enter the data path (default is the sample data path). Enter the model output path. Then start training.

    image

    image

Deploy model

After training, click Deploy in the training job to deploy the fine-tuned model.

To deploy the original GME model instead, click Deploy on the Model Gallery tab.

image

After deployment, get the EAS Endpoint and Token. image

Call model service

Input parameters

  • model (String, required): Model type. Supports custom models and base model version updates. Example: pai-multimodal-embedding-v1.

  • contents.input (list(dict) or list(str), optional): Content to embed. Only text and images are supported. Examples:

    input = [{'text': text}]

    input = [xxx, xxx, xxx, ...]

    input = [{'text': text}, {'image': f"data:image/{image_format};base64,{image64}"}]

Output parameters

  • status_code (Integer): HTTP status code. 200: success; 204: partial success; 400: failure. Example: 200.

  • message (list(str)): Error messages. Example: ['Invalid input data: must be a list of strings or dict'].

  • output (dict): Embedding result. See the next table.

DashScope returns {'output': {'embeddings': list(dict), 'usage': xxx, 'request_id': xxx}} (ignore 'usage' and 'request_id').

Each element in embeddings includes these keys (errors go to message):

  • index (Integer): Index of the corresponding input item. Example: 0.

  • embedding (List[Float]): Embedding vector with 1536 dimensions. Example: [0.0391846, 0.0518188, ....., -0.0329895, 0.0251465].

  • type (String): Type of the embedded content, for example "text".

Call example code

import base64
import json
import os
import sys
from io import BytesIO

import requests
from PIL import Image, PngImagePlugin
import numpy as np

ENCODING = 'utf-8'

hosts = 'EAS URL'
head = {
    'Authorization': 'EAS TOKEN'
}

def encode_image_to_base64(image_path):
    """
    Encode an image file to Base64 string
    """
    with open(image_path, "rb") as image_file:
        # Read binary data
        image_data = image_file.read()
        # Encode to Base64 string
        base64_encoded = base64.b64encode(image_data).decode('utf-8')
    
    return base64_encoded

if __name__=='__main__':
    image_path = "path_to_your_image"
    text = 'prompt'

    image_format = 'jpg'
    input_data = []
    
    image64 = encode_image_to_base64(image_path)
    input_data.append({'image': f"data:image/{image_format};base64,{image64}"})

    input_data.append({'text': text})

    datas = json.dumps({
        'input': {
            'contents': input_data
        }
    })
    r = requests.post(hosts, data=datas, headers=head)
    data = json.loads(r.content.decode('utf-8'))

    if data['status_code']==200:
        if len(data['message'])!=0:
            print('Part failed for the following reasons.')
            print(data['message'])

        for result_item in data['output']['embeddings']:
            print('The following succeed.')
            print('index', result_item['index'])
            print('type', result_item['type'])
            print('embedding', len(result_item['embedding']))
    else:
        print('Processed fail')
        print(data['message'])

Output example:

{
    "status_code": 200,
    "message": "",
    "output": {
        "embeddings": [
            {
                "index": 0,
                "embedding": [
                    -0.020782470703125,
                    -0.01399993896484375,
                    -0.0229949951171875,
                    ...
                ],
                "type": "text"
            }
        ]
    }
}

Evaluate model

Results on our sample data (using the evaluation file):

Model    Metric           Precision of original model    Precision after 1 epoch fine-tuning

gme2b    Precision@1      0.3542                         0.4271
         Precision@5      0.5280                         0.6480
         Precision@10     0.5923                         0.7308
         Precision@50     0.5800                         0.7331
         Precision@100    0.5792                         0.7404

gme7b    Precision@1      0.3958                         0.4375
         Precision@5      0.5920                         0.6680
         Precision@10     0.6667                         0.7590
         Precision@50     0.6517                         0.7683
         Precision@100    0.6415                         0.7723

Model evaluation script example

import base64
import json
import os
import requests
import numpy as np
import torch
from tqdm import tqdm
from collections import defaultdict


# Constants
ENCODING = 'utf-8'
HOST_URL = 'http://1xxxxxxxx4.cn-xxx.pai-eas.aliyuncs.com/api/xxx'
AUTH_HEADER = {'Authorization': 'ZTg*********Mw=='}

def encode_image_to_base64(image_path):
    """Encode an image file to Base64 string"""
    with open(image_path, "rb") as image_file:
        image_data = image_file.read()
        base64_encoded = base64.b64encode(image_data).decode(ENCODING)
    return base64_encoded


def load_image_features(feature_file):
    print("Begin to load image features...")
    image_ids, image_feats = [], []
    with open(feature_file, "r") as fin:
        for line in tqdm(fin):
            obj = json.loads(line.strip())
            image_ids.append(obj['image_id'])
            image_feats.append(obj['feature'])
    image_feats_array = np.array(image_feats, dtype=np.float32)
    print("Finished loading image features.")
    return image_ids, image_feats_array


def precision_at_k(predictions, gts, k):
    """
    Calculate precision at K.
    
    :param predictions: [(image_id, similarity_score), ...]
    :param gts: set of ground truth image_ids
    :param k: int, top K results
    :return: float, precision
    """
    if len(predictions) > k:
        predictions = predictions[:k]
    
    predicted_ids = {p[0] for p in predictions}
    relevant_and_retrieved = predicted_ids.intersection(gts)
    precision = len(relevant_and_retrieved) / k
    return precision


def main():
    root_dir = '/mnt/data/retrieval/data/'
    data_dir = os.path.join(root_dir, 'images')
    tag_file = os.path.join(root_dir, 'meta/test.jsonl')
    model_type = 'finetune_gme7b_final'
    save_feature_file = os.path.join(root_dir, 'features', f'features_{model_type}_eas.jsonl')
    final_result_log = os.path.join(root_dir, 'results', f'retrieval_{model_type}_log_eas.txt')
    final_result = os.path.join(root_dir, 'results', f'retrieval_{model_type}_log_eas.jsonl')

    os.makedirs(os.path.join(root_dir, 'features'), exist_ok=True)
    os.makedirs(os.path.join(root_dir, 'results'), exist_ok=True)

    tag_dict = defaultdict(list)
    gt_image_ids = []
    with open(tag_file, 'r') as f:
        lines = f.readlines()
        for line in lines:
            data = json.loads(line.strip())
            gt_image_ids.append(data['image_id'])
            img_id = data['image_id'].split('.')[0]
            for caption in data['tags']:
                tag_dict[caption.strip()].append(img_id)

    print('Total tags:', len(tag_dict.keys()))

    prefix = ''
    texts = [prefix + text for text in tag_dict.keys()]
    images = [os.path.join(data_dir, i+'.jpg') for i in gt_image_ids]
    print('Total images:', len(images))

    encode_images = True
    if encode_images:
        with open(save_feature_file, "w") as fout:
            for image_path in tqdm(images):
                image_id = os.path.basename(image_path).split('.')[0]
                image64 = encode_image_to_base64(image_path)
                input_data = [{'image': f"data:image/jpg;base64,{image64}"}]

                datas = json.dumps({'input': {'contents': input_data}})
                r = requests.post(HOST_URL, data=datas, headers=AUTH_HEADER)

                data = json.loads(r.content.decode(ENCODING))
                if data['status_code'] == 200:
                    if len(data['message']) != 0:
                        print('Part failed:', data['message'])
                    for result_item in data['output']['embeddings']:
                        fout.write(json.dumps({"image_id": image_id, "feature": result_item['embedding']}) + "\n")
                else:
                    print('Processed fail:', data['message'])

    image_ids, image_feats_array = load_image_features(save_feature_file)

    top_k_list = [1, 5, 10, 50, 100]
    top_k_list_precision  = [[] for _ in top_k_list]

    with open(final_result, 'w') as f_w, open(final_result_log, 'w') as f:
        for tag in tqdm(texts):
            datas = json.dumps({'input': {'contents': [{'text': tag}]}})
            r = requests.post(HOST_URL, data=datas, headers=AUTH_HEADER)
            data = json.loads(r.content.decode(ENCODING))

            if data['status_code'] == 200:
                if len(data['message']) != 0:
                    print('Part failed:', data['message'])

                for result_item in data['output']['embeddings']:
                    text_feat_tensor = result_item['embedding']
                    idx = 0
                    score_tuples = []
                    batch_size = 128
                    while idx < len(image_ids):
                        img_feats_tensor = torch.from_numpy(image_feats_array[idx:min(idx + batch_size, len(image_ids))]).cuda()
                        batch_scores = torch.from_numpy(np.array(text_feat_tensor)).cuda().float() @ img_feats_tensor.t()
                        for image_id, score in zip(image_ids[idx:min(idx + batch_size, len(image_ids))], batch_scores.squeeze(0).tolist()):
                            score_tuples.append((image_id, score))
                        idx += batch_size
                    
                    predictions = sorted(score_tuples, key=lambda x: x[1], reverse=True)
            else:
                print('Processed fail:', data['message'])

            gts = tag_dict[tag.replace(prefix, '')]

            # Write result
            predictions_tmp = predictions[:10]
            result_dict = {'tag': tag, 'gts': gts, 'preds': [pred[0] for pred in predictions_tmp]}
            f_w.write(json.dumps(result_dict, ensure_ascii=False, indent=4) + '\n')

            for top_k_id, k in enumerate(top_k_list):
                need_exit = False

                if k > len(gts):
                    k = len(gts)
                    need_exit = True

                prec = precision_at_k(predictions, gts, k)

                f.write(f'Tag {tag}, Len(GT) {len(gts)}, Precision@{k} {prec:.4f} \n')
                f.flush()

                if need_exit:
                    break
                else:
                    top_k_list_precision[top_k_id].append(prec)
                    
    for idx, k in enumerate(top_k_list):
        print(f'Precision@{k} {np.mean(top_k_list_precision[idx]):.4f}')


if __name__ == "__main__":
    main()

Use model

After deploying your fine-tuned embedding model on EAS, create a model connection by following Create connection for semantic indexing. Then use it in multimodal data management.