This topic describes how to vectorize multi-modal data by using the open source multi-modal vectorization models provided by ModelScope, and how to import the resulting vectors into DashVector for vector search.
ModelScope seeks to build a next-generation open source model-as-a-service (MaaS) platform that provides AI developers with flexible, easy-to-use, and cost-efficient one-stop model services.
By bringing together industry-leading pre-trained models, ModelScope aims to reduce redundant R&D costs and offer a greener, more open environment for AI development and model services, contributing to the growth of the digital economy. ModelScope provides a wide range of high-quality open source models that developers can download and try free of charge.
On ModelScope, you can:
Use and download pre-trained models free of charge.
Run command line-based model inference to quickly and easily validate model performance.
Fine-tune models with your own data for customization.
Engage in theoretical and practical training to effectively improve your R&D abilities.
Share your ideas with the entire community.
Prerequisites
DashVector:
A cluster is created. For more information, see Create a cluster.
An API key is obtained. For more information, see Manage API keys.
The SDK of the latest version is installed. For more information, see Install DashVector SDK.
ModelScope:
The SDK of the latest version is installed. You can install it by running the pip install -U modelscope command.
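Both SDKs are typically installed with pip (pip install dashvector and pip install -U modelscope). The following minimal Python snippet is only a sanity check that the two packages can be imported; it is not part of the example that follows:

# Verify that both SDKs are importable; an ImportError means the installation failed.
import dashvector
import modelscope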
CLIP model
Overview
This topic uses the Chinese version of the CLIP model, which is trained on large-scale Chinese data (about 200 million image-text pairs). The model can be used for image-text retrieval and for extracting representations of images and text, and is applicable to scenarios such as search and recommendation.
| Model ID | Vector dimensions | Distance metric | Vector data type | Remarks |
| --- | --- | --- | --- | --- |
| damo/multi-modal_clip-vit-base-patch16_zh | 512 | Cosine | Float32 | |
| damo/multi-modal_clip-vit-large-patch14_zh | 768 | Cosine | Float32 | |
| damo/multi-modal_clip-vit-huge-patch14_zh | 1,024 | Cosine | Float32 | |
| damo/multi-modal_clip-vit-large-patch14_336_zh | 768 | Cosine | Float32 | |
For more information about the CLIP model, visit the ModelScope official website.
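The example in the next section requires that the {model_id} and {model_dim} placeholders form a matching pair from the table above. The following sketch keeps the two values consistent; the MODEL_DIMENSIONS dictionary and the variable names are illustrative helpers, not part of either SDK:

# Map each CLIP model ID to the vector dimension listed in the table above.
MODEL_DIMENSIONS = {
    'damo/multi-modal_clip-vit-base-patch16_zh': 512,
    'damo/multi-modal_clip-vit-large-patch14_zh': 768,
    'damo/multi-modal_clip-vit-huge-patch14_zh': 1024,
    'damo/multi-modal_clip-vit-large-patch14_336_zh': 768,
}

# Pick a model and look up the matching dimension.
model_id = 'damo/multi-modal_clip-vit-base-patch16_zh'
model_dim = MODEL_DIMENSIONS[model_id]  # 512 for the base model

Looking up the dimension from a single mapping prevents a mismatch between the dimension of the DashVector collection and the embeddings produced by the selected model.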
Example
You must perform the following operations for the code to run properly:
Replace {your-dashvector-api-key} in the sample code with your DashVector API key.
Replace {your-dashvector-cluster-endpoint} in the sample code with the endpoint of your DashVector cluster.
Replace {model_id} in the sample code with one of the values in the Model ID column of the preceding table.
Replace {model_dim} in the sample code with one of the values in the Vector dimensions column of the preceding table.
from typing import List

from dashvector import Client
from modelscope.pipelines import pipeline
from modelscope.preprocessors.image import load_image
from modelscope.utils.constant import Tasks

# Load the multi-modal embedding pipeline of the selected CLIP model.
clip_pipeline = pipeline(task=Tasks.multi_modal_embedding, model='{model_id}')


def generate_text_embeddings(texts: List[str]):
    # Vectorize a batch of texts and return the embeddings as a NumPy array.
    inputs = {'text': texts}
    result = clip_pipeline.forward(input=inputs)
    return result['text_embedding'].numpy()


def generate_img_embeddings(img: str):
    # Vectorize a single image (local path or URL) and return its embedding.
    input_img = load_image(img)
    inputs = {'img': input_img}
    result = clip_pipeline.forward(input=inputs)
    return result['img_embedding'].numpy()[0]


# Create a DashVector client.
client = Client(
    api_key='{your-dashvector-api-key}',
    endpoint='{your-dashvector-cluster-endpoint}'
)

# Create a DashVector collection whose dimension matches the selected model.
rsp = client.create('CLIP-embedding', dimension={model_dim})
assert rsp

collection = client.get('CLIP-embedding')
assert collection

# Convert text and an image into vectors and store them in DashVector.
collection.insert(
    [
        ('ID1', generate_text_embeddings(['DashVector developed by Alibaba Cloud is one of the most high-performance and cost-effective vector database services.'])[0]),
        ('ID2', generate_img_embeddings('https://clip-cn-beijing.oss-cn-beijing.aliyuncs.com/pokemon.jpeg'))
    ]
)

# Perform a vector search with a text query.
docs = collection.query(
    generate_text_embeddings(['The best vector database'])[0]
)
print(docs)
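Because the CLIP model places text and image embeddings in the same vector space, you can also query the collection with an image vector. The following sketch reuses the generate_img_embeddings helper and the sample image from the code above; it is an optional addition, not part of the original example:

# Perform a vector search with an image query.
# The query reuses the sample image inserted as ID2, so ID2 should rank at or near the top.
image_docs = collection.query(
    generate_img_embeddings('https://clip-cn-beijing.oss-cn-beijing.aliyuncs.com/pokemon.jpeg')
)
print(image_docs)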