This topic describes how to vectorize multi-modal data by using the open source multi-modal vectorization models provided by ModelScope, and how to import the resulting vectors into DashVector for vector search.
ModelScope seeks to build a next-generation open source model-as-a-service (MaaS) platform that provides AI developers with flexible, easy-to-use, and cost-efficient one-stop model services.
By bringing together industry-leading pre-trained models, ModelScope aims to reduce redundant R&D costs and offer a greener, more open environment for AI development and model services, contributing to the growth of the digital economy. ModelScope provides a wide range of high-quality open source models that developers can download and try free of charge.
On ModelScope, you can:
Use and download pre-trained models free of charge.
Run command line-based model inference to quickly and easily validate model performance.
Fine-tune models with your own data for customization.
Engage in theoretical and practical training to effectively improve your R&D abilities.
Share your ideas with the entire community.
Prerequisites
DashVector:
A cluster is created. For more information, see Create a cluster.
An API key is obtained. For more information, see Manage API keys.
The SDK of the latest version is installed. For more information, see Install DashVector SDK.
ModelScope:
The SDK of the latest version is installed. You can install it by running the pip install -U modelscope command.
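Both SDKs are typically installed with pip (pip install dashvector and pip install -U modelscope). The following minimal Python snippet is only a sanity check that the two packages can be imported; it is not part of the example that follows:

# Verify that both SDKs are importable; an ImportError means the installation failed.
import dashvector
import modelscope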
CLIP model
Overview
This topic uses the Chinese version of the CLIP model, which is trained on large-scale Chinese data (about 200 million image-text pairs). The model can be used for image-text retrieval and for extracting representations of images and text, and is applicable to scenarios such as search and recommendation.
| Model ID | Vector dimensions | Distance metric | Vector data type | Remarks |
| --- | --- | --- | --- | --- |
| damo/multi-modal_clip-vit-base-patch16_zh | 512 | Cosine | Float32 | |
| damo/multi-modal_clip-vit-large-patch14_zh | 768 | Cosine | Float32 | |
| damo/multi-modal_clip-vit-huge-patch14_zh | 1,024 | Cosine | Float32 | |
| damo/multi-modal_clip-vit-large-patch14_336_zh | 768 | Cosine | Float32 | |
For more information about the CLIP model, visit the ModelScope official website.
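The example in the next section requires that the {model_id} and {model_dim} placeholders form a matching pair from the table above. The following sketch keeps the two values consistent; the MODEL_DIMENSIONS dictionary and the variable names are illustrative helpers, not part of either SDK:

# Map each CLIP model ID to the vector dimension listed in the table above.
MODEL_DIMENSIONS = {
    'damo/multi-modal_clip-vit-base-patch16_zh': 512,
    'damo/multi-modal_clip-vit-large-patch14_zh': 768,
    'damo/multi-modal_clip-vit-huge-patch14_zh': 1024,
    'damo/multi-modal_clip-vit-large-patch14_336_zh': 768,
}

# Pick a model and look up the matching dimension.
model_id = 'damo/multi-modal_clip-vit-base-patch16_zh'
model_dim = MODEL_DIMENSIONS[model_id]  # 512 for the base model

Looking up the dimension from a single mapping prevents a mismatch between the dimension of the DashVector collection and the embeddings produced by the selected model.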
Example
You must perform the following operations for the code to run properly:
Replace {your-dashvector-api-key} in the sample code with your DashVector API key.
Replace {your-dashvector-cluster-endpoint} in the sample code with the endpoint of your DashVector cluster.
Replace {model_id} in the sample code with one of the values in the Model ID column of the preceding table.
Replace {model_dim} in the sample code with one of the values in the Vector dimensions column of the preceding table.
from typing import List

from dashvector import Client
from modelscope.pipelines import pipeline
from modelscope.preprocessors.image import load_image
from modelscope.utils.constant import Tasks

# Load the multi-modal embedding pipeline of the selected CLIP model.
clip_pipeline = pipeline(task=Tasks.multi_modal_embedding, model='{model_id}')


def generate_text_embeddings(texts: List[str]):
    # Vectorize a batch of texts and return the embeddings as a NumPy array.
    inputs = {'text': texts}
    result = clip_pipeline.forward(input=inputs)
    return result['text_embedding'].numpy()


def generate_img_embeddings(img: str):
    # Vectorize a single image (local path or URL) and return its embedding.
    input_img = load_image(img)
    inputs = {'img': input_img}
    result = clip_pipeline.forward(input=inputs)
    return result['img_embedding'].numpy()[0]


# Create a DashVector client.
client = Client(
    api_key='{your-dashvector-api-key}',
    endpoint='{your-dashvector-cluster-endpoint}'
)

# Create a DashVector collection whose dimension matches the selected model.
rsp = client.create('CLIP-embedding', dimension={model_dim})
assert rsp

collection = client.get('CLIP-embedding')
assert collection

# Convert text and an image into vectors and store them in DashVector.
collection.insert(
    [
        ('ID1', generate_text_embeddings(['DashVector developed by Alibaba Cloud is one of the most high-performance and cost-effective vector database services.'])[0]),
        ('ID2', generate_img_embeddings('https://clip-cn-beijing.oss-cn-beijing.aliyuncs.com/pokemon.jpeg'))
    ]
)

# Perform a vector search with a text query.
docs = collection.query(
    generate_text_embeddings(['The best vector database'])[0]
)
print(docs)
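Because the CLIP model places text and image embeddings in the same vector space, you can also query the collection with an image vector. The following sketch reuses the generate_img_embeddings helper and the sample image from the code above; it is an optional addition, not part of the original example:

# Perform a vector search with an image query.
# The query reuses the sample image inserted as ID2, so ID2 should rank at or near the top.
image_docs = collection.query(
    generate_img_embeddings('https://clip-cn-beijing.oss-cn-beijing.aliyuncs.com/pokemon.jpeg')
)
print(image_docs)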