Build Custom Embedding Models to Optimize Vector Search - OpenSearch

Model customization lets you enhance the performance of text embedding models using your business data. You can also train custom embedding dimensionality reduction models using embedding data that you provide. In a typical business scenario, you first vectorize text or queries using an embedding model. Then, you use an embedding dimensionality reduction model to reduce the embedding dimensions.

Background information

In intelligent search and retrieval-augmented generation (RAG) scenarios, the performance of the embedding model is critical to business outcomes. However, the effectiveness of general-purpose models in specific domains is often limited by the coverage of their training data. To improve retrieval performance, you can fine-tune a general-purpose model with your business data. Meanwhile, the dimensions of embedding models are increasing. This leads to a significant increase in storage and computing costs for large-scale data vectorization. For this reason, the AI Search Open Platform provides an embedding dimensionality reduction service. This service uses custom models to convert high-dimensional vectors into lower-dimensional ones. This saves costs without significantly reducing vectorization performance.
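To make the cost argument concrete, the following back-of-the-envelope sketch estimates raw vector storage before and after dimensionality reduction. The corpus size, the float32 encoding, and the 4096-to-1024 reduction are illustrative assumptions, not platform figures, and index overhead is ignored.

```python
def vector_storage_gib(num_vectors: int, dim: int, bytes_per_value: int = 4) -> float:
    """Raw storage for dense vectors in GiB (float32 by default, no index overhead)."""
    return num_vectors * dim * bytes_per_value / 2**30


# Illustrative assumptions: 100 million vectors, reduced from 4096 to 1024 dimensions.
NUM_VECTORS = 100_000_000
before = vector_storage_gib(NUM_VECTORS, 4096)
after = vector_storage_gib(NUM_VECTORS, 1024)

print(f"before: {before:.1f} GiB, after: {after:.1f} GiB, saved: {1 - after / before:.0%}")
```

Because raw storage scales linearly with the dimension, a fourfold reduction removes three quarters of the vector storage under these assumptions. The effect on retrieval quality depends on the trained reduction model and has to be measured on your own data.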

Customize the embedding dimensionality reduction service

In the AI Search Open Platform console, choose Model Service > Model Customization, and click Create.

If you use a RAM user to create models, modify configurations, view model details, or perform other operations, you must grant the RAM user the required model service permissions in advance.

On the Model Customization page, you can configure the following parameters.

Parameter	Description
Model Name	The name of the model to be used when you invoke the embedding dimensionality reduction service.
Model Type	The type of the model to train. Select Vector Dimensionality Reduction (embedding-dim-reduction).
Base Model	The base model used for training, such as ops-embedding-dim-reduction-001.
Data Source	The source of the training data. Valid values: MaxCompute and OSS.

MaxCompute

Parameter	Description
Data Source	MaxCompute
Region	The region where the MaxCompute project is located.
Project Name	The name of the project in MaxCompute.
AccessKey ID	The AccessKey ID of the Alibaba Cloud account or RAM user that has the permissions to read from and write to MaxCompute. You can go to the AccessKey Management page to obtain an AccessKey ID.
Secret	The AccessKey secret that corresponds to the AccessKey ID.
Table Name	The name of the table that stores the training data in MaxCompute.
Table Partition	The partition information of the table.
Training Fields	The primary key field and the String-type embedding fields used for training. To select these fields, you must first grant the GetTableFields permission (used to retrieve the MaxCompute table schema) to the RAM user that reads from and writes to the MaxCompute table. The dimension of each embedding field must be from 1024 to 4096.
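
Before you select the training fields, it can help to check locally that every embedding value parses and that its dimension falls within the supported range of 1024 to 4096. The following is a minimal sketch, not a platform tool. It assumes that each String-type embedding field stores comma-separated floats. That serialization, the function names, and the sample keys are assumptions for illustration, so check the sample data in the console for the actual format.

```python
MIN_DIM, MAX_DIM = 1024, 4096  # range stated in the Training Fields description


def parse_embedding(raw: str, sep: str = ",") -> list[float]:
    """Parse a String-type embedding field into floats.

    Assumption: values are separated by `sep`; adjust this to your table's format.
    """
    values = [float(part) for part in raw.strip().split(sep)]
    if not MIN_DIM <= len(values) <= MAX_DIM:
        raise ValueError(f"dimension {len(values)} is outside [{MIN_DIM}, {MAX_DIM}]")
    return values


def check_rows(rows: dict[str, str]) -> dict[str, str]:
    """Return {primary_key: error message} for rows that fail validation."""
    errors = {}
    for key, raw in rows.items():
        try:
            parse_embedding(raw)
        except ValueError as exc:  # non-numeric value or unsupported dimension
            errors[key] = str(exc)
    return errors


# Hypothetical sample: one valid 1024-dimensional row and one row that is too short.
sample = {"doc-1": ",".join(["0.1"] * 1024), "doc-2": "0.1,0.2"}
print(check_rows(sample))
```

Running a check like this on a sample of rows before training starts surfaces malformed or out-of-range embeddings early, instead of during pre-processing.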

OSS

Parameter	Description
Data Source	OSS
Region	The region where the OSS bucket is located.
OSS Bucket	The name of the OSS bucket.
Doc Data	The data in OSS used for training.
OSS Endpoint	Generated by the system after you complete the preceding configurations.

Click OK. In the dialog box that appears, click Create and Train. The model is then pre-processed, and training begins after pre-processing is complete.

If you click Confirm Creation instead, the model is added to the model customization list with the To Be Trained status, and you can start training later.

In the model list, models with the Active status have completed training and can be invoked. Click Experience to test the trained model.

Customize the text embedding service

In the AI Search Open Platform console, choose Model Service > Model Customization and click Create.

If you use a RAM user to perform operations such as creating a model, modifying configurations, or viewing model details, you must grant the RAM user the required permissions for the model service in advance.

On the Model Customization page, configure the following parameters.

Parameter	Description
Model Name	The name of the model. You can specify a custom name.
Model Type	The type of the model to train. Select Text Embedding (text-embedding).
Base Model	The foundation model used for training, such as ops-text-embedding-001.
Dimensionality Reduction	If you enable this option, embedding dimensionality reduction training is performed at the same time.
Base Model for Reduction	The model used for dimensionality reduction. This parameter is available only when you enable embedding dimensionality reduction.
Data Source	The source of the training data. Valid values: MaxCompute and OSS.

MaxCompute

Parameter	Description
Data Source	MaxCompute
Region	The region where the MaxCompute project is located.
Project Name	The name of the project in MaxCompute.
AccessKey ID	The AccessKey ID of the Alibaba Cloud account or RAM user that has the permissions to read from and write to MaxCompute. You can go to the AccessKey Management page to obtain an AccessKey ID.
Secret	The AccessKey secret that corresponds to the AccessKey ID.
Table Name	The name of the table that stores the training data in MaxCompute.
Table Partition	The partition information of the table.
Training Fields	The primary key field and the String-type text fields used for training. To select these fields, you must first grant the GetTableFields permission (used to retrieve the MaxCompute table schema) to the RAM user that reads from and writes to the MaxCompute table.
query-doc pair	For more information, see the sample data in the console.

OSS

Parameter	Description
Data Source	OSS
Region	The region where the OSS bucket is located.
OSS Bucket	The name of the OSS bucket.
Doc Data	The data in OSS used for training.
query-doc pair	For more information, see the sample data in the console.
OSS Endpoint	Generated by the system after you complete the preceding configurations.

Click OK. In the dialog box that appears, click Create and Train. The model starts pre-processing, and training begins after pre-processing is complete.

If you click Confirm Creation instead, the model appears in the model customization list with the To Be Trained status, and you can start training later.

In the model list, a model with an Active status has completed training and can be deployed.

Service invocation

When the model service meets your requirements, you can call it through an API. For more information, see Embedding Dimensionality Reduction Service API and Custom Deployment Service API.