This topic describes how to create an index on a vector field in Milvus to accelerate similarity searches. Vector indexes are built with algorithms such as Inverted File (IVF) and Hierarchical Navigable Small World (HNSW). These indexes organize or compress the vector space so that a search can find the most similar data points in a large dataset without comparing the query against every vector. This significantly improves response speed while maintaining a high recall rate for applications such as image recognition, voice retrieval, and recommendation systems.
Background information
Milvus supports multiple index types for efficient similarity retrieval. It provides three metrics to calculate the distance between vectors: Cosine Similarity (COSINE), Euclidean Distance (L2), and Inner Product (IP). You can create indexes on frequently queried vector and scalar fields to optimize retrieval performance.
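As a rough illustration of what the three metrics measure, the following plain-Python sketch computes each one from its textbook definition. It does not call Milvus, and the function names are hypothetical. The scores that a Milvus search returns may be computed differently internally.

```python
import math

def l2_distance(a, b):
    """Euclidean distance (L2): a smaller value means more similar vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    """Inner product (IP): a larger value means more similar vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine similarity (COSINE): the inner product of the normalized vectors."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)

query = [1.0, 0.0]
candidate = [3.0, 4.0]

print(l2_distance(query, candidate))
print(inner_product(query, candidate))
print(cosine_similarity(query, candidate))
```

Note that L2 and IP depend on vector length, whereas COSINE depends only on direction. The metric type that you set on an index should match how your embedding vectors are meant to be compared.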
Prerequisites
You have installed the PyMilvus library on your local client and updated it to the latest version.
If you have not installed the PyMilvus library or need to update it, run the following command:
pip install --upgrade pymilvus
You have created a Milvus instance. For more information, see Create a Milvus instance.
Preparations
Before you manage indexes, you must create a collection. You can create a collection in either of the following ways:
Option 1: Create a collection with indexes
If you want to create and load an index while creating the collection, you need to:
Declare the dimension of the vector field (dimension).
Provide index-related parameters (including metric_type and the index configuration).
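Option 1 can be sketched as follows. This is a minimal example, assuming that client is a connected MilvusClient and that the quick-setup form of create_collection (which takes dimension and metric_type) is used. The helper name create_indexed_collection is hypothetical. In this mode, the field names fall back to the library defaults rather than the id and vec fields used elsewhere in this topic.

```python
def create_indexed_collection(client, collection_name, dimension, metric_type="L2"):
    """Create a collection in quick-setup style and return its load state.

    `client` is assumed to be a connected pymilvus MilvusClient.
    """
    # Passing the dimension and metric type (instead of a schema) lets the
    # client create the index and load the collection in the same call.
    client.create_collection(
        collection_name=collection_name,
        dimension=dimension,
        metric_type=metric_type,
    )
    # Check whether the collection was loaded into memory.
    return client.get_load_state(collection_name=collection_name)

# Usage (requires a reachable Milvus instance):
# from pymilvus import MilvusClient
# client = MilvusClient(uri="http://c-xxxx.milvus.aliyuncs.com:19530",
#                       token="<yourUsername>:<yourPassword>")
# print(create_indexed_collection(client, "<yourCollectionName>", 5))
```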
Option 2: Create a collection without an index
The following code snippet uses this approach, so the created collection will not contain an index and will not be automatically loaded into memory:
from pymilvus import MilvusClient, DataType

client = MilvusClient(
    uri="http://c-xxxx.milvus.aliyuncs.com:19530",  # The public endpoint of the Milvus instance.
    token="<yourUsername>:<yourPassword>",  # The username and password for the Milvus instance.
    db_name="default"  # The name of the database to connect to. This example uses the default database.
)

schema = MilvusClient.create_schema(
    auto_id=False,
    enable_dynamic_field=True,
)

schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="vec", datatype=DataType.FLOAT_VECTOR, dim=5)

client.create_collection(
    collection_name="<yourCollectionName>",
    schema=schema,
)
Create an index
To create an index, define the index parameters with prepare_index_params and add_index, and then pass the resulting index_params to the create_index method.
index_params = MilvusClient.prepare_index_params()

# Define the index parameters.
index_params.add_index(
    field_name="vec",  # Specify the vector field name, such as "vec".
    metric_type="L2",  # Set the metric type, such as L2.
    index_type="IVF_PQ",  # Set the index type, such as IVF_PQ.
    index_name="vector_index"  # Set the index name as needed.
)

# Create the index.
client.create_index(
    collection_name="<yourCollectionName>",
    index_params=index_params
)
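Because a collection created without an index is not automatically loaded into memory, a typical next step is to load the collection and confirm that the index exists. The following is a minimal sketch, assuming that client is the connected MilvusClient from the preceding code. The helper name load_and_list_indexes is hypothetical.

```python
def load_and_list_indexes(client, collection_name):
    """Load a collection into memory and return the names of its indexes.

    `client` is assumed to be a connected pymilvus MilvusClient.
    """
    # Load the collection so that it can serve searches.
    client.load_collection(collection_name=collection_name)
    # List the index names defined on the collection.
    return client.list_indexes(collection_name=collection_name)

# Usage (requires a reachable Milvus instance):
# print(load_and_list_indexes(client, "<yourCollectionName>"))
```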
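View an index
To view the details of an index, call the describe_index method. Replace <yourIndexName> with the name of the index, such as vector_index from the preceding example.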
res = client.describe_index(
    collection_name="<yourCollectionName>",
    index_name="<yourIndexName>"
)

print(res)
Delete an index
To delete an index that you no longer need, call the drop_index method.
client.drop_index(
    collection_name="<yourCollectionName>",
    index_name="<yourIndexName>"
)