Vector search finds the entities whose vectors are most similar to a query vector, which makes it a key technology for similarity search. This topic uses detailed examples to show you how to quickly perform a vector search.
Prerequisites
You have installed the PyMilvus library on your local client and updated it to the latest version.
If you have not installed the PyMilvus library or need to update it, run the following command.
pip install --upgrade pymilvus
You have created a Milvus instance. For more information, see Create a Milvus instance.
Considerations
Vector Retrieval Service for Milvus (Milvus) supports connections over internal networks and the Internet. Before you connect to a Milvus instance, ensure your client has the required network access permissions. For more information, see Configure network access.
Procedure
Step 1: Connect to a Milvus instance
Use the following code to connect to the Milvus instance.
from pymilvus import MilvusClient
# Create a Milvus client.
client = MilvusClient(
    uri="http://c-xxxx.milvus.aliyuncs.com:19530",  # The public endpoint of the Milvus instance.
    token="<yourUsername>:<yourPassword>",  # The username and password used to authenticate with the Milvus instance, separated by a colon.
    db_name="default"  # The target database name. Defaults to "default".
)
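Hardcoding credentials in source code makes them easy to leak. The following is a minimal sketch, not part of the original procedure, that reads the endpoint and credentials from environment variables and builds the token string in the "<username>:<password>" format used above. The environment variable names MILVUS_URI, MILVUS_USER, MILVUS_PASSWORD, and MILVUS_DB and the helper build_client_kwargs are hypothetical; PyMilvus does not read these variables by itself.

```python
import os


def build_client_kwargs(env=None):
    """Build keyword arguments for MilvusClient from environment variables.

    The variable names used here are hypothetical conventions for this sketch.
    """
    env = os.environ if env is None else env
    required = ("MILVUS_URI", "MILVUS_USER", "MILVUS_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {
        "uri": env["MILVUS_URI"],
        # The token combines the username and password, separated by a colon.
        "token": f"{env['MILVUS_USER']}:{env['MILVUS_PASSWORD']}",
        # Fall back to the "default" database, as in the example above.
        "db_name": env.get("MILVUS_DB", "default"),
    }


if __name__ == "__main__":
    # Demonstrate with an explicit mapping instead of real environment variables.
    kwargs = build_client_kwargs({
        "MILVUS_URI": "http://c-xxxx.milvus.aliyuncs.com:19530",
        "MILVUS_USER": "<yourUsername>",
        "MILVUS_PASSWORD": "<yourPassword>",
    })
    print(kwargs["token"])
    # With PyMilvus installed: client = MilvusClient(**kwargs)
```

Calling build_client_kwargs() with no argument reads os.environ, so the same script can run against different instances without code changes.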
Step 2: Create a collection
Use the following code to create a collection. For more information about custom parameters, see Manage collections.
client.create_collection(
    collection_name="demo",  # The name of the collection.
    dimension=5  # The vector dimension.
)
This code sets the collection name and vector dimension. It also applies the following default configurations:
The primary key field is named id and the vector field is named vector.
The metric_type property is set to COSINE.
The primary key field id is a non-auto-incrementing integer.
An additional $meta field stores data for fields not defined in the schema as key-value pairs.
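Because the collection uses the COSINE metric by default, search results are ranked by the cosine similarity between the query vector and each stored vector, and a larger value means the vectors are more similar. The following minimal sketch computes cosine similarity in plain Python to show what the metric measures. It is for illustration only; Milvus computes the metric on the server.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors, a value in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("Cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


# Vectors that point in the same direction score about 1.0, regardless of their length.
print(cosine_similarity([1.0, 2.0, 0.0, 0.0, 0.0], [2.0, 4.0, 0.0, 0.0, 0.0]))
# Orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0]))
```

Because cosine similarity depends only on direction, scaling a vector does not change its rank in the results.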
Step 3: Insert data
After the collection is created, the system automatically loads the collection and its index into memory. Use the following code to insert test data into the collection.
Insert a small amount of data
This code inserts 10 predefined entities. Each entity has a fixed vector and a color label.
data = [
    {'id': 0, 'vector': [-0.493313706583155, -0.172001225836391, 0.16825615330139554, -0.0198911518739604, -0.9756816265213708], 'color': 'green_5760'},
    {'id': 1, 'vector': [0.6695699219225086, 0.49952523907354496, -0.49870548178008534, 0.8824655547230731, -0.7182693622931615], 'color': 'blue_2330'},
    {'id': 2, 'vector': [-0.6057771959702387, 0.9141473782193543, 0.32053983678483466, -0.32126010092015655, 0.725222856037071], 'color': 'grey_9673'},
    {'id': 3, 'vector': [0.14082089434165868, 0.9924029949938447, 0.7943279666144052, -0.7898608705081103, -0.9941425813199956], 'color': 'white_2829'},
    {'id': 4, 'vector': [-0.46180540826224026, 0.33216876051895783, 0.5786699695956004, 0.8891120357625131, 0.04872530176990697], 'color': 'pink_9061'},
    {'id': 5, 'vector': [-0.6097452740606673, 0.35648319550551144, -0.5699789153006387, 0.15085357921088316, -0.8817226997144627], 'color': 'pink_8525'},
    {'id': 6, 'vector': [0.7843522543512762, -0.7663837586858071, -0.8681839054724569, 0.6880645348647785, -0.5151293183261791], 'color': 'green_5016'},
    {'id': 7, 'vector': [-0.9967116931989293, 0.5741923070732655, -0.019126124261334976, -0.34163875885482753, -0.8189843931354175], 'color': 'brown_7434'},
    {'id': 8, 'vector': [0.7347243385915765, -0.7358853080124825, -0.23737428377511716, 0.06980552357261627, -0.30613964550461437], 'color': 'blue_5059'},
    {'id': 9, 'vector': [-0.21187155428455862, -0.3288541717216129, -0.32564136453418824, -0.14054963599686743, 0.5491320339870627], 'color': 'yellow_9887'}
]
res = client.insert(
    collection_name="demo",
    data=data
)
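The collection was created with dimension=5 and a non-auto-incrementing integer primary key, so every entity must supply an integer id and a vector with exactly five elements. The following minimal sketch checks a batch locally before you call client.insert, so that malformed entities are reported with a clear message. The helper name validate_entities is hypothetical, the checks cover only the default schema described in Step 2, and duplicate IDs are detected only within the batch, not against data already in the collection.

```python
import math
import numbers


def validate_entities(entities, dimension=5):
    """Check each entity for an integer "id" and a finite numeric vector of the given dimension.

    Returns a list of error messages; an empty list means the batch passed.
    """
    errors = []
    seen_ids = set()
    for index, entity in enumerate(entities):
        entity_id = entity.get("id")
        # bool is a subclass of int, so exclude it explicitly.
        if not isinstance(entity_id, int) or isinstance(entity_id, bool):
            errors.append(f"entity {index}: 'id' must be an integer")
        elif entity_id in seen_ids:
            errors.append(f"entity {index}: duplicate id {entity_id}")
        else:
            seen_ids.add(entity_id)
        vector = entity.get("vector")
        if not isinstance(vector, (list, tuple)) or len(vector) != dimension:
            errors.append(f"entity {index}: 'vector' must have {dimension} elements")
        elif not all(isinstance(x, numbers.Real) and math.isfinite(x) for x in vector):
            errors.append(f"entity {index}: 'vector' must contain finite numbers")
    return errors


sample = [
    {"id": 0, "vector": [0.1, 0.2, 0.3, 0.4, 0.5], "color": "green_5760"},
    {"id": 0, "vector": [0.1, 0.2, 0.3], "color": "blue_2330"},
]
for message in validate_entities(sample):
    print(message)
```

Fields other than id and vector, such as color, are not checked because the collection stores them in the $meta field.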
Insert more data
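This code uses a list comprehension to generate 1,000 entities with randomly generated vectors and color labels. Because IDs 0 to 9 are already used by the entities inserted above, only the entities with IDs 10 to 999 are inserted.
import random
colors = ["green", "blue", "yellow", "red", "black", "white", "purple", "pink", "orange", "brown", "grey"]
data = [
    {
        "id": i,
        "vector": [random.uniform(-1, 1) for _ in range(5)],
        "color": f"{random.choice(colors)}_{random.randint(1000, 9999)}"
    }
    for i in range(1000)
]
# Skip IDs 0 to 9, which were inserted in the previous example.
# Milvus does not enforce primary key uniqueness on insert, so reusing these IDs would create duplicate entities.
res = client.insert(
    collection_name="demo",
    data=data[10:]
)
print(res)
Step 4: Perform a vector search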
Data insertion is an asynchronous process. The search index is not updated immediately after you insert data. To query the latest data, wait a few seconds for the index to update before you perform a search.
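Instead of sleeping for a fixed number of seconds, you can poll until a condition holds or a timeout expires. The following is a generic helper in plain Python; wait_until is a hypothetical name, and the condition you pass in is up to you. The commented usage assumes that each search hit is a dict with an "id" key, which you should verify against your PyMilvus version.

```python
import time


def wait_until(condition, timeout=30.0, interval=1.0):
    """Call condition() until it returns a truthy value or timeout seconds elapse.

    Returns True if the condition was met and False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical usage with a connected client: poll until entity 0 appears in the
# results of a search that uses entity 0's own vector as the query.
# vector_0 = [-0.493313706583155, -0.172001225836391, 0.16825615330139554,
#             -0.0198911518739604, -0.9756816265213708]
# ready = wait_until(lambda: any(
#     hit["id"] == 0
#     for hit in client.search(collection_name="demo", data=[vector_0], limit=3)[0]
# ), timeout=10)

# Self-contained demonstration: a condition that becomes true on the third call.
attempts = {"count": 0}


def becomes_ready():
    attempts["count"] += 1
    return attempts["count"] >= 3


print(wait_until(becomes_ready, timeout=5, interval=0.01))
```

Using time.monotonic() keeps the timeout correct even if the system clock changes while the helper is waiting.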
Single vector search
You can perform a similarity search for a single vector by providing a query vector.
query_vectors = [
    [-0.8832567462711804, -0.2999882617491647, 0.9921295273224382, -0.272575369985379, -0.688914679645338]
]
res = client.search(
    collection_name="demo",  # The collection to search.
    data=query_vectors,  # The query vectors.
    limit=3,  # The maximum number of entities to return for each query vector.
)
print(res)
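Conceptually, a search with the COSINE metric and limit=3 returns, for each query vector, the three stored entities with the highest cosine similarity. The following sketch performs that ranking exhaustively in plain Python over a small in-memory list, and it accepts several query vectors, as in the batch search below. It is a hypothetical reference for building intuition on toy data, not how Milvus executes searches: Milvus searches through an index, which can return approximate results on large collections. The result shape, one list of {"id", "distance"} dicts per query vector, is chosen to resemble MilvusClient search output and is an assumption.

```python
import heapq
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def brute_force_search(entities, query_vectors, limit=3):
    """Return, for each query vector, the `limit` entities with the highest cosine similarity."""
    results = []
    for query in query_vectors:
        scored = (
            {"id": e["id"], "distance": cosine_similarity(query, e["vector"])}
            for e in entities
        )
        # Highest similarity first, matching how the COSINE metric ranks results.
        results.append(heapq.nlargest(limit, scored, key=lambda hit: hit["distance"]))
    return results


entities = [
    {"id": 0, "vector": [1.0, 0.0, 0.0, 0.0, 0.0]},
    {"id": 1, "vector": [0.9, 0.1, 0.0, 0.0, 0.0]},
    {"id": 2, "vector": [0.0, 1.0, 0.0, 0.0, 0.0]},
    {"id": 3, "vector": [-1.0, 0.0, 0.0, 0.0, 0.0]},
]
for hits in brute_force_search(entities, [[1.0, 0.0, 0.0, 0.0, 0.0]], limit=3):
    print([hit["id"] for hit in hits])  # prints [0, 1, 2]
```

An exhaustive scan like this costs one similarity computation per stored vector for every query, which is why Milvus relies on an index for larger collections.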
Batch vector search
You can perform batch similarity searches by providing a list of query vectors.
query_vectors = [
    [0.06586461994037252, 0.7693023529849932, 0.8199991781350795, -0.6988017611187176, 0.408383847889378],
    [0.8988257992203861, 0.021911711196309414, 0.19086900086430836, 0.63590610476426, -0.6713237387993141]
]
res = client.search(
    collection_name="demo",
    data=query_vectors,
    limit=3,
)
print(res)
Step 5: Search with a filter
You can set filter conditions on scalar fields, including fields stored in the $meta field, so that only matching entities are considered. This narrows the search scope and can improve search efficiency.
Filter based on a numeric field
The following example shows how to filter based on the numeric range of the id field.
query_vectors = [
    [-0.30932351869632435, -0.7132856078639205, 0.6006201320181415, 0.40140510356426784, -0.21223937444001328]
]
res = client.search(
    collection_name="demo",
    data=query_vectors,
    filter="3 < id < 5",  # Matches IDs greater than 3 and less than 5, that is, only ID 4.
    limit=3
)
print(res)
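A filter restricts which entities are eligible before the top results are selected, so a search can return fewer results than limit. In the example above, 3 < id < 5 matches only the entity with ID 4, so the search returns a single hit even though limit=3. The following sketch emulates this behavior locally in plain Python. The predicate is written by hand as a Python lambda and does not parse Milvus filter expressions; the helper name filtered_search and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_search(entities, query, predicate, limit=3):
    """Rank only the entities that satisfy predicate, highest cosine similarity first."""
    candidates = [e for e in entities if predicate(e)]
    candidates.sort(key=lambda e: cosine_similarity(query, e["vector"]), reverse=True)
    return [e["id"] for e in candidates[:limit]]


# Ten toy entities with IDs 0 to 9; the vector values are arbitrary.
entities = [{"id": i, "vector": [1.0, float(i), 0.0, 0.0, 0.0]} for i in range(10)]
query = [-0.30932351869632435, -0.7132856078639205, 0.6006201320181415,
         0.40140510356426784, -0.21223937444001328]
# Python equivalent of the filter expression "3 < id < 5".
print(filtered_search(entities, query, lambda e: 3 < e["id"] < 5, limit=3))  # prints [4]
```

If a filter matches few entities, widening the condition, rather than raising limit, is what returns more results.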
Filter based on a metadata field ($meta)
The following example searches for records whose color value starts with "green". Because color is not defined in the schema, it is stored in the $meta field. The output_fields parameter adds the color field to the returned results.
query_vectors = [
    [0.9636568288732006, -0.5900490884830603, 0.2504591754023724, 0.7120903924474389, 0.7620604497390009]
]
res = client.search(
    collection_name="demo",
    data=query_vectors,
    filter='$meta["color"] like "green%"',  # Filters for records where the 'color' metadata starts with "green".
    limit=3,
    output_fields=["color"]  # Specifies which fields to return in the output.
)
print(res)
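In the pattern "green%", the % wildcard matches any sequence of characters, so values such as green_5760 and green_5016 match. Because output_fields includes color, each returned hit also carries its color value. The following sketch shows one way to read those values from a search result. It assumes that the result contains one list of hits per query vector and that each hit is a dict with "id", "distance", and an "entity" dict of the requested output fields, which matches how recent PyMilvus versions return MilvusClient search hits; verify this against your installed version. The sample result is hand-made for illustration and is not real output, and the helper name hits_with_color is hypothetical.

```python
def hits_with_color(res):
    """Flatten search results into (query_index, id, distance, color) tuples.

    Assumes res is a list with one list of hits per query vector, and that each hit
    is a dict with "id", "distance", and an optional "entity" dict of output fields.
    """
    rows = []
    for query_index, hits in enumerate(res):
        for hit in hits:
            # Hits returned without output_fields may have no "color" value.
            color = hit.get("entity", {}).get("color")
            rows.append((query_index, hit["id"], hit["distance"], color))
    return rows


# Hand-made stand-in for a search result; the IDs and distances are illustrative only.
sample_res = [[
    {"id": 6, "distance": 0.5, "entity": {"color": "green_5016"}},
    {"id": 0, "distance": 0.25, "entity": {"color": "green_5760"}},
]]
for row in hits_with_color(sample_res):
    print(row)
# With a real client, pass the result of the filtered search: hits_with_color(res)
```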