Manage collections - Vector Retrieval Service for Milvus

Collections in Milvus are similar to tables in a relational database. They are the fundamental units for organizing and managing vector data and its associated scalar metadata. You can customize the data processing flow by configuring indexes, partitions, and shards. This lets you create an end-to-end solution for data ingestion, storage, queries, and analysis.

Prerequisites

You have installed the PyMilvus library on your local client and updated it to the latest version.
If you have not installed the PyMilvus library or need to update it, run the following command.
```
pip install --upgrade pymilvus
```
You have created a Milvus instance. For more information, see Create a Milvus instance.

Create a collection

Quickly create a collection

You can quickly create a collection by specifying its name and vector dimension.

```
from pymilvus import MilvusClient

# Create a Milvus client.
client = MilvusClient(
    uri="http://c-xxxx.milvus.aliyuncs.com:19530",  # The public endpoint of the Milvus instance.
    token="<yourUsername>:<yourPassword>",  # The username and password for the Milvus instance.
    db_name="default"  # The name of the database to connect to. This example uses the default database.
)

# Create a collection named milvus_collection.
client.create_collection(
    collection_name="milvus_collection",
    dimension=5
)

# Query the current load state of the collection.
res = client.get_load_state(
    collection_name="milvus_collection"
)

print(res)
```

This code creates a collection with only two fields: `id` (primary key) and `vector` (vector field). The `auto_id` and `enable_dynamic_field` properties are enabled by default.

`auto_id`: When this property is enabled, Milvus automatically assigns an auto-incrementing primary key to each record. You do not need to set the primary key value.
`enable_dynamic_field`: When this property is enabled, any fields other than the predefined `id` and `vector` fields are treated as dynamic fields. They are stored as key-value pairs in a special field named `$meta`. This property lets you insert data for fields that are not defined in the schema.

These settings allow you to flexibly insert and manage data without defining all fields in advance.
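
To make the effect of these two properties concrete, the following is a minimal sketch that reads the flags from `describe_collection` instead of relying on the defaults, and then builds matching rows. The helper names `build_rows` and `insert_sample_rows` and the dynamic field `color` are hypothetical and used only for illustration. The sketch assumes the `id` and `vector` field names and the dimension of 5 from the example above.

```python
import random


def build_rows(description, count=3, dim=5):
    """Build rows that match the flags reported by describe_collection()."""
    rows = []
    for i in range(count):
        row = {"vector": [random.random() for _ in range(dim)]}
        if not description.get("auto_id", False):
            # Milvus is not generating primary keys, so supply them explicitly.
            row["id"] = i
        if description.get("enable_dynamic_field", False):
            # "color" is not defined in the schema (hypothetical field);
            # with dynamic fields enabled it is stored as a key-value pair.
            row["color"] = f"color_{i}"
        rows.append(row)
    return rows


def insert_sample_rows(client, collection_name="milvus_collection"):
    """Insert a few sample rows using an existing MilvusClient."""
    description = client.describe_collection(collection_name=collection_name)
    rows = build_rows(description)
    return client.insert(collection_name=collection_name, data=rows)
```

With the `client` created earlier, `insert_sample_rows(client)` would insert three rows into `milvus_collection`.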

Create a custom collection

When creating a custom collection, you can specify schema and index parameters.

```
import time

from pymilvus import MilvusClient, DataType

# Create a Milvus client.
client = MilvusClient(
    uri="http://c-xxxx.milvus.aliyuncs.com:19530",  # The public endpoint of the Milvus instance.
    token="<yourUsername>:<yourPassword>",  # The username and password for the Milvus instance.
    db_name="default"  # The name of the database to connect to. This example uses the default database.
)

# Create a schema.
schema = MilvusClient.create_schema(
    auto_id=False,  # Disable automatic ID generation.
    enable_dynamic_field=True,  # Enable dynamic fields.
)

# Add schema fields.
# Add an INT64 field named test_id as the primary key.
schema.add_field(field_name="test_id", datatype=DataType.INT64, is_primary=True)
# Add a 768-dimensional float vector field named test_vector.
schema.add_field(field_name="test_vector", datatype=DataType.FLOAT_VECTOR, dim=768)

# Define index parameters.
index_params = client.prepare_index_params()

# Add a scalar index on the primary key field.
index_params.add_index(
    field_name="test_id",
    index_type="STL_SORT"  # Set the index type.
)

# Add a vector index on the vector field.
index_params.add_index(
    field_name="test_vector",
    index_type="IVF_SQ8",      # Set the index type.
    metric_type="L2",          # Set the metric type, such as L2.
    params={"nlist": 128}
)

# Create a collection with the specified schema and index.
client.create_collection(
    collection_name="milvus_collection",  # Create a collection named milvus_collection.
    schema=schema,
    index_params=index_params
)

# Give the collection a few seconds to finish loading before checking its state.
time.sleep(5)

res = client.get_load_state(
    collection_name="milvus_collection"
)

print(res)
```
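
The fixed `time.sleep(5)` in the example above may be too short or too long depending on the instance. As an alternative, the following sketch polls `get_load_state` until the collection reports `Loaded` or a timeout expires. The helper name `wait_until_loaded` and its `timeout` and `interval` parameters are hypothetical. The sketch assumes that the returned dictionary has a `state` key holding an enum member whose name is `Loaded` once loading completes.

```python
import time


def wait_until_loaded(client, collection_name, timeout=30.0, interval=0.5):
    """Poll the load state until it is Loaded or the timeout expires.

    Returns True if the collection became loaded, otherwise False.
    """
    deadline = time.monotonic() + timeout
    while True:
        state = client.get_load_state(collection_name=collection_name)["state"]
        # Compare by member name so this helper does not need to import
        # the LoadState enum from PyMilvus.
        if getattr(state, "name", str(state)) == "Loaded":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With the `client` created earlier, `wait_until_loaded(client, "milvus_collection")` could replace the sleep-and-check step.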

You can also create the collection and the index separately. The following code is an example.

```
# Create a collection named milvus_collection1.
client.create_collection(
    collection_name="milvus_collection1",
    schema=schema,
)

# Query the loading status of milvus_collection1.
res = client.get_load_state(
    collection_name="milvus_collection1"
)

print(res)

# Create an index for milvus_collection1.
client.create_index(
    collection_name="milvus_collection1",
    index_params=index_params
)

# Query the loading status of milvus_collection1 after the index is created.
res = client.get_load_state(
    collection_name="milvus_collection1"
)

print(res)
```
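
After creating an index separately, it can be useful to confirm which indexes exist on the collection. The following sketch uses `list_indexes` and `describe_index` for that purpose. The helper name `summarize_indexes` is hypothetical, and the `field_name` and `index_type` keys read from the `describe_index` result are an assumption about the returned dictionary, which is why they are accessed with `.get`.

```python
def summarize_indexes(client, collection_name):
    """Return a mapping of index name to its field and index type."""
    summary = {}
    for index_name in client.list_indexes(collection_name=collection_name):
        info = client.describe_index(
            collection_name=collection_name,
            index_name=index_name,
        )
        # The keys below are assumed; missing keys simply yield None.
        summary[index_name] = {
            "field_name": info.get("field_name"),
            "index_type": info.get("index_type"),
        }
    return summary
```

With the `client` created earlier, `print(summarize_indexes(client, "milvus_collection1"))` would print one entry per index.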

View collections

You can view the details of a collection.

```
res = client.describe_collection(collection_name="milvus_collection")

print(res)
```

You can view all collections in the current database.
```
res = client.list_collections()

print(res)
```
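
The dictionary returned by `describe_collection` can be verbose. The following sketch condenses it into a short per-field summary and checks with `has_collection` first, so that a missing collection yields `None` instead of an error. The helper name `summarize_collection` is hypothetical, and the `fields`, `name`, `type`, `is_primary`, and `params` keys are assumptions about the shape of the returned dictionary.

```python
def summarize_collection(client, collection_name):
    """Return a compact field summary, or None if the collection is missing."""
    if not client.has_collection(collection_name=collection_name):
        return None
    info = client.describe_collection(collection_name=collection_name)
    fields = []
    for field in info.get("fields", []):
        field_type = field.get("type")
        fields.append({
            "name": field.get("name"),
            # The type is expected to be an enum member; fall back to str().
            "type": getattr(field_type, "name", str(field_type)),
            "is_primary": bool(field.get("is_primary", False)),
            # Vector fields are expected to carry their dimension in params.
            "dim": field.get("params", {}).get("dim"),
        })
    return {"collection_name": collection_name, "fields": fields}
```

With the `client` created earlier, `print(summarize_collection(client, "milvus_collection"))` would print the summary.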

Load and release a collection

When you load a collection, Milvus loads its associated index files into memory. When you release a collection, Milvus unloads the index files from memory. You must load a collection into memory before you can perform a search on it.
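
Because a search requires a loaded collection, a search call can be guarded as in the following sketch, which checks the load state and loads the collection only when needed. The helper names `ensure_loaded` and `search_loaded` are hypothetical. The sketch assumes that `get_load_state` returns a dictionary whose `state` value is an enum member named `Loaded` when the collection is in memory.

```python
def ensure_loaded(client, collection_name):
    """Load the collection if it is not loaded. Return True if a load was issued."""
    state = client.get_load_state(collection_name=collection_name)["state"]
    if getattr(state, "name", str(state)) == "Loaded":
        return False
    client.load_collection(collection_name=collection_name)
    return True


def search_loaded(client, collection_name, query_vectors, limit=3):
    """Make sure the collection is loaded, then run a vector search."""
    ensure_loaded(client, collection_name)
    return client.search(
        collection_name=collection_name,
        data=query_vectors,
        limit=limit,
    )
```

The query vectors passed to `search_loaded` must have the same dimension as the vector field of the collection.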

Load a collection

```
client.load_collection(
    collection_name="milvus_collection"
)

res = client.get_load_state(
    collection_name="milvus_collection"
)

print(res)
```

Release a collection

```
client.release_collection(
    collection_name="milvus_collection"
)

res = client.get_load_state(
    collection_name="milvus_collection"
)

print(res)
```

Delete a collection

```
client.drop_collection(
    collection_name="milvus_collection"
)

client.drop_collection(
    collection_name="milvus_collection1"
)
```
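
Dropping a collection removes its data, so cleanup scripts often check for existence first. The following is a minimal sketch of that pattern using `has_collection`. The helper name `drop_if_exists` is hypothetical.

```python
def drop_if_exists(client, collection_name):
    """Drop the collection if it exists. Return True if it was dropped."""
    if not client.has_collection(collection_name=collection_name):
        return False
    client.drop_collection(collection_name=collection_name)
    return True
```

With the `client` created earlier, `drop_if_exists(client, "milvus_collection1")` returns `False` when the collection has already been removed.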