Manage External Collections in Milvus Manager (Vector Retrieval Service for Milvus)

Limitations

This feature is supported only in Milvus 2.6 and later, and the minor version of your instance must be 2.6.3-0.4.12_3.9.0 or later. To upgrade your instance, see Upgrade version.
Currently, only Paimon tables are supported.

Key concepts

Concept	Description
External Collection	Maps the schema of a DLF (Data Lake Formation) lake table to a Milvus collection so that you can query the table without migrating its data.
Lake table	A table in a data lake. Currently, only Paimon tables are supported.
Snapshot	The state of a lake table at a specific point in time.
Tag	A label applied to a specific snapshot of a lake table.

Associate a RAM identity

Before using External Collections, you must associate a RAM identity with a Milvus user. Milvus then uses that RAM identity to authorize access to DLF resources.

RAM account association rules:

A Milvus administrator can bind or unbind a RAM account for any Milvus user within the administrator's management scope.
A non-admin Milvus user can bind or unbind a RAM account only for their own Milvus user.
When you log on with an Alibaba Cloud account, you can bind any RAM user under that account to the target Milvus user.
When you log on as a RAM user, you can bind only that RAM user.
An Alibaba Cloud account itself cannot be associated.
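The association rules combine a Milvus-side check (who you are in Milvus) with a cloud-side check (how you logged on). The sketch below encodes them as a single predicate so the interaction is easier to see. Session, BindRequest, and can_bind are hypothetical names for illustration only; they are not part of any Milvus or Alibaba Cloud API, and the console performs the real checks.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Session:
    """Hypothetical model of who is performing the bind; not a real API."""
    login_type: str        # "account" (Alibaba Cloud account) or "ram_user"
    login_name: str        # account ID or RAM user name used to log on
    account_id: str        # Alibaba Cloud account that owns this login
    milvus_user: str       # Milvus user on whose behalf the request is made
    milvus_is_admin: bool = False
    admin_scope: FrozenSet[str] = frozenset()  # Milvus users an admin manages


@dataclass(frozen=True)
class BindRequest:
    target_milvus_user: str
    identity_type: str     # "ram_user" or "account"
    identity_name: str     # RAM user name (or account ID)
    owner_account_id: str  # Alibaba Cloud account that owns the identity


def can_bind(s: Session, r: BindRequest) -> bool:
    # An Alibaba Cloud account itself cannot be associated.
    if r.identity_type != "ram_user":
        return False
    # Milvus side: admins act within their scope; other users only on themselves.
    if s.milvus_is_admin:
        if r.target_milvus_user not in s.admin_scope:
            return False
    elif r.target_milvus_user != s.milvus_user:
        return False
    # Cloud side: an account logon may bind any RAM user that the account owns.
    if s.login_type == "account":
        return r.owner_account_id == s.account_id
    # A RAM-user logon may bind only that same RAM user.
    return r.identity_name == s.login_name and r.owner_account_id == s.account_id
```

For example, a non-admin Milvus user "alice" who logged on as RAM user "dev1" can bind "dev1" to "alice", but not "dev2", and not to another Milvus user.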

Create an external collection

Use the three-step wizard to create an External Collection:

Step 1: Link a lake table

Select a DLF instance, database, and target table. The page displays the schema (field name and type) of the lake table. Unsupported types are highlighted in red. For detailed mapping rules, see Schema mapping rules below.

Step 2: Configure the external collection

Configure the following settings:

Name and Description: The name and description of the collection.
Primary key configuration: The primary key must be of type Int64 or VarChar. Auto ID is not supported.
Vector field: Float, Binary, Float16, and BFloat16 vector types are supported. You must specify the dimension.
Scalar field configuration: Configure the scalar field mapping based on the fields of the lake table.

Note

You must specify the vector dimension based on the actual data in your DLF lake table.
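Because the dimension must match the data already in the lake table, it can help to read a few sample rows and check the Step 2 settings against them before clicking Create. The sketch below does that offline. The type-name strings mirror the Step 2 options; validate_step2, infer_dim, and the idea of pulling sample vectors yourself are assumptions for illustration. It also assumes one array element per dimension, which for BinaryVector means one BOOLEAN element per bit.

```python
from typing import List, Sequence

# Types named in Step 2 of the wizard.
ALLOWED_PK_TYPES = {"Int64", "VarChar"}
ALLOWED_VECTOR_TYPES = {"FloatVector", "BinaryVector", "Float16Vector", "BFloat16Vector"}


def infer_dim(sample_vectors: Sequence[Sequence[float]]) -> int:
    """Return the one vector length found in sample rows read from the lake table."""
    lengths = {len(v) for v in sample_vectors}
    if not lengths:
        raise ValueError("no sample vectors to infer the dimension from")
    if len(lengths) > 1:
        raise ValueError(f"inconsistent vector lengths in lake table: {sorted(lengths)}")
    return lengths.pop()


def validate_step2(pk_type: str, auto_id: bool, vector_type: str, dim: int,
                   sample_vectors: Sequence[Sequence[float]]) -> List[str]:
    """Collect configuration problems instead of stopping at the first one."""
    errors = []
    if pk_type not in ALLOWED_PK_TYPES:
        errors.append(f"primary key type {pk_type} is not Int64 or VarChar")
    if auto_id:
        errors.append("Auto ID is not supported")
    if vector_type not in ALLOWED_VECTOR_TYPES:
        errors.append(f"vector type {vector_type} is not supported")
    try:
        actual = infer_dim(sample_vectors)
        if dim != actual:
            errors.append(f"dimension {dim} does not match lake table data ({actual})")
    except ValueError as exc:
        errors.append(str(exc))
    return errors
```

Collecting every problem at once mirrors how the wizard shows all issues on one page rather than failing on the first.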

Step 3: Confirm data mapping

Confirm the mapping between DLF lake table fields and Milvus collection fields, and then click Create to create the Milvus collection.

Snapshot sync

After creating an External Collection, you must complete a snapshot sync before you can query data. Use the quick sync wizard that appears after creation, or configure the sync later on the snapshot sync page.

This page displays the tag information, tag status, sync time, and row count for the DLF lake table associated with the current External Collection.

Sync operations

Auto mode: Automatically retrieves the latest snapshot of the DLF table (preferring the COMPACT commit type), creates a tag, and syncs it to the Milvus External Collection. You can configure a snapshot prefix filter.
Manual mode: Select an existing tag name to sync.
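Auto mode's selection step can be read as "take the newest COMPACT snapshot if there is one, otherwise the newest snapshot of any kind". The sketch below implements that reading; it is an interpretation, not the service's documented algorithm. The commit kind strings follow Paimon's snapshot commit kinds (APPEND, COMPACT, OVERWRITE, ANALYZE). The tag name format in auto_tag_name is hypothetical, and the snapshot prefix filter is left out because its exact semantics are not described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int   # Paimon snapshot IDs increase monotonically
    commit_kind: str   # e.g. "APPEND", "COMPACT", "OVERWRITE", "ANALYZE"


def pick_snapshot(snapshots: List[Snapshot]) -> Optional[Snapshot]:
    """Prefer the newest COMPACT snapshot; fall back to the newest snapshot."""
    if not snapshots:
        return None
    compacts = [s for s in snapshots if s.commit_kind == "COMPACT"]
    pool = compacts or snapshots
    return max(pool, key=lambda s: s.snapshot_id)


def auto_tag_name(snapshot: Snapshot) -> str:
    # Hypothetical naming scheme; the real tag name format is not documented here.
    return f"milvus-auto-{snapshot.snapshot_id}"
```

A plausible reason for preferring COMPACT snapshots is that they reflect compacted data files, which are typically fewer and larger to read; that rationale is an assumption, not a documented guarantee.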

Scheduled sync configuration

Enable scheduled sync and configure a sync policy:

Simple interval mode: Set a sync interval in minutes, hours, or days.
Cron expression mode: Use a cron expression to customize the sync schedule. You can also configure a snapshot prefix.
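Simple interval mode amounts to adding a fixed step to a start time. The sketch below converts an interval setting into Python timedelta values and lists upcoming trigger times. The function names and the choice to count from an explicit start time are assumptions; how the service anchors the first run is not stated here.

```python
from datetime import datetime, timedelta
from typing import List

# Units offered by simple interval mode.
_UNITS = {
    "minutes": timedelta(minutes=1),
    "hours": timedelta(hours=1),
    "days": timedelta(days=1),
}


def sync_interval(value: int, unit: str) -> timedelta:
    """Convert an interval setting such as (6, "hours") into a timedelta."""
    if value <= 0:
        raise ValueError("interval must be positive")
    if unit not in _UNITS:
        raise ValueError(f"unit must be one of {sorted(_UNITS)}")
    return _UNITS[unit] * value


def next_runs(start: datetime, value: int, unit: str, count: int) -> List[datetime]:
    """Return the next `count` trigger times after `start`."""
    step = sync_interval(value, unit)
    return [start + step * i for i in range(1, count + 1)]
```

In cron expression mode, a standard five-field expression such as 0 2 * * * means 02:00 every day; which cron dialect the console accepts is not stated here, so check the console's validation before relying on a specific syntax.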

Sync task list

Filter tasks by status or tag and browse results with pagination. For failed tasks, you can view the failure reason.
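The task list combines two filters with page-based browsing. The sketch below shows the same behavior over an in-memory list, for example if you export task records for your own monitoring. SyncTask, its status values, and list_tasks are hypothetical; the console's actual status names and page sizes may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SyncTask:
    task_id: int
    tag: str
    status: str              # assumed values such as "Running", "Succeeded", "Failed"
    failure_reason: str = ""


def list_tasks(tasks: List[SyncTask], status: Optional[str] = None,
               tag: Optional[str] = None, page: int = 1,
               page_size: int = 10) -> Tuple[List[SyncTask], int]:
    """Filter by status and/or tag, then return one page plus the total match count."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    matched = [t for t in tasks
               if (status is None or t.status == status)
               and (tag is None or t.tag == tag)]
    start = (page - 1) * page_size
    return matched[start:start + page_size], len(matched)
```

Returning the total alongside the page lets a caller compute the number of pages without fetching every record.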

Unsupported features for external collections

External collections have the following limitations compared with regular collections:

Schema limitations

Unsupported field types and schema options:

auto_id=True is not supported. External Collections must use an external primary key.
TIMESTAMPTZ and GEOMETRY types are not supported.
SparseFloatVector type is not supported.
Nested complex types (for example, an array whose elements are themselves complex types) are not supported.
DynamicField is not supported.
PartitionKey is not supported.

Field naming limitations:

Field names are case-sensitive.
Field names must exactly match the Paimon table field names.
Field name mapping is not supported (for example, mapping user_id in Milvus to userId in Paimon).

Other limitations:

Schema evolution is not supported. The schema cannot be changed after creation.
Function output fields (is_function_output=True) are not supported.
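Because field names are matched exactly and cannot be remapped, a quick pre-check against the Paimon column list catches most schema mistakes before creation. The sketch below describes Milvus fields as plain dicts whose keys (name, type, is_partition_key, is_function_output) are chosen for illustration; they echo pymilvus parameter names but this is not a pymilvus call. check_schema is a hypothetical helper.

```python
from typing import Dict, List

# Types the External Collection schema rejects, as listed above.
UNSUPPORTED_TYPES = {"SparseFloatVector", "TIMESTAMPTZ", "GEOMETRY"}


def check_schema(paimon_fields: List[str], milvus_fields: List[Dict],
                 auto_id: bool = False, enable_dynamic_field: bool = False) -> List[str]:
    """Return human-readable problems; an empty list means no known violation."""
    problems = []
    if auto_id:
        problems.append("auto_id=True is not supported")
    if enable_dynamic_field:
        problems.append("DynamicField is not supported")
    paimon = set(paimon_fields)
    for field in milvus_fields:
        name = field["name"]
        if name not in paimon:
            # Names are compared exactly: no case folding and no remapping.
            near = [p for p in paimon_fields if p.lower() == name.lower()]
            hint = f" (did you mean {near[0]!r}?)" if near else ""
            problems.append(f"field {name!r} has no exact match in the Paimon table{hint}")
        if field.get("type") in UNSUPPORTED_TYPES:
            problems.append(f"field {name!r}: type {field['type']} is not supported")
        if field.get("is_partition_key"):
            problems.append(f"field {name!r}: PartitionKey is not supported")
        if field.get("is_function_output"):
            problems.append(f"field {name!r}: function output fields are not supported")
    return problems
```

The case-insensitive hint only suggests a likely intended column; the Milvus field must still be named exactly like the Paimon column (for example, userId rather than user_id or userid).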

Data operation limitations

Unsupported operations:

insert(), upsert(), delete(), and flush() operations are not supported.
Direct data modification is not supported. Data must be modified on the Paimon table side.

Supported operations:

search(): Vector search.
query(): Scalar query.
create_index(): Create an index.
load()/release(): Load or release a collection.
bulk_import(): Trigger a refresh (incremental sync).
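If application code shares helpers between regular and External Collections, a thin wrapper can fail fast on write calls instead of waiting for a server-side error. The sketch below is a hypothetical client-side guard; the operation names come from the lists above, and client stands for any object exposing those methods. The Milvus service remains the authority on what is allowed.

```python
SUPPORTED_OPS = frozenset({"search", "query", "create_index", "load", "release", "bulk_import"})
UNSUPPORTED_OPS = frozenset({"insert", "upsert", "delete", "flush"})


class ExternalCollectionGuard:
    """Hypothetical wrapper that rejects write operations before they are sent.

    `client` is any object exposing the operation methods; the real checks
    are performed by the Milvus service, not by this wrapper.
    """

    def __init__(self, client):
        self._client = client

    def __getattr__(self, name):
        # Only called for names not found on the wrapper itself.
        if name in UNSUPPORTED_OPS:
            raise PermissionError(
                f"{name}() is not supported on an External Collection; "
                "modify the data in the Paimon table instead")
        if name not in SUPPORTED_OPS:
            raise AttributeError(name)
        return getattr(self._client, name)
```

Using an allow list rather than only a deny list means operations not named in the documentation are also rejected until you confirm they are supported.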

Index limitations

Supported index types:

Vector indexes: HNSW, HNSW_SQ, HNSW_PQ, IVF_FLAT, IVF_SQ8, IVF_PQ, IVF_RABITQ, and SCANN.
Scalar indexes: INVERTED, BITMAP, and STL_SORT.

Index behavior:

Indexes are built by Milvus and stored in Milvus object storage.
Indexes do not affect the Paimon table data.
Dropping a collection also deletes its indexes.
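Before calling create_index(), you can check a planned index type against the supported lists and see which mapped fields would still be read by scanning Paimon data. check_index and unindexed are hypothetical helpers; the sketch assumes vector field type names end in "Vector" and does not check finer per-type rules (for example, which scalar types each scalar index accepts).

```python
from typing import Dict, List

VECTOR_INDEXES = frozenset({"HNSW", "HNSW_SQ", "HNSW_PQ", "IVF_FLAT", "IVF_SQ8",
                            "IVF_PQ", "IVF_RABITQ", "SCANN"})
SCALAR_INDEXES = frozenset({"INVERTED", "BITMAP", "STL_SORT"})


def check_index(field_type: str, index_type: str) -> bool:
    """True if the index type is in the supported list for this kind of field."""
    # Assumption: vector field type names end in "Vector" (FloatVector, ...).
    allowed = VECTOR_INDEXES if field_type.endswith("Vector") else SCALAR_INDEXES
    return index_type.upper() in allowed


def unindexed(mapped_fields: List[str], index_plan: Dict[str, str]) -> List[str]:
    """Mapped fields with no planned index; queries on these scan Paimon data."""
    return [f for f in mapped_fields if f not in index_plan]
```

For example, unindexed(["id", "embedding", "tags"], {"embedding": "HNSW"}) returns ["id", "tags"], which flags the fields that would be subject to cold reads.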

Performance limitations

Cold read performance: Without indexes, queries must scan the corresponding column data in the Paimon table. Create indexes for all mapped fields to improve query performance.

Sync performance: Sync speed depends on the snapshot size of the Paimon table, network bandwidth, and the response time of the catalog.

Consistency limitations

Tag consistency:

External Collections provide consistency guarantees based on DLF tags. You are responsible for managing the lifecycle of tags.
Query results reflect the tag specified in the last completed sync. While a sync is in progress, queries continue to return data from the previously synced tag.
Changes to the lake table are not visible until the next sync. To query the latest data, trigger a refresh, either manually or through a scheduled sync.

Concurrency limitations:

Only one sync task can run at a time for a single External Collection.
Queries can be performed normally during a sync operation. The queries return data from the last completed sync.
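The consistency and concurrency rules describe a simple state machine: queries always read the last completed tag, at most one sync runs at a time, and a new tag becomes visible only when its sync finishes. The sketch below models that behavior to make it concrete. ExternalCollectionState and its methods are hypothetical, and the assumption that a failed sync leaves the previous tag in service is inferred from the rules above rather than stated in them.

```python
import threading
from typing import Optional


class ExternalCollectionState:
    """Hypothetical model: queries read the last completed tag; one sync at a time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._served_tag: Optional[str] = None   # tag visible to queries
        self._syncing_tag: Optional[str] = None  # tag currently being synced

    def start_sync(self, tag: str) -> None:
        with self._lock:
            if self._syncing_tag is not None:
                raise RuntimeError(f"sync of {self._syncing_tag!r} is already running")
            self._syncing_tag = tag

    def finish_sync(self) -> None:
        with self._lock:
            if self._syncing_tag is None:
                raise RuntimeError("no sync is running")
            # The new tag becomes visible only once the sync completes.
            self._served_tag, self._syncing_tag = self._syncing_tag, None

    def fail_sync(self) -> None:
        with self._lock:
            # Assumed: a failed sync leaves the previously served tag in place.
            self._syncing_tag = None

    def query_tag(self) -> Optional[str]:
        with self._lock:
            return self._served_tag
```

Swapping the served tag in one locked step is what lets queries run during a sync without ever seeing a half-synced state.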

Schema mapping rules

The following tables list the supported mappings between Milvus field types and Paimon field types:

Scalar type mapping

Milvus type	Paimon type	Description
Bool	BOOLEAN	Boolean type
Int8	TINYINT	8-bit integer
Int16	SMALLINT	16-bit integer
Int32	INT	32-bit integer
Int64	BIGINT	64-bit integer
Float	FLOAT	Single-precision floating point
Double	DOUBLE	Double-precision floating point
VarChar	STRING / VARCHAR / CHAR	String type

Vector type mapping

Milvus type	Paimon type	Description
FloatVector	ARRAY / ARRAY	Float vector
Float16Vector	ARRAY / ARRAY	Half-precision float vector
BinaryVector	ARRAY	Boolean vector

Array type mapping

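Milvus type	Paimon type	Description
Array<Bool>	ARRAY<BOOLEAN>	Boolean array
Array<Int8>	ARRAY<TINYINT>	Int8 array
Array<Int16>	ARRAY<SMALLINT>	Int16 array
Array<Int32>	ARRAY<INT>	Int32 array
Array<Int64>	ARRAY<BIGINT>	Int64 array
Array<Float>	ARRAY<FLOAT>	Float array
Array<Double>	ARRAY<DOUBLE>	Double array
Array<VarChar>	ARRAY<STRING>	String array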
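The scalar and array tables can be expressed as a lookup, which is also a way to reproduce Step 1's highlighting of unsupported columns. The sketch below maps Paimon type strings to Milvus type names. map_paimon_type and find_unsupported are hypothetical helpers; they assume type strings such as "VARCHAR(255)" or "ARRAY<INT>", accept STRING, VARCHAR, or CHAR as array element types by analogy with the scalar table, and do not handle vector columns, because whether an ARRAY column becomes a vector field is decided in Step 2.

```python
from typing import Dict, List

# Scalar mappings from the table above (Paimon type -> Milvus type).
SCALAR_MAP = {
    "BOOLEAN": "Bool", "TINYINT": "Int8", "SMALLINT": "Int16", "INT": "Int32",
    "BIGINT": "Int64", "FLOAT": "Float", "DOUBLE": "Double",
    "STRING": "VarChar", "VARCHAR": "VarChar", "CHAR": "VarChar",
}


def _scalar(paimon_type: str) -> str:
    # Drop length arguments such as VARCHAR(255) or CHAR(10).
    base = paimon_type.split("(", 1)[0].strip()
    if base not in SCALAR_MAP:
        raise ValueError(f"unsupported Paimon type: {paimon_type}")
    return SCALAR_MAP[base]


def map_paimon_type(paimon_type: str) -> str:
    """Map a Paimon scalar or ARRAY<scalar> type to a Milvus type name."""
    t = paimon_type.strip().upper()
    if t.startswith("ARRAY<") and t.endswith(">"):
        inner = t[len("ARRAY<"):-1].strip()
        if "<" in inner:
            raise ValueError("nested complex types are not supported")
        return f"Array<{_scalar(inner)}>"
    return _scalar(t)


def find_unsupported(columns: Dict[str, str]) -> List[str]:
    """Column names whose types cannot be mapped (what Step 1 shows in red)."""
    bad = []
    for name, paimon_type in columns.items():
        try:
            map_paimon_type(paimon_type)
        except ValueError:
            bad.append(name)
    return bad
```

For example, find_unsupported({"id": "BIGINT", "ts": "TIMESTAMP", "m": "MAP<STRING,INT>"}) returns ["ts", "m"].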