Vector Retrieval Service for Milvus: Manage External Collections

Last Updated: Jun 03, 2026

An External Collection is an Alibaba Cloud Milvus feature that lets you query Data Lake Formation (DLF) lake tables directly. Because only metadata is synchronized, Milvus can query vector data stored in DLF without copying or storing the raw data. This document explains how to create and manage External Collections in Milvus Manager.

Limitations

  • This feature is supported only in Milvus 2.6 and later, and the minor version of your instance must be 2.6.3-0.4.12_3.9.0 or later. To upgrade your instance, see Upgrade Version.

  • Currently, only Paimon tables are supported.
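If you script pre-flight checks across several instances, the version requirement above can be approximated as shown below. This is a minimal sketch that assumes the numeric parts of the minor version string compare left to right, most significant first. That ordering rule and the function names are assumptions, not a documented Alibaba Cloud API, so confirm the actual version in the console before relying on the result.

```python
import re

# Minimum minor version stated in the Limitations section.
MIN_MINOR_VERSION = "2.6.3-0.4.12_3.9.0"


def version_key(version: str) -> tuple:
    """Split a version string on every non-digit run and return its numbers.

    Assumption: the numeric components compare left to right, most
    significant first. This is not an official versioning rule.
    """
    return tuple(int(part) for part in re.split(r"\D+", version) if part)


def supports_external_collections(version: str) -> bool:
    """Return True if the version meets the documented minimum (sketch only)."""
    return version_key(version) >= version_key(MIN_MINOR_VERSION)


if __name__ == "__main__":
    for v in ("2.6.3-0.4.12_3.9.0", "2.6.3-0.4.11_3.9.0", "2.6.5-0.1.0_3.9.0"):
        print(v, supports_external_collections(v))
```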

Key concepts

Concept               Description
External Collection   Maps the schema of a DLF lake table to a Milvus collection so that you can query it without migrating data.
lake table            A table in a data lake. Currently, only Paimon tables are supported.
snapshot              The state of a lake table's data at a specific point in time.
tag                   A named label applied to a specific snapshot of a lake table. External Collections synchronize data by tag.

Associate a RAM identity

Before using the External Collection feature, you must associate a RAM identity with a Milvus user. Once associated, Milvus uses that RAM account's identity to authorize access to DLF resources.

The following rules apply to RAM account association:

  1. A Milvus administrator can bind or unbind a RAM account for any Milvus user within their management scope.

  2. A non-admin Milvus user can only bind or unbind a RAM account for their own account.

  3. When logged in with an Alibaba Cloud account, you can bind any RAM user under that account to the target Milvus user.

  4. When logged in as a RAM user, you can bind only that RAM user.

  5. Associating an Alibaba Cloud account is not supported.
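The five rules above can be expressed as a single permission check. The sketch below is a hypothetical model: the ConsoleLogin, MilvusCaller, and RamIdentity classes and can_bind() are illustrative names, not part of Milvus or any Alibaba Cloud SDK. It also assumes that the RAM user being bound must belong to the signed-in Alibaba Cloud account, which rules 3 and 4 imply.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class ConsoleLogin:
    """Identity signed in to the Alibaba Cloud console (hypothetical model)."""
    account_id: str
    ram_user: Optional[str] = None  # None: signed in as the Alibaba Cloud account


@dataclass
class MilvusCaller:
    """Milvus user performing the bind or unbind (hypothetical model)."""
    name: str
    is_admin: bool = False
    managed_users: Set[str] = field(default_factory=set)


@dataclass
class RamIdentity:
    """Identity to associate with a Milvus user (hypothetical model)."""
    account_id: str
    ram_user: Optional[str] = None  # None: the Alibaba Cloud account itself


def can_bind(login: ConsoleLogin, caller: MilvusCaller,
             target_user: str, identity: RamIdentity) -> bool:
    # Rule 5: an Alibaba Cloud account itself cannot be associated.
    if identity.ram_user is None:
        return False
    # Rules 1 and 2: which Milvus users the caller may bind for.
    if caller.is_admin:
        if target_user != caller.name and target_user not in caller.managed_users:
            return False
    elif target_user != caller.name:
        return False
    # Assumption implied by rules 3 and 4: the RAM user must belong to the
    # signed-in Alibaba Cloud account.
    if identity.account_id != login.account_id:
        return False
    # Rule 3: an Alibaba Cloud account login may bind any of its RAM users.
    if login.ram_user is None:
        return True
    # Rule 4: a RAM user login may bind only itself.
    return identity.ram_user == login.ram_user
```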

Create an external collection

Follow the three-step wizard to create an External Collection:

Step 1: Link a lake table

Select a DLF instance, database, and target table. The page displays the lake table's schema (field names and types) and highlights unsupported types in red. For detailed mapping rules, see Schema mapping rules below.

Step 2: Configure the external collection

Configure the following information:

  • Name and Description: The name and description of the collection.

  • Primary key configuration: The primary key must be of type Int64 or VarChar. Auto ID is not supported.

  • Vector field: Supports Float, Binary, Float16, and BFloat16 vector types. You must specify the dimension.

  • Scalar field configuration: Configure the scalar field mapping based on the fields of the lake table.

Note

The vector dimension you specify must match the length of the vectors stored in your DLF lake table.
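Before completing Step 2, you can sanity-check your settings against the rules above. The following sketch is hypothetical: validate_step2() is not a Milvus Manager API, and it assumes that each element of the lake table's array column is one vector component, so every sampled vector's length should equal the dimension you enter.

```python
from typing import List, Sequence

SUPPORTED_PK_TYPES = {"Int64", "VarChar"}
SUPPORTED_VECTOR_TYPES = {"FloatVector", "BinaryVector", "Float16Vector", "BFloat16Vector"}


def validate_step2(pk_type: str, auto_id: bool, vector_type: str, dim: int,
                   sample_vectors: Sequence[Sequence[float]]) -> List[str]:
    """Return human-readable problems; an empty list means the settings look valid."""
    problems = []
    if pk_type not in SUPPORTED_PK_TYPES:
        problems.append(f"primary key type {pk_type} is not supported")
    if auto_id:
        problems.append("auto ID is not supported")
    if vector_type not in SUPPORTED_VECTOR_TYPES:
        problems.append(f"vector type {vector_type} is not supported")
    if dim <= 0:
        problems.append("dimension must be a positive integer")
    # Assumption: each array element in the lake table is one vector component.
    for row, vector in enumerate(sample_vectors):
        if len(vector) != dim:
            problems.append(f"row {row}: vector has {len(vector)} elements, expected {dim}")
    return problems
```

Sampling a few rows from the lake table and passing them in catches the most common mistake, a dimension that does not match the stored data.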

Step 3: Confirm data mapping

Review the mapping between the DLF lake table fields and the Milvus collection fields, then click Create to create the corresponding Milvus collection.

Snapshot sync

After creating an External Collection, you must synchronize a snapshot before you can query data. You can perform a quick sync using the post-creation wizard, or configure it later on the Snapshot sync page.

The Snapshot sync page displays the tag information, tag status, synchronization time, and row count for the DLF lake table synchronized with the current External Collection.

Sync operations

  • Auto mode: Fetches the latest snapshot of the DLF lake table (giving priority to snapshots whose commit type is COMPACT), creates a tag for it, and synchronizes that tag to the Milvus External Collection. You can configure a snapshot prefix for filtering.

  • Manual mode: Synchronizes a specific, existing tag by its name.
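The two modes can be sketched as follows. This illustration rests on an assumption: it reads "giving priority to COMPACT" as "use the newest COMPACT snapshot when one exists, otherwise the newest snapshot", which may not match the service's exact rule, and it does not model the snapshot prefix filter or tag creation. Snapshot, pick_auto_snapshot(), and resolve_manual_tag() are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    snapshot_id: int   # Paimon snapshot IDs increase monotonically
    commit_kind: str   # for example "APPEND" or "COMPACT"


def pick_auto_snapshot(snapshots: List[Snapshot]) -> Optional[Snapshot]:
    """Auto mode under one interpretation of the COMPACT priority:
    take the newest COMPACT snapshot if any exists, otherwise the newest snapshot."""
    if not snapshots:
        return None
    compacted = [s for s in snapshots if s.commit_kind == "COMPACT"]
    candidates = compacted or snapshots
    return max(candidates, key=lambda s: s.snapshot_id)


def resolve_manual_tag(tags: Dict[str, int], tag_name: str) -> int:
    """Manual mode: the tag must already exist; return the snapshot ID it points to."""
    if tag_name not in tags:
        raise KeyError(f"tag {tag_name!r} does not exist")
    return tags[tag_name]
```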

Scheduled sync

You can enable scheduled sync and configure a policy for automatic synchronization:

  • Simple cycle mode: Set the sync interval by minute, hour, or day.

  • Cron expression mode: Use a Cron expression to define a custom sync schedule. You can also configure a snapshot prefix.
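For simple cycle mode, the upcoming run times follow directly from the interval and its unit. This minimal sketch assumes the first run happens one full interval after the schedule starts; next_sync_times() is a hypothetical helper, and Cron expression mode is not modeled.

```python
from datetime import datetime, timedelta
from typing import List

UNIT_STEPS = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}


def next_sync_times(start: datetime, every: int, unit: str, count: int) -> List[datetime]:
    """Simple cycle mode: return the next `count` run times after `start`.

    Assumption: the first run happens one full interval after `start`.
    """
    if every <= 0:
        raise ValueError("interval must be positive")
    if unit not in UNIT_STEPS:
        raise ValueError(f"unit must be one of {sorted(UNIT_STEPS)}")
    step = UNIT_STEPS[unit] * every
    return [start + step * i for i in range(1, count + 1)]
```

For example, a 30-minute cycle starting at midnight yields runs at 00:30, 01:00, and 01:30.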

Sync task list

In the sync task list, you can filter tasks by status or tag and page through the results.
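Filtering and paging the task list amounts to the logic below. SyncTask, its status values, and list_sync_tasks() are hypothetical; the sketch only illustrates filter-then-paginate behavior, not the console's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SyncTask:
    task_id: int
    tag: str
    status: str  # hypothetical values such as "SUCCESS", "FAILED", "RUNNING"


def list_sync_tasks(tasks: List[SyncTask], status: Optional[str] = None,
                    tag: Optional[str] = None, page: int = 1,
                    page_size: int = 10) -> Tuple[List[SyncTask], int]:
    """Filter by status and/or tag, then return one page plus the total match count."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be at least 1")
    matched = [
        t for t in tasks
        if (status is None or t.status == status) and (tag is None or t.tag == tag)
    ]
    start = (page - 1) * page_size
    return matched[start:start + page_size], len(matched)
```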

Unsupported features

Compared to regular collections, External Collections have the following limitations:

Schema limitations

Unsupported schema features:

  • The auto_id=True parameter is not supported. External Collections must use an external primary key.

  • TIMESTAMPTZ and GEOMETRY types are not supported.

  • SparseFloatVector type is not supported.

  • Nested complex types, such as Array<Array<T>>, are not supported.

  • Dynamic fields are not supported.

  • Partition keys are not supported.

Field naming limitations:

  • Field names are case-sensitive.

  • Field names must exactly match the corresponding Paimon table field names.

  • Field name mapping is not supported. For example, you cannot map a Milvus field user_id to a Paimon field userId.

Other limitations:

  • Schema evolution is not supported.

  • The is_function_output=True parameter for function output fields is not supported.
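A proposed schema can be checked against these limitations before you create the collection. This sketch is hypothetical: FieldSpec and check_external_schema() are illustrative names, and type names are plain strings rather than pymilvus DataType values. It covers the unsupported schema features, the exact case-sensitive name match, and function output fields.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

UNSUPPORTED_TYPES = {"TIMESTAMPTZ", "GEOMETRY", "SparseFloatVector"}


@dataclass
class FieldSpec:
    name: str
    dtype: str                          # Milvus type name, e.g. "Int64" or "Array"
    element_type: Optional[str] = None  # element type when dtype == "Array"
    is_function_output: bool = False


def check_external_schema(fields: Iterable[FieldSpec], paimon_columns: Set[str],
                          auto_id: bool = False, enable_dynamic_field: bool = False,
                          partition_key: Optional[str] = None) -> List[str]:
    """Return every schema limitation the proposed fields violate."""
    problems = []
    if auto_id:
        problems.append("auto_id=True is not supported")
    if enable_dynamic_field:
        problems.append("dynamic fields are not supported")
    if partition_key is not None:
        problems.append("partition keys are not supported")
    for f in fields:
        if f.dtype in UNSUPPORTED_TYPES:
            problems.append(f"{f.name}: type {f.dtype} is not supported")
        if f.dtype == "Array" and f.element_type == "Array":
            problems.append(f"{f.name}: nested arrays are not supported")
        if f.is_function_output:
            problems.append(f"{f.name}: function output fields are not supported")
        # Set membership on str is case-sensitive, matching the naming rule.
        if f.name not in paimon_columns:
            problems.append(f"{f.name}: no Paimon column with exactly this name")
    return problems
```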

Data operation limitations

Unsupported operations:

  • insert(), upsert(), delete(), and flush() operations are not supported.

  • Data cannot be modified directly; it must be changed in the source Paimon table.

Supported operations:

  • search() for vector search.

  • query() for scalar queries.

  • create_index() to create an index.

  • load() and release() to load or release a collection.

  • bulk_import() to trigger a refresh (incremental sync).
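If you wrap a Milvus client in your own application code, a small allowlist can reject writes before they reach the server. The sketch below is hypothetical (is_operation_allowed() is not part of pymilvus); it simply encodes the supported and unsupported operations listed above.

```python
SUPPORTED_OPERATIONS = frozenset(
    {"search", "query", "create_index", "load", "release", "bulk_import"}
)
UNSUPPORTED_OPERATIONS = frozenset({"insert", "upsert", "delete", "flush"})


def is_operation_allowed(operation: str) -> bool:
    """Return True for supported operations, False for writes the collection rejects."""
    if operation in SUPPORTED_OPERATIONS:
        return True
    if operation in UNSUPPORTED_OPERATIONS:
        return False
    raise ValueError(f"operation {operation!r} is not covered by this document")
```

Routing any write for which this returns False to the source Paimon table keeps the application consistent with the rule that data changes must be made there.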

Index limitations

Supported index types:

  • Vector indexes: HNSW, HNSW_SQ, HNSW_PQ, IVF_FLAT, IVF_SQ8, IVF_PQ, IVF_RABITQ, and SCANN.

  • Scalar indexes: INVERTED, BITMAP, and STL_SORT.

Index behavior:

  • Indexes are built by Milvus and stored in Milvus object storage.

  • Indexing does not affect the data in the Paimon table.

  • Deleting the collection also deletes its indexes.
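The supported index types can be checked with a simple allowlist. This sketch (is_index_type_allowed() is a hypothetical name) encodes only the lists above; Milvus's own per-data-type index compatibility rules still apply on top of it and are not modeled.

```python
VECTOR_FIELD_TYPES = {"FloatVector", "BinaryVector", "Float16Vector", "BFloat16Vector"}
VECTOR_INDEX_TYPES = {
    "HNSW", "HNSW_SQ", "HNSW_PQ", "IVF_FLAT", "IVF_SQ8", "IVF_PQ", "IVF_RABITQ", "SCANN",
}
SCALAR_INDEX_TYPES = {"INVERTED", "BITMAP", "STL_SORT"}


def is_index_type_allowed(field_type: str, index_type: str) -> bool:
    """Check an index type against the External Collection allowlist only."""
    if field_type in VECTOR_FIELD_TYPES:
        return index_type.upper() in VECTOR_INDEX_TYPES
    return index_type.upper() in SCALAR_INDEX_TYPES
```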

Performance limitations

Cold-read performance: Without an index, queries must scan the corresponding columns in the Paimon table. To improve query performance, create an index for each mapped field.

Synchronization performance: Synchronization speed depends on the size of the Paimon table snapshot, the network bandwidth, and the response time of the catalog.

Consistency limitations

Tag consistency:

  • External Collections provide consistency guarantees based on DLF tags. You are responsible for managing the lifecycle of these tags.

  • Query results reflect the data state associated with the tag from the last successful synchronization. While a new synchronization is in progress, you can continue to query the data associated with the previous tag.

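  • Data written to the lake table after the last successful synchronization is not visible until a new synchronization completes. To query the latest data, you must trigger a new synchronization, either manually or through scheduled sync.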

Concurrency limitations:

  • Only one synchronization task can run at a time for the same External Collection.

  • You can perform queries during synchronization. Queries run against the data from the previous synchronization.
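Taken together, the tag consistency and concurrency rules behave like a per-collection single-flight lock plus a read pointer that switches only when a sync succeeds. The sketch below models that behavior in-process; SyncCoordinator is hypothetical and says nothing about how Milvus implements synchronization internally. Modeling a failed sync as new_tag=None, which leaves the previous tag queryable, is an assumption consistent with queries reflecting the last successful synchronization.

```python
import threading
from typing import Dict, Optional, Set


class SyncCoordinator:
    """In-process model of the sync rules (hypothetical, not Milvus internals)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._running: Set[str] = set()       # collections with a sync in progress
        self._committed: Dict[str, str] = {}  # collection -> last successfully synced tag

    def start_sync(self, collection: str) -> bool:
        """Return False if a sync is already running for this collection."""
        with self._lock:
            if collection in self._running:
                return False
            self._running.add(collection)
            return True

    def finish_sync(self, collection: str, new_tag: Optional[str]) -> None:
        """Mark the sync done; pass new_tag=None to model a failed sync."""
        with self._lock:
            self._running.discard(collection)
            if new_tag is not None:
                self._committed[collection] = new_tag

    def query_tag(self, collection: str) -> Optional[str]:
        """Tag that queries read from; unaffected by any sync still in progress."""
        with self._lock:
            return self._committed.get(collection)
```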

Schema mapping rules

External Collections support mapping the following Milvus field types to Paimon table fields:

Scalar type mapping

Milvus type   Paimon type                Description
Bool          BOOLEAN                    Boolean type
Int8          TINYINT                    8-bit integer
Int16         SMALLINT                   16-bit integer
Int32         INT                        32-bit integer
Int64         BIGINT                     64-bit integer
Float         FLOAT                      Single-precision floating-point
Double        DOUBLE                     Double-precision floating-point
VarChar       STRING / VARCHAR / CHAR    String type

Vector type mapping

Milvus type     Paimon type                    Description
FloatVector     ARRAY<FLOAT> / ARRAY<DOUBLE>   Floating-point vector
Float16Vector   ARRAY<FLOAT> / ARRAY<DOUBLE>   Half-precision floating-point vector
BinaryVector    ARRAY<BOOL>                    Boolean vector

Array type mapping

Milvus type      Paimon type       Description
Array<Bool>      ARRAY<BOOLEAN>    Boolean array
Array<Int8>      ARRAY<TINYINT>    Int8 array
Array<Int16>     ARRAY<SMALLINT>   Int16 array
Array<Int32>     ARRAY<INT>        Int32 array
Array<Int64>     ARRAY<BIGINT>     Int64 array
Array<Float>     ARRAY<FLOAT>      Float array
Array<Double>    ARRAY<DOUBLE>     Double array
Array<VarChar>   ARRAY<STRING>     String array
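The mapping tables can be encoded as a lookup that, like Step 1 of the wizard, reports which Paimon types have no Milvus counterpart. This sketch is hypothetical: TYPE_MAP, normalize(), and candidate_milvus_types() are illustrative names, and the normalization rules (ignoring NOT NULL and length parameters such as VARCHAR(64), and treating BOOL as an alias of BOOLEAN) are assumptions. Because several Milvus types accept ARRAY<FLOAT>, the function returns every candidate and leaves the choice to you, as Step 2 does.

```python
import re
from typing import Dict, List, Set

# Milvus type -> accepted Paimon types, transcribed from the mapping tables.
TYPE_MAP: Dict[str, Set[str]] = {
    "Bool": {"BOOLEAN"},
    "Int8": {"TINYINT"},
    "Int16": {"SMALLINT"},
    "Int32": {"INT"},
    "Int64": {"BIGINT"},
    "Float": {"FLOAT"},
    "Double": {"DOUBLE"},
    "VarChar": {"STRING", "VARCHAR", "CHAR"},
    "FloatVector": {"ARRAY<FLOAT>", "ARRAY<DOUBLE>"},
    "Float16Vector": {"ARRAY<FLOAT>", "ARRAY<DOUBLE>"},
    "BinaryVector": {"ARRAY<BOOLEAN>"},  # listed as ARRAY<BOOL>; normalize() maps BOOL
    "Array<Bool>": {"ARRAY<BOOLEAN>"},
    "Array<Int8>": {"ARRAY<TINYINT>"},
    "Array<Int16>": {"ARRAY<SMALLINT>"},
    "Array<Int32>": {"ARRAY<INT>"},
    "Array<Int64>": {"ARRAY<BIGINT>"},
    "Array<Float>": {"ARRAY<FLOAT>"},
    "Array<Double>": {"ARRAY<DOUBLE>"},
    "Array<VarChar>": {"ARRAY<STRING>"},
}


def normalize(paimon_type: str) -> str:
    """Canonicalize a Paimon type string (assumptions noted inline)."""
    t = paimon_type.upper()
    t = re.sub(r"\s*NOT\s+NULL", "", t)   # assumption: nullability does not affect mapping
    t = re.sub(r"\s+", "", t)
    t = re.sub(r"\(\d+\)", "", t)         # VARCHAR(64) -> VARCHAR
    t = re.sub(r"\bBOOL\b", "BOOLEAN", t)  # assumption: BOOL is an alias of BOOLEAN
    return t


def candidate_milvus_types(paimon_type: str) -> List[str]:
    """Milvus types that can map to this Paimon type; [] means unsupported."""
    t = normalize(paimon_type)
    return sorted(m for m, sources in TYPE_MAP.items() if t in sources)
```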