Schema-free mechanism and the benefits of predefining fields

DashVector is schema-free by design. When you insert, update, or upsert a document, pass any key-value pairs in the fields parameter. No upfront schema definition is required.

Python

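import numpy as np
from dashvector import Doc

# `collection` is an existing DashVector collection object
# (for example, one returned by client.get).
collection.insert(
    Doc(
        id='1',
        vector=np.random.rand(4),
        fields={
            'name': 'zhangsan',
            'weight': 70.0,
            'age': 30,
            'anykey1': 'anyvalue',
            'anykey2': 1,
            'anykey3': True,
            'anykey4': 3.1415926,
            # ...any other key-value pairs
        }
    )
)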

Note

Each additional field consumes memory and disk resources. Only include fields that serve your filtering or retrieval needs.

Supported data types

Fields accept four Python data types:

Type	Description	Constraints
`str`	String values	--
`float`	Floating-point numbers	--
`int`	Integer values	32-bit signed only: -2,147,483,648 to 2,147,483,647
`bool`	Boolean values	`True` or `False`

Important

Python's int type supports arbitrary precision, but DashVector accepts only 32-bit signed integers (-2,147,483,648 to 2,147,483,647). Values outside this range cause overflow errors.
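Because Python itself will hold integers of any size, an out-of-range value is only caught when it reaches DashVector. The following is a minimal client-side sketch that checks field values against the four documented types and the 32-bit range before inserting. `check_fields` is a hypothetical helper, not part of the DashVector SDK. It also handles a Python quirk: `bool` is a subclass of `int`, so booleans are tested first. Note that NumPy integer scalars (such as `np.int64`) are not Python `int` and are rejected by this sketch.

```python
from typing import Any, Dict

INT32_MIN = -2**31      # -2,147,483,648
INT32_MAX = 2**31 - 1   #  2,147,483,647


def check_fields(fields: Dict[str, Any]) -> None:
    """Hypothetical helper: raise if a value is not str, float, bool, or a
    32-bit signed int, the types listed in the table above."""
    for key, value in fields.items():
        # bool must be checked before int, because bool is a subclass of int.
        if isinstance(value, bool):
            continue
        if isinstance(value, int):
            if not INT32_MIN <= value <= INT32_MAX:
                raise ValueError(
                    f"field {key!r}: {value} is outside the 32-bit signed range"
                )
            continue
        if isinstance(value, (str, float)):
            continue
        raise TypeError(f"field {key!r}: unsupported type {type(value).__name__}")


check_fields({'name': 'zhangsan', 'weight': 70.0, 'age': 30, 'anykey3': True})  # OK

try:
    check_fields({'anykey2': 3_000_000_000})
except ValueError as err:
    print(err)  # field 'anykey2': 3000000000 is outside the 32-bit signed range
```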

Filter by fields

Use field key-value pairs in filter expressions to narrow down search results:

Python

ret = collection.query(
    vector=[0.1, 0.2, 0.3, 0.4],
    filter='(age > 18 and anykey2 = 1) or (name like "zhang%" and anykey3 = false)'
)
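When filter values come from application data, it can help to build the expression string programmatically instead of concatenating it by hand. Below is a minimal sketch that ANDs together equality clauses. The literal syntax is an assumption modeled on the example above (double-quoted strings, lowercase `true`/`false`); `literal` and `equals_filter` are hypothetical helpers, and because quote-escaping rules are not covered here, strings containing `"` are rejected rather than guessed at.

```python
from typing import Any, Dict


def literal(value: Any) -> str:
    """Render a Python value as a filter literal (assumed syntax)."""
    if isinstance(value, bool):  # check bool before int/float
        return 'true' if value else 'false'
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        if '"' in value:
            # Escaping rules are not covered here, so refuse rather than guess.
            raise ValueError(f'cannot safely quote {value!r}')
        return f'"{value}"'
    raise TypeError(f'unsupported type: {type(value).__name__}')


def equals_filter(conditions: Dict[str, Any]) -> str:
    """AND together `key = value` clauses for the given fields."""
    return ' and '.join(f'{k} = {literal(v)}' for k, v in conditions.items())


print(equals_filter({'anykey2': 1, 'anykey3': False, 'name': 'zhangsan'}))
# anykey2 = 1 and anykey3 = false and name = "zhangsan"
```

The resulting string can then be passed as the `filter` argument of `collection.query`, in the same way as the hand-written expression above.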

Note

More fields and more complex filter expressions increase CPU usage and query latency.

When to predefine a field schema

Although DashVector is schema-free by default, predefining a field schema when creating a collection improves query performance, reduces storage overhead, and enables input validation.

Python

ret = client.create(
    name='complex',
    dimension=4,
    fields_schema={'name': str, 'weight': float, 'age': int}
)
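With a schema in place, each document's fields fall into two groups: those covered by `fields_schema` and ad-hoc extras. The sketch below, a hypothetical client-side helper rather than an SDK feature, splits a document's fields accordingly and flags predefined keys whose value type differs from the declared one. It uses strict type matching (`type(value) is expected`); whether DashVector coerces, for example, an `int` into a `float` field is not covered here.

```python
from typing import Any, Dict, List, Tuple

# Same mapping as the fields_schema passed to client.create above.
FIELDS_SCHEMA = {'name': str, 'weight': float, 'age': int}


def split_fields(
    fields: Dict[str, Any], schema: Dict[str, type]
) -> Tuple[Dict[str, Any], Dict[str, Any], List[str]]:
    """Split a document's fields into (predefined, ad_hoc, mismatches).

    mismatches lists predefined keys whose value type differs from the schema.
    Matching is strict, so a bool never passes as an int.
    """
    predefined, ad_hoc, mismatches = {}, {}, []
    for key, value in fields.items():
        if key in schema:
            predefined[key] = value
            if type(value) is not schema[key]:
                mismatches.append(key)
        else:
            ad_hoc[key] = value
    return predefined, ad_hoc, mismatches


doc_fields = {'name': 'zhangsan', 'weight': 70.0, 'age': '30', 'anykey1': 'anyvalue'}
predefined, ad_hoc, bad = split_fields(doc_fields, FIELDS_SCHEMA)
print(predefined)  # {'name': 'zhangsan', 'weight': 70.0, 'age': '30'}
print(ad_hoc)      # {'anykey1': 'anyvalue'}
print(bad)         # ['age']
```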

Benefits of predefined fields

Benefit	Description
Faster filtering	Conditional filtering on predefined fields uses less CPU and returns results faster than filtering on ad-hoc fields.
Lower storage overhead	Predefined fields store values only. Ad-hoc fields store both keys and values, consuming more memory and disk space.
Filter pre-validation	DashVector validates filter syntax against predefined field types and returns an error for type mismatches. Without a schema, type validation is not available.

Recommended approach

Predefine fields that appear in most documents and that you filter on frequently. Use ad-hoc fields at insert time for attributes specific to a subset of documents.
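The recommendation above can be applied mechanically to a sample of your data. The following sketch suggests a `fields_schema` from keys that you filter on, that appear in at least a given share of documents, and that always hold a single supported type. Which keys are "frequently filtered" cannot be inferred from documents alone, so it is passed in explicitly; `suggest_schema`, `filtered_keys`, and the 50% default threshold are all hypothetical choices for illustration.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

SUPPORTED = (bool, str, float, int)


def suggest_schema(
    docs: List[Dict[str, Any]],
    filtered_keys: Iterable[str],
    min_share: float = 0.5,
) -> Dict[str, type]:
    """Suggest a fields_schema: keys that are filtered on, present in at least
    `min_share` of documents, and always hold one supported type."""
    filtered = set(filtered_keys)
    counts: Counter = Counter()
    types: Dict[str, set] = {}
    for fields in docs:
        for key, value in fields.items():
            counts[key] += 1
            types.setdefault(key, set()).add(type(value))
    schema = {}
    for key, count in counts.items():
        seen = types[key]
        if (
            key in filtered
            and count / len(docs) >= min_share
            and len(seen) == 1
            and next(iter(seen)) in SUPPORTED
        ):
            schema[key] = next(iter(seen))
    return schema


docs = [
    {'name': 'zhangsan', 'age': 30, 'anykey1': 'a'},
    {'name': 'lisi', 'age': 25},
    {'name': 'wangwu', 'age': 41, 'anykey3': True},
]
print(suggest_schema(docs, filtered_keys={'name', 'age', 'anykey3'}))
# {'name': <class 'str'>, 'age': <class 'int'>}
```

Here `anykey3` is filtered on but appears in only one of three documents, so it stays an ad-hoc field, while `anykey1` is never filtered on and is left out regardless of frequency.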