DashVector is schema-free by design. When you insert, update, or upsert a document, pass any key-value pairs in the fields parameter. No upfront schema definition is required.
    import numpy as np
    from dashvector import Doc

    # Assumes `collection` is an existing DashVector collection with dimension 4.
    collection.insert(
        Doc(
            id='1',
            vector=np.random.rand(4),
            fields={
                'name': 'zhangsan',
                'weight': 70.0,
                'age': 30,
                'anykey1': 'anyvalue',
                'anykey2': 1,
                'anykey3': True,
                'anykey4': 3.1415926,
                # ... ...
            }
        )
    )

Each additional field consumes memory and disk resources. Only include fields that serve your filtering or retrieval needs.
Supported data types
Fields accept four Python data types:
| Type | Description | Constraints |
|---|---|---|
| str | String values | -- |
| float | Floating-point numbers | -- |
| int | Integer values | 32-bit signed only: -2,147,483,648 to 2,147,483,647 |
| bool | Boolean values | True or False |
Python's int type supports arbitrary precision, but DashVector accepts only 32-bit signed integers (-2,147,483,648 to 2,147,483,647). Values outside this range cause overflow errors.
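Because Python will happily build an integer that DashVector cannot store, it can help to check a fields dictionary before sending it. The following is a minimal sketch in plain Python; `check_fields` is a hypothetical helper, not part of the DashVector SDK, and it only encodes the type list and integer range described above.

```python
INT32_MIN = -2**31       # -2,147,483,648
INT32_MAX = 2**31 - 1    #  2,147,483,647
SUPPORTED_TYPES = (str, float, int, bool)


def check_fields(fields):
    """Return a list of problems found in a fields dict (empty if none)."""
    problems = []
    for key, value in fields.items():
        if not isinstance(value, SUPPORTED_TYPES):
            problems.append(f"{key}: unsupported type {type(value).__name__}")
        # bool is a subclass of int in Python, so skip it in the range check.
        elif isinstance(value, int) and not isinstance(value, bool):
            if not INT32_MIN <= value <= INT32_MAX:
                problems.append(
                    f"{key}: integer {value} is outside the 32-bit signed range"
                )
    return problems


ok = check_fields({'name': 'zhangsan', 'weight': 70.0, 'age': 30, 'anykey3': True})
too_big = check_fields({'anykey2': 2**31})
```

Here `ok` is an empty list, while `too_big` contains one message because 2,147,483,648 is one past the upper bound. Running a check like this on the client side surfaces the problem before the insert call rather than as an overflow error afterwards.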
Filter by fields
Use field key-value pairs in filter expressions to narrow down search results:
    ret = collection.query(
        vector=[0.1, 0.2, 0.3, 0.4],
        filter='(age > 18 and anykey2 = 1) or (name like "zhang%" and anykey3 = false)'
    )

More fields and more complex filter expressions increase CPU usage and query latency.
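Since a filter is an ordinary string, one way to keep longer expressions readable is to assemble them from small clauses. The sketch below uses a hypothetical helper named `group`, which is not a DashVector function, and rebuilds the exact filter string from the query above.

```python
def group(clauses, op):
    """Join clauses with 'and' or 'or' and wrap the result in parentheses."""
    if op not in ("and", "or"):
        raise ValueError("op must be 'and' or 'or'")
    return "(" + f" {op} ".join(clauses) + ")"


# Each group mirrors one parenthesized part of the original filter.
first = group(["age > 18", "anykey2 = 1"], "and")
second = group(['name like "zhang%"', "anykey3 = false"], "and")

filter_expr = f"{first} or {second}"
```

The resulting `filter_expr` can be passed as the `filter` argument of `collection.query`. Building the string this way also makes it easy to see how many clauses a query carries, which matters because each extra clause adds to CPU usage and latency.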
When to predefine a field schema
Although DashVector is schema-free by default, predefining a field schema when creating a collection improves query performance, reduces storage overhead, and enables input validation.
    ret = client.create(
        name='complex',
        dimension=4,
        fields_schema={'name': str, 'weight': float, 'age': int}
    )

Benefits of predefined fields
| Benefit | Description |
|---|---|
| Faster filtering | Conditional filtering on predefined fields uses less CPU and returns results faster than filtering on ad-hoc fields. |
| Lower storage overhead | Predefined fields store values only. Ad-hoc fields store both keys and values, consuming more memory and disk space. |
| Filter pre-validation | DashVector validates filter syntax against predefined field types and returns an error for type mismatches. Without a schema, type validation is not available. |
Recommended approach
Predefine fields that appear in most documents and that you filter on frequently. Use ad-hoc fields at insert time for attributes specific to a subset of documents.
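One way to apply this recommendation is to look at a sample of your documents and see which keys appear in most of them. The following is a rough sketch in plain Python, not a DashVector feature: `suggest_fields_schema` and the 0.8 threshold are assumptions chosen for illustration. It measures only how often a key is present, so how often you actually filter on each field still has to be judged separately.

```python
from collections import Counter


def suggest_fields_schema(sample_fields, min_share=0.8):
    """Suggest a fields_schema from a sample of per-document fields dicts.

    A key is suggested when it appears in at least `min_share` of the
    sample and always carries the same Python type.
    """
    key_counts = Counter()
    key_types = {}
    for fields in sample_fields:
        for key, value in fields.items():
            key_counts[key] += 1
            key_types.setdefault(key, set()).add(type(value))

    total = len(sample_fields)
    schema = {}
    for key, count in key_counts.items():
        if total and count / total >= min_share and len(key_types[key]) == 1:
            schema[key] = next(iter(key_types[key]))
    return schema


sample = [
    {'name': 'zhangsan', 'weight': 70.0, 'age': 30, 'anykey1': 'anyvalue'},
    {'name': 'lisi', 'weight': 65.5, 'age': 25},
    {'name': 'wangwu', 'weight': 80.0, 'age': 41, 'anykey3': True},
]
schema = suggest_fields_schema(sample)
```

For this sample, `schema` contains `name`, `weight`, and `age`, which matches the `fields_schema` used in the `client.create` call earlier. The keys `anykey1` and `anykey3` each appear in only one of three documents, so they stay ad-hoc fields supplied at insert time.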