DashVector uses the following data types to represent documents, collections, and their states. This page gives the Python and Java definitions of each record type, followed by the Status enum and the type aliases used by the Python SDK.
Doc
A Doc is a single record in a DashVector collection. It pairs a vector with an ID and optional metadata:
{
  "id": "doc-001",
  "vector": [0.1, 0.2, 0.3, 0.4],
  "sparse_vector": {"10": 0.5, "25": 0.8},
  "fields": {"category": "article", "year": 2024},
  "score": 0.95
}
Python
@dataclass(frozen=True)
class Doc(object):
    id: str                                             # The primary key.
    vector: Union[List[int], List[float], np.ndarray]   # The vector.
    sparse_vector: Optional[Dict[int, float]] = None    # The sparse vector.
    fields: Optional[FieldDataDict] = None              # The custom fields in the document.
    score: float = 0.0                                  # The similarity between vectors.
Java
@Data
@Builder
public class Doc {
    // The primary key.
    @NonNull private String id;
    // The vector.
    @NonNull private Vector vector;
    // The sparse vector.
    private TreeMap<Integer, Float> sparseVector;
    // The custom fields in the document.
    @Builder.Default private Map<String, Object> fields = new HashMap<>();
    // The similarity between vectors.
    private float score;

    public void addField(String key, String value) {
        this.fields.put(key, value);
    }

    public void addField(String key, Integer value) {
        this.fields.put(key, value);
    }

    public void addField(String key, Float value) {
        this.fields.put(key, value);
    }

    public void addField(String key, Boolean value) {
        this.fields.put(key, value);
    }
}
Field reference
| Field | Type (Python) | Type (Java) | Description |
|---|---|---|---|
| id | str | String | The primary key that uniquely identifies a document. |
| vector | Union[List[int], List[float], np.ndarray] | Vector | The dense vector. |
| sparse_vector | Optional[Dict[int, float]] | TreeMap<Integer, Float> | The sparse vector. |
| fields | Optional[FieldDataDict] | Map<String, Object> | Custom key-value pairs for metadata filtering. |
| score | float (default: 0.0) | float | The similarity between vectors. |
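As an illustration of how the sample record maps onto the Python definition, the sketch below parses the JSON into a local stand-in for Doc. The stand-in dataclass and the doc_from_json helper are hypothetical and written for this example only; they are not part of the SDK, and np.ndarray is left out of the vector type to keep the example free of dependencies. One detail worth noting is that JSON object keys are always strings, while sparse_vector is typed Dict[int, float], so the keys are converted.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


# Local stand-in mirroring the Doc definition above (not imported from the SDK).
@dataclass(frozen=True)
class Doc:
    id: str
    vector: Union[List[int], List[float]]
    sparse_vector: Optional[Dict[int, float]] = None
    fields: Optional[Dict[str, Union[str, int, float, bool]]] = None
    score: float = 0.0


def doc_from_json(text: str) -> Doc:
    """Hypothetical helper: build a Doc from a JSON record like the sample above."""
    raw = json.loads(text)
    sparse = raw.get("sparse_vector")
    return Doc(
        id=raw["id"],
        vector=raw["vector"],
        # JSON object keys are strings, so convert them to integer indices.
        sparse_vector={int(k): float(v) for k, v in sparse.items()} if sparse else None,
        fields=raw.get("fields"),
        score=float(raw.get("score", 0.0)),
    )


record = """{
  "id": "doc-001",
  "vector": [0.1, 0.2, 0.3, 0.4],
  "sparse_vector": {"10": 0.5, "25": 0.8},
  "fields": {"category": "article", "year": 2024},
  "score": 0.95
}"""

doc = doc_from_json(record)
print(doc.id)             # doc-001
print(doc.sparse_vector)  # {10: 0.5, 25: 0.8}
```

Because the dataclass is declared with frozen=True, assigning to an attribute of doc after construction raises dataclasses.FrozenInstanceError.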
Supported field data types
| Type | Python | Java |
|---|---|---|
| String | str | String |
| Integer | int | Integer |
| Float | float | Float |
| Boolean | bool | Boolean |
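A client-side check against the four types in the table above might look like the following minimal sketch. The unsupported_fields helper is hypothetical and not an SDK function; it only inspects Python types and makes no claim about how the service itself validates fields.

```python
from typing import Any, Dict

# The four value types listed in the table above.
SUPPORTED_FIELD_TYPES = (str, int, float, bool)


def unsupported_fields(fields: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: map each offending field name to the name of its type."""
    return {
        name: type(value).__name__
        for name, value in fields.items()
        if not isinstance(value, SUPPORTED_FIELD_TYPES)
    }


good = {"category": "article", "year": 2024, "rating": 4.5, "published": True}
bad = {"tags": ["a", "b"], "author": None, "year": 2024}

print(unsupported_fields(good))  # {}
print(unsupported_fields(bad))   # {'tags': 'list', 'author': 'NoneType'}
```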
CollectionMeta
CollectionMeta describes a collection's configuration: its name, vector dimensions, distance metric, field schema, and partition layout.
Python
@dataclass(frozen=True)
class CollectionMeta(object):
    name: str                       # The name of the collection.
    dimension: int                  # The number of vector dimensions.
    dtype: str                      # The data type of the vector. Valid values: float and int.
    metric: str                     # The distance metric. Valid values: euclidean, dotproduct, and cosine.
    status: Status                  # The status of the collection.
    fields: Dict[str, str]          # The fields in the collection. Supported data types of fields: float, bool, int, and str.
    partitions: Dict[str, Status]   # The information about the partitions in the collection.
Java
@Getter
public class CollectionMeta {
    // The name of the collection.
    private final String name;
    // The number of vector dimensions.
    private final int dimension;
    // The data type of the vector. Valid values: float and int.
    private final CollectionInfo.DataType dataType;
    // The distance metric. Valid values: euclidean, dotproduct, and cosine.
    private final CollectionInfo.Metric metric;
    // The status of the collection.
    private final String status;
    // The fields in the collection. Supported data types of fields: float, bool, int, and str.
    private final Map<String, FieldType> fieldsSchema;
    // The information about the partitions in the collection.
    private final Map<String, Status> partitionStatus;

    public CollectionMeta(CollectionInfo collectionInfo) {
        this.name = collectionInfo.getName();
        this.dimension = collectionInfo.getDimension();
        this.dataType = collectionInfo.getDtype();
        this.metric = collectionInfo.getMetric();
        this.status = collectionInfo.getStatus().name();
        this.fieldsSchema = collectionInfo.getFieldsSchemaMap();
        this.partitionStatus = collectionInfo.getPartitionsMap();
    }
}
Field reference
| Field | Type (Python) | Type (Java) | Description |
|---|---|---|---|
| name | str | String | The name of the collection. |
| dimension | int | int | The number of vector dimensions. |
| dtype | str | CollectionInfo.DataType | The data type of vectors in the collection. Valid values: float, int. |
| metric | str | CollectionInfo.Metric | The distance metric for similarity search. Valid values: euclidean, dotproduct, cosine. |
| status | Status | String | The current status of the collection. See Status. |
| fields | Dict[str, str] | Map<String, FieldType> | The schema of custom fields defined for the collection. Supported data types: float, bool, int, str. |
| partitions | Dict[str, Status] | Map<String, Status> | A mapping of partition names to their current status. |
Distance metrics
| Metric | API value |
|---|---|
| Euclidean distance | euclidean |
| Dot product | dotproduct |
| Cosine similarity | cosine |
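For reference, the three metrics can be sketched in plain Python using their standard textbook definitions. This is an illustration only: it says nothing about how DashVector computes them internally or how a metric value is turned into the score field of a Doc. The METRICS dictionary keyed by API value is a convenience of this example.

```python
import math
from typing import Sequence


def _check(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller means closer."""
    _check(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; larger means more aligned."""
    _check(a, b)
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of the two vectors after normalizing each to unit length."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot_product(a, b) / (norm_a * norm_b)


# Keyed by the API values from the table above.
METRICS = {"euclidean": euclidean, "dotproduct": dot_product, "cosine": cosine}

a = [3.0, 4.0]
b = [4.0, 3.0]
for name, fn in METRICS.items():
    print(name, round(fn(a, b), 4))
```

Note that Euclidean distance shrinks as vectors get closer, while dot product and cosine grow, so the direction of "better" depends on the metric chosen for the collection.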
Vector data types
| Data type | API value |
|---|---|
| Float | float |
| Integer | int |
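The dimension and dtype settings together constrain which vectors fit a collection. The sketch below shows one way to check a vector on the client side before sending it. The vector_problems helper is hypothetical, and its rule that integer elements are acceptable in a float collection is an assumption of this example, not documented service behavior.

```python
from typing import List, Sequence, Union

Number = Union[int, float]


def vector_problems(vector: Sequence[Number], dimension: int, dtype: str) -> List[str]:
    """Hypothetical helper: list the ways a vector disagrees with a collection's settings."""
    problems = []
    if dtype not in ("float", "int"):
        problems.append(f"unknown dtype {dtype!r}")
    if len(vector) != dimension:
        problems.append(f"expected {dimension} dimensions, got {len(vector)}")
    # Assumption of this sketch: ints are allowed in a float collection.
    allowed = (int,) if dtype == "int" else (int, float)
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if any(isinstance(x, bool) or not isinstance(x, allowed) for x in vector):
        problems.append(f"elements do not match dtype {dtype!r}")
    return problems


print(vector_problems([0.1, 0.2, 0.3, 0.4], 4, "float"))  # []
print(vector_problems([1, 2, 3], 4, "int"))
print(vector_problems([1.5, 2, 3, 4], 4, "int"))
```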
CollectionStats
CollectionStats reports the document count and index build progress of a collection.
Python
@dataclass(frozen=True)
class CollectionStats(object):
    total_doc_count: int                    # The total number of documents inserted into the collection.
    index_completeness: float               # The completeness of data insertion into the collection.
    partitions: Dict[str, PartitionStats]   # The information about the partitions in the collection.
Java
@Getter
public class CollectionStats {
    // The total number of documents inserted into the collection.
    private final long totalDocCount;
    // The completeness of data insertion into the collection.
    private final float indexCompleteness;
    // The information about the partitions in the collection.
    private final Map<String, PartitionStats> partitions;

    public CollectionStats(StatsCollectionResponse.CollectionStats collectionStats) {
        this.totalDocCount = collectionStats.getTotalDocCount();
        this.indexCompleteness = collectionStats.getIndexCompleteness();
        this.partitions = new HashMap<>();
        collectionStats
            .getPartitionsMap()
            .forEach((key, value) -> this.partitions.put(key, new PartitionStats(value)));
    }
}
Field reference
| Field | Type (Python) | Type (Java) | Description |
|---|---|---|---|
| total_doc_count | int | long | The total number of documents inserted into the collection. |
| index_completeness | float | float | The completeness of data insertion into the collection. |
| partitions | Dict[str, PartitionStats] | Map<String, PartitionStats> | A mapping of partition names to their statistics. See PartitionStats. |
PartitionStats
PartitionStats reports the document count for a single partition within a collection.
Python
@dataclass(frozen=True)
class PartitionStats(object):
    total_doc_count: int    # The total number of documents in the partition.
Java
@Getter
public class PartitionStats {
    // The total number of documents in the partition.
    private final long totalDocCount;

    public PartitionStats(com.aliyun.dashvector.proto.PartitionStats partitionStats) {
        this.totalDocCount = partitionStats.getTotalDocCount();
    }
}
Field reference
| Field | Type (Python) | Type (Java) | Description |
|---|---|---|---|
| total_doc_count | int | long | The total number of documents in the partition. |
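To show how CollectionStats nests PartitionStats, the sketch below builds both from local stand-in dataclasses and computes each partition's share of the documents. The stand-ins, the partition_shares helper, the partition names, and the sample numbers are all hypothetical; in particular, the value 1.0 for index_completeness is only a placeholder, since the range of that field is not stated here.

```python
from dataclasses import dataclass
from typing import Dict


# Local stand-ins mirroring the definitions above (not imported from the SDK).
@dataclass(frozen=True)
class PartitionStats:
    total_doc_count: int


@dataclass(frozen=True)
class CollectionStats:
    total_doc_count: int
    index_completeness: float
    partitions: Dict[str, PartitionStats]


def partition_shares(stats: CollectionStats) -> Dict[str, float]:
    """Hypothetical helper: fraction of the partitions' documents held by each partition."""
    total = sum(p.total_doc_count for p in stats.partitions.values())
    if total == 0:
        return {name: 0.0 for name in stats.partitions}
    return {name: p.total_doc_count / total for name, p in stats.partitions.items()}


stats = CollectionStats(
    total_doc_count=1000,
    index_completeness=1.0,  # placeholder value for illustration
    partitions={
        "default": PartitionStats(total_doc_count=750),
        "archive": PartitionStats(total_doc_count=250),
    },
)

print(partition_shares(stats))  # {'default': 0.75, 'archive': 0.25}
```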
Status
The Status enum defines the lifecycle states of a collection or partition.
class Status(IntEnum):
    INITIALIZED = 0     # The collection or partition is being created.
    SERVING = 1         # The collection or partition is in service.
    DROPPING = 2        # The collection or partition is being deleted.
    ERROR = 3           # The collection or partition is abnormal.
| Value | Integer | Description |
|---|---|---|
| INITIALIZED | 0 | The collection or partition is being created. |
| SERVING | 1 | The collection or partition is in service. |
| DROPPING | 2 | The collection or partition is being deleted. |
| ERROR | 3 | The collection or partition encountered an error. |
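Because Status is an IntEnum, its members can be looked up by integer or by name and compare equal to plain integers. The sketch below repeats the enum locally so that it runs without the SDK, then filters a partitions mapping of the kind found in CollectionMeta. The serving_partitions helper and the partition names are hypothetical.

```python
from enum import IntEnum
from typing import Dict, List


# Repeated from the definition above so the example is self-contained.
class Status(IntEnum):
    INITIALIZED = 0
    SERVING = 1
    DROPPING = 2
    ERROR = 3


def serving_partitions(partitions: Dict[str, Status]) -> List[str]:
    """Hypothetical helper: names of partitions that are currently in service."""
    return sorted(name for name, status in partitions.items() if status is Status.SERVING)


partitions = {
    "default": Status.SERVING,
    "staging": Status.INITIALIZED,
    "old": Status.DROPPING,
}

print(Status(1).name)                  # SERVING
print(Status["ERROR"].value)           # 3
print(Status.SERVING == 1)             # True
print(serving_partitions(partitions))  # ['default']
```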
Type aliases (Python)
The Python SDK defines the following type aliases:
long = NewType("long", int)
FieldDataType = Union[long, str, int, float, bool]
FieldDataDict = Dict[str, FieldDataType]
VectorValueType = Union[List[int], List[float], np.ndarray]
| Alias | Definition | Description |
|---|---|---|
| long | NewType("long", int) | Extended integer type. |
| FieldDataType | Union[long, str, int, float, bool] | Accepted value types for custom document fields. |
| FieldDataDict | Dict[str, FieldDataType] | Type of the fields parameter in Doc. |
| VectorValueType | Union[List[int], List[float], np.ndarray] | Accepted types for the vector parameter in Doc. |
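A short sketch of how these aliases behave at runtime, using only the standard typing module: typing.NewType creates a distinct name for static type checkers, but calling it returns its argument unchanged, so a long value is an ordinary int when the program runs. VectorValueType is omitted here to avoid a NumPy dependency, and the field names in the sample dictionary are hypothetical.

```python
from typing import Dict, NewType, Union

# Repeated from the definitions above so the example is self-contained.
long = NewType("long", int)
FieldDataType = Union[long, str, int, float, bool]
FieldDataDict = Dict[str, FieldDataType]

# Calling a NewType returns its argument unchanged.
big = long(2**40)
print(type(big) is int)  # True
print(big)               # 1099511627776

# A dictionary that matches FieldDataDict for a static type checker.
fields: FieldDataDict = {
    "views": big,
    "title": "intro",
    "rating": 4.5,
    "draft": False,
}
print(sorted(fields))  # ['draft', 'rating', 'title', 'views']
```

Since long has no runtime class of its own, code that needs to tell it apart from int cannot rely on isinstance; the distinction exists only in type annotations.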