A Milvus data source provides a channel to write data to a Milvus vector database. This topic describes the support that DataWorks provides for Milvus data synchronization.
Supported Milvus versions
Milvus: 2.4.x
Milvus: 2.5.x
Supported field types
The following table lists the data type mappings for Milvus Writer.
Type classification | Milvus data type |
LONG | Int8, Int16, Int32, Int64 |
DOUBLE | Float, Double, FloatVector |
STRING | String, VarChar, SparseFloatVector, JSON, Array |
BOOLEAN | Bool |
BYTES | BFloat16Vector, Float16Vector, BinaryVector |
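The mapping above can be turned into a lookup for checking a column list before you submit a task. The following is a minimal sketch, not part of DataWorks: the names `MILVUS_TYPE_CLASSES` and `classify` are hypothetical helpers, and the table contents are transcribed from the mapping above.

```python
# Type classification -> Milvus data types, transcribed from the table above.
MILVUS_TYPE_CLASSES = {
    "LONG": ["Int8", "Int16", "Int32", "Int64"],
    "DOUBLE": ["Float", "Double", "FloatVector"],
    "STRING": ["String", "VarChar", "SparseFloatVector", "JSON", "Array"],
    "BOOLEAN": ["Bool"],
    "BYTES": ["BFloat16Vector", "Float16Vector", "BinaryVector"],
}

# Inverted index: Milvus data type -> type classification.
CLASS_BY_MILVUS_TYPE = {
    milvus_type: type_class
    for type_class, milvus_types in MILVUS_TYPE_CLASSES.items()
    for milvus_type in milvus_types
}


def classify(milvus_type: str) -> str:
    """Return the type classification for a Milvus data type (hypothetical helper)."""
    try:
        return CLASS_BY_MILVUS_TYPE[milvus_type]
    except KeyError:
        raise ValueError(f"Unsupported Milvus data type: {milvus_type!r}") from None


if __name__ == "__main__":
    for name in ("Int16", "FloatVector", "JSON"):
        print(name, "->", classify(name))
```

A type that is not in the table raises `ValueError`, which makes a misspelled `type` value in a column definition visible before the task runs.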
Add a data source
Before you develop a synchronization task in DataWorks, add the required data source to DataWorks by following the instructions in Data Source Management. When you add the data source, the parameter infotips in the DataWorks console explain what each parameter means.
Develop a data synchronization task
For the entry point and the procedure for configuring a synchronization task, see the following configuration guides.
Configure an offline sync task for a single table
For instructions, see Codeless UI configuration or Code editor configuration.
For all parameters and a script demo for the code editor, see the Appendix: Script demo and parameters section.
Appendix: Script demo and parameters
Configure a batch synchronization task by using the code editor
To configure a batch synchronization task by using the code editor, set the related parameters in the script according to the unified script format requirements. For more information, see Configuration in the code editor. The following sections describe the data source parameters that you must configure in the script.
Reader script demo
{
  "job": {
    "content": [
      {
        "reader": {
          "parameter": {
            "endpoint": "http://xxxx.milvus.aliyuncs.com:19530",
            "collection": "testColection",
            "database": "default",
            "password": "xxxxxxx",
            "username": "root",
            "column": [
              {
                "name": "id",
                "type": "Int64",
                "primaryKey": "true"
              },
              {
                "name": "int8col",
                "type": "Int8"
              },
              {
                "name": "int16col",
                "type": "Int16"
              }
            ]
          },
          "name": "milvusreader"
        },
        "writer": {
          "stepType": "stream",
          "parameter": {
          },
          "name": "Writer",
          "category": "writer"
        }
      }
    ],
    "setting": {
      "errorLimit": {
        "record": "0"
      },
      "speed": {
        "throttle": false,
        "concurrent": 1,
        "channel": 1
      }
    }
  }
}
Reader script parameters
Parameter | Description | Required | Default value |
collection | The collection (table name) to read from Milvus. | Yes | None |
batchSize | The number of records to read in each batch. | No | 1024 |
filter | The filter condition for reading data. This is equivalent to a WHERE clause. For configuration details, see https://milvus.io/docs/boolean.md. | No | None |
column | The source Milvus fields to read. You can configure dynamic field synchronization in two ways:
| Yes | None |
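The reader parameters in the table above can be assembled programmatically before you paste them into the script. The following is a minimal sketch under stated assumptions: `build_reader_parameter` is a hypothetical helper, not a DataWorks API; the collection name `my_collection` and the filter expression `int8col > 0` are illustrative values; connection settings such as the endpoint and credentials are left out.

```python
import json


def build_reader_parameter(collection, column, batch_size=1024, filter_expr=None):
    """Assemble the reader "parameter" object (hypothetical helper).

    collection and column are required; batchSize defaults to 1024 and
    filter is optional, as listed in the parameter table.
    """
    if not collection:
        raise ValueError("collection is required")
    if not column:
        raise ValueError("column is required")

    parameter = {
        "collection": collection,
        "column": list(column),
        "batchSize": batch_size,
    }
    # Emit the filter only when one is set, so the default is "no filter".
    if filter_expr:
        parameter["filter"] = filter_expr
    return parameter


reader_parameter = build_reader_parameter(
    collection="my_collection",  # illustrative name, not from the product
    column=[
        {"name": "id", "type": "Int64", "primaryKey": "true"},
        {"name": "int8col", "type": "Int8"},
    ],
    filter_expr="int8col > 0",  # illustrative Milvus boolean expression
)

if __name__ == "__main__":
    print(json.dumps(reader_parameter, indent=2))
```

Leaving out the `filter` key when no expression is set keeps the generated script equivalent to a full read of the collection.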
Writer script demo
{
  "transform": false,
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType": "stream",
      "parameter": {},
      "name": "Reader",
      "category": "reader"
    },
    {
      "stepType": "milvus",
      "parameter": {
        "schemaCreateMode": "createIfNotExist", // The mode for creating the collection.
        "enableDynamicSchema": true, // Specifies whether to enable dynamic fields when creating the collection.
        "envType": 1,
        "datasource": "zm_test",
        "column": [ // The fields to synchronize.
          {
            "name": "floatv1",
            "type": "FloatVector",
            "dimension": "3"
          },
          {
            "name": "incol",
            "type": "Int16"
          }
        ],
        "writeMode": "insert", // The write mode.
        "collection": "test", // The destination collection.
        "batchSize": 1024 // The number of records to write in each batch.
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "concurrent": 2,
      "throttle": false
    }
  }
}
Writer script parameters
Parameter | Description | Required | Default value |
datasource | The name of the data source. The code editor supports adding data sources. The value of this parameter must be the same as the name of the added data source. | Yes | None |
collection | The name of the destination collection in Milvus. | Yes | None |
partition | The partition of the destination collection in Milvus. If you leave this parameter empty, data is written to the _default partition. | No | _default |
column | The destination fields in Milvus. Configure this parameter as an array. Configure the information for a single field in JSON format. The content includes:
| Yes | None |
writeMode | Milvus supports two write modes: upsert and insert.
| No | upsert |
batchSize | The number of records to write to Milvus in each batch. | No | 1024 |
schemaCreateMode | Before synchronization, DataWorks checks the collection and performs an operation based on the configured mode. The following modes are supported:
| Yes | createIfNotExist |
enableDynamicSchema | Specifies whether to enable a dynamic schema when creating the collection. | No | true |
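The defaults and allowed values in the writer parameter table can be applied in code when you generate the script. The following is a minimal sketch, not a DataWorks API: `build_writer_parameter` is a hypothetical helper, the defaults are copied from the table above, and only the `writeMode` values named there (`upsert` and `insert`) are checked. The `schemaCreateMode` value is passed through unchecked because this sketch does not assume which modes exist beyond `createIfNotExist`.

```python
import json

# Write modes named in the parameter table.
ALLOWED_WRITE_MODES = {"upsert", "insert"}


def build_writer_parameter(
    datasource,
    collection,
    column,
    *,
    write_mode="upsert",
    batch_size=1024,
    schema_create_mode="createIfNotExist",
    enable_dynamic_schema=True,
    partition=None,
):
    """Assemble the writer "parameter" object (hypothetical helper)."""
    required = (("datasource", datasource), ("collection", collection), ("column", column))
    for label, value in required:
        if not value:
            raise ValueError(f"{label} is required")
    if write_mode not in ALLOWED_WRITE_MODES:
        raise ValueError(f"writeMode must be one of {sorted(ALLOWED_WRITE_MODES)}")

    parameter = {
        "datasource": datasource,
        "collection": collection,
        "column": list(column),
        "writeMode": write_mode,
        "batchSize": batch_size,
        "schemaCreateMode": schema_create_mode,
        "enableDynamicSchema": enable_dynamic_schema,
    }
    # When no partition is given, the key is omitted and data goes to _default.
    if partition:
        parameter["partition"] = partition
    return parameter


writer_parameter = build_writer_parameter(
    datasource="zm_test",
    collection="test",
    column=[
        {"name": "floatv1", "type": "FloatVector", "dimension": "3"},
        {"name": "incol", "type": "Int16"},
    ],
    write_mode="insert",
)

if __name__ == "__main__":
    print(json.dumps(writer_parameter, indent=2))
```

The output of `json.dumps` is strict JSON without the `//` comments used in the script demo, so it can also be checked with a standard JSON parser.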