DataWorks support for Milvus data synchronization

A Milvus data source provides a channel to write data to a Milvus vector database. This topic describes the support that DataWorks provides for Milvus data synchronization.

Supported Milvus versions

Milvus: 2.4.x
Milvus: 2.5.x

Supported field types

The following table lists the data type mappings for Milvus Writer.

Type classification	Milvus data type
LONG	Int8, Int16, Int32, Int64
DOUBLE	Float, Double, FloatVector
STRING	String, VarChar, SparseFloatVector, JSON, Array
BOOLEAN	Bool
BYTES	BFloat16Vector, Float16Vector, BinaryVector
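
The mapping above can also be expressed as a lookup table, which may help when checking a column list before a job is submitted. The following Python sketch is illustrative only: `MILVUS_TYPE_CLASS` and `classify` are hypothetical names, not part of DataWorks or Milvus, and the case-sensitive match is an assumption.

```python
# Hypothetical helper that mirrors the type-mapping table above.
MILVUS_TYPE_CLASS = {
    "LONG": ["Int8", "Int16", "Int32", "Int64"],
    "DOUBLE": ["Float", "Double", "FloatVector"],
    "STRING": ["String", "VarChar", "SparseFloatVector", "JSON", "Array"],
    "BOOLEAN": ["Bool"],
    "BYTES": ["BFloat16Vector", "Float16Vector", "BinaryVector"],
}

# Inverted view: Milvus data type -> type classification.
_CLASS_BY_TYPE = {
    milvus_type: type_class
    for type_class, milvus_types in MILVUS_TYPE_CLASS.items()
    for milvus_type in milvus_types
}


def classify(milvus_type: str) -> str:
    """Return the type classification for a Milvus data type.

    The lookup is case-sensitive (an assumption of this sketch) and raises
    ValueError for any type that the table does not list.
    """
    try:
        return _CLASS_BY_TYPE[milvus_type]
    except KeyError:
        raise ValueError(f"unsupported Milvus data type: {milvus_type}") from None
```

For example, `classify("FloatVector")` returns `"DOUBLE"`, and a type that the table does not list, such as `"Decimal"`, raises `ValueError`.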

Add a data source

Before you develop a synchronization task in DataWorks, add the required data source to DataWorks by following the instructions in Data Source Management. When you add the data source, the infotips in the DataWorks console explain the meaning of each parameter.

Develop a data synchronization task

For the entry point and the procedure for configuring a synchronization task, see the following configuration guides.

Configure an offline sync task for a single table

For instructions, see Codeless UI configuration or Code editor configuration.
For all parameters and a script demo for the code editor, see the Appendix: Script demo and parameters section.

Appendix: Script demo and parameters

Configure a batch synchronization task by using the code editor

To configure a batch synchronization task by using the code editor, set the related parameters in the script based on the unified script format requirements. For more information, see Configuration in the code editor. The following sections describe the data source parameters that you configure in the script.

Reader script demo

{
  "job": {
    "content": [
      {
        "reader": {
          "parameter": {
            "endpoint": "http://xxxx.milvus.aliyuncs.com:19530",
            "collection": "testColection",
            "database": "default",
            "password": "xxxxxxx",
            "username": "root",
            "column": [
              {
                "name": "id",
                "type": "Int64",
                "primaryKey": "true"
              },
              {
                "name": "int8col",
                "type": "Int8"
              },
              {
                "name": "int16col",
                "type": "Int16"
              }
            ]
          },
          "name": "milvusreader"
        },
        "writer": {
          "stepType": "stream",
          "parameter": {

          },
          "name": "Writer",
          "category": "writer"
        }
      }
    ],
    "setting": {
      "errorLimit": {
        "record": "0"
      },
      "speed": {
        "throttle": false,
        "concurrent": 1,
        "channel": 1
      }
    }
  }
}

Reader script parameters

Parameter	Description	Required	Default value
collection	The collection (table name) to read from Milvus.	Yes	None
batchSize	The number of records to read in each batch.	No	1024
filter	The filter condition for reading data. This is equivalent to a WHERE clause. For configuration details, see https://milvus.io/docs/boolean.md.	No	None
column	The source Milvus fields to read. You can configure dynamic field synchronization in two ways: (1) Synchronize all dynamic fields as one JSON field: `"column":[{ "name":"dynamicName", "type":"json", "dynamicFileType":"allDynamicField" }]` (2) Synchronize a single dynamic field, where {singleDynamicName} is the name of the dynamic field in the collection: `"column":[{ "name":"{singleDynamicName}", "type":"int", "dynamicFileType":"singleDynamicField" }]`	Yes	None
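
The reader parameters above can be assembled programmatically before they are pasted into the script. The following Python sketch is one possible way to do that. `build_reader_parameter` is a hypothetical helper, the collection name and the filter expression `id > 100` are made-up sample values, and connection settings such as the endpoint and credentials are left out.

```python
import json


def build_reader_parameter(collection, columns, filter_expr=None, batch_size=1024):
    """Hypothetical helper: assemble the `parameter` object of a Milvus reader.

    Only the keys documented in the reader parameter table are produced.
    """
    if not collection:
        raise ValueError("collection is required")
    if not columns:
        raise ValueError("column is required")
    parameter = {
        "collection": collection,
        "column": list(columns),
        "batchSize": batch_size,  # documented default: 1024
    }
    if filter_expr is not None:
        # Equivalent to a WHERE clause; uses Milvus boolean expression syntax.
        parameter["filter"] = filter_expr
    return parameter


columns = [
    {"name": "id", "type": "Int64", "primaryKey": "true"},
    # Synchronize all dynamic fields as one JSON field (key spelled as in the table).
    {"name": "dynamicName", "type": "json", "dynamicFileType": "allDynamicField"},
]

reader_parameter = build_reader_parameter("my_collection", columns, filter_expr="id > 100")
print(json.dumps(reader_parameter, indent=2))
```

The `filter` key is omitted entirely when no filter is given, which matches its documented default of None.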

Writer script demo

{
  "transform": false,
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType":"stream",
      "parameter":{},
      "name":"Reader",
      "category":"reader"
    },
    {
      "stepType": "milvus",
      "parameter": {
        "schemaCreateMode": "createIfNotExist",     // The mode for creating the collection.
        "enableDynamicSchema": true,            // Specifies whether to enable dynamic fields when creating the collection.
        "envType": 1,
        "datasource": "zm_test",
        "column": [  // The fields to synchronize.
          {
            "name": "floatv1",
            "type": "FloatVector",
            "dimension": "3"
          },
          {
            "name": "incol",
            "type": "Int16"
          }
        ],
        "writeMode": "insert",  // The write mode.
        "collection": "test",  // The destination collection.
        "batchSize": 1024      // The number of records to write in each batch.
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "concurrent": 2,
      "throttle": false
    }
  }
}

Writer script parameters

Parameter	Description	Required	Default value
datasource	The name of the data source. The code editor supports adding data sources. The value of this parameter must be the same as the name of the added data source.	Yes	None
collection	The name of the destination collection in Milvus.	Yes	None
partition	The partition of the destination collection in Milvus. If you leave this parameter empty, data is written to the _default partition.	No	_default
column	The destination fields in Milvus. Configure this parameter as an array in which each element is a JSON object that describes one field. Each object includes: name (the name of the field), type (the data type of the field), and field properties, such as the dimension of a vector field: `"dimension":3`	Yes	None
writeMode	Milvus supports two write modes: upsert and insert. upsert: For a collection that does not have auto-incrementing primary keys, this mode updates an entity in the collection based on its primary key. For a collection that has auto-incrementing primary keys, Milvus replaces the primary key in the entity with an auto-generated one and inserts the data. insert: This mode is often used to insert data into a collection that has auto-incrementing primary keys. Milvus automatically generates the primary keys. If you use this mode for a collection that does not have auto-incrementing primary keys, data may be duplicated.	No	upsert
batchSize	The number of records to write to Milvus in each batch.	No	1024
schemaCreateMode	Before synchronization, DataWorks checks the collection and performs an operation based on the configured mode. The following modes are supported: createIfNotExist: If the collection does not exist, DataWorks creates a collection based on the configured column and other information, and then starts the synchronization. Ignore: If the collection does not exist, an error is reported. recreate: Before each synchronization, DataWorks deletes the original collection and then creates a new one based on the configured column and other information.	Yes	createIfNotExist
enableDynamicSchema	Specifies whether to enable a dynamic schema when creating the collection.	No	true
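
The writer parameters above can be checked before a script is submitted. The following Python sketch builds a writer step and rejects values of writeMode and schemaCreateMode that the table does not list. `build_writer_step` is a hypothetical helper, not a DataWorks API. The allowed values are copied verbatim from the table, and the sample data source, collection, and column values come from the writer script demo.

```python
import json

# Allowed values as listed in the parameter table above (spelling copied verbatim).
WRITE_MODES = ("upsert", "insert")
SCHEMA_CREATE_MODES = ("createIfNotExist", "Ignore", "recreate")


def build_writer_step(datasource, collection, columns, *, write_mode="upsert",
                      schema_create_mode="createIfNotExist", partition=None,
                      batch_size=1024, enable_dynamic_schema=True):
    """Hypothetical helper: assemble a Milvus writer step for the code editor."""
    if write_mode not in WRITE_MODES:
        raise ValueError(f"writeMode must be one of {WRITE_MODES}, got {write_mode!r}")
    if schema_create_mode not in SCHEMA_CREATE_MODES:
        raise ValueError(
            f"schemaCreateMode must be one of {SCHEMA_CREATE_MODES}, got {schema_create_mode!r}"
        )
    for col in columns:
        if "name" not in col or "type" not in col:
            raise ValueError(f"each column needs a name and a type: {col!r}")
    parameter = {
        "datasource": datasource,
        "collection": collection,
        "column": list(columns),
        "writeMode": write_mode,
        "batchSize": batch_size,
        "schemaCreateMode": schema_create_mode,
        "enableDynamicSchema": enable_dynamic_schema,
    }
    if partition:
        # When omitted, data is written to the _default partition.
        parameter["partition"] = partition
    return {"stepType": "milvus", "parameter": parameter,
            "name": "Writer", "category": "writer"}


writer_step = build_writer_step(
    "zm_test",
    "test",
    [
        {"name": "floatv1", "type": "FloatVector", "dimension": "3"},
        {"name": "incol", "type": "Int16"},
    ],
    write_mode="insert",
)
print(json.dumps(writer_step, indent=2))
```

Failing early on an unknown mode keeps a typo such as `"writeMode": "replace"` from reaching the synchronization run.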