DataWorks: Milvus data source

Last Updated: Nov 14, 2025

A Milvus data source provides a channel to read data from and write data to a Milvus vector database. This topic describes the support that DataWorks provides for Milvus data synchronization.

Supported Milvus versions

  • Milvus: 2.4.x

  • Milvus: 2.5.x

Supported field types

The following table lists the data type mappings for Milvus Writer.

| Type classification | Milvus data type |
| --- | --- |
| LONG | Int8, Int16, Int32, Int64 |
| DOUBLE | Float, Double, FloatVector |
| STRING | String, VarChar, SparseFloatVector, JSON, Array |
| BOOLEAN | Bool |
| BYTES | BFloat16Vector, Float16Vector, BinaryVector |
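The mapping above can be expressed as a simple lookup, which is handy for checking a column list before you write a script. The following is a minimal sketch; the dictionary contents are copied from the table, while the names `MILVUS_TYPE_CLASSIFICATION` and `classify_milvus_type` are hypothetical helpers and not part of DataWorks.

```python
# Mapping copied from the table above: Milvus data type -> type classification.
# The helper names are illustrative only; DataWorks does not provide them.
MILVUS_TYPE_CLASSIFICATION = {
    "Int8": "LONG", "Int16": "LONG", "Int32": "LONG", "Int64": "LONG",
    "Float": "DOUBLE", "Double": "DOUBLE", "FloatVector": "DOUBLE",
    "String": "STRING", "VarChar": "STRING", "SparseFloatVector": "STRING",
    "JSON": "STRING", "Array": "STRING",
    "Bool": "BOOLEAN",
    "BFloat16Vector": "BYTES", "Float16Vector": "BYTES", "BinaryVector": "BYTES",
}


def classify_milvus_type(milvus_type: str) -> str:
    """Return the type classification for a Milvus data type.

    Raises ValueError for a type that the table does not list.
    """
    try:
        return MILVUS_TYPE_CLASSIFICATION[milvus_type]
    except KeyError:
        raise ValueError(f"Unsupported Milvus data type: {milvus_type!r}") from None


if __name__ == "__main__":
    for name in ("Int64", "FloatVector", "SparseFloatVector", "BinaryVector"):
        print(name, "->", classify_milvus_type(name))
```

The lookup is case-sensitive and uses the spellings shown in the table.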

Add a data source

Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data Source Management. When you add the data source, the infotips in the DataWorks console explain the meaning of each parameter.

Develop a data synchronization task

For the entry point and the procedure for configuring a synchronization task, see the following configuration guide.

Configure an offline sync task for a single table

Appendix: Script demo and parameters

Configure a batch synchronization task by using the code editor

To configure a batch synchronization task by using the code editor, configure the related parameters in the script according to the unified script format requirements. For more information, see Configuration in the code editor. The following sections describe the data source parameters that you must configure in the script.

Reader script demo

{
  "job": {
    "content": [
      {
        "reader": {
          "parameter": {
            "endpoint": "http://xxxx.milvus.aliyuncs.com:19530",
            "collection": "testCollection",
            "database": "default",
            "password": "xxxxxxx",
            "username": "root",
            "column": [
              {
                "name": "id",
                "type": "Int64",
                "primaryKey": "true"
              },
              {
                "name": "int8col",
                "type": "Int8"
              },
              {
                "name": "int16col",
                "type": "Int16"
              }
            ]
          },
          "name": "milvusreader"
        },
        "writer": {
          "stepType": "stream",
          "parameter": {

          },
          "name": "Writer",
          "category": "writer"
        }
      }
    ],
    "setting": {
      "errorLimit": {
        "record": "0"
      },
      "speed": {
        "throttle": false,
        "concurrent": 1,
        "channel": 1
      }
    }
  }
}

Reader script parameters

| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| collection | The collection (table name) to read from Milvus. | Yes | None |
| batchSize | The number of records to read in each batch. | No | 1024 |
| filter | The filter condition for reading data. This is equivalent to a WHERE clause. For configuration details, see https://milvus.io/docs/boolean.md. | No | None |
| column | The source Milvus fields to read. Dynamic fields can be synchronized in two ways, as described after this table. | Yes | None |

You can configure dynamic field synchronization in column in two ways:

  • Synchronize all dynamic fields as one JSON field.

    "column":[{
      "name":"dynamicName",
      "type":"json",
      "dynamicFileType":"allDynamicField"
    }]

  • Synchronize a single dynamic field. {singleDynamicName} is the name of the dynamic field in the collection.

    "column":[{
      "name":"{singleDynamicName}",
      "type":"int",
      "dynamicFileType":"singleDynamicField"
    }]
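To show how the reader parameters fit together, the following minimal sketch assembles a reader parameter object in Python and prints it as JSON. The key names (collection, batchSize, filter, column, dynamicFileType) come from this topic. The function names, the sample filter expression `id > 100`, and the sample field names are assumptions for illustration. Connection settings such as endpoint, username, and password appear in the reader script demo and are omitted here.

```python
import json


def all_dynamic_fields_column(name):
    """Column entry that reads every dynamic field as one JSON field."""
    return {"name": name, "type": "json", "dynamicFileType": "allDynamicField"}


def single_dynamic_field_column(name, field_type):
    """Column entry that reads one dynamic field by its name in the collection."""
    return {"name": name, "type": field_type, "dynamicFileType": "singleDynamicField"}


def build_reader_parameter(collection, columns, filter_expr=None, batch_size=1024):
    """Assemble the reader parameters described in this topic.

    collection and columns are required; filter is added only when given,
    and batchSize falls back to the documented default of 1024.
    """
    if not collection:
        raise ValueError("collection is required")
    if not columns:
        raise ValueError("column must list at least one field")
    parameter = {
        "collection": collection,
        "batchSize": batch_size,
        "column": list(columns),
    }
    if filter_expr:
        parameter["filter"] = filter_expr
    return parameter


if __name__ == "__main__":
    columns = [
        {"name": "id", "type": "Int64", "primaryKey": "true"},
        all_dynamic_fields_column("dynamicName"),
    ]
    # "id > 100" is a hypothetical filter; see the Milvus boolean expression docs.
    parameter = build_reader_parameter("testCollection", columns, filter_expr="id > 100")
    print(json.dumps(parameter, indent=2))
```

Leaving `filter_expr` unset omits the filter key, which matches the table entry that lists no default value for filter.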

Writer script demo

{
  "transform": false,
  "type": "job",
  "version": "2.0",
  "steps": [
    {
      "stepType":"stream",
      "parameter":{},
      "name":"Reader",
      "category":"reader"
    },
    {
      "stepType": "milvus",
      "parameter": {
        "schemaCreateMode": "createIfNotExist",     // The mode for creating the collection.
        "enableDynamicSchema": true,            // Specifies whether to enable dynamic fields when creating the collection.
        "envType": 1,
        "datasource": "zm_test",
        "column": [  // The fields to synchronize.
          {
            "name": "floatv1",
            "type": "FloatVector",
            "dimension": "3"
          },
          {
            "name": "incol",
            "type": "Int16"
          }
        ],
        "writeMode": "insert",  // The write mode.
        "collection": "test",  // The destination collection.
        "batchSize": 1024      // The number of records to write in each batch.
      },
      "name": "Writer",
      "category": "writer"
    }
  ],
  "setting": {
    "errorLimit": {
      "record": "0"
    },
    "speed": {
      "concurrent": 2,
      "throttle": false
    }
  }
}
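The writer script demo annotates parameters with // comments, which strict JSON does not allow. This topic does not state whether the code editor accepts them, so the following is only a minimal sketch for the case where you want to validate such a script with a strict JSON parser first. The function name `strip_line_comments` and the sample text are hypothetical.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // comments that appear outside of JSON string literals.

    Text inside double-quoted strings (for example "http://...") is kept.
    """
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False          # this character was escaped; keep going
            elif ch == "\\":
                escaped = True           # next character is escaped
            elif ch == '"':
                in_string = False        # closing quote
            i += 1
        elif ch == '"':
            in_string = True             # opening quote
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line but keep the newline itself.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    sample = '''{
      "endpoint": "http://example:19530",  // the URL must survive
      "writeMode": "insert",  // The write mode.
      "batchSize": 1024      // The number of records to write in each batch.
    }'''
    print(json.loads(strip_line_comments(sample)))
```

The scanner tracks whether it is inside a string so that the `//` in a URL value is not mistaken for a comment.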

Writer script parameters

| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| datasource | The name of the data source. The code editor supports adding data sources. The value of this parameter must be the same as the name of the added data source. | Yes | None |
| collection | The name of the destination collection in Milvus. | Yes | None |
| partition | The partition of the destination collection in Milvus. If you leave this parameter empty, data is written to the _default partition. | No | _default |
| column | The destination fields in Milvus, configured as an array. The contents of each element are listed after this table. | Yes | None |
| writeMode | The write mode. Milvus supports two write modes, upsert and insert, which are described after this table. | No | upsert |
| batchSize | The number of records to write to Milvus in each batch. | No | 1024 |
| schemaCreateMode | Before synchronization, DataWorks checks the collection and performs an operation based on the configured mode. The supported modes are described after this table. | Yes | createIfNotExist |
| enableDynamicSchema | Specifies whether to enable a dynamic schema when creating the collection. | No | true |

column: Configure the information for a single field in JSON format. The content includes:

  • name: The name of the field.

  • type: The data type of the field.

  • Field properties, such as the dimension of a vector field: "dimension":3

writeMode: The following modes are supported:

  • upsert: For a collection that does not have auto-incrementing primary keys, this mode updates an entity in the collection based on its primary key. For a collection that has auto-incrementing primary keys, Milvus replaces the primary key in the entity with an auto-generated one and inserts the data.

  • insert: This mode is often used to insert data into a collection that has auto-incrementing primary keys. Milvus automatically generates the primary keys. If you use this mode for a collection that does not have auto-incrementing primary keys, data may be duplicated.

schemaCreateMode: The following modes are supported:

  • createIfNotExist: If the collection does not exist, DataWorks creates a collection based on the configured column and other information, and then starts the synchronization.

  • Ignore: If the collection does not exist, an error is reported.

  • recreate: Before each synchronization, DataWorks deletes the original collection and then creates a new one based on the configured column and other information.
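The writer parameters and their defaults can be checked before a script is submitted. The following minimal sketch builds the writer parameter object with the defaults from the table and rejects values that the table does not list. The function name `build_writer_parameter` and its argument names are hypothetical; the keys, defaults, and mode names are taken from this topic, with mode names spelled as they appear in the table.

```python
import json

# Allowed values, spelled as in the parameter table of this topic.
WRITE_MODES = {"upsert", "insert"}
SCHEMA_CREATE_MODES = {"createIfNotExist", "Ignore", "recreate"}


def build_writer_parameter(datasource, collection, columns,
                           write_mode="upsert",
                           schema_create_mode="createIfNotExist",
                           batch_size=1024,
                           enable_dynamic_schema=True,
                           partition=None):
    """Assemble the Milvus writer parameters with the documented defaults."""
    if not datasource or not collection:
        raise ValueError("datasource and collection are required")
    if not columns:
        raise ValueError("column must list at least one field")
    for col in columns:
        if "name" not in col or "type" not in col:
            raise ValueError(f"each column needs name and type: {col!r}")
    if write_mode not in WRITE_MODES:
        raise ValueError(f"writeMode must be one of {sorted(WRITE_MODES)}")
    if schema_create_mode not in SCHEMA_CREATE_MODES:
        raise ValueError(f"schemaCreateMode must be one of {sorted(SCHEMA_CREATE_MODES)}")
    parameter = {
        "datasource": datasource,
        "collection": collection,
        "column": list(columns),
        "writeMode": write_mode,
        "batchSize": batch_size,
        "schemaCreateMode": schema_create_mode,
        "enableDynamicSchema": enable_dynamic_schema,
    }
    # When partition is omitted, data is written to the _default partition.
    if partition:
        parameter["partition"] = partition
    return parameter


if __name__ == "__main__":
    columns = [
        {"name": "floatv1", "type": "FloatVector", "dimension": "3"},
        {"name": "incol", "type": "Int16"},
    ]
    print(json.dumps(build_writer_parameter("zm_test", "test", columns,
                                            write_mode="insert"), indent=2))
```

The sketch only checks the shape of the configuration; it does not contact Milvus or verify that the collection and fields exist.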