Vector Retrieval Service for Milvus: Update instance configurations

Last Updated: Sep 09, 2025

You can view and modify instance configurations in the Vector Retrieval Service for Milvus console. This topic describes how to update the configurations of a Milvus instance.

Procedure

  1. Go to the instance configuration page.

    1. Log on to the Vector Retrieval Service for Milvus console.

    2. In the left navigation pane, click Instances.

    3. In the top menu bar, select a region.

    4. On the Instances page, click the name of your Milvus instance.

    5. Select the Configurations tab.

  2. In the Configurations text box, copy and paste the following code to overwrite the default configurations, and then click Save Configurations.

    The code must be in YAML format. Example:

    # Configurations for rootCoord, which handles DDL and DCL requests.
    rootCoord:
      maxDatabaseNum: 64 # Maximum number of databases.
      maxPartitionNum: 4096 # Maximum number of partitions in a collection.
      minSegmentSizeToEnableIndex: 1024 # If a segment is smaller than this value, it is not indexed.
      importTaskExpiration: 900 # The duration in seconds after which an import task expires and is terminated. Default value: 900 (15 minutes).
      importTaskRetention: 86400 # The minimum duration in seconds that Milvus retains import task records. Default value: 86400 (24 hours).
      grpc:
        serverMaxSendSize: 536870912
        serverMaxRecvSize: 268435456
        clientMaxSendSize: 268435456
        clientMaxRecvSize: 536870912
        
    # Configurations for the proxy, which validates client requests and reduces the returned results.
    proxy:
      timeTickInterval: 200 # The interval in milliseconds at which the proxy syncs the time tick.
      healthCheckTimeout: 3000 # The interval in milliseconds for component health checks.
      maxNameLength: 255 # Maximum length of a collection name or alias.
      # Max number of fields in a collection.
      # As of Milvus 2.2.0, don't set maxFieldNum to 64 or a greater value.
      # Adjust this parameter at your own risk.
      maxFieldNum: 64
      maxTaskNum: 1024 # The maximum number of tasks in the proxy task queue.
      grpc:
        serverMaxSendSize: 268435456
        serverMaxRecvSize: 67108864
        clientMaxSendSize: 268435456
        clientMaxRecvSize: 67108864
    
    # Configurations for queryCoord, which manages topology and load balancing for query nodes, and handles handoffs from growing segments to sealed segments.
    queryCoord:
      autoHandoff: true # Enables or disables automatic handoff.
      autoBalance: true # Enables or disables automatic balancing.
      balancer: ScoreBasedBalancer # The balancer to use.
      overloadedMemoryThresholdPercentage: 90 # The memory overload threshold, as a percentage.
      balanceIntervalSeconds: 60
      memoryUsageMaxDifferencePercentage: 30
      checkInterval: 1000
      channelTaskTimeout: 60000 # 1 minute
      segmentTaskTimeout: 120000 # 2 minutes
      distPullInterval: 500
      heartbeatAvailableInterval: 10000 # The interval in milliseconds (10 seconds). Only Query Nodes that fetch heartbeats within this interval are considered available.
      loadTimeoutSeconds: 600
      checkHandoffInterval: 5000
      grpc:
        serverMaxSendSize: 536870912
        serverMaxRecvSize: 268435456
        clientMaxSendSize: 268435456
        clientMaxRecvSize: 536870912
    
    # Configurations for queryNode, which runs hybrid searches between vector and scalar data.
    queryNode:
      dataSync:
        flowGraph:
          maxQueueLength: 16 # Maximum length of the task queue in the flow graph.
          maxParallelism: 1024 # Maximum number of parallel tasks in the flow graph.
      stats:
        publishInterval: 1000 # The interval in milliseconds at which a Query Node reports node information.
      segcore:
        cgoPoolSizeRatio: 2.0 # The ratio of the cgo pool size to the maximum read concurrency.
        # Use more threads to improve SSD throughput for on-disk indexes.
        # This parameter takes effect only when enableDisk is set to true.
        # This value must be greater than 1 and less than 32.
        knowhereThreadPoolNumRatio: 4
        chunkRows: 128 # The number of vectors in a chunk.
        exprEvalBatchSize: 8192 # The batch size for the executor to get the next batch.
        interimIndex: # Builds an interim vector index for a growing segment or binary log to accelerate searches.
          enableIndex: true
          nlist: 128 # The nlist value for the segment index.
          nprobe: 16 # The nprobe value for searching a segment. This value must be smaller than nlist and depends on your accuracy requirements.
          memExpansionRate: 1.15 # The ratio of memory usage for building the interim index to the raw data size.
      loadMemoryUsageFactor: 1 # The multiplication factor for calculating memory usage when loading segments.
      enableDisk: false # Specifies whether to enable the query node to load disk indexes and search on them.
      maxDiskUsagePercentage: 95
      grouping:
        enabled: true
        maxNQ: 1000
        topKMergeRatio: 20
      scheduler:
        receiveChanSize: 10240
        unsolvedQueueSize: 10240
        # The concurrency ratio for read tasks, such as search and query tasks.
        # The maximum read concurrency is calculated as runtime.NumCPU × maxReadConcurrentRatio.
        # The default value is 2, which means the maximum read concurrency is runtime.NumCPU × 2.
        # The maximum read concurrency must be greater than or equal to 1 and less than or equal to runtime.NumCPU × 100.
        # Range: (0, 100].
        maxReadConcurrentRatio: 1
        cpuRatio: 10 # The ratio used to estimate CPU usage for read tasks.
        maxTimestampLag: 86400
        # The scheduling policy for read tasks. Default value: fifo. Optional value: user-task-polling.
        scheduleReadPolicy:
          # fifo: Tasks are scheduled from a first-in, first-out (FIFO) queue.
          # user-task-polling:
          #     User tasks are polled and scheduled one by one.
          #     Scheduling is fair at the task level.
          #     The policy is based on the username used for authentication.
          #     An empty username is treated as the same user.
          #     If there is only one user, the policy behaves like FIFO.
          name: fifo
          maxPendingTask: 10240
          # user-task-polling configurations:
          taskQueueExpire: 60 # The expiration time in seconds for the inner user task queue after it becomes empty. Default value: 60 (1 minute).
          enableCrossUserGrouping: false # Enables or disables cross-user grouping when using the user-task-polling policy. Default value: false. Set this to false if a user's tasks cannot be merged with others.
          maxPendingTaskPerUser: 1024 # The maximum number of pending tasks per user in the scheduler. Default value: 50.
      grpc:
        serverMaxSendSize: 536870912
        serverMaxRecvSize: 268435456
        clientMaxSendSize: 268435456
        clientMaxRecvSize: 536870912
    
    indexCoord:
      bindIndexNodeMode:
        enable: false
        withCred: false
      segment:
        minSegmentNumRowsToEnableIndex: 1024 # The minimum threshold. If the number of rows in a segment is less than this value, the segment is not indexed.
    
    indexNode:
      scheduler:
        buildParallel: 1
      enableDisk: true # Enables or disables the index node to build disk vector indexes.
      maxDiskUsagePercentage: 95
      grpc:
        serverMaxSendSize: 536870912
        serverMaxRecvSize: 268435456
        clientMaxSendSize: 268435456
        clientMaxRecvSize: 536870912
    
    dataCoord:
      channel:
        watchTimeoutInterval: 300 # The timeout in seconds for watching channels. The timer is reset each time the DataNode tickler updates the watch progress.
        balanceSilentDuration: 300 # The duration in seconds before the channelBalancer on the dataCoord runs.
        balanceInterval: 360 # The interval in seconds at which the channelBalancer on the dataCoord checks the balance status.
      segment:
        maxSize: 1024 # The maximum size of a segment, in MB.
        diskSegmentMaxSize: 2048 # The maximum size in MB of a segment for a collection that has an on-disk index.
        sealProportion: 0.12
        # The assignment expiration time, in milliseconds.
        # Warning: this is an expert parameter and is closely related to data integrity. Do not change it without a
        # specific goal and a solid understanding of the scenarios. If you must alter
        # this parameter, make sure that the new value is larger than the previous value used before the restart.
        # Otherwise, there is a high risk of data loss.
        assignmentExpiration: 2000
        maxLife: 86400 # The maximum lifetime of a segment, in seconds. 24 × 60 × 60.
        # If a segment does not accept DML records within maxIdleTime and its size is greater than
        # minSizeFromIdleToSealed, Milvus automatically seals it.
        # The maximum idle time of a segment, in seconds. 10 × 60.
        maxIdleTime: 600
        minSizeFromIdleToSealed: 16 # The minimum size in MB for a segment to be sealed after being idle.
        # The maximum number of binary logging files for one segment. The segment is sealed if
        # the number of binary logging files reaches this value.
        maxBinlogFileNumber: 32
        # A segment is considered a "small segment" when its number of rows is smaller than
        # (smallProportion × maximum number of rows in a segment).
        smallProportion: 0.5
        # Compaction occurs on small segments if the segment after compaction has
        # more than (compactableProportion × maximum number of rows in a segment) rows.
        # This value must be greater than or equal to smallProportion.
        compactableProportion: 0.85
        # During compaction, the number of rows in a segment can exceed the maximum number of rows by (expansionRate - 1) × 100%.
        expansionRate: 1.25
        # Enables or disables level-zero segments.
        enableLevelZero: false
      enableCompaction: true # Enables or disables data segment compaction.
      compaction:
        enableAutoCompaction: true
        rpcTimeout: 10 # The timeout for compaction RPC requests, in seconds.
        maxParallelTaskNum: 10 # The maximum number of parallel compaction tasks.
        indexBasedCompaction: true
    
        levelzero:
          forceTrigger:
            minSize: 8 # The minimum size in MB to force trigger a LevelZero compaction.
            deltalogMinNum: 10 # The minimum number of deltalog files to force trigger a LevelZero compaction.
    
      enableGarbageCollection: true
      gc:
        interval: 3600 # The garbage collection (GC) interval, in seconds.
        missingTolerance: 3600 # The tolerance duration in seconds for missing file metadata.
        dropTolerance: 10800 # The tolerance duration in seconds for files that belong to a dropped entity.
      enableActiveStandby: false
      grpc:
        serverMaxSendSize: 536870912
        serverMaxRecvSize: 268435456
        clientMaxSendSize: 268435456
        clientMaxRecvSize: 536870912
    
    dataNode:
      dataSync:
        flowGraph:
          maxQueueLength: 16 # Maximum length of the task queue in the flow graph.
          maxParallelism: 1024 # Maximum number of tasks that can be run in parallel in the flow graph.
        maxParallelSyncMgrTasks: 256 # The maximum number of concurrent sync tasks for the DataNode sync manager globally.
        skipMode:
          # If only timetick messages are in the flow graph for a period longer than coldTime,
          # the flow graph enters skip mode to skip most timeticks. This reduces costs, especially when there are many channels.
          enable: true
          skipNum: 4
          coldTime: 60
      segment:
        insertBufSize: 16777216 # The maximum buffer size to flush for a single segment.
        deleteBufBytes: 67108864 # The maximum buffer size to flush deletions for a single channel.
        syncPeriod: 600 # The period in seconds to sync segments if the buffer is not empty.
      # You can specify an IP address. For example:
      # ip: 127.0.0.1
      grpc:
        serverMaxSendSize: 536870912
        serverMaxRecvSize: 268435456
        clientMaxSendSize: 268435456
        clientMaxRecvSize: 536870912
      memory:
        forceSyncEnable: true # `true`: Forces a sync if memory usage is too high.
        forceSyncSegmentNum: 1 # The number of segments to sync. Segments with the largest buffers are synced first.
        watermarkStandalone: 0.2 # The memory watermark for standalone mode. When this watermark is reached, segments are synced.
        watermarkCluster: 0.5 # The memory watermark for cluster mode. When this watermark is reached, segments are synced.
      timetick:
        byRPC: true
      channel:
        # Specifies the size of the global work pool for all channels.
        # If this parameter is less than or equal to 0, it is set to the maximum number of CPUs that can execute simultaneously.
        # Set a larger value for large numbers of collections to avoid blocking.
        workPoolSize: -1
        # Specifies the size of the global work pool for channel checkpoint updates.
        # If this parameter is less than or equal to 0, it is set to 1000.
        # Set a larger value for large numbers of collections to avoid blocking.
        updateChannelCheckpointMaxParallel: 1000
    
    grpc:
      client:
        compressionEnabled: false
        dialTimeout: 200
        keepAliveTime: 10000
        keepAliveTimeout: 20000
        maxAttempts: 10
        initialBackoff: 0.2 # seconds
        maxBackoff: 10 # seconds
        
    quotaAndLimits:
      enabled: true # `true`: Enables quotas and limits. `false`: Disables quotas and limits.
      limits:
        maxCollectionNum: 65536
        maxCollectionNumPerDB: 65536
      # The interval in seconds at which the quotaCenter
      # collects metrics from proxies, the query cluster, and the data cluster.
      # Range: 0 to 65536.
      quotaCenterCollectInterval: 3
      ddl:
        enabled: false
        collectionRate: -1 # The rate limit in queries per second (qps) for CreateCollection, DropCollection, LoadCollection, and ReleaseCollection. Default: no limit.
        partitionRate: -1 # The rate limit in qps for CreatePartition, DropPartition, LoadPartition, and ReleasePartition. Default: no limit.
      indexRate:
        enabled: false
        max: -1 # The rate limit in qps for CreateIndex and DropIndex. Default: no limit.
      flushRate:
        enabled: false
        max: -1 # The rate limit in qps for flush operations. Default: no limit.
      compactionRate:
        enabled: false
        max: -1 # The rate limit in qps for manual compaction. Default: no limit.
      dml:
        # DML rate limits. Default: no limit.
        # The rate does not exceed the max value.
        enabled: false
        insertRate:
          collection:
            max: -1 # The maximum rate in MB/s. Default: no limit.
          max: -1 # The maximum rate in MB/s. Default: no limit.
        upsertRate:
          collection:
            max: -1 # The maximum rate in MB/s. Default: no limit.
          max: -1 # The maximum rate in MB/s. Default: no limit.
        deleteRate:
          collection:
            max: -1 # The maximum rate in MB/s. Default: no limit.
          max: -1 # The maximum rate in MB/s. Default: no limit.
        bulkLoadRate:
          collection:
            max: -1 # The maximum rate in MB/s. Default: no limit. Rate limiting for bulkLoad is not supported yet.
          max: -1 # The maximum rate in MB/s. Default: no limit. Rate limiting for bulkLoad is not supported yet.
      dql:
        # DQL rate limits. Default: no limit.
        # The rate does not exceed the max value.
        enabled: false
        searchRate:
          collection:
            max: -1 # The maximum rate in vectors per second (vps). Default: no limit.
          max: -1 # The maximum rate in vps. Default: no limit.
        queryRate:
          collection:
            max: -1 # The maximum rate in qps. Default: no limit.
          max: -1 # The maximum rate in qps. Default: no limit.
      limitWriting:
        # forceDeny: false allows DML requests (except under specific
        # conditions, such as node memory reaching a watermark). forceDeny: true always rejects all DML requests.
        forceDeny: false
        ttProtection:
          enabled: false
          # Indicates the backpressure for DML operations.
          # DML rates are reduced based on the ratio of the time tick delay to maxTimeTickDelay.
          # If the time tick delay is greater than maxTimeTickDelay, all DML requests are rejected.
          # Unit: seconds.
          maxTimeTickDelay: 300
        memProtection:
          # If memory usage > memoryHighWaterLevel, all DML requests are rejected.
          # If memoryLowWaterLevel < memory usage < memoryHighWaterLevel, the DML rate is reduced.
          # If memory usage < memoryLowWaterLevel, no action is taken.
          enabled: true
          dataNodeMemoryLowWaterLevel: 0.85 # The memoryLowWaterLevel in DataNodes. Range: (0, 1].
          dataNodeMemoryHighWaterLevel: 0.95 # The memoryHighWaterLevel in DataNodes. Range: (0, 1].
          queryNodeMemoryLowWaterLevel: 0.85 # The memoryLowWaterLevel in QueryNodes. Range: (0, 1].
          queryNodeMemoryHighWaterLevel: 0.95 # The memoryHighWaterLevel in QueryNodes. Range: (0, 1].
        growingSegmentsSizeProtection:
          # No action is taken if the growing segment size is less than the low watermark.
          # When the growing segment size exceeds the low watermark, the DML rate is reduced,
          # but the rate will not be lower than minRateRatio × dmlRate.
          enabled: false
          minRateRatio: 0.5
          lowWaterLevel: 0.2
          highWaterLevel: 0.4
        diskProtection:
          enabled: true # If the total file size in object storage is greater than diskQuota, all DML requests are rejected.
          diskQuota: -1 # The disk quota in MB. Range: (0, +inf). Default: no limit.
          diskQuotaPerCollection: -1 # The disk quota per collection in MB. Range: (0, +inf). Default: no limit.
      limitReading:
        # forceDeny: false allows DQL requests (except for some
        # specific conditions, such as a collection being dropped). forceDeny: true always rejects all DQL requests.
        forceDeny: false
        queueProtection:
          enabled: false
          # Indicates that the system is under backpressure on the Search/Query path.
          # If the number of queries (NQ) in any QueryNode's queue is greater than nqInQueueThreshold, search and query rates gradually decrease
          # until the NQ in the queue no longer exceeds the threshold. The NQ of a query request is considered 1.
          # Type: integer. Default: no limit.
          nqInQueueThreshold: -1
          # Indicates that the system is under backpressure on the Search/Query path.
          # If the DQL queuing latency is greater than queueLatencyThreshold, search and query rates gradually decrease
          # until the queuing latency no longer exceeds the threshold.
          # The latency refers to the average latency over a period.
          # Unit: milliseconds. Default: no limit.
          queueLatencyThreshold: -1
        resultProtection:
          enabled: false
          # Indicates that the system is under backpressure on the Search/Query path.
          # If the DQL result rate is greater than maxReadResultRate, search and query rates gradually decrease
          # until the read result rate no longer exceeds the threshold.
          # Unit: MB/s. Default: no limit.
          maxReadResultRate: -1
        # The speed at which search and query rates decrease.
        # Range: (0, 1].
        coolOffSpeed: 0.9
  3. In the Note dialog, enter a reason for the change and click OK.

    If the modified parameters require a restart, the instance restarts automatically after you submit the changes. During this process, the instance status changes to Upgrading. After the configuration update is complete, the instance status returns to Running.
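
The grpc size limits and buffer sizes in the example configuration are plain integers with no unit comment. Assuming they are byte counts (this topic does not state the unit), the following Python sketch converts a few of them to MiB so that they are easier to compare. The helper name bytes_to_mib is hypothetical and is not part of any Milvus tool.

```python
def bytes_to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


# Values copied from the example configuration in this topic.
# Assumption: each value is a number of bytes.
sizes = {
    "rootCoord.grpc.serverMaxSendSize": 536870912,
    "rootCoord.grpc.serverMaxRecvSize": 268435456,
    "proxy.grpc.serverMaxRecvSize": 67108864,
    "dataNode.segment.insertBufSize": 16777216,
}

for key, value in sizes.items():
    print(f"{key}: {bytes_to_mib(value):g} MiB")
```

Under this assumption, the proxy accepts smaller inbound messages than the coordinators, which is worth keeping in mind before you raise only one side of a send/receive pair.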
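
The comments on queryNode.scheduler.maxReadConcurrentRatio describe the maximum read concurrency as runtime.NumCPU × maxReadConcurrentRatio, bounded to the range from 1 to runtime.NumCPU × 100. The following Python sketch works through that arithmetic. The function name max_read_concurrency is hypothetical, and truncating the product to an integer is an assumption, because the comments do not say how fractional results are rounded.

```python
import os
from typing import Optional


def max_read_concurrency(ratio: float, num_cpu: Optional[int] = None) -> int:
    """Estimate read concurrency as num_cpu * ratio, clamped to [1, num_cpu * 100]."""
    if not 0 < ratio <= 100:
        raise ValueError("maxReadConcurrentRatio must be in the range (0, 100]")
    if num_cpu is None:
        # Fall back to the CPU count of the machine that runs this script.
        num_cpu = os.cpu_count() or 1
    # Assumption: the product is truncated to an integer before clamping.
    estimate = int(num_cpu * ratio)
    return max(1, min(estimate, num_cpu * 100))


# With the ratio of 1 used in the example configuration, on an 8-CPU query node:
print(max_read_concurrency(1, num_cpu=8))
# With a ratio of 2 on the same node:
print(max_read_concurrency(2, num_cpu=8))
```

Pass the CPU count of the query node explicitly, because the machine on which you run the script is unlikely to match the node's specifications.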
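
The memProtection comments under quotaAndLimits.limitWriting describe three outcomes that depend on memory usage relative to the low and high water levels. The following Python sketch classifies a usage ratio into those outcomes. The function name dml_action and the returned labels are hypothetical. The comments do not say what happens when usage exactly equals a water level, so the handling of those boundary values here is an assumption, and the amount by which the DML rate is reduced is not modeled.

```python
def dml_action(memory_usage: float, low: float = 0.85, high: float = 0.95) -> str:
    """Classify a memory usage ratio against the low and high water levels."""
    if not 0 < low <= high <= 1:
        raise ValueError("water levels must satisfy 0 < low <= high <= 1")
    if memory_usage > high:
        return "reject"    # All DML requests are rejected.
    if memory_usage > low:
        return "throttle"  # The DML rate is reduced.
    return "allow"         # No action is taken.


# The defaults mirror the DataNode and QueryNode water levels in the example (0.85 and 0.95).
for usage in (0.50, 0.90, 0.97):
    print(usage, dml_action(usage))
```

A sketch like this can help you check a planned pair of water levels against the memory usage that you observe on your nodes before you save the change.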