You can view and modify instance configurations in the Vector Retrieval Service for Milvus console to meet different business requirements.
Procedure
-
Go to the instance configuration page.
-
Log on to the Alibaba Cloud Milvus console.
-
In the left navigation pane, click Instances.
-
In the top menu bar, select a region.
-
On the Instances page, click the name of your Milvus instance.
-
Select the Instance Configuration tab.
-
In the Instance Configuration text box, copy and paste the following code to overwrite the default configurations, and then click Save Configurations.
The code must be in YAML format. Example:
# Configurations for rootCoord, which handles DDL and DCL requests.
rootCoord:
  maxDatabaseNum: 64 # Maximum number of databases.
  maxPartitionNum: 4096 # Maximum number of partitions in a collection.
  minSegmentSizeToEnableIndex: 1024 # If a segment is smaller than this value, it is not indexed.
  importTaskExpiration: 900 # The duration in seconds after which an import task expires and is terminated. Default value: 900 (15 minutes).
  importTaskRetention: 86400 # The minimum duration in seconds that Milvus retains import task records. Default value: 86400 (24 hours).
  grpc:
    serverMaxSendSize: 536870912
    serverMaxRecvSize: 268435456
    clientMaxSendSize: 268435456
    clientMaxRecvSize: 536870912

# Configurations for the proxy, which validates client requests and reduces the returned results.
proxy:
  timeTickInterval: 200 # The interval in milliseconds at which the proxy syncs the time tick.
  healthCheckTimeout: 3000 # The interval in milliseconds for component health checks.
  maxNameLength: 255 # Maximum length of a collection name or alias.
  # Max number of fields in a collection.
  # As of Milvus 2.2.0, don't set maxFieldNum to 64 or a greater value.
  # Adjust this parameter at your own risk.
  maxFieldNum: 64
  maxTaskNum: 1024 # The maximum number of tasks in the proxy task queue.
  grpc:
    serverMaxSendSize: 268435456
    serverMaxRecvSize: 67108864
    clientMaxSendSize: 268435456
    clientMaxRecvSize: 67108864

# Configurations for queryCoord, which manages topology and load balancing for query nodes, and handles handoffs from growing segments to sealed segments.
queryCoord:
  autoHandoff: true # Enables or disables automatic handoff.
  autoBalance: true # Enables or disables automatic balancing.
  balancer: ScoreBasedBalancer # The balancer to use.
  overloadedMemoryThresholdPercentage: 90 # The memory overload threshold, as a percentage.
  balanceIntervalSeconds: 60
  memoryUsageMaxDifferencePercentage: 30
  checkInterval: 1000
  channelTaskTimeout: 60000 # 1 minute
  segmentTaskTimeout: 120000 # 2 minutes
  distPullInterval: 500
  heartbeatAvailableInterval: 10000 # The interval in milliseconds (10 seconds). Only Query Nodes that fetch heartbeats within this interval are considered available.
  loadTimeoutSeconds: 600
  checkHandoffInterval: 5000
  grpc:
    serverMaxSendSize: 536870912
    serverMaxRecvSize: 268435456
    clientMaxSendSize: 268435456
    clientMaxRecvSize: 536870912

# Configurations for queryNode, which runs hybrid searches between vector and scalar data.
queryNode:
  dataSync:
    flowGraph:
      maxQueueLength: 16 # Maximum length of the task queue in the flow graph.
      maxParallelism: 1024 # Maximum number of parallel tasks in the flow graph.
  stats:
    publishInterval: 1000 # The interval in milliseconds at which a Query Node reports node information.
  segcore:
    cgoPoolSizeRatio: 2.0 # The ratio of the cgo pool size to the maximum read concurrency.
    knowhereThreadPoolNumRatio: 4 # Use more threads to improve SSD throughput for on-disk indexes.
                                  # This parameter takes effect only when enable-disk is set to true.
                                  # This value must be greater than 1 and less than 32.
    chunkRows: 128 # The number of vectors in a chunk.
    exprEvalBatchSize: 8192 # The batch size for the executor to get the next batch.
    interimIndex: # Builds an interim vector index for a growing segment or binary log to accelerate searches.
      enableIndex: true
      nlist: 128 # The nlist value for the segment index.
      nprobe: 16 # The nprobe value for searching a segment. This value must be smaller than nlist and depends on your accuracy requirements.
      memExpansionRate: 1.15 # The ratio of memory usage for building the interim index to the raw data size.
  loadMemoryUsageFactor: 1 # The multiplication factor for calculating memory usage when loading segments.
  enableDisk: false # Specifies whether to enable the query node to load disk indexes and search on them.
  maxDiskUsagePercentage: 95
  grouping:
    enabled: true
    maxNQ: 1000
    topKMergeRatio: 20
  scheduler:
    receiveChanSize: 10240
    unsolvedQueueSize: 10240
    # The concurrency ratio for read tasks, such as search and query tasks.
    # The maximum read concurrency is calculated as runtime.NumCPU × maxReadConcurrentRatio.
    # The default value is 2, which means the maximum read concurrency is runtime.NumCPU × 2.
    # The maximum read concurrency must be greater than or equal to 1 and less than or equal to runtime.NumCPU × 100.
    # Range: (0, 100].
    maxReadConcurrentRatio: 1
    cpuRatio: 10 # The ratio used to estimate CPU usage for read tasks.
    maxTimestampLag: 86400
    # The scheduling policy for read tasks. Default value: fifo. Optional value: user-task-polling.
    scheduleReadPolicy:
      # fifo: A First-In, First-Out (FIFO) queue supports the schedule.
      # user-task-polling:
      #   User tasks are polled and scheduled one by one.
      #   Scheduling is fair at the task level.
      #   The policy is based on the username for authentication.
      #   An empty username is treated as the same user.
      #   If there is only one user, the policy behaves like FIFO.
      name: fifo
      maxPendingTask: 10240
      # user-task-polling configurations:
      taskQueueExpire: 60 # The expiration time in seconds for the inner user task queue after it becomes empty. Default value: 60 (1 minute).
      enableCrossUserGrouping: false # Enables or disables cross-user grouping when using the user-task-polling policy. Default value: false. Set this to false if a user's tasks cannot be merged with others.
      maxPendingTaskPerUser: 1024 # The maximum number of pending tasks per user in the scheduler. Default value: 50.
  grpc:
    serverMaxSendSize: 536870912
    serverMaxRecvSize: 268435456
    clientMaxSendSize: 268435456
    clientMaxRecvSize: 536870912

indexCoord:
  bindIndexNodeMode:
    enable: false
    withCred: false
  segment:
    minSegmentNumRowsToEnableIndex: 1024 # The minimum threshold. If the number of rows in a segment is less than this value, the segment is not indexed.

indexNode:
  scheduler:
    buildParallel: 1
  enableDisk: true # Enables or disables the index node to build disk vector indexes.
  maxDiskUsagePercentage: 95
  grpc:
    serverMaxSendSize: 536870912
    serverMaxRecvSize: 268435456
    clientMaxSendSize: 268435456
    clientMaxRecvSize: 536870912

dataCoord:
  channel:
    watchTimeoutInterval: 300 # The timeout in seconds for watching channels. A DataNode tickler update of the watch progress resets the timeout timer.
    balanceSilentDuration: 300 # The duration in seconds before the channelBalancer on the dataCoord runs.
    balanceInterval: 360 # The interval in seconds at which the channelBalancer on the dataCoord checks the balance status.
  segment:
    maxSize: 1024 # The maximum size of a segment, in MB.
    diskSegmentMaxSize: 2048 # The maximum size in MB of a segment for a collection that has an on-disk index.
    sealProportion: 0.12
    # The assignment expiration time, in milliseconds.
    # Warning: this is an expert parameter and is closely related to data integrity. Do not change it without a
    # specific goal and a solid understanding of the scenarios. If you must alter
    # this parameter, make sure that the new value is larger than the previous value used before the restart.
    # Otherwise, there is a high risk of data loss.
    assignmentExpiration: 2000
    maxLife: 86400 # The maximum lifetime of a segment, in seconds (24 × 60 × 60).
    # If a segment does not accept DML records within maxIdleTime and its size is greater than
    # minSizeFromIdleToSealed, Milvus automatically seals it.
    # The maximum idle time of a segment, in seconds (10 × 60).
    maxIdleTime: 600
    minSizeFromIdleToSealed: 16 # The minimum size in MB for a segment to be sealed after being idle.
    # The maximum number of binary log files for one segment. The segment is sealed if
    # the number of binary log files reaches this value.
    maxBinlogFileNumber: 32
    # A segment is considered a "small segment" when its number of rows is smaller than
    # (smallProportion × segment max # of rows).
    smallProportion: 0.5
    # Compaction occurs on small segments if the segment after compaction has
    # more than (compactableProportion × segment max # of rows) rows.
    # This value must be greater than or equal to <smallProportion>.
    compactableProportion: 0.85
    # During compaction, the number of rows in a segment can exceed the maximum number of rows by (expansionRate - 1) × 100%.
    expansionRate: 1.25
    # Enables or disables level-zero segments.
    enableLevelZero: false
  enableCompaction: true # Enables or disables data segment compaction.
  compaction:
    enableAutoCompaction: true
    rpcTimeout: 10 # The timeout for compaction RPC requests, in seconds.
    maxParallelTaskNum: 10 # The maximum number of parallel compaction tasks.
    indexBasedCompaction: true
    levelzero:
      forceTrigger:
        minSize: 8 # The minimum size in MB to force trigger a LevelZero compaction.
        deltalogMinNum: 10 # The minimum number of deltalog files to force trigger a LevelZero compaction.
  enableGarbageCollection: true
  gc:
    interval: 3600 # The garbage collection (GC) interval, in seconds.
    missingTolerance: 3600 # The tolerance duration in seconds for missing file metadata.
    dropTolerance: 10800 # The tolerance duration in seconds for files that belong to a dropped entity.
  enableActiveStandby: false
  grpc:
    serverMaxSendSize: 536870912
    serverMaxRecvSize: 268435456
    clientMaxSendSize: 268435456
    clientMaxRecvSize: 536870912

dataNode:
  dataSync:
    flowGraph:
      maxQueueLength: 16 # Maximum length of the task queue in the flow graph.
      maxParallelism: 1024 # Maximum number of tasks that can be run in parallel in the flow graph.
    maxParallelSyncMgrTasks: 256 # The maximum number of concurrent sync tasks for the DataNode sync manager globally.
    skipMode:
      # If only timetick messages are in the flow graph for a period longer than coldTime,
      # the flow graph enters skip mode to skip most timeticks. This reduces costs, especially when there are many channels.
      enable: true
      skipNum: 4
      coldTime: 60
  segment:
    insertBufSize: 16777216 # The maximum buffer size to flush for a single segment.
    deleteBufBytes: 67108864 # The maximum buffer size to flush deletions for a single channel.
    syncPeriod: 600 # The period in seconds to sync segments if the buffer is not empty.
  # You can specify an IP address. For example:
  # ip: 127.0.0.1
  grpc:
    serverMaxSendSize: 536870912
    serverMaxRecvSize: 268435456
    clientMaxSendSize: 268435456
    clientMaxRecvSize: 536870912
  memory:
    forceSyncEnable: true # `true`: Forces a sync if memory usage is too high.
    forceSyncSegmentNum: 1 # The number of segments to sync. Segments with the largest buffers are synced first.
    watermarkStandalone: 0.2 # The memory watermark for standalone mode. When this watermark is reached, segments are synced.
    watermarkCluster: 0.5 # The memory watermark for cluster mode. When this watermark is reached, segments are synced.
  timetick:
    byRPC: true
  channel:
    # Specifies the size of the global work pool for all channels.
    # If this parameter is less than or equal to 0, it is set to the maximum number of CPUs that can be executing.
    # Set a larger value for large numbers of collections to avoid blocking.
    workPoolSize: -1
    # Specifies the size of the global work pool for channel checkpoint updates.
    # If this parameter is less than or equal to 0, it is set to 1000.
    # Set a larger value for large numbers of collections to avoid blocking.
    updateChannelCheckpointMaxParallel: 1000

grpc:
  client:
    compressionEnabled: false
    dialTimeout: 200
    keepAliveTime: 10000
    keepAliveTimeout: 20000
    maxAttempts: 10
    initialBackoff: 0.2 # seconds
    maxBackoff: 10 # seconds

quotaAndLimits:
  enabled: true # `true`: Enables quotas and limits. `false`: Disables quotas and limits.
  limits:
    maxCollectionNum: 65536
    maxCollectionNumPerDB: 65536
  # The interval in seconds at which the quotaCenter
  # collects metrics from proxies, the query cluster, and the data cluster.
  # Range: 0 to 65536.
  quotaCenterCollectInterval: 3
  ddl:
    enabled: false
    collectionRate: -1 # The rate limit in queries per second (qps) for CreateCollection, DropCollection, LoadCollection, and ReleaseCollection. Default: no limit.
    partitionRate: -1 # The rate limit in qps for CreatePartition, DropPartition, LoadPartition, and ReleasePartition. Default: no limit.
  indexRate:
    enabled: false
    max: -1 # The rate limit in qps for CreateIndex and DropIndex. Default: no limit.
  flushRate:
    enabled: false
    max: -1 # The rate limit in qps for flush operations. Default: no limit.
  compactionRate:
    enabled: false
    max: -1 # The rate limit in qps for manual compaction. Default: no limit.
  dml:
    # DML rate limits. Default: no limit.
    # The rate does not exceed the max value.
    enabled: false
    insertRate:
      collection:
        max: -1 # The maximum rate in MB/s. Default: no limit.
      max: -1 # The maximum rate in MB/s. Default: no limit.
    upsertRate:
      collection:
        max: -1 # The maximum rate in MB/s. Default: no limit.
      max: -1 # The maximum rate in MB/s. Default: no limit.
    deleteRate:
      collection:
        max: -1 # The maximum rate in MB/s. Default: no limit.
      max: -1 # The maximum rate in MB/s. Default: no limit.
    bulkLoadRate:
      collection:
        max: -1 # The maximum rate in MB/s. Default: no limit. Not supported yet. TODO: Limit the bulkLoad rate.
      max: -1 # The maximum rate in MB/s. Default: no limit. Not supported yet. TODO: Limit the bulkLoad rate.
  dql:
    # DQL rate limits. Default: no limit.
    # The rate does not exceed the max value.
    enabled: false
    searchRate:
      collection:
        max: -1 # The maximum rate in vectors per second (vps). Default: no limit.
      max: -1 # The maximum rate in vps. Default: no limit.
    queryRate:
      collection:
        max: -1 # The maximum rate in qps. Default: no limit.
      max: -1 # The maximum rate in qps. Default: no limit.
  limitWriting:
    # forceDeny: false allows DML requests (except under specific
    # conditions, such as node memory reaching a watermark). forceDeny: true always rejects all DML requests.
    forceDeny: false
    ttProtection:
      enabled: false
      # Indicates the backpressure for DML operations.
      # DML rates are reduced based on the ratio of the time tick delay to maxTimeTickDelay.
      # If the time tick delay is greater than maxTimeTickDelay, all DML requests are rejected.
      # Unit: seconds.
      maxTimeTickDelay: 300
    memProtection:
      # If memory usage > memoryHighWaterLevel, all DML requests are rejected.
      # If memoryLowWaterLevel < memory usage < memoryHighWaterLevel, the DML rate is reduced.
      # If memory usage < memoryLowWaterLevel, no action is taken.
      enabled: true
      dataNodeMemoryLowWaterLevel: 0.85 # The memoryLowWaterLevel in DataNodes. Range: (0, 1].
      dataNodeMemoryHighWaterLevel: 0.95 # The memoryHighWaterLevel in DataNodes. Range: (0, 1].
      queryNodeMemoryLowWaterLevel: 0.85 # The memoryLowWaterLevel in QueryNodes. Range: (0, 1].
      queryNodeMemoryHighWaterLevel: 0.95 # The memoryHighWaterLevel in QueryNodes. Range: (0, 1].
    growingSegmentsSizeProtection:
      # No action is taken if the growing segment size is less than the low watermark.
      # When the growing segment size exceeds the low watermark, the DML rate is reduced,
      # but the rate will not be lower than minRateRatio × dmlRate.
      enabled: false
      minRateRatio: 0.5
      lowWaterLevel: 0.2
      highWaterLevel: 0.4
    diskProtection:
      enabled: true # If the total file size in object storage is greater than diskQuota, all DML requests are rejected.
      diskQuota: -1 # The disk quota in MB. Range: (0, +inf). Default: no limit.
      diskQuotaPerCollection: -1 # The disk quota per collection in MB. Range: (0, +inf). Default: no limit.
  limitReading:
    # forceDeny: false allows DQL requests (except for some
    # specific conditions, such as a collection being dropped). forceDeny: true always rejects all DQL requests.
    forceDeny: false
    queueProtection:
      enabled: false
      # Indicates that the system is under backpressure on the Search/Query path.
      # If the number of queries (NQ) in any QueryNode's queue is greater than nqInQueueThreshold, search and query rates gradually decrease
      # until the NQ in the queue no longer exceeds the threshold. The NQ of a query request is considered 1.
      # Type: integer. Default: no limit.
      nqInQueueThreshold: -1
      # Indicates that the system is under backpressure on the Search/Query path.
      # If the DQL queuing latency is greater than queueLatencyThreshold, search and query rates gradually decrease
      # until the queuing latency no longer exceeds the threshold.
      # The latency refers to the average latency over a period.
      # Unit: milliseconds. Default: no limit.
      queueLatencyThreshold: -1
    resultProtection:
      enabled: false
      # Indicates that the system is under backpressure on the Search/Query path.
      # If the DQL result rate is greater than maxReadResultRate, search and query rates gradually decrease
      # until the read result rate no longer exceeds the threshold.
      # Unit: MB/s. Default: no limit.
      maxReadResultRate: -1
    # The speed at which search and query rates decrease.
    # Range: (0, 1].
    coolOffSpeed: 0.9
-
In the Note dialog, enter a reason for the change and click OK.
If the modified parameters require a restart, the instance restarts automatically after you submit the changes. During the restart, the instance status changes to Upgrading. After the update is complete, the instance status returns to Running.
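
The comments in the example configuration state several ranges and ordering rules, for example that nprobe must be smaller than nlist and that compactableProportion must be greater than or equal to smallProportion. Because some changes trigger an automatic restart, it can help to check such rules before you save. The following Python sketch is one possible way to do that. It is not part of the console or of Milvus: the helper names `get` and `check_config` are hypothetical, the configuration is assumed to be already parsed into nested dictionaries by a YAML parser of your choice, and the rule that a low water level must be below its high water level is inferred from the three-case description in the comments rather than stated as a rule.

```python
from typing import Any, Callable, Dict, List


def get(cfg: Dict[str, Any], path: str) -> Any:
    """Return the value at a dotted path, or None if any key is missing."""
    node: Any = cfg
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def check_config(cfg: Dict[str, Any]) -> List[str]:
    """Collect violations of rules taken from the example's comments.

    Keys that are absent from cfg are skipped, so a partial config can be checked.
    """
    problems: List[str] = []

    def one(path: str, ok: Callable[[Any], bool], rule: str) -> None:
        value = get(cfg, path)
        if value is not None and not ok(value):
            problems.append(f"{path}={value}: {rule}")

    def pair(a: str, b: str, ok: Callable[[Any, Any], bool], rule: str) -> None:
        va, vb = get(cfg, a), get(cfg, b)
        if va is not None and vb is not None and not ok(va, vb):
            problems.append(f"{a}={va}, {b}={vb}: {rule}")

    seg = "queryNode.segcore."
    one(seg + "knowhereThreadPoolNumRatio", lambda v: 1 < v < 32,
        "must be greater than 1 and less than 32")
    pair(seg + "interimIndex.nprobe", seg + "interimIndex.nlist",
         lambda p, n: p < n, "nprobe must be smaller than nlist")
    one("queryNode.scheduler.maxReadConcurrentRatio", lambda v: 0 < v <= 100,
        "must be in (0, 100]")
    pair("dataCoord.segment.compactableProportion", "dataCoord.segment.smallProportion",
         lambda c, s: c >= s, "must be >= smallProportion")

    mem = "quotaAndLimits.limitWriting.memProtection."
    for node in ("dataNode", "queryNode"):
        low = f"{mem}{node}MemoryLowWaterLevel"
        high = f"{mem}{node}MemoryHighWaterLevel"
        one(low, lambda v: 0 < v <= 1, "must be in (0, 1]")
        one(high, lambda v: 0 < v <= 1, "must be in (0, 1]")
        # Assumption: inferred from the comments, not stated as an explicit rule.
        pair(low, high, lambda lo, hi: lo < hi, "low level should be below high level")

    one("quotaAndLimits.limitReading.coolOffSpeed", lambda v: 0 < v <= 1,
        "must be in (0, 1]")
    return problems


# Values copied from the example configuration above.
sample = {
    "queryNode": {
        "segcore": {
            "knowhereThreadPoolNumRatio": 4,
            "interimIndex": {"nlist": 128, "nprobe": 16},
        },
        "scheduler": {"maxReadConcurrentRatio": 1},
    },
    "dataCoord": {"segment": {"smallProportion": 0.5, "compactableProportion": 0.85}},
    "quotaAndLimits": {
        "limitWriting": {
            "memProtection": {
                "dataNodeMemoryLowWaterLevel": 0.85,
                "dataNodeMemoryHighWaterLevel": 0.95,
                "queryNodeMemoryLowWaterLevel": 0.85,
                "queryNodeMemoryHighWaterLevel": 0.95,
            }
        },
        "limitReading": {"coolOffSpeed": 0.9},
    },
}

for problem in check_config(sample):
    print(problem)
```

The checker only covers rules that the example's comments spell out. Whether the service itself enforces them, or rejects a configuration that violates them, is not something this page specifies.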