OpenSearch: Configure an index table loading policy

Last Updated: Apr 01, 2026

An index table loading policy controls how each index file is loaded into memory when a Searcher worker loads an index table. The policy is an ordered list of rules: the system matches each index file against the rules from top to bottom and applies the first rule that matches.

How it works

The configuration is a load_config array. Each entry specifies:

  • Which files to target (file_patterns): a list of regular expressions matched against file paths relative to the segment directory

  • How to load them (load_strategy): mmap (memory-mapped files) or cache (block cache)

  • Where to read from (remote) and whether to copy locally (deploy)

  • Whether to prefetch (warmup_strategy): applies only to mmap

Sample configuration

{
    "load_config": [
        {
            "file_patterns": [
                "_ATTRIBUTE_",
                "/index/title/.*",
                "/index/body/dictionary"
            ],
            "load_strategy": "mmap",
            "lifecycle": "hot",
            "load_strategy_param": {
                "lock": true,
                "partial_lock": true,
                "advise_random": false,
                "slice": 4194304,
                "interval": 2
            },
            "remote": false,
            "deploy": true,
            "warmup_strategy": "sequential"
        },
        {
            "file_patterns": [
                "_SUMMARY_"
            ],
            "load_strategy": "cache",
            "load_strategy_param": {
                "global_cache": false,
                "direct_io": true,
                "cache_size": 4096
            },
            "remote": true,
            "deploy": false
        },
        {
            "file_patterns": [
                ".*"
            ],
            "warmup_strategy": "none",
            "load_strategy": "mmap",
            "load_strategy_param": {
                "lock": false
            }
        }
    ]
}

Parameters

file_patterns

A list of regular expressions matched against the file path relative to the segment directory. The system uses the first rule whose file_patterns matches the file.

To target the inverted index named title, use /index/title/.* — the title directory lives under the index directory, and .* matches all files in it.

Three built-in macros simplify common patterns:

| Macro | Equivalent pattern | Targets |
| --- | --- | --- |
| _ATTRIBUTE_ | /attribute/.* | All forward indexes |
| _INDEX_ | /index/.* | All inverted indexes |
| _SUMMARY_ | /summary/ | All summary indexes |
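
The first-match behavior and the macro expansion described above can be sketched in Python. This is an illustration, not the engine's implementation: the helper names (MACROS, expand, first_matching_rule) are hypothetical, and the sketch assumes that paths carry a leading slash and that patterns are applied with unanchored search semantics, which is what lets a pattern such as /summary/ match /summary/data.

```python
import re

# Macro table copied from the documentation above.
MACROS = {
    "_ATTRIBUTE_": "/attribute/.*",
    "_INDEX_": "/index/.*",
    "_SUMMARY_": "/summary/",
}


def expand(pattern):
    """Replace a built-in macro with its equivalent pattern (hypothetical helper)."""
    return MACROS.get(pattern, pattern)


def first_matching_rule(path, load_config):
    """Return the index of the first rule whose file_patterns match path, or None.

    Assumption: patterns use unanchored search semantics, and paths are
    relative to the segment directory with a leading slash.
    """
    for i, rule in enumerate(load_config):
        for pattern in rule["file_patterns"]:
            if re.search(expand(pattern), path):
                return i
    return None


# The three rules from the sample configuration, reduced to their patterns.
load_config = [
    {"file_patterns": ["_ATTRIBUTE_", "/index/title/.*", "/index/body/dictionary"]},
    {"file_patterns": ["_SUMMARY_"]},
    {"file_patterns": [".*"]},
]

for path in ["/attribute/price/data", "/index/body/dictionary",
             "/index/body/posting", "/summary/data"]:
    print(path, "-> rule", first_matching_rule(path, load_config))
```

Under these assumptions, /index/body/posting matches neither of the first two rules and falls through to the catch-all rule, which is why a final ".*" rule is useful.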

load_strategy

Valid values: mmap, cache.

load_strategy_param — mmap parameters

| Parameter | Default | Description |
| --- | --- | --- |
| lock | false | Pins matched indexes in memory so they cannot be swapped out. Ensures consistent query latency but increases memory usage. |
| partial_lock | false | For inverted indexes only: locks the first-level dictionary in memory but leaves the second-level dictionary unpinned. Reduces memory overhead compared with a full lock while protecting the most frequently accessed structure. |
| advise_random | false | Signals the OS to minimize read-ahead requests. Set to true when indexes are larger than available memory and disk I/O is a query bottleneck; this significantly reduces unnecessary prefetching from disk. |
| slice | 4194304 (4 MB) | Bytes read per prefetching step. Used together with interval. |
| interval | 0 (no throttling) | Milliseconds the system sleeps between prefetching steps. Set together with slice to control the prefetching rate and avoid saturating disk I/O during index load. |
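
To get a feel for how slice and interval interact, the arithmetic can be sketched as follows. The function names are hypothetical, and the model is an assumption: the loader reads one slice, sleeps for interval milliseconds, and repeats. Time spent in the reads themselves is ignored, so the rate is an upper bound, not a measurement.

```python
def warmup_estimate(file_size_bytes, slice_bytes=4194304, interval_ms=0):
    """Return (steps, total_sleep_seconds) for a throttled sequential prefetch.

    Assumption: one sleep of interval_ms follows each slice that is read.
    """
    steps = -(-file_size_bytes // slice_bytes)  # ceiling division
    total_sleep_s = steps * interval_ms / 1000.0
    return steps, total_sleep_s


def max_rate_mb_per_s(slice_bytes, interval_ms):
    """Upper bound on prefetch throughput imposed by the sleep alone."""
    if interval_ms == 0:
        return float("inf")  # interval 0 means no throttling
    return slice_bytes * 1000 / (interval_ms * 1024 * 1024)


# A 10 GiB index file with the values used in the sample configuration.
steps, sleep_s = warmup_estimate(10 * 1024**3, slice_bytes=4194304, interval_ms=2)
print("steps:", steps, "sleep seconds:", sleep_s)
print("rate cap (MB/s):", max_rate_mb_per_s(4194304, 2))
```

Raising interval or lowering slice reduces the cap and lengthens the warmup, which is the trade-off to tune when index loading competes with query traffic for disk I/O.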

load_strategy_param — cache parameters

| Parameter | Default | Description |
| --- | --- | --- |
| direct_io | false | Reads files using Direct I/O, bypassing the OS page cache. Improves query performance when reading from SSDs. |
| global_cache | false | Enables the global block cache. Keep this set to false because the global block cache is not available. |
| cache_size | 1 MB | Size of the per-rule block cache, specified in MB. Takes effect only when global_cache is false. |
| block_size | 4096 bytes | Size of a single cache block. |
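
The relationship between cache_size and block_size can be checked with a short calculation. This sketch assumes cache_size is expressed in MB (1 MB = 1024 * 1024 bytes) and block_size in bytes, matching the defaults listed above; cache_blocks is a hypothetical helper name.

```python
def cache_blocks(cache_size_mb=1, block_size_bytes=4096):
    """Number of whole blocks that fit in a block cache of the given size."""
    return (cache_size_mb * 1024 * 1024) // block_size_bytes


print("default:", cache_blocks())        # 1 MB cache, 4096-byte blocks
print("4096 MB:", cache_blocks(4096))    # value used for the summary rule
print("20480 MB:", cache_blocks(20480))  # value used for the attribute rule
```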

remote

Specifies whether matching index files are read from the remote distributed storage system instead of local disk. Takes effect only when need_read_remote_index is set to true. If need_read_remote_index is false, remote is forced to false regardless of this setting.

Valid values: true, false.

deploy

Specifies whether matching index files are distributed to local disks. Takes effect only when need_deploy_index is set to true. If need_deploy_index is false, deploy is forced to false regardless of this setting.

Valid values: true, false.
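
The way need_read_remote_index and need_deploy_index gate the per-rule remote and deploy values can be expressed as a small function. This is a sketch: effective_flags is a hypothetical name, and the rule is expected to set both fields explicitly because this page does not state their defaults.

```python
def effective_flags(rule, need_read_remote_index, need_deploy_index):
    """Return the remote/deploy values that actually take effect for a rule.

    A per-rule flag is honored only when its gating parameter is true;
    otherwise it is forced to False.
    """
    return {
        "remote": bool(rule["remote"]) and need_read_remote_index,
        "deploy": bool(rule["deploy"]) and need_deploy_index,
    }


summary_rule = {"file_patterns": ["_SUMMARY_"], "remote": True, "deploy": False}

# With remote reads enabled, the rule's remote flag is honored.
print(effective_flags(summary_rule, need_read_remote_index=True, need_deploy_index=True))
# With remote reads disabled, remote is forced to False.
print(effective_flags(summary_rule, need_read_remote_index=False, need_deploy_index=True))
```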

warmup_strategy

Controls prefetching when the index is loaded. Applies only when load_strategy is mmap.

| Value | Behavior |
| --- | --- |
| none (default) | No prefetching. The OS loads pages on demand as queries access them. |
| sequential | Prefetches data in sequence. |
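
Because warmup_strategy is meaningful only for mmap, a configuration can be checked before deployment. The following is a minimal sketch of such a check. lint_rule is a hypothetical helper, not part of OpenSearch, and it validates only the values listed on this page.

```python
VALID_LOAD_STRATEGIES = {"mmap", "cache"}
VALID_WARMUP_STRATEGIES = {"none", "sequential"}


def lint_rule(rule):
    """Return a list of human-readable problems found in one load_config rule."""
    problems = []
    strategy = rule.get("load_strategy")
    warmup = rule.get("warmup_strategy", "none")  # "none" is the documented default

    if strategy is not None and strategy not in VALID_LOAD_STRATEGIES:
        problems.append(f"unknown load_strategy: {strategy!r}")
    if warmup not in VALID_WARMUP_STRATEGIES:
        problems.append(f"unknown warmup_strategy: {warmup!r}")
    elif warmup != "none" and strategy == "cache":
        problems.append("warmup_strategy applies only when load_strategy is mmap")
    return problems


print(lint_rule({"load_strategy": "mmap", "warmup_strategy": "sequential"}))
print(lint_rule({"load_strategy": "cache", "warmup_strategy": "sequential"}))
```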

Examples

mmap loading policy

{
    "load_config": [
        {
            "file_patterns": [
                "/attribute/price/.*",
                "/index/title/.*",
                "/index/body/dictionary",
                "/index/vector/aitheta.*"
            ],
            "load_strategy": "mmap",
            "load_strategy_param": {
                "lock": true,
                "partial_lock": true,
                "slice": 4194304,
                "interval": 2
            },
            "remote": false,
            "deploy": true,
            "warmup_strategy": "sequential"
        },
        {
            "file_patterns": [
                "/attribute/tags",
                "/index/description/.*"
            ],
            "load_strategy": "mmap",
            "load_strategy_param": {
                "lock": false
            },
            "remote": false,
            "deploy": true,
            "warmup_strategy": "none"
        }
    ]
}

This configuration applies two rules:

  • Rule 1: The price attribute, title inverted index, body dictionary, and vector index are locked in memory and prefetched sequentially at load time. The partial_lock setting locks only the first-level dictionary of the inverted indexes, saving memory compared to a full lock. The slice and interval settings throttle prefetching to 4 MB every 2 ms.

  • Rule 2: The tags attribute and description inverted index use mmap without locking or prefetching — they are paged in on demand.

cache loading policy

{
    "load_config": [
        {
            "file_patterns": [
                "_ATTRIBUTE_"
            ],
            "load_strategy": "cache",
            "load_strategy_param": {
                "global_cache": false,
                "direct_io": true,
                "cache_size": 20480
            },
            "remote": false,
            "deploy": true
        },
        {
            "file_patterns": [
                "/summary/data"
            ],
            "load_strategy": "cache",
            "load_strategy_param": {
                "global_cache": false,
                "direct_io": true,
                "cache_size": 4096
            },
            "remote": false,
            "deploy": true
        },
        {
            "file_patterns": [
                ".*"
            ],
            "warmup_strategy": "none",
            "load_strategy": "mmap",
            "load_strategy_param": {
                "lock": false
            }
        }
    ]
}

All attribute fields use a 20 GB block cache (cache_size 20480, in MB) with Direct I/O. Summary data files use a separate 4 GB block cache (cache_size 4096). Everything else falls through to the catch-all mmap rule.

Storage-computing separation

To enable storage-computing separation, set the need_read_remote_index parameter to true. Index files with remote: true are read directly from the remote distributed storage system; files with deploy: false are not copied to local disk.

{
    "load_config": [
        {
            "file_patterns": [
                "/index/title/.*"
            ],
            "load_strategy": "mmap",
            "load_strategy_param": {
                "lock": true,
                "partial_lock": true,
                "slice": 4194304,
                "interval": 2
            },
            "remote": false,
            "deploy": true,
            "warmup_strategy": "sequential"
        },
        {
            "file_patterns": [
                "_ATTRIBUTE_"
            ],
            "load_strategy": "cache",
            "load_strategy_param": {
                "global_cache": false,
                "direct_io": true,
                "cache_size": 20480
            },
            "remote": true,
            "deploy": false
        },
        {
            "file_patterns": [
                "/summary/data"
            ],
            "load_strategy": "cache",
            "load_strategy_param": {
                "global_cache": false,
                "direct_io": true,
                "cache_size": 4096
            },
            "remote": true,
            "deploy": false
        },
        {
            "file_patterns": [
                ".*"
            ],
            "warmup_strategy": "none",
            "load_strategy": "mmap",
            "load_strategy_param": {
                "lock": false
            }
        }
    ]
}

The title inverted index is deployed locally and locked in memory — it stays on the local disk for low-latency access. Attribute and summary files are read from remote storage on demand with a block cache, keeping local disk usage minimal.

Index file directory structure

Understanding the directory structure helps you write accurate file_patterns expressions. All paths in file_patterns are relative to the segment directory (for example, segment_0).

generation_0/
  partition_0_65535/
    index_format_version
    index_partition_meta
    schema.json
    segment_0/
      attribute/
        <attribute_name>/
          data
      deletionmap/
      deploy_index/
      index/
        <index_name>/
          bitmap_dictionary
          bitmap_posting
          dictionary
          posting
        <vector_index_name>/
          aitheta.index
          aitheta.index.addr
      summary/
        data
        offset
      segment_info
    adaptive_bitmap_meta/
      deploy_index
      dictionary_name
    truncate_meta/
      deploy_index
      truncate_meta_file
    version.0

| Item | Description |
| --- | --- |
| generation | Identifies the version of a full index in OpenSearch Retrieval Engine Edition. |
| partition | The basic unit for a Searcher worker to load indexes. Splitting large datasets across multiple partitions keeps each Searcher worker's load manageable. |
| segment | The basic unit of an index. Stores inverted and forward index data. The builder creates a segment for each index dump; segments can be merged according to the merge policy. |
| index | The basic unit of an inverted index. |
| attribute | The basic unit of a forward index. |
| deletionmap | Tracks deleted documents. |
| index_format_version | The index version, used to verify binary compatibility. |
| index_partition_meta | Global sorting information for the index, including sort fields and sort order (ascending or descending). |
| schema.json | The index configuration file. Contains field, index, attribute, and summary definitions. OpenSearch Retrieval Engine Edition reads this file when loading indexes. |
| version.0 | The index file version number. Lists the segments to load and the timestamp of the most recent document in the partition. Used to filter out outdated documents when building indexes for real-time data. |
| segment_info | Summary information for a segment: document count, merge status, locator, and the timestamp of the most recent document. |
| dictionary | The dictionary of an inverted index. |
| posting | The posting lists of an inverted index. |
| bitmap_dictionary | The dictionary of high-frequency words, present when a bitmap index is created for high-frequency words. |
| bitmap_posting | The posting lists of high-frequency words, present when a bitmap index is created for high-frequency words. |
| aitheta.index | Vector index files. |
| aitheta.index.addr | Metadata for vector indexes. |
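
When drafting file_patterns, it can help to list the files of a segment in the same relative form the patterns are written against. The sketch below does this for a throwaway directory built on the spot. segment_relative_paths is a hypothetical helper, and the leading-slash form is an assumption based on the patterns shown on this page.

```python
import tempfile
from pathlib import Path


def segment_relative_paths(segment_dir):
    """Return sorted '/'-prefixed file paths relative to the segment directory."""
    root = Path(segment_dir)
    return sorted(
        "/" + p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
    )


# Build a tiny throwaway segment layout to demonstrate the output format.
with tempfile.TemporaryDirectory() as tmp:
    segment = Path(tmp) / "segment_0"
    for rel in ["attribute/price/data", "index/title/dictionary",
                "index/title/posting", "summary/data", "segment_info"]:
        f = segment / rel
        f.parent.mkdir(parents=True, exist_ok=True)
        f.touch()
    for path in segment_relative_paths(segment):
        print(path)
```

Each printed path can then be tested against a candidate regular expression before the rule is added to load_config.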