Index structure and usage - OpenSearch - Alibaba Cloud Documentation Center

Index table structure

Each document contains multiple fields, and each field holds a collection of terms. Indexes accelerate searches.

Field: Defines the name and type of a field in an index table.
Inverted index: Quickly locates documents containing specific keywords.
Forward index (attribute): Stores mappings from document IDs to field values, such as DocID --> (term1, term2, ..., termN). Forward indexes can be single-value or multi-value. Single-value attributes have a fixed length (except for the STRING type), which allows high-performance lookups and supports updates. Multi-value attributes store a variable number of items for a field, so lookups are slower than for single-value attributes and updates are not supported.

Forward indexes are typically used after a document is found to quickly retrieve its attribute values for statistics, sorting, and filtering. The basic field types supported for forward indexes include:

INT8 (8-bit signed integer)
UINT8 (8-bit unsigned integer)
INT16 (16-bit signed integer)
UINT16 (16-bit unsigned integer)
INTEGER (32-bit signed integer)
UINT32 (32-bit unsigned integer)
INT64 (64-bit signed integer)
UINT64 (64-bit unsigned integer)
FLOAT (32-bit floating-point number)
DOUBLE (64-bit floating-point number)
STRING (string)

Summary: Stored in a format similar to an attribute. A summary groups multiple fields of a single document and maps the document ID to that content for quick retrieval. Summaries are primarily used to display results. Because summary content is generally large, retrieve summaries only for the documents that will actually be displayed. The engine supports zlib compression for summaries: configure it in the schema to compress summaries at write time and decompress them at read time.
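
The following Python sketch illustrates how the three structures described above cooperate in one query. It is a toy in-memory model, not the engine's storage format: the documents, the field names (title, price, tags), and all function names are made up for illustration, and Python's zlib module stands in for the engine's summary compression.

```python
import json
import zlib
from collections import defaultdict

# Toy documents; the field names are hypothetical.
DOCS = {
    1: {"title": "red running shoes", "price": 59.0, "tags": ["sport", "sale"]},
    2: {"title": "blue running jacket", "price": 120.0, "tags": ["sport"]},
    3: {"title": "red wool scarf", "price": 25.0, "tags": []},
}

def build_inverted_index(docs, field):
    """Map each whitespace-separated term of `field` to the doc IDs containing it."""
    postings = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in doc[field].split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def build_forward_index(docs, field):
    """Map each doc ID to the value (or list of values) of `field`."""
    return {doc_id: doc[field] for doc_id, doc in docs.items()}

def build_summary(docs, fields):
    """Store the display fields of each document together, compressed at write time."""
    return {
        doc_id: zlib.compress(json.dumps({f: doc[f] for f in fields}).encode("utf-8"))
        for doc_id, doc in docs.items()
    }

def read_summary(summary, doc_id):
    """Decompress one summary at read time."""
    return json.loads(zlib.decompress(summary[doc_id]).decode("utf-8"))

inverted = build_inverted_index(DOCS, "title")
price_attr = build_forward_index(DOCS, "price")  # single-value attribute
tags_attr = build_forward_index(DOCS, "tags")    # multi-value attribute
summary = build_summary(DOCS, ["title", "price"])

# 1. Locate candidate documents through the inverted index.
hits = inverted.get("red", [])
# 2. Filter and sort the candidates using attribute values.
cheap_first = sorted((d for d in hits if price_attr[d] < 100), key=price_attr.get)
# 3. Fetch summaries only for the documents that will be displayed.
results = [read_summary(summary, d) for d in cheap_first]
```

Only step 3 touches the summary, which is why the large summary content is read for the final result set only.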

Note

For more details about index table configuration, see Index Table Configuration.

Example index schema:

{
  "file_compress": [
    {
      "name": "file_compressor",
      "type": "zstd"
    },
    {
      "name": "no_compressor",
      "type": ""
    }
  ],
  "table_name": "test",
  "summarys": {
    "summary_fields": [
      "id",
      "fb_boolean",
      "fb_datetime",
      "fb_string",
      "fb_decimal",
      "fb_bigint",
      "fb_text"
    ],
    "parameter": {
      "file_compressor": "zstd"
    }
  },
  "indexs": [
    {
      "index_name": "id",
      "index_type": "PRIMARYKEY64",
      "index_fields": "id",
      "has_primary_key_attribute": true,
      "is_primary_key_sorted": false
    },
    {
      "index_name": "fb_boolean",
      "index_type": "STRING",
      "index_fields": "fb_boolean",
      "file_compress": "file_compressor",
      "format_version_id": 1
    },
    {
      "index_name": "fb_datetime",
      "index_type": "STRING",
      "index_fields": "fb_datetime",
      "file_compress": "file_compressor",
      "format_version_id": 1
    },
    {
      "index_name": "fb_string",
      "index_type": "STRING",
      "index_fields": "fb_string"
    },
    {
      "index_name": "fb_text",
      "index_type": "TEXT",
      "index_fields": "fb_text"
    }
  ],
  "attributes": [
    {
      "field_name": "id",
      "file_compress": "no_compressor"
    },
    {
      "field_name": "fb_boolean",
      "file_compress": "file_compressor"
    },
    {
      "field_name": "fb_datetime",
      "file_compress": "no_compressor"
    },
    {
      "field_name": "fb_string",
      "file_compress": "file_compressor"
    },
    {
      "field_name": "fb_decimal",
      "file_compress": "no_compressor"
    },
    {
      "field_name": "fb_bigint",
      "file_compress": "no_compressor"
    }
  ],
  "fields": [
    {
      "user_defined_param": {},
      "field_name": "id",
      "field_type": "INT64",
      "compress_type": "equal"
    },
    {
      "field_name": "fb_boolean",
      "field_type": "STRING",
      "compress_type": "uniq"
    },
    {
      "field_name": "fb_datetime",
      "field_type": "STRING",
      "compress_type": "uniq"
    },
    {
      "user_defined_param": {
        "multi_value_sep": ","
      },
      "field_name": "fb_string",
      "field_type": "STRING",
      "compress_type": "equal",
      "multi_value": true
    },
    {
      "field_name": "fb_decimal",
      "field_type": "DOUBLE"
    },
    {
      "field_name": "fb_bigint",
      "field_type": "INT64",
      "compress_type": "equal"
    },
    {
      "field_name": "fb_text",
      "field_type": "TEXT",
      "analyzer": "chn_standard"
    }
  ]
}
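
Before publishing a hand-edited schema, it can help to check it for internal consistency. The sketch below is one possible check, written against a trimmed copy of the example above. It assumes that every field named in summarys, attributes, and indexs should be declared under fields, and it applies two rules from the Important list later on this page (exactly one primary key index; TEXT fields need an analyzer and cannot be multi-value). The function name check_schema is hypothetical, treating any index_type that starts with PRIMARYKEY as a primary key index is an assumption, and only the plain-string form of index_fields is handled.

```python
import json

# A trimmed copy of the example schema, keeping only the keys this check reads.
SCHEMA_JSON = """
{
  "summarys": {"summary_fields": ["id", "fb_string", "fb_text"]},
  "indexs": [
    {"index_name": "id", "index_type": "PRIMARYKEY64", "index_fields": "id"},
    {"index_name": "fb_string", "index_type": "STRING", "index_fields": "fb_string"},
    {"index_name": "fb_text", "index_type": "TEXT", "index_fields": "fb_text"}
  ],
  "attributes": [
    {"field_name": "id", "file_compress": "no_compressor"},
    {"field_name": "fb_string", "file_compress": "file_compressor"}
  ],
  "fields": [
    {"field_name": "id", "field_type": "INT64"},
    {"field_name": "fb_string", "field_type": "STRING", "multi_value": true},
    {"field_name": "fb_text", "field_type": "TEXT", "analyzer": "chn_standard"}
  ]
}
"""

def check_schema(schema):
    """Return a list of problems found; an empty list means none were found."""
    fields = {f["field_name"]: f for f in schema.get("fields", [])}
    indexes = schema.get("indexs", [])
    problems = []

    # Collect every field name referenced outside the "fields" section.
    referenced = set(schema.get("summarys", {}).get("summary_fields", []))
    referenced.update(a["field_name"] for a in schema.get("attributes", []))
    referenced.update(
        i["index_fields"] for i in indexes if isinstance(i.get("index_fields"), str)
    )
    for name in sorted(referenced):
        if name not in fields:
            problems.append(f"{name!r} is referenced but not declared in fields")

    # Exactly one primary key index (assumption: type name starts with PRIMARYKEY).
    pk_count = sum(1 for i in indexes if i.get("index_type", "").startswith("PRIMARYKEY"))
    if pk_count != 1:
        problems.append(f"expected exactly one primary key index, found {pk_count}")

    # TEXT fields need an analyzer and cannot be multi-value.
    for name, spec in fields.items():
        if spec.get("field_type") == "TEXT":
            if not spec.get("analyzer"):
                problems.append(f"TEXT field {name!r} has no analyzer")
            if spec.get("multi_value"):
                problems.append(f"TEXT field {name!r} cannot be multi-value")
    return problems

problems = check_schema(json.loads(SCHEMA_JSON))
```

For the trimmed schema shown here the list comes back empty. Removing a field declaration, or adding a second primary key index, produces one message per violation.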

Add an index table

On the instance management page, navigate to Configuration Center > Index Structure and click Add Index Table.

Configure the index table, select a data source, and set the number of data shards:

In the index table section, set the index table name. In the data source section, select a configured data source. In the data shard section, set the number of shards. In the Select Template area, you can choose a General Template, Vector: Image Search, or Vector: Text Semantic Search template.

Configure fields. In the field configuration area, you can configure properties for each field, including its name and type, whether it is a multi-value field, whether it is an attribute field, and the settings for its index type and analyzer.

Multi-value field delimiter settings:

By default, the ha3 delimiter ^] (the control character 0x1D) is used. You can also specify a custom delimiter to suit your data.
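
When you prepare source data, a multi-value field is written as one string with the values joined by the delimiter. The sketch below shows one way to do that in Python. It assumes that ^] is caret notation for the control character 0x1D, and it applies the restrictions on custom delimiters listed under Important later on this page (a single character, not full-width). The function names are hypothetical, and treating both the "F" (fullwidth) and "W" (wide) East Asian width classes as full-width is an assumption.

```python
import unicodedata

HA3_DEFAULT_SEP = "\x1d"  # assumption: "^]" denotes the control character 0x1D

def check_custom_separator(sep):
    """Raise ValueError if `sep` breaks the custom-delimiter rules."""
    if len(sep) != 1:
        raise ValueError("a custom delimiter must be exactly one character")
    if unicodedata.east_asian_width(sep) in ("F", "W"):
        raise ValueError("a custom delimiter cannot be a full-width character")
    return sep

def join_multi_value(values, sep=HA3_DEFAULT_SEP):
    """Join the values of one multi-value field into a single string."""
    if sep != HA3_DEFAULT_SEP:
        check_custom_separator(sep)
    for value in values:
        # A value containing the delimiter would be split incorrectly later.
        if sep in value:
            raise ValueError(f"value {value!r} contains the delimiter")
    return sep.join(values)

default_encoded = join_multi_value(["red", "green", "blue"])
comma_encoded = join_multi_value(["red", "green", "blue"], sep=",")
```

The second call corresponds to the "multi_value_sep": "," setting used for fb_string in the example schema.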

Attribute and field content compression:

You can enable or disable compression for attribute fields. Compression is disabled by default. Select file_compressor to enable compression.
You can enable or disable compression for field content. Compression is disabled by default. When you enable it, uniq is preselected for multi-value and STRING fields, and equal is preselected for single-value numeric fields.
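
If you generate schemas with a script, the default choice of compress_type for field content can be mirrored in a small helper. This is a sketch of the selection rule stated above only; the function name is hypothetical, and the set of numeric types is taken from the forward index type list earlier on this page.

```python
# Numeric types from the forward index type list on this page.
NUMERIC_TYPES = {
    "INT8", "UINT8", "INT16", "UINT16", "INTEGER",
    "UINT32", "INT64", "UINT64", "FLOAT", "DOUBLE",
}

def default_compress_type(field_type, multi_value=False):
    """Return the preselected compress_type, or None if the rule names none."""
    if multi_value or field_type == "STRING":
        return "uniq"
    if field_type in NUMERIC_TYPES:
        return "equal"
    return None  # for example TEXT, which the rule does not mention

# Build a field entry like the "id" field in the example schema.
field_entry = {"field_name": "id", "field_type": "INT64"}
field_entry["compress_type"] = default_compress_type(field_entry["field_type"])
```

The value is only a default. A schema can set a different compress_type explicitly, or omit the key to leave field content uncompressed.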

Note

If you enable attribute compression, we recommend editing the index loading method in Deployment Management > Data Node > Online Table Configuration to reduce the performance impact.

Note

Index settings are displayed in a table with the following columns: Index name, Index type (such as PRIMARYKEY64, STRING, and TEXT), Included fields, Data compression (compressed/uncompressed), Advanced configuration (View/Modify), and Actions (Delete). You can add a new index by clicking the + button at the bottom.

Index field compression settings:

You can choose whether to compress index fields. Compression is disabled by default. Select file_compressor to enable compression.

Note

Primary key indexes do not support compression.
If you enable index compression, we recommend editing the index loading method in Deployment Management > Data Node > Online Table Configuration to reduce the performance impact.

After you finish the configuration, click Save Version. In the dialog box that appears, enter remarks (optional) and click Publish.

In this example, the index table name is set to vector_index, and the editing mode is administrator mode. In the field settings, two fields are added: id (INT64 type, set as the primary key, with the Attribute Field and Display in Search Results options selected) and embedding (STRING type, with the Display in Search Results option selected).

After the index table is successfully added, you can view its topology in Deployment Management.

The topology diagram shows a top-down architecture: the query service connects to a cluster (each cluster contains data nodes and query nodes), the cluster connects to the index table, and the index table connects to the data source. Arrows indicate the data flow between each layer.

To apply the new index table to the cluster, you must manually trigger a configuration update and full indexing in O&M Management. In the Update Configuration operation, select Push Configuration and Trigger Index Rebuild.

For Configuration Type, select offline configuration. In the Index Structure Version area, select the target data source, select the corresponding index table and its version, and then select the desired Dictionary Configuration Version, Offline Configuration Version, and Offline Plug-in Version as needed. Then, click OK.

Select a Data Source Configuration Version, select the Target Cluster, select Push Configuration and Trigger Index Rebuild, and then click OK.

During the index rebuild, you can monitor the progress of the full indexing in Change History under Data Source Changes:

An icon indicates the current execution status of each stage.

After the index rebuild is complete, you can query the new index table.

Important

Field settings must include one and only one primary key.
In field settings, at least one field must be selected for display in search results.
TEXT fields require an analysis method and do not support multi-value.
Index settings must include one and only one primary key index.
A custom multi-value delimiter (anything other than the default) must be a single character and cannot be a full-width character.
When setting the number of data shards, ensure the Number of Data Nodes exceeds the number of replicas multiplied by the number of data shards. For example, a cluster with 2 replicas and 2 data shards requires more than 4 data nodes.
Follow these rules when setting the number of shards: A single shard should hold no more than 600 million documents (the maximum is 2.1 billion). The index size of a single shard should not exceed 300 GB. If you require real-time updates, the data update rate on a single shard should not exceed 4,000 transactions per second (TPS) for add commands or 10,000 TPS for update commands.
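
The shard and data node rules above can be turned into a quick estimate. The sketch below is a rough planning aid, not an official sizing tool: it assumes documents, index size, and update traffic spread evenly across shards, and the function names are hypothetical. The limits are the ones stated in the list above.

```python
import math

# Per-shard limits taken from the rules above.
MAX_DOCS_PER_SHARD = 600_000_000
MAX_INDEX_GB_PER_SHARD = 300
MAX_ADD_TPS_PER_SHARD = 4_000
MAX_UPDATE_TPS_PER_SHARD = 10_000

def min_shards(total_docs, index_gb, add_tps=0, update_tps=0):
    """Smallest shard count that keeps every per-shard limit satisfied.

    Assumes an even distribution of data and traffic across shards.
    """
    needed = max(
        total_docs / MAX_DOCS_PER_SHARD,
        index_gb / MAX_INDEX_GB_PER_SHARD,
        add_tps / MAX_ADD_TPS_PER_SHARD,
        update_tps / MAX_UPDATE_TPS_PER_SHARD,
    )
    return max(1, math.ceil(needed))

def enough_data_nodes(data_nodes, replicas, shards):
    """The number of data nodes must exceed replicas multiplied by shards."""
    return data_nodes > replicas * shards

# 1.5 billion documents and a 500 GB index: the document count is the
# binding constraint (1.5e9 / 6e8 = 2.5), so three shards are needed.
shards = min_shards(total_docs=1_500_000_000, index_gb=500)
```

With 2 replicas and 2 data shards, enough_data_nodes(4, 2, 2) is False and enough_data_nodes(5, 2, 2) is True, which matches the example in the list.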

Edit an index table

Introduction to index table versions:

A newly created index table has two versions by default:

index_config_v1: The initial version of the index table. Its status changes to In Use after you push the configuration and complete an index rebuild. Otherwise, its status is Not in Use.
index_config_edit: The version currently being edited. Its status is always Editing.

Each time you publish, the version name increments sequentially (for example, index_config_v2, index_config_v3). Remarks are required for each version to distinguish them.

Edit and publish a new index table version:

Find the version with the status Editing and click Edit.

Expand the row for the index_config_edit version, find test_main_schema.json in the file name list, and click Edit in its row.

Note

Additional information about cluster.json configuration:

The platform allows you to configure index merge strategies. You can configure customized_merge_config and segment_customize_metrics_updater (supported only on new instances).

In the left-side navigation pane, choose Configuration Center > Index Structure. Open the main_cluster.json file, find the customized_merge_config configuration item in the key-value table, and click Edit in the Actions column.

For parameter details, see Offline cluster configuration.

After making changes, click Save Version.

In the index structure editor for administrator mode (main_index_schema.json), in the Field settings section, you can configure properties such as field name, field type, primary key, multi-value, analysis method, attribute field, and display in search results. In the Index settings section, you can manage the index name, index type, and included fields.

You can also switch to developer mode to manually edit the schema:

In the JSON editor for developer mode, the summarys section sets compress to false, and summary_fields includes five fields: id, age, name, address, and info. The indexs array defines three indexes: name (type STRING), address (type STRING), and info (type TEXT). After you finish editing, click Save Version.

Find the version with the status Editing, click Publish, enter Remarks, and click OK.

The system then generates a new index table version with the status Not in Use.

To apply the new index table version to the cluster, you must select Push Configuration and Trigger Index Rebuild in O&M Management > Update Configuration.

On the Update Configuration page, select offline configuration for Configuration Type. In the Index Structure Version section, select the target index table and the version to push (for example, index_config_v4). For Whether to trigger index rebuild, select Push configuration and trigger index rebuild, and then click OK.

Delete an index table version:

You can directly delete index table versions with the status Not in Use.

The confirmation dialog warns that the deletion is irreversible. Click OK to confirm.

View an index table version:

After you click View, you are redirected to the read-only configuration page for the index table version:

In administrator mode, the Field settings section displays the field configuration of the index table in a table. The columns include Field name, Field type, Primary key, Multi-value, Attribute field, Display in search results, Data compression, Analysis method, and Advanced configuration, providing a clear overview of each field's definition.

In developer mode, the index table version is displayed in JSON format in an editor with line numbers. The JSON structure includes indexs (index definitions, such as the PRIMARYKEY64 primary key index, which contains parameters like index_fields and is_primary_key_sorted), attributes (a list of attributes), and fields (field definitions, where each field specifies a field_name and field_type, such as INT64 or STRING).

Delete an index table

If an index table has no versions with the status In Use, you can delete it directly from the index structure page.

On the index structure page for the index table, click the Delete link next to the index table name to delete it.

If the index table has a version with the status In Use:

In Deployment Management, click the index table, and unsubscribe from it.

Select the target index table. In the panel that appears, select the Online tab, and click Unsubscribe in the Actions column for the Subscription Status row.
Then, in Configuration Center > Index Structure, click the Delete link next to the target index table name to delete the corresponding index table.

Warning

If you unsubscribe from an index table in Deployment Management, you must also delete the corresponding index table on the index structure page to avoid disrupting the online cluster.

Usage notes

Adding an index table requires a data source. If none exists, create a data source first.
The index table name cannot be modified after it is created.
You cannot directly delete an index table if it has a version with the status In Use.
Each index table can have only one version with an Editing status.
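
The version statuses and deletion rules on this page can be summarized as a small state model. The Python sketch below is a mental model only and does not call any OpenSearch API. The class and method names are hypothetical, and it assumes that pushing one version moves the previously used version back to Not in Use, which this page does not state explicitly.

```python
import re
from dataclasses import dataclass, field

EDITING, NOT_IN_USE, IN_USE = "Editing", "Not in Use", "In Use"

def _initial_versions():
    # A newly created index table has these two versions by default.
    return {"index_config_v1": NOT_IN_USE, "index_config_edit": EDITING}

@dataclass
class IndexTable:
    name: str  # cannot be modified after creation
    versions: dict = field(default_factory=_initial_versions)

    def publish(self):
        """Publish the editing version as the next numbered version."""
        numbers = [
            int(m.group(1))
            for v in self.versions
            if (m := re.fullmatch(r"index_config_v(\d+)", v))
        ]
        new_name = f"index_config_v{max(numbers, default=0) + 1}"
        self.versions[new_name] = NOT_IN_USE
        return new_name

    def push_and_rebuild(self, version):
        """Mark `version` as In Use after a configuration push and index rebuild."""
        if self.versions[version] == EDITING:
            raise ValueError("publish the editing version before pushing it")
        for name, status in self.versions.items():
            if status == IN_USE:
                self.versions[name] = NOT_IN_USE  # assumption, see lead-in
        self.versions[version] = IN_USE

    def can_delete_version(self, version):
        """Only versions with the status Not in Use can be deleted directly."""
        return self.versions[version] == NOT_IN_USE

    def can_delete_table(self):
        """A table with a version In Use must be unsubscribed first."""
        return IN_USE not in self.versions.values()

table = IndexTable("vector_index")
table.push_and_rebuild("index_config_v1")
new_version = table.publish()
```

Because index_config_edit is the only entry that ever carries the Editing status, the model also reflects the rule that each index table has a single editing version.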