All Products
Search
Document Center

PolarDB:Full-text search acceleration (rum)

Last Updated:Mar 30, 2026

GIN indexes perform poorly when queries require relevance sorting, phrase matching, or sorting by an attached column such as a timestamp: they lack position information and cannot store data from other columns, so every sort requires an extra table scan. The rum extension solves all three problems by storing word positions and optional extra-column data directly in the index. Use it when full-text searches need fast relevance ranking, phrase search, or sorting by a secondary field.

Prerequisites

Before you begin, ensure that:

  • Your PolarDB for PostgreSQL cluster runs one of the following minor engine versions or later: You can view the minor engine version in the console or by running SHOW polardb_version;. If your version does not meet the requirement, upgrade the minor engine version. For more information, see View the minor engine version.

    PostgreSQL version Minimum minor engine version
    PostgreSQL 18 2.0.18.1.2.0
    PostgreSQL 17 2.0.17.6.4.0
    PostgreSQL 16 2.0.16.8.3.0
    PostgreSQL 15 2.0.15.7.1.1
    PostgreSQL 14 2.0.14.5.3.0
  • The smlar extension is not active in the same database schema. The % operator of rum conflicts with the % operator of smlar. If both extensions are required, install them in separate schemas.

GIN vs. RUM: when to use each

The RUM index trades a larger index and slower writes for significantly faster sorting, phrase search, and attached-column ordering. The table below summarizes the trade-offs.

Dimension GIN index RUM index
Relevance sorting Slow; requires table lookups Fast; sorts directly within the index
Phrase search Slow; requires table lookups for word positions Fast; positions stored in the index
Attached-column sorting Not supported Supported; attach a column (e.g., timestamp) to the index
Write performance Faster Slower; more complex index schema
Index size Smaller Larger; stores extra position and column data
Prefix search Supported Supported by rum_tsvector_ops; not supported by rum_*_hash_* operator classes

Choose the RUM index for workloads that frequently sort full-text search results by relevance, time, price, or any other attached column. For write-intensive tables or simple keyword matching without sorting, the built-in GIN index is more cost-effective.

Install the extension

Connect to your database and run:

CREATE EXTENSION rum;

To uninstall:

DROP EXTENSION rum;

Operators and operator classes

An operator class defines how a RUM index processes a specific data type. Choosing the correct operator class determines which operators PostgreSQL can use to accelerate queries with the index. For background, see Operator classes and operator families.

Operators

Operator Input types Return type Description
A @@ B Left: tsvector, right: tsquery bool Returns true if the full-text vector matches the query
A <=> B Left: tsvector, right: tsquery float4 Returns the distance between the vector and the query; a smaller value means higher relevance
A <=> B timestamp, timestamptz, int2, int4, int8, float4, float8, money, oid float8 Returns the absolute difference between two values; time differences are in seconds (microsecond precision); money differences are in cents
A <=| B timestamp, timestamptz, int2, int4, int8, float4, float8, money, oid float8 Returns B - A if A ≤ B; otherwise returns infinity
A |=> B timestamp, timestamptz, int2, int4, int8, float4, float8, money, oid float8 Returns A - B if A > B; otherwise returns infinity

Operator classes for tsvector

Operator class Supported operators Notes
rum_tsvector_ops WHERE: @@; ORDER BY: <=> Stores lexemes and their positions. Supports full-text search, prefix search, and relevance sorting. The most common choice for full-text search.
rum_tsvector_hash_ops WHERE: @@; ORDER BY: <=> Stores hash values and positions instead of raw lexemes. Supports full-text search and relevance sorting, but not prefix search. The index may be smaller than rum_tsvector_ops, but hash collisions can require a recheck.
rum_tsvector_addon_ops WHERE: @@ Attaches data from an additional column (such as a timestamp) to the tsvector index entries. Enables full-text search on the primary column and efficient ORDER BY on the attached column in a single index scan. The attached column's type must have a corresponding rum_<TYPE>_ops class, and ORDER BY must use <=>, <=|, or |=>.
rum_tsvector_hash_addon_ops WHERE: @@ Same as rum_tsvector_addon_ops but stores hash values of lexemes. Does not support prefix search. The index may be smaller, but hash collisions can require a recheck and may make searches slower than rum_tsvector_addon_ops.
rum_tsquery_ops WHERE: @@ Indexes a tsquery column for inverted matching: quickly finds which stored query conditions match a given document.

Operator classes for arrays

Operator class Applicable types Supported operators Notes
rum_anyarray_ops anyarray (e.g., int[], text[], varchar[]) WHERE: &&, @>, <@, =, %; ORDER BY: <=> Supports array overlap, containment, equality, and similarity checks (similarity is computed against a threshold; arrays exceeding the threshold are considered similar), plus sorting by distance between arrays.
rum_anyarray_addon_ops anyarray (e.g., int[], text[], varchar[]) WHERE: &&, @>, <@, =, %; ORDER BY: <=> Same as rum_anyarray_ops with an attached column. The % operator uses the same threshold-based similarity check. The attached column's type must have a corresponding rum_<TYPE>_ops class, and ORDER BY must use <=>, <=|, or |=>.

Operator classes for other data types

Operator class Applicable types Supported operators
rum_<TYPE>_ops int2, int4, int8, float4, float8, money, oid, time, timetz, date, interval, macaddr, inet, cidr, text, varchar, char, bytea, bit, varbit, numeric, timestamp, timestamptz WHERE: <, <=, =, >=, >; ORDER BY (int2, int4, int8, float4, float8, money, oid, timestamp, timestamptz only): <=>, <=|, |=>

Use rum_<TYPE>_ops for range queries and distance sorting on non-text, non-array columns.

Examples

Sort full-text search results by relevance

This example creates a tsvector column with a RUM index and queries it ordered by relevance score.

  1. Create the table and insert test data:

    CREATE TABLE t1(
      t text,
      t_vec tsvector GENERATED ALWAYS AS (to_tsvector('pg_catalog.english', t)) STORED
    );
    
    INSERT INTO t1(t) VALUES ('The situation is most beautiful');
    INSERT INTO t1(t) VALUES ('It is a beautiful');
    INSERT INTO t1(t) VALUES ('It looks like a beautiful place');
  2. Create a RUM index using the rum_tsvector_ops operator class:

    CREATE INDEX t1_t_vec_idx ON t1 USING rum (t_vec rum_tsvector_ops);
  3. Query and sort by relevance. The <=> operator returns a distance value — smaller means more relevant:

    SET enable_seqscan TO off;
    
    SELECT t, t_vec <=> to_tsquery('english', 'beautiful | place') AS rank
    FROM t1
    WHERE t_vec @@ to_tsquery('english', 'beautiful | place')
    ORDER BY t_vec <=> to_tsquery('english', 'beautiful | place');

    Result:

    t                                |  rank
    ---------------------------------+---------
     It looks like a beautiful place | 8.22467
     The situation is most beautiful | 16.4493
     It is a beautiful               | 16.4493

Sort by timestamp with full-text filtering

This example shows how to attach a timestamp column to a tsvector index. Both full-text filtering and timestamp-based sorting complete in a single index scan.

  1. Create the table and insert sample data:

    CREATE TABLE tsts (id int, t tsvector, d timestamp);
    
    INSERT INTO tsts VALUES
    (354, to_tsvector('wr qh'), '2016-05-16 14:21:22.326724'),
    (355, to_tsvector('wr qh'), '2016-05-16 13:21:22.326724'),
    (356, to_tsvector('ts op'), '2016-05-16 18:21:22.326724'),
    (358, to_tsvector('ts op'), '2016-05-16 23:21:22.326724'),
    (371, to_tsvector('wr qh'), '2016-05-17 06:21:22.326724'),
    (406, to_tsvector('wr qh'), '2016-05-18 17:21:22.326724'),
    (415, to_tsvector('wr qh'), '2016-05-19 02:21:22.326724');
  2. Create a RUM index with the d column attached to the t index entries. The WITH (attach = 'd', to = 't') clause stores values from d alongside the tsvector entries for t, so sorting on d requires no separate table lookup:

    CREATE INDEX tsts_idx ON tsts USING rum (t rum_tsvector_addon_ops, d) WITH (attach = 'd', to = 't');
  3. Query records matching the full-text filter and sort by proximity to a target timestamp:

    SET enable_seqscan TO off;
    
    EXPLAIN (costs off)
    SELECT id, d, d <=> '2016-05-16 14:21:25' AS distance
    FROM tsts
    WHERE t @@ 'wr&qh'
    ORDER BY d <=> '2016-05-16 14:21:25'
    LIMIT 5;

    The execution plan confirms that both filtering and sorting use a single index scan:

    QUERY PLAN
    ------------------------------------------------------------------------------
     Limit
       ->  Index Scan using tsts_idx on tsts
             Index Cond: (t @@ '''wr'' & ''qh'''::tsquery)
             Order By: (d <=> '2016-05-16 14:21:25'::timestamp without time zone)

    Result:

    id  |             d              |   distance
    -----+----------------------------+---------------
     354 | 2016-05-16 14:21:22.326724 |      2.673276
     355 | 2016-05-16 13:21:22.326724 |   3602.673276
     371 | 2016-05-17 06:21:22.326724 |  57597.326724
     406 | 2016-05-18 17:21:22.326724 | 183597.326724
     415 | 2016-05-19 02:21:22.326724 | 215997.326724

Query and sort arrays by similarity

This example indexes an integer array column and retrieves rows that overlap with a target array, ordered by similarity.

  1. Create the table and insert sample data:

    CREATE TABLE test_array (id serial, i int2[]);
    INSERT INTO test_array(i) VALUES ('{}'), ('{0}'), ('{1,2,3,4}'), ('{1,2,3}'), ('{1,2}'), ('{1}');
  2. Create a RUM index using the rum_anyarray_ops operator class:

    CREATE INDEX idx_array ON test_array USING rum (i rum_anyarray_ops);
  3. Find rows that contain the element 1 and sort by similarity to {1}. The && operator checks for array overlap, and <=> returns the distance — smaller means more similar:

    SELECT *
    FROM test_array
    WHERE i && '{1}'
    ORDER BY i <=> '{1}' ASC;

    Result:

    i
    -----------
     {1}
     {1,2}
     {1,2,3}
     {1,2,3,4}

Match documents against stored query rules

This example indexes a tsquery column to perform inverted matching: given a new document, find all stored query rules that the document satisfies. This pattern is useful for subscription or alert systems.

  1. Create the query rule table and insert rules:

    CREATE TABLE query (id serial, q tsquery, tag text);
    
    INSERT INTO query (q, tag) VALUES
    ('supernova & star', 'sn'),
    ('black', 'color'),
    ('big & bang & black & hole', 'bang'),
    ('spiral & galaxy', 'shape'),
    ('black & hole', 'color');
  2. Create a RUM index on the tsquery column:

    CREATE INDEX query_idx ON query USING rum(q rum_tsquery_ops);
  3. Match a new document against all stored rules:

    SELECT *
    FROM query
    WHERE to_tsvector('black holes never exists before we think about them') @@ q;

    Result:

    id | q       | tag
    ---+---------+-------
     2 | 'black' | color

Limitations

  • Write performance and index size: The RUM index stores extra information, including word positions and attached-column data. This makes it larger than a GIN index and increases the overhead of INSERT and UPDATE operations. Evaluate the trade-off carefully for write-intensive tables where storage is a concern.

  • No prefix search with hash operator classes: Indexes created with rum_tsvector_hash_ops or rum_tsvector_hash_addon_ops do not support prefix search. These operator classes store hash values of lexemes rather than the original text.

References