GIN indexes perform poorly when queries require relevance sorting, phrase matching, or sorting by an attached column such as a timestamp: they lack position information and cannot store data from other columns, so every sort requires an extra table scan. The rum extension solves all three problems by storing word positions and optional extra-column data directly in the index. Use it when full-text searches need fast relevance ranking, phrase search, or sorting by a secondary field.
Prerequisites
Before you begin, ensure that:
-
Your PolarDB for PostgreSQL cluster runs one of the following minor engine versions or later: You can view the minor engine version in the console or by running
SHOW polardb_version;. If your version does not meet the requirement, upgrade the minor engine version. For more information, see View the minor engine version.PostgreSQL version Minimum minor engine version PostgreSQL 18 2.0.18.1.2.0 PostgreSQL 17 2.0.17.6.4.0 PostgreSQL 16 2.0.16.8.3.0 PostgreSQL 15 2.0.15.7.1.1 PostgreSQL 14 2.0.14.5.3.0 -
The
smlarextension is not active in the same database schema. The%operator ofrumconflicts with the%operator ofsmlar. If both extensions are required, install them in separate schemas.
GIN vs. RUM: when to use each
The RUM index trades a larger index and slower writes for significantly faster sorting, phrase search, and attached-column ordering. The table below summarizes the trade-offs.
| Dimension | GIN index | RUM index |
|---|---|---|
| Relevance sorting | Slow; requires table lookups | Fast; sorts directly within the index |
| Phrase search | Slow; requires table lookups for word positions | Fast; positions stored in the index |
| Attached-column sorting | Not supported | Supported; attach a column (e.g., timestamp) to the index |
| Write performance | Faster | Slower; more complex index schema |
| Index size | Smaller | Larger; stores extra position and column data |
| Prefix search | Supported | Supported by rum_tsvector_ops; not supported by rum_*_hash_* operator classes |
Choose the RUM index for workloads that frequently sort full-text search results by relevance, time, price, or any other attached column. For write-intensive tables or simple keyword matching without sorting, the built-in GIN index is more cost-effective.
Install the extension
Connect to your database and run:
CREATE EXTENSION rum;
To uninstall:
DROP EXTENSION rum;
Operators and operator classes
An operator class defines how a RUM index processes a specific data type. Choosing the correct operator class determines which operators PostgreSQL can use to accelerate queries with the index. For background, see Operator classes and operator families.
Operators
| Operator | Input types | Return type | Description |
|---|---|---|---|
A @@ B |
Left: tsvector, right: tsquery |
bool |
Returns true if the full-text vector matches the query |
A <=> B |
Left: tsvector, right: tsquery |
float4 |
Returns the distance between the vector and the query; a smaller value means higher relevance |
A <=> B |
timestamp, timestamptz, int2, int4, int8, float4, float8, money, oid |
float8 |
Returns the absolute difference between two values; time differences are in seconds (microsecond precision); money differences are in cents |
A <=| B |
timestamp, timestamptz, int2, int4, int8, float4, float8, money, oid |
float8 |
Returns B - A if A ≤ B; otherwise returns infinity |
A |=> B |
timestamp, timestamptz, int2, int4, int8, float4, float8, money, oid |
float8 |
Returns A - B if A > B; otherwise returns infinity |
Operator classes for tsvector
| Operator class | Supported operators | Notes |
|---|---|---|
rum_tsvector_ops |
WHERE: @@; ORDER BY: <=> |
Stores lexemes and their positions. Supports full-text search, prefix search, and relevance sorting. The most common choice for full-text search. |
rum_tsvector_hash_ops |
WHERE: @@; ORDER BY: <=> |
Stores hash values and positions instead of raw lexemes. Supports full-text search and relevance sorting, but not prefix search. The index may be smaller than rum_tsvector_ops, but hash collisions can require a recheck. |
rum_tsvector_addon_ops |
WHERE: @@ |
Attaches data from an additional column (such as a timestamp) to the tsvector index entries. Enables full-text search on the primary column and efficient ORDER BY on the attached column in a single index scan. The attached column's type must have a corresponding rum_<TYPE>_ops class, and ORDER BY must use <=>, <=|, or |=>. |
rum_tsvector_hash_addon_ops |
WHERE: @@ |
Same as rum_tsvector_addon_ops but stores hash values of lexemes. Does not support prefix search. The index may be smaller, but hash collisions can require a recheck and may make searches slower than rum_tsvector_addon_ops. |
rum_tsquery_ops |
WHERE: @@ |
Indexes a tsquery column for inverted matching: quickly finds which stored query conditions match a given document. |
Operator classes for arrays
| Operator class | Applicable types | Supported operators | Notes |
|---|---|---|---|
rum_anyarray_ops |
anyarray (e.g., int[], text[], varchar[]) |
WHERE: &&, @>, <@, =, %; ORDER BY: <=> |
Supports array overlap, containment, equality, and similarity checks (similarity is computed against a threshold; arrays exceeding the threshold are considered similar), plus sorting by distance between arrays. |
rum_anyarray_addon_ops |
anyarray (e.g., int[], text[], varchar[]) |
WHERE: &&, @>, <@, =, %; ORDER BY: <=> |
Same as rum_anyarray_ops with an attached column. The % operator uses the same threshold-based similarity check. The attached column's type must have a corresponding rum_<TYPE>_ops class, and ORDER BY must use <=>, <=|, or |=>. |
Operator classes for other data types
| Operator class | Applicable types | Supported operators |
|---|---|---|
rum_<TYPE>_ops |
int2, int4, int8, float4, float8, money, oid, time, timetz, date, interval, macaddr, inet, cidr, text, varchar, char, bytea, bit, varbit, numeric, timestamp, timestamptz |
WHERE: <, <=, =, >=, >; ORDER BY (int2, int4, int8, float4, float8, money, oid, timestamp, timestamptz only): <=>, <=|, |=> |
Use rum_<TYPE>_ops for range queries and distance sorting on non-text, non-array columns.
Examples
Sort full-text search results by relevance
This example creates a tsvector column with a RUM index and queries it ordered by relevance score.
-
Create the table and insert test data:
CREATE TABLE t1( t text, t_vec tsvector GENERATED ALWAYS AS (to_tsvector('pg_catalog.english', t)) STORED ); INSERT INTO t1(t) VALUES ('The situation is most beautiful'); INSERT INTO t1(t) VALUES ('It is a beautiful'); INSERT INTO t1(t) VALUES ('It looks like a beautiful place'); -
Create a RUM index using the
rum_tsvector_opsoperator class:CREATE INDEX t1_t_vec_idx ON t1 USING rum (t_vec rum_tsvector_ops); -
Query and sort by relevance. The
<=>operator returns a distance value — smaller means more relevant:SET enable_seqscan TO off; SELECT t, t_vec <=> to_tsquery('english', 'beautiful | place') AS rank FROM t1 WHERE t_vec @@ to_tsquery('english', 'beautiful | place') ORDER BY t_vec <=> to_tsquery('english', 'beautiful | place');Result:
t | rank ---------------------------------+--------- It looks like a beautiful place | 8.22467 The situation is most beautiful | 16.4493 It is a beautiful | 16.4493
Sort by timestamp with full-text filtering
This example shows how to attach a timestamp column to a tsvector index. Both full-text filtering and timestamp-based sorting complete in a single index scan.
-
Create the table and insert sample data:
CREATE TABLE tsts (id int, t tsvector, d timestamp); INSERT INTO tsts VALUES (354, to_tsvector('wr qh'), '2016-05-16 14:21:22.326724'), (355, to_tsvector('wr qh'), '2016-05-16 13:21:22.326724'), (356, to_tsvector('ts op'), '2016-05-16 18:21:22.326724'), (358, to_tsvector('ts op'), '2016-05-16 23:21:22.326724'), (371, to_tsvector('wr qh'), '2016-05-17 06:21:22.326724'), (406, to_tsvector('wr qh'), '2016-05-18 17:21:22.326724'), (415, to_tsvector('wr qh'), '2016-05-19 02:21:22.326724'); -
Create a RUM index with the
dcolumn attached to thetindex entries. TheWITH (attach = 'd', to = 't')clause stores values fromdalongside thetsvectorentries fort, so sorting ondrequires no separate table lookup:CREATE INDEX tsts_idx ON tsts USING rum (t rum_tsvector_addon_ops, d) WITH (attach = 'd', to = 't'); -
Query records matching the full-text filter and sort by proximity to a target timestamp:
SET enable_seqscan TO off; EXPLAIN (costs off) SELECT id, d, d <=> '2016-05-16 14:21:25' AS distance FROM tsts WHERE t @@ 'wr&qh' ORDER BY d <=> '2016-05-16 14:21:25' LIMIT 5;The execution plan confirms that both filtering and sorting use a single index scan:
QUERY PLAN ------------------------------------------------------------------------------ Limit -> Index Scan using tsts_idx on tsts Index Cond: (t @@ '''wr'' & ''qh'''::tsquery) Order By: (d <=> '2016-05-16 14:21:25'::timestamp without time zone)Result:
id | d | distance -----+----------------------------+--------------- 354 | 2016-05-16 14:21:22.326724 | 2.673276 355 | 2016-05-16 13:21:22.326724 | 3602.673276 371 | 2016-05-17 06:21:22.326724 | 57597.326724 406 | 2016-05-18 17:21:22.326724 | 183597.326724 415 | 2016-05-19 02:21:22.326724 | 215997.326724
Query and sort arrays by similarity
This example indexes an integer array column and retrieves rows that overlap with a target array, ordered by similarity.
-
Create the table and insert sample data:
CREATE TABLE test_array (id serial, i int2[]); INSERT INTO test_array(i) VALUES ('{}'), ('{0}'), ('{1,2,3,4}'), ('{1,2,3}'), ('{1,2}'), ('{1}'); -
Create a RUM index using the
rum_anyarray_opsoperator class:CREATE INDEX idx_array ON test_array USING rum (i rum_anyarray_ops); -
Find rows that contain the element
1and sort by similarity to{1}. The&&operator checks for array overlap, and<=>returns the distance — smaller means more similar:SELECT * FROM test_array WHERE i && '{1}' ORDER BY i <=> '{1}' ASC;Result:
i ----------- {1} {1,2} {1,2,3} {1,2,3,4}
Match documents against stored query rules
This example indexes a tsquery column to perform inverted matching: given a new document, find all stored query rules that the document satisfies. This pattern is useful for subscription or alert systems.
-
Create the query rule table and insert rules:
CREATE TABLE query (id serial, q tsquery, tag text); INSERT INTO query (q, tag) VALUES ('supernova & star', 'sn'), ('black', 'color'), ('big & bang & black & hole', 'bang'), ('spiral & galaxy', 'shape'), ('black & hole', 'color'); -
Create a RUM index on the
tsquerycolumn:CREATE INDEX query_idx ON query USING rum(q rum_tsquery_ops); -
Match a new document against all stored rules:
SELECT * FROM query WHERE to_tsvector('black holes never exists before we think about them') @@ q;Result:
id | q | tag ---+---------+------- 2 | 'black' | color
Limitations
-
Write performance and index size: The RUM index stores extra information, including word positions and attached-column data. This makes it larger than a GIN index and increases the overhead of
INSERTandUPDATEoperations. Evaluate the trade-off carefully for write-intensive tables where storage is a concern. -
No prefix search with hash operator classes: Indexes created with
rum_tsvector_hash_opsorrum_tsvector_hash_addon_opsdo not support prefix search. These operator classes store hash values of lexemes rather than the original text.
References
-
rum on GitHub — official documentation and source code