The RUM extension adds positional and timestamp information to PostgreSQL's full-text search indexes, eliminating the extra heap scans that slow down ranking, phrase search, and timestamp-based ordering.
Prerequisites
Before you begin, ensure that:
The RDS instance runs PostgreSQL 10 or later.
If the instance runs PostgreSQL 14 or PostgreSQL 15, the minor engine version is 20221030 or later. For more information, see Update the minor engine version of an ApsaraDB RDS for PostgreSQL instance.
GIN limitations and RUM solutions
Generalized Inverted Index (GIN) supports full-text search using the tsvector and tsquery types, but it does not store positional or timestamp information in its indexes. This causes three performance issues:
| GIN limitation | Impact |
|---|---|
| Word locations not stored | Ranking requires an extra heap scan after the index scan to retrieve lexeme positions |
| Word locations not stored | Phrase search requires an extra heap scan to verify phrase boundaries |
| Timestamp information not stored | Ordering by timestamp requires an extra heap scan because the index cannot store related information, such as a timestamp, alongside its lexemes |
RUM solves all three by storing word locations and timestamp information directly in the index, so no extra heap scan is needed for ranking, phrase search, or timestamp ordering.
RUM indexes take longer to build and are slower on inserts than GIN indexes, because RUM stores more information per entry than GIN and uses generic write-ahead logging (WAL) records.
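The following is a minimal sketch of how ranking and phrase search might look with a RUM index. The table `rum_demo`, its columns, the index name, and the sample rows are hypothetical and exist only for illustration; `rum_tsvector_ops` is the operator class the open source extension provides for `tsvector` columns.

```sql
-- Assumes the current user is allowed to create extensions in this database.
CREATE EXTENSION IF NOT EXISTS rum;

-- Hypothetical demo table: raw text plus its precomputed tsvector.
CREATE TABLE rum_demo (
    body     text,
    body_tsv tsvector
);

INSERT INTO rum_demo (body, body_tsv)
SELECT s, to_tsvector('english', s)
FROM (VALUES
    ('The situation is most beautiful'),
    ('It is a beautiful place'),
    ('It looks like a beautiful place')
) AS v(s);

-- rum_tsvector_ops keeps lexeme positions in the index entries.
CREATE INDEX rum_demo_idx ON rum_demo USING rum (body_tsv rum_tsvector_ops);

-- Ranking: order by the distance operator; a smaller distance means a closer match.
SELECT body,
       body_tsv <=> to_tsquery('english', 'beautiful | place') AS distance
FROM rum_demo
WHERE body_tsv @@ to_tsquery('english', 'beautiful | place')
ORDER BY body_tsv <=> to_tsquery('english', 'beautiful | place');

-- Phrase search: match the two words only when they are adjacent.
SELECT body
FROM rum_demo
WHERE body_tsv @@ phraseto_tsquery('english', 'beautiful place');
```

On a table this small the planner may still choose a sequential scan; running the queries under `EXPLAIN` on a realistically sized table shows whether the RUM index is used.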
Enable or disable the extension
Enable the extension:
CREATE EXTENSION rum;
Disable the extension:
DROP EXTENSION rum;
Operators
The RUM extension provides the following operators:
| Operator | Returns | Description |
|---|---|---|
| tsvector <=> tsquery | float4 | Returns the distance between a tsvector value and a tsquery value. |
| timestamp <=> timestamp | float8 | Returns the distance between two timestamps. |
| timestamp <=\| timestamp | float8 | Returns the distance to the left-side timestamp only. |
| timestamp \|=> timestamp | float8 | Returns the distance to the right-side timestamp only. |
The last three operators also work with these types: timestamptz, int2, int4, int8, float4, float8, money, and oid.
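As an illustration of the timestamp operators, the sketch below orders full-text matches by their distance from a reference timestamp. The table `rum_events`, its columns, the index name, and the sample rows are hypothetical; the `rum_tsvector_addon_ops` operator class and the `attach` and `to` index options are taken from the open source extension and should be confirmed against the version installed on your instance.

```sql
-- Assumes the current user is allowed to create extensions in this database.
CREATE EXTENSION IF NOT EXISTS rum;

-- Hypothetical table: a tsvector document plus the timestamp to order by.
CREATE TABLE rum_events (
    id         int,
    doc        tsvector,
    created_at timestamp
);

INSERT INTO rum_events (id, doc, created_at) VALUES
    (1, to_tsvector('english', 'disk failure on node a'), '2016-05-16 13:00:00'),
    (2, to_tsvector('english', 'disk replaced on node a'), '2016-05-16 14:20:00'),
    (3, to_tsvector('english', 'disk check scheduled'),    '2016-05-16 15:30:00');

-- attach names the column stored alongside each index entry;
-- to names the tsvector column that it is attached to.
CREATE INDEX rum_events_idx ON rum_events
    USING rum (doc rum_tsvector_addon_ops, created_at)
    WITH (attach = 'created_at', to = 'doc');

-- Matches for 'disk', nearest to the reference timestamp first.
SELECT id,
       created_at,
       created_at <=> timestamp '2016-05-16 14:21:25' AS distance
FROM rum_events
WHERE doc @@ to_tsquery('english', 'disk')
ORDER BY created_at <=> timestamp '2016-05-16 14:21:25'
LIMIT 5;
```

Replacing `<=>` with `<=|` or `|=>` in both the select list and the `ORDER BY` clause restricts the distance to one side of the reference timestamp, as described in the table above.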
References
The RUM extension on ApsaraDB RDS for PostgreSQL is used in the same way as the open source RUM extension. For the full function reference and additional examples, see the RUM extension documentation.