
ApsaraDB RDS: Use the RUM extension

Last Updated: Jul 05, 2023

This topic describes how to use the RUM extension of ApsaraDB RDS for PostgreSQL to run full-text searches.

Prerequisites

The RDS instance runs PostgreSQL 10 or later.

Note

If the RDS instance runs PostgreSQL 14 or PostgreSQL 15, the minor engine version of the RDS instance must be 20221030 or later. For more information about how to view and update the minor engine version of your RDS instance, see Update the minor engine version of an ApsaraDB RDS for PostgreSQL instance.

Background information

A Generalized Inverted Index (GIN) index lets you run full-text searches on the tsvector and tsquery data types. However, GIN has the following limitations:

  • Slow ranking

    Ranking search results requires the positions of the matched lexemes. A GIN index does not store lexeme positions, so after the index scan ApsaraDB RDS for PostgreSQL must run an additional heap scan to retrieve them.

  • Slow phrase searches

    A phrase search also requires lexeme positions, so a phrase search on a GIN index incurs the same additional scan.

  • Slow ordering by timestamp

    A GIN index cannot store related information, such as a timestamp, together with the lexemes. Therefore, ordering results by timestamp requires an additional heap scan.

The RUM extension is based on GIN. A RUM index stores additional information, such as lexeme positions or timestamps, in the index itself, which avoids the additional scans described above.

However, building a RUM index and inserting data into it take more time than with GIN. This is because a RUM index stores additional information besides the index keys, and because the RUM extension uses generic write-ahead logging (WAL) records.
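The following sketch illustrates the ranking and phrase search cases described above. It assumes that the rum extension is already enabled in the current database. The table, column, and index names (docs, body, tsv, docs_tsv_rum) are hypothetical and chosen only for illustration.

```sql
-- Hypothetical sample table; tsv holds the parsed form of body.
CREATE TABLE docs (
    id   serial PRIMARY KEY,
    body text,
    tsv  tsvector
);

INSERT INTO docs (body) VALUES
    ('The situation is most beautiful'),
    ('It is a beautiful place to visit'),
    ('It looks like a beautiful place');

UPDATE docs SET tsv = to_tsvector('english', body);

-- The rum_tsvector_ops operator class stores lexeme positions in the index.
CREATE INDEX docs_tsv_rum ON docs USING rum (tsv rum_tsvector_ops);

-- Ranked search: <=> returns a distance, so a smaller value is a closer match.
SELECT body,
       tsv <=> to_tsquery('english', 'beautiful | place') AS distance
FROM docs
WHERE tsv @@ to_tsquery('english', 'beautiful | place')
ORDER BY tsv <=> to_tsquery('english', 'beautiful | place');

-- Phrase search: <-> requires the two lexemes to be adjacent.
SELECT body
FROM docs
WHERE tsv @@ to_tsquery('english', 'beautiful <-> place');
```

On a table this small the planner may choose a sequential scan instead of the index. You can run EXPLAIN on the queries against a realistically sized table to check whether the RUM index is used.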

Enable or disable the extension

  • Enable the extension

    CREATE EXTENSION rum;

  • Disable the extension

    DROP EXTENSION rum;
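As a hedged sketch, the statements below show one way to enable the extension idempotently, confirm that it is installed, and remove it cleanly. The index name docs_tsv_rum is hypothetical. By default, DROP EXTENSION is rejected while other objects, such as RUM indexes, still depend on the extension, so the sketch drops the dependent index first.

```sql
-- Enable the extension only if it is not already present.
CREATE EXTENSION IF NOT EXISTS rum;

-- Confirm that the extension is installed and check its version.
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'rum';

-- Remove dependent RUM indexes before you drop the extension.
-- docs_tsv_rum is a hypothetical index name used for illustration.
DROP INDEX IF EXISTS docs_tsv_rum;
DROP EXTENSION IF EXISTS rum;
```

DROP EXTENSION rum CASCADE also removes the dependent indexes in one step, but it drops every dependent object without listing them first, so dropping the indexes explicitly is the more cautious choice.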

Universal operators

The following table describes the operators provided by the RUM extension.

Operator                   Return type   Description

tsvector <=> tsquery       float4        Returns the distance between a tsvector value and a tsquery value.

timestamp <=> timestamp    float8        Returns the distance between two timestamps.

timestamp <=| timestamp    float8        Returns the distance only for left-side timestamps.

timestamp |=> timestamp    float8        Returns the distance only for right-side timestamps.

Note

The last three operators are also supported for the following data types: timestamptz, int2, int4, int8, float4, float8, money, and oid.
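The timestamp operators can be sketched as follows. This example assumes that the rum extension is enabled and uses the rum_tsvector_addon_ops operator class of the open source RUM extension to attach a timestamp column to the tsvector column. The table, column, and index names (events, tsv, created, events_rum) and the sample data are hypothetical.

```sql
-- Hypothetical table that pairs a tsvector column with a timestamp column.
CREATE TABLE events (
    id      int,
    tsv     tsvector,
    created timestamp
);

INSERT INTO events VALUES
    (1, to_tsvector('english', 'backup started'),   '2023-07-05 10:00:00'),
    (2, to_tsvector('english', 'backup completed'), '2023-07-05 11:30:00'),
    (3, to_tsvector('english', 'backup failed'),    '2023-07-05 14:45:00');

-- Store the created value together with each lexeme of tsv.
CREATE INDEX events_rum ON events
    USING rum (tsv rum_tsvector_addon_ops, created)
    WITH (attach = 'created', to = 'tsv');

-- Order matching rows by their distance from a reference timestamp.
SELECT id, created,
       created <=> timestamp '2023-07-05 12:00:00' AS distance
FROM events
WHERE tsv @@ to_tsquery('english', 'backup')
ORDER BY created <=> timestamp '2023-07-05 12:00:00'
LIMIT 5;

-- Same search, ordered with the left-side distance operator.
SELECT id, created,
       created <=| timestamp '2023-07-05 12:00:00' AS distance
FROM events
WHERE tsv @@ to_tsquery('english', 'backup')
ORDER BY created <=| timestamp '2023-07-05 12:00:00'
LIMIT 5;
```

Replacing <=| with |=> in the second query orders the rows by the right-side distance instead.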

For more information about the functions that are provided by the RUM extension, visit the official website.

References

The RUM extension of ApsaraDB RDS for PostgreSQL is used in the same way as the open source RUM extension. For more information, see the official documentation.