pg_bigm is an extension for Alibaba Cloud RDS for PostgreSQL that provides full-text search capabilities. It lets you create a 2-gram Generalized Inverted Index (GIN) to speed up the search process.
Prerequisites
Your RDS for PostgreSQL instance must meet the following requirements:
Major instance version
Minor engine version
PostgreSQL 17
20250830 or later
PostgreSQL 16
No minor version restrictions
PostgreSQL 10 to 15
20230830 or later
ImportantThis extension was supported on minor engine versions earlier than 20230830. However, to standardize extension management and enhance security, RDS for PostgreSQL has optimized extensions with security risks in subsequent engine version iterations. You can no longer create this extension on instances that run a minor engine version earlier than 20230830. For more information, see Restrictions on creating extensions.
If your instance runs a minor engine version earlier than 20230830 and you already use this extension, its functionality is not affected.
If you are creating this extension for the first time or recreating it, you must upgrade the minor engine version to the latest version.
You have set the instance parameters and added pg_bigm to the Value of the shared_preload_libraries parameter. For example, set the Value to
'pg_stat_statements,auto_explain,pg_bigm'.
Comparison with pg_trgm
pg_trgm is another extension for RDS for PostgreSQL that uses a 3-gram model for full-text search. The pg_bigm extension is based on pg_trgm. The differences between them are as follows.
Features and attributes | pg_trgm | pg_bigm |
Phrase match method for full-text search | 3-gram | 2-gram |
Supported index types | GIN and GIST | GIN |
Supported full-text search operators |
|
|
Full-text search for non-alphabetic languages | Not supported | Supported |
Full-text search for keywords with 1 to 2 characters | Slow | Fast |
Similarity search | Supported | Supported |
Maximum size of an indexable column | 238,609,291 bytes (about 228 MB) | 107,374,180 bytes (about 102 MB) |
Notes
The length of a column for which a GIN index is created cannot exceed 107,374,180 bytes (about 102 MB).
If the content stored in the database is non-ASCII, you must change the database codec to UTF8.
NoteTo check the codec of the current database, run the following command:
select pg_encoding_to_char(encoding) from pg_database where datname = current_database();.
Basic operations
Create the extension
postgres=> create extension pg_bigm; CREATE EXTENSIONCreate an index
postgres=> CREATE TABLE pg_tools (tool text, description text); CREATE TABLE postgres=> INSERT INTO pg_tools VALUES ('pg_hint_plan', 'Tool that allows a user to specify an optimizer HINT to PostgreSQL'); INSERT 0 1 postgres=> INSERT INTO pg_tools VALUES ('pg_dbms_stats', 'Tool that allows a user to stabilize planner statistics in PostgreSQL'); INSERT 0 1 postgres=> INSERT INTO pg_tools VALUES ('pg_bigm', 'Tool that provides 2-gram full text search capability in PostgreSQL'); INSERT 0 1 postgres=> INSERT INTO pg_tools VALUES ('pg_trgm', 'Tool that provides 3-gram full text search capability in PostgreSQL'); INSERT 0 1 postgres=> CREATE INDEX pg_tools_idx ON pg_tools USING gin (description gin_bigm_ops); CREATE INDEX postgres=> CREATE INDEX pg_tools_multi_idx ON pg_tools USING gin (tool gin_bigm_ops, description gin_bigm_ops) WITH (FASTUPDATE = off); CREATE INDEXPerform a full-text search
postgres=> SELECT * FROM pg_tools WHERE description LIKE '%search%'; tool | description ---------+--------------------------------------------------------------------- pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL (2 rows)Perform a similarity search using the
=%operatorpostgres=> SET pg_bigm.similarity_limit TO 0.2; SET postgres=> SELECT tool FROM pg_tools WHERE tool =% 'bigm'; tool --------- pg_bigm pg_trgm (2 rows)Uninstall the extension
postgres=> drop extension pg_bigm; DROP EXTENSION
Common extension functions
likequery function
Function: Generates a string that can be recognized by the LIKE keyword.
Parameter: A single parameter of the string type.
Return value: A search string that can be used with the LIKE keyword.
How it works:
Adds the
%symbol before and after the keyword.It uses
\to automatically escape the%character.
Example:
postgres=> SELECT likequery('pg_bigm has improved the full text search performance by 200%'); likequery ------------------------------------------------------------------- %pg\_bigm has improved the full text search performance by 200\%% (1 row) postgres=> SELECT * FROM pg_tools WHERE description LIKE likequery('search'); tool | description ---------+--------------------------------------------------------------------- pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL (2 rows)
show_bigm function
Function: Returns a collection of all 2-gram elements for a given string.
Parameter: A single parameter of the string type.
Return value: An array that contains all 2-gram elements.
How it works:
Adds a space character before and after the string.
Calculates all 2-gram substrings.
Example:
postgres=> SELECT show_bigm('full text search'); show_bigm ------------------------------------------------------------------ {" f"," s"," t",ar,ch,ea,ex,fu,"h ","l ",ll,rc,se,"t ",te,ul,xt} (1 row)
bigm_similarity function
Function: Calculates the degree of similarity between two strings.
Parameters: Two parameters of the string type.
Return value: A floating-point number that indicates the degree of similarity.
How it works:
Counts the common 2-gram elements between the two strings.
The similarity degree ranges from 0 to 1. A value of 0 indicates that the two strings are completely different. A value of 1 indicates that the two strings are identical.
NoteBecause spaces are added before and after the strings when 2-grams are calculated, the similarity between
ABCandBis 0, and the similarity betweenABCandAis 0.25.The `bigm_similarity` function is case-sensitive. For example, the similarity between
ABCandabcis 0.
Example:
postgres=> SELECT bigm_similarity('full text search', 'text similarity search'); bigm_similarity ----------------- 0.5714286 (1 row) postgres=> SELECT bigm_similarity('ABC', 'A'); bigm_similarity ----------------- 0.25 (1 row) postgres=> SELECT bigm_similarity('ABC', 'B'); bigm_similarity ----------------- 0 (1 row) postgres=> SELECT bigm_similarity('ABC', 'abc'); bigm_similarity ----------------- 0 (1 row)
pg_gin_pending_stats function
Function: Returns the number of pages and tuples in the pending list of a GIN index.
Parameter: The name or OID of the GIN index.
Return values: The number of pages and the number of tuples in the pending list.
NoteIf the FASTUPDATE parameter is set to False when the GIN index is created, the GIN index does not have a pending list. In this case, the function returns 0.
Example:
postgres=> SELECT * FROM pg_gin_pending_stats('pg_tools_idx'); pages | tuples -------+-------- 0 | 0 (1 row)
Control extension behavior
pg_bigm.last_update
The last update date of this extension. This is a read-only parameter and cannot be modified.
Example:
SHOW pg_bigm.last_update;pg_bigm.enable_recheck
Determines whether to perform a recheck.
NoteWe recommend that you keep the default value (ON) to ensure correct results.
Example:
postgres=> CREATE TABLE tbl (doc text); CREATE TABLE postgres=> INSERT INTO tbl VALUES('He is awaiting trial'); INSERT 0 1 postgres=> INSERT INTO tbl VALUES('It was a trivial mistake'); INSERT 0 1 postgres=> CREATE INDEX tbl_idx ON tbl USING gin (doc gin_bigm_ops); CREATE INDEX postgres=> SET enable_seqscan TO off; SET postgres=> EXPLAIN ANALYZE SELECT * FROM tbl WHERE doc LIKE likequery('trial'); QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on tbl (cost=20.00..24.01 rows=1 width=32) (actual time=0.020..0.021 rows=1 loops=1) Recheck Cond: (doc ~~ '%trial%'::text) Rows Removed by Index Recheck: 1 Heap Blocks: exact=1 -> Bitmap Index Scan on tbl_idx (cost=0.00..20.00 rows=1 width=0) (actual time=0.013..0.013 rows=2 loops=1) Index Cond: (doc ~~ '%trial%'::text) Planning Time: 0.117 ms Execution Time: 0.043 ms (8 rows) postgres=> postgres=> SELECT * FROM tbl WHERE doc LIKE likequery('trial'); doc ---------------------- He is awaiting trial (1 row) postgres=> SET pg_bigm.enable_recheck = off; SET postgres=> SELECT * FROM tbl WHERE doc LIKE likequery('trial'); doc -------------------------- He is awaiting trial It was a trivial mistake (2 rows)pg_bigm.gin_key_limit
Limits the maximum number of 2-gram elements used for full-text search. The default value is 0, which means all 2-gram elements are used.
NoteIf you find that using all 2-gram elements degrades performance, you can adjust this parameter to limit the number of 2-gram elements and improve performance.
pg_bigm.similarity_limit
Sets the similarity threshold. Tuples with a similarity degree that exceeds this threshold are returned by a similarity search.