pg_bigm is an open source third-party extension supported by AnalyticDB for PostgreSQL. It provides full-text search capabilities by creating a 2-gram Generalized Inverted Index (GIN) index to accelerate the search process.
The extension extracts 2-character fragments (bigrams) from text. Fragments shorter than two characters are padded with a leading or trailing space:
postgres=> SELECT show_bigm('full text search');
show_bigm
------------------------------------------------------------------
{" f"," s"," t",ar,ch,ea,ex,fu,"h ","l ",ll,rc,se,"t ",te,ul,xt}
(1 row)Prerequisites
Before you begin, ensure that you have:
An AnalyticDB for PostgreSQL instance in elastic storage mode running V6.6.2.1 or later (V6.0) or V7.0.6.1 or later (V7.0)
The pg_bigm extension installed (see Install, update, and uninstall extensions)
Usage notes
The GIN index column size limit is 107,374,180 bytes (approximately 102 MB). Inserting data that exceeds this limit returns an out-of-memory error:
postgres=> CREATE TABLE t1 (description text); CREATE TABLE postgres=> CREATE INDEX t1_idx ON t1 USING gin (description gin_bigm_ops); CREATE INDEX postgres=> INSERT INTO t1 SELECT repeat('A', 107374181); ERROR:OUT OF memoryFor databases that store non-ASCII data, change the encoding to UTF-8. Run the following statement to check the current encoding:
SELECT pg_encoding_to_char(encoding) FROM pg_database WHERE datname = current_database();
Install the extension
Install pg_bigm on the Extensions page of your AnalyticDB for PostgreSQL instance. For details, see Install, update, and uninstall extensions.
Create an index
Create a table and a GIN index using the gin_bigm_ops operator class:
CREATE TABLE pg_tools (tool text, description text);
INSERT INTO pg_tools VALUES ('pg_hint_plan', 'Tool that allows a user to specify an optimizer hint to PostgreSQL');
INSERT INTO pg_tools VALUES ('pg_dbms_stats', 'Tool that allows a user to stabilize planner statistics in PostgreSQL');
INSERT INTO pg_tools VALUES ('pg_bigm', 'Tool that provides 2-gram full text search capability in PostgreSQL');
INSERT INTO pg_tools VALUES ('pg_trgm', 'Tool that provides 3-gram full text search capability in PostgreSQL');
-- Single-column index
CREATE INDEX pg_tools_idx ON pg_tools USING gin (description gin_bigm_ops);
-- Multi-column index with fastupdate disabled
CREATE INDEX pg_tools_multi_idx ON pg_tools USING gin (tool gin_bigm_ops, description gin_bigm_ops) WITH (FASTUPDATE = off);Perform full-text search
Use a LIKE query against an indexed column. pg_bigm uses the GIN index automatically:
postgres=> SELECT * FROM pg_tools WHERE description LIKE '%search%';
tool | description
---------+---------------------------------------------------------------------
pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
(2 rows)Perform similarity search
Use the =% operator to find strings whose similarity to the search term meets the threshold set by pg_bigm.similarity_limit:
postgres=> SET pg_bigm.similarity_limit TO 0.2;
SET
postgres=> SELECT tool FROM pg_tools WHERE tool =% 'bigm';
tool
---------
pg_bigm
pg_trgm
(2 rows)Functions
likequery()
likequery(text) converts a search keyword into a pattern string that LIKE can handle. It wraps the keyword with % and escapes any %, _, or \ characters in the input:
postgres=> SELECT likequery('pg_bigm has improved the full text search performance by 200%');
likequery
-------------------------------------------------------------------
%pg\_bigm has improved the full text search performance by 200\%%
(1 row)
postgres=> SELECT * FROM pg_tools WHERE description LIKE likequery('search');
tool | description
---------+---------------------------------------------------------------------
pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
(2 rows)show_bigm()
show_bigm(text) returns an array of all 2-gram elements in a string:
postgres=> SELECT show_bigm('full text search');
show_bigm
------------------------------------------------------------------
{" f"," s"," t",ar,ch,ea,ex,fu,"h ","l ",ll,rc,se,"t ",te,ul,xt}
(1 row)bigm_similarity()
bigm_similarity(text, text) returns a floating-point number from 0 to 1 representing the greatest similarity between the 2-gram sets of two strings. A value of 0 means completely dissimilar; a value of 1 means identical.
Keep the following behavior in mind when using this function:
bigm_similarity()is case-sensitive. The similarity between'ABC'and'abc'is 0.The 2-gram algorithm pads short strings with spaces. As a result, the similarity between
'ABC'and'B'is 0, while the similarity between'ABC'and'A'is 0.25.
postgres=> SELECT bigm_similarity('full text search', 'text similarity search');
bigm_similarity
-----------------
0.5714286
(1 row)
postgres=> SELECT bigm_similarity('ABC', 'A');
bigm_similarity
-----------------
0.25
(1 row)
postgres=> SELECT bigm_similarity('ABC', 'B');
bigm_similarity
-----------------
0
(1 row)
postgres=> SELECT bigm_similarity('ABC', 'abc');
bigm_similarity
-----------------
0
(1 row)pg_gin_pending_stats()
pg_gin_pending_stats(index) returns the number of pages and tuples in the pending list of a GIN index. This is useful when fastupdate is enabled (the default), because GIN indexes batch new entries in a pending list before merging them into the main index:
postgres=> SELECT * FROM pg_gin_pending_stats('pg_tools_idx');
pages | tuples
-------+--------
0 | 0
(1 row)When an index is created with fastupdate = off, there is no pending list and the function always returns 0:
CREATE INDEX ON your_table_name (your_column_name gin_bigm_ops) WITH (fastupdate = off);GUC parameters
pg_bigm exposes the following Grand Unified Configuration (GUC) parameters. Adjust them based on your workload.
pg_bigm.last_update
Read-only. The date on which the pg_bigm extension was last updated.
pg_bigm.enable_recheck
Default: on
Controls whether the database performs a recheck after retrieving candidates from the GIN index. Leave this at the default value on.
Why this matters: GIN full-text search works in two phases. First, a bitmap index scan retrieves candidate tuples based on 2-gram matching. Then, a recheck (heap scan) verifies that each candidate actually matches the LIKE pattern—because 2-gram indexing can produce false positives. Setting enable_recheck to off skips this step, which may return incorrect results.
pg_bigm.gin_key_limit
Default: 0 (use all 2-gram elements)
Sets the maximum number of 2-gram elements used during a full-text search query. The default value of 0 means all 2-gram elements of the search keyword are used.
When to change this: For very long search keywords, using all 2-gram elements can slow down queries significantly. Set this parameter to a small positive value to limit the number of 2-gram elements used and improve performance. Note that using fewer elements increases the number of false-positive candidates that the recheck phase must filter out.
pg_bigm.similarity_limit
Sets the minimum similarity threshold for the =% similarity search operator. Only tuples whose similarity exceeds this threshold are returned.
When to change this: Increase the value to narrow results when similarity queries return too many loosely matching rows. Decrease it to broaden results for shorter or less precise search terms.
Appendix: operators and index operator classes
Operators
| Operator | Return type | Description |
|---|---|---|
text % text | Boolean | Returns true if the first string is similar to the second string. Uses GIN index to accelerate the search. |
text =% text | Boolean | Returns true if the similarity between the two strings meets the threshold set by pg_bigm.similarity_limit. |
Index operator classes
| Operator class | Description |
|---|---|
gin_bigm_ops | Converts text into a 2-gram set and stores it in a GIN index. |
Function reference
| Function | Return type | Description |
|---|---|---|
likequery(text) | text | Converts a search keyword into a LIKE-compatible pattern string by wrapping it with % and escaping special characters. |
show_bigm(text) | text[] | Returns an array of all 2-gram elements in the input string. |
bigm_similarity(text, text) | real | Returns a floating-point number from 0 to 1 representing the greatest similarity between the 2-gram sets of two strings. |
pg_gin_pending_stats(index) | (pages int, tuples bigint) | Returns the number of pages and tuples in the pending list of the specified GIN index. |