The pg_bigm extension in PolarDB for PostgreSQL creates a 2-gram Generalized Inverted Index (GIN) index that is used to accelerate full-text search.
Prerequisites
Before you begin, ensure that you have:
A PolarDB for PostgreSQL cluster running one of the following engine versions:
PostgreSQL 14 (revision version 14.5.2.0 or later)
PostgreSQL 11 (revision version 1.1.28 or later)
To check your revision version, run the appropriate statement for your engine version:
PostgreSQL 14:
SELECT version();PostgreSQL 11:
SHOW polar_version;
pg_bigm vs. pg_trgm
PolarDB for PostgreSQL also includes pg_trgm, which uses a 3-gram model. The table below summarizes the differences to help you choose between the two extensions.
| Feature | pg_trgm | pg_bigm |
|---|---|---|
| Phrase matching model | 3-gram | 2-gram |
| Index types | GIN and Generalized Search Tree (GiST) | GIN |
| Operators | LIKE, ILIKE, ~, ~* | LIKE |
| Non-alphabetic full-text search | Not supported | Supported |
| Full-text search with 1–2 character keywords | Slow | Fast |
| Similarity search | Supported | Supported |
| Max indexed column size | 238,609,291 bytes (~228 MB) | 107,374,180 bytes (~102 MB) |
Limitations
Indexed column size
The column on which you create a GIN index cannot exceed 107,374,180 bytes (~102 MB). Sample statement:
CREATE TABLE t1 (description text);
CREATE INDEX t1_idx ON t1 USING gin (description gin_bigm_ops);
INSERT INTO t1 SELECT repeat('A', 107374181);Usage notes
If your data contains non-ASCII characters, use UTF-8 encoding for accurate 2-gram tokenization. To check the current database encoding:
SELECT pg_encoding_to_char(encoding) FROM pg_database WHERE datname = current_database();
Enable pg_bigm
CREATE EXTENSION pg_bigm;To remove the extension:
DROP EXTENSION pg_bigm;Create GIN indexes
When creating a GIN index with pg_bigm, specify the gin_bigm_ops operator class.
-- Create a table and insert sample data
CREATE TABLE pg_tools (tool text, description text);
INSERT INTO pg_tools VALUES ('pg_hint_plan', 'Tool that allows a user to specify an optimizer HINT to PostgreSQL');
INSERT INTO pg_tools VALUES ('pg_dbms_stats', 'Tool that allows a user to stabilize planner statistics in PostgreSQL');
INSERT INTO pg_tools VALUES ('pg_bigm', 'Tool that provides 2-gram full text search capability in PostgreSQL');
INSERT INTO pg_tools VALUES ('pg_trgm', 'Tool that provides 3-gram full text search capability in PostgreSQL');
-- Single-column GIN index
CREATE INDEX pg_tools_idx ON pg_tools USING gin (description gin_bigm_ops);
-- Multi-column GIN index with FASTUPDATE disabled
CREATE INDEX pg_tools_multi_idx ON pg_tools USING gin (tool gin_bigm_ops, description gin_bigm_ops) WITH (FASTUPDATE = off);Run full-text searches
Use the LIKE operator to search indexed columns:
SELECT * FROM pg_tools WHERE description LIKE '%search%';Result:
tool | description
----------+---------------------------------------------------------------------
pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
(2 rows)Use the =% operator to run similarity searches:
SELECT tool FROM pg_tools WHERE tool =% 'bigm';Result:
tool
---------
pg_bigm
(1 row)Functions
likequery
In pg_bigm, full-text search uses LIKE pattern matching, which means the search keyword must be wrapped in % delimiters and any literal % in the keyword must be escaped. Client applications normally handle this escaping themselves. The likequery function saves that effort by converting a keyword into a LIKE-compatible pattern string automatically.
Syntax: likequery(keyword text) → text
The function adds % before and after the keyword and escapes any literal % characters with a backslash.
Examples:
SELECT likequery('pg_bigm has improved the full text search performance by 200%');Result:
likequery
-------------------------------------------------------------------
%pg\_bigm has improved the full text search performance by 200\%%
(1 row)SELECT * FROM pg_tools WHERE description LIKE likequery('search');Result:
tool | description
----------+---------------------------------------------------------------------
pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
(2 rows)show_bigm
Returns all 2-gram elements of a string as an array. The function pads the input with a leading and trailing space before extracting 2-gram elements.
Syntax: show_bigm(string text) → text[]
Example:
SELECT show_bigm('full text search');Result:
show_bigm
------------------------------------------------------------------
{" f"," s"," t",ar,ch,ea,ex,fu,"h ","l ",ll,rc,se,"t ",te,ul,xt}
(1 row)bigm_similarity
Returns a floating-point similarity score between two strings, ranging from 0 (completely different) to 1 (identical). The score is calculated from the 2-gram elements shared between the two strings. The function pads each string with a leading and trailing space before comparison.
Syntax: bigm_similarity(string1 text, string2 text) → float4
This function is case-sensitive. For example, this function determines that the similarity between the ABC string and the abc string is 0. Also, because the function pads each string with spaces, bigm_similarity('ABC', 'B') returns 0, while bigm_similarity('ABC', 'A') returns 0.25.
Examples:
SELECT bigm_similarity('full text search', 'text similarity search');
-- Result: 0.571429
SELECT bigm_similarity('ABC', 'A');
-- Result: 0.25
SELECT bigm_similarity('ABC', 'B');
-- Result: 0
SELECT bigm_similarity('ABC', 'abc');
-- Result: 0 (case-sensitive)pg_gin_pending_stats
Returns the number of pages and tuples in the pending list of a GIN index.
Syntax: pg_gin_pending_stats(index regclass) → (pages int, tuples int)
If FASTUPDATE is set to false for the index, the index has no pending list and this function returns (0, 0).
Example:
SELECT * FROM pg_gin_pending_stats('pg_tools_idx');Result:
pages | tuples
-------+--------
0 | 0
(1 row)Configuration parameters
pg_bigm.enable_recheck
Controls whether pg_bigm performs a recheck step after retrieving candidates from the GIN index.
How recheck works: Full-text search with pg_bigm runs in two phases:
Candidate retrieval — the GIN index returns all rows whose indexed 2-gram elements overlap with the query keyword's 2-grams.
Recheck — each candidate is re-evaluated against the original
LIKEpattern to filter out false positives.
False positives arise because unrelated strings can share the same 2-grams. For example, searching for trial produces the 2-grams tr, ri, ia, and al. The string "It was a trivial mistake" contains all four 2-grams and is returned as a candidate, even though it does not match %trial%. Recheck eliminates it.
| Value | Behavior |
|---|---|
on (default) | Recheck is performed; results are accurate |
off | Recheck is skipped; false positives may appear in results |
Keep pg_bigm.enable_recheck set to on to get accurate results.
Example — recheck enabled (default):
CREATE TABLE tbl (doc text);
INSERT INTO tbl VALUES('He is awaiting trial');
INSERT INTO tbl VALUES('It was a trivial mistake');
CREATE INDEX tbl_idx ON tbl USING gin (doc gin_bigm_ops);
SET enable_seqscan TO off;
EXPLAIN ANALYZE SELECT * FROM tbl WHERE doc LIKE likequery('trial');Result:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl (cost=20.00..24.01 rows=1 width=32) (actual time=0.020..0.021 rows=1 loops=1)
Recheck Cond: (doc ~~ '%trial%'::text)
Rows Removed by Index Recheck: 1
Heap Blocks: exact=1
-> Bitmap Index Scan on tbl_idx (cost=0.00..20.00 rows=1 width=0) (actual time=0.013..0.013 rows=2 loops=1)
Index Cond: (doc ~~ '%trial%'::text)
Planning Time: 0.117 ms
Execution Time: 0.043 ms
(8 rows)The index returns 2 candidates; recheck eliminates the false positive, returning only the correct row:
SELECT * FROM tbl WHERE doc LIKE likequery('trial');Result:
doc
----------------------
He is awaiting trial
(1 row)Example — recheck disabled:
SET pg_bigm.enable_recheck = off;
SELECT * FROM tbl WHERE doc LIKE likequery('trial');Result:
doc
--------------------------
He is awaiting trial
It was a trivial mistake
(2 rows)With recheck disabled, the false positive "It was a trivial mistake" appears in the results.
pg_bigm.gin_key_limit
Sets the maximum number of 2-gram elements used when searching the GIN index. The default value is 0, which means all 2-gram elements of the query keyword are used.
If long keywords degrade query performance, reduce this value to limit the number of 2-grams evaluated during index lookup.
pg_bigm.similarity_limit
Sets the minimum similarity score threshold for similarity searches using the =% operator. Only rows with a bigm_similarity score above this threshold are returned.