The pg_bigm extension creates a 2-gram Generalized Inverted Index (GIN) index to accelerate full-text search in PolarDB for PostgreSQL (Compatible with Oracle). It supports non-alphabet languages and delivers faster results for short keywords (1–2 characters) than the built-in pg_trgm extension.
Prerequisites
Before you begin, ensure that you have:
A PolarDB for PostgreSQL (Compatible with Oracle) cluster running one of the following revision versions:
Version 2.0: revision version 2.0.14.2.0 or later
Version 1.0: revision version 1.1.28 or later
To check the revision version of your cluster, run:
SHOW polar_version;pg_bigm vs pg_trgm
pg_bigm is developed based on pg_trgm. The table below summarizes the key differences to help you choose.
| Functionality | pg_trgm | pg_bigm |
|---|---|---|
| Phrase matching model | 3-gram | 2-gram |
| Index types | GIN and GiST (Generalized Search Tree) | GIN only |
| Operators | LIKE, ILIKE, ~, ~* | LIKE only |
| Non-alphabet full-text search | Not supported | Supported |
| Full-text search with 1–2 character keywords | Slow (1) | Fast |
| Similarity search | Supported | Supported |
| Maximum indexed column size | 238,609,291 bytes (~228 MB) | 107,374,180 bytes (~102 MB) |
(1) pg_trgm falls back to a full index scan rather than a normal index scan for keywords of 1–2 characters, which is significantly slower.
Choose pg_bigm when you need to search non-alphabet text (such as CJK characters) or frequently search with short keywords of 1–2 characters.
Usage notes
The GIN index column size limit is 107,374,180 bytes (~102 MB). Creating an index on a larger column fails. For example:
CREATE TABLE t1 (description text); CREATE INDEX t1_idx ON t1 USING gin (description gin_bigm_ops); INSERT INTO t1 SELECT repeat('A', 107374181); -- fails: exceeds the size limitIf your data is not ASCII-encoded, use UTF-8 encoding. To check the current encoding:
SELECT pg_encoding_to_char(encoding) FROM pg_database WHERE datname = current_database();
Basic operations
Enable the extension
CREATE EXTENSION pg_bigm;Create a GIN index
When creating a GIN index with pg_bigm, specify the gin_bigm_ops operator class.
CREATE TABLE pg_tools (tool text, description text);
INSERT INTO pg_tools VALUES ('pg_hint_plan', 'Tool that allows a user to specify an optimizer HINT to PostgreSQL');
INSERT INTO pg_tools VALUES ('pg_dbms_stats', 'Tool that allows a user to stabilize planner statistics in PostgreSQL');
INSERT INTO pg_tools VALUES ('pg_bigm', 'Tool that provides 2-gram full text search capability in PostgreSQL');
INSERT INTO pg_tools VALUES ('pg_trgm', 'Tool that provides 3-gram full text search capability in PostgreSQL');
-- Single-column index
CREATE INDEX pg_tools_idx ON pg_tools USING gin (description gin_bigm_ops);
-- Multi-column index with FASTUPDATE disabled
CREATE INDEX pg_tools_multi_idx ON pg_tools USING gin (tool gin_bigm_ops, description gin_bigm_ops) WITH (FASTUPDATE = off);Run a full-text search
Use the LIKE operator to search indexed columns.
SELECT * FROM pg_tools WHERE description LIKE '%search%';Result:
tool | description
----------+---------------------------------------------------------------------
pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
(2 rows)Run a similarity search
Use the =% operator to find strings similar to a keyword.
SELECT tool FROM pg_tools WHERE tool =% 'bigm';Result:
tool
---------
pg_bigm
(1 row)Disable the extension
DROP EXTENSION pg_bigm;Built-in functions
likequery
Generates a LIKE-compatible search string by wrapping the keyword in % and escaping any % characters in the input.
Syntax: likequery(keyword text) → text
Example 1: Escape a literal % in the input.
SELECT likequery('pg_bigm has improved the full text search performance by 200%');Result:
likequery
-------------------------------------------------------------------
%pg\_bigm has improved the full text search performance by 200\%%
(1 row)Example 2: Use likequery in a WHERE clause to avoid manually writing % patterns.
SELECT * FROM pg_tools WHERE description LIKE likequery('search');Result:
tool | description
----------+---------------------------------------------------------------------
pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
(2 rows)show_bigm
Returns an array of all 2-gram elements extracted from a string. The function pads the input with a leading and trailing space before extraction.
Syntax: show_bigm(string text) → text[]
Example:
SELECT show_bigm('full text search');Result:
show_bigm
------------------------------------------------------------------
{" f"," s"," t",ar,ch,ea,ex,fu,"h ","l ",ll,rc,se,"t ",te,ul,xt}
(1 row)bigm_similarity
Returns a floating-point similarity score between two strings, based on their shared 2-gram elements. The score ranges from 0 (completely different) to 1 (identical).
Syntax: bigm_similarity(string1 text, string2 text) → float4
The function pads each string with a leading and trailing space before comparison, and the comparison is case-sensitive.
Examples:
SELECT bigm_similarity('full text search', 'text similarity search');Result:
bigm_similarity
-----------------
0.571429
(1 row)-- Space padding means 'A' shares the " A" bigram with 'ABC' but not the "AB" or "BC" bigrams
SELECT bigm_similarity('ABC', 'A');Result:
bigm_similarity
-----------------
0.25
(1 row)-- 'B' has no overlapping bigrams with 'ABC' after space padding
SELECT bigm_similarity('ABC', 'B');Result:
bigm_similarity
-----------------
0
(1 row)-- Case-sensitive: 'abc' and 'ABC' share no bigrams
SELECT bigm_similarity('ABC', 'abc');Result:
bigm_similarity
-----------------
0
(1 row)pg_gin_pending_stats
Returns the number of pages and tuples in the pending list of a GIN index.
Syntax: pg_gin_pending_stats(index regclass) → (pages int, tuples int)
IfFASTUPDATEis set tofalsefor the index, the GIN index has no pending list, and this function returns0, 0.
Example:
SELECT * FROM pg_gin_pending_stats('pg_tools_idx');Result:
pages | tuples
-------+--------
0 | 0
(1 row)Configuration parameters
pg_bigm.enable_recheck
Controls whether to apply a recheck step after retrieving candidates from the GIN index.
Default: on
How full-text search works internally: pg_bigm splits the search keyword into 2-gram elements and uses the GIN index to retrieve candidate rows. Because the index lookup is approximate, the candidate set may include false positives — rows that contain the matching 2-grams but not the actual keyword. The recheck step filters out these false positives by evaluating each candidate against the original search condition.
For example, when searching for trial, both "He is awaiting trial" and "It was a trivial mistake" are retrieved as candidates because trivial contains all the 2-grams of trial (tr, ri, ia, al). With recheck enabled, only the correct result is returned.
Example — recheck enabled (default):
CREATE TABLE tbl (doc text);
INSERT INTO tbl VALUES('He is awaiting trial');
INSERT INTO tbl VALUES('It was a trivial mistake');
CREATE INDEX tbl_idx ON tbl USING gin (doc gin_bigm_ops);
SET enable_seqscan TO off;
EXPLAIN ANALYZE SELECT * FROM tbl WHERE doc LIKE likequery('trial');Result:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl (cost=20.00..24.01 rows=1 width=32) (actual time=0.020..0.021 rows=1 loops=1)
Recheck Cond: (doc ~~ '%trial%'::text)
Rows Removed by Index Recheck: 1
Heap Blocks: exact=1
-> Bitmap Index Scan on tbl_idx (cost=0.00..20.00 rows=1 width=0) (actual time=0.013..0.013 rows=2 loops=1)
Index Cond: (doc ~~ '%trial%'::text)
Planning Time: 0.117 ms
Execution Time: 0.043 ms
(8 rows)The index returns 2 candidates; the recheck removes the false positive, leaving 1 correct result:
SELECT * FROM tbl WHERE doc LIKE likequery('trial');Result:
doc
----------------------
He is awaiting trial
(1 row)Example — recheck disabled:
With pg_bigm.enable_recheck = off, the recheck step is skipped and false positives are included in the results.
SET pg_bigm.enable_recheck = off;
SELECT * FROM tbl WHERE doc LIKE likequery('trial');Result:
doc
--------------------------
He is awaiting trial
It was a trivial mistake
(2 rows)Keep pg_bigm.enable_recheck at its default value (on) to get accurate results.
pg_bigm.gin_key_limit
Sets the maximum number of 2-gram elements used during a GIN index scan.
Default: 0 (all 2-gram elements are used)
Using all 2-grams produces the most accurate candidate set, but long keywords generate many 2-grams and can increase the cost of the GIN index scan. Reduce this value to limit the number of 2-grams used and speed up the scan.
However, using fewer 2-grams increases the number of false positives in the candidate set, which in turn increases the workload on the recheck step. Tune this parameter only when GIN index scan performance is a bottleneck, and test the impact on result accuracy.
pg_bigm.similarity_limit
Sets the similarity threshold for similarity search. Only rows whose similarity score meets or exceeds this threshold are returned by the =% operator.