pg_bigm is an extension for Alibaba Cloud RDS for PostgreSQL that enables full-text search using a 2-gram Generalized Inverted Index (GIN). It is effective for non-alphabetic languages such as Japanese and Chinese, and for short keywords of 1–2 characters.
When to use pg_bigm
pg_bigm and pg_trgm are both full-text search extensions for RDS for PostgreSQL. Use the following table to choose the right one.
| Feature | pg_trgm | pg_bigm |
|---|---|---|
| Phrase match method | 3-gram | 2-gram |
| Supported index types | GIN and GiST | GIN |
| Supported full-text search operators | LIKE, ILIKE, ~, and ~* | LIKE |
| Full-text search for non-alphabetic languages | Not supported | Supported |
| Full-text search for keywords with 1–2 characters | Slow | Fast |
| Similarity search | Supported | Supported |
| Maximum size of an indexable column | 238,609,291 bytes (~228 MB) | 107,374,180 bytes (~102 MB) |
Choose pg_bigm when your data includes non-alphabetic content or when you frequently search for very short keywords. Use pg_trgm when you need case-insensitive search (ILIKE) or regular expression operators (~, ~*).
Prerequisites
Before you begin, make sure you have:
An RDS for PostgreSQL instance that meets the following version requirements:
Importantpg_bigm cannot be created on instances running a minor engine version earlier than 20230830. If your instance runs an older version and you already use this extension, its functionality is not affected. To install or reinstall the extension, upgrade the minor engine version to the latest version. For details, see Restrictions on creating extensions.
Major version Minor engine version PostgreSQL 17 20250830 or later PostgreSQL 16 No restrictions PostgreSQL 10–15 20230830 or later pg_bigmadded to the Value of theshared_preload_librariesparameter. For example:'pg_stat_statements,auto_explain,pg_bigm'. For instructions, see Set instance parameters.
Limitations
The GIN index column size cannot exceed 107,374,180 bytes (~102 MB).
If the database contains non-ASCII content, set the database character encoding to UTF8. To check the current character encoding, run:
SELECT pg_encoding_to_char(encoding) FROM pg_database WHERE datname = current_database();
Basic operations
Create the extension
postgres=> CREATE EXTENSION pg_bigm;
CREATE EXTENSIONCreate a GIN index
Create a table, insert data, and build a GIN index using gin_bigm_ops:
postgres=> CREATE TABLE pg_tools (tool text, description text);
CREATE TABLE
postgres=> INSERT INTO pg_tools VALUES ('pg_hint_plan', 'Tool that allows a user to specify an optimizer HINT to PostgreSQL');
INSERT 0 1
postgres=> INSERT INTO pg_tools VALUES ('pg_dbms_stats', 'Tool that allows a user to stabilize planner statistics in PostgreSQL');
INSERT 0 1
postgres=> INSERT INTO pg_tools VALUES ('pg_bigm', 'Tool that provides 2-gram full text search capability in PostgreSQL');
INSERT 0 1
postgres=> INSERT INTO pg_tools VALUES ('pg_trgm', 'Tool that provides 3-gram full text search capability in PostgreSQL');
INSERT 0 1
-- Single-column GIN index
postgres=> CREATE INDEX pg_tools_idx ON pg_tools USING gin (description gin_bigm_ops);
CREATE INDEX
-- Multi-column GIN index with FASTUPDATE disabled
postgres=> CREATE INDEX pg_tools_multi_idx ON pg_tools USING gin (tool gin_bigm_ops, description gin_bigm_ops) WITH (FASTUPDATE = off);
CREATE INDEXPerform a full-text search
Use the LIKE operator with the %keyword% pattern:
postgres=> SELECT * FROM pg_tools WHERE description LIKE '%search%';
tool | description
---------+---------------------------------------------------------------------
pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
(2 rows)Perform a similarity search
Use the =% operator with pg_bigm.similarity_limit to control the match threshold:
postgres=> SET pg_bigm.similarity_limit TO 0.2;
SET
postgres=> SELECT tool FROM pg_tools WHERE tool =% 'bigm';
tool
---------
pg_bigm
pg_trgm
(2 rows)Uninstall the extension
postgres=> DROP EXTENSION pg_bigm;
DROP EXTENSIONExtension functions
likequery
In pg_bigm, full-text search uses LIKE pattern matching, which requires the search keyword to be wrapped with % and any literal % characters to be escaped. Use likequery to handle this conversion automatically instead of implementing the escaping logic in your application.
Parameter: A single string.
Return value: A
LIKE-compatible search string with%added before and after the keyword, and literal%characters escaped with\.
postgres=> SELECT likequery('pg_bigm has improved the full text search performance by 200%');
likequery
-------------------------------------------------------------------
%pg\_bigm has improved the full text search performance by 200\%%
(1 row)
postgres=> SELECT * FROM pg_tools WHERE description LIKE likequery('search');
tool | description
---------+---------------------------------------------------------------------
pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
(2 rows)show_bigm
Returns an array of all 2-gram elements for a given string. The function adds a space before and after the string before computing 2-gram substrings.
Parameter: A single string.
Return value: An array of all 2-gram substrings.
postgres=> SELECT show_bigm('full text search');
show_bigm
------------------------------------------------------------------
{" f"," s"," t",ar,ch,ea,ex,fu,"h ","l ",ll,rc,se,"t ",te,ul,xt}
(1 row)bigm_similarity
Calculates the similarity between two strings based on their common 2-gram elements.
Parameters: Two strings.
Return value: A floating-point number between 0 and 1, where 0 means the strings are completely different and 1 means they are identical.
Because spaces are added before and after the strings during 2-gram calculation, the similarity between ABC and B is 0, and the similarity between ABC and A is 0.25. The function is case-sensitive: the similarity between ABC and abc is 0.
postgres=> SELECT bigm_similarity('full text search', 'text similarity search');
bigm_similarity
-----------------
0.5714286
(1 row)
postgres=> SELECT bigm_similarity('ABC', 'A');
bigm_similarity
-----------------
0.25
(1 row)
postgres=> SELECT bigm_similarity('ABC', 'B');
bigm_similarity
-----------------
0
(1 row)
postgres=> SELECT bigm_similarity('ABC', 'abc');
bigm_similarity
-----------------
0
(1 row)pg_gin_pending_stats
Returns the number of pages and tuples in the pending list of a GIN index.
Parameter: The name or OID of the GIN index.
Return values: The number of pages and the number of tuples in the pending list.
If the GIN index was created with FASTUPDATE = false, no pending list exists and the function returns 0.
postgres=> SELECT * FROM pg_gin_pending_stats('pg_tools_idx');
pages | tuples
-------+--------
0 | 0
(1 row)Parameters
| Parameter | Description | Default |
|---|---|---|
pg_bigm.last_update | The last update date of the extension. Read-only. | — |
pg_bigm.enable_recheck | Controls whether a recheck is performed after a GIN index scan. Keep this set to on to return accurate results. | on |
pg_bigm.gin_key_limit | The maximum number of 2-gram elements used during a full-text search. Set to 0 to use all elements. Reduce this value if using all elements degrades performance. | 0 |
pg_bigm.similarity_limit | The similarity threshold for similarity search. Only tuples with a similarity score above this threshold are returned. | — |
pg_bigm.last_update
SHOW pg_bigm.last_update;pg_bigm.enable_recheck
When enable_recheck is on (the default), the GIN index scan results are rechecked against the actual query condition, filtering out false positives.
The following example demonstrates the difference. With enable_recheck = off, the word "trivial" is incorrectly returned when searching for "trial" because both share the same 2-gram elements:
postgres=> CREATE TABLE tbl (doc text);
CREATE TABLE
postgres=> INSERT INTO tbl VALUES('He is awaiting trial');
INSERT 0 1
postgres=> INSERT INTO tbl VALUES('It was a trivial mistake');
INSERT 0 1
postgres=> CREATE INDEX tbl_idx ON tbl USING gin (doc gin_bigm_ops);
CREATE INDEX
postgres=> SET enable_seqscan TO off;
SET
postgres=> EXPLAIN ANALYZE SELECT * FROM tbl WHERE doc LIKE likequery('trial');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl (cost=20.00..24.01 rows=1 width=32) (actual time=0.020..0.021 rows=1 loops=1)
Recheck Cond: (doc ~~ '%trial%'::text)
Rows Removed by Index Recheck: 1
Heap Blocks: exact=1
-> Bitmap Index Scan on tbl_idx (cost=0.00..20.00 rows=1 width=0) (actual time=0.013..0.013 rows=2 loops=1)
Index Cond: (doc ~~ '%trial%'::text)
Planning Time: 0.117 ms
Execution Time: 0.043 ms
(8 rows)
-- With enable_recheck = on (default): correct result
postgres=> SELECT * FROM tbl WHERE doc LIKE likequery('trial');
doc
----------------------
He is awaiting trial
(1 row)
-- With enable_recheck = off: false positive included
postgres=> SET pg_bigm.enable_recheck = off;
SET
postgres=> SELECT * FROM tbl WHERE doc LIKE likequery('trial');
doc
--------------------------
He is awaiting trial
It was a trivial mistake
(2 rows)pg_bigm.gin_key_limit
pg_bigm.similarity_limit
-- Return matches with similarity above 0.2
SET pg_bigm.similarity_limit TO 0.2;