All Products
Search
Document Center

PolarDB:pg_bigm

Last Updated:Mar 28, 2026

The pg_bigm extension in PolarDB for PostgreSQL creates a 2-gram Generalized Inverted Index (GIN) index that is used to accelerate full-text search.

Prerequisites

Before you begin, ensure that you have:

  • A PolarDB for PostgreSQL cluster running one of the following engine versions:

    • PostgreSQL 14 (revision version 14.5.2.0 or later)

    • PostgreSQL 11 (revision version 1.1.28 or later)

Note

To check your revision version, run the appropriate statement for your engine version:

  • PostgreSQL 14: SELECT version();

  • PostgreSQL 11: SHOW polar_version;

pg_bigm vs. pg_trgm

PolarDB for PostgreSQL also includes pg_trgm, which uses a 3-gram model. The table below summarizes the differences to help you choose between the two extensions.

Featurepg_trgmpg_bigm
Phrase matching model3-gram2-gram
Index typesGIN and Generalized Search Tree (GiST)GIN
OperatorsLIKE, ILIKE, ~, ~*LIKE
Non-alphabetic full-text searchNot supportedSupported
Full-text search with 1–2 character keywordsSlowFast
Similarity searchSupportedSupported
Max indexed column size238,609,291 bytes (~228 MB)107,374,180 bytes (~102 MB)

Limitations

Indexed column size

The column on which you create a GIN index cannot exceed 107,374,180 bytes (~102 MB). Sample statement:

CREATE TABLE t1 (description text);
CREATE INDEX t1_idx ON t1 USING gin (description gin_bigm_ops);

INSERT INTO t1 SELECT repeat('A', 107374181);

Usage notes

  • If your data contains non-ASCII characters, use UTF-8 encoding for accurate 2-gram tokenization. To check the current database encoding:

    SELECT pg_encoding_to_char(encoding)
    FROM pg_database
    WHERE datname = current_database();

Enable pg_bigm

CREATE EXTENSION pg_bigm;

To remove the extension:

DROP EXTENSION pg_bigm;

Create GIN indexes

When creating a GIN index with pg_bigm, specify the gin_bigm_ops operator class.

-- Create a table and insert sample data
CREATE TABLE pg_tools (tool text, description text);

INSERT INTO pg_tools VALUES ('pg_hint_plan', 'Tool that allows a user to specify an optimizer HINT to PostgreSQL');
INSERT INTO pg_tools VALUES ('pg_dbms_stats', 'Tool that allows a user to stabilize planner statistics in PostgreSQL');
INSERT INTO pg_tools VALUES ('pg_bigm', 'Tool that provides 2-gram full text search capability in PostgreSQL');
INSERT INTO pg_tools VALUES ('pg_trgm', 'Tool that provides 3-gram full text search capability in PostgreSQL');

-- Single-column GIN index
CREATE INDEX pg_tools_idx ON pg_tools USING gin (description gin_bigm_ops);

-- Multi-column GIN index with FASTUPDATE disabled
CREATE INDEX pg_tools_multi_idx ON pg_tools USING gin (tool gin_bigm_ops, description gin_bigm_ops) WITH (FASTUPDATE = off);

Run full-text searches

Use the LIKE operator to search indexed columns:

SELECT * FROM pg_tools WHERE description LIKE '%search%';

Result:

   tool   |                             description
----------+---------------------------------------------------------------------
 pg_bigm  | Tool that provides 2-gram full text search capability in PostgreSQL
 pg_trgm  | Tool that provides 3-gram full text search capability in PostgreSQL
(2 rows)

Use the =% operator to run similarity searches:

SELECT tool FROM pg_tools WHERE tool =% 'bigm';

Result:

   tool
---------
 pg_bigm
(1 row)

Functions

likequery

In pg_bigm, full-text search uses LIKE pattern matching, which means the search keyword must be wrapped in % delimiters and any literal % in the keyword must be escaped. Client applications normally handle this escaping themselves. The likequery function saves that effort by converting a keyword into a LIKE-compatible pattern string automatically.

Syntax: likequery(keyword text) → text

The function adds % before and after the keyword and escapes any literal % characters with a backslash.

Examples:

SELECT likequery('pg_bigm has improved the full text search performance by 200%');

Result:

                             likequery
-------------------------------------------------------------------
 %pg\_bigm has improved the full text search performance by 200\%%
(1 row)
SELECT * FROM pg_tools WHERE description LIKE likequery('search');

Result:

   tool   |                             description
----------+---------------------------------------------------------------------
 pg_bigm  | Tool that provides 2-gram full text search capability in PostgreSQL
 pg_trgm  | Tool that provides 3-gram full text search capability in PostgreSQL
(2 rows)

show_bigm

Returns all 2-gram elements of a string as an array. The function pads the input with a leading and trailing space before extracting 2-gram elements.

Syntax: show_bigm(string text) → text[]

Example:

SELECT show_bigm('full text search');

Result:

                            show_bigm
------------------------------------------------------------------
 {" f"," s"," t",ar,ch,ea,ex,fu,"h ","l ",ll,rc,se,"t ",te,ul,xt}
(1 row)

bigm_similarity

Returns a floating-point similarity score between two strings, ranging from 0 (completely different) to 1 (identical). The score is calculated from the 2-gram elements shared between the two strings. The function pads each string with a leading and trailing space before comparison.

Syntax: bigm_similarity(string1 text, string2 text) → float4

Note

This function is case-sensitive. For example, this function determines that the similarity between the ABC string and the abc string is 0. Also, because the function pads each string with spaces, bigm_similarity('ABC', 'B') returns 0, while bigm_similarity('ABC', 'A') returns 0.25.

Examples:

SELECT bigm_similarity('full text search', 'text similarity search');
-- Result: 0.571429

SELECT bigm_similarity('ABC', 'A');
-- Result: 0.25

SELECT bigm_similarity('ABC', 'B');
-- Result: 0

SELECT bigm_similarity('ABC', 'abc');
-- Result: 0 (case-sensitive)

pg_gin_pending_stats

Returns the number of pages and tuples in the pending list of a GIN index.

Syntax: pg_gin_pending_stats(index regclass) → (pages int, tuples int)

Note

If FASTUPDATE is set to false for the index, the index has no pending list and this function returns (0, 0).

Example:

SELECT * FROM pg_gin_pending_stats('pg_tools_idx');

Result:

 pages | tuples
-------+--------
     0 |      0
(1 row)

Configuration parameters

pg_bigm.enable_recheck

Controls whether pg_bigm performs a recheck step after retrieving candidates from the GIN index.

How recheck works: Full-text search with pg_bigm runs in two phases:

  1. Candidate retrieval — the GIN index returns all rows whose indexed 2-gram elements overlap with the query keyword's 2-grams.

  2. Recheck — each candidate is re-evaluated against the original LIKE pattern to filter out false positives.

False positives arise because unrelated strings can share the same 2-grams. For example, searching for trial produces the 2-grams tr, ri, ia, and al. The string "It was a trivial mistake" contains all four 2-grams and is returned as a candidate, even though it does not match %trial%. Recheck eliminates it.

ValueBehavior
on (default)Recheck is performed; results are accurate
offRecheck is skipped; false positives may appear in results

Keep pg_bigm.enable_recheck set to on to get accurate results.

Example — recheck enabled (default):

CREATE TABLE tbl (doc text);

INSERT INTO tbl VALUES('He is awaiting trial');
INSERT INTO tbl VALUES('It was a trivial mistake');

CREATE INDEX tbl_idx ON tbl USING gin (doc gin_bigm_ops);

SET enable_seqscan TO off;
EXPLAIN ANALYZE SELECT * FROM tbl WHERE doc LIKE likequery('trial');

Result:

                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl  (cost=20.00..24.01 rows=1 width=32) (actual time=0.020..0.021 rows=1 loops=1)
   Recheck Cond: (doc ~~ '%trial%'::text)
   Rows Removed by Index Recheck: 1
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on tbl_idx  (cost=0.00..20.00 rows=1 width=0) (actual time=0.013..0.013 rows=2 loops=1)
         Index Cond: (doc ~~ '%trial%'::text)
 Planning Time: 0.117 ms
 Execution Time: 0.043 ms
(8 rows)

The index returns 2 candidates; recheck eliminates the false positive, returning only the correct row:

SELECT * FROM tbl WHERE doc LIKE likequery('trial');

Result:

         doc
----------------------
 He is awaiting trial
(1 row)

Example — recheck disabled:

SET pg_bigm.enable_recheck = off;
SELECT * FROM tbl WHERE doc LIKE likequery('trial');

Result:

           doc
--------------------------
 He is awaiting trial
 It was a trivial mistake
(2 rows)

With recheck disabled, the false positive "It was a trivial mistake" appears in the results.

pg_bigm.gin_key_limit

Sets the maximum number of 2-gram elements used when searching the GIN index. The default value is 0, which means all 2-gram elements of the query keyword are used.

If long keywords degrade query performance, reduce this value to limit the number of 2-grams evaluated during index lookup.

pg_bigm.similarity_limit

Sets the minimum similarity score threshold for similarity searches using the =% operator. Only rows with a bigm_similarity score above this threshold are returned.