All Products
Search
Document Center

AnalyticDB:pg_bigm

Last Updated:Mar 28, 2026

pg_bigm is an open source third-party extension supported by AnalyticDB for PostgreSQL. It provides full-text search capabilities by creating a 2-gram Generalized Inverted Index (GIN) index to accelerate the search process.

The extension extracts 2-character fragments (bigrams) from text. Fragments shorter than two characters are padded with a leading or trailing space:

postgres=> SELECT show_bigm('full text search');
                           show_bigm
------------------------------------------------------------------
 {" f"," s"," t",ar,ch,ea,ex,fu,"h ","l ",ll,rc,se,"t ",te,ul,xt}
(1 row)

Prerequisites

Before you begin, ensure that you have:

Usage notes

  • The GIN index column size limit is 107,374,180 bytes (approximately 102 MB). Inserting data that exceeds this limit returns an out-of-memory error:

    postgres=> CREATE TABLE t1 (description text);
    CREATE TABLE
    
    postgres=> CREATE INDEX t1_idx ON t1 USING gin (description gin_bigm_ops);
    CREATE INDEX
    
    postgres=> INSERT INTO t1 SELECT repeat('A', 107374181);
    ERROR:OUT OF memory
  • For databases that store non-ASCII data, change the encoding to UTF-8. Run the following statement to check the current encoding:

    SELECT pg_encoding_to_char(encoding)
    FROM pg_database
    WHERE datname = current_database();

Install the extension

Install pg_bigm on the Extensions page of your AnalyticDB for PostgreSQL instance. For details, see Install, update, and uninstall extensions.

Create an index

Create a table and a GIN index using the gin_bigm_ops operator class:

CREATE TABLE pg_tools (tool text, description text);

INSERT INTO pg_tools VALUES ('pg_hint_plan', 'Tool that allows a user to specify an optimizer hint to PostgreSQL');
INSERT INTO pg_tools VALUES ('pg_dbms_stats', 'Tool that allows a user to stabilize planner statistics in PostgreSQL');
INSERT INTO pg_tools VALUES ('pg_bigm', 'Tool that provides 2-gram full text search capability in PostgreSQL');
INSERT INTO pg_tools VALUES ('pg_trgm', 'Tool that provides 3-gram full text search capability in PostgreSQL');

-- Single-column index
CREATE INDEX pg_tools_idx ON pg_tools USING gin (description gin_bigm_ops);

-- Multi-column index with fastupdate disabled
CREATE INDEX pg_tools_multi_idx ON pg_tools USING gin (tool gin_bigm_ops, description gin_bigm_ops) WITH (FASTUPDATE = off);

Perform full-text search

Use a LIKE query against an indexed column. pg_bigm uses the GIN index automatically:

postgres=> SELECT * FROM pg_tools WHERE description LIKE '%search%';
  tool   |                      description
---------+---------------------------------------------------------------------
 pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
 pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
(2 rows)

Perform similarity search

Use the =% operator to find strings whose similarity to the search term meets the threshold set by pg_bigm.similarity_limit:

postgres=> SET pg_bigm.similarity_limit TO 0.2;
SET
postgres=> SELECT tool FROM pg_tools WHERE tool =% 'bigm';
  tool
---------
 pg_bigm
 pg_trgm
(2 rows)

Functions

likequery()

likequery(text) converts a search keyword into a pattern string that LIKE can handle. It wraps the keyword with % and escapes any %, _, or \ characters in the input:

postgres=> SELECT likequery('pg_bigm has improved the full text search performance by 200%');
                            likequery
-------------------------------------------------------------------
 %pg\_bigm has improved the full text search performance by 200\%%
(1 row)

postgres=> SELECT * FROM pg_tools WHERE description LIKE likequery('search');
  tool   |                      description
---------+---------------------------------------------------------------------
 pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
 pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
(2 rows)

show_bigm()

show_bigm(text) returns an array of all 2-gram elements in a string:

postgres=> SELECT show_bigm('full text search');
                           show_bigm
------------------------------------------------------------------
 {" f"," s"," t",ar,ch,ea,ex,fu,"h ","l ",ll,rc,se,"t ",te,ul,xt}
(1 row)

bigm_similarity()

bigm_similarity(text, text) returns a floating-point number from 0 to 1 representing the greatest similarity between the 2-gram sets of two strings. A value of 0 means completely dissimilar; a value of 1 means identical.

Important

Keep the following behavior in mind when using this function:

  • bigm_similarity() is case-sensitive. The similarity between 'ABC' and 'abc' is 0.

  • The 2-gram algorithm pads short strings with spaces. As a result, the similarity between 'ABC' and 'B' is 0, while the similarity between 'ABC' and 'A' is 0.25.

postgres=> SELECT bigm_similarity('full text search', 'text similarity search');
 bigm_similarity
-----------------
       0.5714286
(1 row)

postgres=> SELECT bigm_similarity('ABC', 'A');
 bigm_similarity
-----------------
            0.25
(1 row)

postgres=> SELECT bigm_similarity('ABC', 'B');
 bigm_similarity
-----------------
               0
(1 row)

postgres=> SELECT bigm_similarity('ABC', 'abc');
 bigm_similarity
-----------------
               0
(1 row)

pg_gin_pending_stats()

pg_gin_pending_stats(index) returns the number of pages and tuples in the pending list of a GIN index. This is useful when fastupdate is enabled (the default), because GIN indexes batch new entries in a pending list before merging them into the main index:

postgres=> SELECT * FROM pg_gin_pending_stats('pg_tools_idx');
 pages | tuples
-------+--------
     0 |      0
(1 row)

When an index is created with fastupdate = off, there is no pending list and the function always returns 0:

CREATE INDEX ON your_table_name (your_column_name gin_bigm_ops) WITH (fastupdate = off);

GUC parameters

pg_bigm exposes the following Grand Unified Configuration (GUC) parameters. Adjust them based on your workload.

pg_bigm.last_update

Read-only. The date on which the pg_bigm extension was last updated.

pg_bigm.enable_recheck

Default: on

Controls whether the database performs a recheck after retrieving candidates from the GIN index. Leave this at the default value on.

Why this matters: GIN full-text search works in two phases. First, a bitmap index scan retrieves candidate tuples based on 2-gram matching. Then, a recheck (heap scan) verifies that each candidate actually matches the LIKE pattern—because 2-gram indexing can produce false positives. Setting enable_recheck to off skips this step, which may return incorrect results.

pg_bigm.gin_key_limit

Default: 0 (use all 2-gram elements)

Sets the maximum number of 2-gram elements used during a full-text search query. The default value of 0 means all 2-gram elements of the search keyword are used.

When to change this: For very long search keywords, using all 2-gram elements can slow down queries significantly. Set this parameter to a small positive value to limit the number of 2-gram elements used and improve performance. Note that using fewer elements increases the number of false-positive candidates that the recheck phase must filter out.

pg_bigm.similarity_limit

Sets the minimum similarity threshold for the =% similarity search operator. Only tuples whose similarity exceeds this threshold are returned.

When to change this: Increase the value to narrow results when similarity queries return too many loosely matching rows. Decrease it to broaden results for shorter or less precise search terms.

Appendix: operators and index operator classes

Operators

OperatorReturn typeDescription
text % textBooleanReturns true if the first string is similar to the second string. Uses GIN index to accelerate the search.
text =% textBooleanReturns true if the similarity between the two strings meets the threshold set by pg_bigm.similarity_limit.

Index operator classes

Operator classDescription
gin_bigm_opsConverts text into a 2-gram set and stores it in a GIN index.

Function reference

FunctionReturn typeDescription
likequery(text)textConverts a search keyword into a LIKE-compatible pattern string by wrapping it with % and escaping special characters.
show_bigm(text)text[]Returns an array of all 2-gram elements in the input string.
bigm_similarity(text, text)realReturns a floating-point number from 0 to 1 representing the greatest similarity between the 2-gram sets of two strings.
pg_gin_pending_stats(index)(pages int, tuples bigint)Returns the number of pages and tuples in the pending list of the specified GIN index.