All Products
Search
Document Center

ApsaraDB RDS:Fuzzy query (pg_bigm)

Last Updated:Mar 28, 2026

pg_bigm is an extension for Alibaba Cloud RDS for PostgreSQL that enables full-text search using a 2-gram Generalized Inverted Index (GIN). It is effective for non-alphabetic languages such as Japanese and Chinese, and for short keywords of 1–2 characters.

When to use pg_bigm

pg_bigm and pg_trgm are both full-text search extensions for RDS for PostgreSQL. Use the following table to choose the right one.

Featurepg_trgmpg_bigm
Phrase match method3-gram2-gram
Supported index typesGIN and GiSTGIN
Supported full-text search operatorsLIKE, ILIKE, ~, and ~*LIKE
Full-text search for non-alphabetic languagesNot supportedSupported
Full-text search for keywords with 1–2 charactersSlowFast
Similarity searchSupportedSupported
Maximum size of an indexable column238,609,291 bytes (~228 MB)107,374,180 bytes (~102 MB)

Choose pg_bigm when your data includes non-alphabetic content or when you frequently search for very short keywords. Use pg_trgm when you need case-insensitive search (ILIKE) or regular expression operators (~, ~*).

Prerequisites

Before you begin, make sure you have:

  • An RDS for PostgreSQL instance that meets the following version requirements:

    Important

    pg_bigm cannot be created on instances running a minor engine version earlier than 20230830. If your instance runs an older version and you already use this extension, its functionality is not affected. To install or reinstall the extension, upgrade the minor engine version to the latest version. For details, see Restrictions on creating extensions.

    Major versionMinor engine version
    PostgreSQL 1720250830 or later
    PostgreSQL 16No restrictions
    PostgreSQL 10–1520230830 or later
  • pg_bigm added to the Value of the shared_preload_libraries parameter. For example: 'pg_stat_statements,auto_explain,pg_bigm'. For instructions, see Set instance parameters.

Limitations

  • The GIN index column size cannot exceed 107,374,180 bytes (~102 MB).

  • If the database contains non-ASCII content, set the database character encoding to UTF8. To check the current character encoding, run:

    SELECT pg_encoding_to_char(encoding) FROM pg_database WHERE datname = current_database();

Basic operations

Create the extension

postgres=> CREATE EXTENSION pg_bigm;
CREATE EXTENSION

Create a GIN index

Create a table, insert data, and build a GIN index using gin_bigm_ops:

postgres=> CREATE TABLE pg_tools (tool text, description text);
CREATE TABLE
postgres=> INSERT INTO pg_tools VALUES ('pg_hint_plan', 'Tool that allows a user to specify an optimizer HINT to PostgreSQL');
INSERT 0 1
postgres=> INSERT INTO pg_tools VALUES ('pg_dbms_stats', 'Tool that allows a user to stabilize planner statistics in PostgreSQL');
INSERT 0 1
postgres=> INSERT INTO pg_tools VALUES ('pg_bigm', 'Tool that provides 2-gram full text search capability in PostgreSQL');
INSERT 0 1
postgres=> INSERT INTO pg_tools VALUES ('pg_trgm', 'Tool that provides 3-gram full text search capability in PostgreSQL');
INSERT 0 1

-- Single-column GIN index
postgres=> CREATE INDEX pg_tools_idx ON pg_tools USING gin (description gin_bigm_ops);
CREATE INDEX

-- Multi-column GIN index with FASTUPDATE disabled
postgres=> CREATE INDEX pg_tools_multi_idx ON pg_tools USING gin (tool gin_bigm_ops, description gin_bigm_ops) WITH (FASTUPDATE = off);
CREATE INDEX

Perform a full-text search

Use the LIKE operator with the %keyword% pattern:

postgres=> SELECT * FROM pg_tools WHERE description LIKE '%search%';
  tool   |                             description
---------+---------------------------------------------------------------------
 pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
 pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
(2 rows)

Perform a similarity search

Use the =% operator with pg_bigm.similarity_limit to control the match threshold:

postgres=> SET pg_bigm.similarity_limit TO 0.2;
SET
postgres=> SELECT tool FROM pg_tools WHERE tool =% 'bigm';
  tool
---------
 pg_bigm
 pg_trgm
(2 rows)

Uninstall the extension

postgres=> DROP EXTENSION pg_bigm;
DROP EXTENSION

Extension functions

likequery

In pg_bigm, full-text search uses LIKE pattern matching, which requires the search keyword to be wrapped with % and any literal % characters to be escaped. Use likequery to handle this conversion automatically instead of implementing the escaping logic in your application.

  • Parameter: A single string.

  • Return value: A LIKE-compatible search string with % added before and after the keyword, and literal % characters escaped with \.

postgres=> SELECT likequery('pg_bigm has improved the full text search performance by 200%');
                             likequery
-------------------------------------------------------------------
 %pg\_bigm has improved the full text search performance by 200\%%
(1 row)

postgres=> SELECT * FROM pg_tools WHERE description LIKE likequery('search');
  tool   |                             description
---------+---------------------------------------------------------------------
 pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
 pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
(2 rows)

show_bigm

Returns an array of all 2-gram elements for a given string. The function adds a space before and after the string before computing 2-gram substrings.

  • Parameter: A single string.

  • Return value: An array of all 2-gram substrings.

postgres=> SELECT show_bigm('full text search');
                            show_bigm
------------------------------------------------------------------
 {" f"," s"," t",ar,ch,ea,ex,fu,"h ","l ",ll,rc,se,"t ",te,ul,xt}
(1 row)

bigm_similarity

Calculates the similarity between two strings based on their common 2-gram elements.

  • Parameters: Two strings.

  • Return value: A floating-point number between 0 and 1, where 0 means the strings are completely different and 1 means they are identical.

Note

Because spaces are added before and after the strings during 2-gram calculation, the similarity between ABC and B is 0, and the similarity between ABC and A is 0.25. The function is case-sensitive: the similarity between ABC and abc is 0.

postgres=> SELECT bigm_similarity('full text search', 'text similarity search');
 bigm_similarity
-----------------
       0.5714286
(1 row)

postgres=> SELECT bigm_similarity('ABC', 'A');
 bigm_similarity
-----------------
            0.25
(1 row)

postgres=> SELECT bigm_similarity('ABC', 'B');
 bigm_similarity
-----------------
               0
(1 row)

postgres=> SELECT bigm_similarity('ABC', 'abc');
 bigm_similarity
-----------------
               0
(1 row)

pg_gin_pending_stats

Returns the number of pages and tuples in the pending list of a GIN index.

  • Parameter: The name or OID of the GIN index.

  • Return values: The number of pages and the number of tuples in the pending list.

Note

If the GIN index was created with FASTUPDATE = false, no pending list exists and the function returns 0.

postgres=> SELECT * FROM pg_gin_pending_stats('pg_tools_idx');
 pages | tuples
-------+--------
     0 |      0
(1 row)

Parameters

ParameterDescriptionDefault
pg_bigm.last_updateThe last update date of the extension. Read-only.
pg_bigm.enable_recheckControls whether a recheck is performed after a GIN index scan. Keep this set to on to return accurate results.on
pg_bigm.gin_key_limitThe maximum number of 2-gram elements used during a full-text search. Set to 0 to use all elements. Reduce this value if using all elements degrades performance.0
pg_bigm.similarity_limitThe similarity threshold for similarity search. Only tuples with a similarity score above this threshold are returned.

pg_bigm.last_update

SHOW pg_bigm.last_update;

pg_bigm.enable_recheck

When enable_recheck is on (the default), the GIN index scan results are rechecked against the actual query condition, filtering out false positives.

The following example demonstrates the difference. With enable_recheck = off, the word "trivial" is incorrectly returned when searching for "trial" because both share the same 2-gram elements:

postgres=> CREATE TABLE tbl (doc text);
CREATE TABLE
postgres=> INSERT INTO tbl VALUES('He is awaiting trial');
INSERT 0 1
postgres=> INSERT INTO tbl VALUES('It was a trivial mistake');
INSERT 0 1
postgres=> CREATE INDEX tbl_idx ON tbl USING gin (doc gin_bigm_ops);
CREATE INDEX
postgres=> SET enable_seqscan TO off;
SET

postgres=> EXPLAIN ANALYZE SELECT * FROM tbl WHERE doc LIKE likequery('trial');
                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl  (cost=20.00..24.01 rows=1 width=32) (actual time=0.020..0.021 rows=1 loops=1)
   Recheck Cond: (doc ~~ '%trial%'::text)
   Rows Removed by Index Recheck: 1
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on tbl_idx  (cost=0.00..20.00 rows=1 width=0) (actual time=0.013..0.013 rows=2 loops=1)
         Index Cond: (doc ~~ '%trial%'::text)
 Planning Time: 0.117 ms
 Execution Time: 0.043 ms
(8 rows)

-- With enable_recheck = on (default): correct result
postgres=> SELECT * FROM tbl WHERE doc LIKE likequery('trial');
         doc
----------------------
 He is awaiting trial
(1 row)

-- With enable_recheck = off: false positive included
postgres=> SET pg_bigm.enable_recheck = off;
SET
postgres=> SELECT * FROM tbl WHERE doc LIKE likequery('trial');
           doc
--------------------------
 He is awaiting trial
 It was a trivial mistake
(2 rows)

pg_bigm.gin_key_limit

pg_bigm.similarity_limit

-- Return matches with similarity above 0.2
SET pg_bigm.similarity_limit TO 0.2;