All Products
Search
Document Center

ApsaraDB RDS:Fuzzy query (pg_bigm)

Last Updated:Sep 17, 2025

pg_bigm is an extension for Alibaba Cloud RDS for PostgreSQL that provides full-text search capabilities. It lets you create a 2-gram Generalized Inverted Index (GIN) to speed up the search process.

Prerequisites

  • Your RDS for PostgreSQL instance must meet the following requirements:

    Major instance version

    Minor engine version

    PostgreSQL 17

    20250830 or later

    PostgreSQL 16

    No minor version restrictions

    PostgreSQL 10 to 15

    20230830 or later

    Important

    This extension was supported on minor engine versions earlier than 20230830. However, to standardize extension management and enhance security, RDS for PostgreSQL has optimized extensions with security risks in subsequent engine version iterations. You can no longer create this extension on instances that run a minor engine version earlier than 20230830. For more information, see Restrictions on creating extensions.

    • If your instance runs a minor engine version earlier than 20230830 and you already use this extension, its functionality is not affected.

    • If you are creating this extension for the first time or recreating it, you must upgrade the minor engine version to the latest version.

  • You have set the instance parameters and added pg_bigm to the Value of the shared_preload_libraries parameter. For example, set the Value to 'pg_stat_statements,auto_explain,pg_bigm'.

Comparison with pg_trgm

pg_trgm is another extension for RDS for PostgreSQL that uses a 3-gram model for full-text search. The pg_bigm extension is based on pg_trgm. The differences between them are as follows.

Features and attributes

pg_trgm

pg_bigm

Phrase match method for full-text search

3-gram

2-gram

Supported index types

GIN and GIST

GIN

Supported full-text search operators

LIKE, ILIKE, ~, and ~*

LIKE

Full-text search for non-alphabetic languages

Not supported

Supported

Full-text search for keywords with 1 to 2 characters

Slow

Fast

Similarity search

Supported

Supported

Maximum size of an indexable column

238,609,291 bytes (about 228 MB)

107,374,180 bytes (about 102 MB)

Notes

  • The length of a column for which a GIN index is created cannot exceed 107,374,180 bytes (about 102 MB).

  • If the content stored in the database is non-ASCII, you must change the database codec to UTF8.

    Note

    To check the codec of the current database, run the following command: select pg_encoding_to_char(encoding) from pg_database where datname = current_database();.

Basic operations

  • Create the extension

    postgres=> create extension pg_bigm;
    CREATE EXTENSION
  • Create an index

    postgres=> CREATE TABLE pg_tools (tool text, description text);
    CREATE TABLE
    postgres=> INSERT INTO pg_tools VALUES ('pg_hint_plan', 'Tool that allows a user to specify an optimizer HINT to PostgreSQL');
    INSERT 0 1
    postgres=> INSERT INTO pg_tools VALUES ('pg_dbms_stats', 'Tool that allows a user to stabilize planner statistics in PostgreSQL');
    INSERT 0 1
    postgres=> INSERT INTO pg_tools VALUES ('pg_bigm', 'Tool that provides 2-gram full text search capability in PostgreSQL');
    INSERT 0 1
    postgres=> INSERT INTO pg_tools VALUES ('pg_trgm', 'Tool that provides 3-gram full text search capability in PostgreSQL');
    INSERT 0 1
    postgres=> CREATE INDEX pg_tools_idx ON pg_tools USING gin (description gin_bigm_ops);
    CREATE INDEX
    postgres=> CREATE INDEX pg_tools_multi_idx ON pg_tools USING gin (tool gin_bigm_ops, description gin_bigm_ops) WITH (FASTUPDATE = off);
    CREATE INDEX
  • Perform a full-text search

    postgres=> SELECT * FROM pg_tools WHERE description LIKE '%search%';
      tool   |                             description
    ---------+---------------------------------------------------------------------
     pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
     pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
    (2 rows)
  • Perform a similarity search using the =% operator

    postgres=> SET pg_bigm.similarity_limit TO 0.2;
    SET
    postgres=> SELECT tool FROM pg_tools WHERE tool =% 'bigm';
      tool
    ---------
     pg_bigm
     pg_trgm
    (2 rows)
  • Uninstall the extension

    postgres=> drop extension pg_bigm;
    DROP EXTENSION

Common extension functions

  • likequery function

    • Function: Generates a string that can be recognized by the LIKE keyword.

    • Parameter: A single parameter of the string type.

    • Return value: A search string that can be used with the LIKE keyword.

    • How it works:

      • Adds the % symbol before and after the keyword.

      • It uses \ to automatically escape the % character.

    • Example:

      postgres=> SELECT likequery('pg_bigm has improved the full text search performance by 200%');
                                   likequery
      -------------------------------------------------------------------
       %pg\_bigm has improved the full text search performance by 200\%%
      (1 row)
      
      postgres=> SELECT * FROM pg_tools WHERE description LIKE likequery('search');
        tool   |                             description
      ---------+---------------------------------------------------------------------
       pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
       pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
      (2 rows)
  • show_bigm function

    • Function: Returns a collection of all 2-gram elements for a given string.

    • Parameter: A single parameter of the string type.

    • Return value: An array that contains all 2-gram elements.

    • How it works:

      • Adds a space character before and after the string.

      • Calculates all 2-gram substrings.

    • Example:

      postgres=> SELECT show_bigm('full text search');
                                  show_bigm
      ------------------------------------------------------------------
       {" f"," s"," t",ar,ch,ea,ex,fu,"h ","l ",ll,rc,se,"t ",te,ul,xt}
      (1 row)
  • bigm_similarity function

    • Function: Calculates the degree of similarity between two strings.

    • Parameters: Two parameters of the string type.

    • Return value: A floating-point number that indicates the degree of similarity.

    • How it works:

      • Counts the common 2-gram elements between the two strings.

      • The similarity degree ranges from 0 to 1. A value of 0 indicates that the two strings are completely different. A value of 1 indicates that the two strings are identical.

      Note
      • Because spaces are added before and after the strings when 2-grams are calculated, the similarity between ABC and B is 0, and the similarity between ABC and A is 0.25.

      • The `bigm_similarity` function is case-sensitive. For example, the similarity between ABC and abc is 0.

    • Example:

      postgres=> SELECT bigm_similarity('full text search', 'text similarity search');
       bigm_similarity
      -----------------
             0.5714286
      (1 row)
      
      postgres=> SELECT bigm_similarity('ABC', 'A');
       bigm_similarity
      -----------------
                  0.25
      (1 row)
      
      postgres=> SELECT bigm_similarity('ABC', 'B');
       bigm_similarity
      -----------------
                     0
      (1 row)
      
      postgres=> SELECT bigm_similarity('ABC', 'abc');
       bigm_similarity
      -----------------
                     0
      (1 row)
  • pg_gin_pending_stats function

    • Function: Returns the number of pages and tuples in the pending list of a GIN index.

    • Parameter: The name or OID of the GIN index.

    • Return values: The number of pages and the number of tuples in the pending list.

      Note

      If the FASTUPDATE parameter is set to False when the GIN index is created, the GIN index does not have a pending list. In this case, the function returns 0.

    • Example:

      postgres=> SELECT * FROM pg_gin_pending_stats('pg_tools_idx');
       pages | tuples
      -------+--------
           0 |      0
      (1 row)

Control extension behavior

  • pg_bigm.last_update

    The last update date of this extension. This is a read-only parameter and cannot be modified.

    Example:

    SHOW pg_bigm.last_update;
  • pg_bigm.enable_recheck

    Determines whether to perform a recheck.

    Note

    We recommend that you keep the default value (ON) to ensure correct results.

    Example:

    postgres=> CREATE TABLE tbl (doc text);
    CREATE TABLE
    postgres=> INSERT INTO tbl VALUES('He is awaiting trial');
    INSERT 0 1
    postgres=> INSERT INTO tbl VALUES('It was a trivial mistake');
    INSERT 0 1
    postgres=> CREATE INDEX tbl_idx ON tbl USING gin (doc gin_bigm_ops);
    CREATE INDEX
    postgres=> SET enable_seqscan TO off;
    SET
    postgres=> EXPLAIN ANALYZE SELECT * FROM tbl WHERE doc LIKE likequery('trial');
                                                       QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------
     Bitmap Heap Scan on tbl  (cost=20.00..24.01 rows=1 width=32) (actual time=0.020..0.021 rows=1 loops=1)
       Recheck Cond: (doc ~~ '%trial%'::text)
       Rows Removed by Index Recheck: 1
       Heap Blocks: exact=1
       ->  Bitmap Index Scan on tbl_idx  (cost=0.00..20.00 rows=1 width=0) (actual time=0.013..0.013 rows=2 loops=1)
             Index Cond: (doc ~~ '%trial%'::text)
     Planning Time: 0.117 ms
     Execution Time: 0.043 ms
    (8 rows)
    
    postgres=>
    postgres=> SELECT * FROM tbl WHERE doc LIKE likequery('trial');
             doc
    ----------------------
     He is awaiting trial
    (1 row)
    
    postgres=> SET pg_bigm.enable_recheck = off;
    SET
    postgres=> SELECT * FROM tbl WHERE doc LIKE likequery('trial');
               doc
    --------------------------
     He is awaiting trial
     It was a trivial mistake
    (2 rows)
  • pg_bigm.gin_key_limit

    Limits the maximum number of 2-gram elements used for full-text search. The default value is 0, which means all 2-gram elements are used.

    Note

    If you find that using all 2-gram elements degrades performance, you can adjust this parameter to limit the number of 2-gram elements and improve performance.

  • pg_bigm.similarity_limit

    Sets the similarity threshold. Tuples with a similarity degree that exceeds this threshold are returned by a similarity search.