The pg_bigm extension in PolarDB for PostgreSQL creates a 2-gram Generalized Inverted Index (GIN) that is used to accelerate full-text searches.

Prerequisites

The extension is supported on the PolarDB for PostgreSQL clusters that run the following engine:
  • PostgreSQL 14 (revision version 14.5.2.0 or later)
  • PostgreSQL 11 (revision version 1.1.28 or later)
Note You can execute one of the following statements to view the revision version of a PolarDB for PostgreSQL cluster:
  • PostgreSQL 14
    select version();
  • PostgreSQL 11
    show polar_version;

Comparison between the pg_bigm extension and the pg_trgm extension

The pg_trgm extension in PolarDB for PostgreSQL uses the 3-gram model to implement full-text search. The pg_bigm extension is developed based on the pg_trgm extension. The following table describes the differences between the two extensions.

Functionalitypg_trgmpg_bigm
Phrase matching model3-gram2-gram
Index typesGIN and GiSTGIN
OperatorsLIKE | ILIKE | ~ | ~*LIKE
Non-alphabet full-text searchNot supported.Supported.
Full-text search by using keywords that contain 1 to 2 charactersSlowHigh latency sensitivity
Similarity searchSupported.Supported.
Maximum length of an indexed column238,609,291 bytes, which are approximately 228 MB107,374,180 bytes, which are approximately 102 MB

Precautions

  • The length of the column on which you create a GIN index cannot exceed 107,374,180 bytes, which are approximately 102 MB. Example:
    CREATE TABLE t1 (description text);
    
    CREATE INDEX t1_idx ON t1 USING gin (description gin_bigm_ops);
    
    INSERT INTO t1 SELECT repeat('A', 107374181);
  • If the data in your PolarDB cluster is not encoded in the ASCII format, we recommend that you change the encoding format to UTF8. Execute the following statement to query the encoding mode of the current database:
    SELECT pg_encoding_to_char(encoding)
    FROM pg_database
    WHERE datname = current_database();

Basic operations

  • Enable the pg_bigm extension
    CREATE EXTENSION pg_bigm;
  • Load the extension to memory
    LOAD 'pg_bigm';
    Note This statement is valid only for the current session.
  • Create an index
    CREATE TABLE pg_tools (tool text, description text);
    
    INSERT INTO pg_tools VALUES ('pg_hint_plan', 'Tool that allows a user to specify an optimizer HINT to PostgreSQL');
    INSERT INTO pg_tools VALUES ('pg_dbms_stats', 'Tool that allows a user to stabilize planner statistics in PostgreSQL');
    INSERT INTO pg_tools VALUES ('pg_bigm', 'Tool that provides 2-gram full text search capability in PostgreSQL');
    INSERT INTO pg_tools VALUES ('pg_trgm', 'Tool that provides 3-gram full text search capability in PostgreSQL');
    
    CREATE INDEX pg_tools_idx ON pg_tools USING gin (description gin_bigm_ops);
    CREATE INDEX pg_tools_multi_idx ON pg_tools USING gin (tool gin_bigm_ops, description gin_bigm_ops) WITH (FASTUPDATE = off);
  • Run a full-text search query.
    SELECT * FROM pg_tools WHERE description LIKE '%search%';
    Sample result:
      tool   |                             description
    ---------+---------------------------------------------------------------------
     pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
     pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
    (2 rows)
  • Use the =% operator to run a similarity search query.
    SET pg_bigm.similarity_limit TO 0.2;
    # Sample result:
    SET
    
    SELECT tool FROM pg_tools WHERE tool =% 'bigm';
    # Sample result:
      tool
    ---------
     pg_bigm
     pg_trgm
    (2 rows)
  • Disable the pg_bigm extension
    DROP EXTENSION pg_bigm;

Basic functions

  • likequery
    • Purpose: This function is used to generate a string that can be identified based on the LIKE keyword.
    • Request parameters: This function contains one request parameter. The data type for this parameter is STRING.
    • Return value: This function returns a string that can be identified based on the LIKE keyword.
    • Implementation:
      • Add a percent sign (%) preceding and following the keyword.
      • Use a backward slash (\) to escape the percent sign (%).
    • Examples:
      SELECT likequery('pg_bigm has improved the full text search performance by 200%');
      # Sample result:
                                   likequery
      -------------------------------------------------------------------
       %pg\_bigm has improved the full text search performance by 200\%%
      (1 row)
      
      SELECT * FROM pg_tools WHERE description LIKE likequery('search');
      # Sample result:
        tool   |                             description
      ---------+---------------------------------------------------------------------
       pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
       pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
      (2 rows)
  • show_bigm
    • Purpose: This function is used to obtain all 2-gram elements of a string.
    • Request parameters: This function contains one request parameter. The data type for this parameter is STRING.
    • Return value: This parameter returns an array that consists of all 2-gram elements of a string.
    • Implementation:
      • Add a space preceding and following the string.
      • Identify all 2-gram elements in the string.
    • Example:
      SELECT show_bigm('full text search');
      Sample result:
                                  show_bigm
      ------------------------------------------------------------------
       {" f"," s"," t",ar,ch,ea,ex,fu,"h ","l ",ll,rc,se,"t ",te,ul,xt}
      (1 row)
  • bigm_similarity
    • Purpose: This function is used to obtain the similarity between two strings.
    • Request parameters: This function contains two request parameters. The data types for these parameters are STRING.
    • Return value: This function returns a floating-point number, which indicates the similarity between the two strings.
    • Implementation:
      • Identify the 2-gram elements that are included in both of the two strings.
      • The return value is within the range of 0 to 1. The value 0 indicates that the two strings are different. The value 1 indicates that the two strings are the same.
      Note
      • This function adds a space preceding and following each string. Therefore, the similarity between the ABC string and the B string is 0, and the similarity between the ABC string and the A string is 0.25.
      • This function distinguishes between uppercase letters and lowercase letters. For example, this function determines that the similarity between the ABC string and the abc string is 0.
    • Examples:
      SELECT bigm_similarity('full text search', 'text similarity search');
      # Sample result:
       bigm_similarity
      -----------------
             0.5714286
      (1 row)
      
      SELECT bigm_similarity('ABC', 'A');
      # Sample result:
       bigm_similarity
      -----------------
                  0.25
      (1 row)
      
      SELECT bigm_similarity('ABC', 'B');
      # Sample result:
       bigm_similarity
      -----------------
                     0
      (1 row)
      
      SELECT bigm_similarity('ABC', 'abc');
      # Sample result:
       bigm_similarity
      -----------------
                     0
      (1 row)
  • pg_gin_pending_stats
    • Purpose: This function is used to obtain the number of pages and the number of tuples in the pending list of a GIN index.
    • Request parameters: This function contains one parameter. This parameter specifies the name or OID of the GIN index.
    • Return value: This function returns two values: the number of pages and the number of tuples in the pending list of the GIN index.
      Note If you set the FASTUPDATE parameter to False for a GIN index, the GIN index does not have a pending list. In this case, this function returns two values, 0 and 0.
    • Example:
      SELECT * FROM pg_gin_pending_stats('pg_tools_idx');
      Sample result:
       pages | tuples
      -------+--------
           0 |      0
      (1 row)

Behavior control parameters

  • pg_bigm.last_update

    This parameter indicates the date on which the most recent update was made to the pg_bigm extension. You can only view this parameter. You cannot change the value of this parameter.

    Example:

    SHOW pg_bigm.last_update;
  • pg_bigm.enable_recheck

    This parameter specifies whether to perform a recheck.

    Note We recommend that you retain the default value ON. This way, you can obtain accurate query results.

    Examples:

    CREATE TABLE tbl (doc text);
    # Sample result:
    CREATE TABLE
    
    INSERT INTO tbl VALUES('He is awaiting trial');
    # Sample result:
    INSERT 0 1
    
    INSERT INTO tbl VALUES('It was a trivial mistake');
    # Sample result:
    INSERT 0 1
    
    CREATE INDEX tbl_idx ON tbl USING gin (doc gin_bigm_ops);
    # Sample result:
    CREATE INDEX
    
    SET enable_seqscan TO off;
    # Sample result:
    SET
    
    EXPLAIN ANALYZE SELECT * FROM tbl WHERE doc LIKE likequery('trial');
    # Sample result:
                                                       QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------
     Bitmap Heap Scan on tbl  (cost=20.00..24.01 rows=1 width=32) (actual time=0.020..0.021 rows=1 loops=1)
       Recheck Cond: (doc ~~ '%trial%'::text)
       Rows Removed by Index Recheck: 1
       Heap Blocks: exact=1
       ->  Bitmap Index Scan on tbl_idx  (cost=0.00..20.00 rows=1 width=0) (actual time=0.013..0.013 rows=2 loops=1)
             Index Cond: (doc ~~ '%trial%'::text)
     Planning Time: 0.117 ms
     Execution Time: 0.043 ms
    (8 rows)
    
    SELECT * FROM tbl WHERE doc LIKE likequery('trial');
    # Sample result:
             doc
    ----------------------
     He is awaiting trial
    (1 row)
    
    SET pg_bigm.enable_recheck = off;
    # Sample result:
    SET
    
    SELECT * FROM tbl WHERE doc LIKE likequery('trial');
    # Sample result:
               doc
    --------------------------
     He is awaiting trial
     It was a trivial mistake
    (2 rows)
                        
  • pg_bigm.gin_key_limit

    This parameter specifies the maximum number of 2-gram elements that can be used to perform a full-text search query. The default value is 0, which indicates that all 2-gram elements are used.

    Note If query performance decreases due to the use of all 2-gram elements, you can decrease the value of this parameter.
  • pg_bigm.similarity_limit

    This parameter specifies the threshold for similarity. The tuples between which the similarity exceeds the specified threshold are returned as similarity search results.