The pg_bigm extension that is provided by ApsaraDB RDS for PostgreSQL supports full-text search. You can use this extension to create 2-gram Generalized Inverted Index (GIN) indexes. These indexes help expedite full-text search queries.

Prerequisites

Comparison between the pg_bigm extension and the pg_trgm extension

The pg_trgm extension is also provided by ApsaraDB RDS for PostgreSQL. The pg_trgm extension uses a 3-gram model to perform full-text search. The pg_bigm extension is developed based on the pg_trgm extension. The following table describes the differences between the two extensions.

Functionalitypg_trgmpg_bigm
Phrase matching model3-gram2-gram
Index typesGIN and GiSTGIN
OperatorsLIKE | ILIKE | ~ | ~*LIKE
Non-alphabet full-text searchNot supportedSupported
Full-text search by using keywords that contain 1 to 2 charactersSlowFast
Similarity searchSupportedSupported
Maximum length of an indexed column238,609,291 bytes, which are approximately 228 MB107,374,180 bytes, which are approximately 102 MB

Precautions

  • The length of the column on which you create a GIN index cannot exceed 107,374,180 bytes, which are approximately 102 MB.
  • If the data in your RDS instance is not encoded in the ASCII format, we recommend that you change the encoding format to UTF8.
    Note You can run the select pg_encoding_to_char(encoding) from pg_database where datname = current_database(); command to query the encoding format of your RDS instance.

Basic operations

  • Enable the pg_bigm extension.
    postgres=> create extension pg_bigm;
    CREATE EXTENSION
  • Create a GIN index.
    postgres=> CREATE TABLE pg_tools (tool text, description text);
    CREATE TABLE
    postgres=> INSERT INTO pg_tools VALUES ('pg_hint_plan', 'Tool that allows a user to specify an optimizer HINT to PostgreSQL');
    INSERT 0 1
    postgres=> INSERT INTO pg_tools VALUES ('pg_dbms_stats', 'Tool that allows a user to stabilize planner statistics in PostgreSQL');
    INSERT 0 1
    postgres=> INSERT INTO pg_tools VALUES ('pg_bigm', 'Tool that provides 2-gram full text search capability in PostgreSQL');
    INSERT 0 1
    postgres=> INSERT INTO pg_tools VALUES ('pg_trgm', 'Tool that provides 3-gram full text search capability in PostgreSQL');
    INSERT 0 1
    postgres=> CREATE INDEX pg_tools_idx ON pg_tools USING gin (description gin_bigm_ops);
    CREATE INDEX
    postgres=> CREATE INDEX pg_tools_multi_idx ON pg_tools USING gin (tool gin_bigm_ops, description gin_bigm_ops) WITH (FASTUPDATE = off);
    CREATE INDEX
  • Run a full-text search query.
    postgres=> SELECT * FROM pg_tools WHERE description LIKE '%search%';
      tool   |                             description
    ---------+---------------------------------------------------------------------
     pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
     pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
    (2 rows)
  • Use the =% operator to run a similarity search query.
    postgres=> SET pg_bigm.similarity_limit TO 0.2;
    SET
    postgres=> SELECT tool FROM pg_tools WHERE tool =% 'bigm';
      tool
    ---------
     pg_bigm
     pg_trgm
    (2 rows)
  • Disable the pg_bigm extension.
    postgres=> drop extension pg_bigm;
    DROP EXTENSION

Basic functions

  • likequery
    • Purpose: This function is used to generate a string that can be identified based on the LIKE keyword.
    • Request parameters: This function contains one request parameter. The data type for this parameter is STRING.
    • Return value: This function returns a string that can be identified based on the LIKE keyword.
    • Implementation:
      • Add a percent sign (%) preceding and following the keyword.
      • Use a backward slash (\) to escape the percent sign (%).
    • Example:
      postgres=> SELECT likequery('pg_bigm has improved the full text search performance by 200%');
                                   likequery
      -------------------------------------------------------------------
       %pg\_bigm has improved the full text search performance by 200\%%
      (1 row)
      
      postgres=> SELECT * FROM pg_tools WHERE description LIKE likequery('search');
        tool   |                             description
      ---------+---------------------------------------------------------------------
       pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
       pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
      (2 rows)
  • show_bigm
    • Purpose: This function is used to obtain all 2-gram elements of a string.
    • Request parameters: This function contains one request parameter. The data type for this parameter is STRING.
    • Return value: This parameter returns an array that consists of all 2-gram elements of a string.
    • Implementation:
      • Add a space preceding and following the string.
      • Identify all 2-gram elements in the string.
    • Example:
      postgres=> SELECT show_bigm('full text search');
                                  show_bigm
      ------------------------------------------------------------------
       {" f"," s"," t",ar,ch,ea,ex,fu,"h ","l ",ll,rc,se,"t ",te,ul,xt}
      (1 row)
  • bigm_similarity
    • Purpose: This function is used to obtain the similarity between two strings.
    • Request parameters: This function contains two request parameters. The data types for these parameters are STRING.
    • Return value: This function returns a floating-point number, which indicates the similarity between the two strings.
    • Implementation:
      • Identify the 2-gram elements that are included in both of the two strings.
      • The return value is within the range of 0 to 1. The value 0 indicates that the two strings are different. The value 1 indicates that the two strings are the same.
      Note
      • This function adds a space preceding and following each string. Therefore, the similarity between the ABC string and the B string is 0, and the similarity between the ABC string and the A string is 0.25.
      • This function distinguishes between uppercase letters and lowercase letters. For example, this function determines that the similarity between the ABC string and the abc string is 0.
    • Example:
      postgres=> SELECT bigm_similarity('full text search', 'text similarity search');
       bigm_similarity
      -----------------
             0.5714286
      (1 row)
      
      postgres=> SELECT bigm_similarity('ABC', 'A');
       bigm_similarity
      -----------------
                  0.25
      (1 row)
      
      postgres=> SELECT bigm_similarity('ABC', 'B');
       bigm_similarity
      -----------------
                     0
      (1 row)
      
      postgres=> SELECT bigm_similarity('ABC', 'abc');
       bigm_similarity
      -----------------
                     0
      (1 row)
  • pg_gin_pending_stats
    • Purpose: This function is used to obtain the number of pages and the number of tuples in the pending list of a GIN index.
    • Request parameters: This function contains one parameter. This parameter specifies the name or OID of the GIN index.
    • Return value: This function returns two values: the number of pages and the number of tuples in the pending list of the GIN index.
      Note If you set the FASTUPDATE parameter to False for a GIN index, the GIN index does not have a pending list. In this case, this function returns two values, 0 and 0.
    • Example:
      postgres=> SELECT * FROM pg_gin_pending_stats('pg_tools_idx');
       pages | tuples
      -------+--------
           0 |      0
      (1 row)

Behavior control

  • pg_bigm.last_update

    This parameter indicates the date on which the most recent update was made to the pg_bigm extension. You can only view this parameter. You cannot change the value of this parameter.

    Example:

    SHOW pg_bigm.last_update;
  • pg_bigm.enable_recheck

    This parameter specifies whether to perform a recheck.

    Note We recommend that you retain the default value ON. This way, you can obtain accurate query results.

    Example:

    postgres=> CREATE TABLE tbl (doc text);
    CREATE TABLE
    postgres=> INSERT INTO tbl VALUES('He is awaiting trial');
    INSERT 0 1
    postgres=> INSERT INTO tbl VALUES('It was a trivial mistake');
    INSERT 0 1
    postgres=> CREATE INDEX tbl_idx ON tbl USING gin (doc gin_bigm_ops);
    CREATE INDEX
    postgres=> SET enable_seqscan TO off;
    SET
    postgres=> EXPLAIN ANALYZE SELECT * FROM tbl WHERE doc LIKE likequery('trial');
                                                       QUERY PLAN
    -----------------------------------------------------------------------------------------------------------------
     Bitmap Heap Scan on tbl  (cost=20.00..24.01 rows=1 width=32) (actual time=0.020..0.021 rows=1 loops=1)
       Recheck Cond: (doc ~~ '%trial%'::text)
       Rows Removed by Index Recheck: 1
       Heap Blocks: exact=1
       ->  Bitmap Index Scan on tbl_idx  (cost=0.00..20.00 rows=1 width=0) (actual time=0.013..0.013 rows=2 loops=1)
             Index Cond: (doc ~~ '%trial%'::text)
     Planning Time: 0.117 ms
     Execution Time: 0.043 ms
    (8 rows)
    
    postgres=>
    postgres=> SELECT * FROM tbl WHERE doc LIKE likequery('trial');
             doc
    ----------------------
     He is awaiting trial
    (1 row)
    
    postgres=> SET pg_bigm.enable_recheck = off;
    SET
    postgres=> SELECT * FROM tbl WHERE doc LIKE likequery('trial');
               doc
    --------------------------
     He is awaiting trial
     It was a trivial mistake
    (2 rows)
  • pg_bigm.gin_key_limit

    This parameter specifies the maximum number of 2-gram elements that can be used to perform a full-text search query. The default value is 0, which indicates that all 2-gram elements are used.

    Note If query performance decreases due to the use of all 2-gram elements, you can decrease the value of this parameter.
  • pg_bigm.similarity_limit

    This parameter specifies the threshold for similarity. The tuples between which the similarity exceeds the specified threshold are returned as similarity search results.