How to use fuzzy query in RDS - ApsaraDB RDS - Alibaba Cloud Documentation Center

pg_bigm is an extension for Alibaba Cloud RDS for PostgreSQL that provides full-text search capabilities. It lets you create a 2-gram Generalized Inverted Index (GIN) to speed up the search process.

Prerequisites

Your RDS for PostgreSQL instance must meet the following requirements:
Major instance version
Minor engine version
PostgreSQL 17
20250830 or later
PostgreSQL 16
No minor version restrictions
PostgreSQL 10 to 15
20230830 or later
Important
This extension was supported on minor engine versions earlier than 20230830. However, to standardize extension management and enhance security, RDS for PostgreSQL has optimized extensions with security risks in subsequent engine version iterations. You can no longer create this extension on instances that run a minor engine version earlier than 20230830. For more information, see Restrictions on creating extensions.
- If your instance runs a minor engine version earlier than 20230830 and you already use this extension, its functionality is not affected.
- If you are creating this extension for the first time or recreating it, you must upgrade the minor engine version to the latest version.
You have set the instance parameters and added pg_bigm to the Value of the shared_preload_libraries parameter. For example, set the Value to 'pg_stat_statements,auto_explain,pg_bigm'.

Comparison with pg_trgm

pg_trgm is another extension for RDS for PostgreSQL that uses a 3-gram model for full-text search. The pg_bigm extension is based on pg_trgm. The differences between them are as follows.

Features and attributes	pg_trgm	pg_bigm
Phrase match method for full-text search	3-gram	2-gram
Supported index types	GIN and GIST	GIN
Supported full-text search operators	`LIKE`, `ILIKE`, `~`, and `~*`	`LIKE`
Full-text search for non-alphabetic languages	Not supported	Supported
Full-text search for keywords with 1 to 2 characters	Slow	Fast
Similarity search	Supported	Supported
Maximum size of an indexable column	238,609,291 bytes (about 228 MB)	107,374,180 bytes (about 102 MB)

Notes

The length of a column for which a GIN index is created cannot exceed 107,374,180 bytes (about 102 MB).
If the content stored in the database is non-ASCII, you must change the database codec to UTF8.
Note
To check the codec of the current database, run the following command: select pg_encoding_to_char(encoding) from pg_database where datname = current_database();.

Basic operations

Create the extension

postgres=> create extension pg_bigm;
CREATE EXTENSION

Create an index

postgres=> CREATE TABLE pg_tools (tool text, description text);
CREATE TABLE
postgres=> INSERT INTO pg_tools VALUES ('pg_hint_plan', 'Tool that allows a user to specify an optimizer HINT to PostgreSQL');
INSERT 0 1
postgres=> INSERT INTO pg_tools VALUES ('pg_dbms_stats', 'Tool that allows a user to stabilize planner statistics in PostgreSQL');
INSERT 0 1
postgres=> INSERT INTO pg_tools VALUES ('pg_bigm', 'Tool that provides 2-gram full text search capability in PostgreSQL');
INSERT 0 1
postgres=> INSERT INTO pg_tools VALUES ('pg_trgm', 'Tool that provides 3-gram full text search capability in PostgreSQL');
INSERT 0 1
postgres=> CREATE INDEX pg_tools_idx ON pg_tools USING gin (description gin_bigm_ops);
CREATE INDEX
postgres=> CREATE INDEX pg_tools_multi_idx ON pg_tools USING gin (tool gin_bigm_ops, description gin_bigm_ops) WITH (FASTUPDATE = off);
CREATE INDEX

Perform a full-text search

postgres=> SELECT * FROM pg_tools WHERE description LIKE '%search%';
  tool   |                             description
---------+---------------------------------------------------------------------
 pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
 pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
(2 rows)

Perform a similarity search using the =% operator

postgres=> SET pg_bigm.similarity_limit TO 0.2;
SET
postgres=> SELECT tool FROM pg_tools WHERE tool =% 'bigm';
  tool
---------
 pg_bigm
 pg_trgm
(2 rows)

Uninstall the extension

postgres=> drop extension pg_bigm;
DROP EXTENSION

Common extension functions

likequery function

Function: Generates a string that can be recognized by the LIKE keyword.
Parameter: A single parameter of the string type.
Return value: A search string that can be used with the LIKE keyword.
How it works:
- Adds the % symbol before and after the keyword.
- It uses \ to automatically escape the % character.

Example:

postgres=> SELECT likequery('pg_bigm has improved the full text search performance by 200%');
                             likequery
-------------------------------------------------------------------
 %pg\_bigm has improved the full text search performance by 200\%%
(1 row)

postgres=> SELECT * FROM pg_tools WHERE description LIKE likequery('search');
  tool   |                             description
---------+---------------------------------------------------------------------
 pg_bigm | Tool that provides 2-gram full text search capability in PostgreSQL
 pg_trgm | Tool that provides 3-gram full text search capability in PostgreSQL
(2 rows)

show_bigm function
- Function: Returns a collection of all 2-gram elements for a given string.
- Parameter: A single parameter of the string type.
- Return value: An array that contains all 2-gram elements.
- How it works:
  - Adds a space character before and after the string.
  - Calculates all 2-gram substrings.
- Example:
```
postgres=> SELECT show_bigm('full text search');
                            show_bigm
------------------------------------------------------------------
 {" f"," s"," t",ar,ch,ea,ex,fu,"h ","l ",ll,rc,se,"t ",te,ul,xt}
(1 row)
```
bigm_similarity function
- Function: Calculates the degree of similarity between two strings.
- Parameters: Two parameters of the string type.
- Return value: A floating-point number that indicates the degree of similarity.
- How it works:
  - Counts the common 2-gram elements between the two strings.
  - The similarity degree ranges from 0 to 1. A value of 0 indicates that the two strings are completely different. A value of 1 indicates that the two strings are identical.
  Note
  - Because spaces are added before and after the strings when 2-grams are calculated, the similarity between ABC and B is 0, and the similarity between ABC and A is 0.25.
  - The `bigm_similarity` function is case-sensitive. For example, the similarity between ABC and abc is 0.
- Example:
```
postgres=> SELECT bigm_similarity('full text search', 'text similarity search');
 bigm_similarity
-----------------
       0.5714286
(1 row)

postgres=> SELECT bigm_similarity('ABC', 'A');
 bigm_similarity
-----------------
            0.25
(1 row)

postgres=> SELECT bigm_similarity('ABC', 'B');
 bigm_similarity
-----------------
               0
(1 row)

postgres=> SELECT bigm_similarity('ABC', 'abc');
 bigm_similarity
-----------------
               0
(1 row)
```
pg_gin_pending_stats function
- Function: Returns the number of pages and tuples in the pending list of a GIN index.
- Parameter: The name or OID of the GIN index.
- Return values: The number of pages and the number of tuples in the pending list.
  Note
  If the FASTUPDATE parameter is set to False when the GIN index is created, the GIN index does not have a pending list. In this case, the function returns 0.
- Example:
```
postgres=> SELECT * FROM pg_gin_pending_stats('pg_tools_idx');
 pages | tuples
-------+--------
     0 |      0
(1 row)
```

Control extension behavior

pg_bigm.last_update
The last update date of this extension. This is a read-only parameter and cannot be modified.
Example:
```
SHOW pg_bigm.last_update;
```

pg_bigm.enable_recheck

Determines whether to perform a recheck.

Note

We recommend that you keep the default value (ON) to ensure correct results.

Example:

postgres=> CREATE TABLE tbl (doc text);
CREATE TABLE
postgres=> INSERT INTO tbl VALUES('He is awaiting trial');
INSERT 0 1
postgres=> INSERT INTO tbl VALUES('It was a trivial mistake');
INSERT 0 1
postgres=> CREATE INDEX tbl_idx ON tbl USING gin (doc gin_bigm_ops);
CREATE INDEX
postgres=> SET enable_seqscan TO off;
SET
postgres=> EXPLAIN ANALYZE SELECT * FROM tbl WHERE doc LIKE likequery('trial');
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tbl  (cost=20.00..24.01 rows=1 width=32) (actual time=0.020..0.021 rows=1 loops=1)
   Recheck Cond: (doc ~~ '%trial%'::text)
   Rows Removed by Index Recheck: 1
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on tbl_idx  (cost=0.00..20.00 rows=1 width=0) (actual time=0.013..0.013 rows=2 loops=1)
         Index Cond: (doc ~~ '%trial%'::text)
 Planning Time: 0.117 ms
 Execution Time: 0.043 ms
(8 rows)

postgres=>
postgres=> SELECT * FROM tbl WHERE doc LIKE likequery('trial');
         doc
----------------------
 He is awaiting trial
(1 row)

postgres=> SET pg_bigm.enable_recheck = off;
SET
postgres=> SELECT * FROM tbl WHERE doc LIKE likequery('trial');
           doc
--------------------------
 He is awaiting trial
 It was a trivial mistake
(2 rows)

pg_bigm.gin_key_limit
Limits the maximum number of 2-gram elements used for full-text search. The default value is 0, which means all 2-gram elements are used.
Note
If you find that using all 2-gram elements degrades performance, you can adjust this parameter to limit the number of 2-gram elements and improve performance.
pg_bigm.similarity_limit
Sets the similarity threshold. Tuples with a similarity degree that exceeds this threshold are returned by a similarity search.

Major instance version	Minor engine version
PostgreSQL 17	20250830 or later
PostgreSQL 16	No minor version restrictions
PostgreSQL 10 to 15	20230830 or later