The pg_trgm extension adds text similarity search to AnalyticDB for PostgreSQL. It is particularly useful when:
A search term may be misspelled or imprecise
You need to rank results by how closely they match a query string
You want to accelerate
LIKE '%...%'fuzzy searches with an index
Version requirements
AnalyticDB for PostgreSQL V6.0 instances of V6.3.8.9 or later support the
pg_trgmextension.AnalyticDB for PostgreSQL V7.0 instances of V7.0.2 or later support the
pg_trgmextension.
To check your instance's minor version, see View the minor engine version.
Install the extension
Install pg_trgm on the Extensions page of your instance before use. For instructions, see Install, update, and uninstall extensions.
Key concepts
Trigrams
A trigram is a group of three consecutive characters extracted from a string. pg_trgm measures string similarity by counting the number of trigrams that two strings share.
A trigram with fewer than three characters is padded with up to two leading spaces and one trailing space.
For example:
SELECT show_trgm('abc');Output:
show_trgm
-------------------------
{" a"," ab","abc","bc "}
(1 row)Similarity score
The similarity() function returns a value between 0 and 1:
0— the strings share no trigrams1— the strings are identical
Distance
Distance is the complement of similarity: distance = 1 − similarity(). The <-> operator returns the distance between two strings, so a lower value means a closer match.
Perform a text similarity search
After installing pg_trgm, use trigram matching to find strings similar to a query term and rank results from best match to worst.
Create a test table:
CREATE TABLE test_trgm (t text);
INSERT INTO test_trgm values('word'), ('This is a pg_trgm test'), ('word test'), ('w0rd'), ('test word');Method 1: Filter and rank by similarity
Use the similarity() function and the % operator to find rows whose similarity to the query term exceeds the pg_trgm.similarity_threshold (default: 0.3), then sort by score.
SELECT t, similarity(t, 'word') AS sml
FROM test_trgm
WHERE t % 'word'
ORDER BY sml DESC, t;Output:
t | sml
-----------+-----
word | 1
test word | 0.5
word test | 0.5
(3 rows)Method 2: Order by distance
Use the <-> operator to sort all rows by distance from the query term, with the closest match first.
SELECT t, t <-> 'word' AS dist
FROM test_trgm
ORDER BY dist LIMIT 10;Output:
t | dist
------------------------+------
word | 0
word test | 0.5
test word | 0.5
w0rd | 0.75
This is a pg_trgm test | 1
(5 rows)Use Method 1 when you want to filter out low-relevance results. Use Method 2 when you want a ranked list of all rows regardless of similarity score.
Accelerate fuzzy searches with an index
The LIKE operator does not use indexes, so fuzzy searches on large tables require a full sequential scan. pg_trgm provides Generalized Search Tree (GiST) and Generalized Inverted Index (GIN) index operators that let you create an index on a text column to speed up LIKE '%...%' queries.
Without an index, a LIKE '%abcd%' query on a table with 1,000,000 rows performs a sequential scan:
CREATE TABLE test_trgm (t text);
INSERT INTO test_trgm
SELECT md5(random()::text) FROM generate_series(1,1000000) i;EXPLAIN ANALYZE SELECT * FROM test_trgm WHERE t LIKE '%abcd%'; QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=0.00..259.63 rows=422 width=32) (actual time=137.606..179.674 rows=431 loops=1)
-> Seq Scan on test_trgm (cost=0.00..254.00 rows=141 width=32) (actual time=0.961..136.977 rows=146 loops=1)
Filter: (t ~~ '%abcd%'::text)
Rows Removed by Filter: 333458
Optimizer: Postgres-based planner
Planning Time: 0.328 ms
(slice0) Executor memory: 37K bytes.
(slice1) Executor memory: 36K bytes avg x 3 workers, 36K bytes max (seg0).
Memory used: 128000kB
Execution Time: 180.533 ms
(10 rows)With a GIN index, the same query uses a Bitmap Index Scan and runs significantly faster:
CREATE INDEX trgm_idx ON test_trgm USING GIN (t gin_trgm_ops);EXPLAIN ANALYZE SELECT * FROM test_trgm WHERE t LIKE '%abcd%'; QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Gather Motion 3:1 (slice1; segments: 3) (cost=48.67..882.67 rows=8000 width=32) (actual time=4.353..4.550 rows=431 loops=1)
-> Bitmap Heap Scan on test_trgm (cost=48.67..776.00 rows=2667 width=32) (actual time=1.612..3.091 rows=146 loops=1)
Recheck Cond: (t ~~ '%abcd%'::text)
Rows Removed by Index Recheck: 10
-> Bitmap Index Scan on trgm_idx (cost=0.00..48.00 rows=2667 width=0) (actual time=1.536..1.537 rows=163 loops=1)
Index Cond: (t ~~ '%abcd%'::text)
Optimizer: Postgres-based planner
Planning Time: 1.353 ms
(slice0) Executor memory: 44K bytes.
(slice1) Executor memory: 2438K bytes avg x 3 workers, 2438K bytes max (seg0).
Memory used: 128000kB
Execution Time: 5.385 ms
(12 rows)Execution time drops from 180.533 ms to 5.385 ms — a reduction of more than 33x.
Reference
GUC parameters
Grand Unified Configuration (GUC) parameters control the similarity thresholds used by pg_trgm operators.
`pg_trgm.similarity_threshold`
Controls the minimum similarity score required for the % operator to return true. Default: 0.3. Valid range: 0 to 1.
Raise this value to require a closer match before rows are included in results. Lower it to cast a wider net and tolerate more variation.
`pg_trgm.word_similarity_threshold`
Controls the minimum word similarity score required for the <% and %> operators to return true. Default: 0.6. Valid range: 0 to 1.
Raise this value to require a closer word-level match. Lower it to match more loosely against partial words.
Functions
| Function | Return type | Description |
|---|---|---|
similarity(text, text) | real | Returns a similarity score from 0 (no shared trigrams) to 1 (identical strings). |
show_trgm(text) | text[] | Returns an array of all trigrams in the string. Primarily useful for debugging. |
word_similarity(text, text) | real | Returns the greatest similarity between the trigrams in the first string and any continuous extent of an ordered trigram set in the second string. Returns 0 (no match) to 1 (exact match). |
Operators
| Operator | Return type | Description |
|---|---|---|
text % text | boolean | Returns true if the similarity score exceeds pg_trgm.similarity_threshold. |
text <% text | boolean | Returns true if the word similarity of the first string against the second exceeds pg_trgm.word_similarity_threshold. |
text %> text | boolean | Commutator of <%. |
text <-> text | real | Returns the distance between two strings (1 − similarity()). |
text <<-> text | real | Returns the word distance between two strings (1 − word_similarity()). |
text <->> text | real | Commutator of <<->. |
Index operators
| Operator | Description |
|---|---|
gist_trgm_ops | Converts text to a trigram set and stores it in a GiST index. |
gin_trgm_ops | Converts text to a trigram set and stores it in a GIN index. |
What's next
For the full set of pg_trgm capabilities supported in ApsaraDB RDS for PostgreSQL, see Supported extensions.