The fuzzystrmatch extension lets you measure how similar two strings are — useful for handling misspellings, name variants, and phonetic matches in your database queries. ApsaraDB RDS for PostgreSQL supports four algorithms: Soundex, Levenshtein, Metaphone, and Double Metaphone.
Enable or disable the extension
-- Enable
CREATE EXTENSION fuzzystrmatch;
-- Disable
DROP EXTENSION fuzzystrmatch;
Choose an algorithm
| Algorithm | How it works | Best for |
|---|---|---|
| Soundex | Encodes strings by sound; returns a 4-character code | Phonetic matching of English names |
| Levenshtein | Counts the minimum edits (insert, delete, replace) to transform one string into another | Typo detection and fuzzy search |
| Metaphone | Encodes strings by English pronunciation; more accurate than Soundex | English word phonetic matching |
| Double Metaphone | Generates a primary and alternate phonetic code | Non-English names and words with pronunciation variants |
Soundex
Soundex encodes similar-sounding English words to the same 4-character code, letting you match strings by pronunciation rather than exact spelling.
Functions:
soundex(text) returns text
difference(text, text) returns int
-
soundex(text)returns the Soundex code for a string (for example,A550). -
difference(text, text)compares the Soundex codes of two strings and returns the number of matching character positions — a value from 0 (no match) to 4 (exact match). Despite the name, a higher value means more similarity.
Examples:
SELECT soundex('hello world!');
SELECT soundex('Anne'), soundex('Andrew'), difference('Anne', 'Andrew');
SELECT soundex('Anne'), soundex('Margaret'), difference('Anne', 'Margaret');
CREATE TABLE s (nm text);
INSERT INTO s VALUES ('john');
INSERT INTO s VALUES ('joan');
INSERT INTO s VALUES ('wobbly');
INSERT INTO s VALUES ('jack');
-- Exact phonetic match
SELECT * FROM s WHERE soundex(nm) = soundex('john');
-- Similar-sounding names (difference score > 2)
SELECT * FROM s WHERE difference(s.nm, 'john') > 2;
Levenshtein
Levenshtein distance measures the minimum number of single-character edits needed to change one string into another. A lower distance means more similarity.
For example, changing kitten to sitting requires 3 edits:
-
Substitute
kwiths:kitten→sitten -
Substitute
ewithi:sitten→sittin -
Insert
gat the end:sittin→sitting
The result is a Levenshtein distance of 3.
Functions:
levenshtein(text source, text target, int ins_cost, int del_cost, int sub_cost) returns int
levenshtein(text source, text target) returns int
levenshtein_less_equal(text source, text target, int ins_cost, int del_cost, int sub_cost, int max_d) returns int
levenshtein_less_equal(text source, text target, int max_d) returns int
Parameters:
| Parameter | Description |
|---|---|
source |
The source string. Cannot be empty; maximum 255 characters. |
target |
The target string. Cannot be empty; maximum 255 characters. |
ins_cost |
The cost of inserting a character. |
del_cost |
The cost of deleting a character. |
sub_cost |
The cost of substituting a character. |
max_d |
The maximum distance to consider. |
levenshtein_less_equal is an optimized version for short distances:
-
If the actual distance is less than or equal to
max_d, it returns the exact distance. -
If the actual distance exceeds
max_d, it returns a random value greater thanmax_d. -
If
max_dis negative, it behaves identically tolevenshtein.
Examples:
SELECT levenshtein('GUMBO', 'GAMBOL');
SELECT levenshtein('GUMBO', 'GAMBOL', 2, 1, 1);
SELECT levenshtein_less_equal('extensive', 'exhaustive', 2);
SELECT levenshtein_less_equal('extensive', 'exhaustive', 4);
End-to-end example — fuzzy name search:
CREATE TABLE customers (id INT, name TEXT);
INSERT INTO customers VALUES
(1, 'John Smith'),
(2, 'Jon Smythe'),
(3, 'Jane Doe'),
(4, 'Jonathan Smithson');
-- Find customers whose names are close to 'Jon Smithe' (distance <= 3)
SELECT * FROM customers
WHERE levenshtein(lower(name), lower('Jon Smithe')) <= 3;
Metaphone
Metaphone constructs a phonetic code for each English string. Two strings with the same code are considered phonetically similar. It is more accurate than Soundex for English words.
Function:
metaphone(text source, int max_output_length) returns text
Parameters:
| Parameter | Description |
|---|---|
source |
The source string. Cannot be empty; maximum 255 characters. |
max_output_length |
The maximum length of the returned Metaphone code. If the code exceeds this length, it is truncated. |
Example:
SELECT metaphone('GUMBO', 4);
Double Metaphone
Double Metaphone generates two phonetic codes for a string — a primary code and a secondary code. In most cases they are identical. They may differ for non-English words where pronunciation varies.
Functions:
dmetaphone(text source) returns text -- primary code
dmetaphone_alt(text source) returns text -- secondary code
Examples:
SELECT dmetaphone('gumbo');
SELECT dmetaphone_alt('gumbo');