All Products
Search
Document Center

ApsaraDB RDS:String similarity matching (fuzzystrmatch)

Last Updated:Mar 30, 2026

The fuzzystrmatch extension lets you measure how similar two strings are — useful for handling misspellings, name variants, and phonetic matches in your database queries. ApsaraDB RDS for PostgreSQL supports four algorithms: Soundex, Levenshtein, Metaphone, and Double Metaphone.

Enable or disable the extension

-- Enable
CREATE EXTENSION fuzzystrmatch;

-- Disable
DROP EXTENSION fuzzystrmatch;

Choose an algorithm

Algorithm How it works Best for
Soundex Encodes strings by sound; returns a 4-character code Phonetic matching of English names
Levenshtein Counts the minimum edits (insert, delete, replace) to transform one string into another Typo detection and fuzzy search
Metaphone Encodes strings by English pronunciation; more accurate than Soundex English word phonetic matching
Double Metaphone Generates a primary and alternate phonetic code Non-English names and words with pronunciation variants

Soundex

Soundex encodes similar-sounding English words to the same 4-character code, letting you match strings by pronunciation rather than exact spelling.

Functions:

soundex(text) returns text
difference(text, text) returns int
  • soundex(text) returns the Soundex code for a string (for example, A550).

  • difference(text, text) compares the Soundex codes of two strings and returns the number of matching character positions — a value from 0 (no match) to 4 (exact match). Despite the name, a higher value means more similarity.

Examples:

SELECT soundex('hello world!');
SELECT soundex('Anne'), soundex('Andrew'), difference('Anne', 'Andrew');
SELECT soundex('Anne'), soundex('Margaret'), difference('Anne', 'Margaret');

CREATE TABLE s (nm text);
INSERT INTO s VALUES ('john');
INSERT INTO s VALUES ('joan');
INSERT INTO s VALUES ('wobbly');
INSERT INTO s VALUES ('jack');

-- Exact phonetic match
SELECT * FROM s WHERE soundex(nm) = soundex('john');

-- Similar-sounding names (difference score > 2)
SELECT * FROM s WHERE difference(s.nm, 'john') > 2;

Levenshtein

Levenshtein distance measures the minimum number of single-character edits needed to change one string into another. A lower distance means more similarity.

For example, changing kitten to sitting requires 3 edits:

  1. Substitute k with s: kittensitten

  2. Substitute e with i: sittensittin

  3. Insert g at the end: sittinsitting

The result is a Levenshtein distance of 3.

Functions:

levenshtein(text source, text target, int ins_cost, int del_cost, int sub_cost) returns int
levenshtein(text source, text target) returns int
levenshtein_less_equal(text source, text target, int ins_cost, int del_cost, int sub_cost, int max_d) returns int
levenshtein_less_equal(text source, text target, int max_d) returns int

Parameters:

Parameter Description
source The source string. Cannot be empty; maximum 255 characters.
target The target string. Cannot be empty; maximum 255 characters.
ins_cost The cost of inserting a character.
del_cost The cost of deleting a character.
sub_cost The cost of substituting a character.
max_d The maximum distance to consider.

levenshtein_less_equal is an optimized version for short distances:

  • If the actual distance is less than or equal to max_d, it returns the exact distance.

  • If the actual distance exceeds max_d, it returns a random value greater than max_d.

  • If max_d is negative, it behaves identically to levenshtein.

Examples:

SELECT levenshtein('GUMBO', 'GAMBOL');
SELECT levenshtein('GUMBO', 'GAMBOL', 2, 1, 1);
SELECT levenshtein_less_equal('extensive', 'exhaustive', 2);
SELECT levenshtein_less_equal('extensive', 'exhaustive', 4);
Levenshtein

End-to-end example — fuzzy name search:

CREATE TABLE customers (id INT, name TEXT);
INSERT INTO customers VALUES
  (1, 'John Smith'),
  (2, 'Jon Smythe'),
  (3, 'Jane Doe'),
  (4, 'Jonathan Smithson');

-- Find customers whose names are close to 'Jon Smithe' (distance <= 3)
SELECT * FROM customers
WHERE levenshtein(lower(name), lower('Jon Smithe')) <= 3;

Metaphone

Metaphone constructs a phonetic code for each English string. Two strings with the same code are considered phonetically similar. It is more accurate than Soundex for English words.

Function:

metaphone(text source, int max_output_length) returns text

Parameters:

Parameter Description
source The source string. Cannot be empty; maximum 255 characters.
max_output_length The maximum length of the returned Metaphone code. If the code exceeds this length, it is truncated.

Example:

SELECT metaphone('GUMBO', 4);

Double Metaphone

Double Metaphone generates two phonetic codes for a string — a primary code and a secondary code. In most cases they are identical. They may differ for non-English words where pronunciation varies.

Functions:

dmetaphone(text source) returns text      -- primary code
dmetaphone_alt(text source) returns text  -- secondary code

Examples:

SELECT dmetaphone('gumbo');
SELECT dmetaphone_alt('gumbo');