All Products
Search
Document Center

PolarDB:smlar

Last Updated:Mar 28, 2026

smlar is an open-source extension for PolarDB for PostgreSQL that calculates array similarity inside the database. It supports GiST and GIN indexes for fast similarity searches, and works with all built-in PostgreSQL data types.

Three similarity algorithms are available: consine (default), TF-IDF, and overlap. Choose the algorithm based on your use case — cosine is a general-purpose vector similarity measure, TF-IDF weights terms by frequency and rarity (useful for text-like arrays), and overlap counts shared elements without normalization.

Supported versions

smlar requires one of the following PolarDB for PostgreSQL versions:

  • PolarDB for PostgreSQL 16 (revision version 2.0.16.9.6.0 or later)

  • PolarDB for PostgreSQL 14 (revision version 2.0.14.5.1.0 or later)

  • PolarDB for PostgreSQL 11 (revision version 2.0.11.9.28.0 or later)

To check your revision version, go to the PolarDB console or run SHOW polardb_version;. If the revision version does not meet the requirements, update it.

Install and remove smlar

Install:

CREATE EXTENSION smlar;
Note

The % operator in smlar conflicts with the % operator in rum. Installing both extensions in the same schema is not supported. To use smlar alongside rum, install them in separate schemas.

Remove:

DROP EXTENSION smlar;

Calculate array similarity

Basic similarity — same data type:

SELECT smlar('{3,2}'::int[], '{3,2,1}');
  smlar
----------
 0.816497
(1 row)

Custom formula:

SELECT smlar('{1,4,6}'::int[], '{5,4,6}', 'N.i / (N.a + N.b)' );
  smlar
----------
 0.333333
(1 row)

In the formula, N.i is the number of identical elements (intersection), N.a is the number of unique elements in the first array, and N.b is the number of unique elements in the second array.

Threshold-based similarity test:

Use the % operator to check whether the similarity of two arrays exceeds the threshold set by smlar.threshold:

-- Set the threshold
SET smlar.threshold = 0.6;

-- Confirm the current value
SHOW smlar.threshold;
 smlar.threshold
-----------------
 0.6
(1 row)

-- Test similarity (returns TRUE or FALSE)
SELECT '{3,2}'::int[] % '{3,2,1}'::int[];

Functions and operators

Function or operatorDescription
float4 smlar(anyarray, anyarray)Calculates the similarity of two arrays of the same data type.
float4 smlar(anyarray, anyarray, bool useIntersect)Calculates the similarity of two arrays of a custom composite type (element and weight pairs). Set useIntersect to TRUE to include only overlapping elements, or FALSE to include all elements. Define the composite type with: CREATE TYPE type_name AS (element_name anytype, weight_name FLOAT4);
float4 smlar(anyarray a, anyarray b, text formula)Calculates the similarity using a custom formula. Predefined variables: N.i (intersection count), N.a (unique elements in array a), N.b (unique elements in array b).
anyarray % anyarrayReturns TRUE if the similarity of the two arrays exceeds smlar.threshold. Otherwise returns FALSE.
text[] tsvector2textarray(tsvector)Converts a tsvector value to a text[] array.
anyarray array_unique(anyarray)Sorts and removes duplicate elements from the array.
float4 inarray(anyarray, anyelement)Returns 1.0 if the element is in the array; otherwise returns 0.
float4 inarray(anyarray, anyelement, float4, float4)Returns the third argument if the element is in the array; otherwise returns the fourth argument.

Parameters

Set parameters using the PostgreSQL SET command or in postgresql.conf.

General parameters

ParameterTypeDefaultDescription
smlar.thresholdFLOATThe minimum similarity score for the % operator to return TRUE.
smlar.persistent_cacheBOOLWhether to store cached global statistics in transaction-independent memory.
smlar.typeSTRINGconsineThe similarity algorithm. Valid values: consine, tfidf, overlap.

TF-IDF parameters

These parameters apply only when smlar.type = 'tfidf'.

ParameterTypeDefaultDescription
smlar.stattableSTRINGThe name of the table that stores collection-level statistics. Create it with: CREATE TABLE table_name (value data_type UNIQUE, ndoc int4 NOT NULL CHECK (ndoc>0));
smlar.tf_methodSTRINGnThe term frequency (TF) calculation method. Valid values: n (simple count), log (1 + log(n)), const (1).
smlar.idf_plus_oneBOOLFALSEThe inverse document frequency (IDF) formula. FALSE uses log(d/df); TRUE uses log(1 + d/df).

References