This topic describes the smlar plug-in, which allows you to calculate the similarity between two arrays of the same data type.

Prerequisites

The instance runs one of the following PostgreSQL versions:
  • PostgreSQL 12 (kernel version 20200421 and later)
  • PostgreSQL 11 (kernel version 20200402 and later)
Note: To view the kernel version, log on to the ApsaraDB for RDS console, find the RDS instance, and go to the Basic Information page. In the Configuration Information section, check whether the Upgrade Minor Version button is displayed. If it is displayed, click it to view the kernel version. If it is not displayed, the instance already runs the latest kernel version. For more information, see Upgrade the kernel version of an ApsaraDB RDS for PostgreSQL instance.
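The version requirements above reduce to a simple comparison. The following Python sketch is illustrative only: the function name supports_smlar and the MIN_KERNEL table are hypothetical, and the sketch assumes that the kernel version is compared as the eight-digit date number listed in the prerequisites.

```python
# Hypothetical table: minimum kernel version (as an 8-digit date number)
# for each PostgreSQL major version listed in the prerequisites.
MIN_KERNEL = {12: 20200421, 11: 20200402}


def supports_smlar(major: int, kernel_version: int) -> bool:
    """Return True if the major version and kernel version meet the prerequisites."""
    minimum = MIN_KERNEL.get(major)
    # Major versions that are not listed are treated as unsupported.
    return minimum is not None and kernel_version >= minimum


print(supports_smlar(12, 20200421))  # True
print(supports_smlar(11, 20200401))  # False
```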

Background information

The smlar plug-in provides multiple functions to calculate the similarity between two arrays of the same data type. It also provides parameters to control the similarity calculation methods. All built-in data types are supported.

Function description

  • float4 smlar(anyarray, anyarray)

    Calculates the similarity between two arrays of the same data type.

  • float4 smlar(anyarray, anyarray, bool useIntersect)

    Calculates the similarity between two arrays of composite data types. The composite data type is defined as follows:

    CREATE TYPE type_name AS (element_name anytype, weight_name FLOAT4);

    When useIntersect is set to true, only the elements that the two arrays have in common are used to calculate the denominator of the similarity. When useIntersect is set to false, all elements are used.

  • float4 smlar(anyarray a, anyarray b, text formula)

    Calculates the similarity between two arrays of the same data type by using the formula specified by the formula parameter.

    The predefined variables for formula are described as follows:

    • N.i: The number of common elements in the two arrays.
    • N.a: The number of distinct elements in array a.
    • N.b: The number of distinct elements in array b.

  • float4 set_smlar_limit(float4)

    Sets the value of the smlar.threshold parameter, which is used by the % operator.

  • float4 show_smlar_limit()

    Displays the smlar.threshold parameter value.

  • anyarray % anyarray

    Returns true if the similarity between the two arrays is greater than the value of the smlar.threshold parameter. Otherwise, returns false.

  • text[] tsvector2textarray(tsvector)

    Converts a tsvector value to a text array (text[]).

  • anyarray array_unique(anyarray)

    Sorts the elements of an array and removes duplicate elements.

  • float4 inarray(anyarray, anyelement)

    Returns 1 if the anyelement value exists in the anyarray value. Otherwise, returns 0.

  • float4 inarray(anyarray, anyelement, float4, float4)

    Returns the third parameter value if anyelement exists in anyarray. Otherwise, returns the fourth parameter value.
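To make the counting rules concrete, the following Python sketch models smlar(anyarray, anyarray), smlar(anyarray, anyarray, text formula), and the % operator for plain (unweighted) arrays. It is not the extension's implementation: the function names are hypothetical, the formula is passed as a Python callable instead of a text string, and the sketch assumes that smlar uses the cosine formula N.i / sqrt(N.a * N.b) by default, which is consistent with the example output in the Use smlar section.

```python
import math
from typing import Callable, Hashable, Iterable, Tuple


def counts(a: Iterable[Hashable], b: Iterable[Hashable]) -> Tuple[int, int, int]:
    """Return (N.i, N.a, N.b), counting distinct elements only."""
    set_a, set_b = set(a), set(b)
    return len(set_a & set_b), len(set_a), len(set_b)


def cosine_smlar(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Model of smlar(a, b), assuming the cosine formula N.i / sqrt(N.a * N.b)."""
    n_i, n_a, n_b = counts(a, b)
    if n_a == 0 or n_b == 0:
        return 0.0  # assumption: an empty array is treated as dissimilar
    return n_i / math.sqrt(n_a * n_b)


def formula_smlar(a, b, formula: Callable[[int, int, int], float]) -> float:
    """Model of smlar(a, b, formula): apply the formula to (N.i, N.a, N.b)."""
    return formula(*counts(a, b))


def similar(a, b, threshold: float) -> bool:
    """Model of a % b: True if the similarity is greater than the threshold."""
    return cosine_smlar(a, b) > threshold


print(cosine_smlar([1, 4, 6], [5, 4, 6]))  # 0.6666666666666666
print(formula_smlar([1, 4, 6], [5, 4, 6],
                    lambda n_i, n_a, n_b: n_i / math.sqrt(n_a * n_b)))  # 0.6666666666666666
print(similar([1, 4, 6], [5, 4, 6], 0.5))  # True
```

Counting with set() mirrors the definitions of N.a and N.b as numbers of distinct elements. Weighted composite arrays (the useIntersect variant) are not modeled here.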

For more information about the parameters and supported data types, see the smlar documentation.
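The helper functions array_unique and inarray described above can be modeled the same way. In this hypothetical Python sketch, Python's sort order stands in for PostgreSQL's, which can differ, for example for text values whose order depends on collation.

```python
from typing import Hashable, Iterable, List, Sequence


def array_unique(arr: Iterable) -> List:
    """Model of array_unique(anyarray): remove duplicates, then sort."""
    return sorted(set(arr))


def inarray(arr: Sequence[Hashable], elem: Hashable,
            found: float = 1.0, missing: float = 0.0) -> float:
    """Model of inarray(): return found if elem is in arr, else missing.

    With the default arguments this matches the two-argument form (1 or 0).
    """
    return found if elem in arr else missing


print(array_unique([3, 1, 3, 2]))       # [1, 2, 3]
print(inarray([1, 2, 3], 2))            # 1.0
print(inarray([1, 2, 3], 5, 0.8, 0.1))  # 0.1
```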

Use smlar

  • After you connect to the instance, execute the following statement to create the smlar extension:
    testdb=> create extension smlar;
  • Execute the following statements to call the basic smlar functions:
    testdb=> SELECT smlar('{1,4,6}'::int[], '{5,4,6}');
      smlar   
    ----------
     0.666667
    (1 row)
    testdb=> SELECT smlar('{1,4,6}'::int[], '{5,4,6}', 'N.i / sqrt(N.a * N.b)');
      smlar   
    ----------
     0.666667
    (1 row)
  • Execute the following statement to delete the smlar extension:
    testdb=> drop extension smlar;
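The value 0.666667 returned by both smlar queries above can be checked by hand with the definitions of N.i, N.a, and N.b. The second query states the formula N.i / sqrt(N.a * N.b) explicitly; the fact that the first query returns the same value suggests, but does not prove, that this is also the default formula.

```latex
\begin{aligned}
\mathrm{N.i} &= \lvert \{1,4,6\} \cap \{5,4,6\} \rvert = \lvert \{4,6\} \rvert = 2, \qquad \mathrm{N.a} = 3, \qquad \mathrm{N.b} = 3 \\
\mathrm{smlar} &= \frac{\mathrm{N.i}}{\sqrt{\mathrm{N.a} \cdot \mathrm{N.b}}} = \frac{2}{\sqrt{3 \cdot 3}} = \frac{2}{3} \approx 0.666667
\end{aligned}
```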