Calculates the Jaccard index between two trajectories or sub-trajectories, returning a lower bound and an upper bound of the similarity score.
Syntax
record ST_JaccardSimilarity(trajectory tr1, trajectory tr2, double tol_dist,
    text unit default '{}', interval tol_time default NULL,
    timestamp ts default '-infinity', timestamp te default 'infinity');
The function returns a record with the following fields:
| Field | Type | Description |
|---|---|---|
| nleaf1 | int | The number of trajectory points in tr1 that intersect with tr2. |
| nleaf2 | int | The number of trajectory points in tr2 that intersect with tr1. This value may differ from nleaf1. For example, if tr1 passes the same point on tr2 twice, nleaf1 is 1 and nleaf2 is 2. |
| inter1 | int | The number of trajectory points in tr1 whose distance to tr2 meets both the distance and time tolerances. |
| inter2 | int | The number of trajectory points in tr2 whose distance to tr1 meets both the distance and time tolerances. |
| jaccard_lower | double | The lower bound of the Jaccard index. Calculated as min(inter1, inter2) / (nleaf1 + nleaf2 - min(inter1, inter2)). |
| jaccard_upper | double | The upper bound of the Jaccard index. Calculated as max(inter1, inter2) / (nleaf1 + nleaf2 - max(inter1, inter2)). |
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| tr1 | trajectory | — | The first trajectory. |
| tr2 | trajectory | — | The second trajectory. |
| tol_dist | double | — | The maximum allowed distance between a matching pair of trajectory points, in meters. |
| unit | text | '{}' | A JSON string that controls how distances are calculated. See unit parameter fields. |
| tol_time | interval | NULL | The maximum allowed time difference between a matching pair of trajectory points. If NULL or negative, the function matches points based on distance only, ignoring timestamps. |
| ts | timestamp | -infinity | The start of the time range. If specified, the function compares only the sub-trajectories between ts and te. |
| te | timestamp | infinity | The end of the time range. If specified, the function compares only the sub-trajectories between ts and te. |
unit parameter fields
| Field | Type | Default | Description |
|---|---|---|---|
| Projection | string | None | The coordinate system to re-project the trajectories into before calculating distances. Valid values: auto (dynamically selects Lambert Azimuthal or UTM based on longitude and latitude; distance unit is meters), srid (re-projects using the specified spatial reference identifier (SRID)). If omitted, calculations use the original coordinate system. |
| Unit | string | null | The unit for distance measurement. Valid values: null (Euclidean distance based on raw coordinates), M (distance based on the spatial reference of the trajectories, typically meters). |
| useSpheroid | bool | true | Specifies whether to use an ellipsoid model when Unit is M. true: uses an ellipsoid for accurate distances. false: uses a sphere for approximate distances. |
How it works
The classical Jaccard index for two sets is the size of their intersection divided by the size of their union. For trajectories, this extends to spatial (and optionally temporal) matching: a trajectory point in tr1 is considered to intersect with tr2 if there is a point in tr2 within tol_dist meters (and, if tol_time is set, within the specified time window).
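The matching rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the function's actual implementation: each trajectory is assumed to be a list of (longitude, latitude, timestamp) tuples, distance is approximated with the haversine formula on a sphere, and only `tol_time=None` is treated as "ignore timestamps" (negative intervals are not handled). The names `haversine_m` and `count_matches` and the sample points are hypothetical.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(h))


def count_matches(src, dst, tol_dist, tol_time=None):
    """Count points in src with at least one partner in dst that lies
    within tol_dist meters and, if tol_time is given, within tol_time."""
    matched = 0
    for lon, lat, t in src:
        for lon2, lat2, t2 in dst:
            if haversine_m(lon, lat, lon2, lat2) > tol_dist:
                continue  # too far apart
            if tol_time is not None and abs(t - t2) > tol_time:
                continue  # close in space but not in time
            matched += 1
            break  # one partner is enough for this point
    return matched


# Hypothetical sample data: (longitude, latitude, timestamp)
t0 = datetime(2020, 5, 24, 2, 36, 15)
tr1 = [
    (114.49211, 37.97921, t0),
    (114.49211, 37.97921, t0 + timedelta(seconds=3)),
]
tr2 = [
    (114.49211, 37.97921, t0 + timedelta(seconds=5)),   # near tr1 in space and time
    (114.49145, 37.97781, t0 + timedelta(seconds=10)),  # more than 100 m away
    (114.49211, 37.97921, t0 + timedelta(hours=1)),     # same place, one hour later
]

tol = timedelta(seconds=20)
inter1 = count_matches(tr1, tr2, 100, tol)
inter2 = count_matches(tr2, tr1, 100, tol)
print(inter1, inter2)
```

With these sample points, both points of tr1 find a partner in tr2, but only one point of tr2 finds a partner in tr1, so the two counts differ. Dropping the time tolerance (`count_matches(tr2, tr1, 100)`) also matches the third point of tr2.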
Because the matching relationship is not symmetric — tr1 passing near a point on tr2 once is counted differently from tr2 passing near the same point twice — the function returns two bounds rather than a single value:
- `jaccard_lower` uses the smaller intersection count (min(inter1, inter2)), giving a conservative estimate of overlap. Use this when a confirmed minimum level of similarity is required.
- `jaccard_upper` uses the larger intersection count (max(inter1, inter2)), giving an optimistic estimate of overlap. Use this to capture the broadest possible match between the two trajectories.
Both values range from 0 (no overlap) to 1 (complete overlap).
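The two bound formulas from the return fields table can be checked with a few lines of Python. This sketch covers only the arithmetic, not how the function derives the counts. The name `jaccard_bounds` is hypothetical, and the input counts are the ones reported in the Example section of this page.

```python
def jaccard_bounds(nleaf1, nleaf2, inter1, inter2):
    """Return (jaccard_lower, jaccard_upper) from the four counts.

    No guard against a zero denominator is included in this sketch.
    """
    lo = min(inter1, inter2)
    hi = max(inter1, inter2)
    lower = lo / (nleaf1 + nleaf2 - lo)
    upper = hi / (nleaf1 + nleaf2 - hi)
    return lower, upper


# Counts taken from the example output on this page: (4, 20, 4, 10)
lower, upper = jaccard_bounds(nleaf1=4, nleaf2=20, inter1=4, inter2=10)
print(lower, upper)
```

The result is 4 / 20 = 0.2 for the lower bound and 10 / 14, approximately 0.714, for the upper bound, which agrees with the example output.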
Example
The following example compares two trajectories within a three-day time window, using a 100-meter distance tolerance and a 20-second time tolerance.
WITH traj AS (
SELECT
ST_makeTrajectory('STPOINT'::leaftype,
'SRID=4326;LINESTRING(114.49211 37.97921,114.49211 37.97921,114.49211 37.97921,114.49211 37.97921)'::geometry,
ARRAY[
to_timestamp(1590287775) AT TIME ZONE 'UTC',
to_timestamp(1590287778) AT TIME ZONE 'UTC',
to_timestamp(1590302169) AT TIME ZONE 'UTC',
to_timestamp(1590302171) AT TIME ZONE 'UTC'
], '{}') a,
ST_makeTrajectory('STPOINT'::leaftype,
'SRID=4326;LINESTRING(114.49211 37.97921,114.49211 37.97921,114.49211 37.97921,114.49211 37.97921,114.49145 37.97781,114.49145 37.97781,114.49145 37.97781,114.49145 37.97781,114.49145 37.97781,114.49145 37.97781,114.49145 37.97781,114.49145 37.97781,114.49145 37.97781,114.49145 37.97781,114.49211 37.97921,114.49211 37.97921,114.49211 37.97921,114.49211 37.97921,114.49211 37.97921,114.49211 37.97921)'::geometry,
ARRAY[
to_timestamp(1590287765) AT TIME ZONE 'UTC',
to_timestamp(1590287771) AT TIME ZONE 'UTC',
to_timestamp(1590287778) AT TIME ZONE 'UTC',
to_timestamp(1590287780) AT TIME ZONE 'UTC',
to_timestamp(1590295992) AT TIME ZONE 'UTC',
to_timestamp(1590295997) AT TIME ZONE 'UTC',
to_timestamp(1590296013) AT TIME ZONE 'UTC',
to_timestamp(1590296018) AT TIME ZONE 'UTC',
to_timestamp(1590296025) AT TIME ZONE 'UTC',
to_timestamp(1590296032) AT TIME ZONE 'UTC',
to_timestamp(1590296055) AT TIME ZONE 'UTC',
to_timestamp(1590296073) AT TIME ZONE 'UTC',
to_timestamp(1590296081) AT TIME ZONE 'UTC',
to_timestamp(1590296081) AT TIME ZONE 'UTC',
to_timestamp(1590302169) AT TIME ZONE 'UTC',
to_timestamp(1590302174) AT TIME ZONE 'UTC',
to_timestamp(1590302176) AT TIME ZONE 'UTC',
to_timestamp(1590302176) AT TIME ZONE 'UTC',
to_timestamp(1590302172) AT TIME ZONE 'UTC',
to_timestamp(1590302176) AT TIME ZONE 'UTC'
], '{}') b
)
SELECT ST_JaccardSimilarity(a, b, 100, '{"unit":"M"}', '20 second',
'2020-05-23'::timestamptz AT TIME ZONE 'UTC',
'2020-05-26'::timestamptz AT TIME ZONE 'UTC')
FROM traj;
Output:
st_jaccardsimilarity
-----------------------------------
(4,20,4,10,0.2,0.714285714285714)
(1 row)
The result maps to the return fields as follows:
| Field | Value | Meaning |
|---|---|---|
| nleaf1 | 4 | 4 points in tr1 intersect with tr2 |
| nleaf2 | 20 | 20 points in tr2 intersect with tr1 |
| inter1 | 4 | 4 points in tr1 meet the distance and time tolerances |
| inter2 | 10 | 10 points in tr2 meet the distance and time tolerances |
| jaccard_lower | 0.2 | Conservative similarity: 4 / (4 + 20 - 4) |
| jaccard_upper | 0.714... | Optimistic similarity: 10 / (4 + 20 - 10) |