JSON Partial Similarity – Search Examples

This short article gives examples of JSON partial similarity and optimization tips.

By digoal

Background

Examples of Capturing Partial Similarity

A JSON field contains several KEYs. The number of KEYs, the contents, and the KEY may be different in each line.
The number of keys may exceed 1000, which exceeds the maximum number of columns that a single row can accommodate.
KEY may increase dynamically during the business.

Capture Partial Similarity:

A KEY is required.
It is required to retrieve similar records according to the VALUE (value, time, and other comparable data) of this KEY.
Sort by similarity (Output the records closest to the value first, which requires the support of the <-> operator and the GiST index), and return N records.

How to Optimize

1. The number of KEYs, the contents, and the KEY in each row may be different. Moreover, the number of KEYs may exceed 1000, which exceeds the maximum number of columns a single row can accommodate. KEY may increase dynamically during the business. Therefore, only JSON type is the most suitable choice.

2. A KEY is required. If the number of records returned is very large, there will be a lot of records to be sorted later. Sorting consumes CPU.

3. It is required to retrieve similar records according to the VALUE (value, time, and other comparable data) of this KEY.

If indexes are added, expression indexes must be added to each possible KEY, and the number of indexes may be particularly large. If frequent writing is made, the writing delay will be longer due to each additional index, which reduces the writing performance significantly.

https://www.postgresql.org/docs/current/functions-json.html

create index idx_t_1 on t using btree ((jsb->>'key_name')::numeric) where jsb ? 'key_name';

create extension btree_gist;  
create index idx_t_1 on t using gist ((jsb->>'key_name')::numeric) where jsb ? 'key_name';  
  
select * from t where jsb ? 'key_name' order by (jsb->>'key_name')::numeric <-> 1.234 limit 10;

4. Sort by similarity (Output the records closest to the value first, which requires the support of the <-> operators and the GiST index) and return N records.

You can use the btree_gist (a GiST index operator), or you can take N records forward and backward and return them:

create table tbl (id int, info text);  
insert into tbl select generate_series(1,1000);  
create extension btree_gist;  
create index idx_tbl on tbl using gist(id);  
postgres=# select * from tbl order by id <-> 100 limit 5;  
 id  | info   
-----+------  
 100 |   
 101 |   
  99 |   
 102 |   
  98 |   
(5 rows)

drop index idx_tbl;  
create index idx_tbl on tbl using btree(id);    
  
postgres=# with a as (select * from tbl where id>=100 order by id limit 5),   
b as (select * from tbl where id<100 order by id desc limit 5)   
select * from (select * from a union all select * from b) t   
order by abs(id-100)    
limit 5;  
  
 id  | info   
-----+------  
 100 |   
 101 |   
  99 |   
  98 |   
 102 |   
(5 rows)

0 0 0

Share on

: digoal

281 posts | 24 followers
Follow

Community

JSON Partial Similarity – Search Examples

Background

How to Optimize

Read previous post:

Read next post:

digoal

You may also like

Comments

digoal

Related Products

PolarDB for PostgreSQL

AnalyticDB for PostgreSQL

ApsaraDB RDS for PostgreSQL

Database for FinTech Solution