By digoal
In content retrieval scenarios, such as website article searches and content searches via search engines, aside from merely determining if the content matches, a crucial demand is to sort the content based on the matching degree and content freshness. Content with a higher matching degree, freshness, and weight is given higher priority.
Scenario 1: On blog websites, a blogger can assign tags to each article upon publishing, and an article may have multiple tags. Moreover, the longer it has been since the article was published, the fresher it is. When a user selects a tag while browsing, articles with that tag are returned in order of freshness.
Scenario 2: Building on the previous scenario, each tag corresponding to an article is assigned a weight, indicating the matching degree of the article with that tag. When a user selects a tag while browsing, articles with that tag are returned in order of matching degree and freshness.
Scenario 3: On blog websites, articles matched by content are returned in order of matching degree (rank).
Traditional databases lack array types and do not support one-to-many data structure storage. Traditional databases also lack word embedding types, which means they do not support word embedding full-text searches and word embedding matching degree sorting.
Scenario 1
1. Create a tag table
drop table if exists tbl_art_tag;
drop table if exists tbl_tag;
create table tbl_tag (
id int primary key, -- TagID
info text -- Tag content
);
2. Write 1,000 preset labels
insert into tbl_tag select generate_series(1,1000), md5(random()::text);
3. Create an article table to record each tag of each article
drop table if exists tbl_art_tag;
create unlogged table tbl_art_tag (
art_id int, -- Article ID
tid int references tbl_tag(id), -- Tag ID
ts timestamp -- The timestamp of the article
);
4. Create a sequence
drop sequence if exists seq;
create sequence seq INCREMENT by 1;
5. Write pgbench script to randomly generate the number of tags (1 to 1,000) for each article, generate random tag IDs, and write them into the article table.
vi t1.sql
\set tags random_gaussian(1, 1000, 2.5)
select nextval('seq') as art_id \gset
insert into tbl_art_tag select :art_id, ceil(random()*1000), now() from generate_series(1, :tags);
6. Generate about 10 million pieces of data.
pgbench -M prepared -n -r -P 1 -f ./t1.sql -c 10 -j 10 -t 2000
7. Create a tag and timestamp field index
create index on tbl_art_tag (tid,ts);
8. When browsing, the user selects a tag, and the top 100 articles with this tag are returned in order of freshness.
postgres=# explain analyze select art_id,ts from tbl_art_tag where tid=500 order by ts desc limit 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.43..103.31 rows=100 width=12) (actual time=0.134..0.597 rows=100 loops=1)
-> Index Scan using tbl_art_tag_tid_ts_idx on tbl_art_tag (cost=0.43..10238.91 rows=9952 width=12) (actual time=0.131..0.575 rows=100 loops=1)
Index Cond: (tid = 500)
Planning Time: 0.441 ms
Execution Time: 0.689 ms
(5 rows)
9. Performance stress testing
vi t2.sql
\set tid random(1,1000)
select art_id,ts from tbl_art_tag where tid=:tid order by ts desc limit 100;
pgbench -M prepared -n -r -P 1 -f ./t2.sql -c 10 -j 10 -T 120
transaction type: ./t2.sql
scaling factor: 1
query mode: prepared
number of clients: 10
number of threads: 10
duration: 120 s
number of transactions actually processed: 9185044
latency average = 0.131 ms
latency stddev = 0.058 ms
initial connection time = 21.476 ms
tps = 76555.526245 (without initial connection time)
statement latencies in milliseconds:
0.000 \set tid random(1,1000)
0.130 select art_id,ts from tbl_art_tag where tid=:tid order by ts desc limit 100;
10. Search for articles with multiple tags and return them on time.
postgres=# explain analyze select art_id,ts from tbl_art_tag where tid in ( 490,500 ) order by ts desc limit 100;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=18800.31..18800.56 rows=100 width=12) (actual time=256.937..256.948 rows=100 loops=1)
-> Sort (cost=18800.31..18850.07 rows=19904 width=12) (actual time=256.928..256.932 rows=100 loops=1)
Sort Key: ts
Sort Method: top-N heapsort Memory: 29kB
-> Bitmap Heap Scan on tbl_art_tag (cost=229.93..18039.59 rows=19904 width=12) (actual time=15.951..253.327 rows=20040 loops=1)
Recheck Cond: (tid = ANY ('{490,500}'::integer[]))
Heap Blocks: exact=16804
-> Bitmap Index Scan on tbl_art_tag_tid_ts_idx (cost=0.00..224.95 rows=19904 width=0) (actual time=12.534..12.534 rows=20040 loops=1)
Index Cond: (tid = ANY ('{490,500}'::integer[]))
Planning Time: 2.119 ms
Execution Time: 257.157 ms
(11 rows)
11. Performance testing
vi t22.sql
\set tid1 random(1,1000)
\set tid2 random(1,1000)
select art_id,ts from tbl_art_tag where tid in(:tid1,:tid2) order by ts desc limit 100;
pgbench -M prepared -n -r -P 1 -f ./t22.sql -c 10 -j 10 -T 120
transaction type: ./t22.sql
scaling factor: 1
query mode: prepared
number of clients: 10
number of threads: 10
duration: 120 s
number of transactions actually processed: 13281
latency average = 90.357 ms
latency stddev = 10.483 ms
initial connection time = 22.151 ms
tps = 110.661414 (without initial connection time)
statement latencies in milliseconds:
0.001 \set tid1 random(1,1000)
0.000 \set tid2 random(1,1000)
90.621 select art_id,ts from tbl_art_tag where tid in(:tid1,:tid2) order by ts desc limit 100;
12. Think about how to write SQL if you want to search for articles set with multiple tags.
postgres=# explain analyze select art_id,max(ts) as ts,count(*) from tbl_art_tag where tid in ( 490,500 ) group by 1 having count(*)>=2 order by ts desc limit 100;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=18477.15..18477.40 rows=100 width=20) (actual time=153.256..153.263 rows=100 loops=1)
-> Sort (cost=18477.15..18486.66 rows=3807 width=20) (actual time=153.244..153.248 rows=100 loops=1)
Sort Key: (max(ts)) DESC
Sort Method: top-N heapsort Memory: 38kB
-> HashAggregate (cost=18188.87..18331.65 rows=3807 width=20) (actual time=152.051..152.890 rows=5308 loops=1)
Group Key: art_id
Filter: (count(*) >= 2)
Batches: 1 Memory Usage: 1681kB
Rows Removed by Filter: 6786
-> Bitmap Heap Scan on tbl_art_tag (cost=229.93..18039.59 rows=19904 width=12) (actual time=13.819..143.743 rows=20040 loops=1)
Recheck Cond: (tid = ANY ('{490,500}'::integer[]))
Heap Blocks: exact=16804
-> Bitmap Index Scan on tbl_art_tag_tid_ts_idx (cost=0.00..224.95 rows=19904 width=0) (actual time=10.269..10.269 rows=20040 loops=1)
Index Cond: (tid = ANY ('{490,500}'::integer[]))
Planning Time: 0.346 ms
Execution Time: 153.743 ms
(16 rows)
13. Performance testing
vi t222.sql
\set tid1 random(1,1000)
\set tid2 random(1,1000)
select art_id,max(ts) as ts,count(*) from tbl_art_tag where tid in ( :tid1, :tid2 ) group by 1 having count(*)>=2 order by ts desc limit 100;
pgbench -M prepared -n -r -P 1 -f ./t222.sql -c 10 -j 10 -T 120
transaction type: ./t222.sql
scaling factor: 1
query mode: prepared
number of clients: 10
number of threads: 10
duration: 120 s
number of transactions actually processed: 11918
latency average = 100.702 ms
latency stddev = 12.866 ms
initial connection time = 21.952 ms
tps = 99.297985 (without initial connection time)
statement latencies in milliseconds:
0.001 \set tid1 random(1,1000)
0.000 \set tid2 random(1,1000)
100.711 select art_id,max(ts) as ts,count(*) from tbl_art_tag where tid in ( :tid1, :tid2 ) group by 1 having count(*)>=2 order by ts d
Scenario 2
Try to design it by yourselves.
Scenario 3
Not supported
Scenario 1
Based on the traditional method scenario 1, convert all tags of an article to array storage.
1. Create a RUM plug-in to support the RUM index interface. RUM is an index that is enhanced based on Gin. It features inverted sorting and includes additional information attached to each tuple item pointer, such as field values other than multi-value types, which can be utilized for additional sorting operations.
create extension rum;
2. Create an article table to indicate the one-to-many relationship between articles and tags.
drop table if exists tbl_art_tag1;
create unlogged table tbl_art_tag1 (
art_id int primary key,
tid int[],
ts timestamp
);
3. Use the data of the traditional method scenario 1 to convert all tags of an article into array storage.
insert into tbl_art_tag1 select art_id, array_agg(tid), max(ts) from tbl_art_tag group by 1;
4. Create RUM index, tag array + timestamp.
CREATE INDEX ON tbl_art_tag1 USING rum (tid rum_anyarray_addon_ops, ts)
WITH (attach = 'ts', to = 'tid');
5. When browsing, the user selects a tag, and the top 100 articles with this tag are returned in order of freshness.
postgres=# explain analyze select art_id,ts from tbl_art_tag1 where tid @> array[500] order by ts <=> '2099-01-01' limit 100;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=4.40..32.74 rows=100 width=20) (actual time=5.699..6.239 rows=100 loops=1)
-> Index Scan using tbl_art_tag1_tid_ts_idx on tbl_art_tag1 (cost=4.40..2184.46 rows=7693 width=20) (actual time=5.695..6.193 rows=100 loops=1)
Index Cond: (tid @> '{500}'::integer[])
Order By: (ts <=> '2099-01-01 00:00:00'::timestamp without time zone)
Planning Time: 0.378 ms
Execution Time: 6.348 ms
(6 rows)
6. Performance stress testing
Compared with the traditional method, the performance drops from 76,555 to 2,845 TPS.
The reason is that the tuple items in the RUM index are not stored in the order of the additional information value (i.e., ts here), which introduces some sorting overhead.
In contrast, the traditional B-tree index does not have this extra sorting overhead.
vi t3.sql
\set tid random(1,1000)
select art_id from tbl_art_tag1 where tid @> (array[:tid])::int[] order by ts <=> '2099-01-01' limit 100;
pgbench -M prepared -n -r -P 1 -f ./t3.sql -c 10 -j 10 -T 120
transaction type: ./t3.sql
scaling factor: 1
query mode: prepared
number of clients: 10
number of threads: 10
duration: 120 s
number of transactions actually processed: 341387
latency average = 3.514 ms
latency stddev = 1.119 ms
initial connection time = 19.174 ms
tps = 2845.241086 (without initial connection time)
statement latencies in milliseconds:
0.000 \set tid random(1,1000)
3.515 select art_id from tbl_art_tag1 where tid @> (array[:tid])::int[] order by ts <=> '2099-01-01' limit 100;
7. Comparison of the space occupied by the tag table and index between the traditional method and RUM method:
The tag table in RUM is only about one-tenth of that in the traditional method, while the index is less than half of the traditional method.
postgres=# \dt+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+--------------+-------+----------+-------------+---------------+--------+-------------
public | tbl_art_tag | table | postgres | unlogged | heap | 424 MB |
public | tbl_art_tag1 | table | postgres | unlogged | heap | 43 MB |
public | tbl_tag | table | postgres | permanent | heap | 104 kB |
(3 rows)
postgres=# \di+
List of relations
Schema | Name | Type | Owner | Table | Persistence | Access method | Size | Description
--------+-------------------------+-------+----------+--------------+-------------+---------------+--------+-------------
public | tbl_art_tag1_pkey | index | postgres | tbl_art_tag1 | unlogged | btree | 592 kB |
public | tbl_art_tag1_tid_ts_idx | index | postgres | tbl_art_tag1 | unlogged | rum | 101 MB |
public | tbl_art_tag_tid_ts_idx | index | postgres | tbl_art_tag | unlogged | btree | 267 MB |
public | tbl_tag_pkey | index | postgres | tbl_tag | permanent | btree | 40 kB |
(4 rows)
8. Search for articles with multiple tags and return them on time.
postgres=# explain analyze select art_id,ts from tbl_art_tag1 where tid && (array[490,500])::int[] order by ts desc limit 100;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=2575.92..2576.17 rows=100 width=12) (actual time=21.177..21.194 rows=100 loops=1)
-> Sort (cost=2575.92..2606.82 rows=12359 width=12) (actual time=21.173..21.181 rows=100 loops=1)
Sort Key: ts DESC
Sort Method: top-N heapsort Memory: 32kB
-> Bitmap Heap Scan on tbl_art_tag1 (cost=110.08..2103.57 rows=12359 width=12) (actual time=6.189..19.271 rows=12094 loops=1)
Recheck Cond: (tid && '{490,500}'::integer[])
Heap Blocks: exact=1835
-> Bitmap Index Scan on tbl_art_tag1_tid_ts_idx (cost=0.00..106.99 rows=12359 width=0) (actual time=5.727..5.727 rows=12094 loops=1)
Index Cond: (tid && '{490,500}'::integer[])
Planning Time: 0.550 ms
Execution Time: 21.315 ms
(11 rows)
9. Performance testing
vi t33.sql
\set tid1 random(1,1000)
\set tid2 random(1,1000)
select art_id,ts from tbl_art_tag1 where tid && (array[:tid1 , :tid2])::int[] order by ts desc limit 100;
pgbench -M prepared -n -r -P 1 -f ./t33.sql -c 10 -j 10 -T 120
transaction type: ./t33.sql
scaling factor: 1
query mode: prepared
number of clients: 10
number of threads: 10
duration: 120 s
number of transactions actually processed: 170336
latency average = 7.044 ms
latency stddev = 2.417 ms
initial connection time = 24.006 ms
tps = 1419.594182 (without initial connection time)
statement latencies in milliseconds:
0.000 \set tid1 random(1,1000)
0.000 \set tid2 random(1,1000)
7.045 select art_id,ts from tbl_art_tag1 where tid && (array[:tid1 , :tid2])::int[] order by ts desc limit 100;
10. Think about how to write SQL if you want to search for articles set with multiple tags.
postgres=# explain analyze select art_id,ts from tbl_art_tag1 where tid @> (array[490,500])::int[] order by ts desc limit 100;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=1818.22..1818.47 rows=100 width=12) (actual time=10.007..10.026 rows=100 loops=1)
-> Sort (cost=1818.22..1825.51 rows=2917 width=12) (actual time=10.005..10.014 rows=100 loops=1)
Sort Key: ts DESC
Sort Method: top-N heapsort Memory: 34kB
-> Bitmap Heap Scan on tbl_art_tag1 (cost=33.61..1706.73 rows=2917 width=12) (actual time=3.108..9.403 rows=3182 loops=1)
Recheck Cond: (tid @> '{490,500}'::integer[])
Heap Blocks: exact=1474
-> Bitmap Index Scan on tbl_art_tag1_tid_ts_idx (cost=0.00..32.88 rows=2917 width=0) (actual time=2.867..2.869 rows=3182 loops=1)
Index Cond: (tid @> '{490,500}'::integer[])
Planning Time: 0.204 ms
Execution Time: 10.186 ms
(11 rows)
11. Performance testing
vi t333.sql
\set tid1 random(1,1000)
\set tid2 random(1,1000)
select art_id,ts from tbl_art_tag1 where tid @> (array[:tid1, :tid2])::int[] order by ts desc limit 100;
pgbench -M prepared -n -r -P 1 -f ./t333.sql -c 10 -j 10 -T 120
transaction type: ./t333.sql
scaling factor: 1
query mode: prepared
number of clients: 10
number of threads: 10
duration: 120 s
number of transactions actually processed: 281008
latency average = 4.269 ms
latency stddev = 1.339 ms
initial connection time = 23.848 ms
tps = 2342.110257 (without initial connection time)
statement latencies in milliseconds:
0.000 \set tid1 random(1,1000)
0.000 \set tid2 random(1,1000)
4.269 select art_id,ts from tbl_art_tag1 where tid @> (array[:tid1, :tid2])::int[] order by ts desc limit 100;
Scenario 2
Try to design it by yourselves.
Scenario 3
1. Create a table to store the original text and word embedding.
CREATE TABLE test_rum(t text, a tsvector);
2. Write a trigger to automatically convert the original text to word embedding. Insert data regardless of how the word embedding is generated.
CREATE TRIGGER tsvectorupdate
BEFORE UPDATE OR INSERT ON test_rum
FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger('a', 'pg_catalog.english', 't');
3. Insert the test text.
INSERT INTO test_rum(t) VALUES ('The situation is most beautiful');
INSERT INTO test_rum(t) VALUES ('It is a beautiful');
INSERT INTO test_rum(t) VALUES ('It looks like a beautiful place');
4. Query data.
postgres=# select * from test_rum;
t | a
---------------------------------+----------------------------------------
The situation is most beautiful | 'beauti':5 'situat':2
It is a beautiful | 'beauti':4
It looks like a beautiful place | 'beauti':5 'like':3 'look':2 'place':6
(3 rows)
5. Create a rum index.
CREATE INDEX rumidx ON test_rum USING rum (a rum_tsvector_ops);
6. Search for text according to the text search requirements, and return in the order of matching degree (rank).
postgres=# SELECT t, a <=> to_tsquery('english', 'beautiful | place') AS rank
FROM test_rum
WHERE a @@ to_tsquery('english', 'beautiful | place')
ORDER BY a <=> to_tsquery('english', 'beautiful | place');
t | rank
---------------------------------+----------
It looks like a beautiful place | 8.22467
The situation is most beautiful | 16.44934
It is a beautiful | 16.44934
(3 rows)
From the execution plan, you can see that the RUM index has taken effect, and the RUM index is used for search and sorting.
postgres=# explain SELECT t, a <=> to_tsquery('english', 'beautiful | place') AS rank
FROM test_rum
WHERE a @@ to_tsquery('english', 'beautiful | place')
ORDER BY a <=> to_tsquery('english', 'beautiful | place');
QUERY PLAN
------------------------------------------------------------------------
Index Scan using rumidx on test_rum (cost=2.20..4.41 rows=1 width=36)
Index Cond: (a @@ '''beauti'' | ''place'''::tsquery)
Order By: (a <=> '''beauti'' | ''place'''::tsquery)
(3 rows)
7. Create a function to automatically generate a string
CREATE OR REPLACE FUNCTION generate_random_string(word_count INTEGER, word_length INTEGER)
RETURNS TEXT AS $$
DECLARE
alphabet TEXT[] := array['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'];
random_string TEXT := '';
BEGIN
FOR i IN 1..word_count LOOP
DECLARE
random_word TEXT := '';
BEGIN
FOR j IN 1..word_length LOOP
random_word := random_word || alphabet[ceil(random() * array_length(alphabet, 1))];
END LOOP;
random_string := random_string || random_word || ' ';
END;
END LOOP;
RETURN TRIM(random_string);
END;
$$ LANGUAGE plpgsql;
8. Insert 1 million pieces of string data
insert into test_rum(t) select generate_random_string(100,5) from generate_series(1,1000000);
postgres=# select * from test_rum limit 10;
-[ RECORD 1 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
t | The situation is most beautiful
a | 'beauti':5 'situat':2
-[ RECORD 2 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
t | It is a beautiful
a | 'beauti':4
-[ RECORD 3 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
t | It looks like a beautiful place
a | 'beauti':5 'like':3 'look':2 'place':6
-[ RECORD 4 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
t | sbrga khbvj twxim sktyf zmrga lwqmz rqtiv mrugi tlynt kchdh psuni wdxvi ffcxw qvyrk umnzl ywjpe dkdiv wmwuh xrigo pjylk yoqjq mizsr uiolh upqex pdgit vngxj ztopr yjgbc fclci hldvo xundu mayws ltgct xkvzx caibh dwytd sbfij cthcx mbcgk dgyju ofgmx lxntg onyqh hjtsc crxhc ktrcd xxppn ouwnd rqbeq azdqy tgoje rrlrm fufsy wrjuh vzpud lqbih zxbhl bvtwy fixnv vojjx onpdb sphjc pufyc zmcbw etfbb bgqnt ykaug xtnhj hweqs yxwuc gqlht echao aerzi blvik bfuki pjmsl fviuk gfooa wkpsw pbylk iyckm bompq gsjyo wlgvf vuoye mjlvv qxkxq nkthg fsaui ynmaj nngak boxvs lsqbo sogdf iccef ouuch mmtoe awwoy llenr tzlpl
a | 'aerzi':73 'awwoy':98 'azdqi':50 'bfuki':75 'bgqnt':66 'blvik':74 'bompq':82 'boxv':92 'bvtwi':58 'caibh':35 'crxhc':45 'cthcx':38 'dgyju':40 'dkdiv':17 'dwytd':36 'echao':72 'etfbb':65 'fclci':29 'ffcxw':13 'fixnv':59 'fsaui':89 'fufsi':53 'fviuk':77 'gfooa':78 'gqlht':71 'gsjyo':83 'hjtsc':44 'hldvo':30 'hweq':69 'iccef':95 'iyckm':81 'kchdh':10 'khbvj':2 'ktrcd':46 'llenr':99 'lqbih':56 'lsqbo':93 'ltgct':33 'lwqmz':6 'lxntg':42 'mayw':32 'mbcgk':39 'mizsr':22 'mjlvv':86 'mmtoe':97 'mrugi':8 'nkthg':88 'nngak':91 'ofgmx':41 'onpdb':61 'onyqh':43 'ouuch':96 'ouwnd':48 'pbylk':80 'pdgit':25 'pjmsl':76 'pjylk':20 'psuni':11 'pufyc':63 'qvyrk':14 'qxkxq':87 'rqbeq':49 'rqtiv':7 'rrlrm':52 'sbfij':37 'sbrga':1 'sktyf':4 'sogdf':94 'sphjc':62 'tgoje':51 'tlynt':9 'twxim':3 'tzlpl':100 'uiolh':23 'umnzl':15 'upqex':24 'vngxj':26 'vojjx':60 'vuoy':85 'vzpud':55 'wdxvi':12 'wkpsw':79 'wlgvf':84 'wmwuh':18 'wrjuh':54 'xkvzx':34 'xrigo':19 'xtnhj':68 'xundu':31 'xxppn':47 'yjgbc':28 'ykaug':67 'ynmaj':90 'yoqjq':21 'ywjpe':16 'yxwuc':70 'zmcbw':64 'zmrga':5 'ztopr':27 'zxbhl':57
-[ RECORD 5 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
t | cjrtm yxkgi umhdb ovtvi ranbr xqgqn paxaa ulpnf gfyir vwzul kzldj edngz ktyzj nwoyb wfzas szmhd ljbcg vakcu amfbo mclwb ygiev nmbde oubyd dvzmd djdyn tmjot frxcm lemdl ufbio ljkcx uanqj dvmrn prbfv rfigm vmvxr qjyyw eucix mvotm zjpau phkyi huqyn vsyro uqbth goqsk trcwb pyzcs lusgw zsihz rdkiw fivxi xjsep iulju fpyhi lsxcs gdqoi xxvsc panma kxbnq bovrn kaucz dyokk aswtt mouqj uruwe awudz dzjjs ukitb vdvsq pisex nmyrx llzmu sldyz fakfg zyqjy fbqkt kyuht pnmub eozyd opulx xjamp fdcpm tyxwb znkra aqjsj qpfev sbqff oiblm kqkzj rvsmi xomei saazz fnrfw zreai cnzey ewslo agnst wozgd odido eewga klplv
a | 'agnst':96 'amfbo':19 'aqjsj':84 'aswtt':62 'awudz':65 'bovrn':59 'cjrtm':1 'cnzey':94 'djdyn':25 'dvmrn':32 'dvzmd':24 'dyokk':61 'dzjjs':66 'edngz':12 'eewga':99 'eozyd':78 'eucix':37 'ewslo':95 'fakfg':73 'fbqkt':75 'fdcpm':81 'fivxi':50 'fnrfw':92 'fpyhi':53 'frxcm':27 'gdqoi':55 'gfyir':9 'goqsk':44 'huqyn':41 'iulju':52 'kaucz':60 'klplv':100 'kqkzj':88 'ktyzj':13 'kxbnq':58 'kyuht':76 'kzldj':11 'lemdl':28 'ljbcg':17 'ljkcx':30 'llzmu':71 'lsxcs':54 'lusgw':47 'mclwb':20 'mouqj':63 'mvotm':38 'nmbde':22 'nmyrx':70 'nwoyb':14 'odido':98 'oiblm':87 'opulx':79 'oubyd':23 'ovtvi':4 'panma':57 'paxaa':7 'phkyi':40 'pisex':69 'pnmub':77 'prbfv':33 'pyzc':46 'qjyyw':36 'qpfev':85 'ranbr':5 'rdkiw':49 'rfigm':34 'rvsmi':89 'saazz':91 'sbqff':86 'sldyz':72 'szmhd':16 'tmjot':26 'trcwb':45 'tyxwb':82 'uanqj':31 'ufbio':29 'ukitb':67 'ulpnf':8 'umhdb':3 'uqbth':43 'uruw':64 'vakcu':18 'vdvsq':68 'vmvxr':35 'vsyro':42 'vwzul':10 'wfzas':15 'wozgd':97 'xjamp':80 'xjsep':51 'xomei':90 'xqgqn':6 'xxvsc':56 'ygiev':21 'yxkgi':2 'zjpau':39 'znkra':83 'zreai':93 'zsihz':48 'zyqji':74
-[ RECORD 6 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
t | gptii hvrxw czbfs ucwek kveek gcwms uueli ofttk svmty waxay gzdah lnsds nydew oqqrp nniey jzexb abvye aqafc gvslu pexdh wqbla cqlrt nybot bbgep gyinl gfwok ygumu qmzml bgdcm ruslq jetgm jtani lvphh yfequ hzkkp inyjt xkatw uutyy dfeqk bwvto czajw kkfpm ulvgs gidly wdbok shepp wzegi zgfek eotml wrkar lfung hlfil otyao jimnc ypejg atorh hcxie uhsoe bsmvf ufenk rkspg yvcfi hfocq angmx qbzyh smffj efldt lulkt wvbtr qleck bypmd oeobi xxuwd bhsdi wnfge zddnm sjyaf oalef zgluq tepyk vavxn nzpqh adxqm yuker wbddf nzcde omxnc egsyv begad zyazn lkulg dvcae niqyq kytoy oqxnf usbtc rcvlc rtdyp pmofi fjyjh
a | 'abvy':17 'adxqm':83 'angmx':64 'aqafc':18 'atorh':56 'bbgep':24 'begad':89 'bgdcm':29 'bhsdi':74 'bsmvf':59 'bwvto':40 'bypmd':71 'cqlrt':22 'czajw':41 'czbfs':3 'dfeqk':39 'dvcae':92 'efldt':67 'egsyv':88 'eotml':49 'fjyjh':100 'gcwms':6 'gfwok':26 'gid':44 'gptii':1 'gvslu':19 'gyinl':25 'gzdah':11 'hcxie':57 'hfocq':63 'hlfil':52 'hvrxw':2 'hzkkp':35 'inyjt':36 'jetgm':31 'jimnc':54 'jtani':32 'jzexb':16 'kkfpm':42 'kveek':5 'kytoy':94 'lfung':51 'lkulg':91 'lnsds':12 'lulkt':68 'lvphh':33 'niqyq':93 'nniey':15 'nybot':23 'nydew':13 'nzcde':86 'nzpqh':82 'oalef':78 'oeobi':72 'ofttk':8 'omxnc':87 'oqqrp':14 'oqxnf':95 'otyao':53 'pexdh':20 'pmofi':99 'qbzyh':65 'qleck':70 'qmzml':28 'rcvlc':97 'rkspg':61 'rtdyp':98 'ruslq':30 'shepp':46 'sjyaf':77 'smffj':66 'svmti':9 'tepyk':80 'ucwek':4 'ufenk':60 'uhso':58 'ulvg':43 'usbtc':96 'uueli':7 'uutyy':38 'vavxn':81 'waxay':10 'wbddf':85 'wdbok':45 'wnfge':75 'wqbla':21 'wrkar':50 'wvbtr':69 'wzegi':47 'xkatw':37 'xxuwd':73 'yfequ':34 'ygumu':27 'ypejg':55 'yuker':84 'yvcfi':62 'zddnm':76 'zgfek':48 'zgluq':79 'zyazn':90
-[ RECORD 7 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
t | pukxz cjzrt wluxj eqazl zvnrh lobbu jzvst vcpmw xoarq lbjhq vviss qfsnt jquem rsqes hgbza fvebq wftpg qdziu lejai sacek uhezq hbmxu wofun nrlaw soplm lcuup vczlu oedxf euthb hssfa itlml styog nnebt htoks goekj pkzlj mfjin yjtlt nqsck xemyn svhzp lnagi tpzvh nkarv aiadx qxrzs obaau jgnht yrljv dmshm olmyl zlnai adrjj qmkkw kpfpl mldrw ttmlk aqwhd bmuzl bechv bupch pzhap cbjqn ggmos kmmlh qujxv glusq ebwxw plhnv tdgdx pudbr mnccq zopgk nghke waokc qposi ddjqz ftkof yhfjw hevgc eyxec szgms woahp sqovp iuyyt wmnaq jkniq skbak ohebm cfzfh oxsqz dzhup ypqye naxaw thwqy cqhgo wlxmm xadir pmvpq ephrb
a | 'adrjj':53 'aiadx':45 'aqwhd':58 'bechv':60 'bmuzl':59 'bupch':61 'cbjqn':63 'cfzfh':90 'cjzrt':2 'cqhgo':96 'ddjqz':77 'dmshm':50 'dzhup':92 'ebwxw':68 'ephrb':100 'eqazl':4 'euthb':29 'eyxec':81 'ftkof':78 'fvebq':16 'ggmos':64 'glusq':67 'goekj':35 'hbmxu':22 'hevgc':80 'hgbza':15 'hssfa':30 'htok':34 'itlml':31 'iuyyt':85 'jgnht':48 'jkniq':87 'jquem':13 'jzvst':7 'kmmlh':65 'kpfpl':55 'lbjhq':10 'lcuup':26 'lejai':19 'lnagi':42 'lobbu':6 'mfjin':37 'mldrw':56 'mnccq':72 'naxaw':94 'nghke':74 'nkarv':44 'nnebt':33 'nqsck':39 'nrlaw':24 'obaau':47 'oedxf':28 'ohebm':89 'olmyl':51 'oxsqz':91 'pkzlj':36 'plhnv':69 'pmvpq':99 'pudbr':71 'pukxz':1 'pzhap':62 'qdziu':18 'qfsnt':12 'qmkkw':54 'qposi':76 'qujxv':66 'qxrzs':46 'rsqes':14 'sacek':20 'skbak':88 'soplm':25 'sqovp':84 'styog':32 'svhzp':41 'szgms':82 'tdgdx':70 'thwqi':95 'tpzvh':43 'ttmlk':57 'uhezq':21 'vcpmw':8 'vczlu':27 'vviss':11 'waokc':75 'wftpg':17 'wluxj':3 'wlxmm':97 'wmnaq':86 'woahp':83 'wofun':23 'xadir':98 'xemyn':40 'xoarq':9 'yhfjw':79 'yjtlt':38 'ypqye':93 'yrljv':49 'zlnai':52 'zopgk':73 'zvnrh':5
-[ RECORD 8 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
t | ibutz qmbmp sfjqk gkcpb gwvwu ytoum ueuml xepij pxnvm zsxbp jasvu mmzbj uzoop zzhnk hsley thtuf vmudf kaygi atapb yruei vgwnx ldfkf xbivd wudtk emwqf wpdte sxjso vhumu khign kuyky tsvlz gsmpa ggerz rzsyi aeivx frumg ahkdm kwugm zamtl uzuxf pjenn pzcnu jjtno uljug ggmjl xrbvv ysdwi mssmm exjpu pmzro hsrpe kpfkm lzfta vnijf twpmz ypzux usmxi vcfrg jfilh jeuif vwjmz opmpe ijgjf velyx fnqxe xkahq qlacq bmnwm lczfq yvzsi wvxic omexp vojpg bburj jjmjs essbp edqjf nprnw ynksp cucrd udexz fizco jikmi flicv kekoq rmgde iglnc qnjfp tvigf fxsbg sropz yvkdy ipmwa slkft npkxu xelep oxzna wrdts xuybl vwqrx
a | 'aeivx':35 'ahkdm':37 'atapb':19 'bburj':74 'bmnwm':68 'cucrd':80 'edqjf':77 'emwqf':25 'essbp':76 'exjpu':49 'fizco':82 'flicv':84 'fnqxe':65 'frumg':36 'fxsbg':90 'ggerz':33 'ggmjl':45 'gkcpb':4 'gsmpa':32 'gwvwu':5 'hsley':15 'hsrpe':51 'ibutz':1 'iglnc':87 'ijgjf':63 'ipmwa':93 'jasvu':11 'jeuif':60 'jfilh':59 'jikmi':83 'jjmjs':75 'jjtno':43 'kaygi':18 'kekoq':85 'khign':29 'kpfkm':52 'kuyki':30 'kwugm':38 'lczfq':69 'ldfkf':22 'lzfta':53 'mmzbj':12 'mssmm':48 'npkxu':95 'nprnw':78 'omexp':72 'opmp':62 'oxzna':97 'pjenn':41 'pmzro':50 'pxnvm':9 'pzcnu':42 'qlacq':67 'qmbmp':2 'qnjfp':88 'rmgde':86 'rzsyi':34 'sfjqk':3 'slkft':94 'sropz':91 'sxjso':27 'thtuf':16 'tsvlz':31 'tvigf':89 'twpmz':55 'udexz':81 'ueuml':7 'uljug':44 'usmxi':57 'uzoop':13 'uzuxf':40 'vcfrg':58 'velyx':64 'vgwnx':21 'vhumu':28 'vmudf':17 'vnijf':54 'vojpg':73 'vwjmz':61 'vwqrx':100 'wpdte':26 'wrdts':98 'wudtk':24 'wvxic':71 'xbivd':23 'xelep':96 'xepij':8 'xkahq':66 'xrbvv':46 'xuybl':99 'ynksp':79 'ypzux':56 'yruei':20 'ysdwi':47 'ytoum':6 'yvkdi':92 'yvzsi':70 'zamtl':39 'zsxbp':10 'zzhnk':14
-[ RECORD 9 ]-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
t | zesxj jwmiy lwdhs mniqq pjotx gzkgg bmgpx emand souux lhqxg hdpry fxrgt iykze wxyfo drmgf sadxf axfwe yzqbl pzfsg rwgkv pcten koiin ncsjo cuspk hdstv irswe yhoer bkgyp ryndt tspgx tlraa cpmew rtawa nywad jpnjg dravu vyumk idbah kbegj hvzmw xwaob yccgr edwda sxmoa knfnj tggmh mgdbm igbku imoxi wnarq fockx ufehp fgohg avxca ntbzj muhis igirz hxzns ynlff vzxdm owanl nabkn opttw zkxyx eavcw zvbfk irhzd jjpyo hlgqy nruch ovikn qzyef mpkyr uaebu njycu biqyj zjsju kdtsp ibtls mpfly rlgva pivqg petpc rrmny edgya kvhqi lfvkl aerux drqeg nqpxe izcuq psqmo fpakt pfaeq nltpr hsykx jeiul bkudj yftnb zqybi
a | 'aerux':88 'avxca':54 'axfw':17 'biqyj':76 'bkgyp':28 'bkudj':98 'bmgpx':7 'cpmew':32 'cuspk':24 'dravu':36 'drmgf':15 'drqeg':89 'eavcw':65 'edgya':85 'edwda':43 'emand':8 'fgohg':53 'fockx':51 'fpakt':93 'fxrgt':12 'gzkgg':6 'hdpri':11 'hdstv':25 'hlgqi':69 'hsykx':96 'hvzmw':40 'hxzns':58 'ibtl':79 'idbah':38 'igbku':48 'igirz':57 'imoxi':49 'irhzd':67 'irsw':26 'iykz':13 'izcuq':91 'jeiul':97 'jjpyo':68 'jpnjg':35 'jwmiy':2 'kbegj':39 'kdtsp':78 'knfnj':45 'koiin':22 'kvhqi':86 'lfvkl':87 'lhqxg':10 'lwdhs':3 'mgdbm':47 'mniqq':4 'mpfli':80 'mpkyr':73 'muhi':56 'nabkn':62 'ncsjo':23 'njycu':75 'nltpr':95 'nqpxe':90 'nruch':70 'ntbzj':55 'nywad':34 'opttw':63 'ovikn':71 'owanl':61 'pcten':21 'petpc':83 'pfaeq':94 'pivqg':82 'pjotx':5 'psqmo':92 'pzfsg':19 'qzyef':72 'rlgva':81 'rrmni':84 'rtawa':33 'rwgkv':20 'ryndt':29 'sadxf':16 'souux':9 'sxmoa':44 'tggmh':46 'tlraa':31 'tspgx':30 'uaebu':74 'ufehp':52 'vyumk':37 'vzxdm':60 'wnarq':50 'wxyfo':14 'xwaob':41 'yccgr':42 'yftnb':99 'yhoer':27 'ynlff':59 'yzqbl':18 'zesxj':1 'zjsju':77 'zkxyx':64 'zqybi':100 'zvbfk':66
-[ RECORD 10 ]------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
t | hevbf aqlwc vqmfz dypon dxraa rkhuy ogvmk ebicg yxjst nbykd taavf qofvn mnawt yyvfl ehcom hsape gwbpc njizy niarb jupek tpzsm hzomo wnsdj avjzp xmckm gtwjx eblgr odhto isosm piqxy lgbmj tjqim ilnnv gylpu twjdi gmeui fwcyl lqzgt izzqh oqfca toqql cfhpc ldgbo ryxqi hitsl rtnwu wrydi qzmjq nmjqp yltgh esmgi eeiry hijdl sjmho bxcxv vwtma roaee tnljb ovrhb diaku uowuh rkeym tmolb xwqnw izkje hydni ynlbr pidfz htlux tiirs fjaan gpvnb jtwki cjbsj soppu tbhir zuwpx cpazm adltt rgxft feyiy hehrk hxojf hbhcn uxnoq udjxn blqrl krrti hqtzy juidi yhnfj osesl letzy ogzjs mpoce kmbxu bvcux ljeay zgcmw tykcq
a | 'adltt':79 'aqlwc':2 'avjzp':24 'blqrl':87 'bvcux':97 'bxcxv':55 'cfhpc':42 'cjbsj':74 'cpazm':78 'diaku':60 'dxraa':5 'dypon':4 'ebicg':8 'eblgr':27 'eeiri':52 'ehcom':15 'esmgi':51 'feyiy':81 'fjaan':71 'fwcyl':37 'gmeui':36 'gpvnb':72 'gtwjx':26 'gwbpc':17 'gylpu':34 'hbhcn':84 'hehrk':82 'hevbf':1 'hijdl':53 'hitsl':45 'hqtzi':89 'hsape':16 'htlux':69 'hxojf':83 'hydni':66 'hzomo':22 'ilnnv':33 'isosm':29 'izkj':65 'izzqh':39 'jtwki':73 'juidi':90 'jupek':20 'kmbxu':96 'krrti':88 'ldgbo':43 'letzi':93 'lgbmj':31 'ljeay':98 'lqzgt':38 'mnawt':13 'mpoce':95 'nbykd':10 'niarb':19 'njizi':18 'nmjqp':49 'odhto':28 'ogvmk':7 'ogzj':94 'oqfca':40 'osesl':92 'ovrhb':59 'pidfz':68 'piqxi':30 'qofvn':12 'qzmjq':48 'rgxft':80 'rkeym':62 'rkhuy':6 'roaee':57 'rtnwu':46 'ryxqi':44 'sjmho':54 'soppu':75 'taavf':11 'tbhir':76 'tiir':70 'tjqim':32 'tmolb':63 'tnljb':58 'toqql':41 'tpzsm':21 'twjdi':35 'tykcq':100 'udjxn':86 'uowuh':61 'uxnoq':85 'vqmfz':3 'vwtma':56 'wnsdj':23 'wrydi':47 'xmckm':25 'xwqnw':64 'yhnfj':91 'yltgh':50 'ynlbr':67 'yxjst':9 'yyvfl':14 'zgcmw':99 'zuwpx':77
9. Test the performance of 'search for text and return in the order of rank'.
vi t4.sql
SELECT t, a <=> to_tsquery('english', 'beautiful | place') AS rank
FROM test_rum
WHERE a @@ to_tsquery('english', 'beautiful | place')
ORDER BY a <=> to_tsquery('english', 'beautiful | place') limit 10;
pgbench -M prepared -n -r -P 1 -f ./t4.sql -c 10 -j 10 -T 120
transaction type: ./t4.sql
scaling factor: 1
query mode: prepared
number of clients: 10
number of threads: 10
duration: 120 s
number of transactions actually processed: 16124603
latency average = 0.074 ms
latency stddev = 0.049 ms
initial connection time = 20.692 ms
tps = 134394.399840 (without initial connection time)
statement latencies in milliseconds:
0.074 SELECT t, a <=> to_tsquery('english', 'beautiful | place') AS rank
1. Comparison of article tag storage space
Implementation | Space occupied by 10 million data tables | Space occupied by index | Description |
---|---|---|---|
Traditional method | 424MB | 267MB | Articles and tags use a one-to-one (scalar type) storage structure and Btree indexes, which occupy a large space. |
RUM index interface | 43MB | 101MB | Articles and tags use a one-to-many (array type) storage structure and RUM indexes, which occupy a small space. |
2. Comparison of article tag query performance
Implementation | Single tag query TPS | Union query TPS of 2 tags | Intersection query TPS of 2 tags | Description |
---|---|---|---|---|
Traditional method | 76555 | 110 | 99 | Traditional Btree indexes do not have additional sort overhead for a single tag, but multi-tag scenarios do not support sorting and a large number of tuple queries and sorting are required. |
RUM index interface | 2845 | 1419 | 2342 | The tuple items in the RUM index are not stored in the order of the additional information value (i.e. ts in this case), and in the single tag scenario, the additional sort overhead is a problem.However, when there are multi-tag queries, the inverted index shows its performance advantages. |
3. Word embedding search comparison
Implementation | Search and rank sorting TPS of 1 million text vectors |
---|---|
Traditional method | Not supported |
RUM index interface | 134394 |
https://github.com/postgrespro/rum
https://github.com/digoal/blog/blob/
Three Key Details and Principles of PostgreSQL Online Backup & Recovery
How to Determine Whether the Current PostgreSQL Database is in a Consistent State?
digoal - February 3, 2020
Alibaba Clouder - July 7, 2020
digoal - July 4, 2019
digoal - February 5, 2020
Alibaba Cloud Storage - May 14, 2019
digoal - December 11, 2019
Alibaba Cloud PolarDB for PostgreSQL is an in-house relational database service 100% compatible with PostgreSQL and highly compatible with the Oracle syntax.
Learn MoreAn online MPP warehousing service based on the Greenplum Database open source program
Learn MoreOpenSearch helps develop intelligent search services.
Learn MoreAn intelligent image search service with product search and generic search features to help users resolve image search requests.
Learn MoreMore Posts by digoal