Quick start - PolarDB - Alibaba Cloud Documentation Center

This topic describes the basic operations of full-text search for PolarDB for PostgreSQL (Compatible with Oracle), including extension installation and uninstallation, table creation, index creation, and performing queries.

Install the extension

CREATE EXTENSION rum;

If you need to use the Chinese word segmentation feature, you can install the extension for Chinese word segmentation by executing the following statement:

CREATE EXTENSION pg_jieba;

Note

For more information about the version requirements of the preceding extensions, see pg_jieba and RUM.

Create a table

Create a table for English text:

CREATE TABLE test_english(id serial, t text, d timestamp);

Create a table for Chinese text:

CREATE TABLE test_chinese(id serial, t text, d timestamp, v tsvector);

Insert data

Populate the English table with test data:

INSERT INTO test_english(t,d) VALUES ('The situation is most beautiful', '2016-05-01 20:21:22.326724');
INSERT INTO test_english(t,d) VALUES ('It is a beautiful', '2016-05-01 21:21:22.326724');
INSERT INTO test_english(t,d) VALUES ('It looks like a beautiful place', '2016-05-01 22:21:22.326724');
INSERT INTO test_english(t,d) VALUES ('It looks l a beautiful place', '2016-05-02 00:21:22.326724');
INSERT INTO test_english(t,d) VALUES ('It looks like a beautiful places', '2016-05-02 02:21:22.326724');
INSERT INTO test_english(t,d) VALUES ('It looks like a beaut places', '2016-05-02 03:21:22.326724');
INSERT INTO test_english(t,d) VALUES ('It looks like a pig', '2016-05-02 03:21:22.326724');

Populate the Chinese table with test data:

INSERT INTO test_chinese(t,d) VALUES ('南财理财通数据显示，5月公募理财产品吸金榜前十名归属于5家机构，信银理财有5只产品上榜，南银理财占两席，民生理财、恒丰理财、浦银理财各占一席，上榜产品均为中低风险的封闭式固收类产品。从投资周期看，本月榜单产品中有6只产品投资周期为1-2年，3-6个月及6-12个月期限产品各有2只。', '2016-05-01 20:21:22.326724');
INSERT INTO test_chinese(t,d) VALUES ('其中，信银理财“安盈象固收稳利十四个月封闭式95号”理财产品夺得吸金榜冠军，募集规模达到106.204 亿元；民生理财“富竹固收优选14个月封闭39号”位列第二，募集规模为46.606亿元；信银理财“安盈象固收稳利十四个月封闭式94号”排名第三，募集规模为41.365 亿元。', '2016-05-01 20:21:22.326724');
INSERT INTO test_chinese(t,d) VALUES ('5月新发产品募集规模最大的是信银理财“安盈象固收稳利十四个月封闭式95号”，实际募集资金高达106.204亿元，该系列产品已蝉联4个月榜首。', '2016-05-01 20:21:22.326724');
INSERT INTO test_chinese(t,d) VALUES ('南财理财通数据显示，“安盈象固收稳利十四个月封闭式95号”理财产品成立于2024年05月16日，将于2025年07月16日到期，风险评级PR2（中低风险），产品有A、C、D、K类份额，投资周期1-2年，业绩比较基准2.90%~3.45%。', '2016-05-01 20:21:22.326724');

-- This operation can be performed by executing an update operation or using a trigger
UPDATE  test_chinese 
SET v = to_tsvector('jiebacfg', t);

Create an index

Create an index for the English table:

CREATE INDEX ON test_english USING rum (t rum_text_ops);

Create an index for the Chinese table:

CREATE INDEX ON test_chinese USING rum (v rum_tsvector_ops);

Perform queries

Perform queries on the English table:

Query rows that contain the beautiful or place keyword:

SELECT * FROM test_english
WHERE to_tsvector('english', t) @@ to_tsquery('english', 'beautiful | place');

The query returns the following results:

id  |                t                 |             d              
----+----------------------------------+----------------------------
  1 | The situation is most beautiful  | 2016-05-01 20:21:22.326724
  2 | It is a beautiful                | 2016-05-01 21:21:22.326724
  3 | It looks like a beautiful place  | 2016-05-01 22:21:22.326724
  4 | It looks l a beautiful place     | 2016-05-02 00:21:22.326724
  5 | It looks like a beautiful places | 2016-05-02 02:21:22.326724
  6 | It looks like a beaut places     | 2016-05-02 03:21:22.326724

Query rows that contain the beautiful or place keyword, calculate the similarity between each record and the keyword, and sort the results in descending order of similarity.

SELECT *, to_tsvector('english', t) <=> to_tsquery('english', 'beautiful | place') AS rank
FROM test_english
WHERE to_tsvector('english', t) @@ to_tsquery('english', 'beautiful | place') 
ORDER BY to_tsvector('english', t) <=> to_tsquery('english', 'beautiful | place') DESC;

The query returns the following results:

 id |                t                 |             d              |  rank   
----+----------------------------------+----------------------------+---------
  1 | The situation is most beautiful  | 2016-05-01 20:21:22.326724 | 16.4493
  2 | It is a beautiful                | 2016-05-01 21:21:22.326724 | 16.4493
  6 | It looks like a beaut places     | 2016-05-02 03:21:22.326724 | 16.4493
  3 | It looks like a beautiful place  | 2016-05-01 22:21:22.326724 | 8.22467
  4 | It looks l a beautiful place     | 2016-05-02 00:21:22.326724 | 8.22467
  5 | It looks like a beautiful places | 2016-05-02 02:21:22.326724 | 8.22467
(6 rows)

Perform queries on the Chinese table:

Query rows that contain the 数据 or 理财 keyword:

SELECT id, v <=> to_tsquery('jiebacfg', '数据 | 理财') as rank
FROM test_chinese 
where v @@ to_tsquery('jiebacfg', '数据 | 理财');

The query returns the following results:

 id |   rank    
----+-----------
  1 | 6.6024785
  2 |  12.08523
  3 |  16.44934
  4 |   8.22467

Query rows that contain 数据 in column v, calculate the similarity between each record and '数据 | 理财', and return the id and similarity:

SELECT id, v <=> to_tsquery('jiebacfg', '数据 | 理财') as rank
FROM test_chinese 
where v @@ to_tsquery('jiebacfg', '数据');

The query returns the following results:

 id |   rank    
----+-----------
  1 | 6.6024785
  4 |   8.22467

(Optional) Uninstall the extension

DROP EXTENSION rum CASCADE;
DROP EXTENSION pg_jieba CASCADE;