AnalyticDB: Configure a stop word dictionary

Last Updated: Mar 30, 2026

Stop words are common, high-frequency terms that carry little search value, such as "的" (a possessive particle), "是" ("is"), or "the". Without stop word filtering, the jieba tokenizer keeps every token of a text such as "同一个世界" ("one world"), so low-value terms are indexed and scored alongside meaningful ones. Configuring stop words lets the tokenizer discard these terms and focus on the ones that distinguish documents, which improves BM25 search relevance and query performance.

The pgsearch extension includes two built-in stop word dictionaries: CN_SIMPLE for Chinese and EN_SIMPLE for English.
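As a minimal sketch, assuming the built-in dictionary names can be passed to the stopword parameter of pgsearch.tokenizer in the same way as a custom dictionary name (this topic does not show that usage explicitly), the following queries tokenize sample strings with CN_SIMPLE and EN_SIMPLE. The sample strings are illustrative.

```sql
-- Assumption: built-in dictionary names are accepted by the stopword parameter.
-- Chinese sample text filtered with the built-in CN_SIMPLE dictionary.
SELECT pgsearch.tokenizer(pgsearch.tokenizer('jieba', stopword => 'CN_SIMPLE'), '同一个世界');

-- English sample text filtered with the built-in EN_SIMPLE dictionary.
SELECT pgsearch.tokenizer(pgsearch.tokenizer('jieba', stopword => 'EN_SIMPLE'), 'the world is one');
```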

Prerequisites

Before you begin, ensure that you have:

  • An AnalyticDB for PostgreSQL V7.0 instance running minor version V7.2.1.0 or later

To check the minor version of your instance, see View the minor version of an instance. To upgrade, see UpgradeDBVersion.

Manage stop words

Stop word dictionaries are stored in the pgsearch.stopword_dict table. The dict column holds the dictionary name and the word column holds the stop word. Together they form a composite primary key, so a word can appear only once in a given dictionary, although the same word can appear in several dictionaries.

When no stop word dictionary is specified, the jieba tokenizer uses the dictionary named default. An instance can have multiple stop word dictionaries.
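Because every dictionary is a set of rows in pgsearch.stopword_dict, a plain aggregate query shows which dictionaries exist and how many stop words each holds. This sketch relies only on the dict and word columns described above; a dictionary with no rows, such as an empty default dictionary, may not appear in the result.

```sql
-- One row per stop word dictionary that currently has at least one word.
SELECT dict, count(*) AS word_count
FROM pgsearch.stopword_dict
GROUP BY dict
ORDER BY dict;
```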

Add a stop word

Always specify a dictionary name when you add a stop word. The default dictionary always exists; if any other dictionary you name does not exist yet, pgsearch creates it automatically and adds the stop word to it.

Add a stop word to the default dictionary:

INSERT INTO pgsearch.stopword_dict(dict, word) VALUES('default', '的');

Add a stop word to a custom dictionary named user_stop_cn:

INSERT INTO pgsearch.stopword_dict(dict, word) VALUES('user_stop_cn', '的');
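To add several stop words at once, a single INSERT ... SELECT can be used. The sketch below uses only standard PostgreSQL (unnest and NOT EXISTS) together with the composite primary key described earlier: words already present in user_stop_cn are skipped, so the statement does not fail on existing duplicates. The words 的, 是, and 了 are illustrative.

```sql
-- Add several stop words to user_stop_cn in one statement.
-- Words that are already in the dictionary are filtered out by NOT EXISTS.
INSERT INTO pgsearch.stopword_dict(dict, word)
SELECT 'user_stop_cn', t.w
FROM unnest(ARRAY['的', '是', '了']) AS t(w)
WHERE NOT EXISTS (
    SELECT 1
    FROM pgsearch.stopword_dict s
    WHERE s.dict = 'user_stop_cn'
      AND s.word = t.w
);
```

If another session inserts the same word at the same moment, the primary key can still reject one of the two statements, so this pattern avoids duplicates only for sequential changes.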

Update a stop word

Update a stop word in the default dictionary:

UPDATE pgsearch.stopword_dict SET word = '一个' WHERE dict = 'default' AND word = '的';

Update a stop word in the user_stop_cn dictionary:

UPDATE pgsearch.stopword_dict SET word = '一个' WHERE dict = 'user_stop_cn' AND word = '的';
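Because (dict, word) is the primary key, the UPDATE above fails if 一个 is already in the dictionary. One way to avoid that, sketched below with standard PostgreSQL statements, is to insert the new word only if it is missing and then delete the old one, both in a single transaction. Unlike the UPDATE, this version adds 一个 even when 的 was not in the dictionary.

```sql
BEGIN;

-- Add the new word only if user_stop_cn does not already contain it.
INSERT INTO pgsearch.stopword_dict(dict, word)
SELECT 'user_stop_cn', '一个'
WHERE NOT EXISTS (
    SELECT 1
    FROM pgsearch.stopword_dict
    WHERE dict = 'user_stop_cn' AND word = '一个'
);

-- Remove the old word from the same dictionary.
DELETE FROM pgsearch.stopword_dict
WHERE dict = 'user_stop_cn' AND word = '的';

COMMIT;
```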

Remove a stop word

Remove a stop word from the default dictionary:

DELETE FROM pgsearch.stopword_dict WHERE dict = 'default' AND word = '的';

Remove a stop word from the user_stop_cn dictionary:

DELETE FROM pgsearch.stopword_dict WHERE dict = 'user_stop_cn' AND word = '的';
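To empty a dictionary entirely, filter on dict alone. This sketch removes every row for user_stop_cn. This topic does not state whether pgsearch still treats a dictionary with no rows as existing; adding a word to it later relies on the automatic creation described in Add a stop word.

```sql
-- Remove every stop word from the user_stop_cn dictionary.
DELETE FROM pgsearch.stopword_dict WHERE dict = 'user_stop_cn';
```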
Note

If you omit the dict condition from an UPDATE or DELETE statement, the statement matches the word in every dictionary and modifies all matching rows, not only those in a single dictionary.
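Before running an UPDATE or DELETE that filters on word only, a SELECT with the same condition shows which dictionaries will be affected. The second statement in this sketch intentionally removes 的 from every dictionary that contains it; run it only when that is the intended result.

```sql
-- Preview: which dictionaries contain '的'?
SELECT dict, word
FROM pgsearch.stopword_dict
WHERE word = '的'
ORDER BY dict;

-- Deliberate cross-dictionary delete: removes '的' from all dictionaries.
DELETE FROM pgsearch.stopword_dict WHERE word = '的';
```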

Reload a dictionary

After you add, update, or remove stop words, reload the affected dictionary into memory:

SELECT pgsearch.reload_stopword_dict('user_stop_cn');
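Because reload_stopword_dict takes a single dictionary name, each modified dictionary is reloaded with its own call. For example, after changing both the default and user_stop_cn dictionaries as in the earlier examples:

```sql
-- Reload every dictionary that was modified, one call per dictionary.
SELECT pgsearch.reload_stopword_dict('default');
SELECT pgsearch.reload_stopword_dict('user_stop_cn');
```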
Important

Reloading a dictionary does not update data that is already indexed. To apply the new stop word configuration to existing data, complete the following steps in order:

  1. Close all existing database connections and reconnect.

  2. Rebuild the BM25 indexes on the affected tables.
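For step 1, the standard PostgreSQL view pg_stat_activity lists the sessions connected to the current database. This sketch assumes your account is allowed to see other sessions in that view. Terminating sessions with pg_terminate_backend usually requires elevated privileges and interrupts the affected clients, so that part is commented out.

```sql
-- Other sessions connected to the current database; per the steps above,
-- these connections should be closed and reopened.
SELECT pid, usename, application_name, state
FROM pg_stat_activity
WHERE datname = current_database()
  AND pid <> pg_backend_pid();

-- Optional and disruptive: terminate those sessions (needs privileges).
-- SELECT pg_terminate_backend(pid)
-- FROM pg_stat_activity
-- WHERE datname = current_database()
--   AND pid <> pg_backend_pid();
```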

Create a BM25 index with a stop word dictionary

When creating a BM25 index, use the stopword parameter to assign a stop word dictionary to the jieba tokenizer:

CALL pgsearch.create_bm25(
    index_name => '<index_name>',
    table_name => '<table_name>',
    text_fields => pgsearch.field('<column_name>', tokenizer => pgsearch.tokenizer('jieba', SEARCH => false, dict => '<dict_name>', stopword => '<stopword_dict_name>'))
);

Replace the placeholders with your actual values:

| Placeholder | Description | Example |
|---|---|---|
| <index_name> | Name of the BM25 index to create | my_bm25_index |
| <table_name> | Table to index | articles |
| <column_name> | Text column to index | content |
| <dict_name> | jieba word segmentation dictionary | default |
| <stopword_dict_name> | Stop word dictionary to apply | user_stop_cn |
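For illustration, here is the same call filled in with the example values from the table. The articles table definition is hypothetical and only makes the example self-contained; DISTRIBUTED BY (id) is one possible distribution key for an AnalyticDB for PostgreSQL table. The CALL uses only the parameters shown in the template above.

```sql
-- Hypothetical table used only for this example.
CREATE TABLE articles (
    id      bigint,
    content text
) DISTRIBUTED BY (id);

-- Index the content column with jieba, the default segmentation
-- dictionary, and the user_stop_cn stop word dictionary.
CALL pgsearch.create_bm25(
    index_name => 'my_bm25_index',
    table_name => 'articles',
    text_fields => pgsearch.field('content', tokenizer => pgsearch.tokenizer('jieba', SEARCH => false, dict => 'default', stopword => 'user_stop_cn'))
);
```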

Verify stop word filtering

Run the following query to confirm that stop words are filtered as expected. The example applies the user_stop_cn dictionary to the input string 同一个世界 ("one world"); tokens that match words in that dictionary should not appear in the output.

SELECT pgsearch.tokenizer(pgsearch.tokenizer('jieba', stopword => 'user_stop_cn'), '同一个世界');
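To see the effect of a specific dictionary, it can help to tokenize the same string with two stop word dictionaries and compare both results against the dictionary contents. This sketch compares default with user_stop_cn. It assumes filtering applies to whole tokens after jieba segmentation, so a stop word such as 一个 is removed only if jieba emits it as a separate token rather than as part of a longer word like 同一个.

```sql
-- Tokens produced with the default stop word dictionary.
SELECT pgsearch.tokenizer(pgsearch.tokenizer('jieba', stopword => 'default'), '同一个世界');

-- Tokens produced with user_stop_cn.
SELECT pgsearch.tokenizer(pgsearch.tokenizer('jieba', stopword => 'user_stop_cn'), '同一个世界');

-- Contents of user_stop_cn, for cross-checking the second result.
SELECT word
FROM pgsearch.stopword_dict
WHERE dict = 'user_stop_cn'
ORDER BY word;
```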

Related topics