
AnalyticDB: Configure a custom dictionary

Last Updated: Mar 30, 2026

Domain-specific terms, such as product names, legal phrases, or industry jargon, are often split incorrectly by general-purpose tokenizers. As a result, BM25 full-text search can miss relevant results or rank them poorly. The jieba tokenizer in AnalyticDB for PostgreSQL supports custom word segmentation dictionaries: you add specialized vocabulary to a dictionary, and the tokenizer then treats each of those terms as a single token.

Prerequisites

Before you begin, make sure you have:

Limitations

Only the jieba tokenizer supports custom dictionaries.

How it works

Custom dictionaries are stored in the pgsearch.jieba_custom_word table. The dict and word columns form a composite primary key: the same word cannot appear twice in one dictionary, but it can appear in several different dictionaries. The default dictionary is named default.

When you add a word segment:

  • If you omit the dictionary name, the word is added to default.

  • If the specified dictionary does not exist, a new dictionary is created automatically.

  • If the specified dictionary already exists, the word is added to it.

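When you update or delete a word segment without specifying a dictionary name (that is, without a condition on the dict column), the statement matches the word in all dictionaries, so every dictionary that contains the word is affected.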
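The following statements sketch these rules using only the pgsearch.jieba_custom_word table and its dict and word columns. The dictionary name product_terms is hypothetical, and the sketch assumes that neither default nor product_terms already contains 永和服装饰品. Try it in a test environment first, because the final DELETE has no dict condition and therefore affects every dictionary.

```sql
-- No dict value: the word is added to the dictionary named default.
INSERT INTO pgsearch.jieba_custom_word(word) VALUES('永和服装饰品');

-- product_terms is a hypothetical name; if it does not exist yet,
-- this insert creates it.
INSERT INTO pgsearch.jieba_custom_word(dict, word) VALUES('product_terms', '永和服装饰品');

-- The same word can live in several dictionaries, because only the
-- (dict, word) pair must be unique. Inserting ('product_terms', '永和服装饰品')
-- a second time would violate the primary key.
SELECT dict, word
FROM pgsearch.jieba_custom_word
WHERE word = '永和服装饰品'
ORDER BY dict;

-- No dict condition: this DELETE matches the word in every dictionary,
-- so it removes the entry from both default and product_terms.
DELETE FROM pgsearch.jieba_custom_word WHERE word = '永和服装饰品';
```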

Manage dictionary entries

Add, update, or delete entries in the default dictionary

-- Add a word to the default dictionary (implicit)
INSERT INTO pgsearch.jieba_custom_word(word) VALUES('永和服装饰品');

-- Add a word to the default dictionary (explicit)
INSERT INTO pgsearch.jieba_custom_word(dict, word) VALUES('default', '永和服装饰品');

-- Update a word in the default dictionary
UPDATE pgsearch.jieba_custom_word SET word = '永和' WHERE dict = 'default' AND word = '永和服装饰品';

-- Delete a word from the default dictionary
DELETE FROM pgsearch.jieba_custom_word WHERE dict = 'default' AND word = '永和服装饰品';

Add, update, or delete entries in a custom dictionary

-- Add a word to a custom dictionary (created automatically if it does not exist)
INSERT INTO pgsearch.jieba_custom_word(dict, word) VALUES('custom_dict', '永和服装饰品');

-- Update a word in a custom dictionary
UPDATE pgsearch.jieba_custom_word SET word = '永和' WHERE dict = 'custom_dict' AND word = '永和服装饰品';

-- Delete a word from a custom dictionary
DELETE FROM pgsearch.jieba_custom_word WHERE dict = 'custom_dict' AND word = '永和服装饰品';
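To add several terms at once and review what a dictionary contains, you can use ordinary multi-row INSERT and SELECT statements against the same table. This is a sketch: the two extra terms are illustrative only, and if any of them already exists in custom_dict, the whole INSERT fails because of the (dict, word) primary key.

```sql
-- Add several words to custom_dict in one statement.
-- 云原生数据仓库 = "cloud-native data warehouse", 向量检索 = "vector retrieval" (illustrative terms).
INSERT INTO pgsearch.jieba_custom_word(dict, word) VALUES
    ('custom_dict', '永和服装饰品'),
    ('custom_dict', '云原生数据仓库'),
    ('custom_dict', '向量检索');

-- List the entries in custom_dict.
SELECT word
FROM pgsearch.jieba_custom_word
WHERE dict = 'custom_dict'
ORDER BY word;

-- Count the entries in each dictionary.
SELECT dict, count(*) AS word_count
FROM pgsearch.jieba_custom_word
GROUP BY dict
ORDER BY dict;
```

Remember to reload the dictionary after changes like these, as described in the next section.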

Load a dictionary

After you add, update, or delete entries in pgsearch.jieba_custom_word, reload the affected dictionary into memory by passing its name to pgsearch.reload_user_dict():

SELECT pgsearch.reload_user_dict('custom_dict');

The reload does not automatically apply to existing indexed data. To apply the updated dictionary to existing data, complete the following steps in order:

  1. Close existing database connections and re-establish them.

  2. Rebuild the indexes on tables that use the dictionary.
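Put together, a typical sequence after editing custom_dict might look like the following sketch. Reconnecting is a client-side action, so it appears only as a comment (psql's \connect is one example). The index rebuild is also left as a comment, because the exact statements depend on how the index was created; see BM25 high-performance full-text search.

```sql
-- Edit the dictionary table. Skip this INSERT if the word is already in
-- custom_dict, because the (dict, word) primary key rejects duplicates.
INSERT INTO pgsearch.jieba_custom_word(dict, word) VALUES('custom_dict', '永和服装饰品');

-- Reload custom_dict into memory.
SELECT pgsearch.reload_user_dict('custom_dict');

-- Close this connection and open a new one
-- (for example, run \connect in psql, or restart your client).

-- In the new session, check that the word is now kept as a single token.
SELECT pgsearch.tokenizer(
    pgsearch.tokenizer('jieba', hmm=>false, SEARCH=>false, dict=>'custom_dict'),
    '永和服装饰品有限公司'
);

-- Finally, rebuild the BM25 indexes that use custom_dict so that rows that
-- are already indexed are segmented with the updated dictionary.
```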

Create a BM25 index with a custom dictionary

To use a custom dictionary in a BM25 index, pass its name in the dict parameter of pgsearch.tokenizer('jieba', ...) when you call pgsearch.create_bm25().

Single column:

CALL pgsearch.create_bm25(
    index_name => '<index_name>',
    table_name => '<table_name>',
    text_fields => pgsearch.field('<column_name>', tokenizer=>pgsearch.tokenizer('jieba', dict=>'<dict_name>'))
);

Multiple columns with different dictionaries:

CALL pgsearch.create_bm25(
    index_name => '<index_name>',
    table_name => '<table_name>',
    text_fields => pgsearch.field('<column1_name>', tokenizer=>pgsearch.tokenizer('jieba', hmm=>false, SEARCH=>false, dict=>'<dict_name>'))
                || pgsearch.field('<column2_name>', tokenizer=>pgsearch.tokenizer('jieba', hmm=>false, SEARCH=>false, dict=>'<dict2_name>'))
);

Replace the placeholders with actual values:

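Placeholder      Description
<index_name>     Name of the BM25 index to create
<table_name>     Name of the table to index
<column_name>    Name of the column to index (single-column example)
<column1_name>   Name of the first column to index (multi-column example)
<column2_name>   Name of the second column to index (multi-column example)
<dict_name>      Name of the custom dictionary (in the multi-column example, the dictionary for the first column)
<dict2_name>     Name of the custom dictionary for the second column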
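For illustration, here is the multi-column form with the placeholders filled in. Everything named here is hypothetical: the products table, its title and description columns, the index name products_bm25_idx, and the dictionaries brand_dict and material_dict. The sketch assumes it is reasonable to populate and reload both dictionaries before the index is built, so that the initial build already uses them.

```sql
-- Hypothetical table with two text columns.
CREATE TABLE products (
    id          bigint PRIMARY KEY,
    title       text,
    description text
);

-- Hypothetical dictionaries: one for brand names, one for material terms.
-- 真丝面料 = "pure silk fabric" (illustrative term).
INSERT INTO pgsearch.jieba_custom_word(dict, word) VALUES('brand_dict', '永和服装饰品');
INSERT INTO pgsearch.jieba_custom_word(dict, word) VALUES('material_dict', '真丝面料');
SELECT pgsearch.reload_user_dict('brand_dict');
SELECT pgsearch.reload_user_dict('material_dict');

-- Index title with brand_dict and description with material_dict.
CALL pgsearch.create_bm25(
    index_name => 'products_bm25_idx',
    table_name => 'products',
    text_fields => pgsearch.field('title', tokenizer=>pgsearch.tokenizer('jieba', hmm=>false, SEARCH=>false, dict=>'brand_dict'))
                || pgsearch.field('description', tokenizer=>pgsearch.tokenizer('jieba', hmm=>false, SEARCH=>false, dict=>'material_dict'))
);
```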

Verify the word segmentation effect

Use pgsearch.tokenizer() to confirm how the jieba tokenizer segments text with your custom dictionary. This lets you catch segmentation issues before building an index.

For example, after you add 永和服装饰品 to custom_dict and reload the dictionary, you can verify that the tokenizer treats it as a single token:

SELECT pgsearch.tokenizer(pgsearch.tokenizer('jieba', hmm=>false, SEARCH=>false, dict=>'custom_dict'), '永和服装饰品有限公司');
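To see the effect of the dictionary directly, you can run the same text through the tokenizer with and without the dict parameter and compare the two results. This sketch assumes that omitting dict makes the tokenizer use the default dictionary, and that 永和服装饰品 is not also an entry in default. The exact tokens also depend on jieba's built-in dictionary, so no expected output is shown.

```sql
-- Without dict (assumed to fall back to the default dictionary).
SELECT pgsearch.tokenizer(
    pgsearch.tokenizer('jieba', hmm=>false, SEARCH=>false),
    '永和服装饰品有限公司'
);

-- With custom_dict: 永和服装饰品 should be returned as one token.
SELECT pgsearch.tokenizer(
    pgsearch.tokenizer('jieba', hmm=>false, SEARCH=>false, dict=>'custom_dict'),
    '永和服装饰品有限公司'
);
```

If both queries return the same segmentation, check that the word is in custom_dict and that you reloaded the dictionary and reconnected.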

For more information about pgsearch.tokenizer() parameters, see the "Parameters" section of BM25 high-performance full-text search.

What's next