PolarDB: zhparser (Chinese tokenization)

Last Updated: Mar 30, 2026

zhparser is a PostgreSQL extension for Chinese full-text search. It tokenizes Chinese text into word segments so you can build full-text indexes and run text queries on Chinese content in PolarDB for PostgreSQL.

Enable zhparser

Step 1: Install the extension and create a text search configuration.

CREATE EXTENSION zhparser;
CREATE TEXT SEARCH CONFIGURATION testzhcfg (PARSER = zhparser);
ALTER TEXT SEARCH CONFIGURATION testzhcfg ADD MAPPING FOR n,v,a,i,e,l WITH simple;
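
The letters in the ADD MAPPING clause are token types reported by the parser. Token types that have no mapping are dropped when a text search vector is built, so it can help to see what each letter stands for. A minimal sketch using the built-in PostgreSQL function ts_token_type, assuming the extension and the testzhcfg configuration from Step 1 exist in the current database:

```sql
-- List every token type the zhparser parser can emit,
-- with its single-letter alias and a short description.
SELECT tokid, alias, description
FROM ts_token_type('zhparser')
ORDER BY alias;

-- psql meta-command (not SQL): show the parser and the
-- dictionary mapped to each token type in testzhcfg.
\dF+ testzhcfg
```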

Step 2: (Optional) Configure segmentation parameters.

Set parameters at the role level to control how zhparser segments text:

| Parameter | Default | Description |
| --- | --- | --- |
| zhparser.multi_short | off | Combines short words into compound segments |

To enable multi_short for all roles:

ALTER ROLE ALL SET zhparser.multi_short = on;
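
Role-level settings are applied when a session starts, so an already open connection keeps its old value. The following sketch shows one way to confirm the setting from a new session. It relies only on standard PostgreSQL facilities (current_setting and the pg_db_role_setting catalog); whether your account can read that catalog on a managed PolarDB cluster is an assumption.

```sql
-- Run in a NEW session. The second argument (missing_ok = true) makes
-- current_setting return NULL instead of raising an error if the
-- parameter is not known to the session.
SELECT current_setting('zhparser.multi_short', true) AS multi_short;

-- Role-level and database-level overrides are stored here.
-- setrole = 0 means the entry applies to all roles.
SELECT setdatabase, setrole, setconfig
FROM pg_db_role_setting;

-- Optional: remove the override again.
-- ALTER ROLE ALL RESET zhparser.multi_short;
```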

Step 3: Test the parser.

Run a quick test to verify the extension is working:

SELECT * FROM ts_parse('zhparser', 'hello world! 2010年保障房建设在全国范围内获全面启动,从中央到地方纷纷加大了保障房的建设和投入力度。2011年,保障房进入了更大规模的建设阶段。住房城乡建设部党组书记、部长姜伟新去年底在全国住房城乡建设工作会议上表示,要继续推进保障性安居工程建设。');

Verify the text search vector and query functions:

SELECT to_tsvector('testzhcfg', '"今年保障房新开工数量虽然有所下调,但实际的年度在建规模以及竣工规模会超以往年份,相对应的对资金的需求也会创历史纪录。"陈国强说。在他看来,与2011年相比,2012年的保障房建设在资金配套上的压力将更为严峻。');
SELECT to_tsquery('testzhcfg', '保障房资金压力');
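
In practice, the vector and the query are combined with the @@ match operator. A minimal sketch, assuming the testzhcfg configuration from Step 1; the sentence and the search term are illustrative, and whether the row matches depends on how the dictionary segments them:

```sql
-- Returns true when every lexeme required by the query
-- is present in the vector built from the text.
SELECT to_tsvector('testzhcfg', '保障房建设在资金配套上的压力将更为严峻')
       @@ to_tsquery('testzhcfg', '保障房') AS matches;
```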

Create a full-text index

Use a Generalized Inverted Index (GIN) to index Chinese text for fast full-text search. The following example creates a GIN index on the name column of table t1:

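-- Create the GIN index (uses the testzhcfg configuration created in "Enable zhparser")
CREATE INDEX idx_t1 ON t1 USING gin (to_tsvector('testzhcfg', upper(name)));

-- Query using the index; the expression must match the indexed expression exactly
SELECT * FROM t1 WHERE to_tsvector('testzhcfg', upper(t1.name)) @@ to_tsquery('testzhcfg', '(防火)');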
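
For an end-to-end trial, the sketch below builds a small table, indexes it, and inspects the query plan. The table zh_demo, its rows, and the index name are hypothetical and exist only for illustration; the configuration testzhcfg is the one created in "Enable zhparser".

```sql
-- Hypothetical demo table; names and rows are illustrative only.
CREATE TABLE zh_demo (id serial PRIMARY KEY, name text);
INSERT INTO zh_demo (name) VALUES
    ('防火门安装规范'),
    ('保障房建设资金');

-- Expression index. A query can use it only if it repeats the same
-- expression, including the same configuration name.
CREATE INDEX idx_zh_demo_name ON zh_demo
    USING gin (to_tsvector('testzhcfg', name));

-- Inspect the plan. On a table this small the planner may still
-- prefer a sequential scan over the GIN index.
EXPLAIN
SELECT * FROM zh_demo
WHERE to_tsvector('testzhcfg', name) @@ to_tsquery('testzhcfg', '防火');
```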

Customize a Chinese word segmentation dictionary

The default dictionary covers common Chinese words. For domain-specific terms such as industry jargon or product names, add custom word segments to the pg_ts_custom_word table.

Step 1: Check the current segmentation result.

SELECT to_tsquery('testzhcfg', '保障房资金压力');

Step 2: Add the custom word segment.

INSERT INTO pg_ts_custom_word VALUES ('保障房资');

Step 3: Sync the dictionary and reconnect.

SELECT zhprs_sync_dict_xdb();

After the sync completes, close and reopen the connection. In psql, the \c meta-command with no arguments reconnects to the current database:

\c

Step 4: Verify the new segmentation result.

SELECT to_tsquery('testzhcfg', '保障房资金压力');
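
To see where the segmentation changed, rather than only the final query, the built-in PostgreSQL function ts_debug can be run before and after the sync. This is a sketch that assumes the testzhcfg configuration from "Enable zhparser":

```sql
-- For each token: its type alias, the raw token text, the dictionaries
-- consulted for that token type, and the lexemes produced.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('testzhcfg', '保障房资金压力');
```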

Limits

| Limit | Value | Behavior when exceeded |
| --- | --- | --- |
| Custom word segments | 1,000,000 | Word segments beyond the limit are ignored |
| Word segment length | 128 bytes | Bytes beyond 128 are truncated |
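
Both limits can be checked before adding words. The sketch below uses only standard SQL functions; the sample word is the one used earlier on this page.

```sql
-- Number of custom word segments currently stored (limit: 1,000,000).
SELECT count(*) AS custom_words FROM pg_ts_custom_word;

-- Byte length of a candidate word (limit: 128 bytes).
-- octet_length counts bytes in the database encoding; in UTF-8,
-- most Chinese characters take 3 bytes each.
SELECT octet_length('保障房资'::text) AS bytes;
```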

The custom dictionary and the default dictionary are both active at the same time.

After any add, delete, or update to word segments, run SELECT zhprs_sync_dict_xdb(); and reconnect for the changes to take effect.