This topic provides an overview of the pgsearch extension and describes how to install, uninstall, and use the extension to perform full-text search.
Overview
The pgsearch extension helps you create and manage Best Matching 25 (BM25) indexes that are built on Tantivy, a high-performance full-text search engine. The BM25 algorithm is widely used in search engines, such as Elasticsearch, to score and rank rows based on the frequency of a term within a row and the rarity of the term across rows. The BM25 algorithm is suitable for scenarios in which you want to search for keywords or phrases across a large number of rows. You can use BM25 indexes to perform full-text search on AnalyticDB for PostgreSQL tables and use the BM25 algorithm to return accurate search results ranked by relevance scores.
Installation and uninstallation
The pgsearch extension does not support GUI-based installation. To install the pgsearch extension, submit a ticket. After the extension is installed, you must restart your AnalyticDB for PostgreSQL instance. To uninstall the pgsearch extension, submit a ticket.
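After the restart, you can verify that the extension is available. The following check is a minimal sketch that assumes pgsearch registers itself in the standard PostgreSQL extension catalog:
SELECT extname, extversion FROM pg_extension WHERE extname = 'pgsearch';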
Test table
In this example, a test table that has dozens of rows is used. You can call the pgsearch.create_test_table() function to create the test table.
CALL pgsearch.create_test_table(table_name => 'mock_items', schema_name => 'public');
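Before you create an index, you can take a quick look at the generated test data. The following query is a minimal sketch; the exact columns depend on the extension version:
SELECT * FROM mock_items LIMIT 3;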
Index management
The creation and deletion of indexes cannot be rolled back.
Create an index
You can call the pgsearch.create_bm25() function to create a BM25 index. This function allows you to create an index on multiple fields and specify an index configuration for each field. Creating multiple BM25 indexes on a table consumes excessive resources and affects user experience. We recommend that you create only one BM25 index for each table.
Syntax
CALL pgsearch.create_bm25(
    index_name => '<index_name>',
    table_name => '<table_name>',
    schema_name => '<schema_name>',
    text_fields => '<text_fields>|pgsearch.field()',
    numeric_fields => '<numeric_fields>|pgsearch.field()',
    boolean_fields => '<boolean_fields>|pgsearch.field()',
    json_fields => '<json_fields>|pgsearch.field()',
    datetime_fields => '<datetime_fields>|pgsearch.field()'
);
Parameters
| Parameter | Required | Supported data type | Description |
| --- | --- | --- | --- |
| index_name | Yes | STRING | The name of the index. |
| table_name | Yes | STRING | The name of the table for which you want to create the index. |
| schema_name | No | STRING | The schema of the table. By default, the current schema is used. |
| text_fields | Yes (at least one of the field parameters must be specified) | STRING | The text fields to be indexed and their corresponding index configurations. Specify an index configuration for each field by using a JSON5 string or by invoking the pgsearch.field() function. Important: Only AnalyticDB for PostgreSQL instances of V7.2.1.0 or later support the pgsearch.field() function. |
| numeric_fields | | STRING | The numeric fields to be indexed and their corresponding index configurations. Specify an index configuration for each field by using a JSON5 string or by invoking the pgsearch.field() function. |
| boolean_fields | | STRING | The Boolean fields to be indexed and their corresponding index configurations. |
| json_fields | | STRING | The JSON fields to be indexed and their corresponding index configurations. |
| datetime_fields | | STRING | The date and time fields to be indexed and their corresponding index configurations. If you leave this parameter empty, the UTC time zone and the RFC 3339 format are used. |
Records
The pgsearch extension supports three types of records.
raw: does not perform word segmentation.
freq: records the row ID and the term frequency.
position: records the row ID, term frequency, and occurrence position.
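For example, phrase search, which is described later in this topic, requires the record configuration item to be set to position. The following sketch shows how the record option can be specified for a text field when you create an index:
CALL pgsearch.create_bm25(
    index_name => 'search_idx',
    table_name => 'mock_items',
    text_fields => '{description: {record: "position", tokenizer: {type: "jieba"}}}'
);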
Tokenizers
The pgsearch extension provides a variety of built-in tokenizers, such as jieba, ngram, lindera, en_stem, and whitespace. The tokenizers can meet your business requirements without the need to install the pg_jieba or zhparser extension.
| Tokenizer name | Description | Configuration |
| --- | --- | --- |
| default | Segments text based on spaces and punctuation marks, and converts the text into lowercase letters. This tokenizer filters out terms whose length exceeds 255 bytes. | {type: "default"} |
| raw | Does not segment text. | {type: "raw"} |
| en_stem | Segments text based on spaces and punctuation marks, converts the text into lowercase letters, and then extracts the stem of each term. This tokenizer filters out terms whose length exceeds 40 characters. | {type: "en_stem"} |
| whitespace | Segments text based on spaces. | {type: "whitespace"} |
| ngram | Segments text into a sequence of n-grams. An n-gram is a sequence of characters of length n. The min_gram parameter specifies the minimum n-gram length, the max_gram parameter specifies the maximum n-gram length, and the prefix_only parameter specifies whether to generate only the n-grams that start at the beginning of the text. | {type: "ngram", min_gram: 1, max_gram: 2, prefix_only: true} |
| chinese_compatible | Segments text based on spaces and punctuation marks, which is suitable for Chinese text. Consecutive non-Chinese characters are grouped into one token, each Chinese character becomes a separate token, and non-numeric, non-alphabetic characters, such as punctuation marks, generate no tokens. For example, the string "我爱吃橙子 oranges!12" is segmented into seven tokens: "我, 爱, 吃, 橙, 子, oranges, 12". | {type: "chinese_compatible"} |
| chinese_lindera | Segments text based on the lindera tokenizer and the CC-CEDICT dictionary. | {type: "chinese_lindera"} |
| korean_lindera | Segments text based on the lindera tokenizer and the KoDic dictionary. | {type: "korean_lindera"} |
| japanese_lindera | Segments text based on the lindera tokenizer and the IPADIC dictionary. | {type: "japanese_lindera"} |
| jieba | Segments text based on the jieba tokenizer, which is suitable for most Chinese text. The jieba tokenizer supports custom word segmentation dictionaries and stop word dictionaries. For more information, see Configure a custom word segmentation dictionary and Configure a stop word dictionary. Note: Only AnalyticDB for PostgreSQL V7.0 instances of V7.2.1.0 or later support the hmm and search parameters. | {type: "jieba", hmm: true, search: true} |
You can invoke the pgsearch.tokenizer() function to generate a string of configuration items for a tokenizer. This simplifies the configuration process. The following section describes the syntax and parameters:
Syntax
pgsearch.tokenizer(<name>, <min_gram>, <max_gram>, <prefix_only>, <search>, <hmm>, <dict>, <stopword>, <lowercase>, <remove_long>, <stemmer>);
Parameters
name: the name of the tokenizer, which is of the text type. For information about the valid values, see the "Tokenizers" section of this topic.
min_gram, max_gram, prefix_only, search, and hmm: for information about these parameters, see the "Tokenizers" section of this topic.
dict: the name of the word segmentation dictionary used by the jieba tokenizer, which is of the text type. For more information, see Configure a custom word segmentation dictionary.
stopword: the name of the stop word dictionary, which is of the text type. By default, you do not need to specify this parameter. For more information, see Configure a stop word dictionary.
lowercase: specifies whether to convert keywords into lowercase, which is of the Boolean type. Default value: true.
remove_long: the length threshold for removing keywords, which is of the integer type. By default, you do not need to specify this parameter. If you set this parameter, pgsearch.tokenizer() filters out keywords whose length is greater than or equal to remove_long bytes. Each Chinese character occupies 3 bytes.
stemmer: the language for stemming, which is of the text type. By default, you do not need to specify this parameter. If you set this parameter, the stem of each term is extracted. For example, the words run, running, runs, and ran are all reduced to run, so you can retrieve relevant results regardless of which form you search for. Only English is supported. You must set the value to en.
Only AnalyticDB for PostgreSQL instances of V7.2.1.0 or later support the pgsearch.tokenizer() function. On these instances, you must invoke the pgsearch.tokenizer() function when you create an index. Otherwise, the custom word segmentation dictionary and stop word dictionary of the jieba tokenizer cannot take effect.
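As a quick sanity check, you can call the function on its own. The following sketch assumes that the function returns the generated configuration string as text:
SELECT pgsearch.tokenizer('jieba', search=>true, hmm=>true);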
Examples of creating indexes
Use the text_fields parameter to create an index.
-- Use the ngram tokenizer. You must specify all configuration fields of the ngram tokenizer.
CALL pgsearch.create_bm25(
    index_name => 'search_idx',
    table_name => 'mock_items',
    text_fields => '{description: { tokenizer: {type: "ngram", min_gram: 2, max_gram: 3, prefix_only: false}}}'
);
-- Use the jieba tokenizer.
CALL pgsearch.create_bm25(
    index_name => 'search_idx',
    table_name => 'mock_items',
    text_fields => '{description: { tokenizer: {type: "jieba"}}}'
);
-- Use the lindera tokenizer.
CALL pgsearch.create_bm25(
    index_name => 'search_idx',
    table_name => 'mock_items',
    text_fields => '{description: { tokenizer: {type: "chinese_lindera"}}}'
);
-- Create an index on multiple fields.
CALL pgsearch.create_bm25(
    index_name => 'search_idx',
    table_name => 'mock_items',
    text_fields => '{description: {fast: false, fieldnorms: true, tokenizer: {type: "jieba"}}, category: {}}',
    datetime_fields => '{created_at: {fast: true}, last_updated_date: {fast: true}}',
    numeric_fields => '{rating: {fast: true}}',
    json_fields => '{metadata: {fast: true, expand_dots: true, tokenizer: {type: "en_stem"}, record: "position"}}',
    boolean_fields => '{in_stock: {fast: true} }'
);
-- In most cases, you can configure only the tokenizer field for the text_fields and json_fields parameters. Use the default values for other parameters.
CALL pgsearch.create_bm25(
    index_name => 'search_idx',
    table_name => 'mock_items',
    text_fields => '{description: {tokenizer: {type: "jieba"}}, category: {}}',
    datetime_fields => '{created_at: {}, last_updated_date: {}}',
    numeric_fields => '{rating: {}}',
    json_fields => '{metadata: {tokenizer: {type: "en_stem"}}}',
    boolean_fields => '{in_stock: {} }'
);
Use the pgsearch.field() and pgsearch.tokenizer() functions to create indexes and query the word segmentation effect.
-- Set the fast parameter of the description column to true, set the record parameter to position, and set the tokenizer parameter to jieba. For the jieba tokenizer, the non-search mode is configured, user_dict is used as the custom word segmentation dictionary, and CN_SIMPLE is used as the built-in stop word dictionary.
-- Use the default configuration for the category column.
-- The configurations of the two columns are concatenated by using double vertical bars (||).
CALL pgsearch.create_bm25(
    index_name => 'search_idx',
    table_name => 'mock_items',
    text_fields => pgsearch.field('description', fast=>true, record=>'position', tokenizer=>pgsearch.tokenizer('jieba', search=>false, dict=>'user_dict', stopword=>'CN_SIMPLE')) || pgsearch.field('category')
);
CALL pgsearch.create_bm25(
    index_name => 'search_idx',
    table_name => 'mock_items',
    text_fields => pgsearch.field('description', tokenizer=>pgsearch.tokenizer('jieba')),
    datetime_fields => '{created_at: {}, last_updated_date: {}}',
    numeric_fields => '{rating: {}}',
    json_fields => '{metadata: {tokenizer: {type: "en_stem"}}}',
    boolean_fields => '{in_stock: {} }'
);
-- Query the word segmentation effect by using the specified jieba tokenizer.
SELECT pgsearch.tokenizer(pgsearch.tokenizer('jieba', hmm=>false, search=>false, dict=>'custom_dict_1'), '永和服装饰品有限公司');
-- Use various filters to process word segmentation results.
SELECT pgsearch.tokenizer(
    pgsearch.tokenizer('jieba', search=>false, dict=>'user_dict', stopword=>'CN_SIMPLE', lowercase=>false, remove_long=>27, stemmer=>'en')::text,
    '永和服装饰品有限公司。 Shoping'
);
Query the configuration of an index
SELECT * FROM pgsearch.schema_bm25('index_name'::regclass);
Delete an index
DROP INDEX index_name;
Rebuild an index
REINDEX INDEX index_name;
Search syntax
@@@ operator
You can use the @@@ operator, in a manner similar to k-nearest neighbor (KNN) search, to sort the search results based on BM25 scores and return the top results.
Syntax
SELECT * FROM <table_name>
ORDER BY <index_col> @@@ pgsearch.config('<query>');
Parameters
<table_name>: the name of the table.
<index_col>: the name of an index field that is specified when you create the index. We recommend that you use the first index field that appears in the search keyword specified by the query parameter.
<query>: the search keyword.
pgsearch.config: the search function, which accepts character strings or other types of search objects. Search objects can be combined at any granularity. When a search keyword string is passed to the pgsearch.config() function, the pgsearch.parse() function implicitly parses the string into a search object. The following sample statements provide the same query effect:
SELECT * FROM mock_items
ORDER BY description @@@ pgsearch.config('description:socks');
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config(
query => pgsearch.parse('description:socks')
);
Basic queries
Specific field search
Text field search
Query the rows that include a specified word or phrase. If a phrase contains spaces, you must enclose the phrase in double quotation marks ("). The following statements query the rows that include the word "keyboard" or the phrase "hello world".
SELECT * FROM mock_items
ORDER BY description @@@ pgsearch.config('description:keyboard');
-- Query the rows that include the space-separated phrase "hello world".
SELECT * FROM mock_items
ORDER BY description @@@ pgsearch.config('description:"hello world"');
JSON field search
Query the rows that include {"metadata": {"color": "white"}}.
SELECT * FROM mock_items
ORDER BY metadata @@@ pgsearch.config('metadata.color:white');
DATETIME field search
When you query a DATETIME field, the UTC time zone and the RFC 3339 format are used.
SELECT * FROM mock_items ORDER BY created_at @@@ pgsearch.config('created_at:"2023-05-01T09:12:34Z"') LIMIT 10;
SELECT * FROM mock_items ORDER BY created_at @@@ pgsearch.config('created_at:"2023-05-01T04:12:34-05:00"') LIMIT 10;
Proximity operator search
You can use the ~ operator to query phrases whose keywords are separated by up to a specified number of words. For example, the description field includes the phrase "ergonomic metal keyboard". You can execute the following statement to match it, because "ergonomic" and "keyboard" are separated by one word.
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config('description:"ergonomic keyboard"~1');
Efficient filter search
Filters are suitable only for numeric fields and Boolean fields on which BM25 indexes are created. Compared with WHERE clauses in standard SQL statements, filter search obtains results faster.
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config('description:keyboard AND rating:<4');
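For comparison, a semantically similar query that uses a standard WHERE clause might look like the following sketch. The filter version above is evaluated inside the BM25 index and is usually faster:
SELECT * FROM mock_items
WHERE rating < 4
ORDER BY description @@@ pgsearch.config('description:keyboard');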
Rank improvement search
If you want to improve the result ranking of a search keyword, you can append the ^ character followed by a boost coefficient. The coefficient increases the BM25 scores of matching results, which improves their ranking.
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config('description:keyboard^2 OR category:electronics^3');
Boolean operator search
You can use the AND, OR, and NOT operators to filter multiple keywords. Parentheses () can be used to manage grouping and operation sequence.
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config('description:keyboard OR category:toy');
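Parentheses control grouping and operation sequence. The following sketch, which assumes the same mock_items index, combines a grouped OR condition with the NOT operator:
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config('(description:keyboard OR category:toy) AND NOT description:plastic');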
Set operator search
You can use the IN operator to provide the same capabilities as one or more OR operators. The IN operator consumes fewer CPU resources.
When you use the IN operator, separate the terms with spaces, and the word segmentation result of each term cannot contain more than one token. The number of tokens generated by Chinese word segmentation depends on the tokenizer and is difficult to control. Therefore, the IN operator is applicable only to English. In Chinese scenarios, we recommend that you use multiple OR operators instead.
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config('description:IN [keyboard toy]');
Limit- and offset-based search
You can use the OFFSET and LIMIT clauses to perform search.
SELECT * FROM mock_items
ORDER BY description @@@ pgsearch.config('description:socks') OFFSET 2 LIMIT 10;
Advanced query
Obtain the word segmentation result
You can use the pgsearch.tokenizer() function to obtain the word segmentation result. The function returns an array that consists of the segmented words.
Syntax
SELECT pgsearch.tokenizer(<tokenizer_config>, <query_str>);
Parameters
tokenizer_config: the JSON string that includes the tokenizer configuration. You can also invoke the overloaded pgsearch.tokenizer() function to generate the tokenizer configuration string. For more information, see the "Tokenizers" section of this topic.
query_str: the character string that you want to segment.
Example
SELECT pgsearch.tokenizer('{type: "ngram", min_gram: 1, max_gram: 2, prefix_only: true}', 'hell');
SELECT pgsearch.tokenizer('{type: "jieba"}', '数据仓库');
SELECT pgsearch.tokenizer(pgsearch.tokenizer('jieba'), '永和服装饰品有限公司');
Term set search
You can search for rows that include any keyword in an array of terms.
Example
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config(
query => pgsearch.term_set(
terms => ARRAY[
pgsearch.term(field => 'description', VALUE => 'socks'),
pgsearch.term(field => 'description', VALUE => 'novel')
]
)
);
Parameters
field: the search field. If you leave this parameter empty, all index fields are searched.
terms: an array of terms to be searched.
By default, the tokenizer converts indexed keywords into lowercase. As a result, uppercase keywords in the pgsearch.term() function do not return relevant query results. We recommend that you use lowercase keywords in your queries.
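For example, the following two sketches differ only in keyword case. The lowercase term is expected to match, whereas the uppercase term is expected to return no relevant results:
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config(
    query => pgsearch.term(field => 'description', VALUE => 'socks')
);
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config(
    query => pgsearch.term(field => 'description', VALUE => 'SOCKS')
);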
Phrase search
You can query the rows that include an array of ordered keywords. Before you perform phrase search, make sure that the record configuration item of the index is set to position.
Example
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config(
query => pgsearch.phrase(
field => 'description',
phrases => ARRAY['little', 'red', 'riding', 'hood'],
slop => 0
)
);
Parameters
field: the search field. If you leave this parameter empty, all index fields are searched.
phrases: an array of keywords. A row is returned only if it includes all keywords in the array in the same order as the array.
slop: the maximum distance between two keywords. This parameter is optional. A value of 0 specifies that the keywords in a matching row must be in the array order and every two keywords must be adjacent. For example, the little, red, riding, and hood keywords must appear consecutively in the row. A value greater than 0 allows the keywords to be non-consecutive. For example, other words can appear between any two of the little, red, riding, and hood keywords in a matching row, as the sketch after this list shows.
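For instance, the following sketch sets slop => 1, so rows in which one extra word separates two adjacent keywords can also be matched:
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config(
    query => pgsearch.phrase(
        field => 'description',
        phrases => ARRAY['little', 'red', 'riding', 'hood'],
        slop => 1
    )
);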
Phrase prefix search
You can search for rows that include the content prefixed with an array of ordered keywords. Before you perform phrase prefix search, make sure that an index is created on the search field.
Example
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config(
query => pgsearch.phrase_prefix(
field => 'description',
phrases => ARRAY['little', 'red', 'riding', 'hood'],
max_expansion => 1
)
);
Parameters
field: the search field. If you leave this parameter empty, all index fields are searched.
phrases: an array of ordered keywords that is used as a prefix for the subquery specified by the query parameter. You can query the character strings that are prefixed with the specified array of ordered keywords. For example, if you set the phrases parameter to ARRAY['little', 're'], "little re" is used as the prefix. In this case, "little red riding hood" can be returned.
max_expansion: the maximum number of terms to which the prefix can be expanded. This parameter is optional and narrows the search scope.
Word segmentation search (tokenizer_terms)
You can use the specified tokenizer to segment the query text, and then query the generated tokens. The pgsearch.tokenizer_terms() function supports the use of Boolean logic between the specified tokens.
Example
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config(
pgsearch.tokenizer_terms('description', '朝阳百货', pgsearch.tokenizer('jieba'), 'OR'));
Parameters
tokenizer: the tokenizer configuration string generated by the pgsearch.tokenizer() function. Default value: jieba.
operator: the operator that specifies the Boolean logic between tokens after word segmentation. Valid values: OR and AND. Default value: OR. For example, the word "朝阳百货" is segmented into "朝阳/百货". If you set the operator parameter to OR, the rows that include "朝阳" or "百货" are returned. If you set the operator parameter to AND, only the rows that include both "朝阳" and "百货" are returned, as the example after this list shows.
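For example, the AND variant of the preceding query returns only the rows that include both "朝阳" and "百货":
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config(
    pgsearch.tokenizer_terms('description', '朝阳百货', pgsearch.tokenizer('jieba'), 'AND'));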
All search
You can search for all rows that are covered by the index field. Each row receives the same score of 1.0, so all search results are equal in preference.
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config(
query => pgsearch.all()
);
Boolean search
You can query the rows that are filtered by using the logical relationships defined in a subquery.
Example
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config(
query => pgsearch.boolean(
should => ARRAY[
pgsearch.parse('description:socks'),
pgsearch.phrase_prefix(field => 'description', phrases => ARRAY['book']),
pgsearch.term(field => 'description', VALUE => 'writer'),
pgsearch.fuzzy_term(field => 'description', VALUE => 'wow')
],
must_not => ARRAY[
pgsearch.term(field => 'description', VALUE => 'writer')
],
must => ARRAY[
pgsearch.term(field => 'rating', VALUE => 4)
]
)
);
-- You can configure one or more of the should, must_not, and must parameters.
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config(
query => pgsearch.boolean(
should => pgsearch.parse('description:socks')
)
);
Parameters
must: an array of search objects whose conditions must all be matched. Only the rows that match all conditions specified by the must parameter are returned.
must_not: an array of search objects whose conditions cannot be matched. The rows that match a condition specified by the must_not parameter are not returned.
should: an array of search objects whose conditions can be matched when the must parameter does not exist. If the must parameter does not exist, the rows that match at least one condition specified by the should parameter are returned.
Rank boost search
You can increase the score of a subquery by using rank boost search to boost the result ranking of the subquery. In this example, query => pgsearch.parse('description:socks') is used.
Example
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config(
query => pgsearch.boost(query => pgsearch.parse('description:socks'), boost => 2)
);
Parameters
boost: the coefficient by which each result score is multiplied.
query: the search object.
Constant score search
You can assign a constant score to the rows that match the conditions specified by a subquery. In this example, query => pgsearch.all() is used. This prevents unnecessary score calculations for subqueries.
Example
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config(
query => pgsearch.const_score(query => pgsearch.all(), score => 2)
);
Parameters
score: the constant score assigned to each result of the subquery.
query: the search object.
Maximum disjunction search
You can query the rows that match one or more subqueries. The rows that match more conditions can obtain higher BM25 scores.
Example
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config(
query => pgsearch.disjunction_max(
disjuncts => ARRAY[
pgsearch.parse('description:socks'),
pgsearch.parse('description:Generic')
],
tie_breaker => 0.75
)
);
Parameters
disjuncts: an array that consists of one or more subqueries.
tie_breaker: the coefficient that is used to adjust the BM25 score of a row that matches multiple subqueries. This parameter is optional. For such a row, the final BM25 score is the score of the best matching subquery plus the scores of the other matching subqueries multiplied by the tie_breaker coefficient. In the preceding example, if the BM25 score of a row is 1.0 with the pgsearch.parse('description:socks') subquery and 0.5 with the pgsearch.parse('description:Generic') subquery, the final BM25 score of the row is 1.0 + 0.75 × 0.5 = 1.375.
Empty search
You can use empty search as a placeholder in test scenarios or edge cases. In this case, no rows are returned.
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config(
query => pgsearch.empty()
);
Fuzzy term search
You can query rows that include fuzzy terms of the search keyword. Search results can be returned even if the spelling is not completely correct.
Example
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config(
query => pgsearch.fuzzy_term(
field => 'description',
VALUE => 'wow',
distance => 2,
transposition_cost_one => true,
prefix => true)
);
Parameters
field: the search field. If you leave this parameter empty, all index fields are searched.
value: the search keyword. Fuzzy matching based on the Levenshtein distance is used to search for results similar to the specified keyword.
distance: the maximum edit distance (the maximum number of single-character changes) for matching the specified field. This parameter is optional. Default value: 2. Maximum value: 2.
transposition_cost_one: specifies whether the transposition of two adjacent characters is considered a single edit in the Levenshtein distance calculation. This parameter is optional. Default value: true. If you set this parameter to false, a transposition is considered two separate edits (a deletion and an insertion).
prefix: specifies whether the prefix of the search keyword is excluded from the fuzzy edit distance calculation. This parameter is optional. Default value: true. If you set this parameter to false, the entire search keyword is used for the calculation.
Range search
You can query the rows that include values within a specific range. The range search is suitable for the numeric_fields and datetime_fields parameters.
Example
SELECT * FROM mock_items ORDER BY rating @@@ pgsearch.config(
query => pgsearch.range(
field => 'rating',
RANGE => '[1,4)'::int4range
)
);
Parameters
field: the search field. If you leave this parameter empty, all index fields are searched.
range: the value range for matching the specified field. The following range types are supported: INT4RANGE, INT8RANGE, DATERANGE, TSRANGE, and TSTZRANGE.
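A date range works the same way. The following sketch assumes that the created_at field is indexed through the datetime_fields parameter:
SELECT * FROM mock_items ORDER BY created_at @@@ pgsearch.config(
    query => pgsearch.range(
        field => 'created_at',
        RANGE => '[2023-05-01,2023-06-01)'::daterange
    )
);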
Regular expression search
You can query the rows that include the words that match a specific regular expression.
Example
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config(
query => pgsearch.regex(
field => 'description',
PATTERN => '(glass|screen|like|cloth|phone)'
)
);
SELECT * FROM mock_items ORDER BY description @@@ pgsearch.config(
query => pgsearch.regex(
field => 'description',
PATTERN => '(.*screen.*)'
)
);
Parameters
field: the search field. If you leave this parameter empty, all index fields are searched.
pattern: the regular expression pattern.
Others
Obtain BM25 scores
You can include the @@@ operator in the SELECT list of a statement and use the AS keyword to obtain BM25 scores.
By default, the ORDER BY clause sorts results in ascending order in AnalyticDB for PostgreSQL databases, whereas a higher BM25 score indicates a higher result ranking. Therefore, the negative values of BM25 scores are returned. For example, if a BM25 score is 2.86, the return value is -2.86. BM25 scores are calculated only for searches on the fields specified by the text_fields and json_fields parameters. For searches on the fields specified by the numeric_fields, datetime_fields, and boolean_fields parameters, BM25 scores cannot be calculated.
Syntax
SELECT *, mock_items @@@ pgsearch.config(<query>) as BM25
FROM mock_items
ORDER BY BM25;
Example
-- 1. Perform full-text search only on a string field. A BM25 score is returned.
SELECT description, rating, description @@@ pgsearch.config('description:socks') AS bm25 FROM mock_items ORDER BY bm25 limit 1;
-- 2. Perform full-text search on a string field and filter data on a numeric field. A BM25 score is returned.
SELECT description, rating, description @@@ pgsearch.config('description:socks AND rating:4') AS bm25 FROM mock_items ORDER BY bm25 limit 4;
-- 3. Perform full-text search on a JSON field. A BM25 score is returned.
SELECT metadata, metadata @@@ pgsearch.config('metadata.color:White') AS bm25 FROM mock_items ORDER BY bm25 LIMIT 1;
-- 4. Perform range search on a numeric field. BM25 scores cannot be calculated. The default value -1 is returned.
SELECT description, rating, rating @@@ pgsearch.config('rating:[4 TO 5]') AS bm25 FROM mock_items ORDER BY bm25 LIMIT 1;
Query result
-- 1. Perform full-text search only on a string field. A BM25 score is returned.
description | rating | bm25
---------------+--------+------------
Generic socks | 4 | -2.1048825
(1 row)
-- 2. Perform full-text search on a string field and filter data on a numeric field. A BM25 score is returned.
description | rating | bm25
---------------+--------+------------
Generic socks | 4 | -3.1081846
(1 row)
-- 3. Perform full-text search on a JSON field. A BM25 score is returned.
metadata | bm25
-----------------------------------------+-----------
{"color": "White", "location": "China"} | -3.453373
(1 row)
-- 4. Perform range search on a numeric field. BM25 scores cannot be calculated. The default value -1 is returned.
description | rating | bm25
------------------+--------+------
Plastic Keyboard | 4 | -1
(1 row)