tag_match

Last Updated: Sep 09, 2021

Functions that can be used as both feature functions and functionality functions

Specific functions can be used as both feature functions and functionality functions. You can use such functions in filter clauses, sort clauses, and expressions.

The fields that you reference in parameters of such functions must be configured as index or attribute fields based on the description of each function.

tag_match: matches query clauses with documents based on tags and scores the documents by calculating the weights of matched tags

1.Scenario overview

You can use the tag_match function in most scenarios that require personalized search based on matching a query against documents. For example, you can list the shops that a user has liked first, or recommend news categories, such as sports and entertainment, that a user is likely to be interested in. To use the tag_match function, you store a set of key-value pairs as an array in a document field and use the kvpairs clause to define the key-value pairs in a query clause. The tag_match function matches the keys in the document with the keys in the query clause, calculates a score for each pair of matched keys, and then combines these scores into the final score of the document. The final score can be used to sort documents by weight or to filter documents.

The following figure shows how the tag_match function calculates the final score.

2.Syntax

Common syntax:

tag_match(query_key, doc_field, kv_op, merge_op)

Advanced syntax:

tag_match(query_key, doc_field, kv_op, merge_op, has_default, doc_kv, max_kv_count)

3.Parameters

  • query_key: the field that defines the key-value pairs in a query clause. You must pass the field by using a kvpairs clause. In each key-value pair, the key and value are separated by an equal sign (=). Multiple key-value pairs are separated by colons (:). Example: kvpairs=query_tags:10=0.67:960=0.85:1=48, where the keys are 10, 960, and 1 and their values are 0.67, 0.85, and 48. You can also pass only keys by using the kvpairs clause. Example: kvpairs=cats:10:960:1.

  • doc_field: the name of the document field that stores the key-value pairs. The field must be of the INT_ARRAY, FLOAT_ARRAY, or DOUBLE_ARRAY type. If the field is of the FLOAT_ARRAY type, the keys are converted to 64-bit integers for matching. Counting from 1, keys occupy the odd positions in the array and values occupy the even positions. Sample array: [key0 value0 key1 value1 …].

  • kv_op: the operation that is performed on the values when a key in the query clause matches a key in the document. You can set this parameter to max, min, sum, avg, mul, query_value, doc_value, or a constant. The query_value operation returns the value of the matched key in the query clause. The doc_value operation returns the value of the matched key in the document.

  • merge_op: the operation that is performed on the per-key scores. If multiple keys in the query clause match keys in the document, the kv_op operation calculates a score for each pair of matched keys, and the merge_op operation then combines these scores. You can set this parameter to max, min, sum, avg, or first_match. The first_match operation returns only the score that is calculated for the first pair of matched keys.

  • has_default: specifies whether to use an initial score. The default value is false. If this parameter is set to true, the first element of the array in the doc_field field is the initial score. Example of the array: [init_score k0 v0 k1 v1…]. The initial score can be considered a base score.

  • doc_kv: specifies whether the value of the field in the document consists of key-value pairs. The default value is true. If you set this parameter to false, the value of the field in the document consists of only keys. You can set this parameter to false in scenarios where you do not need to sort documents by weights.

  • max_kv_count: the maximum number of key-value pairs that can be passed from a query clause. The default value is 50. You can change the value to a number that is smaller than or equal to 5120.
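The parameters above can be illustrated with a small Python model. This is a minimal sketch for reasoning about the function, not the OpenSearch implementation. The names parse_query_kv, parse_doc_field, and match_score are hypothetical. The sketch also makes several assumptions that the description above does not spell out: avg is taken as the mean of the query value and the document value, first_match follows the order of keys in the query, keys are converted with Python's int(), and a document with no matched keys scores 0. How the initial score is combined with the merged score is not modeled.

```python
from typing import Dict, Optional, Sequence, Tuple, Union

Number = Union[int, float]


def parse_query_kv(value: str) -> Dict[int, Optional[float]]:
    """Parse the text after the field name in a kvpairs entry,
    e.g. '10=0.67:960=0.85:1=48' or the key-only form '10:960:1'."""
    pairs: Dict[int, Optional[float]] = {}
    for item in value.split(":"):
        key, sep, val = item.partition("=")
        pairs[int(key)] = float(val) if sep else None
    return pairs


def parse_doc_field(values: Sequence[Number], has_default: bool = False,
                    doc_kv: bool = True) -> Tuple[Optional[float], Dict[int, Optional[float]]]:
    """Split a document array into (initial score, {key: value})."""
    items = list(values)
    init_score = float(items.pop(0)) if has_default else None
    if not doc_kv:
        # The array holds keys only.
        return init_score, {int(k): None for k in items}
    # Keys sit at the 1st, 3rd, ... positions and values at the 2nd, 4th, ...
    return init_score, {int(k): float(v) for k, v in zip(items[0::2], items[1::2])}


# Assumed semantics of the named kv_op operations.
KV_OPS = {
    "max": max,
    "min": min,
    "sum": lambda q, d: q + d,
    "avg": lambda q, d: (q + d) / 2,
    "mul": lambda q, d: q * d,
    "query_value": lambda q, d: q,
    "doc_value": lambda q, d: d,
}

# Assumed semantics of the merge_op operations.
MERGE_OPS = {
    "max": max,
    "min": min,
    "sum": sum,
    "avg": lambda scores: sum(scores) / len(scores),
    "first_match": lambda scores: scores[0],
}


def match_score(query: Dict[int, Optional[float]], doc: Dict[int, Optional[float]],
                kv_op: Union[str, Number], merge_op: str) -> float:
    """Score one document: apply kv_op per matched key, then merge_op."""
    if isinstance(kv_op, str):
        combine = KV_OPS[kv_op]
    else:
        # A constant kv_op gives every matched key the same score.
        combine = lambda q, d: float(kv_op)
    scores = [combine(q_val, doc[key]) for key, q_val in query.items() if key in doc]
    return float(MERGE_OPS[merge_op](scores)) if scores else 0.0
```

In this model, key-only data (a key-only kvpairs value or doc_kv set to false) only works with a constant kv_op, because the named operations need a value on both sides.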

4.Return value

The function returns a value of the DOUBLE type that indicates the final score of a document. If no keys are matched and the has_default parameter is set to false or is not set, 0 is returned. To return a 64-bit integer instead, use the int_tag_match function. Apart from the type of the return value, the int_tag_match function is used in the same way as the tag_match function. The int_tag_match function cannot be used in sort expressions.

5.Scenarios

Scenario 1: Posts on a large, general-purpose forum carry tags such as funny, sports, news, music, and science. When you push documents to OpenSearch, you can assign an ID to each tag. For example, the IDs of the funny, sports, news, and music tags are 1, 5, 3, and 6. You can then use the tag field to store the tags. You can also obtain the weight of each tag for each post through preprocessing. For example, a post has the funny, sports, and news tags with weights of 0.5, 0.5, and 0.1. In this case, the value of the tag field is [1 0.5 5 0.5 3 0.1]. By analyzing the searches of forum members over a long period, you can determine the favorite post tags of each member.

For example, the nba_fans member is interested in sports and funny content and the weights of the sports and funny tags are 0.6 and 0.3. Then, you can use the kvpairs clause to define the tag-weight pairs as key-value pairs and pass the key-value pairs to the query clause when the member searches for posts. If the field name defined in the kvpairs clause is user_tag, the value of the user_tag field for the nba_fans member is 5=0.6:1=0.3. This way, if you use the tag_match(user_tag, tag, mul, sum) function in a fine sort expression, your search service can calculate the weights of posts in which the member is interested and list the posts with high weights first.

For example, when the nba_fans member searches for the preceding post, both the funny and sports tags can be matched. You can set the kv_op parameter to mul to obtain the product of the value of each key in the query clause and that of each matched key in the document. In this example, the score of the sports tag is calculated by using the following formula: 0.5 × 0.6 = 0.3. The score of the funny tag is calculated by using the following formula: 0.5 × 0.3 = 0.15. You can set the merge_op parameter to sum the scores of two tags. The sum of two scores is calculated by using the following formula: 0.3 + 0.15 = 0.45. Then, the sum is added to the final sorting score. This way, you can sort the posts in which the member is interested by calculating weights.
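The arithmetic in this scenario can be reproduced with a short Python sketch. The helper name score_mul_sum and the dictionary representation are illustrative only and are not part of OpenSearch. The sketch assumes that mul multiplies the query value by the document value for each matched key and that sum adds the per-key scores.

```python
def score_mul_sum(query_tags: dict, doc_array: list) -> float:
    """Hypothetical helper modeling kv_op = mul and merge_op = sum."""
    # Document array layout: [key0, value0, key1, value1, ...]
    doc_tags = {int(k): v for k, v in zip(doc_array[0::2], doc_array[1::2])}
    # One score per key that appears in both the query and the document.
    per_key = [q_val * doc_tags[key]
               for key, q_val in query_tags.items() if key in doc_tags]
    return sum(per_key)


post_tag = [1, 0.5, 5, 0.5, 3, 0.1]   # funny=0.5, sports=0.5, news=0.1
nba_fans = {5: 0.6, 1: 0.3}           # user_tag: 5=0.6:1=0.3
score = score_mul_sum(nba_fans, post_tag)
print(f"{score:.2f}")  # 0.45
```

The news tag (ID 3) contributes nothing because the query carries no key 3. Only keys present on both sides produce a score.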

Scenario 2:

Goods can have multiple attribute tags. For example, 1 indicates young (age), 2 indicates middle-aged (age), 3 indicates fresh (style), 4 indicates fashion (style), 5 indicates women (gender), and 6 indicates men (gender).

You may want to match tags without calculating tag weights for sorting. In this case, you can use the options field to store tags. If an item of clothing has the young, fashion, and women tags, the value of the options field is [1 4 5]. The field value consists of only keys. Users also have attribute tags that are similar to the attribute tags of goods. For example, a young female user has purchased fresh-style clothes in past transactions. In this case, you can pass user_options:1:3:5 in the kvpairs clause when this user searches for clothes. Note that the field that is defined by the kvpairs clause consists of only keys.

If you want to sort goods that have the favorite tags of users by calculating weights, you can use the tag_match(user_options, options, 10, sum, false, false) function in a sort expression. In the preceding function, user_options is the field that stores the tags in a query clause and options is the field that stores the tags in a document. The value 10 of the kv_op parameter indicates that 10 is the score for each pair of matched keys. The value false of the has_default parameter indicates that the initial score is not used. The value false of the doc_kv parameter indicates that the value of the field in the document consists of only keys.

When the preceding young female user searches for the preceding clothes, both the women and young tags are matched and each matched tag scores 10. After the sum operation specified by the merge_op parameter is performed on the two scores, the final score of the item is 20. This way, you can sort documents by weight even if the tags carry no weight information.
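The key-only case can be sketched the same way. The helper score_const_sum is hypothetical. It assumes that a constant kv_op assigns the same score to every matched key and that sum adds these scores.

```python
def score_const_sum(query_keys, doc_keys, constant=10):
    """Hypothetical helper modeling a constant kv_op with merge_op = sum
    when both the query and the document hold keys only."""
    doc_set = {int(k) for k in doc_keys}
    return sum(constant for k in query_keys if int(k) in doc_set)


clothes_options = [1, 4, 5]   # young, fashion, women
user_options = [1, 3, 5]      # young, fresh, women
print(score_const_sum(user_options, clothes_options))  # 20
```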

Usage notes

  • The field that you reference in the parameter of the function must be configured as an attribute field.

  • If the tag_match function is used in a filter clause or a sort clause, the query_key, kv_op, merge_op, has_default, and doc_kv parameters must be enclosed in double quotation marks ("). Example: sort=-tag_match("user_options", options, "mul", "sum", "false", "true", 100).

  • The tag_match function matches keys of an integer type. Therefore, the keys in the query clause and those in the document must be integers. If the keys are floating-point numbers, the tag_match function forcibly converts them to integers.

Example

Your documents use the following 10 tags:

1: finance and economics
2: technology
3: sports
4: entertainment
5: fashion
6: education
7: traveling
8: games
9: science
10: medical

Case 1: Sort titles with the same keyword but different tags

When you search for "chicken dinner", two documents are retrieved, as shown in the preceding figure. The two documents have different tags. The first document has tag ID 1, which indicates finance and economics. The second document has tag ID 8, which indicates games. If you want the document with the games tag to be listed first, you can use the tag_match function. The following examples show how the tag_match function is used in a sort expression and a sort clause:

kvpairs clause: type:8
Sort expression: tag_match(type, type_arr, 10, max, false, false)
Sort clause: tag_match("type", type_arr, 10, "max", "false", "false")

The following figure shows the search results obtained by using the sort expression.

The following figure shows the search results obtained by using the sort clause.

Case 2: Sort titles by calculating the final score based on the weights of multiple tags

If the first-level tags are the same, as shown in the preceding figure, you need to calculate the scores of the second-level tags. The following examples show how the tag_match function is used in a sort expression and a sort clause:

kvpairs clause: type:3=2:10=1
Sort expression: tag_match(type, type_arr, 10, sum, false, true)
Sort clause: tag_match("type", type_arr, 10, "sum", "false", "true")

Case 3: Sort titles with the same tags that have different weights

The documents that are framed in red in the preceding figure have the same tags but the tags have different weights. The following examples show how the tag_match function is used in a sort expression and a sort clause:

kvpairs clause: type:3=2:9=2
Sort expression: tag_match(type, type_arr, sum, sum, false, true)
Sort clause: tag_match("type", type_arr, "sum", "sum", "false", "true")
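To see why documents with the same tags but different weights end up in different positions, the following Python sketch scores two documents with kv_op and merge_op both set to sum. The document names and their type_arr values are hypothetical sample data, not taken from the figure, and the helper score_sum_sum is illustrative only. The sketch assumes that sum as kv_op adds the query value to the document value for each matched key, and that higher scores rank first.

```python
def score_sum_sum(query_tags: dict, doc_array: list) -> float:
    """Hypothetical helper modeling kv_op = sum and merge_op = sum."""
    # Document array layout: [key0, value0, key1, value1, ...]
    doc_tags = {int(k): v for k, v in zip(doc_array[0::2], doc_array[1::2])}
    return sum(q_val + doc_tags[key]
               for key, q_val in query_tags.items() if key in doc_tags)


query = {3: 2, 9: 2}                 # kvpairs clause: type:3=2:9=2
docs = {                             # hypothetical type_arr values
    "doc_a": [3, 1, 9, 1],           # sports=1, science=1
    "doc_b": [3, 5, 9, 4],           # sports=5, science=4
}
scores = {name: score_sum_sum(query, arr) for name, arr in docs.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores)   # {'doc_a': 6, 'doc_b': 13}
print(ranked)   # ['doc_b', 'doc_a']
```

Both documents match the same two keys, so the difference in rank comes entirely from the weights stored in the document arrays.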