Benefits of uploading behavioral data to OpenSearch
Based on the user feedback on search results, R&D engineers can evaluate search effects and optimize user experience.
Users can be provided with intuitive reports that show search effects.
Upload behavioral data
Note: We recommend that you use SDKs to manually upload behavioral data. The following table describes the fields of behavioral data.
To upload behavioral data by using SDKs, you must specify the following fields: user_id, biz_id, trace_id, rn, item_type, bhv_type, bhv_time, and reach_time.
ID | Field | Type | Description | Value | Required |
1 | app_version | STRING | The version number of the website or mobile app on which the behavior occurs. | | No |
2 | sdk_type | STRING | The type of the SDK that is used to upload behavioral data. OpenSearch uses this field to distinguish whether behavioral data is uploaded or collected by using a server SDK or mobile SDK. | | No. If you upload behavioral data by using OpenSearch SDKs, this field is set to opensearch_sdk by default. |
3 | sdk_version | STRING | The version number of the SDK that is used to upload behavioral data. | | No. If you upload behavioral data by using OpenSearch SDKs, this field is set by default. |
4 | login | STRING | Specifies whether the user has logged on to the website or mobile app on which the behavior occurs. | Valid values: 0 and 1. A value of 0 indicates that the user has not logged on. A value of 1 indicates that the user has logged on. | No |
5 | user_id | STRING | The unique ID of the user. | | No. However, you must specify at least one of the imei and user_id fields. |
6 | imei | STRING | The ID of the user device. The value can be an International Mobile Equipment Identity (IMEI), a device ID, or an Identifier for Advertisers (IDFA). | | No. However, you must specify at least one of the imei and user_id fields. |
7 | biz_id | BIGINT | A numeric ID that is used to distinguish between different search services. Generally, a biz_id value matches an OpenSearch application. For example, you can use an application ID as the biz_id value. | | Yes |
8 | trace_id | STRING | The provider of the search service from which the document is searched and collected. | If the document is searched and collected from OpenSearch, set this field to Alibaba. If the document is searched and collected from another service provider, specify this field based on your business requirements. | Yes |
9 | trace_info | STRING | The value of this field is the value of the ops_request_misc parameter that OpenSearch returns in the search results. Pass in the value of the ops_request_misc parameter as it is. | | No. Note: You must pass in this field if the trace_id field is set to Alibaba. This field is used to check whether the search results are provided by OpenSearch. |
10 | rn | STRING | The page view (PV). The value of this field is the value of the request_id parameter that OpenSearch returns in the search results. Pass in the value of the request_id parameter as it is. | | Yes |
11 | item_id | STRING | The data subscript of the occurrence of the behavior. | | No. By default, this field is set to 0. |
12 | item_type | STRING | The type of the behavioral data. | | Yes |
13 | bhv_type | STRING | The type of the behavior, such as expose, dwell, browse, add to favorites, or download. | | Yes |
14 | bhv_value | STRING | The description of the behavior in JSON format. | For more information, see the "Common behavior values" section of this topic. | No |
15 | bhv_time | STRING | The time when the behavior occurs. The value is a UNIX timestamp that is accurate to the second. | | Yes |
16 | bhv_detail | STRING | The detailed description of the behavior. | The format of this field is key=value{,key=value}. One or more key=value pairs can be used. | No |
17 | ip | STRING | The IP address of the mobile phone or terminal device on which the behavior occurs. | | No. However, we recommend that you specify this field. |
18 | longitude | STRING | The longitude of the location at which the behavior occurs. | | No. However, we recommend that you specify this field. |
19 | latitude | STRING | The latitude of the location at which the behavior occurs. | | No. However, we recommend that you specify this field. |
20 | session_id | STRING | The ID of the user session. | | No. However, we recommend that you specify this field. |
21 | spm | STRING | The page module at which the behavior occurs. | The encoding format of this field is a.b.c.d, which indicates the site ID, page ID, module ID, and location ID. | No |
22 | report_src | STRING | The method that is used to upload behavioral data. | Valid values: 1, 2, and 3. A value of 1 indicates that behavioral data is uploaded by calling OpenSearch SDKs. A value of 2 indicates that behavioral data is collected by calling mobile SDKs. A value of 3 indicates that behavioral data is uploaded by calling OpenSearch API operations. | No |
23 | mac | STRING | The media access control (MAC) address of the mobile phone or terminal device on which the behavior occurs. | | No |
24 | brand | STRING | The brand of the mobile phone or terminal device on which the behavior occurs. | | No. However, we recommend that you specify this field. |
25 | device_model | STRING | The model of the mobile phone or terminal device on which the behavior occurs. | | No |
26 | resolution | STRING | The screen resolution of the mobile phone or terminal device on which the behavior occurs. | | No |
27 | carrier | STRING | The carrier of the mobile phone or terminal device on which the behavior occurs. | | No |
28 | access | STRING | The network connected to the mobile phone or terminal device on which the behavior occurs. | | No |
29 | access_subtype | STRING | The type of the network connected to the mobile phone or terminal device on which the behavior occurs. | | No |
30 | os | STRING | The operating system of the mobile phone or terminal device on which the behavior occurs. | | No |
31 | os_version | STRING | The version of the operating system of the mobile phone or terminal device on which the behavior occurs. | | No |
32 | language | STRING | The language that is configured for the mobile phone or terminal device on which the behavior occurs. | | No |
33 | phone_md5 | STRING | The MD5 hash value of the mobile number. | | No |
34 | reserve1 | STRING | A reserved field. | | No |
35 | reserve2 | STRING | A reserved field. | | No |
36 | reach_time | BIGINT | The time when the data is received by the server. The value is a UNIX timestamp that is accurate to the second. | | Yes. If you upload behavioral data by using OpenSearch SDKs, the value of this field is automatically set. If you upload behavioral data by calling OpenSearch API operations, you must manually specify this field. |
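The required-field rules in the table can be checked on the client before a record is uploaded. The following Python sketch is an illustration only and is not part of any OpenSearch SDK: the helper name missing_fields and the sample record are hypothetical. It follows the table rather than the shorter list in the note, so it accepts either user_id or imei, and it treats trace_info as required when trace_id is set to Alibaba.

```python
from typing import Any, Dict, List

# Fields marked "Yes" in the Required column of the table above.
REQUIRED_FIELDS = ("biz_id", "trace_id", "rn", "item_type",
                   "bhv_type", "bhv_time", "reach_time")


def missing_fields(record: Dict[str, Any]) -> List[str]:
    """Return the names of required fields that are absent or empty.

    Hypothetical helper for illustration; not an OpenSearch SDK call.
    """
    def is_empty(name: str) -> bool:
        return record.get(name) in (None, "")

    missing = [name for name in REQUIRED_FIELDS if is_empty(name)]
    # At least one of user_id and imei must be specified.
    if is_empty("user_id") and is_empty("imei"):
        missing.append("user_id or imei")
    # trace_info must be passed in when trace_id is set to Alibaba.
    if record.get("trace_id") == "Alibaba" and is_empty("trace_info"):
        missing.append("trace_info")
    return missing


# Sample record with illustrative values.
record = {
    "user_id": "1120021255",
    "biz_id": 1365378,
    "trace_id": "Alibaba",
    "rn": "170107366216796189819166",
    "item_type": "goods",
    "bhv_type": "click",
    "bhv_time": "1701074578",
}
print(missing_fields(record))
```

For this sample the helper reports reach_time and trace_info. When you upload through OpenSearch SDKs, reach_time is set automatically, so a check like this matters mainly for uploads that call API operations directly.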
Common behavior values
bhv_value format
{
    "code": [2.3, 2.5],  # The types of the issues. The following table describes the code values.
    "expect": "expected response"
}
ID | code | Description |
1 | 1.1 | Question understanding - No search results or reference links are available for the answer. |
2 | 2.1 | Answer quality - Fabricated content exists in the answer. |
3 | 2.2 | Answer quality - Conceptual confusion exists in the answer. |
4 | 2.3 | Answer quality - Duplicated content exists in the answer. |
5 | 2.4 | Answer quality - Irrelevant content exists in the answer. |
6 | 2.5 | Answer quality - The answer is not comprehensive. |
7 | 2.6 | Answer quality - The answer is ambiguous in expression. |
8 | 3.1 | Harmful information - Sensitive content exists in the answer. |
9 | 3.2 | Harmful information - Discriminatory content exists in the answer. |
10 | 3.3 | Harmful information - Harmful content exists in the answer. |
Other issues |
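A bhv_value string in the format above can be produced with a JSON serializer instead of hand-written text. The Python sketch below is an assumption-laden illustration: the function name build_bhv_value is hypothetical, the descriptions are abbreviated from the table, and the code for "Other issues" is left out because the table does not show it.

```python
import json
from typing import List

# Codes listed in the table above, with abbreviated descriptions.
# The code for "Other issues" is not shown in the table, so it is omitted here.
ISSUE_CODES = {
    1.1: "Question understanding: no search results or reference links",
    2.1: "Answer quality: fabricated content",
    2.2: "Answer quality: conceptual confusion",
    2.3: "Answer quality: duplicated content",
    2.4: "Answer quality: irrelevant content",
    2.5: "Answer quality: not comprehensive",
    2.6: "Answer quality: ambiguous expression",
    3.1: "Harmful information: sensitive content",
    3.2: "Harmful information: discriminatory content",
    3.3: "Harmful information: harmful content",
}


def build_bhv_value(codes: List[float], expect: str) -> str:
    """Serialize issue codes and the expected response as a bhv_value string.

    Hypothetical helper for illustration; not an OpenSearch SDK call.
    """
    unknown = [code for code in codes if code not in ISSUE_CODES]
    if unknown:
        raise ValueError(f"codes not listed in the table: {unknown}")
    return json.dumps({"code": codes, "expect": expect})


bhv_value = build_bhv_value([2.3, 2.5], "expected response")
print(bhv_value)
```

Rejecting unlisted codes is a design choice of this sketch. Relax the check if you need to report the "Other issues" category.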
Push collected data
Parameters
Parameter | Type | Description |
$docJson | string | The list of documents in JSON format. |
$searchAppName | string | The name of the associated OpenSearch application. |
$dataCollectionName | string | The name of the data collection, which is returned by the OpenSearch console when the feature of collecting behavioral data is enabled. |
$dataCollectionType | string | The data collection type. Set the value to BEHAVIOR. |
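The $docJson parameter is a JSON list in which each document carries a cmd and a fields object, as the demo code in this topic shows. The Python sketch below builds such a string from plain dictionaries. The function name build_doc_json is hypothetical, and the field values are illustrative.

```python
import json
from typing import Any, Dict, List


def build_doc_json(records: List[Dict[str, Any]], cmd: str = "ADD") -> str:
    """Wrap each behavior record in a cmd/fields document and serialize the list.

    Hypothetical helper for illustration; not an OpenSearch SDK call.
    """
    documents = [{"cmd": cmd, "fields": fields} for fields in records]
    return json.dumps(documents)


doc_json = build_doc_json([
    {
        "user_id": "1120021255",
        "biz_id": 1365378,
        "rn": "170107366216796189819166",
        "trace_id": "Alibaba",
        "item_id": "id",
        "item_type": "goods",
        "bhv_type": "click",
        "bhv_time": "1701074578",
    }
])
print(doc_json)
```

Serializing with a JSON library avoids the quoting mistakes that are easy to make when the string is escaped by hand.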
Dependencies
<dependency>
    <groupId>com.aliyun.opensearch</groupId>
    <artifactId>aliyun-sdk-opensearch</artifactId>
    <version>4.0.0</version>
</dependency>
pip install alibabacloud_tea_util
pip install alibabacloud_opensearch_util
pip install alibabacloud_credentials
Demo code for pushing data
import com.aliyun.opensearch.DataCollectionClient;
import com.aliyun.opensearch.OpenSearchClient;
import com.aliyun.opensearch.sdk.generated.OpenSearch;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchResult;

public class PushLLMBehavioralData {
    private static String accesskey = "your ak";
    private static String secret = "your secret";
    private static String host = "your host";
    private static String searchAppName = "app_name";
    private static String dataCollectionName = "app_name";
    private static String dataCollectionType = "DATA_COLLECTION_TYPE_BEHAVIOR";

    public static void main(String[] args) throws Exception {
        // Create an OpenSearch object.
        OpenSearch opensearch = new OpenSearch(accesskey, secret, host);
        // Use the OpenSearch object as a parameter to create an OpenSearchClient object.
        OpenSearchClient client = new OpenSearchClient(opensearch);
        // Use the OpenSearchClient object as a parameter to create a DataCollectionClient object.
        DataCollectionClient dataCollectionClient = new DataCollectionClient(client);
        // Push documents. Each document is a JSON object that contains a cmd and a fields object.
        String docJson = "[{\"cmd\": \"ADD\",\"fields\": {\"user_id\": \"1120021255\",\"biz_id\": 1365378,\"rn\": \"170107366216796189819166\",\"trace_id\": \"Alibaba\",\"item_id\": \"id\",\"item_type\": \"goods\",\"bhv_type\": \"click\",\"bhv_time\": \"1701074578\"}}]";
        try {
            OpenSearchResult openSearchResult = dataCollectionClient.push(docJson,
                    searchAppName, dataCollectionName,
                    dataCollectionType);
            System.out.println(openSearchResult);
        } catch (Exception e) {
            // Print the error. No test framework is required to run this demo.
            e.printStackTrace();
        }
    }
}
# -*- coding: utf-8 -*-
import os
import time
from typing import Dict, Any
from Tea.exceptions import TeaException
from Tea.request import TeaRequest
from alibabacloud_tea_util import models as util_models
from BaseRequest import Config, Client


class opensearch:
    def __init__(self, config: Config):
        self.Clients = Client(config=config)
        self.runtime = util_models.RuntimeOptions(
            connect_timeout=10000,
            read_timeout=10000,
            autoretry=False,
            ignore_ssl=False,
            max_idle_conns=50,
            max_attempts=3
        )
        self.header = {}

    def behaviorBulk(self, app_name: str, collections_name: str, doc_content: list) -> Dict[str, Any]:
        try:
            # Use collections_name for the data collection segment of the path.
            response = self.Clients._request(
                method="POST",
                pathname=f'/v3/openapi/app-groups/{app_name}/data-collections/{collections_name}/data-collection-type/DATA_COLLECTION_TYPE_BEHAVIOR/actions/bulk',
                query={}, headers=self.header,
                body=doc_content, runtime=self.runtime)
            return response
        except Exception as e:
            print(e)


if __name__ == "__main__":
    # Specify the endpoint of the OpenSearch API.
    endpoint = "<endpoint>"
    # Specify the request protocol. Valid values: HTTPS and HTTP.
    endpoint_protocol = "HTTP"
    # Specify your AccessKey pair.
    # Obtain the AccessKey ID and AccessKey secret from environment variables.
    # You must configure environment variables before you run this code.
    access_key_id = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_ID")
    access_key_secret = os.environ.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET")
    # Specify the authentication method. Default value: access_key. A value of sts specifies authentication based on RAM and STS.
    # Valid values: sts and access_key.
    auth_type = "sts"
    # If you use authentication based on RAM and STS, you must specify the security_token parameter. You can call the AssumeRole operation of Alibaba Cloud RAM to obtain an STS token.
    security_token = "<security_token>"
    # Specify common request parameters.
    Configs = Config(endpoint=endpoint, access_key_id=access_key_id, access_key_secret=access_key_secret,
                     security_token=security_token, type=auth_type, protocol=endpoint_protocol)
    # Create an OpenSearch instance.
    ops = opensearch(Configs)
    app_name = "app_name"
    # --------------- Push behavior logs ---------------
    # item_id: the ID of the primary key returned in search results.
    item_id = "358713"
    # ops_request_misc: the request miscellaneous information returned in search results.
    ops_request_misc = "%7B%22request%5Fid%22%3A%22161777635816780357273903%22%2C%22scm%22%3A%2220140713.130149759..%22%7D"
    # bhv_type: the event type of the behavioral data. Valid values:
    # expose: exposes a commodity.
    # cart: adds a commodity to the shopping cart.
    # collect: adds a commodity to favorites.
    # comment: posts a comment on a commodity.
    # buy: purchases a commodity.
    # like: likes a commodity.
    # dislike: dislikes a commodity.
    bhv_type = "like"
    # request_id: the request ID returned in search results.
    request_id = "161777635816780357273903"
    # The time when the data is received by the server. The value is a UNIX timestamp that is accurate to the second.
    reach_time = "1709708439"
    # user_id: the unique ID of the user who sent the request.
    # * In most cases, the ID is that of the user that has logged on.
    # * If the user sent the request from a PC client and has not logged on, the ID is the cookie ID.
    user_id = "a7a0d37c824b659f36a5b9e3b819fcdd"
    behavior_fields1 = behavior_fields2 = {
        "item_id": item_id,
        "sdk_type": "opensearch_sdk",
        "sdk_version": "<sdk_version>",  # The version number of the current OpenSearch SDK for Python, which is 3.2.0.
        "trace_id": "ALIBABA",  # The service provider.
        "trace_info": ops_request_misc,
        "bhv_type": bhv_type,
        "item_type": "item",
        "rn": request_id,
        "biz_id": "<biz_id>",  # The numerical ID that is used by mobile applications or application clients to differentiate business. This parameter can be associated with OpenSearch applications and Artificial Intelligence Recommendation (AIRec) instances.
        "reach_time": reach_time,
        "user_id": user_id,
    }
    behavior_documents = [{"cmd": "add", "fields": behavior_fields1}, {"cmd": "add", "fields": behavior_fields2}]
    res6 = ops.behaviorBulk(app_name=app_name, collections_name=app_name, doc_content=behavior_documents)
    print(res6)
Demo code for committing data
package com.aliyun.opensearch.demo;

import com.aliyun.opensearch.DataCollectionClient;
import com.aliyun.opensearch.OpenSearchClient;
import com.aliyun.opensearch.sdk.generated.OpenSearch;
import com.aliyun.opensearch.sdk.generated.commons.OpenSearchResult;
import java.util.HashMap;
import java.util.Map;

public class PushDataCollectionDoc {
    private static String accesskey = "your ak";
    private static String secret = "your secret";
    private static String host = "your host";
    private static String searchAppName = "opensearch_app_name";
    private static String dataCollectionName = "opensearch_app_name";
    private static String dataCollectionType = "DATA_COLLECTION_TYPE_BEHAVIOR";

    public static void main(String[] args) {
        // Create an OpenSearch object.
        OpenSearch opensearch = new OpenSearch(accesskey, secret, host);
        // Use the OpenSearch object as a parameter to create an OpenSearchClient object.
        OpenSearchClient client = new OpenSearchClient(opensearch);
        // Use the OpenSearchClient object as a parameter to create a DataCollectionClient object.
        DataCollectionClient dataCollectionClient = new DataCollectionClient(client);
        Map<String, Object> fields = new HashMap<String, Object>();
        // The unique user ID.
        fields.put("user_id", "1120021255");
        // The numeric ID used to distinguish between different search services. A numeric ID matches an OpenSearch application.
        fields.put("biz_id", 1365378);
        // The value of this field is the value of the request_id parameter that OpenSearch returns in the search results. Pass in the value of the request_id parameter as it is.
        fields.put("rn", "1564455556323223680397827");
        // If the document is searched and collected from OpenSearch, set this field to Alibaba.
        fields.put("trace_id", "Alibaba");
        // The value of this field is the primary key value of the primary table in the OpenSearch application.
        fields.put("item_id", "2223");
        // The type of the item.
        fields.put("item_type", "goods");
        // The type of the behavior. In this example, the user likes the item.
        fields.put("bhv_type", "like");
        // The time at which the behavior occurs. The value is a UNIX timestamp that is accurate to the second.
        fields.put("bhv_time", "1566475047");
        // Add a document.
        // The document is added to the SDK client buffer but is not committed to the server. The document is committed to the server only when you call the commit() method.
        // You can call the add() method multiple times to add multiple documents, and then call the commit() method to commit all of them to the server in one request.
        dataCollectionClient.add(fields);
        try {
            OpenSearchResult openSearchResult = dataCollectionClient.commit(searchAppName, dataCollectionName, dataCollectionType);
            System.out.println(openSearchResult);
        } catch (Exception e) {
            e.printStackTrace();
            return;
        }
    }
}
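The add-then-commit pattern described in the comments above can be sketched in a few lines of Python. This is not the SDK's implementation: the class name BufferedCollector and its send callback are hypothetical stand-ins that show only how documents accumulate in a client-side buffer and leave in a single batch.

```python
import json
from typing import Any, Callable, Dict, List


class BufferedCollector:
    """Hypothetical stand-in that mimics the add()/commit() pattern."""

    def __init__(self, send: Callable[[str], Any]) -> None:
        self._send = send  # Called once per commit with the JSON payload.
        self._buffer: List[Dict[str, Any]] = []

    def add(self, fields: Dict[str, Any]) -> None:
        # Buffer a copy so that later changes to fields do not alter the queued document.
        self._buffer.append({"cmd": "ADD", "fields": dict(fields)})

    def commit(self) -> Any:
        # Serialize every buffered document into one payload and hand it to send.
        payload = json.dumps(self._buffer)
        result = self._send(payload)
        # The buffer is cleared only after send returns without raising.
        self._buffer.clear()
        return result


sent: List[str] = []
collector = BufferedCollector(send=sent.append)
collector.add({"user_id": "1120021255", "bhv_type": "like"})
collector.add({"user_id": "1120021255", "bhv_type": "cart"})
collector.commit()
print(len(sent), len(json.loads(sent[0])))
```

In this sketch a failed send leaves the buffer intact so that the batch can be retried. Whether the real SDK behaves the same way is not stated in this topic.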