
OpenSearch: Data collection V2.0

Last Updated: Jan 31, 2024

Benefits of uploading behavioral data to OpenSearch

  • Behavioral data shows how users react to search results, for example whether they browse, click, dwell on, like, share, add to favorites, or purchase the returned items. This helps you decide how to improve search quality.

  • The report statistics feature of OpenSearch lets you view search reports for your applications, such as reports on page views (PVs), item page views (IPVs), and click-through rate (CTR). You can use these reports to guide your business operations.

  • OpenSearch provides an algorithm platform on which you can use search behavior feedback to train search and ranking models, which helps improve search quality.

Usage notes

  • The data collection feature is automatically enabled after an application is created.

  • In this topic, data refers to feedback on how users react to search results.

  • Collection refers to uploading search behavioral data to OpenSearch by using OpenSearch SDKs. In the latest version, behavioral data can be collected only by using a server SDK. Collection by using a mobile SDK or web SDK is under development.

  • Compared with earlier data collection features, data collection V2.0 makes it easier to pass parameters and use the SDKs. If you are new to OpenSearch, use OpenSearch SDKs to upload behavioral data with the fields described in this topic. Note: SDK for Java 3.4.0 and SDK for PHP 3.2.0 support data collection V2.0.

Upload behavioral data

Note: After you enable behavioral data collection in the OpenSearch console, we recommend that you upload behavioral data by using SDKs. The fields used to upload behavioral data are described in the following section. Keep the following requirements in mind:

  1. To upload behavioral data by using SDKs, you must specify the following fields: imei or user_id, biz_id, trace_id, rn, bhv_type, bhv_time, item_id, and item_type.

  2. To upload behavioral data by calling API operations, you must specify the reach_time field in addition to the preceding fields.

  3. For demos that upload behavioral data by using SDKs or by calling API operations, see SDKs for data collection V2.0.
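You can check these requirements before you send a record. The same check can cover the conditional rules in the field table: trace_info is needed when trace_id is Alibaba, and reserve2 is needed when report_src is patch_data.

The following Java sketch is hypothetical:

  • The class BehaviorRecordCheck and its method missingFields are illustrative names. They are not part of the OpenSearch SDK.
  • The record is modeled as a plain Map<String, String>, not as an SDK type.
  • The sketch only reports missing fields. It does not upload anything.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical pre-upload check for one behavioral data record.
 * Not an OpenSearch SDK class: it only encodes the rules stated in this topic.
 */
public class BehaviorRecordCheck {

    // Fields every upload must specify (the imei/user_id alternative is checked separately).
    private static final List<String> REQUIRED = List.of(
            "biz_id", "trace_id", "rn", "bhv_type", "bhv_time", "item_id", "item_type");

    /** Returns the names of missing fields; an empty list means the record passes. */
    public static List<String> missingFields(Map<String, String> record, boolean viaApi) {
        List<String> missing = new ArrayList<>();
        for (String field : REQUIRED) {
            if (isMissing(record.get(field))) {
                missing.add(field);
            }
        }
        // Either imei or user_id must be present.
        if (isMissing(record.get("imei")) && isMissing(record.get("user_id"))) {
            missing.add("imei|user_id");
        }
        // API uploads must carry reach_time; the SDKs fill it in automatically.
        if (viaApi && isMissing(record.get("reach_time"))) {
            missing.add("reach_time");
        }
        // trace_info is required when the results came from OpenSearch (trace_id=Alibaba).
        if ("Alibaba".equals(record.get("trace_id")) && isMissing(record.get("trace_info"))) {
            missing.add("trace_info");
        }
        // patch_data uploads must put the raw query into reserve2.
        if ("patch_data".equals(record.get("report_src")) && isMissing(record.get("reserve2"))) {
            missing.add("reserve2");
        }
        return missing;
    }

    private static boolean isMissing(String value) {
        return value == null || value.isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("user_id", "u_1001");             // hypothetical user ID
        record.put("biz_id", "default");
        record.put("trace_id", "Alibaba");
        record.put("rn", "request-id-from-search");  // placeholder for the returned request_id
        record.put("bhv_type", "click");
        record.put("bhv_time", String.valueOf(System.currentTimeMillis() / 1000)); // seconds
        record.put("item_id", "12345");
        record.put("item_type", "goods");
        // trace_info (the ops_request_misc value) is deliberately left out.
        System.out.println(missingFields(record, true));
    }
}
```

Running main prints [reach_time, trace_info]. The sample record is treated as an API upload, so reach_time is required. It also omits trace_info even though trace_id is Alibaba.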

Description of behavioral data fields

| ID | Field | Type | Description | Valid values | Required |
|---|---|---|---|---|---|
| 1 | app_version | STRING | The version number of the website or mobile app that collects behavioral data. | | No |
| 2 | sdk_type | STRING | The type of SDK used to upload behavioral data. OpenSearch uses this field to determine whether the data was uploaded by a server SDK or collected by a mobile SDK. | | No. If you upload behavioral data by using OpenSearch SDKs, this field is set to opensearch_sdk by default. |
| 3 | sdk_version | STRING | The version number of the SDK used to upload behavioral data. | | No. If you upload behavioral data by using OpenSearch SDKs, this field is set by default. |
| 4 | login | STRING | Specifies whether the user is logged on to the website or mobile app that collects behavioral data. | 0: the user is not logged on. 1: the user is logged on. | No |
| 5 | user_id | STRING | The ID that uniquely identifies the user. | | No. However, you must specify either imei or user_id. |
| 6 | imei | STRING | The ID of the user's device, such as an IMEI, a device ID, or an IDFA. | | No. However, you must specify either imei or user_id. |
| 7 | biz_id | STRING | A numeric ID that distinguishes between search services. Generally, a biz_id represents an OpenSearch application. You can use multiple biz_id values to represent web, iOS, and Android applications, and later use them to divide traffic and run tests. | If you do not distinguish between search services, we recommend that you set this field to default. Otherwise, you can set it to pc, ios, or android based on your business requirements. | Yes |
| 8 | trace_id | STRING | The provider of the search service from which the document was retrieved. | If the document was retrieved from OpenSearch, set this field to Alibaba. If it was retrieved from another provider, set it based on your business requirements. | Yes |
| 9 | trace_info | STRING | The value of the ops_request_misc parameter that OpenSearch returns in the search results. Pass in the value as is. | | No. However, you must specify this field if trace_id is set to Alibaba. It is used to check whether the search results were provided by OpenSearch. |
| 10 | rn | STRING | Identifies a PV. The value is the request_id parameter that OpenSearch returns in the search results. Pass in the value as is. | | Yes |
| 11 | item_id | STRING | The primary key value of the document, that is, its primary key value in the primary table of the OpenSearch application. | | Yes |
| 12 | item_type | STRING | The business type of the document. | See the Description of the item_type field section of this topic. | Yes |
| 13 | bhv_type | STRING | The type of behavior, such as expose, dwell, browse, add to favorites, and download. | See the Common behavior types section of this topic. | Yes |
| 14 | bhv_value | STRING | The value that measures the behavior, such as the dwell time or the number of items purchased. | See the Common behavior types section of this topic. | No |
| 15 | bhv_time | STRING | The time when the behavior occurred, as a UNIX timestamp in seconds. | | Yes |
| 16 | bhv_detail | STRING | A detailed description of the behavior. | Format: key=value{,key=value}, that is, one or more comma-separated key=value pairs. | No |
| 17 | ip | STRING | The IP address of the mobile phone or terminal device on which the behavior occurred. | | No, but recommended. |
| 18 | longitude | STRING | The longitude of the location where the behavior occurred. | | No, but recommended. |
| 19 | latitude | STRING | The latitude of the location where the behavior occurred. | | No, but recommended. |
| 20 | session_id | STRING | The ID of the user session. | | No, but recommended. |
| 21 | spm | STRING | Tracks the page module in which the behavior occurred. | Format: a.b.c.d, where a, b, c, and d are the site ID, page ID, module ID, and location ID. | No |
| 22 | report_src | STRING | Identifies how the behavioral data was uploaded. | 1: uploaded by using OpenSearch SDKs. 2: collected by using mobile SDKs. 3: uploaded by calling OpenSearch API operations. patch_data: uploaded together with historical data or data from other sources. | No |
| 23 | mac | STRING | The media access control (MAC) address of the mobile phone or terminal device that collects behavioral data. | | No |
| 24 | brand | STRING | The brand of the mobile phone or terminal device that collects behavioral data. | | No, but recommended. |
| 25 | device_model | STRING | The model of the mobile phone or terminal device that collects behavioral data. | | No |
| 26 | resolution | STRING | The screen resolution of the mobile phone or terminal device that collects behavioral data. | | No |
| 27 | carrier | STRING | The carrier of the mobile phone or terminal device that collects behavioral data. | | No |
| 28 | access | STRING | The network to which the mobile phone or terminal device that collects behavioral data is connected. | | No |
| 29 | access_subtype | STRING | The type of network to which the mobile phone or terminal device that collects behavioral data is connected. | | No |
| 30 | os | STRING | The operating system of the mobile phone or terminal device that collects behavioral data. | | No |
| 31 | os_version | STRING | The operating system version of the mobile phone or terminal device that collects behavioral data. | | No |
| 32 | language | STRING | The language configured on the mobile phone or terminal device that collects behavioral data. | | No |
| 33 | phone_md5 | STRING | The MD5 hash of the mobile phone number. | | No |
| 34 | reserve1 | STRING | A reserved field. | | No |
| 35 | reserve2 | STRING | A reserved field. | If report_src is set to patch_data, you must set this field to the value of the raw_query field. | No |
| 36 | reach_time | BIGINT | The time when the server received the data, as a UNIX timestamp in seconds. | | Yes. If you upload behavioral data by using OpenSearch SDKs, the SDKs set this field automatically. If you call OpenSearch API operations, you must specify this field. |
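bhv_detail and spm are the two fields with a structured string format. The following Java sketch shows one way a client might build and parse these values.

It is a minimal sketch with these assumptions:

  • Keys and values in bhv_detail never contain commas, because this topic does not describe an escaping rule.
  • An spm value always has exactly four dot-separated parts.
  • BehaviorFieldFormats and its methods are hypothetical helpers, not OpenSearch SDK APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical helpers for the structured fields bhv_detail (key=value{,key=value})
 * and spm (a.b.c.d). Not part of any OpenSearch SDK.
 */
public class BehaviorFieldFormats {

    /** Joins key/value pairs into a bhv_detail string, for example "buy_price=12,price_unit=RMB". */
    public static String formatBhvDetail(Map<String, String> pairs) {
        StringJoiner joiner = new StringJoiner(",");
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            joiner.add(e.getKey() + "=" + e.getValue());
        }
        return joiner.toString();
    }

    /** Splits a bhv_detail string back into ordered key/value pairs. */
    public static Map<String, String> parseBhvDetail(String detail) {
        Map<String, String> pairs = new LinkedHashMap<>();
        if (detail == null || detail.isEmpty()) {
            return pairs; // bhv_detail is optional
        }
        for (String pair : detail.split(",")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) { // no '=' or an empty key
                throw new IllegalArgumentException("Not a key=value pair: " + pair);
            }
            pairs.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        return pairs;
    }

    /** Splits an spm value into its site, page, module, and location IDs. */
    public static Map<String, String> parseSpm(String spm) {
        String[] parts = spm.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("spm must have the form a.b.c.d: " + spm);
        }
        Map<String, String> ids = new LinkedHashMap<>();
        ids.put("site", parts[0]);
        ids.put("page", parts[1]);
        ids.put("module", parts[2]);
        ids.put("location", parts[3]);
        return ids;
    }

    public static void main(String[] args) {
        Map<String, String> detail = new LinkedHashMap<>();
        detail.put("buy_price", "12");
        detail.put("price_unit", "RMB");
        String encoded = formatBhvDetail(detail);
        System.out.println(encoded);                 // buy_price=12,price_unit=RMB
        System.out.println(parseBhvDetail(encoded)); // {buy_price=12, price_unit=RMB}
        System.out.println(parseSpm("a1.b2.c3.d4"));  // {site=a1, page=b2, module=c3, location=d4}
    }
}
```

LinkedHashMap is used so that the pairs keep the order in which they were added. The encoded string is therefore predictable.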

Description of the item_type field

| ID | item_type | Description |
|---|---|---|
| 1 | goods | Goods and commodities |
| 2 | article | Articles, blogs, and fiction |
| 3 | ask | Q&A |
| 4 | bbs | Forum posts |
| 5 | download | Downloadable items |
| 6 | image | Images |
| 7 | media | Multimedia, such as movies, TV series, and music |
| 8 | recipe | Food and recipes |
| 9 | news | News and information |
| 10 | institution | Organizations |
| 11 | other | Others |
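To reject unsupported values before upload, a client can mirror this table in an enum. The following Java sketch is hypothetical: ItemType is not an SDK type. It assumes that item_type values must match the lowercase names in the table exactly.

```java
import java.util.Locale;

/**
 * Hypothetical enum mirroring the item_type table, used to reject
 * invalid values before upload. Not part of any OpenSearch SDK.
 */
public enum ItemType {
    GOODS, ARTICLE, ASK, BBS, DOWNLOAD, IMAGE, MEDIA, RECIPE, NEWS, INSTITUTION, OTHER;

    /** The string to send in the item_type field, for example "goods". */
    public String wireValue() {
        return name().toLowerCase(Locale.ROOT);
    }

    /** Parses an item_type value; unknown or differently cased values are rejected. */
    public static ItemType fromWireValue(String value) {
        for (ItemType type : values()) {
            if (type.wireValue().equals(value)) {
                return type;
            }
        }
        throw new IllegalArgumentException("Unknown item_type: " + value);
    }

    public static void main(String[] args) {
        System.out.println(ItemType.MEDIA.wireValue());       // media
        System.out.println(ItemType.fromWireValue("recipe")); // RECIPE
    }
}
```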

Common behavior types

| ID | bhv_type | Description | bhv_value | bhv_detail |
|---|---|---|---|---|
| 1 | expose | An item is exposed (displayed) to the user. | Empty | Empty |
| 2 | stay | The user dwells on a page. | The dwell time, in seconds. | Empty |
| 3 | click | The user clicks an item. | The number of clicks. Default value: 1. | Empty |
| 4 | cart | The user adds an item to a shopping cart, bookshelf, or playlist. | Empty | Empty |
| 5 | buy | The user purchases an item. | The number of items purchased. Default value: 1. | Example: buy_price=12,price_unit=RMB. buy_price is the price of the item when the order is placed. price_unit defaults to RMB. |
| 6 | collect | The user adds an item to favorites. | Empty | Empty |
| 7 | like | The user likes an item. | The number of likes. Default value: 1. | Empty |
| 8 | dislike | The user dislikes an item. | The number of dislikes. Default value: 1. | Empty |
| 9 | comment | The user comments on an item. | The number of comments. Default value: 1. | Empty |
| 10 | share | The user shares or forwards an item. | The number of shares or forwards. Default value: 1. | Empty |
| 11 | subscribe | The user follows or subscribes to an item. | Empty | Empty |
| 12 | gift | The user sends a gift. | Empty | Empty |
| 13 | download | The user downloads an item. | Empty | Empty |
| 14 | read | The user reads an item. | Empty | Empty |
| 15 | tip | The user tips (rewards) an item. | Empty | Empty |
| 16 | complain | The user complains about an item. | Empty | Empty |
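The bhv_value column implies a simple rule:

  • Count-type behaviors default to 1.
  • stay carries a dwell time in seconds.
  • The remaining types leave the field empty.

The following Java sketch encodes that rule. BhvType and bhvValue are hypothetical names. Rejecting a stay event that has no dwell time is this sketch's own choice, not a documented OpenSearch requirement.

```java
import java.util.Locale;

/**
 * Hypothetical enum mirroring the Common behavior types table. Each type records
 * how bhv_value is used, so a client can fill in the documented default.
 * Not part of any OpenSearch SDK.
 */
public enum BhvType {
    EXPOSE(ValueKind.EMPTY), STAY(ValueKind.DWELL_SECONDS), CLICK(ValueKind.COUNT),
    CART(ValueKind.EMPTY), BUY(ValueKind.COUNT), COLLECT(ValueKind.EMPTY),
    LIKE(ValueKind.COUNT), DISLIKE(ValueKind.COUNT), COMMENT(ValueKind.COUNT),
    SHARE(ValueKind.COUNT), SUBSCRIBE(ValueKind.EMPTY), GIFT(ValueKind.EMPTY),
    DOWNLOAD(ValueKind.EMPTY), READ(ValueKind.EMPTY), TIP(ValueKind.EMPTY),
    COMPLAIN(ValueKind.EMPTY);

    /** How the table says bhv_value is used for a behavior type. */
    public enum ValueKind { EMPTY, COUNT, DWELL_SECONDS }

    private final ValueKind valueKind;

    BhvType(ValueKind valueKind) {
        this.valueKind = valueKind;
    }

    /** The string to send in the bhv_type field, for example "click". */
    public String wireValue() {
        return name().toLowerCase(Locale.ROOT);
    }

    /**
     * Returns the bhv_value to send. Count-type behaviors default to "1",
     * stay needs an explicit dwell time in seconds, and all others send "".
     */
    public String bhvValue(Integer measured) {
        switch (valueKind) {
            case COUNT:
                return String.valueOf(measured == null ? 1 : measured);
            case DWELL_SECONDS:
                if (measured == null) {
                    throw new IllegalArgumentException("stay needs a dwell time in seconds");
                }
                return String.valueOf(measured);
            default:
                return "";
        }
    }

    public static void main(String[] args) {
        System.out.println(CLICK.wireValue() + " -> " + CLICK.bhvValue(null));          // click -> 1
        System.out.println(STAY.wireValue() + " -> " + STAY.bhvValue(35));              // stay -> 35
        System.out.println(EXPOSE.wireValue() + " -> '" + EXPOSE.bhvValue(null) + "'"); // expose -> ''
    }
}
```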

View a data report

After you enable the data collection feature and upload a sufficient amount of behavioral data, you can view the status and quality of the data on the data collection page.

Figure: Verification report

Data status

Behavioral data is in one of two states. Normal (Available) means that the data has passed verification and has no quality issues. Abnormal (Unavailable) means that the data has quality issues.

If data is in the Abnormal (Unavailable) state, the creation and training of popularity models and category prediction may be affected.

Figure: Abnormal data

Figure: Normal data

Data quality

If the quality check on the behavioral data fails, an error message appears on the Data Verification page in the OpenSearch console. If the check passes, no error message appears.

Figure: Data quality

Note: The sample checked in the preceding figure consists of the behavioral data synchronized to OpenSearch during the hour before the check. A sample quality check runs at the beginning of each hour.
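You can compute the sampling window described in the note from the check time. The following Java sketch makes these assumptions:

  • The window is the hour immediately before each top-of-hour check.
  • The window has an inclusive start and an exclusive end.
  • Hours are measured in UTC. Regions whose UTC offset is not a whole number of hours would see different boundaries.

HourlySampleWindow is a hypothetical helper, not an OpenSearch API.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.temporal.ChronoUnit;

/**
 * Hypothetical helper for the hourly sample quality check. It assumes that the
 * check at the top of each hour samples data synchronized during the preceding hour.
 */
public class HourlySampleWindow {

    /** End (exclusive) of the most recent sampled window: the top of now's hour, in UTC. */
    public static Instant windowEnd(Instant now) {
        return now.truncatedTo(ChronoUnit.HOURS);
    }

    /** Start (inclusive) of that window: one hour before its end. */
    public static Instant windowStart(Instant now) {
        return windowEnd(now).minus(Duration.ofHours(1));
    }

    /** Whether data synchronized at syncTime falls into the most recent hourly sample. */
    public static boolean inLatestSample(Instant syncTime, Instant now) {
        return !syncTime.isBefore(windowStart(now)) && syncTime.isBefore(windowEnd(now));
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-31T10:25:00Z");
        System.out.println(windowStart(now) + " to " + windowEnd(now));
        // 2024-01-31T09:00:00Z to 2024-01-31T10:00:00Z
        System.out.println(inLatestSample(Instant.parse("2024-01-31T09:59:59Z"), now)); // true
        System.out.println(inLatestSample(Instant.parse("2024-01-31T10:05:00Z"), now)); // false
    }
}
```

Data synchronized after the most recent check, such as the 10:05 record above, does not show up until the next hourly check.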