FAQ related to data access

Last Updated: Oct 14, 2021

1. What is a scene ID? Is a scene ID in the behavior table mapped to a behavior type during event tracking or queries? What is the usage of a scene ID?

Example:

The item table contains item A, whose item_id is 1 and whose scene_id is 1001,1002.

The behavior table contains the following behavioral data:

item_id=1,scene_id=1001,bhv_type=click;

item_id=1,scene_id=1002,bhv_type=expose

Scenes indicate different usages of an item and can be considered categories of the item. If an item is used in only one scene, scene_id is left empty. For more information about scenes, see Use event tracking.

If an item is used in multiple scenes, separate the scene IDs in scene_id with commas (,).

If multiple scene IDs are involved in a query, set scene_id to the ID of the scene where the item is used. If an item is used only in one scene, this parameter is not required. For more information, see Obtain recommendation results.

If scene_id is set to 1001 in a query, this item may be recalled. In this case, AIRec determines whether to recommend this item based on the result provided by the algorithm model. If scene_id is set to 1003, this item is never recalled.

The value of scene_id in the behavior table must be the same as that in the item table. An appropriate value of scene_id helps train the algorithm model.

Data in the behavior table shows that the item has one click behavior in scene 1001 and one expose behavior in scene 1002. The algorithm model is trained on behavioral data separately for scenes 1001 and 1002, so training in one scene does not interfere with training in the other. Therefore, an appropriate value of scene_id helps improve the algorithm performance.
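The consistency rule above can be checked before data is reported. The following is a minimal sketch, assuming records are held as plain dictionaries keyed by the field names used in the example (item_id, scene_id, bhv_type). The helper names scene_ids_of and behavior_scene_matches are hypothetical and are not part of any AIRec SDK.

```python
def scene_ids_of(item: dict) -> set:
    """Split the comma-separated scene_id field of an item record."""
    raw = item.get("scene_id", "")
    return {s.strip() for s in raw.split(",") if s.strip()}


def behavior_scene_matches(item: dict, behavior: dict) -> bool:
    """Return True if the behavior's scene_id is one of the item's scenes."""
    scenes = scene_ids_of(item)
    bhv_scene = behavior.get("scene_id", "")
    if not scenes:
        # Single-scene item: scene_id is left empty in both tables.
        return bhv_scene == ""
    return bhv_scene in scenes


# Item A from the example is used in scenes 1001 and 1002.
item_a = {"item_id": "1", "scene_id": "1001,1002"}
behaviors = [
    {"item_id": "1", "scene_id": "1001", "bhv_type": "click"},
    {"item_id": "1", "scene_id": "1002", "bhv_type": "expose"},
    {"item_id": "1", "scene_id": "1003", "bhv_type": "click"},  # unknown scene
]
results = [behavior_scene_matches(item_a, b) for b in behaviors]
print(results)  # [True, True, False]
```

The third record is rejected because scene 1003 is not listed for item A, which mirrors the statement that the item is never recalled for scene_id 1003.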

2. How do I check data upload? How do I query the uploaded data?

If the SDK returns true after you upload data, the upload message has been sent. You can query the update history or data table to check whether data is uploaded. For more information, see Data management.

3. Is data of an earlier data version cleared after data of a new data version takes effect? How is the incremental data of the earlier data version processed? How long does it take for the data of a new version to take effect?

Note: Only the instances that are started by using historical data have their data versions.

For example, after user A purchases and launches AIRec, major business adjustments are made. In this case, an update is required to replace the original user IDs 1 to 10000 with new IDs 20000 to 30000. User A plans to perform this update by using AIRec.

If a large amount of data needs to be updated, we recommend that you use an SDK to push incremental data. This method ensures a smooth data update.

Specifically, user A uses an SDK to add the new user IDs (20000 to 30000) to the user table in one batch and starts to use the new user IDs in the behavior table. If no error is reported after a period of time, user A deletes the original user IDs.

A full update can be performed if quotas are insufficient. After a new data version takes effect, data of this version takes effect in the item table after four hours, and takes effect in the user table and behavior table at 00:00:00 on the next day. This is due to the limits on the scheduling period of the AIRec algorithm. Data of the earlier version can still be queried before data of the new version takes effect. After data of the new version takes effect, data of the earlier version, including the pushed incremental data, is deleted.

If user A wants to perform a full update and the earlier data version is M, user A must create version N to upload the latest user data. After version N takes effect in the AIRec console, the incremental data of version M is replaced by the incremental data of version N. However, data of version N takes effect at 00:00:00 on the next day. The user IDs that are queried on the current day are still 1 to 10000. The new user IDs 20000 to 30000 are not lost and can be queried on the next day.
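The timing described above can be worked out as follows. This is a minimal sketch that assumes the times are evaluated in the local time of the instance (the time zone is not stated here). The function name effective_times is hypothetical.

```python
from datetime import datetime, timedelta


def effective_times(activated_at: datetime) -> dict:
    """Estimate when each table of a new data version takes effect."""
    # Item table: four hours after the new version takes effect.
    item_effective = activated_at + timedelta(hours=4)
    # User and behavior tables: 00:00:00 on the next day.
    next_midnight = (activated_at + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return {
        "item": item_effective,
        "user": next_midnight,
        "behavior": next_midnight,
    }


times = effective_times(datetime(2021, 10, 14, 15, 30))
for table, when in times.items():
    print(table, when.isoformat())
```

For a version activated at 15:30 on October 14, the item table would take effect at 19:30 on the same day, and the user and behavior tables at 00:00:00 on October 15.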

To prevent AIRec from frequently sending the same content to users, exposure blocking can be used, especially in the news industry.

If your item pool is small, queries from the same user ID are likely to return empty results after several requests, because items that have already been exposed are blocked. Prepare a fallback policy to handle this case. The AIRec console will be available to all users in the future so that users can manually specify the exposure blocking period.

4. What is the usage of category? How do I use category? Is category required?

Reasonable category reporting can improve the algorithm performance and help discretize data.

Category is similar to an N-level category of an item on Taobao. Category is used for discretization in AIRec. It can also be used as a feature to sort data in the algorithm model of AIRec. This improves the algorithm performance.

The usage of category depends on your business logic. If discretization is not required, category can be left empty. For more information about the usage of category, see the description of the category fields in Data specifications. For more information about the discretization feature, see Improve the diversity of recommendations by using instance operation rules.

5. How do I use constructed data to start AIRec and test the related SDK if I have not prepared data?

AIRec provides full initialization data for testing. You can use the data to start AIRec and test the push and query features of the SDK. After tests are completed, you can replace all test data by creating a new data version. You can also directly push incremental data to overwrite the original test data. To download the initialization data, visit the following URL:

URL to download initialization data

6. How do I set tag fields? Do tags in the user table need to be associated with tags in the item table?

Assume that user A manages an e-commerce app, and the operations team maintains a tag pool that includes a number of tags, such as red, yellow, black, casual, and formal wear. These tags are used to identify the products of user A and the preferences of consumers.

Tag fields in the user table are associated with those in the item table, so the tags in both tables must come from the same tag pool. We recommend that the tag pool contains a maximum of 50,000 tags.

A tag field provides user persona information or item-specific information. It is manually specified by the algorithm team or operations team of users. Users can also select their tags when they sign up with AIRec. Tags are used as input information for training the AIRec algorithm. Reasonable tags can improve the algorithm performance.

If tags of a user are black and formal wear, AIRec tends to recommend the items with these tags to this user. This improves the algorithm performance.

7. What does a data version mean?

Note: Only the instances that are started by using historical data have their data versions.

A data version makes it easy for you to manage full instance data. You can configure a maximum of three data versions to facilitate the iteration, comparison, and optimization of data versions. If a full data import is performed, a data version is automatically generated. Among the three data versions, only one version can be in effect at a time. This version remains in effect until another version takes effect. The remaining two versions that are not in effect follow the first-in-first-out (FIFO) principle.

I. After you import data, one or more data versions are generated.

II. If two data versions are generated, delete the previous data version.

III. Click Activate to make one of the two versions take effect. This operation triggers model training that takes about 1.5 hours.

8. Can I push incremental data to a new data version only by using an SDK after the version takes effect? Is data updated in real time?

We recommend that you use an SDK to push incremental data after a new data version takes effect and runs stably. All data updates take effect in real time.

9. What are the differences between the add and update operations?

add is used to add a document, and update is used to update a document. If a document with the same primary key has already been added, the add operation replaces the existing document with the new one. Therefore, an add operation must submit the primary key and all related fields. The update operation modifies an existing document: after the primary key is matched, only the fields that you report are updated. If you update a document that has not been added, the update takes effect only after an internal scheduling period.
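The difference between the two operations can be illustrated with an in-memory model. This is a sketch of the semantics only, not of how AIRec stores data. The functions apply_add and apply_update and the title and status fields are hypothetical, and the delayed handling of an update to a missing document is not modeled.

```python
def apply_add(table: dict, doc: dict, key: str = "item_id") -> None:
    """add: store the whole document, replacing any existing one."""
    table[doc[key]] = dict(doc)


def apply_update(table: dict, doc: dict, key: str = "item_id") -> bool:
    """update: overwrite only the reported fields of an existing document."""
    existing = table.get(doc[key])
    if existing is None:
        # The document has not been added yet. This sketch only reports
        # that fact and does not model the later scheduling period.
        return False
    existing.update(doc)
    return True


items = {}
apply_add(items, {"item_id": "1", "title": "Red shirt", "status": "1"})

# update changes status and keeps title.
apply_update(items, {"item_id": "1", "status": "0"})
after_update = dict(items["1"])

# add with partial fields replaces the document, so title is lost.
apply_add(items, {"item_id": "1", "status": "1"})
after_add = dict(items["1"])

print(after_update)
print(after_add)
```

The second add shows why an add operation needs all related fields: any field that is omitted is no longer present after the replacement.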

10. How do I transmit tag data?

Multiple tags are separated by commas (,). A piece of content can contain a maximum of 100 tags. We recommend that a tag pool contains a maximum of 50,000 tags.
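A small helper can enforce these limits before tags are reported. The following is a minimal sketch. The name encode_tags is hypothetical, and rejecting tags that contain a comma is an assumption made here because the comma is the separator.

```python
MAX_TAGS_PER_CONTENT = 100  # limit stated above for a piece of content


def encode_tags(tags: list) -> str:
    """Join tags into the comma-separated string used for tag fields."""
    cleaned = [t.strip() for t in tags if t.strip()]
    for tag in cleaned:
        if "," in tag:
            # Assumption: a comma inside a tag would be read as a separator.
            raise ValueError(f"tag contains a comma: {tag!r}")
    if len(cleaned) > MAX_TAGS_PER_CONTENT:
        raise ValueError(
            f"{len(cleaned)} tags exceed the limit of {MAX_TAGS_PER_CONTENT}"
        )
    return ",".join(cleaned)


encoded = encode_tags(["black", "formal wear"])
print(encoded)  # black,formal wear
```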

11. What are the definitions of feature and feature_num?

feature refers to a descriptive item feature, which is of the STRING type. feature_num refers to an item feature, which is of a numeric type. You can use the two fields to customize item features.

12. What do I do if I want to use an item type that is not specified in item_type?

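item_type can be set to image, article, video, shortvideo, item, recipe, or audio. To use a type that is not in this list, such as theme, reuse a listed type that your business does not otherwise use. For example, if you do not use audio, report theme items with item_type set to audio.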

13. What is traceInfo? How do I obtain this field? What is the usage of this field?

a. traceInfo specifies the tracing information about a recommendation link.

b. You can obtain this field after you obtain recommendation results from AIRec by using the server SDK.

c. After you obtain traceInfo, if behavioral data is generated based on the recommended item_id, you must report the obtained traceInfo to the behavior table when you report the behavioral data.

d. traceInfo is used for AIRec algorithm engineers to optimize and troubleshoot algorithm models.
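Step c can be sketched as follows. This is an illustration under several assumptions: the shape of the recommendation result, the user_id field, and the trace_info key of the behavior record are hypothetical names chosen for this example, so check Data specifications for the actual field names. The function build_behavior is also hypothetical.

```python
def build_behavior(recommended: dict, user_id: str, bhv_type: str) -> dict:
    """Build a behavior record that carries the traceInfo of a recommended item."""
    return {
        "item_id": recommended["item_id"],
        "user_id": user_id,  # assumed field name
        "bhv_type": bhv_type,
        # Pass the value through unchanged; the key name is an assumption.
        "trace_info": recommended["traceInfo"],
    }


# Hypothetical entry from a recommendation result.
recommended_item = {"item_id": "1", "traceInfo": "opaque-trace-string"}

behavior = build_behavior(recommended_item, user_id="20001", bhv_type="click")
print(behavior)
```

The traceInfo value is treated as an opaque string and is copied as received, because it is meant for troubleshooting on the AIRec side.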

15. Is real-time reporting of user data, item data, and behavioral data required? Why?

Incremental and existing data about users, items, and behaviors needs to be reported in real time. If behavioral data is not reported in real time, the real-time link cannot be used and feedback cannot be obtained in real time. If user data is not reported in real time, AIRec may provide inaccurate recommendations for a request with a new user ID. We recommend that you update item data immediately after it is changed.

16. Do I need to retransmit data when Junior Edition is switched to Standard Edition?

No. To switch from Junior Edition to Standard Edition, upgrade the instance configuration. The switch is smooth and does not require you to retransmit data.

17. Common errors when data is pushed by using the server SDK

ClientException: DocumentError.MissingField : Missing fields, field names: item_id: This error is returned when required fields, such as item_id, are missing or are empty strings.

ClientException: BadFormat : null: This error is returned when the reported JSON strings fail to be parsed due to a string concatenation error.

ClientException: InstanceNotExist : The specified instance does not exist: This error is returned in one of the following scenarios: 1. The instance has not started. 2. The region specified in the code that pushes data is incorrect and needs to be changed to the region where the purchased instance resides.

ServerException: FetchDocumentBackendError: Internal server error: This error is returned when fixed strings in JSON data have spelling errors, for example, field is mistakenly written as filed.
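Several of these errors can be caught before data is pushed. The following is a minimal sketch, assuming documents are built as dictionaries and serialized with the standard json module instead of string concatenation. Only item_id is listed as required because it is the only required field named above. The helper names are hypothetical, and the sketch does not call any SDK.

```python
import json

REQUIRED_FIELDS = ("item_id",)  # extend according to Data specifications


def missing_fields(doc: dict) -> list:
    """Return required fields that are absent or empty strings."""
    return [f for f in REQUIRED_FIELDS if doc.get(f) in (None, "")]


def to_json(docs: list) -> str:
    """Validate documents, then serialize them in one step."""
    for index, doc in enumerate(docs):
        missing = missing_fields(doc)
        if missing:
            raise ValueError(f"document {index} is missing fields: {missing}")
    # json.dumps escapes quotes and special characters, which avoids the
    # parse failures caused by hand-built JSON strings.
    return json.dumps(docs, ensure_ascii=False)


payload = to_json([{"item_id": "1", "scene_id": "1001,1002"}])
print(payload)
```

Defining field names once as constants, as with REQUIRED_FIELDS, also reduces the chance of spelling errors in fixed strings.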