Artificial Intelligence Recommendation: FAQ related to data access

Last Updated: Aug 11, 2023

FAQ related to data access

1. What is a scene ID? Does a scene ID in the behavior table need to be mapped to that in the item table during data tracking or queries? What is the usage of a scene ID?

For example, the item table contains Item A. The ID of Item A is 1. The scene IDs of Item A are 1001 and 1002. The behavior table contains the following two behavior records: item_id=1,scene_id=1001,bhv_type=click and item_id=1,scene_id=1002,bhv_type=expose.

Scenes indicate the different usages of an item and can be considered categories of the item. If an item is used in only one scene, you can leave the scene_id field empty. For more information about the usage of a scene ID during data tracking, see Use event tracking.

If an item is used in multiple scenes, you must specify the scene IDs in the scene_id field and separate the scene IDs with commas (,).

If multiple scenes are involved in a query, set the scene_id field to the scene IDs. If only one scene ID is involved in a query, you do not need to specify the scene_id field. For more information, see Obtain recommendation results.

In the preceding example, if the scene_id field is set to 1001 in a query, Item A may be recalled. In this case, Artificial Intelligence Recommendation (AIRec) determines whether to recommend Item A based on the result provided by the algorithm model. If the scene_id field is set to 1003, Item A is never recalled.
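
The scene matching described above can be illustrated with a minimal sketch. It is not AIRec's implementation: the function names are hypothetical, and the sketch only covers the case in which the item has scene IDs and the query specifies one scene ID.

```python
def parse_scene_ids(scene_id_field: str) -> set:
    """Split a comma-separated scene_id value into a set of scene IDs."""
    return {part.strip() for part in scene_id_field.split(",") if part.strip()}


def is_recall_candidate(item: dict, query_scene_id: str) -> bool:
    """Return True if the item belongs to the queried scene.

    True only means the item is eligible for recall. In AIRec, the
    algorithm model still decides whether the item is recommended.
    """
    return query_scene_id in parse_scene_ids(item["scene_id"])


# Item A from the example above: item_id 1, scene IDs 1001 and 1002.
item_a = {"item_id": "1", "scene_id": "1001,1002"}

print(is_recall_candidate(item_a, "1001"))  # True: eligible for recall
print(is_recall_candidate(item_a, "1003"))  # False: never recalled
```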

2. How do I check whether data is uploaded? How do I query the uploaded data?

If the SDK returns true after you upload data, this indicates only that the upload message has been sent. To check whether the data is actually uploaded, query the update history or the data table. For more information, see Data query.

3. What is the usage of a category? How do I use a category? Is a category required?

Reporting categories appropriately can improve the algorithm performance and helps AIRec discretize data.

A category is similar to an N-level category of an item on Taobao. AIRec uses categories for discretization and can also use them as a feature when the algorithm model sorts data, which improves the algorithm performance.

The usage of a category depends on your business logic. If discretization is not required, the category can be left empty. For more information about the usage of a category, see the description of the category fields in E-commerce industry. For more information about how to use a category during data tracking, see Use event tracking. For more information about the discretization feature, see Improve the diversity of recommendations by using instance operation rules.

4. How do I use constructed data to start AIRec and test the related SDK if I have not prepared data?

AIRec provides full initialization data for testing. You can use the data to start AIRec and test the push and query features of the SDK. After tests are complete, you can replace all test data by creating a data version. You can also push incremental data to overwrite the original test data.

For more information about the download links of the initialization data, see Sample data.

5. How do I set the tag field? Do tags in the user table need to be associated with those in the item table?

For example, you manage an e-commerce app, and your operations team maintains a tag pool that contains tags such as red, yellow, black, casual, and formal wear. These tags are used to identify your products and the preferences of users.

The tags in the user table are associated with those in the item table: the tags in both tables of AIRec must come from the same tag pool. A tag pool can contain a maximum of 50,000 tags.

A tag describes part of a user profile or an item profile. Tags are manually specified by your algorithm team or operations team. Alternatively, users can select tags when they register with your app. Tags are used as input information for algorithm training in AIRec. Proper tags can improve the algorithm performance.

If the tags of a user are black and formal wear, AIRec tends to recommend the items that contain these tags to this user. This improves the algorithm performance.

6. What are the differences between add and update operations?

An add operation is used to add a data record, whereas an update operation is used to update a data record.

You can perform an add operation to add a data record. If a data record already exists, the existing data record is replaced by the new data record. To add a data record, you must submit the primary key that consists of the item_id and item_type fields and all related fields.

An update operation updates an existing data record. You need to report only the primary key, which consists of the item_id and item_type fields, and the fields that you want to update. If you update a data record that does not exist, data may be lost and unexpected errors may occur. Therefore, we recommend that you do not use an update operation as an add operation.
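
The difference between the two operations can be sketched with an in-memory table. This is an illustration only, not the AIRec SDK: the function names and the title field are hypothetical, and raising an error for an update on a missing record is a choice made in this sketch to keep the mistake visible.

```python
def primary_key(record: dict) -> tuple:
    """The primary key consists of the item_id and item_type fields."""
    return (record["item_id"], record["item_type"])


def apply_add(table: dict, record: dict) -> None:
    """Add a record. An existing record with the same key is replaced whole."""
    table[primary_key(record)] = dict(record)


def apply_update(table: dict, record: dict) -> None:
    """Update only the reported fields of an existing record."""
    key = primary_key(record)
    if key not in table:
        # Sketch-only guard: do not use update as a substitute for add.
        raise KeyError(f"cannot update missing record {key}; use add instead")
    table[key].update(record)


table = {}
apply_add(table, {"item_id": "1", "item_type": "article", "title": "Old", "tags": "red"})

# Update: only the title is reported, so the tags field is kept.
apply_update(table, {"item_id": "1", "item_type": "article", "title": "New"})
print(table[("1", "article")]["tags"])  # red

# Add on an existing key: the whole record is replaced, so tags is gone.
apply_add(table, {"item_id": "1", "item_type": "article", "title": "Newest"})
print("tags" in table[("1", "article")])  # False
```

Because an add operation replaces the whole record, every field that should survive must be submitted again with it.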

7. How do I transmit tag data?

Separate multiple tags with commas (,). A piece of content can contain a maximum of 100 tags. A tag pool can contain a maximum of 50,000 tags.
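
As a rough sketch, a client could validate tags against these limits before it builds the comma-separated value. The helper name is hypothetical. Removing duplicates and rejecting tags that contain a comma are assumptions of this sketch, not documented AIRec behavior.

```python
MAX_TAGS_PER_ITEM = 100    # limit per piece of content, as stated above
MAX_TAGS_IN_POOL = 50_000  # limit per tag pool, as stated above


def build_tags_field(tags, tag_pool):
    """Return the comma-separated tags value, or raise ValueError."""
    if len(tag_pool) > MAX_TAGS_IN_POOL:
        raise ValueError("the tag pool exceeds 50,000 tags")
    # Assumption: duplicates are dropped; the original order is kept.
    unique_tags = list(dict.fromkeys(tags))
    if len(unique_tags) > MAX_TAGS_PER_ITEM:
        raise ValueError("a piece of content can have at most 100 tags")
    for tag in unique_tags:
        if "," in tag:
            raise ValueError(f"tag {tag!r} contains the separator character")
        if tag not in tag_pool:
            raise ValueError(f"tag {tag!r} is not in the tag pool")
    return ",".join(unique_tags)


tag_pool = {"red", "yellow", "black", "casual", "formal wear"}
print(build_tags_field(["black", "formal wear", "black"], tag_pool))
# black,formal wear
```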

8. What are the definitions of the feature and feature_num fields?

The feature field specifies a descriptive item feature of the STRING type.

The feature_num field specifies an item feature of the NUMERIC type. You can use the two fields to customize item features.

9. What do I do if I want to use an item type that is not within the valid values of the item_type field?

Valid values of the item_type field are image, article, video, shortvideo, item, recipe, and audio.

For example, you want to use the theme type but this item type is not within the valid values of the item_type field. In this case, you can use an unoccupied item type that is included in the valid values of the item_type field, such as audio, to represent the theme type.
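
One way to keep such a substitution consistent is to maintain the mapping in a single place in your own code. The following sketch assumes a hypothetical alias table and helper name. It also rejects audio as a business type, because audio is no longer unoccupied once it stands in for theme.

```python
VALID_ITEM_TYPES = {"image", "article", "video", "shortvideo", "item", "recipe", "audio"}

# Hypothetical alias table: business types that borrow an unoccupied item_type.
CUSTOM_TYPE_ALIASES = {"theme": "audio"}


def to_item_type(business_type: str) -> str:
    """Map a business item type to a valid value of the item_type field."""
    if business_type in CUSTOM_TYPE_ALIASES:
        return CUSTOM_TYPE_ALIASES[business_type]
    if business_type in CUSTOM_TYPE_ALIASES.values():
        # The stand-in value must stay unoccupied by real items of that type.
        raise ValueError(f"{business_type!r} is reserved as a stand-in type")
    if business_type in VALID_ITEM_TYPES:
        return business_type
    raise ValueError(f"no item_type mapping for {business_type!r}")


print(to_item_type("theme"))  # audio
print(to_item_type("video"))  # video
```

The same mapping must be applied wherever item_type is reported or queried, so that theme items are always identified by the same value.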

10. What is the traceInfo field? How do I obtain the field value? What is the usage of this field?

a. The traceInfo field specifies the tracing information about a recommendation link.

b. You can obtain the value of this field after you obtain recommendation results from AIRec by using a server SDK.

c. After you obtain the trace information, if behavioral data is generated based on the recommended item ID, you must report the obtained trace information to the behavior table when you report the behavioral data.

d. The trace information is used by AIRec algorithm engineers to optimize and troubleshoot algorithm models.
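
Steps b and c can be sketched as follows. The shape of the recommendation result and the field names user_id and trace_info are assumptions made for illustration. Check the SDK reference for the actual response format and behavior table fields.

```python
def build_behavior_record(recommended_item: dict, user_id: str, bhv_type: str) -> dict:
    """Build a behavior record that carries the trace information unchanged."""
    return {
        "user_id": user_id,                               # hypothetical field name
        "item_id": recommended_item["item_id"],
        "bhv_type": bhv_type,
        "trace_info": recommended_item["trace_info"],     # hypothetical field name
    }


# Hypothetical recommendation results as returned through a server SDK.
recommendations = [
    {"item_id": "1", "trace_info": "opaque-trace-value-1"},
    {"item_id": "2", "trace_info": "opaque-trace-value-2"},
]

# The user clicks the first recommended item, so the click is reported
# together with the trace information that came with that item.
record = build_behavior_record(recommendations[0], user_id="u42", bhv_type="click")
print(record["trace_info"])  # opaque-trace-value-1
```

The trace value is treated as opaque here: it is copied as received and is not parsed or modified.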

11. What is positive behavioral data? What is negative behavioral data?

Positive behavioral data refers to the behavioral data about positive user behavior such as clicks, likes, and purchases. Clicks are required for initialization.

Negative behavioral data mainly refers to the behavioral data about exposures. For example, if a piece of content involves only exposures but no clicks or likes, the corresponding behavioral data is considered negative. Behavioral data about exposures and clicks must be reported.
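
The distinction can be illustrated with a small sketch that labels items based on reported behavior records. The bhv_type values click and expose appear earlier in this topic. The values like and purchase, and the function name, are hypothetical labels used only for illustration.

```python
POSITIVE_BEHAVIORS = {"click", "like", "purchase"}  # "like" and "purchase" are assumed names


def label_items(behaviors):
    """Label each item as 'positive' or 'negative' from its behavior records."""
    labels = {}
    for behavior in behaviors:
        item_id = behavior["item_id"]
        if behavior["bhv_type"] in POSITIVE_BEHAVIORS:
            labels[item_id] = "positive"
        elif behavior["bhv_type"] == "expose":
            # An exposure alone is negative, unless a positive behavior exists.
            labels.setdefault(item_id, "negative")
    return labels


behaviors = [
    {"item_id": "1", "bhv_type": "expose"},
    {"item_id": "1", "bhv_type": "click"},
    {"item_id": "2", "bhv_type": "expose"},
]
print(label_items(behaviors))  # {'1': 'positive', '2': 'negative'}
```

Item 2 can be labeled negative only because its exposure was reported, which is why both exposures and clicks must be reported.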

12. Do the user data, item data, and behavioral data need to be reported in real time? Why?

Incremental or existing data about users, items, and behavior must be reported in real time.

If behavioral data is not reported in real time, feedback cannot be obtained in real time. If user data is not reported in real time, AIRec may provide inaccurate recommendations for a request with a new user ID. We recommend that you update item data immediately after it is changed.

13. Do I need to retransmit data when Junior Edition is switched to Standard Edition?

No. If you want to switch from Junior Edition to Standard Edition, upgrade configurations to ensure a smooth switch.

14. What are the common errors that occur when data is pushed by using a server SDK?

ClientException: DocumentError.MissingField : Missing fields, field names: item_id: This error occurs if a required field, such as item_id, is not reported or left empty.

ClientException: BadFormat : null: This error occurs if the reported JSON strings fail to be parsed due to a string concatenation error.

ClientException: InstanceNotExist : The specified instance does not exist: This error occurs in one of the following scenarios: 1. The instance has not been started. 2. The region specified in the code that pushes data is incorrect. Change it to the region in which the purchased instance resides.

ServerException:FetchDocumentBackendError:Internal server error: This error occurs if the fixed strings in the reported JSON data have spelling errors. For example, field is mistakenly written as filed.
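
Several of these errors can be caught before data is pushed. The following sketch checks required fields and builds the JSON string with a serializer instead of string concatenation. The envelope keys cmd and fields and the helper name are assumptions for illustration, so verify the request format against the SDK reference.

```python
import json

# item_id is named in the error above; item_type is the other part of the primary key.
REQUIRED_FIELDS = ("item_id", "item_type")


def build_push_payload(cmd: str, fields: dict) -> str:
    """Validate required fields and serialize one record to a JSON string."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        # Fail locally instead of waiting for DocumentError.MissingField.
        raise ValueError("Missing fields, field names: " + ", ".join(missing))
    # json.dumps escapes quotes and special characters, which avoids the
    # parse failures that hand-concatenated JSON strings can cause.
    return json.dumps([{"cmd": cmd, "fields": fields}], ensure_ascii=False)


payload = build_push_payload(
    "add",
    {"item_id": "1", "item_type": "article", "title": 'A "quoted" title'},
)
print(payload)
```

Keeping fixed strings such as the envelope keys in one helper also reduces the chance of spelling errors in them.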

15. What do I do if no exposure data or trace information can be provided based on the instrumentation logic of my application?

If the instrumentation logic of your application does not allow you to obtain exposure data or trace information, you can enable the special processing feature in AIRec. This way, AIRec automatically uploads exposure data and trace information without the need for manual upload. You can separately enable this feature for exposure data and trace information.

  • Special processing of exposure data:

    Exposure data of a recommended item refers to the data that is reported to AIRec after the recommended item appears in the feed streams of users for a specific period and is viewed by users. Exposure data affects subsequent behavioral data of users and is required for training AIRec algorithms. If you cannot provide accurate exposure data, you can enable the special processing feature for exposure data in AIRec. This way, AIRec can automatically supplement exposure data to start the required instance. To enable the special processing feature, perform the following operations:

    Procedure

    On the details page of the instance that you want to manage in the AIRec console, click Data Source in the left-side navigation pane.

    Click the Show icon before Real-time Data Source. Then, you can enable the special processing feature for exposure data, as shown in the following figure.

    Select No, special processing is required.

  • Special processing of trace information:

    Trace information is used to identify whether the user behavior, such as the click behavior, is performed based on the recommendation results provided by AIRec during model calculation. AIRec can use trace information to iterate and update recommendation algorithms. If you cannot provide accurate trace information, you can enable the special processing feature for trace information in AIRec. This way, AIRec can automatically supplement trace information to start the required instance.

    For more information about trace information, see Question 10 in this topic.

    To enable the special processing feature, perform the following operations:

    Procedure

    On the details page of the instance that you want to manage in the AIRec console, click Data Source in the left-side navigation pane.

    Click the Show icon before Real-time Data Source. Then, you can enable the special processing feature for trace information, as shown in the following figure.

    Select No, special processing is required. AIRec automatically supplements the required trace information.