All Products
Search
Document Center

Artificial Intelligence Recommendation:Data registration and field configuration

Last Updated:Oct 20, 2025

Data registration

Before data registration, make sure that you have activated MaxCompute (previously known as ODPS), bound a MaxCompute project, and registered a MaxCompute table. This table is the intermediate table for performance statistics collection, and is also the source table of experiment reports. This table uses user_id, exp_id, and a dimension field as the composite primary key. The composite primary key indicates the metrics of a user in a certain experiment, such as impressions and clicks. For example, you can configure country as the dimension field and use this table as the intermediate statistical table that is collected by user_id, country, and exp_id. The configurations of single-dimension metrics and derived metrics depend on this intermediate table. Before you understand the source table of experiment reports, you must be familiar with the overall experiment architecture, as shown in the following figure.

image.png

PAI-Rec is the recommendation service provided by Alibaba Cloud. This service integrates the SDK of the configuration center from which experiment information is obtained. The response returned by the recommendation service contains the experiment ID (exp_id) and the request ID (reqid).

When a user performs an operation on an app, behavioral logs are returned to MaxCompute by the front-end event tracking task on the app. A source table of experiment reports is generated during SQL-based statistics collection. After several metrics are configured in the configuration center, a SQL statement is dynamically assembled to collect statistics from the source table of experiment reports and save the results to a Hologres database. You can query metric data on the Performance Reports page of the PAI-Rec console.

I. Offline experiment reports

Code for generating an offline source table of experiment reports

CREATE TABLE IF NOT EXISTS experiment_report (
	user_id STRING COMMENT 'User ID',
        exp_id STRING COMMENT 'Experiment ID',
	exposure_count BIGINT COMMENT 'Exposure count',
	click_count BIGINT COMMENT 'Click count',
	like_comment_collect_count BIGINT COMMENT 'Like count'
)
PARTITIONED  by (dt STRING COMMENT 'Date')
STORED AS ALIORC;
INSERT OVERWRITE TABLE experiment_report PARTITION (dt = '${bdp.system.bizdate}')
SELECT  user_id
        ,exp_id
        ,SUM(IF(event == 'expose',1,0))
        ,SUM(IF(event == 'click',1,0))
        ,SUM(IF(event IN ('like','comment','collect'),1,0))
FROM    rec_sln_demo_behavior_table_v1
WHERE   ds = '${bdp.system.bizdate}'
GROUP BY user_id
         ,exp_id

Fields in an offline source table of experiment reports

Field

Description

Required

Remarks

user_id

User ID

Yes

The value of this field can be a user ID, a device ID, or an International Mobile Equipment Identity (IMEI) number.

exp_id

Experiment ID

Yes

The value of this field is the experiment ID returned by the A/B test, which is logged by the event tracking task. If you use a self-managed recommendation service, you can specify a custom experiment ID. Example: exp_id:ER1_L1#EG1#E1_L2#EG2#E2

Dimension field

Used for more granular performance report statistics

No

You can use this dimension field to collect statistics and display corresponding metrics.

For example, you can set a province field to collect statistics and analyze product performance in different provinces.

You can set a maximum of two fields such as province and city. When you set a third field, there are too many combinations, which slows down the calculation.

Common dimension fields can also include recall ID (recall_id), which is used to analyze the performance of different recall sources. You need to first set up the backend logs of the PAI-Rec engine. For more information, see Other configurations for pairec_debug_log.

Compute field

You can specify multiple compute fields,

such as show_count (impressions);

click_count (clicks)

Yes

You need to enter numeric values for the compute fields, such as the number of impressions, the number of clicks, and view duration.

dt

The date field.

Yes

The value of this field is in the yyyyMMdd format.

hh

The hour field.

No

The 24-hour system is used. The value of this field ranges from 00 to 23.

mm

The minute field.

No

The value of this field ranges from 00 to 59.

The source table can be generated offline on a daily basis or on an hourly basis (for example, scheduled once per hour in DataWorks to collect statistics on impression, click, and purchase log data from the previous hour), depending on your business requirements. If the source table is generated offline, the user_id, exp_id, and dt fields are required. If the source table is generated on an hourly basis, the user_id, exp_id, dt, and hh fields are required. The mm field is optional.

Metrics are divided into offline metrics and real-time metrics (corresponding to hourly reports). If data sources are generated in real time, offline metrics and real-time metrics can share the same data source. If data sources are generated offline, you can only create offline metrics, not real-time metrics.

The dimension field in the preceding table is optional. After the dimension field is specified, you can analyze metrics by the specified dimension. For example, you can view report data by common dimension fields including gender and OS (such as iOS and Android).

II. Real-time source table of experiment reports

Code for generating a source table of experiment reports

CREATE TABLE IF NOT EXISTS experiment_report_real (
	user_id STRING COMMENT 'User ID',
        exp_id STRING COMMENT 'Experiment ID',
	exposure_count BIGINT COMMENT 'Exposure count',
	click_count BIGINT COMMENT 'Click count',
	like_comment_collect_count BIGINT COMMENT 'Like count'
)
PARTITIONED  by (
    dt string
    ,hh string
)
STORED AS ALIORC;
INSERT OVERWRITE TABLE experiment_report_real PARTITION (dt = '${bdp.system.bizdate}',hh = '${hour}')
SELECT  user_id
        ,exp_id
        ,SUM(IF(event == 'expose',1,0))
        ,SUM(IF(event == 'click',1,0))
        ,SUM(IF(event IN ('like','comment','collect'),1,0))
FROM    rec_sln_demo_behavior_table_v1
WHERE   ds = '${bdp.system.bizdate}'
AND hh = hour(now()) -1
GROUP BY user_id
         ,exp_id;

Fields in a real-time source table of experiment reports

Field

Description

Required

Remarks

user_id

User ID

Yes

The value of this field can be a user ID, a device ID, or an International Mobile Equipment Identity (IMEI) number.

exp_id

Experiment ID

Yes

The value of this field is the experiment ID returned by the A/B test, which is logged by the instrumentation task. If you use a self-managed recommendation service, you can specify a custom experiment ID. Example: exp_id:ER1_L1#EG1#E1_L2#EG2#E2

Dimension field

The dimension field, such as OS

No

You can filter metrics based on the specified dimension field.

Compute field

You can specify multiple compute fields,

such as show_count (impressions) and click_count (clicks)

Yes

You need to enter numeric values for the compute fields, such as the number of impressions, the number of clicks, and view duration.

dt

Date partition

Yes

The value of this field is in the yyyyMMdd format.

hh

Hour partition

Yes

The 24-hour system is used. The value of this field ranges from 00 to 23.

mm

Minute partition

No

The value of this field ranges from 00 to 59.

The fields in result tables are fixed, except the dimension field. Make sure that the definition of the dimension field in result tables is consistent with that in source tables.

Offline metrics and real-time metrics cannot share a result table. Among the three time fields, only the dt field is required for the result table of offline metrics, and the dt and hh fields are required for the result table of real-time metrics.

After you obtain the source table of experiment reports, you can go to the Data Registration page on the PAI-Rec console to add a MaxCompute table. To access the Data Registration page, choose Metric Management > Data Registration in the navigation pane on the left.

We recommend that you use the default Hologres table to write back metric calculation results. This table does not consume resources. The system automatically creates tables and stores relevant metric data for you. You can also use the Hologres table you created instead of the default Hologres table.

III. Register a MaxCompute table

Click Create Data Table. In the panel that appears, select a MaxCompute project, select the MaxCompute data table that you want to add, specify a name for the data table, and then click Import.

Note

If the fields in the data table that you registered are modified, we recommend that you import the data table again at the earliest opportunity. Otherwise, the system may not recognize the modified fields.

image.png

Field configuration

The registered data table automatically appears in the list. You can click View Fields in the Actions column of the data table to view the fields in the data table or modify the fields. The user_id, exp_id, and dt fields are required. These fields represent the total number of behaviors such as impressions, clicks, and likes of a user (user_id) in a set of experiments (exp_id) on a specific day (dt). Custom metrics are defined and calculated based on this table.

image

Dimension fields

You can add one or two dimension fields to the preceding table. For example, if you set city as a dimension field, you can view the comparative effects of experiments under different city dimensions in the experiment report. If you set city and gender as dimension fields, the experiment comparison results will be based on the combination of these two fields.