Platform for AI: Swing Train

Last Updated: Mar 27, 2024

Swing is an item recall algorithm. You can use the Swing Train component of Platform for AI (PAI) to measure the similarity of items based on user-item-user principles. This topic describes how to configure the Swing Train component.

Limits

You can use the Swing Train component based on the computing resources of MaxCompute and Realtime Compute for Apache Flink.

Configure the component

You can configure the component by using one of the following methods:

Method 1: Configure the component in the PAI console

Configure the Swing Train component on the pipeline page of Machine Learning Designer. The following table describes the parameters.

| Tab | Parameter | Description |
| --- | --- | --- |
| Field Setting | itemCol | The name of the item column. |
| Field Setting | userCol | The name of the user column. |
| Parameter Setting | alpha | The alpha parameter. Default value: 1.0. |
| Parameter Setting | maxItemNumber | The maximum number of users who use an item for the calculation. Default value: 1000. Note: If the number of occurrences of an item is greater than this value, the algorithm randomly selects this maximum number of users from all users of the item. |
| Parameter Setting | maxUserItems | The maximum number of items used by a user for the calculation. Default value: 1000. Note: If the number of items used by a user is greater than this value, the user is not included in the calculation. |
| Parameter Setting | minUserItems | The minimum number of items used by a user for the calculation. Default value: 10. Note: If the number of items used by a user is less than this value, the user is not included in the calculation. |
| Parameter Setting | resultNormalize | Specifies whether to normalize the results. |
| Parameter Setting | userAlpha | The alpha parameter for users. Default value: 5.0. |
| Parameter Setting | userBeta | The beta parameter for users. Default value: -0.35. |
| Execute Tuning | Number of Workers | The number of worker nodes. The value must be a positive integer. This parameter must be used together with the Memory per worker parameter. Valid values: 1 to 9999. |
| Execute Tuning | Memory per worker | The memory size of each worker node. Unit: MB. The value must be a positive integer. Valid values: 1024 to 65536. |
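The filtering behavior described for minUserItems, maxUserItems, and maxItemNumber can be illustrated with a small pure-Python sketch. This is not the component's implementation: the function name `filter_interactions`, its arguments, and the `seed` parameter are hypothetical and exist only to make the rules concrete.

```python
import random
from collections import defaultdict


def filter_interactions(pairs, min_user_items=10, max_user_items=1000,
                        max_item_number=1000, seed=0):
    """Illustrative only: apply the three limits to (user, item) pairs.

    Returns a dict that maps each item to the list of users kept for it.
    """
    # Collect the distinct items of every user.
    user_items = defaultdict(set)
    for user, item in pairs:
        user_items[user].add(item)

    # Exclude users with fewer than min_user_items or more than
    # max_user_items items, as the parameter notes describe.
    kept_users = {
        user: items for user, items in user_items.items()
        if min_user_items <= len(items) <= max_user_items
    }

    # Invert the mapping: item -> users who used it.
    item_users = defaultdict(list)
    for user in sorted(kept_users):
        for item in kept_users[user]:
            item_users[item].append(user)

    # If an item has more users than max_item_number, keep a random sample.
    # (Assumption: uniform sampling without replacement.)
    rng = random.Random(seed)
    result = {}
    for item, users in item_users.items():
        if len(users) > max_item_number:
            users = rng.sample(users, max_item_number)
        result[item] = users
    return result
```

For example, with `min_user_items=2`, a user who has interacted with a single item is dropped before any similarity is computed, which is why small test datasets usually need a lower minUserItems than the default of 10.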

Method 2: Configure the component by using Python code

You can configure the Swing Train component by using the PyAlink Script component to call Python code. For more information, see the PyAlink script documentation.

| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| itemCol | Yes | The name of the item column. | N/A |
| userCol | Yes | The name of the user column. | N/A |
| alpha | No | The alpha parameter, which is a smoothing factor. | 1.0 |
| userAlpha | No | The alpha parameter for users. Note: This parameter is used to calculate the weight of a user by using the following formula: User weight = 1.0/(userAlpha + userClickCount)^userBeta. | 5.0 |
| userBeta | No | The beta parameter for users. Note: This parameter is used to calculate the weight of a user by using the following formula: User weight = 1.0/(userAlpha + userClickCount)^userBeta. | -0.35 |
| resultNormalize | No | Specifies whether to normalize the value. | false |
| maxItemNumber | No | The maximum number of users who use an item for the calculation. Note: If the number of occurrences of an item is greater than this value, the algorithm randomly selects the maximum number of users based on the total number of users. | 1000 |
| minUserItems | No | The minimum number of items used by a user for the calculation. Note: If the number of items used by a user for the calculation is less than this value, the user is not included in the calculation. | 10 |
| maxUserItems | No | The maximum number of items used by a user for the calculation. Note: If the number of items used by a user for the calculation is greater than this value, the user is not included in the calculation. | 1000 |
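The user weight formula given for userAlpha and userBeta can be evaluated directly. The following sketch simply transcribes that formula into Python. The function name `user_weight` is hypothetical, and the defaults mirror the values listed in the table.

```python
def user_weight(user_click_count, user_alpha=5.0, user_beta=-0.35):
    """User weight = 1.0 / (userAlpha + userClickCount) ^ userBeta."""
    return 1.0 / (user_alpha + user_click_count) ** user_beta


# Compare the weight of a light user and a heavy user under the defaults.
light = user_weight(2)
heavy = user_weight(200)
```

Because the default userBeta is negative, the formula as written is equivalent to (userAlpha + userClickCount)^0.35, so the weight grows with the click count. A positive userBeta reverses this and gives more active users a smaller weight.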

Sample Python code:

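# The imports are listed so that the sample is self-contained.
import pandas as pd
from pyalink.alink import *

# Sample interactions: (user, item, rating).
# The rating column is not referenced by the operators below.
df_data = pd.DataFrame([
    ["a1", "11L", 2.2],
    ["a1", "12L", 2.0],
    ["a2", "11L", 2.0],
    ["a2", "12L", 2.0],
    ["a3", "12L", 2.0],
    ["a3", "13L", 2.0],
    ["a4", "13L", 2.0],
    ["a4", "14L", 2.0],
    ["a5", "14L", 2.0],
    ["a5", "15L", 2.0],
    ["a6", "15L", 2.0],
    ["a6", "16L", 2.0],
])

# Convert the pandas DataFrame to an Alink batch operator.
data = BatchOperator.fromDataframe(df_data, schemaStr='user string, item string, rating double')

# Train the Swing model. minUserItems is lowered from the default of 10 to 1
# because each sample user has only two items.
model = SwingTrainBatchOp()\
    .setUserCol("user")\
    .setItemCol("item")\
    .setMinUserItems(1)\
    .linkFrom(data)

model.print()

# Recommend similar items for each item in the input data.
predictor = SwingRecommBatchOp()\
    .setItemCol("item")\
    .setRecommCol("prediction_result")

predictor.linkFrom(model, data).print()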
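To build intuition for what a Swing model measures, the following pure-Python sketch scores item pairs on the same sample interactions. It assumes the commonly cited Swing formulation, in which every pair of users who both used items i and j contributes w_u * w_v / (alpha + number of items the two users share). This formulation, the function name `swing_scores`, and the absence of normalization are assumptions made for illustration. The PAI component may compute its scores differently.

```python
from collections import defaultdict
from itertools import combinations


def swing_scores(pairs, alpha=1.0, user_alpha=5.0, user_beta=-0.35):
    """Illustrative Swing-style item similarity over (user, item) pairs."""
    user_items = defaultdict(set)
    for user, item in pairs:
        user_items[user].add(item)

    # User weight, using the formula from the parameter table.
    weight = {
        user: 1.0 / (user_alpha + len(items)) ** user_beta
        for user, items in user_items.items()
    }

    scores = defaultdict(float)
    # Each unordered pair of users contributes to every item pair they share.
    for u, v in combinations(sorted(user_items), 2):
        common = user_items[u] & user_items[v]
        if len(common) < 2:
            continue
        contribution = weight[u] * weight[v] / (alpha + len(common))
        for i, j in combinations(sorted(common), 2):
            scores[(i, j)] += contribution
    return dict(scores)


# Same (user, item) pairs as the sample above, without the rating column.
interactions = [
    ("a1", "11L"), ("a1", "12L"), ("a2", "11L"), ("a2", "12L"),
    ("a3", "12L"), ("a3", "13L"), ("a4", "13L"), ("a4", "14L"),
    ("a5", "14L"), ("a5", "15L"), ("a6", "15L"), ("a6", "16L"),
]
scores = swing_scores(interactions)
```

Under this assumed formulation, only the pair ("11L", "12L") receives a score, because a1 and a2 are the only two users who share more than one item. Every other item pair is co-used by a single user at most.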

Examples

The following figure shows a sample pipeline in which the Swing Train component is used.

[Figure: Usage example]

In this example, the following steps are performed to configure the components in the preceding figure:

  1. Prepare a training dataset and a test dataset.

  2. Create two MaxCompute tables named Table 1 and Table 2. Table 1 contains the userid and itemid fields, and Table 2 contains the itemid field. The fields are of the STRING type. Run the tunnel command on the MaxCompute client to upload the training dataset to Table 1 and the test dataset to Table 2. Then, set the Table Name parameter of the Read Table-1 component to Table 1 and the Table Name parameter of the Read Table-2 component to Table 2. For information about how to install and configure the MaxCompute client, see MaxCompute client (odpscmd). For information about Tunnel commands, see Tunnel commands.

  3. Import the training dataset to the Swing Train component and configure the component parameters. For more information, see the Method 1: Configure the component in the PAI console section of this topic.

  4. Use the test dataset and the trained model as input to the Swing Recommendation component to perform prediction.