Platform For AI: Collaborative Filtering (etrec)

Last Updated: Feb 08, 2024

The Collaborative Filtering (etrec) component of Platform for AI (PAI) uses etrec, an item-based collaborative filtering algorithm. The algorithm takes two input columns, a user column and an item column, and returns the top N most similar items for each item.
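To make the idea of item-based collaborative filtering concrete, the following is a minimal Python sketch that computes a plain Jaccard similarity between items from (user, item) pairs and keeps the top N neighbors of each item. It is not the etrec implementation: the function name `top_n_similar_items` is hypothetical, and the exact formulas that etrec uses for wbcosine, asymcosine, and jaccard are not described on this page.

```python
from collections import defaultdict
from itertools import combinations


def top_n_similar_items(pairs, top_n=2):
    """Return {item: [(other_item, jaccard), ...]} from (user, item) pairs.

    Illustrative only: plain Jaccard on user sets, not the etrec algorithm.
    """
    # Collect the set of users that interacted with each item.
    users_by_item = defaultdict(set)
    for user, item in pairs:
        users_by_item[item].add(user)

    similar = defaultdict(list)
    for a, b in combinations(sorted(users_by_item), 2):
        shared = users_by_item[a] & users_by_item[b]
        if not shared:
            continue  # items with no common user get no entry
        union = users_by_item[a] | users_by_item[b]
        score = len(shared) / len(union)
        similar[a].append((b, score))
        similar[b].append((a, score))

    # Keep only the top_n most similar items per item (ties broken by item id).
    return {
        item: sorted(neighbors, key=lambda kv: (-kv[1], kv[0]))[:top_n]
        for item, neighbors in similar.items()
    }


# Two users who both interacted with items "0" and "1".
pairs = [("0", "0"), ("0", "1"), ("1", "0"), ("1", "1")]
result = top_n_similar_items(pairs)
print(result)
```

Both items are used by exactly the same set of users, so each item's only neighbor is the other item, with a similarity of 1.0.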

Configure the component

You can use one of the following methods to configure the Collaborative Filtering (etrec) component.

Method 1: Configure the component in the PAI console

You can configure the parameters of the Collaborative Filtering (etrec) component on the pipeline page of Machine Learning Designer of PAI. The following table describes the parameters.

| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | User Column | The name of the user column. |
| Fields Setting | Item Column | The name of the item column. |
| Fields Setting | Delimiter between items in the output table | Specify the delimiter between items in the output table. The default delimiter is a space. |
| Fields Setting | Delimiter between key-value in the output table | The delimiter that is used to separate keys and values in the output table. The default delimiter is a colon (:). Spaces are not supported. |
| Parameters Setting | Similarity Type | The type of similarity. Valid values: WbCosine, asymcosine, and Jaccard. |
| Parameters Setting | TopN | The maximum number of similar items that can be reserved in the output table. |
| Parameters Setting | Calculation Behavior | The method used to calculate the payload when an item of a user appears multiple times. Valid values: Add, Mul, Min, and Max. Note: This parameter is about to be phased out and does not affect the current training result. |
| Parameters Setting | Minimum Item Value | If the number of items of a user is less than the value of this parameter, the behavior of the user is ignored. |
| Parameters Setting | Maximum Item Value | If the number of items of a user is greater than the value of this parameter, the behavior of the user is ignored. |
| Parameters Setting | Smoothing Factor | This parameter is valid only if the Similarity Type parameter is set to asymcosine. |
| Parameters Setting | Weighting Coefficient | This parameter is valid only if the Similarity Type parameter is set to asymcosine. |
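The effect of the Minimum Item Value and Maximum Item Value parameters can be sketched as a pre-filter on the input records. The Python snippet below is an illustration under assumptions, not the component's code: the function name `filter_users` is hypothetical, and it assumes that "the number of items of a user" means the number of distinct items, which this page does not specify.

```python
from collections import defaultdict


def filter_users(pairs, min_items=2, max_items=500):
    """Drop all records of users whose item count is outside [min_items, max_items].

    Assumption: the count is the number of distinct items per user.
    """
    items_by_user = defaultdict(set)
    for user, item in pairs:
        items_by_user[user].add(item)

    kept_users = {
        user
        for user, items in items_by_user.items()
        if min_items <= len(items) <= max_items
    }
    # Preserve the original record order for the users that remain.
    return [(user, item) for user, item in pairs if user in kept_users]


records = [("a", "x"), ("b", "x"), ("b", "y")]
filtered = filter_users(records, min_items=2, max_items=500)
print(filtered)
```

In this example, user `a` has only one item and is ignored, while both records of user `b` are kept.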

Method 2: Configure the parameters by using PAI commands

You can configure the component by using a PAI command, and you can use SQL scripts to call PAI commands. For more information, see SQL Script. The following sample command is followed by a table that describes the parameters.

PAI -name pai_etrec
    -project algo_public
    -DsimilarityType="wbcosine"
    -Dweight="1"
    -DminUserBehavior="2"
    -Dlifecycle="28"
    -DtopN="2000"
    -Dalpha="0.5"
    -DoutputTableName="etrec_test_result"
    -DmaxUserBehavior="500"
    -DinputTableName="etrec_test_input"
    -Doperator="add"
    -DuserColName="user"
    -DitemColName="item"

| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| inputTableName | Yes | The name of the input table. | None |
| userColName | Yes | The name of the user column in the input table. | None |
| itemColName | Yes | The name of the item column in the input table. | None |
| inputTablePartitions | No | The partitions that are selected from the input table for training. | Full table |
| outputTableName | Yes | The name of the output table. | None |
| outputTablePartition | No | The partitions in the output table. | None |
| similarityType | No | The type of similarity. Valid values: wbcosine, asymcosine, and jaccard. | wbcosine |
| topN | No | The number of items with the largest similarity that can be reserved in the output table. Valid values: 1 to 10000. | 2000 |
| minUserBehavior | No | The minimum number of user behavior records. | 2 |
| maxUserBehavior | No | The maximum number of user behavior records. | 500 |
| itemDelimiter | No | The delimiter that is used to separate items in the output table. | Space |
| kvDelimiter | No | The delimiter that is used to separate keys and values in the output table. | Colon (:) |
| alpha | No | The smoothing factor when the similarityType parameter is set to asymcosine. Valid values: (0,1). | 0.5 |
| weight | No | The weighting coefficient when the similarityType parameter is set to asymcosine. | 1.0 |
| operator | No | The method used to calculate the payload when an item of a user appears multiple times. Valid values: add, mul, min, and max. | add |
| lifecycle | No | The lifecycle of the output table. | 1 |
| coreNum | No | The number of cores. | Determined by the system |
| memSizePerCore | No | The memory size of each core. Unit: MB. | Determined by the system |
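If you generate PAI commands from a script, it can help to check the parameters against the documented ranges before you submit the command. The following Python sketch renders a pai_etrec command string from a dictionary. The helper `build_etrec_command` is hypothetical and is not part of PAI. It only formats text and checks the valid values listed in the preceding table; it does not submit anything.

```python
VALID_SIMILARITY = {"wbcosine", "asymcosine", "jaccard"}
VALID_OPERATORS = {"add", "mul", "min", "max"}
REQUIRED = ("inputTableName", "userColName", "itemColName", "outputTableName")


def build_etrec_command(params):
    """Validate params against the documented ranges and render a PAI command."""
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    if params.get("similarityType", "wbcosine") not in VALID_SIMILARITY:
        raise ValueError("similarityType must be wbcosine, asymcosine, or jaccard")
    if params.get("operator", "add") not in VALID_OPERATORS:
        raise ValueError("operator must be add, mul, min, or max")
    if not 1 <= int(params.get("topN", 2000)) <= 10000:
        raise ValueError("topN must be in the range 1 to 10000")
    if not 0 < float(params.get("alpha", 0.5)) < 1:
        raise ValueError("alpha must be in the open interval (0, 1)")

    # Render one -D<name>="<value>" line per parameter, in insertion order.
    lines = ["PAI -name pai_etrec", "    -project algo_public"]
    lines += [f'    -D{name}="{value}"' for name, value in params.items()]
    return "\n".join(lines) + ";"


command = build_etrec_command(
    {
        "inputTableName": "etrec_test_input",
        "userColName": "user",
        "itemColName": "item",
        "outputTableName": "etrec_test_result",
        "similarityType": "wbcosine",
        "topN": 10,
    }
)
print(command)
```

The rendered text can then be pasted into an SQL script in the same way as the sample command on this page.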

Examples

  1. Execute the following SQL statements to generate training data:

    drop table if exists etrec_test_input;
    create table etrec_test_input
    as
    select
        *
    from
    (
        select
            cast(0 as string) as user,
            cast(0 as string) as item
        from dual
        union all
        select
            cast(0 as string) as user,
            cast(1 as string) as item
        from dual
        union all
        select
            cast(1 as string) as user,
            cast(0 as string) as item
        from dual
        union all
        select
            cast(1 as string) as user,
            cast(1 as string) as item
        from dual
    ) a;

    A training data table named etrec_test_input is generated.

    | user | item |
    | --- | --- |
    | 0 | 0 |
    | 0 | 1 |
    | 1 | 0 |
    | 1 | 1 |

  2. Run the following PAI command to submit the training job:

    drop table if exists etrec_test_result;
    PAI -name pai_etrec
        -project algo_public
        -DsimilarityType="wbcosine"
        -Dweight="1"
        -DminUserBehavior="2"
        -Dlifecycle="28"
        -DtopN="2000"
        -Dalpha="0.5"
        -DoutputTableName="etrec_test_result"
        -DmaxUserBehavior="500"
        -DinputTableName="etrec_test_input"
        -Doperator="add"
        -DuserColName="user"
        -DitemColName="item";

  3. View the output table named etrec_test_result.

    | itemid | similarity |
    | --- | --- |
    | 0 | 1:1 |
    | 1 | 0:1 |
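Each similarity value in the output table is a list of item:score entries. Entries are separated by the item delimiter (a space by default), and each key is separated from its value by the key-value delimiter (a colon by default). The following Python sketch shows one way to parse such a value after you export the table. The function name `parse_similarity` is hypothetical, and the sketch assumes that every score is a number.

```python
def parse_similarity(cell, item_delimiter=" ", kv_delimiter=":"):
    """Split a similarity cell such as "1:1" into (item, score) pairs."""
    pairs = []
    for entry in cell.split(item_delimiter):
        if not entry:
            continue  # skip empty fragments from repeated delimiters
        # Split on the last key-value delimiter so the score is always the tail.
        item, score = entry.rsplit(kv_delimiter, 1)
        pairs.append((item, float(score)))
    return pairs


# Rows of etrec_test_result from the example above: itemid -> similarity.
rows = {"0": "1:1", "1": "0:1"}
parsed = {item: parse_similarity(cell) for item, cell in rows.items()}
print(parsed)
```

If you set itemDelimiter or kvDelimiter to other values in the command, pass the same values to the parser.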