The Collaborative Filtering (etrec) component uses etrec, an item-based collaborative filtering algorithm. It takes two input columns, a user column and an item column, and outputs the top N most similar items for each item.
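
To make the idea of item-based similarity concrete, the following is a minimal Python sketch that uses the standard Jaccard coefficient. It is an illustration only, not the etrec implementation: the function name `top_n_similar_items` and the tie-breaking rule are assumptions, and the exact formulas that etrec applies for each similarity type are not described on this page.

```python
from collections import defaultdict
from itertools import combinations


def top_n_similar_items(pairs, top_n):
    """Return {item: [(other_item, jaccard), ...]} with at most top_n entries per item.

    pairs is an iterable of (user, item) tuples. Hypothetical helper for
    illustration; this is not the etrec implementation.
    """
    # Collect the set of users who interacted with each item.
    users_by_item = defaultdict(set)
    for user, item in pairs:
        users_by_item[item].add(user)

    similar = defaultdict(list)
    for a, b in combinations(sorted(users_by_item), 2):
        common = len(users_by_item[a] & users_by_item[b])
        if common == 0:
            continue  # Items with no shared users are not treated as similar.
        union = len(users_by_item[a] | users_by_item[b])
        score = common / union  # Standard Jaccard coefficient.
        similar[a].append((b, score))
        similar[b].append((a, score))

    # Keep the top_n most similar items; ties are broken by item ID (an assumption).
    return {
        item: sorted(others, key=lambda p: (-p[1], p[0]))[:top_n]
        for item, others in similar.items()
    }


# Two users who each interacted with the same two items.
pairs = [("0", "0"), ("0", "1"), ("1", "0"), ("1", "1")]
result = top_n_similar_items(pairs, top_n=2000)
```

With this input, both items are associated with the same two users, so each item has a single neighbor with a Jaccard score of 1.0.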

Configure the component

You can configure the component by using one of the following methods:
  • Use the Machine Learning Platform for AI console
    Tab | Parameter | Description
    Fields Setting | User Column | The name of the user column.
    Fields Setting | Item Column | The name of the item column.
    Parameters Setting | Similarity Type | The type of similarity. Valid values: WbCosine, asymcosine, and Jaccard.
    Parameters Setting | TopN | The maximum number of similar items that are retained in the output table.
    Parameters Setting | Calculation Method | The method used to calculate the payload when the same item appears multiple times for a user. Valid values: Add, Mul, Min, and Max. Note: This parameter is being phased out and does not affect the current training result.
    Parameters Setting | Minimum Item Value | If the number of items of a user is less than this value, the behavior of the user is ignored.
    Parameters Setting | Maximum Items | If the number of items of a user is greater than this value, the behavior of the user is ignored.
    Parameters Setting | Smoothing Factor | The smoothing factor. This parameter takes effect only if Similarity Type is set to asymcosine.
    Parameters Setting | Weighting Coefficient | The weighting coefficient. This parameter takes effect only if Similarity Type is set to asymcosine.
  • Use commands
    PAI -name pai_etrec
        -project algo_public
        -DsimilarityType="wbcosine"
        -Dweight="1"
        -DminUserBehavior="2"
        -Dlifecycle="28"
        -DtopN="2000"
        -Dalpha="0.5"
        -DoutputTableName="etrec_test_result"
        -DmaxUserBehavior="500"
        -DinputTableName="etrec_test_input"
        -Doperator="add"
        -DuserColName="user"
        -DitemColName="item"
    Parameter | Required | Description | Default value
    inputTableName | Yes | The name of the input table. | N/A
    userColName | Yes | The name of the user column in the input table. | N/A
    itemColName | Yes | The name of the item column in the input table. | N/A
    inputTablePartitions | No | The partitions that are selected from the input table for training. | Full table
    outputTableName | Yes | The name of the output table. | N/A
    outputTablePartition | No | The partitions in the output table. | N/A
    similarityType | No | The type of similarity. Valid values: wbcosine, asymcosine, and jaccard. | wbcosine
    topN | No | The maximum number of items with the highest similarity that are retained in the output table. Valid values: 1 to 10000. | 2000
    minUserBehavior | No | The minimum number of user behavior records. | 2
    maxUserBehavior | No | The maximum number of user behavior records. | 500
    itemDelimiter | No | The delimiter that is used to separate items in the output table. | Space
    kvDelimiter | No | The delimiter that is used to separate keys and values in the output table. | Colon (:)
    alpha | No | The smoothing factor that is used when similarityType is set to asymcosine. | 0.5
    weight | No | The weighting coefficient that is used when similarityType is set to asymcosine. | 1.0
    operator | No | The method used to calculate the payload when the same item appears multiple times for a user. Valid values: add, mul, min, and max. | add
    lifecycle | No | The lifecycle of the output table. | 1
    coreNum | No | The number of cores. | Determined by the system
    memSizePerCore | No | The memory size of each core. | Determined by the system
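
The minUserBehavior and maxUserBehavior thresholds can be pictured as a pre-filter on the input records. The following Python sketch illustrates that reading under stated assumptions and is not the component's code: the helper name `filter_users` is hypothetical, and it counts distinct items per user, whereas the component may count behavior records differently.

```python
from collections import defaultdict


def filter_users(pairs, min_user_behavior=2, max_user_behavior=500):
    """Drop all records of users whose item count falls outside the thresholds.

    Hypothetical helper; the defaults mirror the documented default values.
    """
    pairs = list(pairs)  # Allow any iterable, since the data is scanned twice.

    # Count the distinct items of each user (an assumption about what is counted).
    items_by_user = defaultdict(set)
    for user, item in pairs:
        items_by_user[user].add(item)

    kept_users = {
        user
        for user, items in items_by_user.items()
        if min_user_behavior <= len(items) <= max_user_behavior
    }
    return [(user, item) for user, item in pairs if user in kept_users]


# u2 has only one item, which is below the default minimum of 2.
records = [("u1", "a"), ("u1", "b"), ("u2", "a")]
filtered = filter_users(records)
```

In this sketch, all records of u2 are dropped because the user has fewer items than the minimum, while the records of u1 are kept.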

Example

  1. Execute the following SQL statements to generate training data:
    drop table if exists etrec_test_input;
    create table etrec_test_input
    as
    select
        *
    from
    (
        select
            cast(0 as string) as user,
            cast(0 as string) as item
        from dual
        union all
        select
            cast(0 as string) as user,
            cast(1 as string) as item
        from dual
        union all
        select
            cast(1 as string) as user,
            cast(0 as string) as item
        from dual
        union all
        select
            cast(1 as string) as user,
            cast(1 as string) as item
        from dual
    ) a;
    The training data table etrec_test_input is generated:
    user | item
    0 | 0
    0 | 1
    1 | 0
    1 | 1
  2. Run the following statements to drop any existing output table and submit the training job:
    drop table if exists etrec_test_result;
    PAI -name pai_etrec
        -project algo_public
        -DsimilarityType="wbcosine"
        -Dweight="1"
        -DminUserBehavior="2"
        -Dlifecycle="28"
        -DtopN="2000"
        -Dalpha="0.5"
        -DoutputTableName="etrec_test_result"
        -DmaxUserBehavior="500"
        -DinputTableName="etrec_test_input"
        -Doperator="add"
        -DuserColName="user"
        -DitemColName="item";
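  3. View the output table etrec_test_result. In the similarity column, entries are separated by itemDelimiter, and the key and value within each entry are separated by kvDelimiter.
    itemid | similarity
    0 | 1:1
    1 | 0:1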
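
Downstream code usually needs the similarity column as structured data. The following Python sketch parses one value of that column, assuming that each entry has the form item, kvDelimiter, score and that entries are separated by itemDelimiter. The function name `parse_similarity` is hypothetical, and the multi-entry string in the docstring is made-up sample input, not component output.

```python
def parse_similarity(value, item_delimiter=" ", kv_delimiter=":"):
    """Parse a similarity string such as "1:0.8 2:0.5" into (item, score) tuples.

    The defaults mirror the documented defaults of itemDelimiter and kvDelimiter.
    """
    parsed = []
    for entry in value.split(item_delimiter):
        if not entry:
            continue  # Skip empty fragments caused by repeated delimiters.
        # Split on the last delimiter so that item IDs may contain the delimiter.
        item, score = entry.rsplit(kv_delimiter, 1)
        parsed.append((item, float(score)))
    return parsed


# Rows of the example output table, keyed by itemid.
rows = {"0": "1:1", "1": "0:1"}
neighbors = {item: parse_similarity(sim) for item, sim in rows.items()}
```

For the example output, `neighbors` maps item 0 to a single neighbor, item 1, with a score of 1.0, and the reverse for item 1.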