The Collaborative Filtering (etrec) component uses etrec, an item-based collaborative filtering algorithm. It takes two input columns, a user column and an item column, and outputs for each item the top N items with the highest similarity.
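To make the idea concrete, the following is a minimal Python sketch of item-based collaborative filtering. It assumes plain Jaccard similarity between the sets of users who interacted with each item. It is not the etrec implementation: the function name `top_n_similar_items` is hypothetical, and the wbcosine and asymcosine variants are not covered.

```python
from collections import defaultdict
from itertools import combinations


def top_n_similar_items(pairs, top_n=2000):
    """Return {item: [(other_item, jaccard), ...]} sorted by similarity.

    pairs: iterable of (user, item) tuples, mirroring the two input columns.
    Illustrative only; this is an assumption, not the etrec internals.
    """
    # Collect the set of users that interacted with each item.
    users_by_item = defaultdict(set)
    for user, item in pairs:
        users_by_item[item].add(user)

    result = defaultdict(list)
    for a, b in combinations(sorted(users_by_item), 2):
        common = users_by_item[a] & users_by_item[b]
        if not common:
            continue  # items with no shared users get no similarity entry
        union = users_by_item[a] | users_by_item[b]
        score = len(common) / len(union)  # Jaccard: |A ∩ B| / |A ∪ B|
        result[a].append((b, score))
        result[b].append((a, score))

    # Keep only the top_n most similar items for each item.
    return {
        item: sorted(neighbors, key=lambda kv: (-kv[1], kv[0]))[:top_n]
        for item, neighbors in result.items()
    }


# Two users who both interacted with items "0" and "1".
rows = [("0", "0"), ("0", "1"), ("1", "0"), ("1", "1")]
print(top_n_similar_items(rows))  # {'0': [('1', 1.0)], '1': [('0', 1.0)]}
```

Because both items share exactly the same users, each is the other's only neighbor, with a similarity of 1.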
Configure the component
You can configure the component by using one of the following methods:
- Use the Machine Learning Platform for AI console
  | Tab | Parameter | Description |
  | --- | --- | --- |
  | Fields Setting | User Column | The name of the user column. |
  | Fields Setting | Item Column | The name of the item column. |
  | Parameters Setting | Similarity Type | The type of similarity. Valid values: WbCosine, asymcosine, and Jaccard. |
  | Parameters Setting | TopN | The maximum number of similar items that can be reserved in the output table. |
  | Parameters Setting | Calculation Method | The method used to calculate the payload when an item of a user appears multiple times. Valid values: Add, Mul, Min, and Max. Note: This parameter is about to be phased out. It does not affect the current training result. |
  | Parameters Setting | Minimum Item Value | If the number of items of a user is less than the value of this parameter, the behavior of the user is ignored. |
  | Parameters Setting | Maximum Items | If the number of items of a user is larger than the value of this parameter, the behavior of the user is ignored. |
  | Parameters Setting | Smoothing Factor | This parameter is valid only if the Similarity Type parameter is set to asymcosine. |
  | Parameters Setting | Weighting Coefficient | This parameter is valid only if the Similarity Type parameter is set to asymcosine. |
- Use commands
  PAI -name pai_etrec
      -project algo_public
      -DsimilarityType="wbcosine"
      -Dweight="1"
      -DminUserBehavior="2"
      -Dlifecycle="28"
      -DtopN="2000"
      -Dalpha="0.5"
      -DoutputTableName="etrec_test_result"
      -DmaxUserBehavior="500"
      -DinputTableName="etrec_test_input"
      -Doperator="add"
      -DuserColName="user"
      -DitemColName="item"
  | Parameter | Required | Description | Default value |
  | --- | --- | --- | --- |
  | inputTableName | Yes | The name of the input table. | N/A |
  | userColName | Yes | The name of the user column in the input table. | N/A |
  | itemColName | Yes | The name of the item column in the input table. | N/A |
  | inputTablePartitions | No | The partitions that are selected from the input table for training. | Full table |
  | outputTableName | Yes | The name of the output table. | N/A |
  | outputTablePartition | No | The partitions in the output table. | N/A |
  | similarityType | No | The type of similarity. Valid values: wbcosine, asymcosine, and jaccard. | wbcosine |
  | topN | No | The number of items with the largest similarity that can be reserved in the output table. Valid values: 1 to 10000. | 2000 |
  | minUserBehavior | No | The minimum number of user behavior records. | 2 |
  | maxUserBehavior | No | The maximum number of user behavior records. | 500 |
  | itemDelimiter | No | The delimiter that is used to separate items in the output table. | Space |
  | kvDelimiter | No | The delimiter that is used to separate keys and values in the output table. | : |
  | alpha | No | The smoothing factor when the similarityType parameter is set to asymcosine. | 0.5 |
  | weight | No | The weight index when the similarityType parameter is set to asymcosine. | 1.0 |
  | operator | No | The method used to calculate the payload when an item of a user appears multiple times. Valid values: add, mul, min, and max. | add |
  | lifecycle | No | The lifecycle of the output table. | 1 |
  | coreNum | No | The number of cores. | Determined by the system |
  | memSizePerCore | No | The memory size of each core. | Determined by the system |
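The constraints listed for these parameters can be checked before a command is submitted. The following Python sketch shows one way to do that. `build_etrec_command` is a hypothetical helper, not part of any PAI SDK. It only assembles the command text, using the parameter names, valid values, and ranges from the parameter descriptions, and it does not submit anything.

```python
VALID_SIMILARITY_TYPES = {"wbcosine", "asymcosine", "jaccard"}
VALID_OPERATORS = {"add", "mul", "min", "max"}
REQUIRED = ("inputTableName", "userColName", "itemColName", "outputTableName")


def build_etrec_command(**params):
    """Assemble a pai_etrec command string from keyword parameters.

    Hypothetical helper: it checks only the constraints documented for the
    command parameters and returns the command as text.
    """
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")

    similarity = params.get("similarityType", "wbcosine")
    if similarity not in VALID_SIMILARITY_TYPES:
        raise ValueError(f"unsupported similarityType: {similarity}")

    operator = params.get("operator", "add")
    if operator not in VALID_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")

    top_n = int(params.get("topN", 2000))
    if not 1 <= top_n <= 10000:
        raise ValueError("topN must be between 1 and 10000")

    # Emit one -Dname="value" flag per parameter, in the order given.
    flags = " ".join(f'-D{name}="{value}"' for name, value in params.items())
    return f"PAI -name pai_etrec -project algo_public {flags};"


cmd = build_etrec_command(
    inputTableName="etrec_test_input",
    userColName="user",
    itemColName="item",
    outputTableName="etrec_test_result",
    topN=2000,
)
print(cmd)
```

Failing early on an out-of-range topN or a misspelled similarity type is cheaper than waiting for a submitted job to be rejected.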
Example
- Execute the following SQL statements to generate training data:
  drop table if exists etrec_test_input;
  create table etrec_test_input as
  select * from
  (
    select cast(0 as string) as user, cast(0 as string) as item from dual
    union all
    select cast(0 as string) as user, cast(1 as string) as item from dual
    union all
    select cast(1 as string) as user, cast(0 as string) as item from dual
    union all
    select cast(1 as string) as user, cast(1 as string) as item from dual
  ) a;

  The training data table etrec_test_input is generated.

  | user | item |
  | --- | --- |
  | 0 | 0 |
  | 0 | 1 |
  | 1 | 0 |
  | 1 | 1 |
- Run the following PAI command to submit training parameters:
  drop table if exists etrec_test_result;
  PAI -name pai_etrec
      -project algo_public
      -DsimilarityType="wbcosine"
      -Dweight="1"
      -DminUserBehavior="2"
      -Dlifecycle="28"
      -DtopN="2000"
      -Dalpha="0.5"
      -DoutputTableName="etrec_test_result"
      -DmaxUserBehavior="500"
      -DinputTableName="etrec_test_input"
      -Doperator="add"
      -DuserColName="user"
      -DitemColName="item";
- View the result output table etrec_test_result.
  | itemid | similarity |
  | --- | --- |
  | 0 | 1:1 |
  | 1 | 0:1 |
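Each similarity cell packs the similar items and their scores into one string. The itemDelimiter parameter (a space by default) separates entries, and the kvDelimiter parameter (a colon by default) separates an item from its score. The following Python sketch shows one way to unpack such a cell on the client side. `parse_similarity` is a hypothetical helper, and the two-entry sample value is made up for illustration.

```python
def parse_similarity(value, item_delimiter=" ", kv_delimiter=":"):
    """Split a similarity cell into a list of (item, score) pairs."""
    pairs = []
    for entry in value.split(item_delimiter):
        if not entry:
            continue  # tolerate repeated or trailing delimiters
        # Split on the last delimiter so an item ID containing it stays whole.
        item, _, score = entry.rpartition(kv_delimiter)
        pairs.append((item, float(score)))
    return pairs


print(parse_similarity("1:1"))          # [('1', 1.0)]
print(parse_similarity("1:0.8 2:0.5"))  # [('1', 0.8), ('2', 0.5)]
```

If the job is run with custom delimiters, pass the same values as `item_delimiter` and `kv_delimiter`.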