OpenSearch: Custom ranking models

Last updated: Jul 13, 2024

This topic helps you try out and better understand the custom ranking model feature introduced in OpenSearch Industry Algorithm Edition.

Procedure

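  1. In Feature Management, create the field features described here. This example uses the system_item table; if a feature you need is not in system_item, first register the external MaxCompute table. For the title field, for example, create three features: the original value (custom_title), a lookup feature generated after tokenization (custom_title_match), and the token count after tokenization (custom_title_len). Handle other fields the same way and add more as your business requires. The fields used for CTR serve as an example: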

[Screenshot: field features created for the CTR example]
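The three title features from step 1 can be pictured with a small Python sketch. It is illustrative only: the whitespace tokenizer and the token-to-count layout of the lookup map are assumptions, and the real features are produced by OpenSearch Feature Management, not by this code.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Stand-in tokenizer (assumption): lowercase, then split on whitespace."""
    return text.lower().split()


def build_title_features(title: str) -> dict:
    """Derive the three example features for a title field."""
    tokens = tokenize(title)
    return {
        # Original value of the field.
        "custom_title": title,
        # Map keyed by token, for later lookup by query term.
        # Storing counts as the map values is an assumption.
        "custom_title_match": dict(Counter(tokens)),
        # Number of tokens after tokenization.
        "custom_title_len": len(tokens),
    }


features = build_title_features("Red running shoes red")
print(features)
```

The same pattern would apply to other fields, such as description or tags, under the same assumptions.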

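  2. Combine the built-in features from the system_internal table with the field features created in the previous step to generate features (feature engineering):

Taking the feature generation commonly used for CTR as an example, you can register the features in bulk through OpenAPI (CreateFunctionResource - create an algorithm resource):

Set ResourceType to feature_generator and fill in Data with the content below. (Note: every feature under input features whose name starts with custom_ must be prepared in advance; if one is missing, add it in step 1.)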

[
  {
    "input": {
      "features": [
        {
          "type": "user",
          "name": "system_raw_q_ultra"
        },
        {
          "type": "item",
          "name": "system_item_id"
        }
      ]
    },
    "generator": "combo",
    "output": "comb_q_nid"
  },
  {
    "input": {
      "features": [
        {
          "type": "user",
          "name": "system_user_id"
        },
        {
          "type": "item",
          "name": "system_item_id"
        }
      ]
    },
    "generator": "combo",
    "output": "comb_uid_nid"
  },
  {
    "input": {
      "features": [
        {
          "type": "user",
          "name": "system_user_id"
        },
        {
          "type": "item",
          "name": "custom_tags"
        }
      ]
    },
    "generator": "combo",
    "output": "comb_uid_tags"
  },
  {
    "input": {
      "features": [
        {
          "type": "user",
          "name": "system_raw_q_ultra"
        },
        {
          "type": "item",
          "name": "custom_tags"
        }
      ]
    },
    "generator": "combo",
    "output": "comb_q_tags"
  },
  {
    "input": {
      "features": [
        {
          "type": "user",
          "name": "system_exp_time"
        }
      ]
    },
    "generator": "id",
    "output": "exp_time"
  },
  {
    "input": {
      "features": [
        {
          "type": "user",
          "name": "system_terms2"
        }
      ]
    },
    "generator": "id",
    "output": "terms2"
  },
  {
    "input": {
      "features": [
        {
          "type": "user",
          "name": "system_raw_q_ultra"
        }
      ]
    },
    "generator": "id",
    "output": "raw_q_ultra"
  },
  {
    "input": {
      "features": [
        {
          "type": "user",
          "name": "system_user_id"
        }
      ]
    },
    "generator": "id",
    "output": "user_id"
  },
  {
    "input": {
      "features": [
        {
          "type": "item",
          "name": "system_item_id"
        }
      ]
    },
    "generator": "id",
    "output": "item_id"
  },
  {
    "input": {
      "features": [
        {
          "type": "item",
          "name": "custom_description"
        }
      ]
    },
    "generator": "id",
    "output": "description"
  },
    {
        "input": {
            "features": [
                {
                    "type": "item",
                    "name": "custom_desc_len"
                }
            ]
        },
        "generator": "id",
        "output": "desc_len"
    },
    {
        "input": {
            "features": [
                {
                    "type": "item",
                    "name": "custom_title"
                }
            ]
        },
        "generator": "id",
        "output": "title"
    },
    {
        "input": {
            "features": [
                {
                    "type": "item",
                    "name": "custom_title_len"
                }
            ]
        },
        "generator": "id",
        "output": "title_len"
    },
    {
        "input": {
            "features": [
                {
                    "type": "item",
                    "name": "custom_category"
                }
            ]
        },
        "generator": "id",
        "output": "category"
    },
    {
        "input": {
            "features": [
                {
                    "type": "item",
                    "name": "custom_tags"
                }
            ]
        },
        "generator": "id",
        "output": "tags"
    },
    {
        "input": {
            "features": [
                {
                    "type": "item",
                    "name": "system_all_nid_ctr_30"
                }
            ]
        },
        "generator": "id",
        "output": "all_nid_ctr_30"
    },
    {
        "input": {
            "features": [
                {
                    "type": "item",
                    "name": "system_all_nid_ctr_7"
                }
            ]
        },
        "generator": "id",
        "output": "all_nid_ctr_7"
    },
    {
        "input": {
            "features": [
                {
                    "type": "item",
                    "name": "system_all_nid_ctr_1"
                }
            ]
        },
        "generator": "id",
        "output": "all_nid_ctr_1"
    },
    {
        "input": {
            "features": [
                {
                    "type": "item",
                    "name": "system_all_nid_pv_30"
                }
            ]
        },
        "generator": "id",
        "output": "all_nid_pv_30"
    },
    {
        "input": {
            "features": [
                {
                    "type": "item",
                    "name": "system_all_nid_pv_7"
                }
            ]
        },
        "generator": "id",
        "output": "all_nid_pv_7"
    },
    {
        "input": {
            "features": [
                {
                    "type": "item",
                    "name": "system_all_nid_pv_1"
                }
            ]
        },
        "generator": "id",
        "output": "all_nid_pv_1"
    },
    {
        "input": {
            "features": [
                {
                    "type": "item",
                    "name": "system_all_nid_ipv_30"
                }
            ]
        },
        "generator": "id",
        "output": "all_nid_ipv_30"
    },
    {
        "input": {
            "features": [
                {
                    "type": "item",
                    "name": "system_all_nid_ipv_7"
                }
            ]
        },
        "generator": "id",
        "output": "all_nid_ipv_7"
    },
    {
        "input": {
            "features": [
                {
                    "type": "item",
                    "name": "system_all_nid_ipv_1"
                }
            ]
        },
        "generator": "id",
        "output": "all_nid_ipv_1"
    },
    {
        "input": {
            "features": [
                {
                    "role": "map",
                    "type": "item",
                    "name": "custom_title_match"
                },
                {
                    "role": "key",
                    "type": "user",
                    "name": "system_terms2"
                }
            ]
        },
        "generator": "lookup",
        "output": "term_title_match"
    },
    {
        "input": {
            "features": [
                {
                    "role": "map",
                    "type": "item",
                    "name": "custom_desc_match"
                },
                {
                    "role": "key",
                    "type": "user",
                    "name": "system_terms2"
                }
            ]
        },
        "generator": "lookup",
        "output": "term_desc_match"
    },
    {
        "input": {
            "features": [
                {
                    "role": "map",
                    "type": "item",
                    "name": "custom_tags_match"
                },
                {
                    "role": "key",
                    "type": "user",
                    "name": "system_terms2"
                }
            ]
        },
        "generator": "lookup",
        "output": "term_tags_match"
    },
    {
        "input": {
            "features": [
                {
                    "role": "map",
                    "type": "item",
                    "name": "system_qterm_match_decay"
                },
                {
                    "role": "key",
                    "type": "user",
                    "name": "system_terms2"
                }
            ]
        },
        "generator": "lookup",
        "output": "term_os_kw_match"
    },
    {
        "input": {
            "features": [
                {
                    "type": "item",
                    "name": "system_query_cnt"
                }
            ]
        },
        "generator": "id",
        "output": "opensearch_query_cnt"
    },
    {
        "input": {
            "features": [
                {
                    "type": "item",
                    "name": "system_qterm_cnt"
                }
            ]
        },
        "generator": "id",
        "output": "opensearch_qterm_cnt"
    },
    {
        "input": {
            "features": [
                {
                    "role": "map",
                    "type": "item",
                    "name": "system_query_ctr_decay"
                },
                {
                    "role": "key",
                    "type": "user",
                    "name": "system_raw_q_ultra"
                }
            ]
        },
        "generator": "lookup",
        "output": "os_q_ctr_decay"
    },
    {
        "input": {
            "features": [
                {
                    "role": "map",
                    "type": "item",
                    "name": "system_qterm_ctr_decay"
                },
                {
                    "role": "key",
                    "type": "user",
                    "name": "system_terms2"
                }
            ]
        },
        "generator": "lookup",
        "output": "os_term_ctr_decay"
    },
    {
        "input": {
            "features": [
                {
                    "role": "map",
                    "type": "item",
                    "name": "system_query_ctr_decay"
                },
                {
                    "role": "key",
                    "type": "user",
                    "name": "system_raw_q_ultra"
                }
            ]
        },
        "generator": "lookup",
        "output": "os_q_ctr_decay_nokey"
    },
    {
        "input": {
            "features": [
                {
                    "role": "map",
                    "type": "item",
                    "name": "system_qterm_ctr_decay"
                },
                {
                    "role": "key",
                    "type": "user",
                    "name": "system_terms2"
                }
            ]
        },
        "generator": "lookup",
        "output": "os_term_ctr_decay_nokey"
    },
    {
        "input": {
            "features": [
                {
                    "type": "item",
                    "name": "system_query_seq_decay"
                }
            ]
        },
        "generator": "id",
        "output": "os_q_seq_decay"
    },
    {
        "input": {
            "features": [
                {
                    "type": "item",
                    "name": "system_qterm_seq_decay"
                }
            ]
        },
        "generator": "id",
        "output": "os_term_seq_decay"
    },
    {
        "input": {
            "features": [
                {
                    "role": "query",
                    "type": "user",
                    "name": "system_terms2"
                },
                {
                    "role": "title",
                    "type": "item",
                    "name": "system_qterm_seq_decay"
                }
            ],
            "method": "query_common_ratio"
        },
        "generator": "overlap",
        "output": "os_qterm_q_common_ratio"
    },
    {
        "input": {
            "features": [
                {
                    "role": "query",
                    "type": "user",
                    "name": "system_terms2"
                },
                {
                    "role": "title",
                    "type": "item",
                    "name": "system_qterm_seq_decay"
                }
            ],
            "method": "title_common_ratio"
        },
        "generator": "overlap",
        "output": "os_qterm_title_common_ratio"
    },
    {
        "input": {
            "features": [
                {
                    "role": "query",
                    "type": "user",
                    "name": "system_terms2"
                },
                {
                    "role": "title",
                    "type": "item",
                    "name": "custom_title"
                }
            ],
            "method": "query_common_ratio"
        },
        "generator": "overlap",
        "output": "title_q_common_ratio"
    },
    {
        "input": {
            "features": [
                {
                    "role": "query",
                    "type": "user",
                    "name": "system_terms2"
                },
                {
                    "role": "title",
                    "type": "item",
                    "name": "custom_title"
                }
            ],
            "method": "title_common_ratio"
        },
        "generator": "overlap",
        "output": "title_title_common_ratio"
    },
    {
        "input": {
            "features": [
                {
                    "role": "query",
                    "type": "user",
                    "name": "system_terms2"
                },
                {
                    "role": "title",
                    "type": "item",
                    "name": "custom_description"
                }
            ],
            "method": "query_common_ratio"
        },
        "generator": "overlap",
        "output": "desc_q_common_ratio"
    },
    {
        "input": {
            "features": [
                {
                    "role": "query",
                    "type": "user",
                    "name": "system_terms2"
                },
                {
                    "role": "title",
                    "type": "item",
                    "name": "custom_description"
                }
            ],
            "method": "title_common_ratio"
        },
        "generator": "overlap",
        "output": "desc_title_common_ratio"
    },
    {
        "input": {
            "features": [
                {
                    "type": "user",
                    "name": "system_term_seq_length"
                }
            ],
            "dimension": 1
        },
        "generator": "raw",
        "output": "term_seq_length"
    }
]
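Because the list above is long and repetitive, it can help to generate the entries with code and to check the custom_ inputs before registering them. The following is a minimal sketch: the helper names (id_feature, combo_feature, lookup_feature, missing_custom_features) are hypothetical, and it only builds and checks the JSON. It does not call CreateFunctionResource.

```python
import json


def id_feature(ftype: str, name: str, output: str) -> dict:
    """Entry that passes one feature through the "id" generator."""
    return {
        "input": {"features": [{"type": ftype, "name": name}]},
        "generator": "id",
        "output": output,
    }


def combo_feature(user_name: str, item_name: str, output: str) -> dict:
    """Entry that crosses a user feature with an item feature ("combo")."""
    return {
        "input": {"features": [
            {"type": "user", "name": user_name},
            {"type": "item", "name": item_name},
        ]},
        "generator": "combo",
        "output": output,
    }


def lookup_feature(map_name: str, key_name: str, output: str) -> dict:
    """Entry that looks up a user-side key in an item-side map ("lookup")."""
    return {
        "input": {"features": [
            {"role": "map", "type": "item", "name": map_name},
            {"role": "key", "type": "user", "name": key_name},
        ]},
        "generator": "lookup",
        "output": output,
    }


def missing_custom_features(generators: list[dict], prepared: list[str]) -> list[str]:
    """Return custom_* input names that are not in the prepared list, sorted."""
    used = {
        feature["name"]
        for generator in generators
        for feature in generator["input"]["features"]
        if feature["name"].startswith("custom_")
    }
    return sorted(used - set(prepared))


generators = [
    combo_feature("system_user_id", "custom_tags", "comb_uid_tags"),
    id_feature("item", "custom_title", "title"),
    id_feature("item", "custom_title_len", "title_len"),
    lookup_feature("custom_title_match", "system_terms2", "term_title_match"),
]

# Features assumed to have been created in step 1 (hypothetical list).
prepared = ["custom_title", "custom_title_match", "custom_title_len"]

print(missing_custom_features(generators, prepared))  # custom_tags is missing
data = json.dumps(generators, indent=2)  # text to paste into Data
```

Here custom_tags is reported as missing, so it would have to be added in step 1 before the generators are registered.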

After the features are created, you can edit them on the corresponding page:

[Screenshot: editing the generated features]

Feature preparation is now complete. How the features are actually used must be specified in the model code.

  3. Create the model description:

Start from the Quick Start example and modify the list of features to use. embedding_columns is typically used.

[Screenshot: model description]
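When editing the feature list, one way to draft it is to collect the output names from the feature generation config. The sketch below assumes that the model description refers to features by their output name and that outputs of the raw generator are numeric while the rest are candidates for embedding columns. Neither assumption is confirmed here, and split_outputs is a hypothetical helper.

```python
def split_outputs(generators: list[dict]) -> tuple[list[str], list[str]]:
    """Split generator outputs into (raw outputs, all other outputs)."""
    raw, other = [], []
    for generator in generators:
        target = raw if generator["generator"] == "raw" else other
        target.append(generator["output"])
    return raw, other


# A few entries from the feature generation config, reduced to the
# two keys this helper reads.
generators = [
    {"generator": "id", "output": "user_id"},
    {"generator": "combo", "output": "comb_uid_nid"},
    {"generator": "lookup", "output": "term_title_match"},
    {"generator": "raw", "output": "term_seq_length"},
]

raw_outputs, embedding_candidates = split_outputs(generators)
print("embedding candidates:", embedding_candidates)
print("raw outputs:", raw_outputs)
```

The resulting lists are only a starting point; which features the model actually uses is still decided in the model description.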

  4. In the custom ranking model, specify the feature description and the model description to use.

[Screenshot: custom ranking model configuration]

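  5. After the model is created and trained successfully, use it in Cava in the same way as a CTR model (see: Custom ranking models). Before going live, you can verify the model's effect with an A/B test.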
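As a rough illustration of reading A/B test results, the sketch below compares the click-through rate of a baseline bucket and a bucket served by the new ranking model using a two-proportion z-test. The click and impression counts are made-up numbers for illustration, and this is a generic statistical check, not a feature of the OpenSearch A/B Test console.

```python
import math


def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate of one bucket."""
    return clicks / impressions


def two_proportion_z(clicks_a: int, imps_a: int, clicks_b: int, imps_b: int) -> float:
    """z statistic for the CTR difference (bucket B minus bucket A)."""
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    return (ctr(clicks_b, imps_b) - ctr(clicks_a, imps_a)) / std_err


# Hypothetical counts: (clicks, impressions) per bucket.
baseline = (500, 10_000)
candidate = (560, 10_000)

z = two_proportion_z(*baseline, *candidate)
print(f"baseline CTR={ctr(*baseline):.4f}, "
      f"candidate CTR={ctr(*candidate):.4f}, z={z:.2f}")
```

An absolute z value above roughly 1.96 corresponds to significance at the 5% level in a two-sided test; smaller values suggest collecting more traffic before drawing a conclusion.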