Simple Log Service: Frequent pattern mining functions

Last updated: Sep 24, 2025

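Frequent pattern mining functions analyze multidimensional data, extract the attribute combinations that differ significantly, and quantify their impact. Several parameters are available for tuning the mining results.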

get_patterns

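get_patterns is the operator for frequent itemset mining. In addition to mining frequent items, get_patterns merges and deduplicates the frequent items it finds. It is used specifically to extract templates (frequent itemsets) from tabular data.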

Syntax

get_patterns($TABLE, $HEADER, $PARAM)

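Parameters

| Parameter | Data type | Required | Description |
|-----------|-----------|----------|-------------|
| $TABLE | row<array<T>, array<E>, ..., array<F>> | | The input table to mine for frequent items. Each column is one dimension to mine. |
| $HEADER | array<varchar> | | The column names for $TABLE. The number of names must equal the number of columns in $TABLE. |
| $PARAM | varchar | | See "param parameters". |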
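To make the shape of $TABLE and $HEADER concrete, here is a minimal Python sketch of the column-wise layout: one array per dimension, plus a header with one name per column. The helper names `to_columnar` and `check_table` are hypothetical and are not part of Simple Log Service. The equal-length check on the columns is an assumption, not something the reference states.

```python
from typing import Dict, List, Sequence, Tuple


def to_columnar(records: Sequence[Dict[str, str]], header: Sequence[str]) -> Tuple[List[str], ...]:
    """Turn row-oriented records into one array per dimension, in header order."""
    return tuple([record[name] for record in records] for name in header)


def check_table(table: Sequence[Sequence[str]], header: Sequence[str]) -> None:
    """Raise ValueError if the header does not match the table's columns."""
    # Stated in the reference: one header name per table column.
    if len(table) != len(header):
        raise ValueError(f"{len(header)} header names for {len(table)} columns")
    # Assumption (not stated in the reference): all columns have the same length.
    if len({len(column) for column in table}) > 1:
        raise ValueError("columns differ in length")


records = [
    {"platform": "Netflix", "region": "Asia"},
    {"platform": "PC", "region": "Europe"},
]
header = ["platform", "region"]
table = to_columnar(records, header)
check_table(table, header)
```

In a query, `array_agg` over each dimension followed by `row(...)` plays the role of `to_columnar`.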

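param parameters

| Parameter | Description | Type | Required | Default | Value range |
|-----------|-------------|------|----------|---------|-------------|
| minimum_support_fraction | The minimum support that an output pattern must have in the test group. For example, if a pattern occurs in the test group with a frequency of 0.1, its support is 0.1. Adjusting this threshold changes how sensitive the mining is and therefore how many patterns are returned. | double | | 0.05 | (0, 1) |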
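The effect of minimum_support_fraction can be illustrated with a small Python sketch that computes the support of single-column patterns and drops those below the threshold. This only illustrates the definition of support given above. It does not reproduce how get_patterns mines, merges, or deduplicates multi-column combinations. The function names are hypothetical, and treating the threshold as inclusive is an assumption.

```python
from collections import Counter
from typing import Dict, List


def single_item_supports(column_name: str, values: List[str]) -> Dict[str, float]:
    """Support of each 'column=value' pattern: its share of all rows."""
    total = len(values)
    counts = Counter(values)
    return {f"{column_name}={value}": count / total for value, count in counts.items()}


def filter_by_support(supports: Dict[str, float], minimum_support_fraction: float = 0.05) -> Dict[str, float]:
    """Keep patterns whose support reaches the threshold (inclusive by assumption)."""
    return {p: s for p, s in supports.items() if s >= minimum_support_fraction}


# Illustrative data: 6 of 10 rows are Netflix, 3 are PC, 1 is VR.
values = ["Netflix"] * 6 + ["PC"] * 3 + ["VR"] * 1
supports = single_item_supports("platform", values)
kept = filter_by_support(supports, minimum_support_fraction=0.2)
print(sorted(kept))  # ['platform=Netflix', 'platform=PC']
```

A higher threshold returns fewer, more dominant patterns. A lower one also surfaces rare values such as `platform=VR` here.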

Example

  • Query and analysis:

    About "set session enable_remote_functions=true": this feature is currently in public preview, so the flag must be added manually. Later versions will remove this requirement and enable the feature automatically.
    (*)| set session enable_remote_functions=true;
    with t0 as (
      select
        JSON_EXTRACT_SCALAR(entity, '$.platform') as platform,
        JSON_EXTRACT_SCALAR(entity, '$.region') as region,
        cast(value as double) as value,
        if(cast(value as double) > 100, 'true', 'false') as anomaly_label
      from log
    ),
    t1 as (
      select
        array_agg(platform) as platform,
        array_agg(region) as region,
        array_agg(anomaly_label) as anomaly_label,
        array_agg(value) as value
      from t0
    ),
    t2 as (select row(platform, region) as table_row from t1),
    t3 as (select get_patterns(table_row, ARRAY['platform', 'region']) as ret from t2)
    select * from t3
  • Output:

    [["platform=eBay","platform=edX","platform=Amazon","platform=Skillshare","platform=Shopify","platform=Khan Academy","platform=Coursera","platform=Udemy","platform=Alibaba","platform=Taobao","platform=Snapchat","platform=Amazon Prime","platform=YouTube","platform=Hulu","platform=Peloton","platform=Twitter","platform=Fitbit","platform=Nike Training","platform=LinkedIn","platform=Instagram","platform=Disney+","platform=Strava","platform=MyFitnessPal","platform=Facebook","platform=Netflix","platform=Console","platform=Samsung SmartThings","platform=Apple HomeKit","platform=Mobile","platform=PC","platform=Google Home","platform=VR"],[156960,149760,148320,148320,146880,145440,139680,136800,133920,133920,96480,95040,92160,90720,90720,89280,89280,87840,84960,83520,83520,82080,82080,77760,70560,46080,41760,41760,34560,33120,31680,30240],null,null]

Return values

| Field | Type | Description | Sample value |
|-------|------|-------------|--------------|
| $RET.patterns | array<varchar> | The table templates (frequent itemsets). Each varchar is an expression joined with AND, such as "\"platform\"='Netflix' AND \"region\"='Asia'". The varchar entries are independent alternatives. | ["platform=Coursera","platform=Udemy","platform=Khan Academy","platform=","platform=Shopify","platform=Skillshare","platform=edX","platform=eBay","platform=Console","platform=Square","platform=Taobao","platform=Google Meet","platform=E*TRADE","platform=Skype","platform=PayPal","platform=Robinhood","platform=Microsoft Teams","platform=Webex","platform=Zoom","platform=Mobile","platform=Alibaba","platform=VR","platform=Stripe","platform=PC","platform=Amazon","platform=Snapchat","platform=Instagram","platform=Twitter","platform=LinkedIn","platform=Strava","platform=Facebook","platform=Apple HomeKit","platform=Google Home","platform=Amazon Prime"] |
| $RET.test_supports | array<bigint> | The number of times each template occurs. | [79200,74880,74880,72000,67680,66240,64800,61920,61920,61920,60480,60480,59040,59040,57600,57600,57600,56160,54720,53280,51840,51840,51840,48960,46080,37440,33120,25920,25920,24480,21600,21600,21600,18720] |
| $RET.labels | array<bigint> | A reserved return slot intended for automatic data categorization in the future. Currently always null. | null |
| $RET.error_msg | array<varchar>/null | The error message. null if no error occurred. | null |
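A client that receives this result might pair each pattern with its count before using it. The Python sketch below assumes the result arrives as a JSON array of four elements in the order patterns, test_supports, labels, error_msg, which is how the sample output is printed. That serialization and the function name `parse_get_patterns_result` are assumptions for illustration. The input string is an abbreviated copy of the sample output.

```python
import json
from typing import List, Tuple


def parse_get_patterns_result(raw: str) -> List[Tuple[str, int]]:
    """Pair each pattern with its occurrence count; fail loudly on a reported error."""
    # Assumed order, matching the sample output: patterns, supports, labels, error.
    patterns, test_supports, _labels, error_msg = json.loads(raw)
    if error_msg is not None:
        raise RuntimeError(f"get_patterns reported an error: {error_msg}")
    return list(zip(patterns, test_supports))


# First two entries of the sample output, shortened for readability.
raw = '[["platform=eBay","platform=edX"],[156960,149760],null,null]'
pairs = parse_get_patterns_result(raw)
for pattern, count in pairs:
    print(pattern, count)
```

The labels slot is ignored here because it is documented as always null for now.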