Frequent pattern mining functions analyze multidimensional data to extract attribute combinations with significant differences and quantify their impact. You can configure parameters to optimize mining results.
get_patterns
get_patterns is a frequent itemset mining operator. It mines frequent items, merges and deduplicates the results, and extracts templates (frequent itemsets) from tabular data.
Syntax
get_patterns($TABLE, $HEADER, $PARAM)
Parameters
|
Parameter |
Data type |
Required |
Description |
|
$TABLE |
row<array<T>, array<E>, ..., array<F>> |
Yes |
The input table for frequent item mining. Each column is a dimension column to be mined. |
|
$HEADER |
array<varchar> |
Yes |
The column names, corresponding to the columns in $TABLE. The number of header names must match the number of columns. |
|
$PARAM |
varchar |
No |
For more information, see PARAM parameter description. |
PARAM parameter description
|
Parameter |
Description |
Parameter type |
Required |
Default value |
Value range |
|
minimum_support_fraction |
The minimum support threshold for output patterns in the test group. For example, a pattern that appears with a frequency of 0.1 has a support of 0.1. Adjust this value to control the number of patterns returned. |
double |
No |
0.05 |
(0, 1) |
Examples
-
Query analysis:
This feature is in public preview. You must manually add the `set session enable_remote_functions=true` statement. This requirement will be removed in a future version.
(*)| set session enable_remote_functions=true ; with t0 as (select JSON_EXTRACT_SCALAR(entity, '$.platform') AS platform, JSON_EXTRACT_SCALAR(entity, '$.region') AS region, cast(value as double) as value, if((value > 100), 'true', 'false') as anomaly_label from log), t1 as ( select array_agg(platform) as platform, array_agg(region) as region, array_agg(anomaly_label) as anomaly_label, array_agg(value) as value from t0), t2 as (select row(platform, region) as table_row from t1), t3 as (select get_patterns(table_row, ARRAY['platform', 'region']) as ret from t2) select * from t3 -
Outputs:
[["platform=eBay","platform=edX","platform=Amazon","platform=Skillshare","platform=Shopify","platform=Khan Academy","platform=Coursera","platform=Udemy","platform=Alibaba","platform=Taobao","platform=Snapchat","platform=Amazon Prime","platform=YouTube","platform=Hulu","platform=Peloton","platform=Twitter","platform=Fitbit","platform=Nike Training","platform=LinkedIn","platform=Instagram","platform=Disney+","platform=Strava","platform=MyFitnessPal","platform=Facebook","platform=Netflix","platform=Console","platform=Samsung SmartThings","platform=Apple HomeKit","platform=Mobile","platform=PC","platform=Google Home","platform=VR"],[156960,149760,148320,148320,146880,145440,139680,136800,133920,133920,96480,95040,92160,90720,90720,89280,89280,87840,84960,83520,83520,82080,82080,77760,70560,46080,41760,41760,34560,33120,31680,30240],null,null]
Return value description
|
Parameter |
Type |
Description |
Example |
|
$RET.patterns |
array<varchar> |
Frequent itemsets extracted as table templates. Each varchar is an expression with conditions connected by AND, such as |
|
|
$RET.test_supports |
array<bigint> |
The occurrence count of each template. |
|
|
$RET.labels |
array<bigint> |
Reserved for future automatic data classification. Currently always returns |
|
|
$RET.error_msg |
array<varchar>/null |
The error message. Returns |
|