The URL classification function groups similar URL request paths, assigns them a common pattern, and generates a corresponding regular expression for each group to simplify URL categorization. You can then use the query results for ETL operations.
The URL classification function is only available in the China (Beijing) and China (Shanghai) regions.
Syntax
select url_classify(url_path varchar);
select url_classify(url_path varchar, weight long);
Input parameters
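Parameter | Description
----------+----------------------------------------------------------------
url_path  | The URL request path.
weight    | Optional. The weight of the request path for classification,
          | for example, the number of times the path occurs.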
Output parameters
Parameter | Description
----------+---------------------------------------------------------
url_path  | The original URL request path.
api_path  | The generalized API endpoint pattern.
regex_tpl | A regular expression template that matches the pattern.
Output
url_path                     | api_path        | regex_tpl
-----------------------------+-----------------+-------------------------
/gl/balance/666398186799140  | /gl/balance/*   | \/gl\/balance\/[0-9].+
/gl/glaccount/30579281472076 | /gl/glaccount/* | \/gl\/glaccount\/[0-9].+
/gl/balance/709016207098025  | /gl/balance/*   | \/gl\/balance\/[0-9].+
Example
Query statement
* | select url_classify(uri, num) from (select uri, COUNT(*) as num from log group by uri limit 1000)
The query returns three columns: url_path (the original request path), api_path (the classified API path pattern, where an asterisk (*) replaces variable parts), and regex_tpl (the corresponding regular expression template). For example, the original path /v1/task/20200403_064500_63933_w69w5.2.28/results/17/1 is classified as /v1/task/*/results/17/1, and its corresponding regular expression template is \/v1\/task\/.*\/results\/17\/1.
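One way the returned templates might be used in a downstream ETL step is sketched below in Python. This is an illustration only, not part of the function itself: the TEMPLATES dictionary and the classify helper are hypothetical names, the template strings are copied from the sample output and the example above, and the sketch assumes each regex_tpl is meant to match the entire request path.

```python
import re

# Hypothetical lookup table: api_path -> regex_tpl, copied from the sample
# results in this topic. In practice these would be read from the query output.
TEMPLATES = {
    "/gl/balance/*": r"\/gl\/balance\/[0-9].+",
    "/gl/glaccount/*": r"\/gl\/glaccount\/[0-9].+",
    "/v1/task/*/results/17/1": r"\/v1\/task\/.*\/results\/17\/1",
}

# Compile each template once so repeated lookups are cheap.
COMPILED = {api_path: re.compile(tpl) for api_path, tpl in TEMPLATES.items()}


def classify(path):
    """Return the first api_path whose template matches the whole path, or None.

    Assumption: a template must match the full path (fullmatch), not a prefix.
    """
    for api_path, pattern in COMPILED.items():
        if pattern.fullmatch(path):
            return api_path
    return None


if __name__ == "__main__":
    for p in ("/gl/balance/666398186799140", "/gl/unknown/1"):
        print(p, "->", classify(p))
```

Paths that match no template return None, so an ETL job could route them to a fallback category or rerun the classification query to pick up new patterns.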