AnalyticDB for MySQL provides path analysis functions, such as SEQUENCE_MATCH and SEQUENCE_COUNT, to efficiently process path computation tasks. These functions match and count event sequences. You can then use the analysis results to detect and analyze complex event sequences. This topic describes how to use these path analysis functions.
AnalyticDB for MySQL supports the following path analysis functions:
SEQUENCE_MATCH(): checks whether a sequence of events occurs in a specific order.
SEQUENCE_COUNT(): counts the number of occurrences of a specific sequence of events.
Prerequisites
An AnalyticDB for MySQL cluster of V3.2.1.0 or later is created.
To view and update the minor version of an AnalyticDB for MySQL cluster, log on to the AnalyticDB for MySQL console and go to the Configuration Information section of the Cluster Information page.
Test dataset
The following example shows how to prepare a dataset for testing the path analysis functions:
Create an AnalyticDB for MySQL test table.
CREATE TABLE sequence_test( `uid` INT COMMENT 'user id', `ts` DATETIME COMMENT 'date time', `number` INT NULL COMMENT 'number' );Insert data into the test table.
INSERT INTO sequence_test VALUES (1, '2022-11-02 10:41:00', 1); INSERT INTO sequence_test VALUES (2, '2022-11-02 13:28:02', 2); INSERT INTO sequence_test VALUES (3, '2022-11-02 16:15:01', 1); INSERT INTO sequence_test VALUES (4, '2022-11-02 19:05:04', 2); INSERT INTO sequence_test VALUES (5, '2022-11-02 20:08:44', 3);
SEQUENCE_MATCH
The SEQUENCE_MATCH() function checks whether a sequence of events occurs in a specific order. This function matches a sequence of events against the specified regular expression and returns a Boolean value. A value of 1 indicates that the sequence of events matches the specified regular expression. A value of 0 indicates that the sequence of events does not match the specified regular expression.
Syntax
SEQUENCE_MATCH(pattern, timestamp, cond1, cond2, ...)Parameters
Parameter | Description |
pattern | The regular expression used to match the order of event occurrences. The supported syntax for pattern is
|
timestamp | The timestamp field of the event, which indicates when the event occurred. This field supports the BIGINT, DATETIME, TIMESTAMP, and DATE data types. |
cond | The constraints for an event. Each constraint is typically a comparison operation on a field. The data type is BOOLEAN. You can pass up to 32 cond constraints. Note The function first filters for events that meet the cond constraints, and then matches them against the pattern. |
Examples
This example filters an event sequence for three types of events:
number=1,number=2, andnumber=3. It then checks if an event wherenumber=1is immediately followed by an event wherenumber=3.Sample statement:
SELECT SEQUENCE_MATCH('(?1)(?3)', ts, number = 1, number = 2, number = 3) FROM sequence_test;Sample result:
+-------------------------------------------------------------------+ | sequence_match('(?1)(?3)', ts, number = 1, number = 2, number = 3)| +-------------------------------------------------------------------+ | 0 | +-------------------------------------------------------------------+This example filters an event sequence for three types of events:
number=1,number=2, andnumber=3. It then checks if an event wherenumber=3occurs after an event wherenumber=1, with any number of other events allowed in between.Sample statement:
SELECT SEQUENCE_MATCH('(?1).*(?3)', ts, number = 1, number = 2, number = 3) FROM sequence_test;Sample result:
+----------------------------------------------------------------------+ | sequence_match('(?1).*(?3)', ts, number = 1, number = 2, number = 3) | +----------------------------------------------------------------------+ | 1 | +----------------------------------------------------------------------+This example filters an event sequence for two types of events:
number=1andnumber=2. It then checks if an event wherenumber=1is immediately followed by an event wherenumber=2, and the time difference between them is 3,600 ms or greater.Sample statement:
SELECT SEQUENCE_MATCH('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2) FROM sequence_test;Sample result:
+------------------------------------------------------------------+ | sequence_match('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2) | +------------------------------------------------------------------+ | 1 | +------------------------------------------------------------------+
SEQUENCE_COUNT
The SEQUENCE_COUNT() function counts the number of occurrences of a specific sequence of events.
Syntax
SEQUENCE_COUNT(pattern, timestamp, cond1, cond2, ...)Parameters
Parameter | Description |
pattern | The regular expression used to match the order of event occurrences. The supported syntax for pattern is
|
timestamp | The timestamp field of the event, which indicates when the event occurred. This field supports the BIGINT, DATETIME, TIMESTAMP, and DATE data types. |
cond | The constraints for an event. Each constraint is typically a comparison operation on a field. The data type is BOOLEAN. You can pass up to 32 cond constraints. Note The function first filters for events that meet the cond constraints, and then matches them against the pattern. |
Examples
This example filters an event sequence for three types of events:
number=1,number=2, andnumber=3. It then counts the number of times an event wherenumber=1is immediately followed by an event wherenumber=3.Sample statement:
SELECT SEQUENCE_COUNT('(?1)(?3)', ts, number = 1, number = 2, number = 3) FROM sequence_test;Sample result:
+--------------------------------------------------------------------+ | sequence_count('(?1)(?3)', ts, number = 1, number = 2, number = 3) | +--------------------------------------------------------------------+ | 0 | +--------------------------------------------------------------------+This example filters an event sequence for three types of events:
number=1,number=2, andnumber=3. It then counts the number of times an event wherenumber=3occurs after an event wherenumber=1, with any number of other events allowed in between.Sample statement:
SELECT SEQUENCE_COUNT('(?1).*(?3)', ts, number = 1, number = 2, number = 3) FROM sequence_test;Sample result:
+----------------------------------------------------------------------+ | sequence_count('(?1).*(?3)', ts, number = 1, number = 2, number = 3) | +----------------------------------------------------------------------+ | 1 | +----------------------------------------------------------------------+This example filters an event sequence for two types of events:
number=1andnumber=2. It then counts the number of times an event wherenumber=1is immediately followed by an event wherenumber=2, and the time difference between them is 3,600 ms or greater.Sample statement:
SELECT SEQUENCE_COUNT('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2) FROM sequence_test;Sample result:
+------------------------------------------------------------------+ | sequence_count('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2) | +------------------------------------------------------------------+ | 2 | +------------------------------------------------------------------+