AnalyticDB for MySQL provides two path analysis functions—SEQUENCE_MATCH() and SEQUENCE_COUNT()—for detecting and quantifying ordered event sequences in your data.
SEQUENCE_MATCH(): Returns1if a matching event sequence exists, or0if it does not.SEQUENCE_COUNT(): Returns the number of times a matching event sequence occurs.
Prerequisites
Before you begin, ensure that you have:
An AnalyticDB for MySQL cluster running V3.2.1.0 or later
To check or update the cluster version, go to the AnalyticDB for MySQL console, open the Cluster Information page, and find the Configuration Information section. For instructions, see View and update the minor version.
Test dataset
The examples in this topic use the following table and data.
Create the test table.
CREATE TABLE sequence_test( `uid` INT COMMENT 'user id', `ts` DATETIME COMMENT 'date time', `number` INT NULL COMMENT 'event number' );Insert sample rows.
INSERT INTO sequence_test VALUES (1, '2022-11-02 10:41:00', 1); INSERT INTO sequence_test VALUES (2, '2022-11-02 13:28:02', 2); INSERT INTO sequence_test VALUES (3, '2022-11-02 16:15:01', 1); INSERT INTO sequence_test VALUES (4, '2022-11-02 19:05:04', 2); INSERT INTO sequence_test VALUES (5, '2022-11-02 20:08:44', 3);
Pattern syntax
Both SEQUENCE_MATCH() and SEQUENCE_COUNT() use the same pattern syntax to describe the event order to match.
| Element | Description |
|---|---|
(?n) | Matches the event that satisfies the nth condition (cond). The value of n starts at 1 and cannot exceed the total number of cond arguments. |
.* | Allows any number of other events to occur between two specified events. |
(?t operator value) | Requires the time difference between two adjacent events to satisfy the specified condition. See the table below for details. |
(?t operator value) details:
| Component | Description |
|---|---|
t | A reserved keyword that refers to the timestamp field. |
operator | Supported operators: >=, >, <, <=, ==, <>, != |
value | The time threshold. Accepts floating-point, positive, and negative numbers. The time unit depends on the data type of the timestamp argument: DATETIME or TIMESTAMP uses milliseconds (ms); DATE uses days (d); BIGINT depends on the event time unit. |
The function first filters for events that meet the cond constraints, then matches the remaining events against the pattern.
SEQUENCE_MATCH
SEQUENCE_MATCH() checks whether a specific event sequence appears in the data. It returns 1 if the pattern matches, or 0 if it does not.
Syntax
SEQUENCE_MATCH(pattern, timestamp, cond1, cond2, ...)Parameters
| Parameter | Description |
|---|---|
pattern | The pattern to match. See Pattern syntax. |
timestamp | The column that records when each event occurred. Supported data types: BIGINT, DATETIME, TIMESTAMP, DATE. |
cond | A Boolean condition that identifies an event type. Each condition is typically a column comparison (for example, number = 1). Up to 32 conditions are supported. |
Returned values
Returns a Boolean value of type BIGINT:
1: the pattern matched0: the pattern did not match
Examples
Example 1: Immediate sequence (no match)
Check whether an event where number=1 is immediately followed by an event where number=3, with no other events in between.
SELECT SEQUENCE_MATCH('(?1)(?3)', ts, number = 1, number = 2, number = 3)
FROM sequence_test;Result:
+-------------------------------------------------------------------+
| sequence_match('(?1)(?3)', ts, number = 1, number = 2, number = 3)|
+-------------------------------------------------------------------+
| 0 |
+-------------------------------------------------------------------+The result is 0 because the function only evaluates events that match one of the three conditions (number=1, number=2, number=3). In the dataset, a number=2 event occurs between the number=1 and number=3 events, so the pattern (?1)(?3) (immediate sequence) does not match. To match this sequence regardless of intermediate events, use (?1).*(?3).
Example 2: Non-contiguous sequence with .* (match)
Check whether a number=3 event occurs anywhere after a number=1 event, with any events in between.
SELECT SEQUENCE_MATCH('(?1).*(?3)', ts, number = 1, number = 2, number = 3)
FROM sequence_test;Result:
+----------------------------------------------------------------------+
| sequence_match('(?1).*(?3)', ts, number = 1, number = 2, number = 3) |
+----------------------------------------------------------------------+
| 1 |
+----------------------------------------------------------------------+The result is 1. The .* element allows the number=2 event to appear between number=1 and number=3.
Example 3: Time-constrained sequence (match)
Check whether a number=1 event is immediately followed by a number=2 event with a time gap of at least 3,600 ms (3.6 seconds).
SELECT SEQUENCE_MATCH('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2)
FROM sequence_test;Result:
+------------------------------------------------------------------+
| sequence_match('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2) |
+------------------------------------------------------------------+
| 1 |
+------------------------------------------------------------------+The result is 1. The gap between the first number=1 event (10:41:00) and the first number=2 event (13:28:02) exceeds 3,600 ms.
SEQUENCE_COUNT
SEQUENCE_COUNT() counts the number of times a specific event sequence occurs in the data.
Syntax
SEQUENCE_COUNT(pattern, timestamp, cond1, cond2, ...)Parameters
| Parameter | Description |
|---|---|
pattern | The pattern to match. See Pattern syntax. |
timestamp | The column that records when each event occurred. Supported data types: BIGINT, DATETIME, TIMESTAMP, DATE. |
cond | A Boolean condition that identifies an event type. Each condition is typically a column comparison (for example, number = 1). Up to 32 conditions are supported. |
Returned values
Returns the count of matching sequences as a BIGINT.
Examples
Example 1: Immediate sequence count (zero)
Count how many times a number=1 event is immediately followed by a number=3 event.
SELECT SEQUENCE_COUNT('(?1)(?3)', ts, number = 1, number = 2, number = 3)
FROM sequence_test;Result:
+--------------------------------------------------------------------+
| sequence_count('(?1)(?3)', ts, number = 1, number = 2, number = 3) |
+--------------------------------------------------------------------+
| 0 |
+--------------------------------------------------------------------+The result is 0 for the same reason as in SEQUENCE_MATCH Example 1: a number=2 event always appears between number=1 and number=3 in this dataset.
Example 2: Non-contiguous sequence count
Count how many times a number=3 event occurs after a number=1 event, with any events allowed in between.
SELECT SEQUENCE_COUNT('(?1).*(?3)', ts, number = 1, number = 2, number = 3)
FROM sequence_test;Result:
+----------------------------------------------------------------------+
| sequence_count('(?1).*(?3)', ts, number = 1, number = 2, number = 3) |
+----------------------------------------------------------------------+
| 1 |
+----------------------------------------------------------------------+Example 3: Time-constrained sequence count
Count how many times a number=1 event is immediately followed by a number=2 event with a time gap of at least 3,600 ms (3.6 seconds).
SELECT SEQUENCE_COUNT('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2)
FROM sequence_test;Result:
+------------------------------------------------------------------+
| sequence_count('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2) |
+------------------------------------------------------------------+
| 2 |
+------------------------------------------------------------------+The result is 2. The dataset contains two number=1 events (at 10:41:00 and 16:15:01), each followed by a number=2 event with a gap exceeding 3,600 ms.