All Products
Search
Document Center

AnalyticDB:Path analysis functions

Last Updated:Mar 28, 2026

AnalyticDB for MySQL provides two path analysis functions—SEQUENCE_MATCH() and SEQUENCE_COUNT()—for detecting and quantifying ordered event sequences in your data.

  • SEQUENCE_MATCH(): Returns 1 if a matching event sequence exists, or 0 if it does not.

  • SEQUENCE_COUNT(): Returns the number of times a matching event sequence occurs.

Prerequisites

Before you begin, ensure that you have:

  • An AnalyticDB for MySQL cluster running V3.2.1.0 or later

Note

To check or update the cluster version, go to the AnalyticDB for MySQL console, open the Cluster Information page, and find the Configuration Information section. For instructions, see View and update the minor version.

Test dataset

The examples in this topic use the following table and data.

  1. Create the test table.

    CREATE TABLE sequence_test(
      `uid`    INT      COMMENT 'user id',
      `ts`     DATETIME COMMENT 'date time',
      `number` INT NULL COMMENT 'event number'
    );
  2. Insert sample rows.

    INSERT INTO sequence_test VALUES (1, '2022-11-02 10:41:00', 1);
    INSERT INTO sequence_test VALUES (2, '2022-11-02 13:28:02', 2);
    INSERT INTO sequence_test VALUES (3, '2022-11-02 16:15:01', 1);
    INSERT INTO sequence_test VALUES (4, '2022-11-02 19:05:04', 2);
    INSERT INTO sequence_test VALUES (5, '2022-11-02 20:08:44', 3);

Pattern syntax

Both SEQUENCE_MATCH() and SEQUENCE_COUNT() use the same pattern syntax to describe the event order to match.

ElementDescription
(?n)Matches the event that satisfies the nth condition (cond). The value of n starts at 1 and cannot exceed the total number of cond arguments.
.*Allows any number of other events to occur between two specified events.
(?t operator value)Requires the time difference between two adjacent events to satisfy the specified condition. See the table below for details.

(?t operator value) details:

ComponentDescription
tA reserved keyword that refers to the timestamp field.
operatorSupported operators: >=, >, <, <=, ==, <>, !=
valueThe time threshold. Accepts floating-point, positive, and negative numbers. The time unit depends on the data type of the timestamp argument: DATETIME or TIMESTAMP uses milliseconds (ms); DATE uses days (d); BIGINT depends on the event time unit.
Note

The function first filters for events that meet the cond constraints, then matches the remaining events against the pattern.

SEQUENCE_MATCH

SEQUENCE_MATCH() checks whether a specific event sequence appears in the data. It returns 1 if the pattern matches, or 0 if it does not.

Syntax

SEQUENCE_MATCH(pattern, timestamp, cond1, cond2, ...)

Parameters

ParameterDescription
patternThe pattern to match. See Pattern syntax.
timestampThe column that records when each event occurred. Supported data types: BIGINT, DATETIME, TIMESTAMP, DATE.
condA Boolean condition that identifies an event type. Each condition is typically a column comparison (for example, number = 1). Up to 32 conditions are supported.

Returned values

Returns a Boolean value of type BIGINT:

  • 1: the pattern matched

  • 0: the pattern did not match

Examples

Example 1: Immediate sequence (no match)

Check whether an event where number=1 is immediately followed by an event where number=3, with no other events in between.

SELECT SEQUENCE_MATCH('(?1)(?3)', ts, number = 1, number = 2, number = 3)
FROM sequence_test;

Result:

+-------------------------------------------------------------------+
| sequence_match('(?1)(?3)', ts, number = 1, number = 2, number = 3)|
+-------------------------------------------------------------------+
|                                0                                  |
+-------------------------------------------------------------------+

The result is 0 because the function only evaluates events that match one of the three conditions (number=1, number=2, number=3). In the dataset, a number=2 event occurs between the number=1 and number=3 events, so the pattern (?1)(?3) (immediate sequence) does not match. To match this sequence regardless of intermediate events, use (?1).*(?3).

Example 2: Non-contiguous sequence with .* (match)

Check whether a number=3 event occurs anywhere after a number=1 event, with any events in between.

SELECT SEQUENCE_MATCH('(?1).*(?3)', ts, number = 1, number = 2, number = 3)
FROM sequence_test;

Result:

+----------------------------------------------------------------------+
| sequence_match('(?1).*(?3)', ts, number = 1, number = 2, number = 3) |
+----------------------------------------------------------------------+
|                                  1                                   |
+----------------------------------------------------------------------+

The result is 1. The .* element allows the number=2 event to appear between number=1 and number=3.

Example 3: Time-constrained sequence (match)

Check whether a number=1 event is immediately followed by a number=2 event with a time gap of at least 3,600 ms (3.6 seconds).

SELECT SEQUENCE_MATCH('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2)
FROM sequence_test;

Result:

+------------------------------------------------------------------+
| sequence_match('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2) |
+------------------------------------------------------------------+
|                                1                                 |
+------------------------------------------------------------------+

The result is 1. The gap between the first number=1 event (10:41:00) and the first number=2 event (13:28:02) exceeds 3,600 ms.

SEQUENCE_COUNT

SEQUENCE_COUNT() counts the number of times a specific event sequence occurs in the data.

Syntax

SEQUENCE_COUNT(pattern, timestamp, cond1, cond2, ...)

Parameters

ParameterDescription
patternThe pattern to match. See Pattern syntax.
timestampThe column that records when each event occurred. Supported data types: BIGINT, DATETIME, TIMESTAMP, DATE.
condA Boolean condition that identifies an event type. Each condition is typically a column comparison (for example, number = 1). Up to 32 conditions are supported.

Returned values

Returns the count of matching sequences as a BIGINT.

Examples

Example 1: Immediate sequence count (zero)

Count how many times a number=1 event is immediately followed by a number=3 event.

SELECT SEQUENCE_COUNT('(?1)(?3)', ts, number = 1, number = 2, number = 3)
FROM sequence_test;

Result:

+--------------------------------------------------------------------+
| sequence_count('(?1)(?3)', ts, number = 1, number = 2, number = 3) |
+--------------------------------------------------------------------+
|                                0                                   |
+--------------------------------------------------------------------+

The result is 0 for the same reason as in SEQUENCE_MATCH Example 1: a number=2 event always appears between number=1 and number=3 in this dataset.

Example 2: Non-contiguous sequence count

Count how many times a number=3 event occurs after a number=1 event, with any events allowed in between.

SELECT SEQUENCE_COUNT('(?1).*(?3)', ts, number = 1, number = 2, number = 3)
FROM sequence_test;

Result:

+----------------------------------------------------------------------+
| sequence_count('(?1).*(?3)', ts, number = 1, number = 2, number = 3) |
+----------------------------------------------------------------------+
|                                  1                                   |
+----------------------------------------------------------------------+

Example 3: Time-constrained sequence count

Count how many times a number=1 event is immediately followed by a number=2 event with a time gap of at least 3,600 ms (3.6 seconds).

SELECT SEQUENCE_COUNT('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2)
FROM sequence_test;

Result:

+------------------------------------------------------------------+
| sequence_count('(?1)(?t>=3600)(?2)', ts, number = 1, number = 2) |
+------------------------------------------------------------------+
|                              2                                   |
+------------------------------------------------------------------+

The result is 2. The dataset contains two number=1 events (at 10:41:00 and 16:15:01), each followed by a number=2 event with a gap exceeding 3,600 ms.