As a complex event processing (CEP) statement, MATCH_RECOGNIZE is used to identify events that conform to specified rules from input data streams and generate output events in the specified way.

Syntax

SELECT [ ALL | DISTINCT ]
{ * | projectItem [, projectItem ]* }
FROM tableExpression
[MATCH_RECOGNIZE (
[PARTITION BY {partitionItem [, partitionItem]*}]
[ORDER BY {orderItem [, orderItem]*}]
[MEASURES {measureItem AS col [, measureItem AS col]*}]
[ONE ROW PER MATCH|ALL ROWS PER MATCH|ONE ROW PER MATCH WITH TIMEOUT ROWS|ALL ROWS PER MATCH WITH TIMEOUT ROWS]
[AFTER MATCH SKIP]
PATTERN (patternVariable[quantifier] [ patternVariable[quantifier]]*) WITHIN intervalExpression
DEFINE {patternVariable AS patternDefinationExpression [, patternVariable AS patternDefinationExpression]*}
)];
			
Name Description
PARTITION BY The column used for partitioning. This parameter is optional.
ORDER BY The column used for sorting. You can specify multiple columns. However, the first column used for sorting must be the EVENT TIME or PROCESS TIME column. This parameter is optional.
MEASURES The way to construct an output event based on the input events that are successfully matched.
ONE ROW PER MATCH Specifies that only one output event will be generated upon each successful match.
ONE ROW PER MATCH WITH TIMEOUT ROWS Specifies that an output event will be generated upon each successful match or each timeout. The timeout interval is defined by the WITHIN statement in the PATTERN statement.
ALL ROW PER MATCH Specifies that an output event will be generated for each input event upon each successful match.
ALL ROW PER MATCH WITH TIMEOUT ROWS Specifies that an output event will be generated for each input event upon each successful match or each timeout. The timeout interval is defined by the WITHIN statement in the PATTERN statement.
[ONE ROW PER MATCH|ALL ROWS PER MATCH|ONE ROW PER MATCH WITH TIMEOUT ROWS|ALL ROWS PER MATCH WITH TIMEOUT ROWS] This parameter is optional. The default value is ONE ROW PER MATCH.
AFTER MATCH SKIP TO NEXT ROW Specifies that the next match following a successful match will start from the next event following the first event in the sequence of successfully matched events.
AFTER MATCH SKIP PAST LAST ROW Specifies that the next match following a successful match will start from the next event following the last event in the sequence of successfully matched events.
AFTER MATCH SKIP TO FIRST patternItem Specifies that the next match following a successful match will start from the first event corresponding to patternItem in the sequence of successfully matched events.
AFTER MATCH SKIP TO LAST patternItem Specifies that the next match following a successful match will start from the last event corresponding to patternItem in the sequence of successfully matched events.
PATTERN The rule to which the sequence of events to be identified conforms. The rule is defined in parentheses () and consists of a series of custom patternVariables.
Note
  • If two patternVariables are separated with a space, the events that conform to the patternVariables are next to each other and there are no other events between these events.
  • If two patternVariables are separated with an arrow sign (->), other events may exist between the events that conform to the patternVariables.
  • Quantifier

    The quantifier specifies the number of occurrences of events that meet the patternVariable definition.

    Value Description
    * Zero or multiple times
    + Once or multiple times
    ? Zero or once
    {n} n times
    {n,} Greater than or equal to n times
    {n, m} Greater than or equal to n times, and smaller than or equal to m times
    {,m} Smaller than or equal to m times

    A greedy match is performed by default. For example, if the pattern is A -> B+ and the input is a b1, b2, b3, the output is a b1, a b1 b2, a b1 b2 b3. To perform a non-greedy match, append the quantifier with a question mark (?).

    • *?
    • +?
    • {n}?
    • {n,}?
    • {n, m}?
    • {,m}?

    Then, the output generated for the pattern and input in the preceding example changes to a b1, a b2, a b1 b2, a b3, a b2 b3, a b1 b2 b3.

    Note
    • The WITHIN statement defines the maximum time span of events that conform to the specified rule in a sequence.
    • Static window format: INTERVAL ‘string’ timeUnit [ TO timeUnit ]. Example: INTERVAL ‘10’ SECOND, INTERVAL ‘45’ DAY, INTERVAL ‘10:20’ MINUTE TO SECOND, INTERVAL ‘10:20.10’ MINUTE TO SECOND, INTERVAL ‘10:20’ HOUR TO MINUTE, INTERVAL ‘1-5’ YEAR TO MONTH.
    • Dynamic window format: INTERVAL intervalExpression. Example: INTERVAL A.windowTime + 10, where A indicates the first patternVariable in the pattern definition. The intervalExpression definition can use a patternVariable in the pattern definition. However, only the first patternVariable in the pattern definition can be used currently. In intervalExpression, you can use a UDF. The result of intervalExpression indicates the window size. The result must be of the long type and in the unit of milliseconds.
    • The DEFINE statement defines the meanings of patternVariables in the PATTERN statement. If a patternVariable is not defined in the DEFINE statement, the patternVariable is valid for each event.
  • MEASURES and DEFINE statement functions
    Function Description
    Row Pattern Column References This function is in the format of patternVariable.col. It is used to access the specified column of an event that conforms to patternVariable.
    PREV This function can be used only in the DEFINE statement and generally works with Row Pattern Column References. The PREV function is used to access the specified column of the previous event with a specified offset before the event that conforms to the specified pattern. Example: DOWN AS DOWN.price < PREV(DOWN.price). PREV(A.price) indicates the price column value of the previous event before the current event. Note that DOWN.price is equivalent to PREV(DOWN.price, 0) and PREV(DOWN.price) is equivalent to PREV(DOWN.price, 1).
    FIRST or LAST These functions generally work with Row Pattern Column References. The FIRST or LAST function is used to access the event with a specified offset in the sequence of events that conform to the specified pattern. For example, FIRST(A.price, 3) indicates the fourth event in the sequence of events that conform to pattern A, and LAST(A.price, 3) indicates the last but three event in the sequence of events that conform to pattern A.
  • Output columns
    Function Output column
    ONE ROW PER MATCH The columns specified by PARTITION BY and MEASURES are included. The columns specified by PARTITION BY do not need to be specified in MEASURES again.
    ONE ROW PER MATCH WITH TIMEOUT ROWS An output event will be generated upon a successful match or a timeout. The timeout interval is defined by the WITHIN statement in the PATTERN statement.
    Note
    • When you define the PATTERN statement, we recommend that you also define the WITHIN statement. If the WITHIN statement is not defined, the state size may grow larger.
    • The first column specified by ORDER BY must be the EVENT TIME or PROCESS TIME column.

Example

  • Example syntax
    SELECT *
    FROM Ticker MATCH_RECOGNIZE (
    PARTITION BY symbol
    ORDER BY tstamp
    MEASURES STRT.tstamp AS start_tstamp,
    LAST(DOWN.tstamp) AS bottom_tstamp,
    LAST(UP.tstamp) AS end_tstamp
    ONE ROW PER MATCH
    AFTER MATCH SKIP TO NEXT ROW
    PATTERN (STRT DOWN+ UP+) WITHIN INTERVAL '10' SECOND
    DEFINE
    DOWN AS DOWN.price < PREV(DOWN.price),
    UP AS UP.price > PREV(UP.price)
    ) MR
    ORDER BY MR.symbol, MR.start_tstamp;
    					
  • Test data
    timestamp (TIMESTAMP) card_id (VARCHAR) location (VARCHAR) action (VARCHAR)
    2018-04-13 12:00:00 1 WW Tom
    2018-04-13 12:05:00 1 WW1 Tom
    2018-04-13 12:10:00 1 WW2 Tom
    2018-04-13 12:20:00 1 WW Tom
  • Test case syntax
    CREATE TABLE datahub_stream (
        `timestamp`               TIMESTAMP,
        card_id                   VARCHAR,
        location                  VARCHAR,
        `action`                  VARCHAR,
        WATERMARK wf FOR `timestamp` AS withOffset(`timestamp`, 1000)
    ) WITH (
        type = 'datahub'
        ...
    );
    CREATE TABLE rds_out (
        start_timestamp               TIMESTAMP,
        end_timestamp                 TIMESTAMP,
        card_id                       VARCHAR,
        event                         VARCHAR
    ) WITH (
        type= 'rds'
        ...
    );
    
    -- Case description
    -- When payments with a card with a unique ID (card_id) occur at two different locations within 10 minutes, an alert is triggered. This helps monitor credit card identity fraud.
    
    -- Define the computational logic as follows:
    insert into rds_out
    select 
    `start_timestamp`, 
    `end_timestamp`, 
    card_id, `event`
    from datahub_stream
    MATCH_RECOGNIZE (
        PARTITION BY card_id   -- Partition data by card_id. The data with the same card ID is allocated to the same compute node.
        ORDER BY `timestamp`   -- Sort events by time in a window.
        MEASURES               -- Define how to construct an output event based on the input events that are successfully matched.
            e2.`action` as `event`,                
            e1.`timestamp` as `start_timestamp`,   -- Define the time of the first event as start_timestamp.
            LAST(e2.`timestamp`) as `end_timestamp`-- Define the time of the last event as end_timestamp.
        ONE ROW PER MATCH           -- Generate an output event upon a successful match.
        AFTER MATCH SKIP TO NEXT ROW-- Start the next match from the next row upon a successful match.
        PATTERN (e1 e2+) WITHIN INTERVAL '10' MINUTE  -- Define two events e1 and e2.
        DEFINE                     -- Define the meanings of patternVariables in the PATTERN statement.
            e1 as e1.action = 'Tom',    -- Mark the action of e1 as Tom.
            e2 as e2.action = 'Tom' and e2.location <> e1.location -- Mark the action of e2 as Tom. The locations of e1 and e2 are different.
    
    );
    					
  • Test result
    start_timestamp (TIMESTAMP) end_timestamp (TIMESTAMP) card_id (VARCHAR) event (VARCHAR)
    2018-04-13 20:00:00.0 2018-04-13 20:05:00.0 1 Tom
    2018-04-13 20:05:00.0 2018-04-13 20:10:00.0 1 Tom