Abstract: This article is authored by Tang Zhanfeng and Ouyang Wulin, big data engineers at Bank of Hangzhou. It describes what Flink dynamic CEP is, along with its key concepts, use cases, underlying technologies, and usage. The article is organized into five parts:
- What's Flink dynamic CEP
- Financial use cases
- Underlying technologies
- How to use Flink dynamic CEP
- Application of Flink dynamic CEP at Bank of Hangzhou
In the finance industry, big data technologies are entering the maturity stage. Data freshness is essential for real-time transaction monitoring and analytics, fraud detection (such as money laundering), and regulatory compliance. In a rapidly evolving business landscape, static rules cannot keep up because they require a service restart to apply rule updates, leading to service disruptions and delayed responses. To address this challenge, Bank of Hangzhou adopted Flink dynamic CEP to update rules on the fly without service downtime, effectively adapting to rapidly changing business requirements.
FlinkCEP is the Complex Event Processing (CEP) library implemented on top of Flink. It's used to detect event patterns in a data stream, allowing users to act on matched events. Flink dynamic CEP extends the capabilities of FlinkCEP to support on-the-fly rule updates, improving the flexibility and responsiveness of the system and significantly reducing operational overhead and complexity.
Flink dynamic CEP is an advanced stream processing feature of Apache Flink. It supports on-the-fly rule modifications in a DataStream program, quickly applying the latest rules to capture, clean, and analyze data streams. Flink dynamic CEP helps users recognize critical data patterns in real time.
①Pattern: A pattern is a rule together with the way the rule is defined. A pattern can be either a singleton or a looping pattern. A singleton pattern accepts a single event, while a looping pattern can accept more than one event. Multiple simple patterns combine into a complex pattern sequence, referred to as a PatternProcessor (see the sketch after this list).
②Event streams: An event stream can come from heterogeneous upstream systems, such as Kafka or databases (e.g., real-time transaction data streams). After a Flink dynamic CEP job starts, Flink matches the patterns defined in the PatternProcessor against the input event stream and generates the processed results.
③Dynamic matching: Flink dynamic CEP detects rule changes in real time and continuously pushes them to downstream operators. Each downstream operator parses and deserializes the received events, generates the actual PatternProcessor to use, and then finds matches according to the updated patterns defined in the latest PatternProcessor.
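To make these concepts concrete, here is a minimal sketch written against the standard FlinkCEP Pattern API (the TransferEvent type and its fields are hypothetical and used for illustration only): a singleton pattern followed by a looping pattern, which together form a simple pattern sequence.
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.windowing.time.Time;

public class PatternConceptSketch {

    /** Hypothetical event type used only for this illustration. */
    public static class TransferEvent {
        public int action;    // e.g., 1 = login, 0 = transfer
        public double amount;
    }

    public static Pattern<TransferEvent, ?> buildSequence() {
        // Singleton pattern: accepts exactly one event.
        Pattern<TransferEvent, ?> login =
                Pattern.<TransferEvent>begin("login")
                        .where(new SimpleCondition<TransferEvent>() {
                            @Override
                            public boolean filter(TransferEvent e) {
                                return e.action == 1;
                            }
                        });

        // Looping pattern: accepts one or more events. The two simple patterns
        // form a pattern sequence, which a PatternProcessor wraps together with
        // a PatternProcessFunction that handles the matches.
        return login.followedBy("transfers")
                .where(new SimpleCondition<TransferEvent>() {
                    @Override
                    public boolean filter(TransferEvent e) {
                        return e.action == 0;
                    }
                })
                .timesOrMore(3)
                .within(Time.minutes(1));
    }
}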
Flink CEP is used to detect events that match predefined patterns. However, rapidly evolving business scenarios, such as transaction processing or bookkeeping, may necessitate modifying or adding patterns. For example, the typical threshold for risky money transfers is three transfers in one minute; in special cases, a higher threshold is appropriate. Unfortunately, traditional Flink CEP does not support dynamic rule modifications. To implement an updated rule, the user must rewrite the Java code and restart the Flink CEP job. Delay-sensitive businesses like risk control in banking cannot tolerate code re-development, packaging, and re-deployment. What's more, since a Flink CEP job typically involves multiple patterns, updating a single pattern results in significant maintenance effort that affects other users and poses operational challenges. Flink dynamic CEP rises to these challenges, supporting pattern updates with no downtime. Below are the feature's highlights:
①Dynamic rule updates: Traditionally, updating a pattern requires redeploying and restarting a Flink CEP job, leading to service downtime and impacting the timeliness and usability of the system. Flink dynamic CEP supports on-the-fly pattern updates, with no need to restart a Flink CEP job.
②Support for multiple rules: Traditional Flink CEP creates a separate CEP operator for each pattern, which results in extra data copies and higher data processing overhead. Flink dynamic CEP supports multiple patterns in one CEP operator, reducing data copies and improving processing efficiency.
③Support for condition parameterization: Flink dynamic CEP allows users to parameterize conditions in JSON-formatted pattern descriptions. This improves the extensibility of custom conditions and enables users to dynamically add new condition classes.
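Condition parameterization can be pictured as follows: instead of hard-coding a threshold in a condition class, the class exposes it as a constructor argument so that different JSON pattern descriptions can reuse the same class with different values. The sketch below is illustrative only (the class and field names are hypothetical, and it reuses the TransferEvent type from the earlier sketch); the exact JSON schema for passing parameters depends on the Realtime Compute for Apache Flink version in use.
import org.apache.flink.cep.pattern.conditions.SimpleCondition;

/**
 * Hypothetical parameterized condition: the threshold is supplied from the
 * rule definition rather than hard-coded, so adding or tuning a rule does not
 * require writing and deploying a new condition class.
 */
public class AmountAboveCondition extends SimpleCondition<TransferEvent> {

    private final double threshold;

    public AmountAboveCondition(double threshold) {
        this.threshold = threshold;
    }

    @Override
    public boolean filter(TransferEvent event) {
        return event.amount > threshold;
    }
}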
Flink dynamic CEP is useful in building an intelligent monitoring system, supporting online risk detection (such as money laundering or fraud), and empowering real-time marketing for business growth. Financial use cases of Flink dynamic CEP are as follows:
Flink dynamic CEP can monitor transactional activities of bank accounts and help identify potential money laundering activities. For example, users can add patterns to detect frequent deposits and withdrawals within a short time frame or to trace funds flow among accounts involved in money laundering schemes. Furthermore, Flink dynamic CEP can be integrated with big data and machine learning technologies to create risk monitoring models that improve the accuracy of detecting suspicious transactions and identifying customers potentially linked to money laundering. In addition, Flink dynamic CEP is useful in processing end-to-end transactions. When combined with knowledge graphs and real-time intelligence, it can help build a bank's customer relationship graph that integrates suspicious patterns in transactional data to provide a holistic view of funds flow.
In countries where telecom fraud is rampant, an effective anti-fraud system in finance can halt the movement of funds, significantly reducing financial losses for the victim. Such a system must be distributed, real-time, and capable of supporting flexible rule updates and complex rule matching. Flink dynamic CEP offers an ideal solution. Built on top of Apache Flink, a distributed engine for stream processing, it provides the capabilities to match complex patterns and modify patterns with no downtime.
When applying for credit cards, customers typically go through multiple steps, such as filling in basic information and authenticating their identity. They may choose to exit the process or experience failures or timeouts at any stage for various reasons. By leveraging the user behavior log as input, a Flink dynamic CEP job can identify various patterns in the data, perform computations, and generate output. Banks can then use the insights to inform marketing strategies, such as issuing coupons or prompting customer managers to reach out.
Drawing on the background above, and inspired by Alibaba Cloud's FLIP-200 Flink improvement proposal as well as the open-source ververica-cep demo, the data architecture engineering team at Bank of Hangzhou implemented Flink dynamic CEP within the department. This section delves into the details of the implementation and the underlying technologies.
In dynamic Flink CEP, the OperatorCoordinator is responsible for coordinating the operators. The OperatorCoordinator runs on the JobManager and sends events to operators on the TaskManagers. The DynamicCEPOperatorCoordinator is the implementation class of the OperatorCoordinator. It calls the PatternProcessorDiscoverer interface to retrieve the latest PatternProcessor. The architecture of Flink dynamic CEP is as follows:
As illustrated in the figure above, the OperatorCoordinator uses the PatternProcessorDiscoverer interface to fetch the latest serialized PatternProcessor from the database, which it then sends to the associated DynamicCEPOps. The DynamicCEPOp parses and deserializes the received events, generating the final PatternProcessor for use and constructing the corresponding NFAs. Upstream events are processed in the NFAs before being sent downstream. This approach enables dynamic pattern updates, centralizes access to the rule database through the OperatorCoordinator, and ensures pattern consistency across downstream subtasks by leveraging Flink's inherent characteristics.
Built on the architecture introduced earlier, a Flink dynamic CEP job relies on one essential method: CEP.dynamicPatterns(). Alibaba Cloud's Realtime Compute for Apache Flink offers the CEP.dynamicPatterns() API, which is defined as follows:
public static <T, R> SingleOutputStreamOperator<R> dynamicPatterns(
DataStream<T> input,
PatternProcessorDiscovererFactory<T> patternProcessorDiscovererFactory,
TimeBehaviour timeBehaviour,
TypeInformation<R> outTypeInfo)
The following table describes the method's input parameters:
Parameter name | Description
---|---
DataStream<T> input | The input event stream.
PatternProcessorDiscovererFactory<T> patternProcessorDiscovererFactory | The factory object that constructs the PatternProcessorDiscoverer, which retrieves the latest patterns and generates PatternProcessor instances.
TimeBehaviour timeBehaviour | The time attribute that defines how the Flink CEP job processes events. Valid values: TimeBehaviour.ProcessingTime (events are processed based on the processing time) and TimeBehaviour.EventTime (events are processed based on the event time).
TypeInformation<R> outTypeInfo | The type information of the output stream.
The PatternProcessorDiscovererFactory interface is responsible for creating the PatternProcessorDiscoverer that fetches the latest PatternProcessor from the database. The PatternProcessor and PatternProcessorDiscoverer interfaces, along with their implementation classes and the DynamicCEPOperatorCoordinator, play crucial roles in dynamic Flink CEP. The following sections will explain these three components in detail.
public interface PatternProcessor<IN> extends Serializable, Versioned {

    String getId();

    default Long getTimestamp() {
        return Long.MIN_VALUE;
    }

    Pattern<IN, ?> getPattern(ClassLoader classLoader);

    PatternProcessFunction<IN, ?> getPatternProcessFunction();
}
The PatternProcessor interface is used to define a rule in dynamic Flink CEP. Its implementation class contains a specific pattern that describes how to match events and a PatternProcessFunction that specifies how to process matched events. Additionally, it includes identification properties such as an id and a version.
@PublicEvolving
public class DefaultPatternProcessor<T> implements PatternProcessor<T> {
/** The ID of the pattern processor. */
private final String id;
/** The version of the pattern processor. */
private final Integer version;
/** The pattern of the pattern processor. */
private final String patternStr;
private final @Nullable PatternProcessFunction<T, ?> patternProcessFunction;
public DefaultPatternProcessor(
final String id,
final Integer version,
final String pattern,
final @Nullable PatternProcessFunction<T, ?> patternProcessFunction,
final ClassLoader userCodeClassLoader) {
this.id = checkNotNull(id);
this.version = checkNotNull(version);
this.patternStr = checkNotNull(pattern);
this.patternProcessFunction = patternProcessFunction;
}
@Override
public String toString() {
return "DefaultPatternProcessor{"
\+ "id='"
\+ id
\+ '\''
\+ ", version="
\+ version
\+ ", pattern="
\+ patternStr
\+ ", patternProcessFunction="
\+ patternProcessFunction
\+ '}';
}
@Override
public String getId() {
return id;
}
@Override
public int getVersion() {
return version;
}
@Override
public Pattern<T, ?> getPattern(ClassLoader classLoader) {
try {
return (Pattern<T, ?>) CepJsonUtils.convertJSONStringToPattern(patternStr, classLoader);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
@Override
public PatternProcessFunction<T,?> getPatternProcessFunction(){
return patternProcessFunction;
}
}
The DefaultPatternProcessor class serves as the default implementation of PatternProcessor. It receives parameters such as the id, version, pattern string, PatternProcessFunction, and ClassLoader, and uses checkNotNull to verify that the id, version, and pattern are provided, while the patternProcessFunction remains optional. In the getPattern() method, convertJSONStringToPattern() converts a JSON string into a pattern that Flink CEP can recognize. The following code snippet shows the overloaded convertJSONStringToPattern() method, which accepts a user-specified classloader as an input parameter:
public static Pattern<?, ?> convertJSONStringToPattern(
String jsonString, ClassLoader userCodeClassLoader) throws Exception {
if (userCodeClassLoader == null) {
LOG.warn(
"The given userCodeClassLoader is null. Will try to use ContextClassLoader of current thread.");
return convertJSONStringToPattern(jsonString);
}
GraphSpec deserializedGraphSpec = objectMapper.readValue(jsonString, GraphSpec.class);
return deserializedGraphSpec.toPattern(userCodeClassLoader);
}
Pattern construction relies on the GraphSpec class, which serves as the tool for pattern serialization and deserialization. Its core method, toPattern(), processes a graph of nodes and edges: a node can be an individual pattern or an embedded GraphSpec, while an edge defines inter-node relationships and the direction of data flow. The graph maps closely to the rule DAGs stored in the database. Below is the implementation of the toPattern() method:
public Pattern<?, ?> toPattern(final ClassLoader classLoader) throws Exception {
// Construct cache of nodes and edges for later use
final Map<String, NodeSpec> nodeCache = new HashMap<>();
for (NodeSpec node : nodes) {
nodeCache.put(node.getName(), node);
}
final Map<String, EdgeSpec> edgeCache = new HashMap<>();
for (EdgeSpec edgeSpec : edges) {
edgeCache.put(edgeSpec.getSource(), edgeSpec);
}
String currentNodeName = findBeginPatternName();
Pattern<?, ?> prevPattern = null;
String prevNodeName = null;
while (currentNodeName != null) {
NodeSpec currentNodeSpec = nodeCache.get(currentNodeName);
EdgeSpec edgeToCurrentNode = edgeCache.get(prevNodeName);
Pattern<?, ?> currentPattern =
currentNodeSpec.toPattern(
prevPattern,
afterMatchStrategy.toAfterMatchSkipStrategy(),
prevNodeName == null
? ConsumingStrategy.STRICT
: edgeToCurrentNode.getType(),
classLoader);
if (currentNodeSpec instanceof GraphSpec) {
ConsumingStrategy strategy =
prevNodeName == null
? ConsumingStrategy.STRICT
: edgeToCurrentNode.getType();
prevPattern =
buildGroupPattern(
strategy, currentPattern, prevPattern, prevNodeName == null);
} else {
prevPattern = currentPattern;
}
prevNodeName = currentNodeName;
currentNodeName =
edgeCache.get(currentNodeName) == null
? null
: edgeCache.get(currentNodeName).getTarget();
}
// Add window semantics
if (window != null && prevPattern != null) {
prevPattern.within(this.window.getTime(), this.window.getType());
}
return prevPattern;
}
The toPattern() method, crucial to the GraphSpec class, deserializes the serialized GraphSpec object into a pattern. The internal logic is as follows:
①Constructing node and edge caches: nodeCache and edgeCache maps are created to hold NodeSpec and EdgeSpec instances separately. This enables efficient retrieval of nodes or edges later on.
②Finding the beginning node: The currentNodeName variable is initialized using the findBeginPatternName() method, which ensures the graph processing starts at the beginning node.
③Creating pattern iterations:
Iterate over all nodes in a loop. Start at the beginning node and build the pattern forward based on edge information. In each iteration, retrieve the current node's NodeSpec from nodeCache, and the EdgeSpec from the previous node to the current node from edgeCache (if the EdgeSpec exists). Then, use the NodeSpec and EdgeSpec to assemble a new pattern or update the current pattern. Note that consuming strategies influence the selection of the pattern combination methods like Pattern.begin(), Pattern.next(), Pattern.followedBy(), or Pattern.followedByAny(). Next, update the prevPattern and prevNodeName to prepare for the subsequent iteration. Finally, return the constructed pattern object.
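To illustrate step ③, the selection of the combination method by consuming strategy can be pictured with the following simplified sketch; this is not the actual NodeSpec/GraphSpec code, just an illustration of the mapping applied while walking the graph.
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.Quantifier.ConsumingStrategy;

/**
 * Simplified illustration of how an edge's consuming strategy picks the
 * pattern combinator; the real logic lives in NodeSpec.toPattern() and
 * GraphSpec.toPattern().
 */
public final class CombinatorSketch {

    public static <T> Pattern<T, T> append(
            Pattern<T, T> previous, String name, ConsumingStrategy strategy) {
        switch (strategy) {
            case STRICT:
                return previous.next(name);          // events must be strictly adjacent
            case SKIP_TILL_NEXT:
                return previous.followedBy(name);    // relaxed contiguity
            case SKIP_TILL_ANY:
                return previous.followedByAny(name); // non-deterministic relaxed contiguity
            default:
                throw new IllegalArgumentException("Unsupported strategy: " + strategy);
        }
    }
}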
This section has introduced the PatternProcessor interface, its core methods, and the process of pattern construction in detail. The next section introduces the PatternProcessorDiscoverer interface and its implementation.
public interface PatternProcessorDiscoverer<T> extends Closeable {

    void discoverPatternProcessorUpdates(PatternProcessorManager<T> patternProcessorManager);
}
The PatternProcessorDiscoverer interface discovers pattern processor updates.
Based on the PeriodicPatternProcessorDiscoverer class provided by Alibaba Cloud, which periodically scans external storage, we defined the JDBCPeriodicPatternProcessorDiscoverer class to pull the latest rules from a JDBC-compatible database:
public class JDBCPeriodicPatternProcessorDiscoverer<T>
extends PeriodicPatternProcessorDiscoverer<T> {
private static final Logger LOG =
LoggerFactory.getLogger(JDBCPeriodicPatternProcessorDiscoverer.class);
private final String tableName;
private final String userName;
private final String password;
private final String jdbcUrl;
private final String tenant;
private final List<PatternProcessor<T>> initialPatternProcessors;
private final ClassLoader userCodeClassLoader;
private Connection connection;
private Statement statement;
private ResultSet resultSet;
private Map<String, Tuple4<String, Integer, String, String>> latestPatternProcessors = new ConcurrentHashMap<>();
/**
 * Creates a new discoverer using the given initial {@link PatternProcessor}s and the time
 * interval that defines how often to check for pattern processor updates.
 *
 * @param jdbcUrl The JDBC URL of the database.
 * @param jdbcDriver The JDBC driver of the database.
 * @param initialPatternProcessors The list of initial {@link PatternProcessor}s.
 * @param intervalMillis Time interval in milliseconds that defines how often to check for updates.
 */
public JDBCPeriodicPatternProcessorDiscoverer(
final String jdbcUrl,
final String jdbcDriver,
final String tableName,
final String userName,
final String password,
@Nullable final String tenant,
final ClassLoader userCodeClassLoader,
@Nullable final List<PatternProcessor<T>> initialPatternProcessors,
@Nullable final Long intervalMillis)
throws Exception {
super(intervalMillis);
this.tableName = requireNonNull(tableName);
this.initialPatternProcessors = initialPatternProcessors;
this.userCodeClassLoader = userCodeClassLoader;
this.userName = userName;
this.password = password;
this.jdbcUrl = jdbcUrl;
this.tenant = tenant;
Class.forName(requireNonNull(jdbcDriver));
this.connection = DriverManager.getConnection(requireNonNull(jdbcUrl), userName, password);
this.statement = this.connection.createStatement();
}
JDBCPeriodicPatternProcessorDiscoverer has two key methods: arePatternProcessorsUpdated() and getLatestPatternProcessors().
@Override
public boolean arePatternProcessorsUpdated() throws SQLException {
if (latestPatternProcessors == null
&& !CollectionUtil.isNullOrEmpty(initialPatternProcessors)) {
return true;
}
LOG.info("Start check is pattern processor updated.");
if (statement == null) {
try {
this.connection = DriverManager.getConnection(requireNonNull(jdbcUrl), userName, password);
this.statement = this.connection.createStatement();
} catch (SQLException e) {
LOG.error("Connect to database error!", e);
throw e;
}
}
try {
String sql = buildQuerySql();
LOG.info("Statement execute sql is {}", sql);
resultSet = statement.executeQuery(sql);
Map<String, Tuple4<String, Integer, String, String>> currentPatternProcessors = new ConcurrentHashMap<>();
while (resultSet.next()) {
LOG.debug("check getLatestPatternProcessors start :{}", resultSet.getString(1));
String id = resultSet.getString("id");
if (currentPatternProcessors.containsKey(id)
&& currentPatternProcessors.get(id).f1 >= resultSet.getInt("version")) {
continue;
}
currentPatternProcessors.put(
id,
new Tuple4<>(
requireNonNull(resultSet.getString("id")),
resultSet.getInt("version"),
requireNonNull(resultSet.getString("pattern")),
resultSet.getString("function")));
}
if (latestPatternProcessors == null
|| isPatternProcessorUpdated(currentPatternProcessors)) {
LOG.debug("latest pattern processors size is {}", currentPatternProcessors.size());
latestPatternProcessors = currentPatternProcessors;
return true;
} else {
return false;
}
} catch (SQLException e) {
LOG.error(
"Pattern processor discoverer failed to check rule changes, will recreate connection.", e);
try {
statement.close();
connection.close();
connection = DriverManager.getConnection(requireNonNull(this.jdbcUrl), this.userName, this.password);
statement = connection.createStatement();
} catch (SQLException ex) {
LOG.error("Connect pattern processor discovery database error.", ex);
throw new RuntimeException("Cannot recreate connection to database.");
}
}
return false;
}
The arePatternProcessorsUpdated() method checks whether the PatternProcessor in the database has been updated. Here's how it works:
First, it checks whether there are initial PatternProcessors that have not yet been processed; if so, it returns true. The method then ensures a database connection and statement exist, and calls buildQuerySql() to generate and execute an SQL query that retrieves the PatternProcessors of all tenants, or of a specific tenant, from the table specified by tableName. After the SQL is executed, it iterates over the result set: if the currentPatternProcessors map already contains the same id with a version greater than or equal to the row's version, the row is skipped; otherwise, the map is updated with the row's id, version, pattern, and function. Finally, if latestPatternProcessors is null or the newly built map differs from it, latestPatternProcessors is replaced with currentPatternProcessors and true is returned to indicate that an update has occurred.
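The buildQuerySql() and isPatternProcessorUpdated() helpers referenced above are not shown in the snippet. A minimal sketch of what they might look like inside JDBCPeriodicPatternProcessorDiscoverer, assuming the rule table has id, version, pattern, function, and tenant columns as used elsewhere in this article, is:
private String buildQuerySql() {
    // Fetch all rules, or only those of the configured tenant.
    // (Sketch only; a production version may add status filters, ordering, etc.)
    if (StringUtils.isNullOrWhitespaceOnly(tenant)) {
        return "SELECT id, version, pattern, function FROM " + tableName;
    }
    return "SELECT id, version, pattern, function FROM " + tableName
            + " WHERE tenant = '" + tenant + "'";
}

private boolean isPatternProcessorUpdated(
        Map<String, Tuple4<String, Integer, String, String>> currentPatternProcessors) {
    // An update has happened if the rule set size changed or any
    // (id, version, pattern, function) tuple differs from the cached snapshot.
    return latestPatternProcessors.size() != currentPatternProcessors.size()
            || !currentPatternProcessors.equals(latestPatternProcessors);
}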
@Override
public List<PatternProcessor<T>> getLatestPatternProcessors() throws Exception {
LOG.debug("Start convert pattern processors to default pattern processor.");
return latestPatternProcessors.values().stream()
.map(
patternProcessor -> {
try {
String patternStr = patternProcessor.f2;
GraphSpec graphSpec =
CepJsonUtils.convertJSONStringToGraphSpec(patternStr);
LOG.debug("Latest pattern processor is {}",
CepJsonUtils.convertGraphSpecToJSONString(graphSpec));
PatternProcessFunction<T, ?> patternProcessFunction = null;
String id = patternProcessor.f0;
int version = patternProcessor.f1;
if (!StringUtils.isNullOrWhitespaceOnly(patternProcessor.f3)) {
patternProcessFunction =
(PatternProcessFunction<T, ?>)
this.userCodeClassLoader
.loadClass(patternProcessor.f3)
.getConstructor(String.class, int.class, String.class)
.newInstance(id, version, tenant);
}
return new DefaultPatternProcessor<>(
patternProcessor.f0,
patternProcessor.f1,
patternStr,
patternProcessFunction,
this.userCodeClassLoader);
} catch (Exception e) {
LOG.error(
"Get the latest pattern processors of the discoverer failure. - ", e);
e.printStackTrace();
}
return null;
}).filter(pre -> pre != null).collect(Collectors.toList());
}
The getLatestPatternProcessors() method retrieves the latest PatternProcessors from the database. It uses the stream API to convert PatternProcessor data stored in the ConcurrentHashMap to a list of PatternProcessor objects.
The key steps are as follows: if the class name stored in the PatternProcessor tuple (patternProcessor.f3) is not null or empty, the class is loaded through the user code classloader, its (String, int, String) constructor is obtained via getConstructor(), and a PatternProcessFunction instance is created from the PatternProcessor's id, version, and tenant. This function, together with the pattern string and the classloader, is then used to build a DefaultPatternProcessor instance. Finally, a list of the latest PatternProcessor instances is returned, which Flink CEP uses to identify matched events.
The DynamicCepOperatorCoordinator class calls the PatternProcessorDiscoverer interface to retrieve the latest serialized PatternProcessor from the database and send it to the associated DynamicCEPOp tasks:
public class DynamicCepOperatorCoordinator<T> implements OperatorCoordinator {
private static final Logger LOG =
LoggerFactory.getLogger(DynamicCepOperatorCoordinator.class);
private final DynamicCepOperatorCoordinatorContext cepCoordinatorContext;
private final PatternProcessorDiscovererFactory discovererFactory;
private final String operatorName;
private boolean started;
private volatile boolean closed;
public DynamicCepOperatorCoordinator(String operatorName, PatternProcessorDiscovererFactory discovererFactory, DynamicCepOperatorCoordinatorContext context) {
this.cepCoordinatorContext = context;
this.discovererFactory = discovererFactory;
this.operatorName = operatorName;
this.started = false;
this.closed = false;
}
@Override
public void start() throws Exception {
Preconditions.checkState(!started, "Dynamic Cep Operator Coordinator Started!");
LOG.info("Starting Coordinator for {}:{}", this.getClass().getSimpleName(), operatorName);
cepCoordinatorContext.runInCoordinatorThreadWithFixedRate(()->{
if (discovererFactory instanceof PeriodicPatternProcessorDiscovererFactory) {
try {
PeriodicPatternProcessorDiscoverer patternProcessorDiscoverer =
(PeriodicPatternProcessorDiscoverer) discovererFactory
.createPatternProcessorDiscoverer(cepCoordinatorContext.getUserCodeClassloader());
boolean updated = patternProcessorDiscoverer.arePatternProcessorsUpdated();
if (updated && started) {
Set<Integer> subtasks = cepCoordinatorContext.getSubtasks();
if (!patternProcessorDiscoverer.getLatestPatternProcessors().isEmpty()) {
UpdatePatternProcessorEvent updatePatternProcessorEvent =
new UpdatePatternProcessorEvent(patternProcessorDiscoverer.getLatestPatternProcessors());
subtasks.forEach(subtaskId -> {
cepCoordinatorContext.sendEventToOperator(subtaskId, updatePatternProcessorEvent);
});
}
}
} catch (Exception e) {
LOG.error("Starting Coordinator failed", e);
}
}
});
started = true;
}
@Override
public void close() throws Exception {
closed = true;
cepCoordinatorContext.close();
}
@Override
public void handleEventFromOperator(int subtask, int attemptNumber, OperatorEvent event) throws Exception {
LOG.info("Received event {} from operator {}.", event, subtask);
}
@Override
public void checkpointCoordinator(long checkpointId, CompletableFuture<byte[]> resultFuture) throws Exception {
// cepCoordinatorContext.runInCoordinatorThread(() -> {
LOG.debug("Taking a state snapshot on operator {} for checkpoint {}", operatorName, checkpointId);
try {
resultFuture.complete("Dynamic cep".getBytes(StandardCharsets.UTF_8));
} catch (Throwable e) {
ExceptionUtils.rethrowIfFatalErrorOrOOM(e);
resultFuture.completeExceptionally(
new CompletionException(
String.format(
"Failed to checkpoint for dynamic cep %s",
operatorName),
e));
}
}
@Override
public void notifyCheckpointComplete(long checkpointId) {
}
@Override
public void resetToCheckpoint(long checkpointId, @Nullable byte[] checkpointData) throws Exception {
}
@Override
public void subtaskReset(int subtask, long checkpointId) {
}
@Override
public void executionAttemptFailed(int subtask, int attemptNumber, @Nullable Throwable reason) {
cepCoordinatorContext.subtaskNotReady(subtask);
}
@Override
public void executionAttemptReady(int subtask, int attemptNumber, SubtaskGateway gateway) {
cepCoordinatorContext.subtaskReady(gateway);
}
}
Central to the DynamicCepOperatorCoordinator class is the start() method, which initializes and activates the coordinator. Here's how the start() method works:
It schedules a periodic task using cepCoordinatorContext.runInCoordinatorThreadWithFixedRate(). The task, defined by a lambda expression, runs in the coordinator thread and checks for PatternProcessor updates at a fixed interval, for example, every 10 seconds. When updates are found, the task constructs an UpdatePatternProcessorEvent and sends it to every downstream subtask through the cepCoordinatorContext. Note that the DynamicCepOperatorCoordinator in the JobManager runs asynchronously from the CEP operators in the TaskManagers.
This section introduces how to develop a Flink dynamic CEP program with a Kafka source. The procedure is as follows:
To use a database or another system as the event source instead of Kafka, configure the appropriate connector.
public static void main(String[] args) throws Exception {
// Set up the streaming execution environment
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
//Classloader initial
final ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
// Process args
// Build Kafka source with new Source API based on FLIP-27
Properties prop =new Properties();
prop.setProperty("security.protocol","SASL_PLAINTEXT");
prop.setProperty("sasl.mechanism","SCRAM-SHA-256");
prop.setProperty("sasl.jaas.config",
"org.apache.flink.kafka.shaded.org.apache.kafka.common.security.scram.ScramLoginModule" +
" required username=\"100670\" password=\"000000000\";");
KafkaSource<Event> kafkaSource = KafkaSource.<Event>builder()
.setBootstrapServers("123.4.50.105:9292,123.4.60.106:9292,123.4.50.107:9292")
.setTopics("cep_test1").setGroupId("test").setStartingOffsets(OffsetsInitializer.earliest())
.setProperties(prop).setValueOnlyDeserializer((new KafkaJsonDeserializer())).build();
env.setParallelism(1);
DataStream<Event> input = env.fromSource(kafkaSource, WatermarkStrategy.noWatermarks(), "source");
// keyBy the event name
// Note that only events with the same key are examined together for a match
KeyedStream<Event, Tuple2<String, String>> keyedStream =
input.keyBy(
new KeySelector<Event, Tuple2<String, String>>() {
@Override
public Tuple2<String, String> getKey(Event value) throws Exception {
return Tuple2.of(value.getName(), value.getName());
}
});
①Set up the execution environment.
②Configure the Kafka source, and use the KeyBy operator to partition the event stream by the name key.
long time = 1000;
SingleOutputStreamOperator<String> output = CEP.dynamicPatterns(
keyedStream,
new JDBCPeriodicPatternProcessorDiscovererFactory<>(
"jdbc:mysql//123.45.6.789:3306/cep_demo_db",
"com.mysql.cj.jdbc.Driver",
"rds_demo",
"riskcollateral",
"riskcollateral",
null,
null,
time),
TimeBehaviour.ProcessingTime,
TypeInformation.of(new TypeHint<String>()){
}));
output.addSink(new PrintSinkFunction<>()).name("cep");
env.execute("CEPDemo");
}
}
Apache StreamPark is used as the management and O&M platform for the Flink dynamic CEP program. Follow these steps to create a Flink job:
①Upload the JAR package.
②Add a job.
③Complete necessary configurations.
④Publish and start the application.
①Create the rds_demo table to store CEP rules.
②Insert an updated rule.
Insert the id, version, function, and JSON-formatted pattern into the rds_demo table (Realtime Compute for Apache Flink allows users to describe patterns in JSON. For more information, see Definitions of rules in the JSON format in dynamic Flink CEP.)
version | pattern | function
---|---|---
1 | {"name":"end","quantifier":{"consumingStrategy}... | xxxpackage.dynamic.cep.core.DemoPatternProcessFunction
The JSON-formatted pattern sequence is as follows:
{
"name": "end",
"quantifier": {
"consumingStrategy": "SKIP_TILL_NEXT",
"properties": [
"SINGLE"
],
"times": null,
"untilCondition": null
},
"condition": null,
"nodes": [
{
"name": "end",
"quantifier": {
"consumingStrategy": "SKIP_TILL_NEXT",
"properties": [
"SINGLE"
],
"times": null,
"untilCondition": null
},
"condition": {
"className": "xxxpackage.dynamic.cep.core.EndCondition",
"type": "CLASS"
},
"type": "ATOMIC"
},
{
"name": "start",
"quantifier": {
"consumingStrategy": "SKIP_TILL_NEXT",
"properties": [
"LOOPING"
],
"times": null,
"untilCondition": null
},
"type": "ATOMIC"
}
],
"edges": [
{
"source": "start",
"target": "end",
"type": "SKIP_TILL_NEXT"
}
],
"window": null,
"afterMatchStrategy": {
"type": "SKIP_PAST_LAST_EVENT",
"patternName": null
},
"type": "COMPOSITE",
"version": 1
}
The above JSON code describes a composite node (pattern) consisting of two atomic nodes: start and end.
This pattern is used to match an event sequence where the start node matches an event with the action being 0 and the end node matches an event defined by the xxxpackage.dynamic.cep.core.EndCondition class:
public class EndCondition extends SimpleCondition<Event> {
@Override
public boolean filter(Event value) throws Exception {
return value.getAction() != 1;
}
}
In the above code, the EndCondition checks whether the action attribute of the event is not equal to 1. If the action is not 1, the filter method returns true, indicating that the event satisfies the end node's condition.
In short, this pattern sequence matches an event sequence where the initial event's action is 0 and the end event's action isn't 1. Once the end node's condition is met, pattern matching ends.
The function field holds the fully qualified name of the DemoPatternProcessFunction class, which specifies how to handle matched events, as defined in the code snippet below:
public class DemoPatternProcessFunction<IN> extends PatternProcessFunction<IN, String> {
String id;
int version;
String tenant;
public DemoPatternProcessFunction(String id, int version, String tenant) {
this.id = id;
this.version = version;
this.tenant = tenant;
}
@Override
public void processMatch(
final Map<String, List<IN>> match, final Context ctx, final Collector<String> out) {
StringBuilder sb = new StringBuilder();
sb.append("A match for Pattern of (id, version): (")
.append(id)
.append(", ")
.append(version)
.append(") is found. The event sequence: ").append("\n");
for (Map.Entry<String, List<IN>> entry : match.entrySet()) {
sb.append(entry.getKey()).append(": ").append(entry.getValue().get(0).toString()).append("\n");
}
out.collect(sb.toString());
}
}
If the PatternProcessor detects an event sequence that matches the predefined pattern, the processMatch() method will construct a string that describes the match. Then, the string is output by the downstream operator via the Collector.
Assume the following event sequence streams into Flink:
private static void sendEvents(Producer<String, String> producer, String topic) {
ObjectMapper objectMapper = new ObjectMapper();
Event[] events = {
new Event("ken", 1, 1, 0, 1662022777000L),
new Event("ken", 2, 1, 0, 1662022778000L),
new Event("ken", 3, 1, 1, 1662022779000L),
new Event("ken", 4, 1, 2, 1662022780000L),
new Event("ken", 5, 1, 1, 1662022780000L)
};
while (true) {
try {
for (Event event : events) {
String json = objectMapper.writeValueAsString(event);
ProducerRecord<String, String> record = new ProducerRecord<>(topic, json);
producer.send(record, (metadata, exception) -> {
if (exception != null) {
LOG.error("Failed to send data to Kafka: ", exception);
} else {
System.out.println(metadata.topic());
LOG.info("Data sent successfully to topic {} at offset {}",
metadata.topic(), metadata.offset());
}
});
}
} catch (Exception e) {
LOG.error("Error while sending events to Kafka: ", e);
}
}
}
Insert the above events into the Kafka topic. The start node matches the initial two events whose action attributes are 0. The end node matches the fourth event whose action isn't 1, which ends pattern matching. The fifth event doesn't affect the finished match.
Bank of Hangzhou applies Flink dynamic CEP to the behavioral sequence module of the event center.
The event center is a platform that processes and analyzes event tracking data for informed decision making. The behavioral sequence module applies Bank of Hangzhou's in-house optimized version of dynamic Flink CEP.
Add a behavioral sequence. After filling in the basic information and configuring the event expiration time, users can add an event or event set.
The following image shows a behavioral sequence template:
As shown in the figure below, Events 1 to 5 are atomic events representing tracked user clicks, which are streamed into Flink sequentially. Events completed within the predefined expiration duration (20 minutes in this example) are matched. For example, if only the first four events are completed within 20 minutes, then Events 1 to 4 will be matched. The results, along with information like the username, cause of an error, or user's phone number, can be output to a message queue (Kafka, RocketMQ, etc.), offering insights into business or facilitating personalized recommendations.
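Conceptually, such a behavioral sequence corresponds to a pattern sequence of ordered events bounded by a time window. The sketch below is illustrative only: the event type and condition helper are hypothetical, and in production the rules are generated from the web configuration rather than hand-written.
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.windowing.time.Time;

public class BehaviorSequenceSketch {

    /** Hypothetical click-tracking event. */
    public static class ClickEvent {
        public String step;    // which tracked step the user completed
        public String userId;
    }

    private static SimpleCondition<ClickEvent> atStep(String step) {
        return new SimpleCondition<ClickEvent>() {
            @Override
            public boolean filter(ClickEvent e) {
                return step.equals(e.step);
            }
        };
    }

    public static Pattern<ClickEvent, ?> buildSequence() {
        // Events 1..5 in order; the whole sequence must complete within the
        // 20-minute expiration window. Partially completed sequences (e.g., only
        // Events 1-4) can be surfaced through CEP's timed-out partial match handling.
        return Pattern.<ClickEvent>begin("event1").where(atStep("event1"))
                .followedBy("event2").where(atStep("event2"))
                .followedBy("event3").where(atStep("event3"))
                .followedBy("event4").where(atStep("event4"))
                .followedBy("event5").where(atStep("event5"))
                .within(Time.minutes(20));
    }
}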
Applying Flink dynamic CEP to Bank of Hangzhou's event center enables users to add or change rules and event sequences on the fly without restarting Flink jobs. Changes made through the web client are written to the rule database and take effect automatically.
References
[1] Flink CEP: New Features and Application to Real-Time Risk Monitoring. https://developer.aliyun.com/article/1157197
[2] Getting Started with Dynamic Flink CEP. https://www.alibabacloud.com/help/en/flink/getting-started/getting-started-with-dynamic-flink-cep
[3] FLIP-200: Support Multiple Rule and Dynamic Rule Changing (Flink CEP). https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=195730308
[4] FlinkCEP – Complex Event Processing for Flink (Apache Flink v1.15.4). https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/libs/cep/