Real-Time Fraud Detection Pipeline Architecture on Alibaba Cloud

This article examines how Alibaba Cloud’s streaming, feature store, and inference services compose a sub-second fraud detection pipeline that scores transactions against rule logic and machine learning models before they complete.

Fraud detection is a latency-bounded problem. A transaction stream — payments, account access, withdrawals, transfers — must be scored, classified, and acted upon within the time budget that the originating system can tolerate before the transaction is completed or rejected. For card-present payments, this budget is typically tens of milliseconds; for card-not-present and account-based transactions, hundreds of milliseconds; for asynchronous workflows such as withdrawal reviews, seconds to minutes. The architecture chosen must hold the latency budget under sustained throughput while delivering fraud scores with sufficient feature richness to discriminate legitimate transactions from anomalous ones.

The pipeline composes five capability layers: event ingestion, stream processing, feature serving, decision scoring, and response delivery. Alibaba Cloud exposes primitives for each layer; the architectural challenge lies in how they are composed and tuned, not in selecting them in isolation.

[Figure: Fraud detection pipeline diagram]

Event ingestion and ordering guarantees

Fraud detection requires the ordered delivery of transaction events from upstream systems. Out-of-order or duplicate events produce false positives in velocity rules and false negatives in pattern detection.

Message Queue for Apache RocketMQ provides FIFO topics with strict partition ordering and at-least-once delivery semantics, with consumer-side idempotency keys used to achieve effective exactly-once processing downstream. Producer applications publish transaction events partitioned by account identifier or device identifier, ensuring all events for a given entity flow through a single partition and preserve temporal order at the consumer.
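The two ideas in this section, entity-keyed routing and consumer-side idempotency, can be illustrated with a minimal sketch in plain Python. It does not use the RocketMQ SDK: `partition_for`, `IdempotentConsumer`, and the `event_id` field are hypothetical names, and the in-memory set stands in for whatever durable store a real consumer would use to remember processed keys.

```python
import hashlib


def partition_for(entity_id: str, num_partitions: int) -> int:
    """Map an entity (account or device id) to a stable partition index."""
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class IdempotentConsumer:
    """Skips events whose idempotency key has already been processed.

    The in-memory set is an assumption for illustration; a production
    consumer would keep the keys in a durable store with an expiry.
    """

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()

    def on_message(self, event: dict) -> bool:
        key = event["event_id"]  # hypothetical idempotency key field
        if key in self._seen:
            return False  # duplicate redelivery: acknowledge without reprocessing
        self._handler(event)
        self._seen.add(key)  # record the key only after the handler succeeds
        return True
```

Because the key is recorded only after the handler returns, a crash mid-handling leads to a retry rather than a silently skipped event, which is the behaviour at-least-once delivery expects from the consumer.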

For transaction systems that prefer an HTTP entry point rather than a native producer SDK, API Gateway routes synchronous requests into Function Compute or directly into RocketMQ via a managed integration. Message persistence ensures that downstream processing can be resumed after consumer restart without event loss. Schema validation and field enrichment occur at this layer through Function Compute, so that downstream stream-processing jobs receive normalised events with consistent field types and units.
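As a rough illustration of what validation and normalisation at this layer might look like, the sketch below checks required fields and converts amounts and timestamps to single, consistent representations. The field names, the two-decimal currency assumption, and the `normalise_event` function are all hypothetical; the article does not specify an event schema.

```python
from datetime import datetime, timezone
from decimal import Decimal

# Hypothetical schema: these field names are assumptions for illustration.
REQUIRED_FIELDS = ("event_id", "account_id", "amount", "currency", "timestamp")


def normalise_event(raw: dict) -> dict:
    """Validate a raw event and return it with consistent types and units."""
    missing = [name for name in REQUIRED_FIELDS if name not in raw]
    if missing:
        raise ValueError(f"missing fields: {missing}")

    amount = Decimal(str(raw["amount"]))
    if amount < 0:
        raise ValueError("amount must be non-negative")

    # Accept epoch seconds or an ISO-8601 string; emit epoch milliseconds (UTC).
    ts = raw["timestamp"]
    if isinstance(ts, (int, float)):
        event_time_ms = int(ts * 1000)
    else:
        parsed = datetime.fromisoformat(ts)
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        event_time_ms = int(parsed.timestamp() * 1000)

    return {
        "event_id": str(raw["event_id"]),
        "account_id": str(raw["account_id"]),
        "amount_minor": int(amount * 100),  # assumes a two-decimal currency
        "currency": str(raw["currency"]).upper(),
        "event_time_ms": event_time_ms,
    }
```

Storing amounts as integer minor units avoids floating-point drift in downstream sums, which matters once the values feed windowed aggregations.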

Stream processing and feature computation with Flink

Real-time Compute for Apache Flink consumes the event stream and computes features in two categories. Stateless features are derived from the event itself, such as transaction amount, merchant category, and transaction type. Stateful features are computed over windows of historical events, such as transaction count in the last 60 seconds, sum of transaction amounts over 15 minutes, or distinct device count over 24 hours.

Stateful feature computation uses Flink’s keyed state and windowing primitives. Tumbling windows compute fixed-period aggregations; sliding windows compute overlapping aggregations for higher temporal resolution; session windows close on activity gaps, which suit features tied to user session behaviour rather than wall-clock periods. The choice of window type and width depends on the fraud pattern being modelled, with shorter windows producing more reactive features at higher state cost.
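To make the stateful case concrete, here is a plain-Python sketch of a per-key trailing window that maintains a transaction count and sum. It is not Flink code and does not reproduce Flink's window semantics exactly (Flink sliding windows emit at slide intervals, whereas this sketch updates on every event); the class and field names are hypothetical, and events are assumed to arrive in order per key.

```python
from collections import defaultdict, deque


class SlidingWindowFeatures:
    """Per-key count and sum over the trailing `window_ms` of event time."""

    def __init__(self, window_ms: int):
        self.window_ms = window_ms
        self._events = defaultdict(deque)  # key -> deque of (event_time_ms, amount)
        self._sums = defaultdict(int)      # key -> running sum inside the window

    def add(self, key: str, event_time_ms: int, amount_minor: int) -> dict:
        queue = self._events[key]
        queue.append((event_time_ms, amount_minor))
        self._sums[key] += amount_minor

        # Evict events that have fallen out of the window.
        # Assumes events for a given key arrive in event-time order.
        cutoff = event_time_ms - self.window_ms
        while queue and queue[0][0] <= cutoff:
            _, old_amount = queue.popleft()
            self._sums[key] -= old_amount

        return {"txn_count": len(queue), "txn_sum_minor": self._sums[key]}
```

The state held per key grows with the number of events inside the window, which is one way to see why wider windows and higher per-entity event rates increase state cost.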

Watermark tolerance configuration determines how late-arriving events are handled. A tolerance set to the P95 ingestion latency of the upstream system balances feature freshness against the rate of dropped late events. Excessively tight watermarks discard legitimate events from slower channels, while excessively loose ones delay feature output beyond the decision latency budget. Computed features are written to Tair for low-latency online serving and to Lindorm for historical retention, with the same feature value written to both stores in a single sink to maintain consistency between online and offline representations.
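The watermark trade-off can be sketched in a few lines: derive a tolerance from observed ingestion latencies, then classify each event as on time or late relative to the highest event time seen so far. This is a simplified model of a bounded-out-of-orderness watermark, not Flink's implementation; the nearest-rank percentile and the class name are assumptions chosen for illustration.

```python
import math


def p95(values):
    """Nearest-rank 95th percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


class BoundedOutOfOrdernessWatermark:
    """Watermark = max event time seen minus tolerance; older events are late."""

    def __init__(self, tolerance_ms: int):
        self.tolerance_ms = tolerance_ms
        self.max_event_time_ms = None

    def observe(self, event_time_ms: int) -> bool:
        """Return True if the event is on time, False if it is late."""
        if self.max_event_time_ms is not None:
            watermark = self.max_event_time_ms - self.tolerance_ms
            if event_time_ms < watermark:
                return False  # behind the watermark: treated as late
        if self.max_event_time_ms is None or event_time_ms > self.max_event_time_ms:
            self.max_event_time_ms = event_time_ms
        return True
```

With a P95-derived tolerance, roughly one event in twenty from the measured latency distribution would be expected to arrive late; whether that is acceptable depends on how the job handles late data and on the decision latency budget.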

Feature serving with Tair and Lindorm

The decision layer requires very low-latency access to current feature values, since feature lookup time consumes part of the latency budget that model inference and rule evaluation must share.

Tair, a Redis-compatible in-memory database, serves as the online feature store. Current feature values keyed by entity identifier are read by the decision layer with single-digit-millisecond latency in typical configurations. When persistence is enabled, feature values survive an instance restart, and cluster mode scales feature read throughput horizontally.

Lindorm serves as the offline and historical feature store. Its wide-column model accommodates sparse, high-dimensional feature sets, and its time-series engine retains feature evolution over long periods for model retraining and pattern analysis. Tair and Lindorm fulfil distinct roles in the pipeline: Tair holds the current state required for inline scoring, while Lindorm holds the longitudinal record required for offline model development and forensic investigation.
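The division of labour between the two stores can be sketched with in-memory stand-ins: one write path updates the latest value for online reads and appends the same value to a history for offline use. The dictionary and list below are placeholders for Tair and Lindorm respectively, not client code for either service, and the `feat:<entity>:<feature>` key layout is a hypothetical convention.

```python
class DualFeatureSink:
    """Writes each feature value to an online view and an offline history."""

    def __init__(self):
        self.online = {}   # stand-in for Tair: latest value per key
        self.offline = []  # stand-in for Lindorm: append-only history

    @staticmethod
    def online_key(entity_id: str, feature: str) -> str:
        return f"feat:{entity_id}:{feature}"  # hypothetical key layout

    def write(self, entity_id: str, feature: str, value, event_time_ms: int) -> None:
        # The same value goes to both stores so the two representations agree.
        self.online[self.online_key(entity_id, feature)] = value
        self.offline.append((entity_id, feature, event_time_ms, value))

    def read_online(self, entity_id: str, features, default=0) -> dict:
        """Fetch the current value of each requested feature for one entity."""
        return {
            name: self.online.get(self.online_key(entity_id, name), default)
            for name in features
        }
```

In a real deployment the two writes would not be atomic across services, so the sink would need a retry or reconciliation strategy; the sketch leaves that out.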

Decision scoring with rules and PAI-EAS models

Fraud decisions typically combine deterministic rule evaluation and machine learning model inference. Rules capture known patterns with explicit thresholds — velocity limits, geographic mismatches, blocklist matches — while models capture patterns learned from labelled historical data.

Function Compute orchestrates the decision flow. On receipt of a transaction event, the function reads features from Tair, evaluates the rule set against the feature values, and invokes the model endpoint when the rule outcome is inconclusive. PAI-EAS hosts the trained fraud model behind a low-latency inference endpoint, with autoscaling configured to match transaction throughput. The combined output of rules and model produces a decision — approve, decline, or refer to manual review — which is returned synchronously to the originating transaction system and published asynchronously to downstream consumers for audit and notification.
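The decision flow described above, rules first and model only when the rules are inconclusive, might look like the following sketch. The rule thresholds, feature names, and score cut-offs are hypothetical, and `score_model` is any callable standing in for a call to the hosted model endpoint.

```python
from typing import Callable, Optional

APPROVE, DECLINE, REVIEW = "approve", "decline", "review"


def evaluate_rules(features: dict) -> Optional[str]:
    """Return a decision when a rule is conclusive, otherwise None."""
    if features.get("on_blocklist"):
        return DECLINE
    if features.get("txn_count_60s", 0) > 10:  # hypothetical velocity limit
        return DECLINE
    if features.get("txn_count_60s", 0) <= 1 and features.get("amount_minor", 0) < 5_000:
        return APPROVE  # hypothetical low-risk fast path
    return None


def decide(
    features: dict,
    score_model: Callable[[dict], float],
    decline_at: float = 0.9,
    review_at: float = 0.6,
) -> dict:
    """Combine rule evaluation with a model score into a single decision."""
    rule_decision = evaluate_rules(features)
    if rule_decision is not None:
        return {"decision": rule_decision, "source": "rules", "score": None}

    score = score_model(features)  # only invoked when rules are inconclusive
    if score >= decline_at:
        decision = DECLINE
    elif score >= review_at:
        decision = REVIEW
    else:
        decision = APPROVE
    return {"decision": decision, "source": "model", "score": score}
```

Skipping the model call when a rule is conclusive keeps the common path short, which helps the latency budget, at the cost of not recording a model score for those transactions.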

Model deployment follows a staged rollout pattern through PAI-EAS traffic splitting, where a new model version receives a small fraction of traffic for monitoring before full promotion. Comparison metrics across deployment stages — false positive rate, false negative rate, and decision latency at each percentile — are collected through Cloud Monitor and Log Service.
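One way to compare a candidate model against the current one during a staged rollout is to compute error rates from labelled outcomes for each version and gate promotion on the difference. The sketch below assumes labelled `(predicted_fraud, actual_fraud)` pairs are available per version; the function names and the `max_regression` threshold are hypothetical, and the traffic split itself is left to the serving platform.

```python
def error_rates(records) -> dict:
    """Compute error rates from (predicted_fraud, actual_fraud) boolean pairs."""
    false_pos = false_neg = positives = negatives = 0
    for predicted, actual in records:
        if actual:
            positives += 1
            false_neg += int(not predicted)
        else:
            negatives += 1
            false_pos += int(bool(predicted))
    return {
        "false_positive_rate": false_pos / negatives if negatives else 0.0,
        "false_negative_rate": false_neg / positives if positives else 0.0,
    }


def safe_to_promote(baseline: dict, candidate: dict, max_regression: float = 0.01) -> bool:
    """True if the candidate is no worse than baseline by more than the margin."""
    return all(
        candidate[metric] <= baseline[metric] + max_regression
        for metric in ("false_positive_rate", "false_negative_rate")
    )
```

With the small traffic fraction typical of early rollout stages, the candidate's rates rest on few samples, so a gate like this is best read alongside sample counts rather than in isolation.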

Response delivery and audit logging

Decisions must be delivered to consumers with the same reliability guarantees as the input stream. EventBridge routes decision events to multiple subscribers: the transaction system that issued the request, downstream alerting systems for manual review queues, and the analytics layer for ongoing model improvement.

Log Service (SLS) captures the full decision context — input event, computed features, rule evaluation results, model score, and final decision — for every transaction. This audit record serves multiple purposes: regulatory reporting, model retraining datasets, investigation of disputed transactions, and detection of model drift over time. SLS indexes the audit records for fast query, with hot data retained for operational investigation and cold data archived to OSS for long-term retention. Analyst dashboards built on Hologres or MaxCompute provide near-real-time visibility into decision volumes, model performance, and emerging fraud patterns.
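A possible shape for the audit record is sketched below as a dataclass serialised to JSON. The field names are assumptions based on the decision context listed above, with rule and model version fields added so that a record can be tied to a release; this is not an SLS schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditRecord:
    """Full decision context for one transaction (hypothetical field names)."""

    event: dict
    features: dict
    rule_results: dict
    model_score: Optional[float]  # None when rules alone decided
    decision: str
    rule_version: str
    model_version: str

    def to_json(self) -> str:
        # Sorted keys give a stable layout that is easier to diff and index.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
```

Capturing the features exactly as they were read at decision time, rather than re-deriving them later, is what makes the record usable both for dispute investigation and as a retraining dataset.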

Closing observations

The architecture’s effectiveness depends less on any single layer than on the latency and consistency contracts between them. Feature freshness must align with the windows that the model was trained on; feature serving latency must leave sufficient budget for model inference; the response path must preserve the same ordering and delivery guarantees as the ingestion path.
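The latency contract between layers can be checked with simple arithmetic. The sketch below sums per-layer P99 latencies against a total budget and flags layers that are close to their own allocation. The layer names, the 80% threshold, and the functions are hypothetical, and summing P99 values is only a rough planning figure, not the P99 of the end-to-end path.

```python
def remaining_budget_ms(total_budget_ms: float, layer_p99_ms: dict) -> float:
    """Budget left after subtracting each layer's P99 latency (rough estimate)."""
    return total_budget_ms - sum(layer_p99_ms.values())


def layers_near_budget(layer_p99_ms: dict, layer_budget_ms: dict, threshold: float = 0.8) -> list:
    """Names of layers whose P99 has reached `threshold` of their own budget."""
    return sorted(
        name
        for name, p99 in layer_p99_ms.items()
        if p99 >= threshold * layer_budget_ms[name]
    )
```

For example, with a 200 ms total budget and measured P99 values of 5 ms for feature lookup, 10 ms for rules, 40 ms for inference, and 20 ms for delivery, `remaining_budget_ms` reports 125 ms of headroom.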

Three operational disciplines determine production performance. Feature parity between training and serving — the values computed in Flink and stored in Tair must match the values used in offline model training — prevents the most common source of unexplained model degradation. Rule and model versioning, tracked through deployment automation and surfaced in audit records, supports incident investigation when decision behaviour changes after a release. Continuous monitoring of decision latency at each layer, with alerts when any layer approaches its budget, allows operators to identify capacity issues before they cascade into transaction declines.
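A feature parity check can be as simple as comparing, for a sampled entity, the values served online with the values recorded for training. The sketch below is one hedged way to do that; the function name and tolerance are assumptions, and fetching the two dictionaries from the online and offline stores is left out.

```python
import math


def parity_mismatches(online: dict, offline: dict, rel_tol: float = 1e-6) -> list:
    """Return feature names whose online and offline values disagree.

    A feature present in only one of the two views counts as a mismatch.
    """
    mismatched = []
    for name in sorted(set(online) | set(offline)):
        if name not in online or name not in offline:
            mismatched.append(name)
        elif not math.isclose(online[name], offline[name], rel_tol=rel_tol):
            mismatched.append(name)
    return mismatched
```

Run periodically over a sample of entities, a non-empty result is an early signal of training-serving skew before it shows up as degraded model metrics.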

Disclaimer: The views expressed herein are for reference only and don’t necessarily represent the official views of Alibaba Cloud.


PM - C2C_Yuan
