CloudFlow: Reliably process distributed multi-step transactions

Last Updated: Oct 30, 2023

This topic describes how to use Serverless workflow to guarantee that distributed transactions are reliably processed in a complex flow, helping you focus on your business logic.

Overview

In complex scenarios that involve order management, such as e-commerce websites, hotel booking, and flight reservations, applications need to access multiple remote services and have strict requirements for the operational semantics of transactions: either all steps succeed or all steps fail, with no intermediate state. In applications with low traffic and centralized data storage, the atomicity, consistency, isolation, and durability (ACID) properties of relational databases can guarantee that transactions are reliably processed. In high-traffic scenarios, however, distributed microservices are usually used for high availability and scalability. To guarantee reliable processing of multi-step transactions in such an architecture, service providers usually need to add queues, persistent messages, and a way to display the flow status, which brings additional development and O&M costs. To resolve these problems, Serverless workflow guarantees reliable processing of distributed transactions in complex flows.

Scenarios

Assume that an application lets users book a train ticket, a flight, and a hotel in a single order and must ensure that this three-step transaction is reliably processed. Three remote calls are required to implement this feature (for example, you must call the 12306 API to book a train ticket). The order succeeds only if all three calls succeed. However, any of the three remote calls may fail, so the application must have compensation logic for each failure scenario to roll back the operations that have already completed. The following list and figure show the details.

  • If BuyTrainTicket is successful but ReserveFlight fails, the application calls CancelTrainTicket and notifies the user that the order failed.
  • If both BuyTrainTicket and ReserveFlight are successful but ReserveHotel fails, the application calls CancelFlight and CancelTrainTicket and notifies the user that the order failed.
Figure: Compensation flow for the train ticket, flight, and hotel order (longtxn-saga_train_flight_hotel)
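
The compensation logic described above can be sketched in plain Python. This is only an illustration of the rollback order, not how Serverless workflow is implemented: the book_trip function and its train_ok, flight_ok, and hotel_ok parameters are hypothetical names, and the remote calls are replaced by simulated outcomes.

```python
def book_trip(train_ok=True, flight_ok=True, hotel_ok=True):
    """Simulate the three-step order and return the trace of steps that ran."""
    steps = [
        # (step name, simulated outcome, compensating step)
        ("BuyTrainTicket", train_ok, "CancelTrainTicket"),
        ("ReserveFlight", flight_ok, "CancelFlight"),
        ("ReserveHotel", hotel_ok, None),  # last step: no later step can fail
    ]
    trace = []
    compensations = []  # compensating steps for the calls that succeeded
    for name, ok, compensation in steps:
        trace.append(name)
        if not ok:
            # Roll back the completed steps in reverse order, then fail the order.
            trace.extend(reversed(compensations))
            trace.append("OrderFailed")
            return trace
        if compensation is not None:
            compensations.append(compensation)
    trace.append("OrderSucceeded")
    return trace


print(book_trip(hotel_ok=False))
# ['BuyTrainTicket', 'ReserveFlight', 'ReserveHotel', 'CancelFlight', 'CancelTrainTicket', 'OrderFailed']
```

Writing this logic by hand also requires persisting the trace so that a crash between two steps does not lose track of what must be rolled back. That is the part the flow definition later in this topic delegates to Serverless workflow.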

Implementation in Serverless workflow

In the following example, a function deployed in Function Compute is orchestrated into a flow in Serverless workflow to implement a reliable multi-step transaction. The implementation takes three steps:

  1. Create a function in Function Compute.
  2. Create a flow.
  3. Execute the flow and view the result.

Step 1: Create a function in Function Compute to simulate the BuyTrainTicket, ReserveFlight, and ReserveHotel operations

Create a function in Python 2.7. For more information, see Quickly create a function. We recommend that you use the following names for the service and function in Function Compute:
  • Service: fnf-demo
  • Function: Operation

The Operation function simulates operations such as BuyTrainTicket, ReserveFlight, and ReserveHotel. The result of each operation (success or failure) is determined by the input.

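import json
import logging
import uuid

def handler(event, context):
  # The event is a JSON string built from the inputMappings of the calling step.
  evt = json.loads(event)
  logger = logging.getLogger()
  txn_id = str(uuid.uuid4())  # simulated transaction ID
  op = evt.get('operation', 'operation')
  # The requested outcome is passed under a key named after the operation,
  # for example {"operation": "reserve_hotel", "reserve_hotel": "fail"}.
  if evt.get(op) in (False, 'fail'):
    logger.info("%s failed" % op)
    # Exiting the process makes the invocation fail, which the flow
    # receives as an FC.Unknown error.
    exit()
  logger.info("%s succeeded, id %s" % (op, txn_id))
  return json.dumps({op: "success", "%s_txnID" % op: txn_id})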

Step 2: Create a flow

In the Serverless workflow console, perform the following steps to create a flow:

  1. Configure a Resource Access Management (RAM) role for the flow. The following trust policy allows Serverless workflow (fnf.aliyuncs.com) to assume the role.
    {
        "Statement": [
            {
                "Action": "sts:AssumeRole",
                "Effect": "Allow",
                "Principal": {
                    "Service": [
                        "fnf.aliyuncs.com"
                    ]
                }
            }
        ],
        "Version": "1"
    }                               
  2. Define the flow.
    version: v1
    type: flow
    steps:
      - type: task
        resourceArn: acs:fc:{region}:{accountID}:services/fnf-demo/functions/Operation
        name: BuyTrainTicket
        inputMappings:
        - target: operation
          source: buy_train_ticket
        - target: buy_train_ticket
          source: $input.buy_train_ticket_result
        catch: 
        - errors:
          - FC.Unknown
          goto: OrderFailed
      - type: task
        resourceArn: acs:fc:{region}:{accountID}:services/fnf-demo/functions/Operation
        name: ReserveFlight
        inputMappings:
        - target: operation
          source: reserve_flight
        - target: reserve_flight
          source: $input.reserve_flight_result
        catch:  # When the FC.Unknown error thrown by the ReserveFlight task is captured, Serverless Workflow jumps to the CancelTrainTicket task.
        - errors:
          - FC.Unknown
          goto: CancelTrainTicket
      - type: task
        resourceArn: acs:fc:{region}:{accountID}:services/fnf-demo/functions/Operation
        name: ReserveHotel
        inputMappings:
        - target: operation
          source: reserve_hotel
        - target: reserve_hotel
          source: $input.reserve_hotel_result
        retry:  # Serverless Workflow retries the task step up to three times in the exponential backoff mode upon an FC.Unknown error. The initial retry interval is 1s, and the next retry interval is twice the previous retry interval for the rest of the retries.
        - errors:
          - FC.Unknown
          intervalSeconds: 1
          maxAttempts: 3
          multiplier: 2
        catch:  # When the FC.Unknown error thrown by the ReserveHotel task is captured, Serverless Workflow jumps to the CancelFlight task.
          - errors:
            - FC.Unknown
            goto: CancelFlight
      - type: succeed
        name: OrderSucceeded
      - type: task
        resourceArn: acs:fc:{region}:{accountID}:services/fnf-demo/functions/Operation
        name: CancelFlight
        inputMappings:
        - target: operation
          source: cancel_flight
        - target: reserve_flight_txnID
          source: $local.reserve_flight_txnID
      - type: task
        resourceArn: acs:fc:{region}:{accountID}:services/fnf-demo/functions/Operation
        name: CancelTrainTicket
        inputMappings:
        - target: operation
          source: cancel_train_ticket
        - target: buy_train_ticket_txnID
          source: $local.buy_train_ticket_txnID
      - type: fail
        name: OrderFailed                              

Step 3: Execute the flow and view the result

Execute the flow that you created in the console. The input of the StartExecution operation must be in JSON format. The following JSON object simulates the success or failure of each step. For example, "reserve_hotel_result":"fail" indicates a failure to reserve a hotel. StartExecution is an asynchronous operation. After the operation is called, Serverless workflow returns an execution name that you can use to query the flow execution status.

{
  "buy_train_ticket_result":"success",
  "reserve_flight_result":"success",
  "reserve_hotel_result":"fail"
}                       
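
To try the other failure scenarios, the execution input can be generated instead of edited by hand. The following helper is a minimal sketch: build_execution_input and the STEPS tuple are hypothetical names, and the helper only produces the JSON string. It does not call StartExecution.

```python
import json

# Step names as used in the *_result keys of the execution input.
STEPS = ("buy_train_ticket", "reserve_flight", "reserve_hotel")


def build_execution_input(failed_step=None):
    """Return the execution input as a JSON string.

    Every step is marked "success" except failed_step, which is marked "fail".
    """
    if failed_step is not None and failed_step not in STEPS:
        raise ValueError("unknown step: %s" % failed_step)
    payload = {}
    for step in STEPS:
        payload[step + "_result"] = "fail" if step == failed_step else "success"
    return json.dumps(payload, sort_keys=True)


# Input that simulates a failed flight reservation.
print(build_execution_input("reserve_flight"))
```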

After the flow execution starts, click the name of the execution in the Serverless workflow console. On the page that appears, view the execution process and results in the Definition and Visual Workflow section. As shown in the following figure, because the input contains "reserve_hotel_result":"fail", ReserveHotel fails, and Serverless workflow calls CancelFlight and CancelTrainTicket in sequence based on the flow definition. Serverless workflow persists each step, so failures such as network interruptions or unexpected process exits do not affect the transactions in the flow.

Figure: Execution in the console, where ReserveHotel fails and CancelFlight and CancelTrainTicket are called

Execution events are generated for each flow execution. You can view the execution events in the console, or call the GetExecutionHistory operation by using the SDK or the command-line interface (CLI).

Error handling and retries

  1. In the preceding example, the remote calls of ReserveFlight and ReserveHotel may fail due to network or service errors. Retrying upon transient errors can improve the success rate of the ordering flow. Serverless workflow can automatically retry task steps. For example, the following definition of the ReserveHotel step retries the step in exponential backoff mode after an FC.Unknown error is captured. If ReserveHotel still fails after the maximum number of retries, Serverless workflow captures the FC.Unknown error thrown by the ReserveHotel step based on the catch definition of the step, jumps to the CancelFlight step, and runs the defined compensation logic.
      - type: task
        resourceArn: acs:fc:{region}:{accountID}:services/fnf-demo/functions/Operation
        name: ReserveHotel
        inputMappings:
        - target: operation
          source: reserve_hotel
        retry:  # Serverless Workflow retries the task step up to three times in the exponential backoff mode upon an FC.Unknown error. The initial retry interval is 1s, and the next retry interval is twice the previous retry interval for the rest of the retries.
        - errors:
          - FC.Unknown
          intervalSeconds: 1
          maxAttempts: 3
          multiplier: 2
        catch:  # When the FC.Unknown error thrown by the ReserveHotel task is captured, Serverless Workflow jumps to the CancelFlight task.
          - errors:
            - FC.Unknown
            goto: CancelFlight           
  2. The following figure shows that, after the retry parameter is defined, the ReserveHotel task step is retried the specified maximum number of times.
     Figure: Retries of the ReserveHotel step in the console
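
The retry behavior configured above can be approximated with the following sketch. It is only an illustration under stated assumptions: call_with_retry and TransientError are hypothetical names (TransientError stands in for FC.Unknown), maxAttempts is assumed to count retries rather than the first call, and the actual service may schedule retries differently.

```python
import time


class TransientError(Exception):
    """Stands in for an FC.Unknown error in this sketch."""


def call_with_retry(task, interval_seconds=1, max_attempts=3, multiplier=2,
                    sleep=time.sleep):
    """Call task, retrying with exponential backoff on TransientError."""
    delay = interval_seconds
    for attempt in range(max_attempts + 1):  # first call plus the retries
        try:
            return task()
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted: the catch logic takes over
            sleep(delay)
            delay *= multiplier  # each wait is multiplier times the previous one


def always_fail():
    raise TransientError("simulated failure")


waits = []  # record the waits instead of sleeping
try:
    call_with_retry(always_fail, sleep=waits.append)
except TransientError:
    print("giving up after waits of", waits)
```

With intervalSeconds set to 1, multiplier set to 2, and maxAttempts set to 3, the recorded waits are 1, 2, and 4 seconds. The final failure is then handled by the catch definition, which corresponds to the except clause in the usage example.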

Data transfer between steps

  1. After ReserveHotel fails, CancelFlight and CancelTrainTicket are called. To cancel these two bookings, the transaction IDs (txnID) returned by ReserveFlight and BuyTrainTicket are required. The following example shows how to use inputMappings to pass the output of a previous step to the CancelFlight step.
      - type: task
        resourceArn: acs:fc:{region}:{accountID}:services/fnf-demo/functions/Operation
        name: CancelFlight
        inputMappings:
        - target: operation
          source: cancel_flight
        - target: reserve_flight_txnID
          source: $local.reserve_flight_txnID
                        
  2. Outputs of each step of the flow are stored in the local object of EventDetail in the StepExited event.
      {  
         "input":{
            "operation":"reserve_hotel",
            "reserve_hotel_result":"fail"
         },
         "local":{
            "buy_train_ticket":"success",
            "buy_train_ticket_txnID":"d37412b3-bb68-4d04-9d90-c8c15643d45e",
            "reserve_flight_result":"success",
            "reserve_flight_txnID":"024caecf-cfa3-43a6-b561-9b6fe0571b55"
         },
         "resourceArn":"acs:fc:{region}:{accountID}:services/fnf-demo/functions/Operation",
         "cause":"{\"errorMessage\":\"Process exited unexpectedly before completing request (duration: 12ms, maxMemoryUsage: 9.18MB)\"}",
         "error":"FC.Unknown",
         "retryCount":3,
         "goto":"CancelFlight"
      }         
  3. Based on EventDetail and inputMappings, the input of the CancelFlight step is converted into the following JSON object. As a result, the input of the CancelFlight function contains the reserve_flight_txnID field.
      "input":{
        "operation":"cancel_flight",
        "reserve_flight_txnID":"024caecf-cfa3-43a6-b561-9b6fe0571b55"
      }
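
The mapping from EventDetail to the step input can be sketched as follows. This is a simplified model, not the resolution logic of the service: apply_input_mappings is a hypothetical helper that handles only constants and top-level $input and $local references, and resolving a missing key to None is an assumption.

```python
def apply_input_mappings(mappings, step_input, local):
    """Build the input of a step from its inputMappings.

    A source that starts with "$input." or "$local." is looked up in the
    corresponding object. Any other source is treated as a constant.
    """
    resolved = {}
    for mapping in mappings:
        source = mapping["source"]
        if source.startswith("$input."):
            value = step_input.get(source[len("$input."):])
        elif source.startswith("$local."):
            value = local.get(source[len("$local."):])
        else:
            value = source
        resolved[mapping["target"]] = value
    return resolved


# inputMappings of the CancelFlight step.
cancel_flight_mappings = [
    {"target": "operation", "source": "cancel_flight"},
    {"target": "reserve_flight_txnID", "source": "$local.reserve_flight_txnID"},
]
# Relevant part of the local object from the preceding EventDetail.
local = {"reserve_flight_txnID": "024caecf-cfa3-43a6-b561-9b6fe0571b55"}

print(apply_input_mappings(cancel_flight_mappings, {}, local))
```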