You can integrate Serverless workflow with multiple Alibaba Cloud services. When a cloud service serves as the execution node of a task step in Serverless workflow, you can handle execution errors by catching them and then retrying or redirecting the task, depending on your business scenario. This helps keep your tasks stable in production. This topic describes how to handle errors in Serverless workflow in different business scenarios.

Troubleshooting methods

You can use task steps in Serverless workflow to catch errors and retry or redirect tasks after errors are caught. For more information, see Task steps.

  • Retry a task after an error is caught.
    steps:
      - type: task
        name: hello
        resourceArn: acs:fc:{region}:{accountID}:xxx
        retry:
          - errors:
              - FnF.ALL
            intervalSeconds: 10
            maxIntervalSeconds: 300
            maxAttempts: 3
            multiplier: 2
    Table 1. Parameters for retrying a task after an error is caught
    Parameter Description
    retry Specifies that the task is retried after an error is caught.
    errors The list of errors to be caught.
    intervalSeconds The initial interval between retry attempts. Maximum value: 86400. Default value: 1. Unit: seconds.
    maxIntervalSeconds The maximum interval between retry attempts. Unit: seconds.
    maxAttempts The maximum number of retry attempts. Default value: 3.
    multiplier The factor by which the retry interval is multiplied after each attempt. Default value: 2. In the preceding sample code, the first retry attempt is performed after 10 seconds, the second after 20 seconds, and the third after 40 seconds.
  • Redirect a task after an error is caught.
    steps:
      - type: task
        name: hello
        resourceArn: acs:fc:{region}:{accountID}:xxx
        errorMappings:
          - target: errMsg
            source: $local.cause # This value is reserved for the system and can be directly used when an error occurs in this step. 
          - target: errCode
            source: $local.error # This value is reserved for the system and can be directly used when an error occurs in this step. 
        catch:
          - errors:
              - FnF.ALL
            goto: final
    Table 2. Parameters for redirecting a task after an error is caught
    Parameter Description
    errorMappings The error fields of this step that are passed on when the task is redirected.
    catch The policy based on which errors are caught in the task.
    errors The list of errors to be caught.
    goto The step to which the flow is redirected after the task throws an error.
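
The retry parameters in Table 1 can be read as a simple backoff schedule. The following Python sketch computes the wait time before each retry attempt. It is an illustration only: the function name `retry_intervals` is hypothetical, and the capping by `maxIntervalSeconds` and its default of 86400 are assumptions that the tables above do not state.

```python
def retry_intervals(interval_seconds=1, multiplier=2, max_attempts=3,
                    max_interval_seconds=86400):
    """Return the wait time in seconds before each retry attempt.

    Assumption: each wait is the previous wait times `multiplier`,
    capped at `max_interval_seconds`.
    """
    intervals = []
    wait = interval_seconds
    for _ in range(max_attempts):
        intervals.append(min(wait, max_interval_seconds))
        wait *= multiplier
    return intervals


# Values from the sample retry policy above.
print(retry_intervals(interval_seconds=10, multiplier=2,
                      max_attempts=3, max_interval_seconds=300))
# → [10, 20, 40]
```

With a larger initial interval, for example 100 seconds, the cap would take effect from the third attempt onward under this assumption.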

Use Function Compute as an execution node of a task in Serverless workflow

When Function Compute serves as an execution node of a task in Serverless workflow, take note of the following error types:
  • Exceptions returned by Function Compute
  • Function code errors
To catch these errors, specify them in the errors list of a task step in Serverless workflow.

Common system errors returned by Function Compute or Serverless workflow

The following code lists common error types:
- errors:
  - FC.ResourceThrottled
  - FC.ResourceExhausted
  - FC.InternalServerError
  - FC.Unknown
  - FnF.TaskTimeout
  - FnF.ALL
Table 3. Common error types
Error type Description
FC.{ErrorCode} Function Compute returns HTTP status codes other than 200. The following common error types are included:
  • FC.ResourceThrottled: Your functions are throttled due to high concurrency. All your functions share a total concurrency limit. When a task node is executed, Serverless workflow invokes Function Compute, and these invocations count toward the same limit as invocations made by other methods. You can apply to have the limit changed.
  • FC.ResourceExhausted: Your functions are throttled due to insufficient resources. Contact us when errors of this type occur.
  • FC.InternalServerError: A system error occurs on Function Compute. Execute the flow again.
Note: {ErrorCode} indicates the error code of Function Compute. For more information, see Error codes.
FC.Unknown Function Compute has invoked the function, but an error occurred during the function execution and the error was not caught. Example: UnhandledInvocationError.
{CustomError} Function Compute has invoked the function, but the function threw an exception.
FnF.TaskTimeout The execution of a step in Serverless workflow times out.
FnF.ALL All errors in Serverless workflow are caught.
FnF.Timeout The overall execution in Serverless workflow times out.
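
As a rough mental model of how an errors list selects errors, the following Python sketch checks an error name against a list. It assumes exact name matching plus FnF.ALL as a catch-all, which is a simplification; the actual matching rules of Serverless workflow may differ. The helper name `is_caught` is hypothetical.

```python
def is_caught(error_name, errors):
    """Return True if error_name is selected by a step's `errors` list.

    Assumption: names match exactly, and FnF.ALL selects every error.
    """
    return "FnF.ALL" in errors or error_name in errors


policy = ["FC.ResourceThrottled", "FC.InternalServerError", "FnF.TaskTimeout"]
print(is_caught("FC.ResourceThrottled", policy))  # True
print(is_caught("FC.Unknown", policy))            # False
print(is_caught("FC.Unknown", ["FnF.ALL"]))       # True
```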

Custom error types

In addition to the common errors of Function Compute and Serverless workflow, you can define custom error types. Your function code can throw an exception to pass the state or error of an execution to Serverless workflow, which then retries or redirects the task as defined in the flow. The following example defines a custom error type in Python and configures Serverless workflow to retry tasks that throw this type of error. To handle an error of a custom type, perform the following operations:
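  1. Define a custom error type in function code.
    ...
    class ErrorNeedsRetry(Exception):
        pass

    def handler(event, context):
        try:
            pass  # Replace with your business logic.
        except ServerException:
            # Re-raise as the custom error type so that the flow can match it by name.
            raise ErrorNeedsRetry("custom error message")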
  2. Modify the task step in Serverless workflow to catch the error and retry the task.
    retry:
      - errors:
          - ErrorNeedsRetry
        intervalSeconds: 10
        maxAttempts: 3
        multiplier: 2
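
To see how the two steps above fit together, the following Python sketch simulates the retry policy locally. It is not how Serverless workflow is implemented: `run_with_retry` and `flaky_handler` are hypothetical names, and matching an exception by its class name is an assumption based on the sample, where the errors list contains ErrorNeedsRetry.

```python
import time


class ErrorNeedsRetry(Exception):
    pass


def run_with_retry(task, errors, interval_seconds=1, max_attempts=3,
                   multiplier=2, sleep=time.sleep):
    """Call task(); retry when the raised exception's class name is in errors."""
    wait = interval_seconds
    # One initial try plus up to max_attempts retries.
    for attempt in range(max_attempts + 1):
        try:
            return task()
        except Exception as exc:
            name = type(exc).__name__
            # Give up on unlisted errors or when retries are used up.
            if name not in errors or attempt == max_attempts:
                raise
            sleep(wait)
            wait *= multiplier


calls = {"count": 0}


def flaky_handler():
    # Hypothetical task that fails twice before it succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ErrorNeedsRetry("custom error message")
    return "ok"


# Pass a no-op sleep so that the example finishes immediately.
print(run_with_retry(flaky_handler, ["ErrorNeedsRetry"],
                     interval_seconds=10, sleep=lambda seconds: None))
# → ok
```

An exception whose class name is not in the errors list is raised immediately in this sketch, which mirrors the idea that only listed errors trigger a retry.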

Use another cloud service such as MNS as an execution node of a task

If a cloud service other than Function Compute is used as an execution node of a task, Serverless workflow directly calls an API operation of that service to dispatch the task.

For example, when Message Service (MNS) is used as the execution node, Serverless workflow calls the SendMessage operation of MNS to send messages. For more information, see SendMessage. In most cases, such tasks are executed by calling an API operation, and no execution result is expected in return. If an error is caught, Serverless workflow retries the call up to the specified number of times. When you use cloud services such as MNS and Visual Intelligence API as execution nodes, you do not need to handle errors in the flow.