The data consumption end of a Log Service custom ETL function runs on the Alibaba Cloud Function Compute service. You can use the function templates provided by Log Service or user-defined functions, depending on your ETL purpose.

This document explains how to implement a user-defined Log Service ETL function.

Function event

The function event is a collection of input parameters used to run a function, and is in the format of a serialized JSON Object string.

Field descriptions
  • jobName field

    The name of the Log Service ETL job. A Log Service trigger on the Function Compute service corresponds to a Log Service ETL job.

  • taskId field

    Within an ETL job, taskId identifies a specific function invocation.

  • cursorTime field

    The unix_timestamp when Log Service receives the last log of the data contained in this function call.

  • source field

    This field is generated by Log Service, which regularly triggers function execution based on the task interval defined in the ETL job. The source field is an important part of the function event: it defines the data to be consumed by this function call (a consumption sketch follows this list).

    This data source range is composed of the following fields (for more information about the related field definitions, see Log Service glossary).

    Field         Description
    endpoint      The service endpoint of the region where the Log Service project resides.
    projectName   The project name.
    logstoreName  The Logstore name.
    shardId       Identifies a specific shard in the Logstore.
    beginCursor   The shard location from which data consumption starts.
    endCursor     The shard location at which data consumption ends.
    Note The [beginCursor, endCursor) range of a shard is a left-closed, right-open interval.
  • parameter field

    This JSON Object field is set when you create the ETL job (the Log Service trigger in Function Compute). The user-defined function parses this field at runtime to obtain the parameters it requires.

    Set this field in Function Configuration when you create a Log Service trigger in the Function Compute console.
    Figure 1. Function configuration
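
To make the [beginCursor, endCursor) range concrete, the following is a minimal consumption sketch that uses the Aliyun Log Java SDK. The BatchGetLog call, the placeholder credentials, and the commented processData call are assumptions for illustration; the Java template described later already performs this pull for you.

import com.aliyun.openservices.log.Client;
import com.aliyun.openservices.log.common.LogGroupData;
import com.aliyun.openservices.log.response.BatchGetLogResponse;

import java.util.List;

public class SourceRangeConsumer {
    // Consumes the [beginCursor, endCursor) range described by the source field.
    public void consume(String endpoint, String project, String logstore,
                        int shardId, String beginCursor, String endCursor) throws Exception {
        // Placeholder credentials; in a real function, obtain them from your configuration.
        Client client = new Client(endpoint, "<accessKeyId>", "<accessKeySecret>");
        String cursor = beginCursor;
        while (!cursor.equals(endCursor)) {
            // Pull up to 1000 log groups per request, bounded on the right by endCursor.
            BatchGetLogResponse response =
                    client.BatchGetLog(project, logstore, shardId, 1000, cursor, endCursor);
            List<LogGroupData> logGroups = response.GetLogGroups();
            // processData(logGroups);  // hand each batch to your business logic
            cursor = response.GetNextCursor();
        }
    }
}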


Example of function event

{
    "source": {
        "endpoint": "http://cn-shanghai-intranet.log.aliyuncs.com", 
        "projectName": "fc-1584293594287572", 
        "logstoreName": "demo", 
        "shardId": 0, 
        "beginCursor": "MTUwNTM5MDI3NTY1ODcwNzU2Ng==", 
        "endCursor": "MTUwNTM5MDI3NTY1ODcwNzU2OA=="
    }, 
    "parameter": {
        ...
    }, 
    "jobName": "fedad35f51a2a97b466da57fd71f315f539d2234", 
    "taskId": "9bc06c96-e364-4f41-85eb-b6e579214ae4",
    "cursorTime": 1511429883
}

When debugging a function, you can obtain the cursor by using the GetCursor API and manually assemble a function event for testing according to the preceding format.
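
For example, the following sketch uses the Aliyun Log Java SDK to obtain the begin and end cursors of a shard and assemble a test event in the preceding format. The credentials and the project, Logstore, jobName, and taskId values are placeholders for illustration.

import com.aliyun.openservices.log.Client;
import com.aliyun.openservices.log.common.Consts.CursorMode;

public class BuildTestEvent {
    public static void main(String[] args) throws Exception {
        // Placeholder values; replace with your own endpoint, credentials, and names.
        String endpoint = "http://cn-shanghai-intranet.log.aliyuncs.com";
        Client client = new Client(endpoint, "<accessKeyId>", "<accessKeySecret>");
        String project = "fc-1584293594287572";
        String logstore = "demo";
        int shardId = 0;

        // GetCursor returns a shard cursor; BEGIN and END give the oldest and newest positions.
        String beginCursor = client.GetCursor(project, logstore, shardId, CursorMode.BEGIN).GetCursor();
        String endCursor = client.GetCursor(project, logstore, shardId, CursorMode.END).GetCursor();

        // Assemble a function event in the format shown above for a test invocation.
        String event = String.format(
            "{\"source\": {\"endpoint\": \"%s\", \"projectName\": \"%s\", \"logstoreName\": \"%s\", "
          + "\"shardId\": %d, \"beginCursor\": \"%s\", \"endCursor\": \"%s\"}, "
          + "\"parameter\": {}, \"jobName\": \"test-job\", \"taskId\": \"test-task\", \"cursorTime\": %d}",
            endpoint, project, logstore, shardId, beginCursor, endCursor, System.currentTimeMillis() / 1000);
        System.out.println(event);
    }
}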

Function development

You can implement functions by using many languages such as Java, Python, and Node.js. Log Service provides the corresponding runtime SDKs in various languages to facilitate function integration.

In this section, the Java 8 runtime is used as an example to show how to develop a Log Service ETL function. Because this involves the details of Java 8 function programming, read the Java programming guide for Function Compute first.

Java function template

Currently, Log Service provides user-defined ETL function templates based on the Java 8 execution environment. You can use these templates to implement your custom requirements.

The templates have already implemented the following functions:

  • Parse the source, taskId, and jobName fields in the function event.
  • Use the Log Service Java SDK to pull data based on the data source defined in source and call the processData API to process each batch of data.

In the template, you must still implement the following yourself:

  • Use UserDefinedFunctionParameter.java to parse the parameter field in the function event (see the sketch after this list).
  • Use the processData API of UserDefinedFunction.java to customize the data business logic in the function.
  • Replace UserDefinedFunction with a name that properly describes your function.
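
As a rough illustration of the parameter parsing, the sketch below assumes fastjson (com.alibaba.fastjson.JSONObject) for JSON handling. The fields targetProject and targetLogstore are hypothetical examples, not fields defined by the template, and the actual class layout in the template may differ.

import com.alibaba.fastjson.JSONObject;

// Hypothetical parameter holder; mirror the fields you define in Function Configuration.
public class UserDefinedFunctionParameter {
    private String targetProject;   // example field, not defined by the template
    private String targetLogstore;  // example field, not defined by the template

    // Parse the "parameter" JSON Object passed in the function event.
    public static UserDefinedFunctionParameter fromJson(JSONObject parameter) {
        UserDefinedFunctionParameter p = new UserDefinedFunctionParameter();
        p.targetProject = parameter.getString("targetProject");
        p.targetLogstore = parameter.getString("targetLogstore");
        return p;
    }

    public String getTargetProject() { return targetProject; }
    public String getTargetLogstore() { return targetLogstore; }
}
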
processData method implementation

In processData, you must consume, process, and deliver the data batch according to your specific needs.

See LogstoreReplication, which reads data from one Logstore and writes it to another Log Service Logstore.
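
The following is a minimal sketch of such a processData implementation. The method signature, the LogGroupData handling, and the way the client and target names are passed in are assumptions based on the Aliyun Log Java SDK and may differ from the actual template; treat it as an outline rather than the template code.

import com.aliyun.openservices.log.Client;
import com.aliyun.openservices.log.common.FastLog;
import com.aliyun.openservices.log.common.FastLogContent;
import com.aliyun.openservices.log.common.FastLogGroup;
import com.aliyun.openservices.log.common.LogGroupData;
import com.aliyun.openservices.log.common.LogItem;

import java.util.ArrayList;
import java.util.List;

public class LogstoreReplicationSketch {
    // Assumed signature: the template pulls each batch and hands it to processData.
    public boolean processData(Client client, List<LogGroupData> logGroups,
                               String targetProject, String targetLogstore) {
        try {
            for (LogGroupData logGroupData : logGroups) {
                FastLogGroup logGroup = logGroupData.GetFastLogGroup();
                List<LogItem> items = new ArrayList<LogItem>();
                for (int i = 0; i < logGroup.getLogsCount(); i++) {
                    FastLog log = logGroup.getLogs(i);
                    LogItem item = new LogItem(log.getTime());
                    for (int j = 0; j < log.getContentsCount(); j++) {
                        FastLogContent content = log.getContents(j);
                        item.PushBack(content.getKey(), content.getValue());
                    }
                    items.add(item);
                }
                // Deliver the batch to the target Logstore.
                client.PutLogs(targetProject, targetLogstore, logGroup.getTopic(), items, logGroup.getSource());
            }
            return true;  // batch processed successfully
        } catch (Exception e) {
            // If processing fails and retries (not shown) do not help, return false; the task
            // is still reported as successful and this batch is skipped (see the notes below).
            return false;
        }
    }
}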

Notes
  1. If processData processes the data successfully, it returns true. If an exception occurs while the data is being processed and the exception persists after retries, it returns false. In this case, however, the function continues to run, Log Service treats the ETL task as successful, and the incorrectly processed data is skipped.
  2. When a fatal error occurs, or the business logic requires that function execution be terminated early, throw an exception to exit the function. Log Service detects the function execution exception and retries the function invocation according to the ETL job rules.

Instructions

  • When shard traffic is high, configure sufficient memory for the function to prevent abnormal termination caused by out-of-memory (OOM) errors.
  • If the function performs time-consuming operations or shard traffic is high, set a shorter function trigger interval and a longer function execution timeout.
  • Grant sufficient permissions to function services. For example, to write Object Storage Service (OSS) data in the function, you must grant the OSS write permission to the function service.

ETL logs

  • ETL scheduling logs

    Scheduling logs record only the start and end times of the ETL task, whether the task succeeded, and the information returned on success. If an ETL task encounters an error, it generates an ETL error log and sends an alert email or text message to the system administrator. When creating a trigger, set the trigger log Logstore and enable indexing and query for that Logstore.

    A function can also write out execution statistics through its return value, such as the output stream of a Java 8 function. The default template provided by Log Service writes a serialized JSON Object string, which is recorded in the ETL task scheduling logs and makes the statistics easy to query (see the sketch after this list).

  • ETL process logs

    This log records the key points and errors of each step in the ETL execution process, including step start and end times, completion of initialization, and module error information. The ETL process log keeps you informed of how the ETL is running at all times; if an error occurs, you can quickly locate the cause in the process log.

    You can use context.getLogger() to record process logs to a specified project and Logstore in Log Service. We recommend that you enable indexing and query for this Logstore.
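
As a rough illustration of both kinds of logging, the sketch below assumes the standard Function Compute Java programming model (com.aliyun.fc.runtime.StreamRequestHandler). The counters and the JSON field names ingest_lines and ingest_bytes are illustrative, not the template's actual output: process logs go through context.getLogger(), and the statistics string written to the output stream is recorded in the scheduling logs.

import com.aliyun.fc.runtime.Context;
import com.aliyun.fc.runtime.StreamRequestHandler;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class UserDefinedFunction implements StreamRequestHandler {
    @Override
    public void handleRequest(InputStream input, OutputStream output, Context context)
            throws IOException {
        // ETL process logs: context.getLogger() writes to the Logstore configured for
        // function logging and records the key points of each step.
        context.getLogger().info("ETL task started");

        // ... parse the function event, pull the shard data, and call processData ...
        long ingestedLines = 0;   // illustrative counters maintained during processing
        long ingestedBytes = 0;

        context.getLogger().info("ETL task finished, lines=" + ingestedLines);

        // ETL scheduling logs: whatever is written to the output stream is recorded in the
        // scheduling log of this task, so a serialized JSON Object string is easy to query.
        String stats = String.format("{\"ingest_lines\": %d, \"ingest_bytes\": %d}",
                ingestedLines, ingestedBytes);
        output.write(stats.getBytes(StandardCharsets.UTF_8));
        output.flush();
    }
}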