
Development guide for ETL function

Last Updated: Apr 04, 2018

The data consumption end of a Log Service custom ETL function runs on the Alibaba Cloud Function Compute service. Depending on your ETL purpose, you can use the function templates provided by Log Service or write user-defined functions.

This document introduces how to implement a user-defined Log Service ETL function.

Function event

The function event is a collection of input parameters used to run a function, and is in the format of a serialized JSON Object string.

Field descriptions

  • jobName field

    The name of the Log Service ETL job. A Log Service trigger on the Function Compute service corresponds to a Log Service ETL job.

  • taskId field

    Within an ETL job, taskId is the deterministic identifier of a single function call.

  • cursorTime field

    The UNIX timestamp at which Log Service received the last log in the data consumed by this function call.

  • source field

    This field is generated by Log Service. Log Service regularly triggers function execution based on the task interval defined in the ETL job. The source field is an important part of the function event. This field defines the data to be consumed by this function call.

    This data source range is composed of the following fields (for more information about the related field definitions, see Log Service glossary).

    Field          Description
    endpoint       The service endpoint of the region where the Log Service project resides.
    projectName    The project name.
    logstoreName   The Logstore name.
    shardId        A specific shard in the Logstore.
    beginCursor    The shard location at which to start consuming data.
    endCursor      The shard location at which to stop consuming data.

    Note: The [beginCursor, endCursor) range of a shard is a left-closed, right-open interval.

  • parameter field

    This JSON Object field is set when you create the ETL job (that is, the Log Service trigger in Function Compute). When the user-defined function runs, it parses this field to obtain the operating parameters it requires.

    Set this field in the Function Configuration field when you create a Log Service trigger in the Function Compute console.

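As a minimal sketch of how the parameter field might be consumed: after JSON deserialization, the parameter object arrives as a map, from which the function can read its operating parameters. The class shape and the targetLogstore parameter name below are hypothetical, not the template's exact schema.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical parameter holder, in the spirit of UserDefinedFunctionParameter.java.
public class UserDefinedFunctionParameter {
    private final String targetLogstore; // hypothetical operating parameter

    public UserDefinedFunctionParameter(String targetLogstore) {
        this.targetLogstore = targetLogstore;
    }

    public String getTargetLogstore() {
        return targetLogstore;
    }

    // Parse the "parameter" field of the function event after JSON deserialization.
    public static UserDefinedFunctionParameter fromMap(Map<String, Object> parameter) {
        Object target = parameter.get("targetLogstore");
        if (target == null) {
            throw new IllegalArgumentException("missing targetLogstore in function event parameter");
        }
        return new UserDefinedFunctionParameter((String) target);
    }

    public static void main(String[] args) {
        Map<String, Object> parameter = new HashMap<>();
        parameter.put("targetLogstore", "demo-target");
        System.out.println(fromMap(parameter).getTargetLogstore()); // prints "demo-target"
    }
}
```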

Example of function event

  {
    "source": {
      "endpoint": "http://cn-shanghai-intranet.log.aliyuncs.com",
      "projectName": "fc-1584293594287572",
      "logstoreName": "demo",
      "shardId": 0,
      "beginCursor": "MTUwNTM5MDI3NTY1ODcwNzU2Ng==",
      "endCursor": "MTUwNTM5MDI3NTY1ODcwNzU2OA=="
    },
    "parameter": {
      ...
    },
    "jobName": "fedad35f51a2a97b466da57fd71f315f539d2234",
    "taskId": "9bc06c96-e364-4f41-85eb-b6e579214ae4",
    "cursorTime": 1511429883
  }

When debugging a function, you can obtain cursors by using the GetCursor API and manually assemble a function event in the preceding format for testing.
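A function event for local testing can be assembled with plain string building, as in this sketch; the cursor values would come from the GetCursor API, and the placeholder cursors, project, and job names below are hypothetical.

```java
// Build a function event string for local debugging, matching the format above.
public class FunctionEventBuilder {
    public static String build(String endpoint, String project, String logstore,
                               int shardId, String beginCursor, String endCursor,
                               String jobName, String taskId, long cursorTime) {
        return "{"
            + "\"source\":{"
            + "\"endpoint\":\"" + endpoint + "\","
            + "\"projectName\":\"" + project + "\","
            + "\"logstoreName\":\"" + logstore + "\","
            + "\"shardId\":" + shardId + ","
            + "\"beginCursor\":\"" + beginCursor + "\","
            + "\"endCursor\":\"" + endCursor + "\"},"
            + "\"parameter\":{},"
            + "\"jobName\":\"" + jobName + "\","
            + "\"taskId\":\"" + taskId + "\","
            + "\"cursorTime\":" + cursorTime
            + "}";
    }

    public static void main(String[] args) {
        // Cursor values here are placeholders; fetch real ones with GetCursor.
        System.out.println(build("http://cn-shanghai-intranet.log.aliyuncs.com",
            "my-project", "demo", 0, "BEGIN_CURSOR", "END_CURSOR",
            "my-etl-job", "local-test-task", 1511429883L));
    }
}
```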

Function development

You can implement functions by using many languages such as Java, Python, and Node.js. Log Service provides the corresponding runtime SDKs in various languages to facilitate function integration.

This section uses the Java 8 runtime as an example to show how to develop a Log Service ETL function. Because this involves the details of Java 8 function programming, read the Java programming guide for Function Compute first.

Java function templates

Currently, Log Service provides user-defined ETL function templates based on the Java 8 execution environment. You can use these templates to implement your custom requirements.

The templates have implemented the following functions:

  • Parse the source, taskId, and jobName fields in the function event.
  • Use the Log Service Java SDK to pull data based on the data source defined in source and call the processData API to process each batch of data.

In the template, you must also implement the following functions:

  • Use UserDefinedFunctionParameter.java to parse the parameter field in the function event.
  • Use the processData API of UserDefinedFunction.java to customize the data business logic in the function.
  • Replace UserDefinedFunction with a name that properly describes your function.

processData method implementation

In processData, you must consume, process, and ship a batch of data as per your needs.

See LogstoreReplication, which reads data from one Logstore and writes it to another Log Service Logstore.
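The shape of a processData implementation can be sketched as follows. The real template passes Log Service SDK types; the simplified LogItem stand-in and the error-filtering logic here are hypothetical, chosen only to illustrate the return-value contract described in the notes below.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins for the Log Service SDK types that the real template
// passes to processData; the filtering logic is a hypothetical example.
public class UserDefinedFunctionSketch {
    static class LogItem {
        final String level;
        final String message;
        LogItem(String level, String message) { this.level = level; this.message = message; }
    }

    // Return true when the batch is fully processed. Return false when an
    // exception persists after retries (the batch is then ignored and the
    // task still counts as successful). Throw to abort the call and let
    // Log Service re-invoke the function.
    public static boolean processData(List<LogItem> batch, List<LogItem> sink) {
        try {
            for (LogItem item : batch) {
                if ("ERROR".equals(item.level)) { // ship only error logs
                    sink.add(item);
                }
            }
            return true;
        } catch (RuntimeException e) {
            return false; // processing failed even after retries: drop this batch
        }
    }

    public static void main(String[] args) {
        List<LogItem> sink = new ArrayList<>();
        List<LogItem> batch = new ArrayList<>();
        batch.add(new LogItem("INFO", "started"));
        batch.add(new LogItem("ERROR", "disk full"));
        System.out.println(processData(batch, sink) + " shipped=" + sink.size());
    }
}
```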

Notes:

  • If processData processes the data successfully, it returns true. If an exception occurs during processing and persists after retries, it returns false; in this case the function continues to run, Log Service judges the ETL task as successful, and the incorrectly processed data is ignored.
  • When a fatal error occurs, or the business logic determines that function execution must be terminated prematurely, throw an exception to exit function execution. Log Service detects the function exception and re-invokes the function according to the ETL job rules.

Instructions

  • When shard traffic is high, configure sufficient memory for the function to prevent abnormal termination caused by out-of-memory (OOM) errors.
  • If time-consuming operations are performed in a function, or shard traffic is high, set a shorter trigger interval and a longer function execution timeout.
  • Grant sufficient permissions to function services. For example, to write Object Storage Service (OSS) data in the function, you must grant the OSS write permission to the function service.

ETL logs

ETL scheduling logs

Scheduling logs record only the start and end time of each ETL task, whether the task succeeded, and the information returned on success. If an ETL task encounters an error, Function Compute generates an ETL error log and sends an alarm email or SMS to the system administrator. When creating a trigger, set the trigger log Logstore and enable the index and query features for this Logstore.

A function can write out and return execution statistics, for example through the Java 8 function's outputStream. The default template provided by Log Service writes a serialized JSON Object string, which is recorded in the ETL task scheduling logs to facilitate statistics and queries.
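Reporting statistics through the outputStream might look like the following sketch; the statsJson helper and its field names are illustrative, not the default template's exact schema.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Write execution statistics as a serialized JSON Object string, the way the
// default template reports them through the function's outputStream.
public class EtlStats {
    // Hypothetical field names; adapt to the statistics you want to track.
    public static String statsJson(long ingestLines, long shipLines, long failLines) {
        return "{\"ingest_lines\":" + ingestLines
             + ",\"ship_lines\":" + shipLines
             + ",\"fail_lines\":" + failLines + "}";
    }

    public static void report(OutputStream outputStream, long ingest, long ship, long fail)
            throws IOException {
        outputStream.write(statsJson(ingest, ship, fail).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // ByteArrayOutputStream stands in for the function's outputStream.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        report(out, 100, 98, 2);
        System.out.println(out.toString("UTF-8"));
    }
}
```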

ETL process logs

ETL process logs record the key points and errors for each step in the ETL execution process, including the start time, end time, initialization completion, and module error information of each step. You can use ETL process logs to monitor the ETL operation at any time and troubleshoot errors promptly.

You can use context.getLogger() to record the process logs to a specific project and Logstore of Log Service. We recommend that you enable the index and query features for this Logstore.
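As a stand-in for context.getLogger(), which is only available inside Function Compute, java.util.logging sketches how process logs might mark the start, end, and failure of a step; the runStep helper and step names are hypothetical.

```java
import java.util.logging.Logger;

public class EtlProcessLog {
    // In Function Compute you would obtain the logger via context.getLogger();
    // java.util.logging stands in here so the sketch is self-contained.
    private static final Logger logger = Logger.getLogger("etl-process");

    // Log the key points of one ETL step: start, finish, or module error.
    public static boolean runStep(String step, Runnable body) {
        logger.info("step " + step + " started");
        try {
            body.run();
            logger.info("step " + step + " finished");
            return true;
        } catch (RuntimeException e) {
            logger.severe("step " + step + " failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        boolean ok = runStep("pull-data", () -> { /* pull a batch from the shard */ });
        System.out.println(ok); // prints "true"
    }
}
```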
