Log Service allows you to use Function Compute to transform streaming data. You can configure an extract-transform-load (ETL) job to detect data updates and invoke functions. The invoked functions then consume and transform the incremental data in a Logstore.
Scenarios
You can use Log Service triggers to integrate Function Compute and Log Service in the following scenarios:
- Data cleansing and processing
Log Service allows you to quickly collect, process, query, and analyze logs.
- Data shipping
Log Service allows you to ship data to its destination and build data pipelines between big data services on the cloud.
- Field preprocessing and shipping
- Column store creation and shipping
- Custom processing and result storage
ETL functions
- Function types
- Template functions
For more information, see aliyun-log-fc-functions.
- Custom functions
Function formats are related to function implementations. For more information, see Create a custom function.
- Trigger mechanisms
An ETL job is used to invoke functions. After you create an ETL job for a Logstore in Log Service, a timer is started to poll data from the shards of the Logstore based on the job configuration. If new data is written to the Logstore, a triple in the <shard_id, begin_cursor, end_cursor> format is generated as a function event, and the ETL function is invoked.
Note: If no new data is written to the Logstore but the storage system is updated, the cursor information still changes. The ETL function is then invoked for each shard, but no data is transformed. In this case, you can use the cursor information to try to obtain data from the shards. If no data is obtained, you can ignore the function invocation. For more information, see Create a custom function.
An ETL job invokes functions at fixed time intervals. For example, you set the invocation interval of an ETL job to 60 seconds for a Logstore. If data is continuously written to Shard 0, the ETL function is invoked every 60 seconds to transform the data located in the cursor range of the last 60 seconds.
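The trigger mechanism described above can be illustrated with a minimal Python sketch. It is not the actual Function Compute event format or the Log Service SDK: the JSON field names (`shard_id`, `begin_cursor`, `end_cursor`), the `ShardRange` type, the `fetch` callback, and the `transform` step are all hypothetical and only mirror the triple named in this topic. The sketch shows how a handler might read the triple, fetch the data in the cursor range, and return without doing any work when the range contains no data.

```python
import json
from typing import Callable, List, NamedTuple


class ShardRange(NamedTuple):
    """The <shard_id, begin_cursor, end_cursor> triple carried by an event."""
    shard_id: int
    begin_cursor: str
    end_cursor: str


# Hypothetical callback: returns the logs located between the two cursors.
# In a real function this would wrap a Log Service client call.
FetchFn = Callable[[ShardRange], List[dict]]


def parse_event(raw: bytes) -> ShardRange:
    """Extract the triple from a JSON event (field names are assumptions)."""
    body = json.loads(raw)
    return ShardRange(int(body["shard_id"]), body["begin_cursor"], body["end_cursor"])


def transform(log: dict) -> dict:
    """Placeholder transformation: lowercase every field name."""
    return {key.lower(): value for key, value in log.items()}


def handle(raw: bytes, fetch: FetchFn) -> List[dict]:
    """Process one invocation and return the transformed logs."""
    shard_range = parse_event(raw)
    logs = fetch(shard_range)
    if not logs:
        # The cursor changed but no data was written: ignore this invocation.
        return []
    return [transform(log) for log in logs]


if __name__ == "__main__":
    event = json.dumps(
        {"shard_id": 0, "begin_cursor": "cursor-a", "end_cursor": "cursor-b"}
    ).encode()
    # Stubbed fetch functions stand in for reading from a Logstore shard.
    print(handle(event, lambda r: []))
    print(handle(event, lambda r: [{"Level": "INFO", "Msg": "started"}]))
```

Passing the fetch step in as a callback keeps the sketch runnable without network access. In a real custom function, that step would be replaced by a call that reads the shard between the two cursors.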