The data processing feature of Log Service uses consumer groups to consume log data and provides more than 200 built-in functions that you can orchestrate into processing rules. This topic describes how log data is scheduled during data processing and how the processing rule engine works.

Scheduling principles

The data processing feature of Log Service uses a consumer group to consume log data in the source Logstore in streaming mode, processes each log based on the specified processing rule, and then writes the processed log data to the destination Logstore.

  • Scheduling mechanism

    For each processing rule, the data processing scheduler starts one or more running instances. Each running instance behaves as a consumer and consumes data from one or more shards of the source Logstore. The scheduler determines the number of concurrent running instances based on the memory and CPU usage of the running instances. The maximum number of running instances that the scheduler can start equals the number of shards in the source Logstore.

  • Running instances

    Running instances read source log data from the shards allocated to them based on your configuration. After processing the log data by using the specified processing rule, running instances write the processed log data to the destination Logstore. You can also configure processing rules to enrich log data by using external resources. Based on the consumer group mechanism, running instances record data consumption checkpoints in shards, which allows them to resume consumption from the last recorded checkpoint if consumption is unexpectedly interrupted.

  • Task stopping
    • If you do not set the end time of a processing task, running instances do not exit by default and the task does not stop.
    • If you set the end time of a processing task, running instances automatically exit after processing all logs received until the end time, and the task stops.
    • If a task is stopped and then restarted, running instances continue to consume data from the last recorded checkpoint by default, as illustrated in the sketch after this list.
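
The per-shard consumption loop described above, including checkpoint recording and the optional end time, can be sketched in plain Python. This is a minimal illustration: the shard, checkpoint_store, and destination objects and their methods are hypothetical placeholders, not the actual Log Service SDK.
# Minimal sketch of checkpoint-based shard consumption (hypothetical helpers,
# not the actual Log Service SDK).
def consume_shard(shard, rule, destination, checkpoint_store, end_time=None):
    cursor = checkpoint_store.load(shard.id)            # resume from the last recorded checkpoint
    while True:
        batch = shard.pull_logs(cursor)                  # read the next batch of source logs
        if end_time is not None and batch.receive_time > end_time:
            break                                        # a task with an end time stops after that time
        processed = [rule(event) for event in batch.events]
        destination.write([e for e in processed if e is not None])
        cursor = batch.next_cursor
        checkpoint_store.save(shard.id, cursor)          # record the consumption checkpoint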

Working principles of the rule engine: Basic operations

You can use the built-in functions provided by the LOG domain-specific language (DSL) to write processing rules. Each function can be considered a processing step. The rule engine calls the functions of a processing rule in sequence. For example, the following four functions define four steps for a processing rule:
e_set("log_type", "access_log")
e_drop_fields("__action")
e_if(e_search("ret: pass"), e_set("result", "pass"))
e_if(e_search("ret: unknown"), DROP)
The following figure shows the corresponding logic.
  • Basic logic

    The rule engine calls each function defined in the processing rule in sequence. Each function processes and modifies an event and returns the processed event.

    For example, the e_set("log_type", "access_log") function adds the log_type field whose value is access_log to each event. Then, the next function receives each processed event that contains the log_type field.

  • Condition-based judgment

    You can set conditions in steps. If an event does not meet a condition in a step, the event skips this step.

    For example, the e_if(e_search("ret: pass"), e_set("result", "pass")) function first checks whether the value of the ret field contains pass for each event. If no, no operations are performed and the event skips this step. If yes, the function sets the value of the result field to pass for the event.

  • Processing stopping

    If a function does not return a processed event, the source event is deleted.

    For example, the e_if(e_search("ret: unknown"), DROP) function discards an event in which the value of the ret field is unknown. After the event is discarded, the rule engine no longer calls subsequent functions for this event and automatically starts to process the next event. The sketch after this list illustrates these behaviors.
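
To make the sequential calls, condition-based skipping, and event dropping concrete, the following plain-Python sketch models a simplified rule engine. The helper functions only mimic the LOG DSL; in particular, e_search_ret is a simplified stand-in for e_search("ret: ...") and is not part of the DSL.
# Simplified model of the rule engine (plain Python, not the LOG DSL itself).
# Each step receives an event dict and returns the processed event, or None to drop it.
def e_set(field, value):
    return lambda event: {**event, field: value}

def e_drop_fields(field):
    return lambda event: {k: v for k, v in event.items() if k != field}

def e_if(condition, action):
    # Apply the action only when the condition holds; otherwise the event skips this step.
    return lambda event: action(event) if condition(event) else event

def e_search_ret(value):                      # simplified stand-in for e_search("ret: <value>")
    return lambda event: event.get("ret") == value

DROP = lambda event: None                     # returning None deletes the event

rule = [
    e_set("log_type", "access_log"),
    e_drop_fields("__action"),
    e_if(e_search_ret("pass"), e_set("result", "pass")),
    e_if(e_search_ret("unknown"), DROP),
]

def process(event):
    for step in rule:
        event = step(event)
        if event is None:                     # the event was dropped: stop calling later steps
            return None
    return event

print(process({"ret": "pass", "__action": "x"}))   # {'ret': 'pass', 'log_type': 'access_log', 'result': 'pass'}
print(process({"ret": "unknown"}))                 # None: the event is discarded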

Working principles of the rule engine: Output, duplication, and splitting

The rule engine also supports event output, duplication, and splitting. For example, the following four functions define four steps for a processing rule:
e_coutput("archive_Logstore") )
e_split("log_type")
e_if(e_search("log_type: alert"), e_output("alert_Logstore") )
e_set("result", "pass")
Assume that the following log is to be processed:
log_type: access,alert
content: admin login to database.
The following figure shows the corresponding logic.
  • Event output

    Event output can be considered a special way to stop processing an event. For example, in step 3, the e_output("alert_Logstore") function is called to immediately write an event in which the value of the log_type field is alert to the specified destination Logstore and then delete the event. The rule engine no longer calls subsequent functions for this event.

  • Event duplication and output

    The e_coutput function duplicates the current event and writes the duplicated event to the specified destination Logstore. In this case, the rule engine continues to call subsequent functions for the original event. For example, in step 1, the duplicates of all received logs are written to the destination Logstore archive_Logstore.

  • Event splitting for concurrent processing

    In the source event, the value of the log_type field is access,alert. In step 2, the e_split("log_type") function splits the event into two events based on this value. The only difference between the two split events is the value of the log_type field: the value is access in one event and alert in the other.

    The two events generated by the splitting are then processed separately in the subsequent steps, as traced in the sketch below.
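
The following plain-Python sketch traces the sample event through the four steps above. The outputs dictionary and its "default" key are hypothetical placeholders that stand in for the destination Logstores; this is an illustration of the flow, not the DSL implementation.
# Traces the sample event through duplication, splitting, routing, and processing
# (plain Python, not the LOG DSL). Destination containers are hypothetical.
source_event = {"log_type": "access,alert", "content": "admin login to database."}
outputs = {"archive_Logstore": [], "alert_Logstore": [], "default": []}

# Step 1: e_coutput duplicates the event to archive_Logstore and keeps processing it.
outputs["archive_Logstore"].append(dict(source_event))

# Step 2: e_split creates one event per value in the log_type field.
events = [{**source_event, "log_type": v} for v in source_event["log_type"].split(",")]

for event in events:
    # Step 3: e_output writes alert events to alert_Logstore and stops their processing.
    if event["log_type"] == "alert":
        outputs["alert_Logstore"].append(event)
        continue
    # Step 4: the remaining event gets result=pass and goes to the default destination Logstore.
    event["result"] = "pass"
    outputs["default"].append(event)

print(outputs)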