This topic describes the terms that are related to the data transformation feature.

Basic terms

  • Extract, transform, and load (ETL)

    ETL is the process of extracting data from a system, transforming the data, and then writing the data to another system. The transformation step is the core of the process. In Log Service, ETL loads data from a source Logstore, transforms the data, and then writes the data to a destination Logstore. You can also load data from Object Storage Service (OSS), Relational Database Service (RDS), or other Logstores to enrich the data.

  • Event, data, and log

    During data transformation, both events and data refer to logs. For example, the event time is the log time, and the drop_event_fields function is used to discard specific log fields.
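    For illustration, the following one-line sketch uses the drop_event_fields function mentioned above to discard two hypothetical fields from each log. Treat the exact signature as an assumption:

    drop_event_fields("request_id", "debug_info")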

  • Log time

    The log time is the time when an event occurs. It is also known as the event time. The corresponding reserved field in Log Service is __time__. The value of this field is extracted from the time information in logs. The value is an integer that follows the Unix time standard. It indicates the number of seconds that have elapsed since 00:00:00 Thursday, January 1, 1970 UTC.
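    Because __time__ is a Unix timestamp in seconds, you can convert it with standard tools. A minimal Python sketch with a made-up value:

    from datetime import datetime, timezone

    log_time = 1609459200  # example value of __time__
    print(datetime.fromtimestamp(log_time, tz=timezone.utc))
    # 2021-01-01 00:00:00+00:00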

  • Log receiving time

    The log receiving time is the time when the Log Service server receives logs. By default, this time is not saved in logs. However, if you turn on the Log Public IP switch for a Logstore, the receiving time is recorded in the tag field __receive_time__. The complete name of this field is __tag__:__receive_time__. The value is an integer that follows the Unix time standard. It indicates the number of seconds that have elapsed since 00:00:00 Thursday, January 1, 1970 UTC.
    Note In most scenarios, logs are sent to Log Service in real time, so the log time and the log receiving time are almost the same. If you import historical logs, the two times differ. For example, if you use a software development kit (SDK) to import logs that were generated over the past 30 days, the log receiving time is the current time, which is different from the log time.
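    To quantify the difference described in this note, subtract the two timestamps. A minimal Python sketch with made-up values:

    log_time = 1609459200                  # __time__: when the event occurred
    receive_time = 1609459207              # __tag__:__receive_time__: when Log Service received it
    lag_seconds = receive_time - log_time  # 7 seconds between generation and receipt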

  • Tags

    Tags are used to identify logs. Unlike other fields, each tag field is prefixed with __tag__:. Log Service supports two types of tags:
    • Custom tags: the tags that you add when you call the PutLogs operation to write data.
    • System tags: the tags that are added by Log Service, including __client_ip__ and __receive_time__.
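    For example, a log with one system tag and one custom tag might look like the following sketch. The custom tag name env is hypothetical:

    {
      "__time__": "1609459200",
      "__tag__:__receive_time__": "1609459207",
      "__tag__:env": "production",
      "request_method": "GET"
    }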

Configuration-related terms

  • Source Logstore

    During data transformation, a source Logstore is the Logstore whose data is read and transformed.

    You can configure only one source Logstore for a transformation task. However, you can configure the same source Logstore for multiple transformation tasks.

  • Destination Logstore

    During data transformation, a destination Logstore is the Logstore to which data is written.

    You can configure one or more destination Logstores for a transformation task in a static or dynamic manner. For more information, see Dispatch data to multiple target Logstores.

  • DSL for Log Service

    The domain-specific language (DSL) for Log Service is a Python-compatible scripting language that is used by the data transformation feature of Log Service. The DSL is built on Python and provides more than 200 built-in functions that simplify common data transformation tasks. The DSL also allows you to extend it with custom Python scripts. For more information, see Language introduction.

  • Transformation rule

    A transformation rule is a data transformation script that is orchestrated by using the DSL for Log Service.
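    The following sketch shows what a short rule can look like. It uses the built-in function e_set together with the drop_event_fields function mentioned earlier; treat the exact signatures as illustrative:

    e_set("data_source", "nginx_access")   # add a static field to every log
    drop_event_fields("debug_info")        # discard a hypothetical noisy field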

  • Transformation task

    A transformation task is the minimum scheduling unit of data transformation. For each transformation task, you must configure a source Logstore, one or more destination Logstores, a transformation rule, a transformation time range, and other parameters.

Transformation rule-related terms

  • Resources

    Resources refer to third-party data sources that are referenced during data transformation. The resources include but are not limited to local resources, Object Storage Service (OSS), Relational Database Service (RDS), and Logstores other than the source and destination Logstores. These resources are referenced for data enrichment. For more information, see Resource functions.
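    For example, a rule can pull a dimension table from RDS through a resource function. The following sketch assumes a res_rds_mysql resource function with connection parameters similar to these; see Resource functions for the exact signature:

    res_rds_mysql(
        address="rds-host.example.com",  # hypothetical endpoint
        username="rds_user",
        password="****",
        database="user_db",
        table="user_profiles"
    )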

  • Dimension table

    A dimension table contains external dimension information that is used for data enrichment. For example, a dimension table can contain information about company users, products, or geographic locations. In most cases, dimension tables are included in resources and may be dynamically updated.
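    For example, a small dimension table can be built from inline CSV text and joined to logs. The sketch below assumes the built-in functions tab_parse_csv and e_table_map, which parse a table and map a log field against it; the exact signatures are illustrative:

    e_table_map(
        tab_parse_csv("city,city_pop\nshanghai,24000000\nbeijing,21000000"),
        "city",       # log field to match against the table
        "city_pop"    # table column to copy into the log
    )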

  • Enrichment or mapping

    Enrichment or mapping is the process of mapping one or more fields of logs to external information to supplement the incomplete information in logs.

    For example, a request log entry contains the status field that specifies the HTTP status code. You can map the field to the HTTP status description and generate the status_desc field. The following table shows the field values before and after enrichment.
    Field before enrichment (status)    Field after enrichment (status_desc)
    200                                 Success
    300                                 Redirect
    400                                 Permission error
    500                                 Server error

    If a source log entry contains the user_id field, you can use the user_id field to map relevant fields from an external user account dimension table. For example, you can map the user name, gender, registration time, and email fields, and write the fields to one or more destination Logstores. For more information, see Data mapping and enrichment functions.
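    A minimal sketch of the status-to-description mapping above, assuming the built-in function e_dict_map maps a field value through a dictionary into a new field:

    e_dict_map(
        {"200": "Success", "300": "Redirect",
         "400": "Permission error", "500": "Server error"},
        "status",        # source field
        "status_desc"    # new field to create
    )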

  • Splitting

    Event splitting is the process of splitting a log entry into multiple log entries if the source log entry contains multiple pieces of information.

    For example, a log entry contains the following fields:
    __time__: 1231245
    __topic__: "win_logon_log"
    content:
    [
      {
        "source": "1.2.3.4",
        "dest": "1.2.3.4",
        "action": "login",
        "result": "pass"
      },
      {
        "source": "1.2.3.5",
        "dest": "1.2.3.4",
        "action": "logout",
        "result": "pass"
      }
    ]
    It can be split into the following two log entries:
    __time__: 1231245
    __topic__: "win_logon_log"
    content:
    {
      "source": "1.2.3.4",
      "dest": "1.2.3.4",
      "action": "login",
      "result": "pass"
    }

    __time__: 1231245
    __topic__: "win_logon_log"
    content:
    {
      "source": "1.2.3.5",
      "dest": "1.2.3.4",
      "action": "logout",
      "result": "pass"
    }
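    To perform this kind of splitting in a rule, the DSL provides event-splitting functions. A hedged sketch, assuming e_split splits an event on a field that contains a JSON array and e_json then expands the resulting object; the signatures are illustrative:

    e_split("content")   # one log per element of the JSON array in content
    e_json("content")    # expand the remaining JSON object into top-level fields
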
  • Grok

    Grok patterns are alternatives to complex regular expressions.

    For example, the grok("%{IPV4}") pattern matches IPv4 addresses. It is equivalent to the regular expression "(?<![0-9])(?:(?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5])[.](?:[0-1]?[0-9]{1,2}|2[0-4][0-9]|25[0-5]))(?![0-9])". For more information, see Grok function.
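    For example, you can combine grok with e_regex to capture an IPv4 address into a new field. A minimal sketch; the field names are made up:

    e_regex("content", grok("%{IPV4:client_ip}"))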

  • Content capturing by using a regular expression

    You can use a regular expression to capture specific content from a field and store the captured content in a new field. During data transformation, the e_regex function can be used for this purpose.

    For example, the e_regex("content", "(?P<email>[a-zA-Z][a-zA-Z0-9_.+-=:]+@\w+\.com)") function extracts the email address from the content field and stores it in the email field. The email address is extracted by using a common regular expression. We recommend that you use the following grok pattern to simplify the extraction: e_regex("content", grok("%{EMAILADDRESS:email}")). For more information, see Regular expressions.