This topic walks you through the data transformation feature and related operations by using a complete example based on website access logs.

Prerequisites

  • A project named web-project is created. For more information, see Create a project.
  • A Logstore named website_log is created in the web-project project, and the Logstore is used as the source Logstore. For more information, see Create a Logstore.
  • Website access logs are collected and stored in the website_log Logstore. For more information, see Data collection overview.
  • Destination Logstores are created in the web-project project. The following destination Logstores are used:
    • website-success: stores logs for successful access. This Logstore is configured as the target-success storage destination.
    • website-fail: stores logs for failed access. This Logstore is configured as the target-fail storage destination.
    • website-etl: stores other access logs. This Logstore is configured as the target0 storage destination.
  • If you use a Resource Access Management (RAM) user, you must grant the user the permissions to transform data. For more information, see Authorize a RAM user to manage a data transformation task.
  • Indexes are configured for the source and destination Logstores. For more information, see Configure indexes.
    Note Data transformation does not require indexes. However, if you do not configure indexes, you cannot perform query or analysis operations.

Background information

All access logs of a website are stored in a Logstore. You need to specify different topics for the logs to distinguish between logs for successful access and logs for failed access. In addition, you need to distribute the two types of logs to different Logstores for analysis. Log sample:
body_bytes_sent:1061
http_user_agent:Mozilla/5.0 (Windows; U; Windows NT 5.1; ru-RU) AppleWebKit/533.18.1 (KHTML, like Gecko) Version/5.0.2 Safari/533.18.5
remote_addr:192.0.2.2
remote_user:vd_yw
request_method:DELETE
request_uri:/request/path-1/file-5
status:207
time_local:10/Jun/2021:19:10:59
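
Each line of the sample is a key:value pair. If you want to experiment with such a sample locally, it can be parsed into a field dictionary with a few lines of Python (an illustration only; Log Service collects logs as structured fields, so no manual parsing is needed in practice):

```python
SAMPLE = """\
body_bytes_sent:1061
remote_addr:192.0.2.2
request_method:DELETE
request_uri:/request/path-1/file-5
status:207
time_local:10/Jun/2021:19:10:59"""

# Split each line on the first colon only, because values such as
# time_local contain colons themselves.
log = dict(line.split(":", 1) for line in SAMPLE.splitlines())
print(log["status"])       # "207"
print(log["time_local"])   # "10/Jun/2021:19:10:59"
```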

Step 1: Create a data transformation job

  1. Log on to the Log Service console.
  2. Go to the data transformation page.
    1. In the Projects section, click the project that you want to view.
    2. Choose Log Storage > Logstores. On the Logstores tab, click the Logstore that you want to view.
    3. On the query and analysis page, click Data Transformation.
  3. In the upper-right corner of the page, select a time range for the required log data.
    Make sure that the Raw Logs tab displays log data.
  4. In the editor, enter transformation statements.
    e_if(e_search("status:[200,299]"),e_compose(e_set("__topic__","access_success_log"),e_output(name="target-success")))
    e_if(e_search("status:[400,499]"),e_compose(e_set("__topic__","access_fail_log"),e_output(name="target-fail")))
    The e_if function performs the specified operations if the specified condition is met. For more information, see e_if.
    • Condition: e_search("status:[200,299]")

      If the value of the status field is within the range of 200 to 299, operations 1 and 2 are performed. For more information, see e_search.

    • Operation 1: e_set("__topic__","access_success_log")

      The function adds the __topic__ field and assigns the value access_success_log to the field. For more information, see e_set.

    • Operation 2: e_output(name="target-success")

      The function writes the transformed data to the storage destination named target-success, which maps to the website-success Logstore in the web-project project. For more information, see e_output.
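
Conceptually, the two statements route each log as follows. The sketch below approximates the behavior in plain Python, assuming that status:[200,299] is an inclusive numeric range match; it is not the actual implementation of e_if, e_search, e_set, or e_output:

```python
# Rough sketch of what the two transformation statements do to one log.

def transform(log: dict) -> tuple:
    """Return the (possibly modified) log and the name of its storage destination."""
    status = int(log.get("status", 0))
    if 200 <= status <= 299:                      # e_search("status:[200,299]")
        log["__topic__"] = "access_success_log"   # e_set("__topic__", ...)
        return log, "target-success"              # e_output(name="target-success")
    if 400 <= status <= 499:                      # e_search("status:[400,499]")
        log["__topic__"] = "access_fail_log"
        return log, "target-fail"
    # Logs that match neither condition are kept unchanged and go to
    # the default storage destination (target0 in this example).
    return log, "target0"

log, dest = transform({"status": "207", "request_uri": "/request/path-1/file-5"})
print(log["__topic__"], dest)  # the sample log has a 2xx status
```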

  5. Preview transformation results.
    1. Click Quick Rule Creation.
      You can select either Quick or Advanced. For more information, see Preview mode overview.
    2. Click Preview Data.
      View the transformation results.
      Note During the preview, logs are written to a Logstore named internal-etl-log instead of the destination Logstores. The first time that you preview transformation results, Log Service automatically creates the internal-etl-log Logstore in the current project. This is a dedicated Logstore: you cannot modify its configurations or write other data to it. You are not charged for this Logstore.
  6. Create a data transformation job.
    1. Click Save as Transformation Rule.
    2. In the Create Data Transformation Rule panel, configure the following parameters.
      Parameter Description
      Rule Name The name of the transformation rule.
      Authorization Method The method used to authorize the data transformation job to read data from the source Logstore. Valid values:
      • Default Role: authorizes the data transformation job to assume the system role AliyunLogETLRole to read data from the source Logstore.

        You must authorize the system role AliyunLogETLRole and configure other parameters as prompted to complete the authorization. For more information, see Access data by using a default role.

        Note
        • If the authorization is complete within your Alibaba Cloud account, you can skip this operation.
        • If you use an Alibaba Cloud account that has assumed the role, you can skip this operation.
      • Custom Role: authorizes the data transformation job to assume a custom role to read data from the source Logstore.

        You must grant the custom role the permissions to read from the source Logstore. Then, you must enter the Alibaba Cloud Resource Name (ARN) of the custom role in the Role ARN field. For more information about authorization, see Access data by using a custom role.

      • AccessKey Pair: authorizes the data transformation job to use the AccessKey pair of an Alibaba Cloud account or a RAM user to read data from the source Logstore.
        • Alibaba Cloud account: The AccessKey pair of an Alibaba Cloud account has permissions to read from the source Logstore. You can directly enter the AccessKey ID and AccessKey secret of the Alibaba Cloud account in the AccessKey ID and AccessKey Secret fields. For more information about how to obtain an AccessKey pair, see AccessKey pair.
        • RAM user: You must grant the RAM user the permissions to read from the source Logstore. Then, you can enter the AccessKey ID and AccessKey secret of the RAM user in the AccessKey ID and AccessKey Secret fields. For more information about authorization, see Access data by using AccessKey pairs.
      Storage Target
      Target Name The name of the storage target. Storage Target includes Target Project and Target Logstore.
      Make sure that the value of this parameter is the same as the value of the name parameter that is configured in step 4.
      Note By default, Log Service uses the storage destination that is numbered 1 to store the logs that do not meet the specified conditions. In this example, set the value to target0.
      Target Region The region where the destination project resides.

      If you want to perform data transformation across regions, we recommend that you use HTTPS for data transmission. This ensures the privacy of log data.

      For cross-region data transformation, the data is transmitted over the Internet. If the Internet connections are unstable, data transformation delays may occur. You can select DCDN Acceleration to accelerate the cross-region data transmission. Before you can select DCDN Acceleration, make sure that the global acceleration feature is enabled for the project. For more information, see Enable the global acceleration feature.

      Note You are charged for the Internet traffic that is generated when compressed data is transmitted across regions. For more information, see Billable items.
      Target Project The name of the destination project to which transformed data is saved.
      Target Logstore The name of the destination Logstore to which transformed data is saved.
      Authorization Method The method used to authorize the data transformation job to write transformed data to the destination Logstore. Valid values:
      • Default Role: authorizes the data transformation job to assume the system role AliyunLogETLRole to write transformed data to the destination Logstore.
        You must authorize the system role AliyunLogETLRole and configure other parameters as prompted to complete the authorization. For more information, see Access data by using a default role.
        Note
        • If you use a RAM user, you must use an Alibaba Cloud account to assign the AliyunLogETLRole role to the user.
        • If you use an Alibaba Cloud account that has assumed the role, you can skip this operation.
      • Custom Role: authorizes the data transformation job to assume a custom role to write transformed data to the destination Logstore.

        You must grant the custom role the permissions to write to the destination Logstore. Then, you must enter the ARN of the custom role in the Role ARN field. For more information about authorization, see Access data by using a custom role.

      • AccessKey Pair: authorizes the data transformation job to use the AccessKey pair of an Alibaba Cloud account or a RAM user to write transformed data to the destination Logstore.
        • Alibaba Cloud account: The AccessKey pair of an Alibaba Cloud account has permissions to write to the destination Logstore. You can directly enter the AccessKey ID and AccessKey secret of the Alibaba Cloud account in the AccessKey ID and AccessKey Secret fields. For more information about how to obtain an AccessKey pair, see AccessKey pair.
        • RAM user: You must grant the RAM user the permissions to write to the destination Logstore. Then, you can enter the AccessKey ID and AccessKey secret of the RAM user in the AccessKey ID and AccessKey Secret fields. For more information about authorization, see Access data by using AccessKey pairs.
      Processing Range
      Time Range The time range within which the data is transformed. Valid values:
      Note The value of Time Range depends on the time when logs are received.
      • All: transforms data in the source Logstore from the first log entry until the job is manually stopped.
      • From Specific Time: transforms data in the source Logstore from the log entry that is received at the specified start time until the job is manually stopped.
      • Within Specific Period: transforms data in the source Logstore from the log entry that is received at the specified start time to the log entry that is received at the specified end time.
    3. Click OK.

After logs are distributed to the destination Logstores, you can perform query and analysis operations on the destination Logstores. For more information, see Query and analyze logs.

Step 2: View the data transformation job

  1. In the left-side navigation pane, choose Jobs > Data Transformation.
  2. In the data transformation job list, find and click the job.
  3. On the Data Transformation Overview page, view the details of the job.

    You can view the details and status of the job. You can also modify, start, stop, or delete the job. For more information, see Manage a data transformation job.
