Data Lake Analytics (DLA) provides the automatic ActionTrail log data cleansing feature. This feature converts the log files that ActionTrail delivers to Object Storage Service (OSS) into data tables that you can directly query in DLA. After the conversion, DLA automatically partitions and compresses the data in these tables. This facilitates analysis and audit operations on Alibaba Cloud services.

Pain points of log analysis

ActionTrail is an Alibaba Cloud service that allows you to query the operations logs of resources within your Alibaba Cloud account. It can also be used to deliver these logs to Log Service or OSS. ActionTrail applies to a variety of scenarios, such as security analysis, resource change tracking, and compliance audit. You can view the operations logs of each Alibaba Cloud service in the ActionTrail console. For logs that are generated within the last 30 days, ActionTrail allows you to deliver the logs to Log Service for data analysis. For logs that are generated more than 30 days ago, ActionTrail allows you to deliver the logs to OSS. If you want to directly analyze log data in OSS, you may encounter the following pain points:

  • Complex log data format

    Log data stored in ActionTrail is in the JSON format. Multiple data records are saved as an array in one row, for example, [{"eventId":"event0"...},{"eventId":"event1"...}].

    Theoretically, you can directly analyze data in this format. In practice, you must first split each row into individual data records before you can analyze the data, as illustrated in the sketch after this list.

  • Excessive log files

    When you use an Alibaba Cloud account or a RAM user to frequently perform operations on an Alibaba Cloud service, a large number of operations log files are generated every day. For example, if you use an Alibaba Cloud account or a RAM user to perform operations on DLA, thousands of log files are generated for the account or RAM user every day, and hundreds of thousands within one month. As a result, big data analysis is inconvenient, time-consuming, and resource-intensive.
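
To illustrate what the splitting in the first pain point involves, the following is a minimal sketch in Presto-style SQL; the automatic cleansing feature performs an equivalent conversion for you. The table name raw_actiontrail_logs, its single string column line, and the availability of the json_parse, UNNEST, and json_extract_scalar functions in DLA are all assumptions for illustration.

    -- Minimal sketch (assumed names): each row of raw_actiontrail_logs holds one
    -- JSON array such as [{"eventId":"event0"...},{"eventId":"event1"...}].
    -- UNNEST expands the array so that each element becomes its own record.
    SELECT
      json_extract_scalar(record, '$.eventId')   AS event_id,
      json_extract_scalar(record, '$.eventName') AS event_name
    FROM raw_actiontrail_logs
    CROSS JOIN UNNEST(CAST(json_parse(line) AS ARRAY(JSON))) AS t (record);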

Prerequisites

Before you use the automatic ActionTrail log data cleansing feature, make sure that the following prerequisite is met.
Note: ActionTrail, OSS, and DLA must be deployed in the same region.

Step 1: Create a schema

  1. Log on to the DLA console.
  2. In the top navigation bar, select the region where the DLA virtual cluster (VC) is deployed.
  3. In the left-side navigation pane, choose Data Lake Management > Data into the lake. On the Data into the lake page, click Go To the Wizard in the ActionTrail Log Cleaning section.
  4. On the ActionTrail Log Cleaning page, configure the parameters as prompted.
    • ActionTrail File Root: the directory where the log data that ActionTrail delivers to OSS is saved. The directory name must end with AliyunLogs/Actiontrail/.

      • Select Location: allows you to specify the directory where the log data that ActionTrail delivers to OSS is saved.
      • Auto Discovery: enables DLA to automatically detect the directory where the log data that ActionTrail delivers to OSS is saved.

    • Schema Name: the name of the schema. This parameter specifies the name of the DLA database that is mapped to the log data in OSS.

    • Data Storage Location After Cleaning: the OSS directory to which the result data is written back after the OSS log data is cleansed. The value of this parameter is automatically specified, but you can change it if required.

    • Data Cleaning Time: the time at which DLA cleanses OSS log data every day. By default, data cleansing starts at 00:30. To prevent your business from being affected during data cleansing, we recommend that you set this parameter to an off-peak time based on your business requirements.

  5. After you configure the preceding parameters, click Create to create a schema.

After the schema is created, the log data that ActionTrail delivers to OSS is not yet synchronized to DLA because the table required for data synchronization has not been created in DLA. To create this table and synchronize the data, click Synchronize Now on the Configuration tab of the Metadata management page.

Step 2: Synchronize log data

After you create a schema, click Synchronize Now to synchronize log data. You can also perform the following steps to synchronize log data if required:

  1. Log on to the DLA console.
  2. In the top navigation bar, select the region where the DLA VC is deployed.
  3. In the left-side navigation pane, choose Data Lake Management > Metadata management.
  4. On the Metadata management page, find the destination data source and click Library table details in the Actions column.
  5. On the Metadata management page, click the Configuration tab.
  6. Click Synchronize Now to start data synchronization.

    On the Configuration tab, you can also click Update to modify the schema configuration.

  7. Click the Table tab to view the data synchronization information.

    After ActionTrail log data is synchronized to DLA, you can use the standard SQL syntax of DLA to analyze the log data.
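
    For example, a query in the following style summarizes which operations were performed most frequently and from which IP addresses. The schema and table names (actiontrail_schema.action_trail) and the column names are assumptions for illustration; use the names shown on the Table tab in your account.

      -- Hypothetical names: replace actiontrail_schema.action_trail and the
      -- column names with the ones shown on the Table tab.
      SELECT eventname,
             sourceipaddress,
             COUNT(*) AS operation_count
      FROM actiontrail_schema.action_trail
      WHERE eventtime >= '2021-01-01T00:00:00Z'
      GROUP BY eventname, sourceipaddress
      ORDER BY operation_count DESC
      LIMIT 20;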