This topic describes how to analyze large volumes of historical logs at low cost by using Log Service, Object Storage Service (OSS), and Data Lake Analytics (DLA).

Introduction

Logs are a special type of data that plays an important role in analyzing historical data, diagnosing errors, and tracing system activities. Logs are an essential data source for data analysts, developers, and O&M personnel. A common way to reduce costs is to retain logs for only a specified period of time and analyze only the logs within that period. These logs are called hot logs. This method meets short-term requirements. In the long run, however, it discards a large number of historical logs, which leaves gaps when you query and analyze data.

DLA provides a lightweight, refined solution for analyzing historical logs. In this solution, Log Service collects hot logs and ships them to OSS, OSS stores the logs, and DLA analyzes the logs stored in OSS.

This solution provides the following benefits:
  • Log Service is an end-to-end logging service for real-time data and is widely used in big data scenarios. It provides features such as log data collection, intelligent query and analysis, and data consumption and shipping, which comprehensively improve your ability to process and analyze large amounts of log data. Log Service supports logs in various formats and can ship these logs to OSS.
  • OSS is a cost-effective storage system. You can specify how long log files are stored in OSS. When OSS is used together with Log Service, hot and cold data can be stored separately at low cost.
  • DLA can partition the logs that are shipped to OSS by year, month, or day. This helps you quickly obtain the logs required for analysis and reduces the number of bytes that are scanned. This way, DLA can efficiently analyze a large number of historical logs at low cost.

Procedure

Your application services are deployed on an ECS instance. Logtail collects the log data that the application services generate and sends it to Log Service, which then ships the data to OSS. The metadata discovery feature of DLA automatically discovers the log data that Log Service ships to OSS. After that, the log data is available for you to query and analyze. This section describes how to ship log data to OSS and enable DLA to automatically discover the data.

Before you perform operations, make sure that the following prerequisites are met:
  • An ECS instance is prepared to generate real-time log data. You can also use an ECS instance on which log data has already been generated.
  • Log Service is activated. A project and a Logstore are created. For more information, see Getting Started. In this procedure, the project is named sls-oss-dla-test, and the Logstore is named sls-dla-logstore.
  • An OSS bucket is created. For more information, see Create buckets.
  • DLA is activated.
  1. Use the ECS instance to generate logs in a simulated environment.
    1. Define a script file named gen-log.sh in the /root/sls-crawler/ directory on the ECS instance. This script file is used to generate log data.
      [root@iZbp1d6epzrizknw2xq**** sls-crawler]# cat gen-log.sh 
      #!/bin/bash
      # Generate a timestamped log file that contains one million
      # pipe-delimited log lines.
      mkdir -p /root/sls-crawler/full_type_logs
      filename=abc_`date +%s`.txt
      echo ${filename}
      for i in {1..1000000}
      do
          # Prepend the current time so that each log line carries a timestamp.
          datatimestr=`date '+%Y-%m-%d %H:%M:%S'`
          echo "${datatimestr}|111111|1|100000000|0.1|0.0000000000001|true|aabb|valueadd" >> /root/sls-crawler/full_type_logs/${filename}
      done
      [root@iZbp1d6epzrizknw2xq**** sls-crawler]# 
      Note: In actual business scenarios, your application servers generate log files. You can also obtain log files from Log Service by calling the Log Service API.
    2. Run the sudo crontab -e command to add the following configuration to your crontab file. This configuration runs the gen-log.sh script every minute to generate log data in the /root/sls-crawler/full_type_logs/ directory.
      * * * * * sleep 10; sh /root/sls-crawler/gen-log.sh
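    You can verify that the cron job is generating data. The following commands are a minimal check and assume the paths used in the script above.
      # Confirm that the cron entry was saved for the root user.
      sudo crontab -l
      # List the generated log files and inspect the last line of each file.
      ls -lh /root/sls-crawler/full_type_logs/
      tail -n 1 /root/sls-crawler/full_type_logs/abc_*.txt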
  2. Logtail collects log data from the ECS instance and sends the log data to a Logstore. For more information, see Collect logs in delimiter mode.
    In this step, use the ECS instance iZbp1d6epzrizknw2xq**** from Step 1 and configure Logtail to collect log data from the /root/sls-crawler/full_type_logs/ directory.
    After the configuration is complete, wait a few minutes. Then, check whether the log data is collected into the sls-dla-logstore Logstore in the Log Service console.
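    The log lines generated in Step 1 are separated by the | delimiter. As a local illustration of how delimiter mode splits a line into columns (the column names below are only examples), you can run the following command:
      # Split a sample log line on the | delimiter, the same way that
      # Logtail parses it in delimiter mode.
      echo "2021-06-01 00:00:00|111111|1|100000000|0.1|0.0000000000001|true|aabb|valueadd" \
        | awk -F'|' '{ for (i = 1; i <= NF; i++) print "column" i ": " $i }'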
  3. Ship log data from Log Service to OSS.
    In this step, set OSS Bucket to dla-crawler-hangzhou and set OSS Prefix to sls-daily/shipper-json-full-type13.
    After log shipping starts, you can view the log data that is shipped to OSS in the specified OSS directory.
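    You can also check the shipped objects from the command line. The following ossutil command is a sketch that assumes ossutil is installed and configured; the exact subdirectory layout under the prefix depends on the partition format that you configure for the shipper.
      # List the objects that the OSS shipper wrote under the configured prefix.
      ossutil ls oss://dla-crawler-hangzhou/sls-daily/shipper-json-full-type13/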
  4. Use the metadata discovery feature of DLA to discover the log data that Log Service ships to OSS. Then, query and analyze the log data. For more information, see Log data shipped from Log Service to OSS.
    When you configure parameters on the Meta information discovery page of the DLA console, select Manual selection for Data source configuration and select sls-dla-logstore as the Logstore.
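    After metadata discovery is complete, you can query the discovered tables over the MySQL-compatible endpoint of DLA by using a standard MySQL client. The following is a minimal sketch: the endpoint, username, schema name, and table name are placeholders, and the actual names are generated by DLA and shown in the DLA console.
      # Connect to the MySQL-compatible endpoint of DLA (placeholder values).
      mysql -h <your-dla-endpoint> -P 3306 -u <your-dla-username> -p

      -- Filter on the partition columns (for example, year, month, and day)
      -- to reduce the number of bytes that DLA scans.
      SELECT * FROM <discovered_schema>.<discovered_table>
      WHERE year = '2021' AND month = '06' AND day = '01'
      LIMIT 10;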