Background and preparations

Last Updated: Apr 23, 2019

Background

Service logs record the running details of a service. Logs are essential for troubleshooting, status monitoring, and predictive alerting, and they play an important role in big data analysis.

Alibaba Cloud Object Storage Service (OSS) is a secure, cost-effective, and highly reliable cloud storage service that can store massive amounts of data. As a result, more and more users store their logs in OSS. Data Lake Analytics (DLA) can read and analyze these logs and locate the causes of service failures without moving the logs out of OSS.

Taking Apache web server logs, NGINX access logs, and Apache Log4j logs as examples, this topic describes how to read and analyze logs by using DLA.

Prerequisites

Before you read logs stored in OSS by using DLA, prepare the test data in OSS by performing the following steps:

  1. Activate OSS

  2. Create buckets

  3. Upload logs

    Upload the log files webserver.log, ngnix_log, and log4j_sample.log to the log directory of your OSS bucket.

  • Prepare the test data based on the webserver.log file of the Apache web server (a sketch of the matching table creation statement follows this list):

    127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326
    127.0.0.1 - - [26/May/2009:00:00:00 +0000] "GET /someurl/?track=Blabla(Main) HTTP/1.1" 200 5864 - "Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/1.0.154.65 Safari/525.19"

    Regular expression

    ([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?
  • Prepare the test data based on the ngnix_log file of NGINX:

    127.0.0.1 - - [14/May/2018:21:58:04 +0800] "GET /?stat HTTP/1.1" 200 182 "-" "aliyun-sdk-java/2.6.0(Linux/2.6.32-220.23.2.ali927.el5.x86_64/amd64;1.6.0_24)" "-"
    127.0.0.1 - - [14/May/2018:21:58:04 +0800] "GET /?prefix=&delimiter=%2F&max-keys=100&encoding-type=url HTTP/1.1" 200 7202 "https://help.aliyun.com/product/70174.html" "aliyun-sdk-java/2.6.0(Linux/2.6.32-220.23.2.ali927.el5.x86_64/amd64;1.6.0_24)" "-"

    Regular expression

    ([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) (\".*?\") (-|[0-9]*) (-|[0-9]*) (\".*?\") (\".*?\") (\".*?\")
  • Prepare the test data for Apache Log4j based on the log4j_sample.log file, which uses the default log format generated by Hadoop:

    2018-11-27 17:45:23,128 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Minimum allocation = <memory:1024, vCores:1>
    2018-11-27 17:45:23,128 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Maximum allocation = <memory:8192, vCores:4>
    2018-11-27 17:45:23,154 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration: max alloc mb per queue for root is undefined
    2018-11-27 17:45:23,154 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerConfiguration: max alloc vcore per queue for root is undefined

    Regular expression

    ^(\\d{4}-\\d{2}-\\d{2})\\s+(\\d{2}.\\d{2}.\\d{2}.\\d{3})\\s+(\\S+)\\s+(\\S+)\\s+(.*)$
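
Based on the webserver.log samples and the regular expression above, the following is a minimal sketch of the table creation statement that you would later run in DLA. The regular expression has nine capture groups, so the table defines nine columns; the table name webserver_log, the column names, and the bucket path oss://mybucket/log/ are assumptions, and the built-in Hive-compatible RegexSerDe requires all columns to be STRING (cast them in queries as needed):

    CREATE EXTERNAL TABLE webserver_log (
      host      STRING,
      identity  STRING,
      user_name STRING,
      time      STRING,
      request   STRING,
      status    STRING,
      size      STRING,
      referer   STRING,
      agent     STRING
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
      "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?"
    )
    STORED AS TEXTFILE
    LOCATION 'oss://mybucket/log/';

A table for the ngnix_log file is built the same way: one STRING column per capture group of the NGINX regular expression (ten in this case).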

Notes

Logs that you want to read by using DLA must meet the following conditions:

  • Logs must be plain text, and each line maps to one record in the table.

  • Every line has the same format and can be matched by a single regular expression.

When you create an external table based on logs in DLA, the most tedious step is writing the regular expression. Keep the following points in mind:

  • Each field is captured by a pair of parentheses () in the regular expression, while fields in log lines are generally separated by spaces.

  • The number of columns defined in the table creation statement must be the same as the number of capture groups in the regular expression (see the sketch after this list).

  • Generally, numbers can be matched by using ([0-9]*) or (-|[0-9]*), while strings can be matched by using ([^ ]*) or (\".*?\").
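
For example, the Log4j regular expression above has five capture groups, so the matching table defines exactly five columns. The following is a minimal sketch of such a table creation statement using the built-in Hive-compatible RegexSerDe, which supports only STRING columns; the table name log4j_log, the column names, and the bucket path oss://mybucket/log/ are assumptions:

    CREATE EXTERNAL TABLE log4j_log (
      log_date  STRING,  -- capture group 1: 2018-11-27
      log_time  STRING,  -- capture group 2: 17:45:23,128
      log_level STRING,  -- capture group 3: INFO
      logger    STRING,  -- capture group 4: the logging class
      message   STRING   -- capture group 5: the remaining text
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
      "input.regex" = "^(\\d{4}-\\d{2}-\\d{2})\\s+(\\d{2}.\\d{2}.\\d{2}.\\d{3})\\s+(\\S+)\\s+(\\S+)\\s+(.*)$"
    )
    STORED AS TEXTFILE
    LOCATION 'oss://mybucket/log/';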

Procedure

On the DMS for Data Lake Analytics page, compile SQL statements to create an OSS schema and OSS tables, and then read data from the OSS files. Alternatively, you can connect to DLA through a MySQL client or the MySQL command-line tool and run the same SQL statements.

This topic describes how to compile the SQL statements in DMS.

  1. Create OSS schemas

  2. Create external tables based on the OSS logs and read data from the logs (a rough end-to-end sketch follows).
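
As a rough end-to-end sketch, the statements look like the following; the schema name oss_log_schema and the bucket path oss://mybucket/log/ are assumptions:

    -- 1. Create an OSS schema that points at the bucket directory.
    CREATE SCHEMA oss_log_schema WITH DBPROPERTIES (
      catalog = 'oss',
      location = 'oss://mybucket/log/'
    );

    USE oss_log_schema;

    -- 2. Create the external tables, for example the webserver_log table
    --    sketched in the Prerequisites section, and then read the data.
    SELECT * FROM webserver_log LIMIT 10;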