Log Service allows you to create a collection configuration and indexes through the data import wizard to collect Apache access logs. You can then use the default dashboard and query statements to analyze website access in real time.

Prerequisites

  • Log Service is activated.
  • A project and a Logstore are created.

Background information

Apache is free and open-source cross-platform web server software used to build and host websites. You can analyze Apache access logs to obtain data such as page views (PVs), unique visitors (UVs), client IP distribution, error requests, client types, and source pages, to monitor and analyze access to your website.

In Log Service, you can create a collection configuration and indexes through the data import wizard. Log Service also creates a default dashboard for you to analyze Apache access logs.

To better analyze Apache access logs, we recommend that you use the following log format configuration:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %D %f %k %p %q %R %T %I %O" customized
Note Fields such as %t, %{User-Agent}i, and %{Referer}i may contain spaces. Enclose each such field in a pair of escaped quotation marks (\") in the log format configuration so that Log Service can parse Apache logs correctly.

The following table describes the fields.

Field Field name Description
%h remote_addr The IP address of the client.
%l remote_ident The name of the client that generates the log. The value is obtained from the identd field.
%u remote_user The username of the client.
%t time_local The local time on the server.
%r request The content of the request, including the request method, request URI, and HTTP protocol version.
%>s status The returned HTTP status code.
%b response_size_bytes The size of the response packet.
%{Referer}i http_referer The source page of the request.
%{User-Agent}i http_user_agent The browser used by the client.
%D request_time_msec The processing time of the request. Unit: milliseconds.
%f filename The request file name with a path.
%k keep_alive The number of keep-alive requests.
%p remote_port The port number of the server.
%q request_query The query string of the request. If the request does not contain a query string, the value is a null string.
%R response_handler The processing program used by the server to respond.
%T request_time_sec The processing time of the request. Unit: seconds.
%I bytes_received The number of bytes received by the server. The mod_logio module must be enabled.
%O bytes_sent The number of bytes sent by the server. The mod_logio module must be enabled.
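The field mapping above can be sanity-checked locally. The sketch below is a simplified, assumed example: it covers only the first nine fields of the recommended format and parses a made-up sample line with a regular expression, then splits %r into its three parts the way Log Service does.

```python
import re

# Pattern for the first nine fields of the recommended format:
# %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"
APACHE_RE = re.compile(
    r'(?P<remote_addr>\S+) (?P<remote_ident>\S+) (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<response_size_bytes>\S+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)

# A made-up sample line in that format.
line = ('203.0.113.7 - - [10/Oct/2023:13:55:36 +0800] '
        '"GET /index.html HTTP/1.1" 200 2326 '
        '"http://example.com/start.html" "Mozilla/5.0"')

fields = APACHE_RE.match(line).groupdict()

# %r is split further into request_method, request_uri, and request_protocol:
method, uri, protocol = fields['request'].split(' ', 2)
print(method, uri, fields['status'])
```

Quoted fields such as %{Referer}i are matched with `"[^"]*"`, which is why the note above recommends enclosing space-containing fields in escaped quotation marks.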

Procedure

  1. Log on to the Log Service console. On the homepage, click the target project name.
  2. In the left-side navigation pane, find the target Logstore and click the plus sign (+) next to Data Import.
  3. In the Import Data dialog box, select Apache - Text Log.
  4. Select a Logstore.
    You can select an existing Logstore, or create a project and a Logstore.
  5. Create a server group.
    Before creating a server group, make sure that Logtail is installed.
    • Servers of Alibaba Group: By default, Logtail is installed on these servers. If Logtail is not installed on a server, contact Alibaba Cloud as prompted.
    • ECS instances: Select ECS instances and click Install. ECS instances that are running in Windows do not support one-click installation of Logtail. In this case, you need to manually install Logtail. For more information, see Install Logtail in Windows.
    • On-premises servers: Install Logtail as prompted. For more information about how to install Logtail, see Install Logtail in Linux or Install Logtail in Windows based on your operating system.
    After installing Logtail, click Complete Installation to create a server group. If you have created a server group, click Use Existing Server Groups.
  6. Configure the server group.
  7. Create a Logtail configuration.
    1. Set Config Name.
    2. Set Log Path.
    3. Set Log format.
      Set Log format based on the format defined in your Apache log configuration file. To better query and analyze log data, we recommend that you set Log format to Custom.
    4. Set APACHE Logformat Configuration.
      If you set Log format to common or combined, the system automatically enters the preset configuration in this field. If you set Log format to Custom, enter your custom configuration in this field. We recommend that you enter the recommended log format configuration provided in the preceding section.
    5. Confirm APACHE Key Name.
      Log Service automatically parses your Apache key names. Check whether they are correct.
      Note The %r field is extracted as the request_method, request_uri, and request_protocol keys.
    6. Optional: Set advanced options as required and click Next.
      Parameter Description
      Upload Raw Log Specifies whether to upload the raw log. If you turn on this switch, the raw log content is uploaded in the __raw__ field together with the parsed log content.
      Topic Generation Mode
      • Null - Do not generate topic: The default value, which specifies that the topic is set to a null string. You can query logs without entering the topic.
      • Machine Group Topic Attributes: sets the topic based on a machine group to differentiate log data generated on different frontend servers.
      • File Path RegEx: uses Custom RegEx to extract a part of the log path as the topic. This mode is used to differentiate log data generated by different users or instances.
      Custom RegEx The custom regular expression specified if you set Topic Generation Mode to File Path RegEx.
      Log File Encoding
      • utf8: specifies UTF-8 encoding.
      • gbk: specifies GBK encoding.
      Maximum Directory Monitoring Depth The maximum number of directory levels that can be monitored when logs are collected from the log source. Valid values: 0 to 1000. The value 0 indicates that only the current directory is monitored.
      Timeout The timeout policy for log files. If a log file is not updated within the specified period, the system considers that the file has timed out. You can set Timeout as follows:
      • Never: specifies that all log files are continuously monitored without timeout.
      • 30 Minute Timeout: specifies that if a log file is not updated within 30 minutes, the system considers that the log file has timed out and no longer monitors the file.
      Filter Configuration The filter conditions that logs must completely meet before they can be collected.
      For example:
      • Collect logs that meet a condition: Set a condition Key:level Regex:WARNING|ERROR, which indicates that only logs whose level is WARNING or ERROR are collected.
      • Filter logs that do not meet a condition:
        • Set a condition Key:level Regex:^(?!.*(INFO|DEBUG)).*, which indicates that logs whose level is INFO or DEBUG are not collected.
        • Set a condition Key:url Regex:^(?!.*(healthcheck)).*, which indicates that logs whose url contains healthcheck are not collected. For example, logs in which the key is url and the value is /inner/healthcheck/jiankong.html are not collected.
      For more examples, see regex-exclude-word and regex-exclude-pattern.
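These filter conditions can be checked locally before you deploy them. The sketch below assumes the semantics described above (a log is collected only when the field's whole value matches the regular expression):

```python
import re

# The three example filter conditions from above.
level_collect = re.compile(r'WARNING|ERROR')          # keep WARNING/ERROR
level_exclude = re.compile(r'^(?!.*(INFO|DEBUG)).*')  # drop INFO/DEBUG
url_exclude = re.compile(r'^(?!.*(healthcheck)).*')   # drop healthcheck URLs

def collected(pattern: re.Pattern, value: str) -> bool:
    """A log is collected only if the whole field value matches."""
    return pattern.fullmatch(value) is not None

print(collected(level_collect, 'ERROR'))                           # True
print(collected(level_exclude, 'INFO'))                            # False
print(collected(url_exclude, '/inner/healthcheck/jiankong.html'))  # False
```

The negative-lookahead form `^(?!.*(WORD)).*` is the standard way to express "does not contain WORD" in a filter that requires a full match.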
    7. Configure log query and analysis.
      By default, Log Service creates indexes for you. To create or modify indexes, choose Index Attributes > Modify on the Search & Analysis page.
      You can preview the collected log data if servers in the server group have normal heartbeats.
  8. Optional: Analyze logs on the dashboard.

    Log Service creates a default dashboard named LogstoreName-apache-dashboard for you. After the preceding configuration is completed, you can view real-time data on the dashboard, including the distribution of client IP addresses and the percentage of each HTTP status code.

    • Client IP distribution: Use the following SQL statement to collect statistics on the distribution of client IP addresses:
      * | select ip_to_province(remote_addr) as address,
                  count(1) as c
                  group by address limit 100
    • Percentages of HTTP status codes: Use the following SQL statement to count the percentage of each HTTP status code returned in the last day:
      status>0 | select status,
                      count(1) as pv 
                      group by status
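The query above returns a raw request count (pv) per status code; the dashboard's chart turns those counts into percentages. A minimal local sketch of that conversion, using made-up counts:

```python
from collections import Counter

# Illustrative per-status request counts, as the query above would return.
pv_by_status = Counter({200: 920, 404: 50, 500: 30})

total = sum(pv_by_status.values())
percentages = {status: round(100 * pv / total, 1)
               for status, pv in pv_by_status.items()}
print(percentages)  # {200: 92.0, 404: 5.0, 500: 3.0}
```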
    • Percentages of request methods: Use the following SQL statement to count the percentage of each request method used in the last day:
      * | select request_method,
                  count(1) as pv 
                  group by request_method
    • PVs and UVs: Use the following SQL statement to count the number of PVs and UVs:
      * | select date_format(date_trunc('hour', __time__), '%m-%d %H:%i')  as time,
                  count(1) as pv,
                  approx_distinct(remote_addr) as uv
                  group by time
                  order by time
                  limit 1000
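In the query above, date_trunc('hour', __time__) produces the hourly bucket, count(1) the PV, and approx_distinct(remote_addr) an approximate UV. The same aggregation sketched locally over made-up records, with an exact set standing in for the approximate distinct count:

```python
from collections import defaultdict

# Made-up parsed records: (hour bucket, client IP).
records = [
    ('10-10 13:00', '203.0.113.7'),
    ('10-10 13:00', '203.0.113.8'),
    ('10-10 13:00', '203.0.113.7'),
    ('10-10 14:00', '203.0.113.9'),
]

pv = defaultdict(int)        # page views per hour
visitors = defaultdict(set)  # distinct client IPs per hour

for hour, ip in records:
    pv[hour] += 1
    visitors[hour].add(ip)

for hour in sorted(pv):
    print(hour, 'pv:', pv[hour], 'uv:', len(visitors[hour]))
```

approx_distinct trades a small error margin for far lower memory use than an exact distinct count, which matters at dashboard query volumes.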
    • Inbound and outbound traffic: Use the following SQL statement to collect statistics on the inbound and outbound traffic:
      * | select date_format(date_trunc('hour', __time__), '%m-%d %H:%i') as time, 
                  sum(bytes_sent) as net_out, 
                  sum(bytes_received) as net_in 
                  group by time 
                  order by time 
                  limit 10000
    • Percentages of browser types: Use the following SQL statement to count the percentage of each browser type used in the last day:
      * | select  case when http_user_agent like '%Chrome%' then 'Chrome' 
                  when http_user_agent like '%Firefox%' then 'Firefox' 
                  when http_user_agent like '%Safari%' then 'Safari'
                  else 'unKnown' end as http_user_agent,count(1) as pv
                  group by  http_user_agent
                  order by pv desc
                  limit 10
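The CASE expression above buckets raw User-Agent strings into browser families, with the first matching branch winning. A minimal Python equivalent (the sample User-Agent string is made up):

```python
def browser_family(http_user_agent: str) -> str:
    """Bucket a User-Agent string, mirroring the CASE expression above."""
    for name in ('Chrome', 'Firefox', 'Safari'):
        if name in http_user_agent:
            return name
    return 'unKnown'  # same fallback label as the query

print(browser_family('Mozilla/5.0 Chrome/118.0 Safari/537.36'))  # Chrome
```

Chrome is checked before Safari because Chrome's User-Agent string also contains "Safari"; the branch order in the CASE expression matters for the same reason.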
    • Top 10 source pages: Use the following SQL statement to count the top 10 source pages with the most PVs in the last day:
      * | select  http_referer, 
                  count(1) as pv 
                  group by http_referer 
                  order by pv desc limit 10
    • Top 10 most visited pages: Use the following SQL statement to count the top 10 visited pages with the most PVs in the last day:
      * | select split_part(request_uri,'?',1) as path, 
                  count(1) as pv  
                  group by path
                  order by pv desc limit 10
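In the query above, split_part(request_uri, '?', 1) keeps only the path, so the same page requested with different query strings is counted once. The equivalent in Python (the URIs are hypothetical):

```python
def page_path(request_uri: str) -> str:
    """Drop the query string, mirroring split_part(request_uri, '?', 1)."""
    return request_uri.split('?', 1)[0]

print(page_path('/index.html?from=ad'))  # /index.html
print(page_path('/index.html'))          # /index.html
```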
    • Top 10 requested pages with the longest latency: Use the following SQL statement to count the top 10 requested pages with the longest latency in the last day:
      * | select request_uri as top_latency_request_uri,
                  request_time_sec 
                  order by request_time_sec desc limit 10