Log Service allows you to create a collection configuration in the data import wizard to collect NGINX access logs. Log Service also automatically creates indexes and an NGINX dashboard to help you quickly query and analyze the collected logs.

NGINX is a free, open-source, high-performance HTTP server that is widely used to build and host websites. You can perform statistical analysis on NGINX access logs to obtain data such as the page views (PVs) and peak access hours of a website. Traditional tools such as CNZZ insert a JavaScript snippet into the front-end pages of a website and record a visit when the snippet is triggered. However, this method can record only the requests that execute the snippet. Alternatively, you can analyze NGINX access logs with stream computing or offline batch processing, but both require you to set up and maintain an environment, and neither balances the timeliness and flexibility of log analysis well.

Log Service allows you to query and analyze logs in real time, and provides an NGINX dashboard to help you conveniently analyze NGINX access logs and collect statistics on website access. This topic describes how to collect and analyze NGINX access logs in Log Service.

Scenario

After using NGINX to build a website, you can collect and analyze NGINX access logs to obtain data such as PVs, unique visitors (UVs), top pages, request methods, error requests, client types, and source pages, to monitor and analyze access to your website.

NGINX log format

To better analyze NGINX access logs, we recommend that you use the following log_format configuration:

    log_format  main  '$remote_addr - $remote_user [$time_local] "$request" $http_host '
                        '$status $request_length $body_bytes_sent "$http_referer" '
                        '"$http_user_agent"  $request_time $upstream_response_time';

The following table describes the fields.

Field Description
remote_addr The IP address of the client.
remote_user The username provided by the client for authentication.
time_local The local time on the server when the request was received.
request The request line, including the request method, request URI, and HTTP protocol version.
http_host The value of the Host header in the request.
status The HTTP status code that was returned.
request_length The size of the request, in bytes, including the request line, headers, and body.
body_bytes_sent The size of the response body sent to the client, in bytes.
http_referer The source page of the request.
http_user_agent The user agent (browser) of the client.
request_time The total time that NGINX spent processing the request, in seconds.
upstream_response_time The response time of upstream services, in seconds.
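
A log line in this format looks like the following. All values are hypothetical:

    192.168.1.2 - - [10/Jul/2019:13:45:01 +0800] "GET /url2?k=v HTTP/1.1" example.com 200 150 81 "http://example.com/url1" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" 0.205 0.202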

Collect NGINX access logs

Before collecting logs, make sure that a project and a Logstore are created. For more information, see Manage projects and Manage a Logstore.

  1. Log on to the Log Service console, and then click the target project name.
  2. In the left-side navigation pane, find the target Logstore and click the plus sign (+) next to Data Import.

    You can also click Import Data in the upper-right corner of the Overview page and select a Logstore in subsequent steps.

    Figure 1. Start the Import Data wizard
    Start the Import Data wizard
  3. In the Import Data dialog box, select Nginx - Text Log.

    Log Service can import various types of data, such as data from cloud services, data reported by custom code, and data from on-premises open-source or commercial software. To analyze NGINX access logs, choose On-premises Open Source/Commercial Software > Nginx - Text Log.

  4. Create a server group.
    Before creating a server group, make sure that Logtail is installed.
    • Servers of Alibaba Group: By default, Logtail is installed on these servers. If Logtail is not installed on a server, contact Alibaba Cloud as prompted.
    • Elastic Compute Service (ECS) instances: Select ECS instances and click Install. ECS instances that run Windows do not support one-click installation of Logtail. In this case, you must manually install Logtail. For more information, see Install Logtail in Windows.
    • On-premises servers: Install Logtail as prompted. For more information about how to install Logtail, see Install Logtail in Linux or Install Logtail in Windows based on your operating system.
    After installing Logtail, click Complete Installation to create a server group. If you have created a server group, click Use Existing Server Groups.
  5. Configure the server group.

    Move the server group from Source Server Groups to Applied Server Groups.

  6. Create a Logtail configuration.
    1. Set Config Name and Log Path.
    2. Enter the recommended log_format configuration in the NGINX Log Configuration field.
      Figure 2. Configure the data source
      Configure NGINX access logs as the data source
    3. Confirm NGINX Key.

      Log Service automatically extracts the key names from the NGINX log format. Check whether they are correct.

      Note In the log_format configuration, the $request field is extracted as the request_method and request_uri keys. For a parsed example, see the sample key-value pairs at the end of this procedure.
      Figure 3. Extract NGINX key names
      Default key names in NGINX access logs
    4. Set Drop Failed to Parse Logs as required and click Next.

      If you turn on this switch, the NGINX access logs that fail to be parsed are not uploaded to Log Service. If you turn off this switch, the raw NGINX access logs are uploaded to Log Service when log parsing fails.

  7. Configure log query and analysis.
    By default, Log Service creates indexes for you. To create or modify indexes, choose Index Attributes > Modify on the Search & Analysis page.
    Figure 4. View key/value indexes
    Indexes of NGINX access logs

    You can preview the collected log data if the servers in the server group have normal heartbeats.

    Figure 5. Preview logs
    Preview NGINX access logs

    Click Next. Log Service configures the index attributes and creates the nginx-dashboard dashboard for you to collect and analyze NGINX access logs.

    Note It takes up to 3 minutes for the NGINX log collection configuration to take effect after being created.
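
    For reference, the sample log line shown in the NGINX log format section is parsed into key-value pairs similar to the following. The values are hypothetical, the $request field is split into request_method and request_uri, and the time_local value is typically used as the log time:

      remote_addr: 192.168.1.2
      remote_user: -
      request_method: GET
      request_uri: /url2?k=v
      http_host: example.com
      status: 200
      request_length: 150
      body_bytes_sent: 81
      http_referer: http://example.com/url1
      http_user_agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)
      request_time: 0.205
      upstream_response_time: 0.202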

Analyze NGINX access logs

After indexing is enabled, Log Service creates indexes and an NGINX dashboard for NGINX access logs by default. You can use the following methods to analyze NGINX access logs:
  • Use SQL statements to analyze NGINX access logs.

    On the Search & Analysis page of the target Logstore, enter a query and analysis statement to view the raw NGINX access logs that meet the specified conditions or view the analysis results of matching logs in charts. In addition, the Search & Analysis page provides features such as quick analysis and saved search. For more information, see Query logs and Diagnose and optimize website access in this topic.

  • View the data on the created NGINX dashboard to analyze NGINX access logs.

    The NGINX dashboard created by Log Service displays the detailed data of various metrics, such as PVs and UVs. For more information about how to use dashboards, see Create and delete a dashboard.

Figure 6. Dashboard for NGINX access logs
Default dashboard for NGINX access logs
  • PVs and UVs

    Count the PVs and UVs in the last day. UVs are estimated from distinct client IP addresses by using the approx_distinct function.

    Figure 7. PVs and UVs
    PVs and UVs based on NGINX access logs

    Use the following SQL statement:

      * | select approx_distinct(remote_addr) as uv,
             count(1) as pv,
             date_format(date_trunc('hour', __time__), '%m-%d %H:%i') as time
             group by date_format(date_trunc('hour', __time__), '%m-%d %H:%i')
             order by time
             limit 1000
  • Top 10 most visited pages

    Count the 10 pages with the most PVs in the last day. The split_part(request_uri, '?', 1) expression strips the query string so that the same page is not counted multiple times.

    Figure 8. Top 10 most visited pages
    Top 10 most visited pages based on NGINX access logs

    Use the following SQL statement:

     * | select split_part(request_uri,'?',1) as path, 
         count(1) as pv  
         group by split_part(request_uri,'?',1) 
         order by pv desc limit 10
  • Percentages of request methods (http_method_percentage)

    Count the percentage of each request method used in the last day.

    Figure 9. Percentages of request methods
    Percentages of request methods based on NGINX access logs

    Use the following SQL statement:

     * | select count(1) as pv,
             request_method
             group by request_method
  • Percentages of HTTP status codes (http_status_percentage)

    Count the percentage of each HTTP status code returned in the last day.

    Figure 10. Percentages of HTTP status codes
    Percentages of HTTP status codes based on NGINX access logs

    Use the following SQL statement:

     * | select count(1) as pv,
             status
             group by status
  • Percentages of browser types

    Count the percentage of each browser type used in the last day. Because Chrome user agents also contain the string Safari, the statement matches Chrome before Safari.

    Figure 11. Percentages of browser types
    Percentages of browser types based on NGINX access logs

    Use the following SQL statement:

     * | select count(1) as pv,
         case when http_user_agent like '%Chrome%' then 'Chrome'
         when http_user_agent like '%Firefox%' then 'Firefox'
         when http_user_agent like '%Safari%' then 'Safari'
         else 'unknown' end as http_user_agent
         group by http_user_agent
         order by pv desc
         limit 10
  • Top 10 source pages

    Count the top 10 source pages with the most PVs in the last day.

    Figure 12. Top 10 source pages
    Top 10 source pages based on NGINX access logs

    Use the following SQL statement:

     * | select count(1) as pv,
             http_referer
             group by http_referer
             order by pv desc limit 10

Diagnose and optimize website access

Apart from some default access metrics, you may need to diagnose access requests based on NGINX access logs to find out the requests and pages with a long latency. In this case, you can use the quick analysis feature provided on the Search & Analysis page.

  • Average latency and maximum latency

    Count the average latency and maximum latency every 5 minutes to understand the overall latency trend.

    Use the following SQL statement:

      * | select from_unixtime(__time__ - __time__ % 300) as time,
              avg(request_time) as avg_latency,
              max(request_time) as max_latency
              group by __time__ - __time__ % 300
  • Requested page with the maximum latency

    After obtaining the maximum latency, find out the requested page with the maximum latency to further optimize the response speed of the page.

    Use the following SQL statement:

      * | select from_unixtime(__time__ - __time__ % 60),
              max_by(request_uri, request_time)
              group by __time__ - __time__ % 60
  • Latency distribution of requests

    Divide all requests into 10 buckets by latency and count the number of requests in each bucket.

    Use the following SQL statement:

    * | select numeric_histogram(10, request_time)
  • Top 10 requests with the longest latency

    Obtain the 10 longest request latencies. The max(request_time, 10) expression returns the 10 largest request_time values.

    Use the following SQL statement:

    * | select max(request_time, 10)
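
    The preceding statement returns only the latency values. If you also want to know which pages these slow requests hit, the following sketch uses the max_by aggregate function, which returns the request_uri values associated with the 10 largest request_time values:

    * | select max_by(request_uri, request_time, 10)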
  • Page optimization

    Assume that the /url2 page has the maximum latency. To optimize its response speed, count the PVs and UVs, the distribution of request methods, HTTP status codes, and user agents, and the average and maximum latency for the /url2 page.

    Use the following SQL statement:

       request_uri:"/url2" | select count(1) as pv,
              approx_distinct(remote_addr) as uv,
              histogram(request_method) as method_pv,
              histogram(status) as status_pv,
              histogram(http_user_agent) as user_agent_pv,
              avg(request_time) as avg_latency,
              max(request_time) as max_latency

After obtaining the preceding data, you can make a targeted and detailed assessment of access to your website.