
Simple Log Service: Collect, query, and analyze NGINX monitoring logs

Last Updated: Jun 25, 2024

The built-in stub_status module of NGINX provides a dedicated status page that displays the key metrics of your NGINX server in real time. The metrics include Active connections, Reading, Writing, and Waiting. You can use a Logtail plug-in to collect the status page data as NGINX monitoring logs, and then query and analyze the logs. This way, you can continuously monitor your NGINX cluster.

Prerequisites

Logtail is installed on your server. For more information, see Install Logtail on a Linux server or Install Logtail on a Windows server.

Note

For a Linux server, install Logtail V0.16.0 or later. For a Windows server, install Logtail V1.0.0.8 or later.

Step 1: Configure the stub_status module

Note

In this topic, Linux is used as an example to describe the configuration procedure.

  1. Run the following commands to install and start NGINX:

    sudo yum install nginx
    sudo systemctl start nginx
  2. Run the following command to check whether the NGINX stub_status module is supported. For more information, see Module ngx_http_stub_status_module.

    nginx -V 2>&1 | grep -o with-http_stub_status_module

    If the following information is returned, the module is supported:

    with-http_stub_status_module
  3. Configure the stub_status module on your server.

    1. Run the following command to open the /etc/nginx/nginx.conf file:

      vim /etc/nginx/nginx.conf
    2. Press the i key on your keyboard to enter the edit mode.

    3. Add the following code to the server { ... } block. For more information about nginx_status, see Enable Nginx Status Page.

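      location /nginx_status {
          stub_status on;                 # Enable the stub_status module.
          access_log   off;               # Do not write status page requests to the access log.
          allow ${Server IP address};     # Allow access requests from the specified IP address.
          deny all;                       # Deny access requests from all other IP addresses to the status page.
      }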
    4. Press the Esc key on the keyboard to exit the edit mode. Then, enter :wq to save and close the file. Reload NGINX for the change to take effect, for example, by running sudo systemctl reload nginx.

  4. Run the following command on your server to verify the configuration results:

    curl http://${Server IP address}/nginx_status

    If the following output is returned, the configuration is successful.

    Active connections: 1
    server accepts handled requests
    2507455 2507455 2512972
    Reading: 0 Writing: 1 Waiting: 0                       
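
The status page has a fixed text layout, so the output above can also be checked programmatically. The following Python sketch is not part of NGINX or Logtail. It parses the sample output into named counters. The helper name parse_stub_status and the field names are assumptions chosen for illustration. According to the ngx_http_stub_status_module documentation, the three numbers on the third line are the total accepted connections, handled connections, and client requests.

```python
import re

# Pattern for the four-line stub_status page shown above.
STATUS_PATTERN = re.compile(
    r"Active connections:\s*(\d+)\s+"
    r"server accepts handled requests\s+"
    r"(\d+)\s+(\d+)\s+(\d+)\s+"
    r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)"
)

# Illustrative field names, in the order the values appear on the page.
FIELDS = ("active", "accepts", "handled", "requests",
          "reading", "writing", "waiting")


def parse_stub_status(text):
    """Return the stub_status counters as a dict of ints (hypothetical helper)."""
    match = STATUS_PATTERN.search(text)
    if match is None:
        raise ValueError("text does not look like stub_status output")
    return dict(zip(FIELDS, map(int, match.groups())))


# Sample output copied from the verification step above.
sample = """Active connections: 1
server accepts handled requests
2507455 2507455 2512972
Reading: 0 Writing: 1 Waiting: 0
"""

status = parse_stub_status(sample)
print(status["active"], status["requests"])
```

To run the same check against a live server, you could pass in the text returned by the curl command in the previous step instead of the sample string.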

Step 2: Collect NGINX monitoring logs

  1. Log on to the Simple Log Service console.

  2. On the right side of the page that appears, click the Quick Data Import card.

  3. Click Custom Data Plug-in.

  4. Select the project and Logstore. Then, click Next.

  5. Create a machine group.

    • If a machine group is available, click Use Existing Machine Groups.

    • If no machine groups are available, perform the following steps to create a machine group. In this example, an Elastic Compute Service (ECS) instance is used.

      1. On the ECS Instances tab, select Manually Select Instances. Then, select the ECS instance that you want to use and click Create.

        For more information, see Install Logtail on ECS instances.

        Important

        If you want to collect logs from an ECS instance that belongs to a different Alibaba Cloud account from Simple Log Service, from a server in a data center, or from a server of a third-party cloud service provider, you must manually install Logtail. For more information, see Install Logtail on a Linux server or Install Logtail on a Windows server.

        After you manually install Logtail, you must configure a user identifier for the server. For more information, see Configure a user identifier.

      2. After Logtail is installed, click Complete Installation.

      3. In the Create Machine Group step, configure the Name parameter and click Next.

        Simple Log Service allows you to create IP address-based machine groups and custom identifier-based machine groups. For more information, see Create an IP address-based machine group and Create a custom identifier-based machine group.

  6. Confirm that the machine group is displayed in the Applied Server Groups section and click Next.

    Important

    If you apply a machine group immediately after you create the machine group, the heartbeat status of the machine group may be FAIL. This issue occurs because the machine group is not connected to Simple Log Service. To resolve this issue, you can click Automatic Retry. If the issue persists, see What do I do if no heartbeat connections are detected on Logtail?

  7. In the Configure Data Source step, configure Configuration Name and Plug-in Configuration. Then, click Next.

    • inputs is required and is used to configure the data source settings for the Logtail configuration.

      Important

      You can specify only one type of data source in inputs.

    • processors is optional and is used to configure the data processing settings for the Logtail configuration to parse data. You can specify one or more processing methods.

      If your logs cannot be parsed based only on the setting of inputs, you can configure processors in the Plug-in Configuration field to add plug-ins for data processing. For example, you can extract fields, extract log time, mask data, and filter logs. For more information, see Use Logtail plug-ins to process data.

    {
        "inputs": [
            {
                "type": "metric_http",
                "detail": {
                    "IntervalMs": 60000,
                    "Addresses": [
                        "http://${Server IP address}/nginx_status",
                        "http://${Server IP address}/nginx_status",
                        "http://${Server IP address}/nginx_status"
                    ],
                    "IncludeBody": true
                }
            }
        ],
        "processors": [
            {
                "type": "processor_regex",
                "detail": {
                    "SourceKey": "content",
                    "Regex": "Active connections: (\\d+)\\s+server accepts handled requests\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+Reading: (\\d+) Writing: (\\d+) Waiting: (\\d+)[\\s\\S]*",
                    "Keys": [
                        "connection",
                        "accepts",
                        "handled",
                        "requests",
                        "reading",
                        "writing",
                        "waiting"
                    ],
                    "FullMatch": true,
                    "NoKeyError": true,
                    "NoMatchError": true,
                    "KeepSource": false
                }
            }
        ]
    }

    The following table describes the key parameters.

    | Parameter | Type | Required | Description |
    | --- | --- | --- | --- |
    | type | string | Yes | The type of the data source. Set the value to metric_http. |
    | IntervalMs | int | Yes | The interval between two consecutive requests. Unit: milliseconds. |
    | Addresses | Array | Yes | The URLs that you want to monitor. |
    | IncludeBody | boolean | No | Specifies whether to collect the response body. Default value: false. If you set this parameter to true, the response body is collected and stored in the content field. |
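
The relationship between Regex and Keys in the processor_regex settings can be checked offline before you save the configuration. The following Python sketch is an illustration only. It loads the processor block as JSON, applies the regular expression to a sample status page body, and pairs each capture group with the key at the same position. Logtail uses its own regular expression engine, so treating Python's re module as equivalent for this pattern is an assumption. The sample body, including the extra whitespace, is also an assumption based on the output shown in Step 1.

```python
import json
import re

# The processor block from the plug-in configuration above, as JSON text.
# In JSON, each backslash of the regular expression is written as "\\".
PROCESSOR_JSON = r'''
{
  "type": "processor_regex",
  "detail": {
    "SourceKey": "content",
    "Regex": "Active connections: (\\d+)\\s+server accepts handled requests\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+Reading: (\\d+) Writing: (\\d+) Waiting: (\\d+)[\\s\\S]*",
    "Keys": ["connection", "accepts", "handled", "requests",
             "reading", "writing", "waiting"]
  }
}
'''

detail = json.loads(PROCESSOR_JSON)["detail"]

# Sample body with trailing whitespace, to exercise the [\s\S]* tail.
body = (
    "Active connections: 1 \n"
    "server accepts handled requests\n"
    " 2507455 2507455 2512972 \n"
    "Reading: 0 Writing: 1 Waiting: 0 \n"
)

# FullMatch is true in the configuration, so the whole body must match.
match = re.fullmatch(detail["Regex"], body)

# The n-th capture group becomes the value of the n-th key.
fields = dict(zip(detail["Keys"], match.groups())) if match else {}
print(fields)
```

The pattern has seven capture groups and Keys has seven entries. If the two counts differ, the extracted fields no longer line up with their names, which is worth checking whenever you edit either setting.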

You can view the collected logs 1 minute after the Logtail configuration is created. The following example shows a collected log. By default, Simple Log Service generates the nginx_status dashboard to display the results of query and analysis on the collected logs.

_address_:http://10.10.XX.XX/nginx_status  
_http_response_code_:200  
_method_:GET  
_response_time_ms_:1.83716261897  
_result_:success  
accepts:33591200  
connection:450  
handled:33599550  
reading:626  
requests:39149290  
waiting:68  
writing:145                  

Step 3: Query and analyze logs

  1. Log on to the Simple Log Service console.

  2. In the Projects section, click the project that you want to manage.

  3. In the left-side navigation pane, click Log Storage. In the Logstores list, click the Logstore that you want to manage.

  4. Enter a query statement in the search box, click Last 15 Minutes, and then specify a query time range.

    For more information, see Step 1: Enter a query statement.

    • Query logs

      • Query the information about an IP address.

        _address_ : 10.10.0.0
      • Query the requests whose response time exceeds 100 milliseconds.

        _response_time_ms_ > 100
      • Query the requests for which the HTTP status code 200 is not returned.

        not _http_response_code_ : 200
    • Analyze logs

      • Obtain the average numbers of waiting connections, reading connections, writing connections, and connections at 5-minute intervals.

        *| select  avg(waiting) as waiting, avg(reading)  as reading,  avg(writing)  as writing,  avg(connection)  as connection,  from_unixtime( __time__ - __time__ % 300) as time group by __time__ - __time__ % 300 order by time limit 1440                       
      • Obtain the top 10 servers that have the largest number of waiting connections.

        *| select  max(waiting) as max_waiting, _address_, from_unixtime(max(__time__)) as time group by _address_ order by max_waiting desc limit 10
      • Obtain the number of IP addresses.

        * | select  count(distinct(_address_)) as total                       
      • Obtain the number of IP addresses from which failed requests are initiated.

        not _result_ : success | select  count(distinct(_address_))                        
      • Obtain the IP addresses from which the 10 most recent failed requests are initiated.

        not _result_ : success | select _address_ as address, from_unixtime(__time__) as time  order by __time__ desc limit 10                       
      • Obtain the total number of requests at 5-minute intervals.

        *| select  avg(handled) * count(distinct(_address_)) as total_handled, avg(requests) * count(distinct(_address_)) as total_requests,  from_unixtime( __time__ - __time__ % 300) as time group by __time__ - __time__ % 300 order by time limit 1440
      • Obtain the average request latency at 5-minute intervals.

        *| select  avg(_response_time_ms_) as avg_delay,  from_unixtime( __time__ - __time__ % 300) as time group by __time__ - __time__ % 300 order by time limit 1440                      
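      • Obtain the numbers of failed requests and successful requests. The first statement counts the failed requests, and the second statement counts the successful requests.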

        not _http_response_code_ : 200  | select  count(1)                     
        _http_response_code_ : 200  | select  count(1)
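
Several of the preceding analytic statements group logs by the expression __time__ - __time__ % 300, which rounds each log time down to the start of its 5-minute (300-second) window. The following Python sketch illustrates that arithmetic outside of Simple Log Service. The timestamps and waiting values are made-up samples, the helper name bucket_start is hypothetical, and displaying the bucket start in UTC is an assumption made only for the printout.

```python
from collections import defaultdict
from datetime import datetime, timezone

BUCKET_SECONDS = 300  # 5 minutes, as in the statements above


def bucket_start(ts):
    """Round a Unix timestamp down to the start of its 5-minute bucket."""
    return ts - ts % BUCKET_SECONDS


# Hypothetical samples: (Unix timestamp, waiting connections).
samples = [
    (1719300000, 10),
    (1719300060, 20),
    (1719300299, 30),  # last second of the first bucket
    (1719300300, 40),  # first second of the next bucket
]

# Equivalent of "group by __time__ - __time__ % 300".
groups = defaultdict(list)
for ts, waiting in samples:
    groups[bucket_start(ts)].append(waiting)

# Equivalent of "avg(waiting) ... order by time".
averages = {start: sum(v) / len(v) for start, v in sorted(groups.items())}

for start, avg_waiting in averages.items():
    label = datetime.fromtimestamp(start, tz=timezone.utc).isoformat()
    print(label, avg_waiting)
```

To use a different window, change 300 in the statements to the window length in seconds, for example 60 for 1-minute intervals.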