
Analysis – collect and analyze Nginx monitoring data

Last Updated: Feb 03, 2018

Like PHP-FPM, Docker, and Apache, Nginx provides a built-in status page for viewing and monitoring its runtime status. This article describes how to collect Nginx status information with Log Service Logtail, query and analyze the collected data, build dashboards, and create custom alarms, so that you can monitor your Nginx cluster end to end.

Prepare the environment

Follow these steps to enable the Nginx status plug-in.

  1. Verify that Nginx has the status function.

    Run the following command to check whether Nginx was built with the status module. The second line is the expected output:

    nginx -V 2>&1 | grep -o with-http_stub_status_module
    with-http_stub_status_module

    Nginx has the status function if the output is with-http_stub_status_module.

  2. Configure Nginx status.

    Enable the status function in the Nginx configuration file (the default path is /etc/nginx/nginx.conf). A sample configuration is as follows:

    location /private/nginx_status {
        stub_status on;
        access_log off;
        allow 11.132.232.238;
        deny all;
    }

    Note: This configuration allows only the host with the IP address 11.132.232.238 to access the Nginx status page.

  3. Verify that the hosts with Logtail installed can access the Nginx status page.

    Run the following command on such a host. The lines after the command are a sample response:

    $ curl http://11.132.232.59/private/nginx_status
    Active connections: 1
    server accepts handled requests
    2507455 2507455 2512972
    Reading: 0 Writing: 1 Waiting: 0
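
The module check in step 1 can also be scripted, for example when many hosts have to be verified. The following is a minimal Python sketch, not part of Log Service or Nginx; the function names `has_stub_status` and `check_local_nginx` are hypothetical. It assumes only what the command above already relies on: `nginx -V` prints its build flags to stderr.

```python
import subprocess

# Build flag that indicates the status module was compiled in.
MODULE_FLAG = "with-http_stub_status_module"


def has_stub_status(nginx_v_output: str) -> bool:
    """Return True if the text printed by `nginx -V` lists the status module."""
    return MODULE_FLAG in nginx_v_output


def check_local_nginx(binary: str = "nginx") -> bool:
    """Run `nginx -V` locally and inspect both stdout and stderr."""
    result = subprocess.run([binary, "-V"], capture_output=True, text=True)
    return has_stub_status(result.stdout + result.stderr)
```

`has_stub_status` works on any captured output, so it can be applied to text gathered from remote hosts; `check_local_nginx` requires an `nginx` binary on the local PATH.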

Collect data

  1. Install Logtail

    Install Logtail by following the instructions in Install Logtail. Make sure that the Logtail version is 0.16.0 or later. If the version is earlier, upgrade Logtail to the latest version as described in the same document.

  2. Configure collection information

    Method

    1. Create a Logstore in the Log Service console. In the data wizard, select Nginx Monitor under Self-built software.

    2. Follow the prompts to configure the URLs and corresponding parameters for Nginx monitoring. The collection configuration is based on the Logtail HTTP collection function.

      Note:

      • Set the Addresses field in the sample configuration to the list of URLs that you want to monitor.
      • If the status information returned by Nginx differs from the default format, modify processors so that the HTTP body is parsed correctly. For more information, see Data processing configuration.

      Sample configuration:

      {
        "inputs": [
          {
            "type": "metric_http",
            "detail": {
              "IntervalMs": 60000,
              "Addresses": [
                "http://11.132.232.59/private/nginx_status",
                "http://11.132.232.60/private/nginx_status",
                "http://11.132.232.62/private/nginx_status"
              ],
              "IncludeBody": true
            }
          }
        ],
        "processors": [
          {
            "type": "processor_regex",
            "detail": {
              "SourceKey": "content",
              "Regex": "Active connections: (\\d+)\\s+server accepts handled requests\\s+(\\d+)\\s+(\\d+)\\s+(\\d+)\\s+Reading: (\\d+) Writing: (\\d+) Waiting: (\\d+)[\\s\\S]*",
              "Keys": [
                "connection",
                "accepts",
                "handled",
                "requests",
                "reading",
                "writing",
                "waiting"
              ],
              "FullMatch": true,
              "NoKeyError": true,
              "NoMatchError": true,
              "KeepSource": false
            }
          }
        ]
      }
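
To see how the Regex and Keys fields of the sample configuration map a status page body to named fields, the following Python sketch applies the same pattern to the sample response shown earlier. It only illustrates the mapping and is not how Logtail is implemented. Python's `re` module stands in for Logtail's regex engine, and `parse_status` is a hypothetical helper name.

```python
import re

# Same pattern as the "Regex" field above, written with single backslashes
# because Python raw strings do not need JSON escaping.
STATUS_REGEX = re.compile(
    r"Active connections: (\d+)\s+server accepts handled requests"
    r"\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"Reading: (\d+) Writing: (\d+) Waiting: (\d+)[\s\S]*"
)

# Same order as the "Keys" field: one key per capture group.
KEYS = ["connection", "accepts", "handled", "requests",
        "reading", "writing", "waiting"]


def parse_status(body: str) -> dict:
    """Map each capture group to its key; values are kept as strings."""
    match = STATUS_REGEX.fullmatch(body)
    if match is None:
        raise ValueError("body does not look like default Nginx status output")
    return dict(zip(KEYS, match.groups()))


sample = (
    "Active connections: 1 \n"
    "server accepts handled requests\n"
    " 2507455 2507455 2512972 \n"
    "Reading: 0 Writing: 1 Waiting: 0 \n"
)
fields = parse_status(sample)
print(fields)
```

If your status page returns a different layout, trying the adjusted pattern against a captured body in this way is a quick check before changing the processors section.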

Preview data

After you apply the configuration, wait one minute and then click Preview to confirm that the status data is being collected. In addition to the fields parsed from the response body, the Logtail HTTP collection uploads the URL, HTTP status code, method name, response time, and whether the request succeeded.

Note: If no data is collected, check whether the configuration is valid JSON. If the JSON is valid and data is still missing, see Query Logtail collection errors for troubleshooting.

  _address_:http://11.132.232.59/private/nginx_status
  _http_response_code_:200
  _method_:GET
  _response_time_ms_:1.83716261897
  _result_:success
  accepts:33591200
  connection:450
  handled:33599550
  reading:626
  requests:39149290
  waiting:68
  writing:145
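
The JSON validity check mentioned in the note above can be done locally before you paste the configuration into the console. The following Python sketch is one way to do it, with a hypothetical helper name `check_config`. Besides JSON syntax, it checks that each processor_regex entry has as many Keys as capture groups. This second check is an assumption about what is useful to verify, and Python's `re` module is only a stand-in for Logtail's regex engine, which may differ in details.

```python
import json
import re


def check_config(text: str) -> list:
    """Return a list of problems found in a collection configuration string."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg} at line {err.lineno}, column {err.colno}"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    for processor in config.get("processors", []):
        if processor.get("type") != "processor_regex":
            continue
        detail = processor.get("detail", {})
        try:
            groups = re.compile(detail.get("Regex", "")).groups
        except re.error as err:
            problems.append(f"Regex does not compile: {err}")
            continue
        keys = detail.get("Keys", [])
        if groups != len(keys):
            problems.append(
                f"Regex has {groups} capture groups but Keys has {len(keys)} entries"
            )
    return problems


# A trailing comma is a common mistake that makes the configuration invalid.
print(check_config('{"inputs": [],}'))
```

An empty list means that neither check found a problem; it does not guarantee that Logtail accepts the configuration.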

Query and analysis

Custom query

For the query syntax, see Log Service query.

  1. To query the status of a specific IP address: _address_ : 10.168.0.0

  2. To query requests with a response time over 100 ms: _response_time_ms_ > 100

  3. To query requests whose status code is not 200: not _http_response_code_ : 200

Analysis and statistics

For more information about statistics and analysis syntax, see Log Service statistics syntax.

  • Calculate the average number of waiting, reading, writing, and active connections in five-minute windows:

    * | select avg(waiting) as waiting, avg(reading) as reading, avg(writing) as writing, avg(connection) as connection, from_unixtime( __time__ - __time__ % 300) as time group by __time__ - __time__ % 300 order by time limit 1440
  • List the 10 servers with the highest number of waiting connections:

    * | select max(waiting) as max_waiting, _address_ as address, from_unixtime(max(__time__)) as time group by _address_ order by max_waiting desc limit 10
  • Count the total number of monitored Nginx servers and the number of servers whose status requests failed:

    * | select count(distinct(_address_)) as total
    not _result_ : success | select count(distinct(_address_))
  • List the 10 most recent failed requests:

    not _result_ : success | select _address_ as address, from_unixtime(__time__) as time order by __time__ desc limit 10
  • Estimate the total number of handled connections and requests across all servers in five-minute windows:

    * | select avg(handled) * count(distinct(_address_)) as total_handled, avg(requests) * count(distinct(_address_)) as total_requests, from_unixtime( __time__ - __time__ % 300) as time group by __time__ - __time__ % 300 order by time limit 1440
  • Calculate the average request latency in five-minute windows:

    * | select avg(_response_time_ms_) as avg_delay, from_unixtime( __time__ - __time__ % 300) as time group by __time__ - __time__ % 300 order by time limit 1440
  • Count the number of invalid requests (first query) and valid requests (second query):

    not _http_response_code_ : 200 | select count(1)
    _http_response_code_ : 200 | select count(1)
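
Several of the queries above group by `__time__ - __time__ % 300`, which rounds each timestamp down to the start of its five-minute window. The following Python sketch reproduces that bucketing on a few made-up records to show what the expression computes. The record values and the helper name `average_by_window` are illustrative assumptions, not collected data or a Log Service API.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # five minutes, as in the queries above


def average_by_window(samples, field):
    """Average `field` per window, like `group by __time__ - __time__ % 300`."""
    buckets = defaultdict(list)
    for sample in samples:
        # Round the Unix timestamp down to the start of its window.
        window_start = sample["__time__"] - sample["__time__"] % WINDOW_SECONDS
        buckets[window_start].append(float(sample[field]))
    return {start: sum(values) / len(values)
            for start, values in sorted(buckets.items())}


# Made-up records: the first two fall into the same window, the third does not.
samples = [
    {"__time__": 1517616010, "waiting": 60},
    {"__time__": 1517616070, "waiting": 80},
    {"__time__": 1517616310, "waiting": 68},
]
averages = average_by_window(samples, "waiting")
print(averages)
```

With a one-minute collection interval (IntervalMs set to 60000), each window holds about five records per monitored address.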

Dashboard

By default, Log Service provides dashboards for Nginx monitoring data. For information about building a dashboard, see Log Service dashboard settings.


Set alarms

  1. Save the following query as a quick query named invalid_nginx_status: not _http_response_code_ : 200 | select count(1) as invalid_count

  2. Create alarm rules based on this quick query. Samples are as follows:

    Option                   Value
    Alarm rule name          invalid_nginx_alarm
    Quick query name         invalid_nginx_status
    Data query time (mins)   15
    Check interval (mins)    5
    Number of triggers       1
    Field name               invalid_count
    Comparison operator      greater than
    Check threshold          0
    Notification type        Notification center
    Notification content     An exception occurred while obtaining the Nginx status. Go to Log Service for detailed exception message. Project: xxxxxxxx, logstore : nginx_status
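
To make the effect of these settings concrete, the following Python sketch mimics one check of the rule: it counts records with a status code other than 200 inside the 15-minute query window and compares the count with the threshold of 0. It is a simplified model under stated assumptions, not the Log Service alarm implementation. The function name `evaluate_alarm` and the sample records are hypothetical.

```python
def evaluate_alarm(records, now, window_minutes=15, threshold=0):
    """Return (triggered, invalid_count) for records inside the query window."""
    window_start = now - window_minutes * 60
    invalid_count = sum(
        1
        for record in records
        if window_start <= record["__time__"] <= now
        and str(record.get("_http_response_code_")) != "200"
    )
    # "greater than" comparison against the check threshold.
    return invalid_count > threshold, invalid_count


# Made-up records with Unix timestamps in seconds.
records = [
    {"__time__": 1000, "_http_response_code_": "200"},
    {"__time__": 1300, "_http_response_code_": "502"},
    {"__time__": 1600, "_http_response_code_": "200"},
]
triggered, invalid_count = evaluate_alarm(records, now=1800)
print(triggered, invalid_count)
```

In this model a single failed status request within the window is enough to trigger the alarm, which matches a threshold of 0 and a trigger count of 1.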