Enterprise Distributed Application Service: Monitor an NGINX Ingress for an application

Last Updated:Sep 22, 2023

Enterprise Distributed Application Service (EDAS) provides five dashboards for viewing monitoring data on NGINX Ingresses. The dashboards are displayed on the Infrastructure Monitoring, Overview, TopN, URL Analysis, and Log Analysis tabs. Basic monitoring data comes from the Application Real-Time Monitoring Service (ARMS) Prometheus Service data source. Alternatively, you can install the latest version of the Prometheus component to view monitoring data based on gateway logs from the Loki log data source.

View the monitoring data on an NGINX Ingress

  1. Log on to the EDAS console. In the left-side navigation pane, choose Traffic Management > Application Routing.

  2. On the Application Routing (Kubernetes Ingress) page, find the desired Ingress and click Monitor in the Operation column to go to the monitoring page.

  3. On the Ingress monitoring page, select a region and view monitoring data on the Infrastructure Monitoring, Overview, TopN, URL Analysis, and Log Analysis tabs.

    Note
    • To enable the latest version of monitoring, choose Overview > Go to the opening on the Ingress monitoring page. In the Latest Version of Monitoring Not Enabled dialog box, click Enable.

    • In the upper-right corner of the Ingress monitoring page, click Disable Monitoring to disable the Ingress monitoring feature.

    • Click the Infrastructure Monitoring tab to view the data of basic monitoring metrics for the current Kubernetes cluster. The basic monitoring metrics include basic business requests, the system load of the NGINX Ingress controller service, and the number of times the NGINX configuration file is reloaded.

      The Infrastructure Monitoring tab is integrated with the basic monitoring panel of Prometheus Service. Basic monitoring metrics help you view the overall statistics on business requests for the current cluster. In addition, you can identify and analyze business issues based on the monitoring data on the URL Analysis and Log Analysis tabs. Monitoring data on the system load of the NGINX Ingress controller service and the number of times the NGINX configuration file is reloaded can help you manage the resources of the NGINX Ingress controller service and check anomalous configurations.

      • Requests

        Parameter | Description
        Total number of requests | The overall queries per second (QPS) of all requests processed by all controllers of the NGINX Ingress controller service.
        Number of requests | The number of requests processed by the selected Ingresses in the last 2 minutes.
        Number of requests (by Ingress) | The line chart of the number of HTTP requests processed by each Ingress.
        ReqPS (by Ingress) | The line chart of the number of requests per second processed by each Ingress.
        QPS | The line chart of the number of requests per second processed by the pods of the NGINX Ingress controller.
        ReqPS (by Path) | The line chart of the number of failed requests per second for each path. A request is considered failed if an HTTP 4xx or 5xx status code is returned.
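The per-second values in this table are rates over a cumulative request counter. The following is a minimal sketch of that calculation, assuming two scrapes of a counter such as nginx_ingress_controller_requests. The CounterSample type and the sample values are hypothetical, and the dashboards may compute rates differently.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One scrape of a cumulative counter (hypothetical structure)."""
    timestamp: float  # seconds since epoch
    value: float      # cumulative request count at that time


def requests_per_second(first: CounterSample, last: CounterSample) -> float:
    """Average per-second rate between two scrapes of a counter."""
    elapsed = last.timestamp - first.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    increase = last.value - first.value
    if increase < 0:
        # A counter that goes down was reset (for example, a pod restart).
        # Treat the latest value as the increase since the reset.
        increase = last.value
    return increase / elapsed


# 600 additional requests over a 2-minute window.
qps = requests_per_second(CounterSample(0.0, 1200.0), CounterSample(120.0, 1800.0))
print(qps)
```

A counter reset is handled by assuming that the counter restarted from zero, which slightly underestimates the rate for that window.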

      • Connections

        Parameter | Description
        Number of connections | The line chart of the number of connections to the pods of the NGINX Ingress controller.
        Number of new connections | The number of new connections to pods of the selected NGINX Ingress controller service.
        Total connections | The total number of connections to pods of the selected NGINX Ingress controller service.

      • Status

        Parameter | Description
        HTTP 1/2xx | The number of HTTP 1xx or 2xx status codes returned in a specified time period.
        HTTP 3xx | The number of HTTP 3xx status codes returned in a specified time period.
        HTTP 4xx | The number of HTTP 4xx status codes returned in a specified time period.
        HTTP 5xx | The number of HTTP 5xx status codes returned in a specified time period.
        Success rate | The success rate of requests processed by Ingresses in a specified time period.
        HTTP status code | The line chart of HTTP status codes.
        Request failure rate | The line chart of the ratio of requests that return a 404 or 5xx status code.
        Error rate (by Path) | The line chart of the request failure rate for each path. A request is considered failed if an HTTP 4xx or 5xx status code is returned.
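The rule that a request fails when a 4xx or 5xx status code is returned can be sketched as follows. This is an illustration only: the status counts are made-up sample data, and treating the success rate as the complement of the failure rate is an assumption that the page does not confirm.

```python
from typing import Dict


def is_failed(status: int) -> bool:
    """A request counts as failed when the status code is 4xx or 5xx."""
    return 400 <= status <= 599


def failure_rate(status_counts: Dict[int, int]) -> float:
    """Fraction of requests that returned a 4xx or 5xx status code."""
    total = sum(status_counts.values())
    if total == 0:
        return 0.0
    failed = sum(n for status, n in status_counts.items() if is_failed(status))
    return failed / total


# Hypothetical counts of responses per status code.
counts = {200: 90, 302: 2, 404: 5, 502: 3}
rate = failure_rate(counts)
success = 1 - rate  # assumption: success rate is the complement
print(rate, success)
```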

      • Config

        Parameter | Description
        Number of times Reload is configured | The number of times the NGINX configuration file was reloaded in the last minute.
        Number of Pods that failed to configure | Indicates whether the last reload operation of the NGINX configuration file was successful. If the last reload operation failed, the number of controller pods is displayed. If the last reload operation was successful, N/A is displayed.

      • Latency

        Parameter | Description
        delay time | The line chart of percentile latencies of all requests.
        Delayed heat map | The latency heatmap of requests processed by the selected Ingresses in a specified time period.
        Request processing delay (by Path) | The line chart of request latencies based on a path.
        Upstream service delay (by Path) | The line chart of upstream service latencies based on a path.

      • Controller load

        Parameter | Description
        Network I/O | The line chart of inbound and outbound traffic based on an NGINX Ingress controller.
        CPU | The line chart of CPU utilization based on an NGINX Ingress controller.
        Memory | The line chart of the memory usage based on an NGINX Ingress controller.

      • Key Prometheus monitoring metrics

        Metric | Description
        nginx_ingress_controller_request_duration_seconds_bucket | The histogram buckets of request processing latencies, used to analyze request latency.
        nginx_ingress_controller_response_duration_seconds_bucket | The histogram buckets of upstream service latencies, used to analyze upstream latency.
        nginx_ingress_controller_response_size_bucket | The histogram buckets of response sizes.
        nginx_ingress_controller_nginx_process_connections | The number of connections of the NGINX service.
        nginx_ingress_controller_request_duration_seconds_count | The number of requests based on the following labels: request latency, status code, path, method, and host.
        nginx_ingress_controller_request_size_sum | The size of request packets based on the following labels: request latency, status code, path, method, and host.
        nginx_ingress_controller_response_size_count | The number of responses based on the following labels: request latency, status code, path, method, and host.
        nginx_ingress_controller_response_size_sum | The size of response packets based on the following labels: request latency, status code, path, method, and host.
        nginx_ingress_controller_config_hash | The hash value of the NGINX configuration file.
        nginx_ingress_controller_config_last_reload_successful | Indicates whether the last reload of the NGINX configuration file was successful.
        nginx_ingress_controller_ingress_upstream_latency_seconds_sum | The latency of upstream service requests processed by Ingresses.
        nginx_ingress_controller_nginx_process_cpu_seconds_total | The cumulative CPU time consumed by the NGINX service, from which CPU utilization is derived.
        nginx_ingress_controller_requests | The number of requests.
        nginx_ingress_controller_success | The number of times the NGINX configuration file is reloaded.
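Percentile latencies are typically estimated from cumulative histogram buckets such as those exposed by nginx_ingress_controller_request_duration_seconds_bucket. The following is a minimal sketch that uses linear interpolation inside a bucket, similar in spirit to the histogram_quantile function of Prometheus. The bucket bounds and counts are hypothetical, and the exact method used by the dashboards is not stated on this page.

```python
import math
from typing import List, Tuple


def quantile_from_buckets(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets holds (upper_bound, cumulative_count) pairs sorted by
    upper_bound and ending with the +Inf bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            # Linear interpolation inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical buckets: 50 requests <= 0.1 s, 90 <= 0.5 s, 98 <= 1 s, 100 in total.
sample_buckets = [(0.1, 50.0), (0.5, 90.0), (1.0, 98.0), (math.inf, 100.0)]
p95 = quantile_from_buckets(0.95, sample_buckets)
print(p95)
```

Because the result is interpolated, its accuracy depends on how finely the bucket bounds cover the latency range of interest.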

    • Click the Overview tab to view the traffic information about a specified Ingress based on the Ingress name and the namespace to which the Ingress belongs. The traffic information includes the number of requests, the success rate of requests, the distribution of HTTP status codes, and request latencies.

      Monitoring item | Description
      PV | The total number of requests in a specified time range.
      Inflow | The total inbound traffic of received requests in a specified time range.
      Outflow | The total outbound traffic of responses in a specified time range.
      Status code distribution | The pie chart of HTTP status codes in a specified time range.
      Method distribution | The pie chart of request methods in a specified time range.
      Distribution of Ingress request | The pie chart of controllers used to process requests in a specified time range.
      PV | The line chart of the number of requests.
      Failure rate | The line chart of the request failure rate. Requests that return an HTTP 1xx, 2xx, or 3xx status code are not counted as failures.
      Request delay | The line chart of P50, P95, P99, and P9999 latencies of all requests.
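The summary items in this table can be thought of as aggregates over parsed gateway log records. The following is a rough sketch, assuming that inflow is the sum of request_length and outflow is the sum of body_bytes_sent. That mapping is an assumption, not something this page states, and the sample records are made up.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def summarize(records: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Aggregate PV, traffic, and distributions from parsed log records."""
    pv = 0
    inflow = 0
    outflow = 0
    status_classes: Counter = Counter()
    methods: Counter = Counter()
    for rec in records:
        pv += 1
        inflow += rec["request_length"]    # assumed source of Inflow
        outflow += rec["body_bytes_sent"]  # assumed source of Outflow
        status_classes[f"{rec['status'] // 100}xx"] += 1
        methods[rec["method"]] += 1
    return {
        "pv": pv,
        "inflow_bytes": inflow,
        "outflow_bytes": outflow,
        "status_distribution": dict(status_classes),
        "method_distribution": dict(methods),
    }


# Hypothetical parsed log records.
sample_records = [
    {"method": "GET", "status": 200, "request_length": 85, "body_bytes_sent": 131},
    {"method": "POST", "status": 404, "request_length": 120, "body_bytes_sent": 0},
    {"method": "GET", "status": 502, "request_length": 90, "body_bytes_sent": 150},
]
summary = summarize(sample_records)
print(summary)
```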

    • Click the TopN tab to view the top N data for the current cluster. The TopN tab displays the following sections: Request statistics, Geographical statistics, and Equipment statistics.

      • Request statistics: This section displays the top N data based on the number, failure rate, and latency of requests and the traffic consumed by requests.

      • Geographical statistics: This section displays the top N data based on the number of requests by country and region.

      • Equipment statistics: This section displays the top N data based on the number of requests by browser, device type, and operating system.

      Monitoring item | Description
      Service request Top10 | The top 10 services that receive the most requests in a specified time range.
      Service failure rate Top10 | The top 10 services that have the highest request failure rate in a specified time range. A request is considered failed if an HTTP 4xx or 5xx status code is returned.
      Service delay Top10 | The top 10 services that have the longest latency in response to requests in a specified time range.
      Service traffic Top10 | The top 10 services that consume the most traffic in a specified time range.
      Ingress Top10 | The top 10 Ingresses that route the most requests, have the highest success rate, have the longest average latency, or consume the most inbound traffic and outbound traffic in a specified time range.
      Service Top10 | The top 10 services that receive the most requests, have the highest success rate, have the longest average latency, or consume the most inbound traffic and outbound traffic in a specified time range.
      Country (region) request Top10 | The top 10 countries or regions from which the most requests are sent in a specified time range.
      City Request Top10 | The top 10 cities from which the most requests are sent in a specified time range.
      Browser request Top10 | The top 10 browsers from which the most requests are sent in a specified time range.
      Device type request Top10 | The top 10 device types from which the most requests are sent in a specified time range.
      Operating System Request Top10 | The top 10 operating systems from which the most requests are sent in a specified time range.
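A top N ranking such as Service request Top10 or Service failure rate Top10 amounts to grouping requests by service and sorting. The following is a minimal sketch that uses the proxy_upstream_name log field as the service key. That choice and the sample records are assumptions made for illustration.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def top_services_by_requests(records: List[Dict[str, Any]], n: int = 10) -> List[Tuple[str, int]]:
    """Services ordered by request count, highest first."""
    counts = Counter(rec["proxy_upstream_name"] for rec in records)
    return counts.most_common(n)


def top_services_by_failure_rate(records: List[Dict[str, Any]], n: int = 10) -> List[Tuple[str, float]]:
    """Services ordered by the share of 4xx and 5xx responses, highest first."""
    total: Counter = Counter()
    failed: Counter = Counter()
    for rec in records:
        service = rec["proxy_upstream_name"]
        total[service] += 1
        if 400 <= rec["status"] <= 599:
            failed[service] += 1
    rates = {service: failed[service] / total[service] for service in total}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:n]


# Hypothetical parsed log records for three services.
topn_records = (
    [{"proxy_upstream_name": "default-a-80", "status": s} for s in (200, 200, 500)]
    + [{"proxy_upstream_name": "default-b-80", "status": s} for s in (200, 404)]
    + [{"proxy_upstream_name": "default-c-80", "status": 200}]
)
by_requests = top_services_by_requests(topn_records, n=2)
by_failure = top_services_by_failure_rate(topn_records, n=2)
print(by_requests, by_failure)
```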

    • Click the URL Analysis tab to view the details based on request Uniform Resource Locators (URLs). The details include the QPS, latency, failure rate, and error details. You can also view the latest traffic of Ingresses. You can enter keywords, such as a URL or an HTTP status code, in the field next to Search to view the status of requests that match your specified traffic characteristics.

      Monitoring item | Description
      QPS | The QPS based on a URL. You can filter the displayed results by cluster, namespace to which an Ingress belongs, Ingress, Service, and request time. URL aggregation is supported.
      Duration - P95 | The P95 latency based on a URL. You can filter the displayed results by cluster, namespace to which an Ingress belongs, Ingress, Service, and request time. URL aggregation is supported. The sampling period is 5 minutes.
      Failure rate | The request failure rate based on a URL. You can filter the displayed results by cluster, namespace to which an Ingress belongs, Ingress, Service, and request time. URL aggregation is supported. The sampling period is 1 minute.
      Error details | The request failure rate based on a URL and an HTTP status code. You can filter the displayed results by cluster, namespace to which an Ingress belongs, Ingress, Service, and request time. URL aggregation is supported. The sampling period is 1 minute.
      Recent request log | The request logs in a specified time range.
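A per-URL P95 latency with a 5-minute sampling period can be sketched as grouping request times by time window and URL, then taking a percentile of each group. The nearest-rank method, the timestamp field (epoch seconds), and the sample data below are assumptions. The page does not say how the dashboard computes the value.

```python
import math
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile of a non-empty list, with p from 1 to 100."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def p95_by_url(records: List[Dict[str, Any]], window_seconds: int = 300) -> Dict[Tuple[int, str], float]:
    """Group request_time by (window start, url) and return the P95 of each group."""
    groups: Dict[Tuple[int, str], List[float]] = defaultdict(list)
    for rec in records:
        window_start = int(rec["timestamp"] // window_seconds) * window_seconds
        groups[(window_start, rec["url"])].append(rec["request_time"])
    return {key: percentile(times, 95) for key, times in groups.items()}


# Hypothetical records: 20 requests to /echo/test in the first window, 1 in the next.
url_records = [
    {"timestamp": float(i), "url": "/echo/test", "request_time": i / 100}
    for i in range(1, 21)
]
url_records.append({"timestamp": 310.0, "url": "/echo/test", "request_time": 1.004})
p95 = p95_by_url(url_records)
print(p95)
```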

    • Click the Log Analysis tab to view the aggregated log information about Ingress usage. Click the angle bracket icon on the left side of a log record and click the URL next to TraceId. The trace analysis page of the ARMS console appears. Then, you can troubleshoot errors in the trace. For more information about how to query traces, see Trace query.

      Field | Type | Unit | Description | Example
      remote_addr | string | N/A | The source IP address of the HTTP client. | 42.120.xx.xx
      proxy_protocol_addr | string | N/A | The IP address of the HTTP client when the PROXY protocol is enabled. | 42.120.xx.xx
      remote_user | string | N/A | The username supplied for HTTP basic authentication. | N/A
      time_local | string | N/A | The local time in the Common Log Format (CLF). | 20/Apr/2022:14:36:26 +0800
      method | string | N/A | The HTTP request method. | GET
      url | string | N/A | The HTTP request URL. | /echo/test
      version | string | N/A | The version of the HTTP protocol. | HTTP/1.1
      status | number | N/A | The HTTP status code returned to the HTTP client. | 200
      body_bytes_sent | number | Byte | The number of bytes returned to the HTTP client, excluding the length of response headers. | 131
      http_referer | string | N/A | The Referer header in the HTTP request. | N/A
      http_user_agent | string | N/A | The User-Agent header in the HTTP request. | curl/7.77.0
      request_length | number | Byte | The length of the HTTP request. The request line, request headers, and request body are included. | 85
      request_time | number | Seconds | The request processing time, measured from the time when the first bytes of the request are read from the HTTP client to the time when the log entry is written after the last bytes of the response are sent. | 1.004
      proxy_upstream_name | string | N/A | The name of the upstream of the reverse proxy. The name is in the format of upstream-<namespace>-<service name>-<service port>. | default-test-svc-80
      upstream_addr | string | N/A | The IP address and port number of the upstream of the reverse proxy. | 10.53.xx.xx:18081
      upstream_response_length | number | Byte | The length of the response returned by the upstream of the reverse proxy. | 131
      upstream_response_time | number | Seconds | The time taken to receive the response from the upstream of the reverse proxy. | 1.005
      upstream_status | number | N/A | The HTTP status code returned by the upstream of the reverse proxy. | 200
      req_id | string | N/A | The value of the X-Request-ID header in the HTTP request. If the value is not specified, a randomly generated ID is used. | 365fff78ac45b5b9033a1d503576****
      host | string | N/A | The Host header in the HTTP request. | example.aliyundoc.com
      proxy_alternative_upstream_name | string | N/A | The name of the alternative upstream of the reverse proxy. The name is in the format of upstream-<namespace>-<service name>-<service port>. | N/A
      http_traceparent | string | N/A | The traceparent value sourced from EDAS, which contains the trace ID. For more information, see W3C Trace Context. | 00-827ff36*****************b9ba52c980-3b0cf2185c670935-01
      trace_id | string | N/A | The trace ID sourced from EDAS. You can go to the ARMS console and view trace analysis results based on the trace ID. | 827ff36f90b1a99d06e862b9ba52****
      content | string | N/A | The original log content if the Ingress log record fails to be parsed. | 42.120.xx.xx - [42.120.xx.xx] - - [20/Apr/2022:14:36:26 +0800] "GET /echo/test HTTP/1.1" 200 131 "-" "curl/7.77.0" 85 1.004 [default-test-svc-80] 10.53.xx.xx:18081 131 1.005 200 365fff78ac45b5b9033a1d503576**** example.aliyundoc.com [] 00-827ff36*****************b9ba52c980-3b0cf2185c670935-01
      geoip_city_name | string | N/A | The city information. | Hangzhou
      geoip_continent_code | string | N/A | The continent code. | AS
      geoip_country_code | string | N/A | The country code. | CN
      geoip_timezone | string | N/A | The time zone of a geo location. | Asia/Shanghai
      user_agent_browser_family | string | N/A | The browser that is used by the client. | HTTP Library
      user_agent_os_family | string | N/A | The operating system of the client. | Windows 10
      user_agent_device_category | string | N/A | The device type of the client. | PC, Smartphone
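The fields above can be related to a raw log line with a short parser. The following is a sketch only: the field order in LOG_PATTERN is inferred from the example in the content row and may not match the log format configured in your cluster, and the helper name parse_ingress_log is hypothetical. The trace ID is taken from the second segment of the traceparent value, following the version-traceid-parentid-flags layout of W3C Trace Context.

```python
import re
from typing import Any, Dict

# Field order inferred from the example log line (an assumption).
LOG_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - \[(?P<proxy_protocol_addr>[^\]]*)\] - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'(?P<request_length>\d+) (?P<request_time>[\d.]+) '
    r'\[(?P<proxy_upstream_name>[^\]]*)\] (?P<upstream_addr>\S+) '
    r'(?P<upstream_response_length>\S+) (?P<upstream_response_time>\S+) '
    r'(?P<upstream_status>\S+) (?P<req_id>\S+) (?P<host>\S+) '
    r'\[(?P<proxy_alternative_upstream_name>[^\]]*)\] (?P<http_traceparent>\S+)'
)


def parse_ingress_log(line: str) -> Dict[str, Any]:
    """Parse one log line; keep the raw line in 'content' if parsing fails."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return {"content": line}
    fields: Dict[str, Any] = match.groupdict()
    for name in ("status", "body_bytes_sent", "request_length"):
        fields[name] = int(fields[name])
    fields["request_time"] = float(fields["request_time"])
    # traceparent layout: version-traceid-parentid-flags
    parts = fields["http_traceparent"].split("-")
    if len(parts) == 4:
        fields["trace_id"] = parts[1]
    return fields


# The masked example line from the content row.
SAMPLE = (
    '42.120.xx.xx - [42.120.xx.xx] - - [20/Apr/2022:14:36:26 +0800] '
    '"GET /echo/test HTTP/1.1" 200 131 "-" "curl/7.77.0" 85 1.004 '
    '[default-test-svc-80] 10.53.xx.xx:18081 131 1.005 200 '
    '365fff78ac45b5b9033a1d503576**** example.aliyundoc.com [] '
    '00-827ff36*****************b9ba52c980-3b0cf2185c670935-01'
)
parsed = parse_ingress_log(SAMPLE)
print(parsed["method"], parsed["url"], parsed["status"], parsed.get("trace_id"))
```

Upstream fields are kept as strings in this sketch because a real log line can contain placeholders such as "-" in those positions.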

    • Click the NGINX Monitoring (Community Edition) and Request Processing Performance (Community Edition) tabs to view monitoring dashboards of the community version. The monitoring data is for reference only.