Monitor an NGINX Ingress for an application - Enterprise Distributed Application Service

Enterprise Distributed Application Service (EDAS) provides five dashboards for viewing monitoring data on NGINX Ingresses. The dashboards are displayed on the Infrastructure Monitoring, Overview, TopN, URL Analysis, and Log Analysis tabs. Basic monitoring data is based on the Application Real-Time Monitoring Service (ARMS) Prometheus Service data source. Alternatively, you can install the latest version of the Prometheus component to view monitoring data that is based on gateway logs from the Loki log data source.

View the monitoring data on an NGINX Ingress

Log on to the EDAS console. In the left-side navigation pane, choose Traffic Management > Application Routing.
On the Application Routing (Kubernetes Ingress) page, find the desired Ingress and click Monitor in the Operation column to go to the monitoring page.

On the Ingress monitoring page, select a region and view monitoring data on the Infrastructure Monitoring, Overview, TopN, URL Analysis, and Log Analysis tabs.

Note

To enable the latest version of the monitoring feature, choose Overview > Go to the opening on the Ingress monitoring page. In the Latest Version of Monitoring Not Enabled dialog box, click Enable.
To disable the Ingress monitoring feature, click Disable Monitoring in the upper-right corner of the Ingress monitoring page.

Click the Infrastructure Monitoring tab to view the data of basic monitoring metrics for the current Kubernetes cluster. The basic monitoring metrics include basic business requests, the system load of the NGINX Ingress controller service, and the number of times the NGINX configuration file is reloaded.

The Infrastructure Monitoring tab is integrated with the basic monitoring panel of Prometheus Service. Basic monitoring metrics help you view the overall statistics on business requests for the current cluster. In addition, you can identify and analyze business issues based on the monitoring data on the URL Analysis and Log Analysis tabs. Monitoring data on the system load of the NGINX Ingress controller service and the number of times the NGINX configuration file is reloaded can help you manage the resources of the NGINX Ingress controller service and check anomalous configurations.

Requests

Parameter	Description
Total number of requests	The overall queries per second (QPS) based on global requests processed by all controllers of the NGINX Ingress controller service.
Number of requests	The number of requests processed by the selected Ingresses in the last 2 minutes.
Number of requests (by Ingress)	The line chart of the number of HTTP requests processed by Ingresses.
ReqPS (by Ingress)	The line chart of the number of requests per second processed by Ingresses.
QPS	The line chart of the number of requests per second processed by pods based on an NGINX Ingress controller.
ReqPS (by Path)	The line chart of the number of failed requests per second based on a path. A request is considered failed if an HTTP 4xx or 5xx status code is returned.

Connections

Parameter	Description
Number of connections	The line chart of the number of pod connections based on an NGINX Ingress controller.
Number of new connections	The number of new connections to pods of the selected NGINX Ingress controller service.
Total connections	The total number of connections to pods of the selected NGINX Ingress controller service.

Status

Parameter	Description
HTTP 1/2xx	The number of HTTP 1xx or 2xx status codes returned in a specified time period.
HTTP 3xx	The number of HTTP 3xx status codes returned in a specified time period.
HTTP 4xx	The number of HTTP 4xx status codes returned in a specified time period.
HTTP 5xx	The number of HTTP 5xx status codes returned in a specified time period.
Success rate	The success rate of requests processed by Ingresses in a specified time period.
HTTP status code	The line chart of HTTP status codes.
Request failure rate	The line chart of the ratio of requests for which a 404 or 5xx status code is returned.
Error rate (by Path)	The line chart of the failure rate of requests based on a path. A request is considered failed if an HTTP 4xx or 5xx status code is returned.

Config

Parameter	Description
Number of times Reload is configured.	The number of times the NGINX configuration file was reloaded in the last minute.
Number of Pods that failed to configure.	Indicates whether the last reload operation of the NGINX configuration file was successful. If the last reload operation failed, the number of controller pods is displayed. If the last reload operation was successful, N/A is displayed.

Latency

Parameter	Description
delay time	The line chart of percentile latencies of all requests.
Delayed heat map	The latency heatmap of requests processed by the selected Ingresses in a specified time period.
Request processing delay (by Path)	The line chart of request latencies based on a path.
Upstream service delay (by Path)	The line chart of upstream service latencies based on a path.

Controller load

Parameter	Description
Network I/O	The line chart of inbound and outbound traffic based on an NGINX Ingress controller.
CPU	The line chart of CPU utilization based on an NGINX Ingress controller.
Memory	The line chart of the memory usage based on an NGINX Ingress controller.

Key Prometheus monitoring metrics

Metric	Description
nginx_ingress_controller_request_duration_seconds_bucket	The histogram buckets of request latencies, used to analyze request processing performance.
nginx_ingress_controller_response_duration_seconds_bucket	The histogram buckets of upstream service latencies, used to analyze upstream performance.
nginx_ingress_controller_response_size_bucket	The histogram buckets of response packet sizes.
nginx_ingress_controller_nginx_process_connections	The number of connections of NGINX services.
nginx_ingress_controller_request_duration_seconds_count	The number of requests based on the following labels: request latency, status code, path, method, and host.
nginx_ingress_controller_request_size_sum	The size of request packets based on the following labels: request latency, status code, path, method, and host.
nginx_ingress_controller_response_size_count	The number of request responses based on the following labels: request latency, status code, path, method, and host.
nginx_ingress_controller_response_size_sum	The size of response packets based on the following labels: request latency, status code, path, method, and host.
nginx_ingress_controller_config_hash	The hash value in the NGINX configuration file.
nginx_ingress_controller_config_last_reload_successful	Indicates whether the last reload of the NGINX configuration file was successful.
nginx_ingress_controller_ingress_upstream_latency_seconds_sum	The latency of upstream service requests processed by Ingresses.
nginx_ingress_controller_nginx_process_cpu_seconds_total	The cumulative CPU time, in seconds, consumed by the NGINX process. CPU utilization is derived from the rate of change of this value.
nginx_ingress_controller_requests	The number of requests.
nginx_ingress_controller_success	The number of times the NGINX configuration file is reloaded.
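
The latency charts on this tab show percentiles, and percentiles are commonly estimated from cumulative histogram buckets such as nginx_ingress_controller_request_duration_seconds_bucket. The following Python sketch shows one common way to do this, using linear interpolation inside the bucket that contains the target rank, similar in spirit to the Prometheus histogram_quantile function. The bucket boundaries and counts are made-up values for illustration, and the dashboards may compute their charts differently.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    `buckets` is a list of (upper_bound_seconds, cumulative_count) pairs,
    sorted by upper bound, with +Inf as the last upper bound.
    """
    if not buckets or not 0 <= q <= 1:
        raise ValueError("need at least one bucket and 0 <= q <= 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The rank falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            # Interpolate linearly inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical buckets: 50 requests took <= 0.1 s, 90 took <= 0.5 s, 100 took <= 1 s.
sample_buckets = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
p95 = estimate_quantile(0.95, sample_buckets)
```

Because the estimate interpolates within a bucket, its accuracy depends on how finely the bucket boundaries are spaced around the percentile of interest.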

Click the Overview tab to view the traffic information about a specified Ingress based on the Ingress name and the namespace to which the Ingress belongs. The traffic information includes the number of requests, the success rate of requests, the distribution of HTTP status codes, and request latencies.

Monitoring item	Description
PV	The total number of requests in a specified time range.
Inflow	The total traffic for receiving requests in a specified time range.
outflow	The total traffic for responding to requests in a specified time range.
Status code distribution	The pie chart of HTTP status codes in a specified time range.
Method distribution	The pie chart of request methods in a specified time range.
Distribution of Ingress request	The pie chart of controllers used to process requests in a specified time range.
PV	The line chart of the number of requests.
Failure rate	The line chart of the request failure rate. Requests for which an HTTP status code between 1xx and 3xx is returned are excluded.
Request delay	The line chart of P50, P95, P99, and P9999 latencies of all requests.

Click the TopN tab to view the top N data for the current cluster. The TopN tab displays the following sections: Request statistics, Geographical statistics, and Equipment statistics.

Request statistics: This section allows you to view the top N data collected and analyzed based on the number, failure rate, and latency of requests and the traffic consumed for requests.
Geographical statistics: This section allows you to view the top N data collected and analyzed based on the number of requests by country or region.
Equipment statistics: This section allows you to view the top N data collected and analyzed based on the number of requests by browser, device type, and operating system.

Monitoring item	Description
Service request Top10	The top 10 services that receive the most requests in a specified time range.
Service failure rate Top10	The top 10 services that have the highest failure rate of requests in a specified time range. A request is considered failed if an HTTP 4xx or 5xx status code is returned.
Service delay Top10	The top 10 services that have the longest latency in response to requests in a specified time range.
Service traffic Top10	The top 10 services that consume the most traffic in a specified time range.
Ingress Top10	The top 10 Ingresses that route the most requests, have the highest success rate, have the longest average latency, or consume the most inbound traffic and outbound traffic in a specified time range.
Service Top10	The top 10 services that receive the most requests, have the highest success rate, have the longest average latency, or consume the most inbound traffic and outbound traffic in a specified time range.
Country (region) request Top10	The top 10 regions from which the most requests are sent in a specified time range.
City Request Top10	The top 10 cities from which the most requests are sent in a specified time range.
Browser request Top10	The top 10 browsers from which the most requests are sent in a specified time range.
Device type request Top10	The top 10 device types from which the most requests are sent in a specified time range.
Operating System Request Top10	The top 10 operating systems from which the most requests are sent in a specified time range.
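
A top N ranking of this kind amounts to grouping requests by a key and sorting the groups by a metric. The following Python sketch ranks services by request count and by failure rate, using the proxy_upstream_name and status fields of the Ingress access log as the grouping key and failure signal. The sample records and function names are hypothetical, and this is only an illustration of the idea, not necessarily how the dashboards are computed.

```python
from collections import Counter


def top_n_by_requests(records, n=10):
    """Return the n services with the most requests as (service, count) pairs."""
    counts = Counter(r["proxy_upstream_name"] for r in records)
    return counts.most_common(n)


def top_n_by_failure_rate(records, n=10):
    """Return the n services with the highest share of 4xx/5xx responses."""
    totals, failures = Counter(), Counter()
    for r in records:
        service = r["proxy_upstream_name"]
        totals[service] += 1
        if 400 <= r["status"] <= 599:
            failures[service] += 1
    rates = {s: failures[s] / totals[s] for s in totals}
    # Sort by rate in descending order, then by name so that ties are deterministic.
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical records for illustration only.
records = [
    {"proxy_upstream_name": "default-test-svc-80", "status": 200},
    {"proxy_upstream_name": "default-test-svc-80", "status": 502},
    {"proxy_upstream_name": "default-test-svc-80", "status": 200},
    {"proxy_upstream_name": "default-cart-svc-80", "status": 404},
    {"proxy_upstream_name": "default-cart-svc-80", "status": 200},
]
busiest = top_n_by_requests(records, n=1)
least_reliable = top_n_by_failure_rate(records, n=1)
```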

Click the URL Analysis tab to view the details based on request Uniform Resource Locators (URLs). The details include the QPS, latency, failure rate, and error details. You can also view the latest traffic of Ingresses. You can enter keywords, such as a URL or an HTTP status code, in the field next to Search to view the status of requests that match your specified traffic characteristics.

Monitoring item	Description
QPS	The QPS based on a URL. Allows you to filter the displayed results by cluster, namespace to which an Ingress belongs, Ingress, Service, and request time. Supports URL aggregation.
Duration - P95	The P95 latency based on a URL. Allows you to filter the displayed results by cluster, namespace to which an Ingress belongs, Ingress, Service, and request time. Supports URL aggregation. Uses a sampling period of 5 minutes.
Failure rate	The request failure rate based on a URL. Allows you to filter the displayed results by cluster, namespace to which an Ingress belongs, Ingress, Service, and request time. Supports URL aggregation. Uses a sampling period of 1 minute.
Error details	The request failure rate based on a URL and an HTTP status code. Allows you to filter the displayed results by cluster, namespace to which an Ingress belongs, Ingress, Service, and request time. Supports URL aggregation. Uses a sampling period of 1 minute.
Recent request log	The request logs in a specified time range.
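
To make the sampling period concrete, the following Python sketch groups log records by URL and by 1-minute window, then computes the failure rate of each group. It assumes records that carry the url, status, and time_local fields of the Ingress access log, treats 4xx and 5xx status codes as failures, and groups by exact URL only. The function name and sample records are hypothetical, and the actual aggregation performed by the dashboard may differ.

```python
from collections import defaultdict
from datetime import datetime

# CLF timestamp, for example "20/Apr/2022:14:36:26 +0800".
# %b parses English month abbreviations under the default C locale.
CLF_TIME = "%d/%b/%Y:%H:%M:%S %z"


def failure_rate_by_url(records, period_seconds=60):
    """Return {(url, window_start_epoch): failure_rate} for each sampling window."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for r in records:
        ts = datetime.strptime(r["time_local"], CLF_TIME).timestamp()
        # Align the timestamp to the start of its sampling window.
        window_start = int(ts // period_seconds) * period_seconds
        key = (r["url"], window_start)
        totals[key] += 1
        if 400 <= r["status"] <= 599:
            failures[key] += 1
    return {key: failures[key] / totals[key] for key in totals}


# Hypothetical records for illustration only.
records = [
    {"time_local": "20/Apr/2022:14:36:26 +0800", "url": "/echo/test", "status": 200},
    {"time_local": "20/Apr/2022:14:36:40 +0800", "url": "/echo/test", "status": 500},
    {"time_local": "20/Apr/2022:14:37:05 +0800", "url": "/echo/test", "status": 200},
]
rates = failure_rate_by_url(records)
```

Passing period_seconds=300 gives the 5-minute windows used for the P95 duration chart.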

Click the Log Analysis tab to view the aggregated log information about Ingress usage. Click the angle bracket icon on the left side of a log record and click the URL next to TraceId. The trace analysis page of the ARMS console appears. Then, you can troubleshoot errors in the trace. For more information about how to query traces, see Trace query.

Field	Type	Unit	Description	Example
remote_addr	string	N/A	The source IP address of the HTTP client.	42.120.xx.xx
proxy_protocol_addr	string	N/A	The IP address of the HTTP client that is passed by the PROXY protocol if the PROXY protocol is enabled.	42.120.xx.xx
remote_user	string	N/A	The username for the basic authentication of HTTP request headers.	N/A
time_local	string	N/A	The local time in the Common Log Format (CLF).	20/Apr/2022:14:36:26 +0800
method	string	N/A	The HTTP request method.	GET
url	string	N/A	The HTTP request URL.	/echo/test
version	string	N/A	The version of the HTTP protocol.	HTTP/1.1
status	number	N/A	The HTTP status code returned to the HTTP client.	200
body_bytes_sent	number	Byte	The number of bytes returned to the HTTP client, excluding the length of response headers.	131
http_referer	string	N/A	The Referer header in the HTTP request.	N/A
http_user_agent	string	N/A	The User-Agent header in the HTTP request.	curl/7.77.0
request_length	number	Byte	The length of the HTTP request. The request line, request headers, and request body are included.	85
request_time	number	Seconds	The time elapsed from when the first bytes of the HTTP request are read from the HTTP client to when the last bytes of the response are sent to the HTTP client.	1.004
proxy_upstream_name	string	N/A	The name of the upstream of the reverse proxy. The name is in the format of upstream-<namespace>-<service name>-<service port>.	default-test-svc-80
upstream_addr	string	N/A	The IP address and port number of the upstream of the reverse proxy.	10.53.xx.xx:18081
upstream_response_length	number	Byte	The length of the response returned by the upstream of the reverse proxy.	131
upstream_response_time	number	Seconds	The period of time consumed for receiving the response from the upstream of the reverse proxy.	1.005
upstream_status	number	N/A	The HTTP status code returned by the upstream of the reverse proxy.	200
req_id	string	N/A	The value of the X-Request-ID header in the HTTP request. If the value is not specified, a randomly generated ID is used.	365fff78ac45b5b9033a1d503576****
host	string	N/A	The host header in the HTTP request.	example.aliyundoc.com
proxy_alternative_upstream_name	string	N/A	The name of the alternative upstream of the reverse proxy. The name is in the format of upstream-<namespace>-<service name>-<service port>.	N/A
http_traceparent	string	N/A	The information sourced from EDAS that contains the trace_id. For more information, see W3C Trace Context.	00-827ff36*****************b9ba52c980-3b0cf2185c670935-01
trace_id	string	N/A	The trace ID sourced from EDAS. You can go to the ARMS console and view trace analysis results based on the trace ID.	827ff36f90b1a99d06e862b9ba52****
content	string	N/A	The original log content if the Ingress log record fails to be parsed.	42.120.xx.xx - [42.120.xx.xx] - - [20/Apr/2022:14:36:26 +0800] "GET /echo/test HTTP/1.1" 200 131 "-" "curl/7.77.0" 85 1.004 [default-test-svc-80] 10.53.xx.xx:18081 131 1.005 200 365fff78ac45b5b9033a1d503576** example.aliyundoc.com [] 00-827ff36***************b9ba52c980-3b0cf2185c670935-01
geoip_city_name	string	N/A	The city information.	Hangzhou
geoip_continent_code	string	N/A	The continent code.	AS
geoip_country_code	string	N/A	The country code.	CN
geoip_timezone	string	N/A	The time zone of a geo location.	Asia/Shanghai
user_agent_browser_family	string	N/A	The browser family of the client.	HTTP Library
user_agent_os_family	string	N/A	The operating system family of the client.	Windows 10
user_agent_device_category	string	N/A	The device category of the client.	PC Smartphone
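
To show how the fields in the preceding table map onto a raw log line, the following Python sketch parses the example value of the content field with a regular expression. The field order is inferred from that single example line and is an assumption. The actual log format of your NGINX Ingress controller may differ, and the function and pattern names are hypothetical. Like the content field, the sketch keeps the original text when a line can't be parsed.

```python
import re

# Field order inferred from the example line; treat it as an assumption.
LOG_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - \[(?P<proxy_protocol_addr>[^\]]*)\] - (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)" '
    r'(?P<request_length>\d+) (?P<request_time>[\d.]+) '
    r'\[(?P<proxy_upstream_name>[^\]]*)\] (?P<upstream_addr>\S+) '
    r'(?P<upstream_response_length>\S+) (?P<upstream_response_time>\S+) '
    r'(?P<upstream_status>\S+) (?P<req_id>\S+) (?P<host>\S+) '
    r'\[(?P<proxy_alternative_upstream_name>[^\]]*)\] (?P<http_traceparent>\S+)'
)


def parse_ingress_log(line):
    """Parse one access log line into a dict of the documented fields.

    If the line does not match, return the raw text under "content".
    """
    match = LOG_PATTERN.fullmatch(line.strip())
    if match is None:
        return {"content": line}
    record = match.groupdict()
    # Convert the client-facing numeric fields. Upstream fields stay as strings
    # because they can be "-" when no upstream was contacted.
    for name in ("status", "body_bytes_sent", "request_length"):
        record[name] = int(record[name])
    record["request_time"] = float(record["request_time"])
    return record


SAMPLE = (
    '42.120.xx.xx - [42.120.xx.xx] - - [20/Apr/2022:14:36:26 +0800] '
    '"GET /echo/test HTTP/1.1" 200 131 "-" "curl/7.77.0" 85 1.004 '
    '[default-test-svc-80] 10.53.xx.xx:18081 131 1.005 200 '
    '365fff78ac45b5b9033a1d503576** example.aliyundoc.com [] '
    '00-827ff36***************b9ba52c980-3b0cf2185c670935-01'
)
record = parse_ingress_log(SAMPLE)
```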

Click the NGINX Monitoring (Community Edition) and Request Processing Performance (Community Edition) tabs to view monitoring dashboards of the community version. The monitoring data is for reference only.