KServe provides a set of default Prometheus metrics to help you monitor the performance and health of services. This topic describes how to configure Managed Service for Prometheus for a service deployed by using KServe. In this example, a scikit-learn model service named sklearn-iris is used.
Prerequisites
The Arena client of version 0.9.15 or later is installed. For more information, see Configure the Arena client.
The ack-kserve component is installed. For more information, see Install ack-kserve.
Managed Service for Prometheus is enabled for a Container Service for Kubernetes (ACK) cluster. For more information, see the Step 1: Enable Managed Service for Prometheus section of the "Managed Service for Prometheus" topic.
Step 1: Deploy an application by using KServe
Run the following command to deploy a scikit-learn-based application by using KServe:
arena serve kserve \
    --name=sklearn-iris \
    --image=kube-ai-registry.cn-shanghai.cr.aliyuncs.com/ai-sample/kserve-sklearn-server:v0.12.0 \
    --cpu=1 \
    --memory=200Mi \
    --enable-prometheus=true \
    --metrics-port=8080 \
    "python -m sklearnserver --model_name=sklearn-iris --model_dir=/models --http_port=8080"
Expected output:
service/sklearn-iris-metric-svc created                              # A service named sklearn-iris-metric-svc is created.
inferenceservice.serving.kserve.io/sklearn-iris created              # An inference service named sklearn-iris is created by using KServe.
servicemonitor.monitoring.coreos.com/sklearn-iris-svcmonitor created # A ServiceMonitor is created to integrate Managed Service for Prometheus and collect the monitoring data of the sklearn-iris-metric-svc service.
INFO[0004] The Job sklearn-iris has been submitted successfully      # The job is submitted to the cluster.
INFO[0004] You can run `arena serve get sklearn-iris --type kserve -n default` to check the job status
The preceding output indicates that the Arena client has deployed a scikit-learn-based service by using KServe and integrated Managed Service for Prometheus with the service.
Run the following command to write the following JSON code to the ./iris-input.json file to create an inference request:
cat <<EOF > "./iris-input.json"
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
EOF
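To double-check the request payload before sending it, you can recreate the file and validate it locally. This is an optional sanity check that assumes python3 is available on the client machine:

```shell
# Recreate the payload and confirm that it parses as valid JSON.
cat <<'EOF' > ./iris-input.json
{
  "instances": [
    [6.8, 2.8, 4.8, 1.4],
    [6.0, 3.4, 4.5, 1.6]
  ]
}
EOF
# Pretty-prints the file if it is valid JSON; exits with an error otherwise.
python3 -m json.tool ./iris-input.json
```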
Run the following commands to obtain the IP address of the NGINX Ingress gateway and the hostname from the URL that exposes the inference service outside the cluster:
NGINX_INGRESS_IP=`kubectl -n kube-system get svc nginx-ingress-lb -ojsonpath='{.status.loadBalancer.ingress[0].ip}'`
SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
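The cut -d "/" -f 3 step keeps only the hostname part of the URL that the InferenceService reports. A minimal illustration with a sample URL (the URL value below is illustrative, not taken from your cluster):

```shell
# A sample of the URL format returned by:
#   kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}'
SAMPLE_URL="http://sklearn-iris.default.example.com"
# Splitting on "/" yields: field 1 = "http:", field 2 = "" (empty), field 3 = hostname.
SERVICE_HOSTNAME=$(echo "$SAMPLE_URL" | cut -d "/" -f 3)
echo "$SERVICE_HOSTNAME"   # → sklearn-iris.default.example.com
```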
Run the following command to use the stress testing tool hey to access the service multiple times to generate monitoring data:
Note: For more information about hey, see hey.
hey -z 2m -c 20 -m POST -host $SERVICE_HOSTNAME \
    -H "Content-Type: application/json" \
    -D ./iris-input.json \
    http://${NGINX_INGRESS_IP}:80/v1/models/sklearn-iris:predict
Expected output:
The preceding output summarizes the performance of the service during the stress test based on key metrics, including processing speed, data throughput, and response latency. This helps you evaluate the efficiency and stability of the service.
Optional. Manually collect the metrics of the application and make sure that the metrics are properly exposed.
The following example shows how to collect monitoring metrics from a pod whose name contains sklearn-iris in an ACK cluster and view the data locally, without logging on to the pod or exposing its port to the external network.
Run the following command to map port 8080 of the pod to port 8080 of the local host. The pod name is stored in the $POD_NAME variable. This way, requests sent to port 8080 of the local host are transparently forwarded to port 8080 of the pod.
# Specify the pod name.
POD_NAME=`kubectl get po | grep sklearn-iris | awk '{print $1}'`
# Map port 8080 of the pod to port 8080 of the local host.
kubectl port-forward pod/$POD_NAME 8080:8080
Expected output:
Forwarding from 127.0.0.1:8080 -> 8080 Forwarding from [::1]:8080 -> 8080
The preceding output shows that requests sent to port 8080 of the local host are forwarded to port 8080 of the pod as expected regardless of whether you connect to the local host by using an IPv4 address or an IPv6 address.
Enter the following URL in a browser to access port 8080 of the pod and view the metrics.
http://localhost:8080/metrics
Expected output:
The preceding output shows the metrics that are used to evaluate the performance and status of the application in the pod, which confirms that requests to the local port are forwarded to the application as expected.
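When the metrics page is long, you can also filter for the prediction-latency histogram on the command line. The sketch below runs grep on a sample of the Prometheus text format; the sample values are illustrative, and against a live pod you would pipe `curl -s http://localhost:8080/metrics` into the same grep:

```shell
# Illustrative sample of the Prometheus text format that a /metrics endpoint returns.
cat <<'EOF' > /tmp/sample-metrics.txt
# HELP request_predict_seconds prediction request latency
# TYPE request_predict_seconds histogram
request_predict_seconds_bucket{le="0.005"} 12
request_predict_seconds_bucket{le="0.01"} 30
request_predict_seconds_bucket{le="+Inf"} 42
request_predict_seconds_count 42
EOF
# Keep only the prediction-latency series; with a live pod, replace the file with:
#   curl -s http://localhost:8080/metrics
grep '^request_predict_seconds' /tmp/sample-metrics.txt
```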
Step 2: Query the metrics of the application deployed by using KServe
Log on to the ARMS console.
In the left-side navigation pane, click Integration Management.
In the top navigation bar, select the region in which the ACK cluster resides. On the Integration Management page, click the Query Dashboards tab.
In the dashboards list, click the Kubernetes Pod dashboard to go to the Grafana page.
In the left-side navigation pane of the Grafana page, click Explore. On the Explore page, enter the request_predict_seconds_bucket statement to query the values of the application metrics.
Note: Data is collected with a delay of 5 minutes.
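Beyond browsing the raw series, you can aggregate the histogram in Explore. For example, the following PromQL query estimates the 90th-percentile prediction latency; this is one possible formulation, and the 5-minute window and 0.9 quantile are illustrative choices:

```
histogram_quantile(0.9, sum(rate(request_predict_seconds_bucket[5m])) by (le))
```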
FAQ
Issue
How do I determine whether data of the request_predict_seconds_bucket metric is collected? What do I do if the metric data fails to be collected?
Solution
Log on to the ARMS console.
In the left-side navigation pane, click Integration Management.
In the top navigation bar, select the region in which the ACK cluster resides. On the Integration Management page, click the Container Service tab under the Integrated Environments tab, and click an environment name to view the details page. On the Container Service page, click the Self-Monitoring tab.
In the left-side pane of the Self-Monitoring tab, click the Targets tab. If default/sklearn-iris-svcmonitor/0 (1/1 up) is displayed, the metric data is collected.
If the metric data fails to be collected, submit a ticket to seek technical support.
References
For information about the default metrics provided by KServe, see Prometheus Metrics.