KServe exposes a set of default Prometheus metrics for monitoring model service performance and health. This topic walks through deploying a scikit-learn InferenceService with Prometheus monitoring enabled, generating inference traffic, and querying the collected metrics in ARMS.
Prerequisites
Before you begin, ensure that you have:
- Arena client version 0.9.15 or later. For more information, see Configure the Arena client.
- The ack-kserve component installed. For more information, see Install the ack-kserve component.
- Alibaba Cloud Prometheus monitoring enabled. For more information, see Enable Alibaba Cloud Prometheus monitoring.
Step 1: Deploy a KServe application
1. Deploy a KServe application for scikit-learn:

   ```bash
   arena serve kserve \
       --name=sklearn-iris \
       --image=kube-ai-registry.cn-shanghai.cr.aliyuncs.com/ai-sample/kserve-sklearn-server:v0.12.0 \
       --cpu=1 \
       --memory=200Mi \
       --enable-prometheus=true \
       --metrics-port=8080 \
       "python -m sklearnserver --model_name=sklearn-iris --model_dir=/models --http_port=8080"
   ```

   The --enable-prometheus=true flag creates the following resources:

   | Resource | Type | Description |
   | --- | --- | --- |
   | sklearn-iris-metric-svc | Kubernetes Service | Exposes the metrics endpoint on port 8080 |
   | sklearn-iris | KServe InferenceService | The model serving resource |
   | sklearn-iris-svcmonitor | ServiceMonitor | Integrates with Alibaba Cloud Prometheus to scrape metrics from sklearn-iris-metric-svc |

   Expected output:

   ```
   service/sklearn-iris-metric-svc created
   inferenceservice.serving.kserve.io/sklearn-iris created
   servicemonitor.monitoring.coreos.com/sklearn-iris-svcmonitor created
   INFO[0004] The Job sklearn-iris has been submitted successfully
   INFO[0004] You can run `arena serve get sklearn-iris --type kserve -n default` to check the job status
   ```
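   Before moving on, you can confirm that the service came up. A minimal check, using the arena command suggested in the output above together with a standard kubectl query:

   ```bash
   # Check the status of the KServe service submitted above
   arena serve get sklearn-iris --type kserve -n default

   # Confirm that the InferenceService reports READY=True
   kubectl get inferenceservice sklearn-iris -n default
   ```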
2. Create the ./iris-input.json file with the following content. This file is used as the inference request payload.

   ```bash
   cat <<EOF > "./iris-input.json"
   {
     "instances": [
       [6.8, 2.8, 4.8, 1.4],
       [6.0, 3.4, 4.5, 1.6]
     ]
   }
   EOF
   ```
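   Optionally, confirm that the file is well-formed JSON before using it. A quick sketch with Python's standard library:

   ```bash
   # Parse the payload; prints the two instances on success, fails loudly otherwise
   python3 -c "import json; print(json.load(open('./iris-input.json')))"
   ```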
3. Retrieve the NGINX Ingress gateway IP address and the InferenceService hostname:

   ```bash
   NGINX_INGRESS_IP=$(kubectl -n kube-system get svc nginx-ingress-lb -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
   SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
   ```
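   Before load testing, you can send a single request to verify the endpoint end to end. A minimal smoke test, assuming the variables set above and the same predict path used by the load test in the next step:

   ```bash
   # Send one inference request through the ingress; the Host header routes it
   # to the sklearn-iris InferenceService
   curl -s -H "Host: ${SERVICE_HOSTNAME}" \
        -H "Content-Type: application/json" \
        -d @./iris-input.json \
        http://${NGINX_INGRESS_IP}:80/v1/models/sklearn-iris:predict
   ```

   A successful call returns a JSON body with a predictions field.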
4. Use the Hey stress testing tool to generate inference traffic:

   ```bash
   hey -z 2m -c 20 -m POST -host $SERVICE_HOSTNAME \
       -H "Content-Type: application/json" \
       -D ./iris-input.json \
       http://${NGINX_INGRESS_IP}:80/v1/models/sklearn-iris:predict
   ```

   The command sends requests for 2 minutes (-z 2m) with 20 concurrent workers (-c 20) and prints a latency summary when it finishes.
5. (Optional) Verify that metrics are exposed on the pod before querying ARMS. The pod exposes metrics on port 8080, but the port is not accessible from outside the cluster. Use port forwarding to access it locally:

   ```bash
   # Get the pod name
   POD_NAME=$(kubectl get po | grep sklearn-iris | awk '{print $1}')
   # Forward port 8080 of the pod to localhost
   kubectl port-forward pod/$POD_NAME 8080:8080
   ```

   Expected output:

   ```
   Forwarding from 127.0.0.1:8080 -> 8080
   Forwarding from [::1]:8080 -> 8080
   ```

   In a browser, open http://localhost:8080/metrics to view the raw metrics. KServe exposes the following metrics:

   | Metric | Type | Labels | Description |
   | --- | --- | --- | --- |
   | request_preprocess_seconds | Histogram | model_name | Preprocessing latency per request |
   | request_predict_seconds | Histogram | model_name | Prediction latency per request |
   | request_postprocess_seconds | Histogram | model_name | Postprocessing latency per request |
   | request_explain_seconds | Histogram | model_name | Explain request latency |

   The model_name label lets you filter and aggregate metrics by model when multiple models run in the same cluster.
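   With the port-forward running, you can also check from a second terminal that the histogram buckets queried in Step 2 are already populated:

   ```bash
   # Fetch the raw metrics and keep only the prediction-latency series;
   # request_predict_seconds_bucket lines confirm the histogram has data
   curl -s http://localhost:8080/metrics | grep request_predict_seconds
   ```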
Step 2: Query KServe application metrics
1. Log on to the ARMS console.
2. In the left navigation pane, click Integration Management, and then click Query Dashboards.
3. On the Dashboard List page, click the Kubernetes Pod dashboard to open the Grafana page.
4. In the left navigation pane, click Explore. Enter the following search statement to query the application metric values:

   ```
   request_predict_seconds_bucket
   ```

   Data collection has a delay of approximately 5 minutes after traffic is generated.
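   The bare metric name returns raw cumulative bucket counters. To turn them into a readable latency figure, you can use a standard histogram_quantile query; a sketch, where the 0.9 quantile and the 5-minute rate window are illustrative choices rather than values from this guide:

   ```
   # Approximate p90 prediction latency per model over the last 5 minutes
   histogram_quantile(0.9, sum(rate(request_predict_seconds_bucket[5m])) by (le, model_name))
   ```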
FAQ
Question
How do I confirm that the request_predict_seconds_bucket metric is being collected?
Solution
Check the scrape target status in ARMS:
1. Log on to the ARMS console.
2. In the left navigation pane, click Integration Management. On the Integrated Environments page, click the Container Service tab, and then click the name of your cluster. Click the Self-Monitoring tab.
3. In the left navigation pane, click Targets. If default/sklearn-iris-svcmonitor/0 (1/1 up) is listed, metric collection is working correctly.
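You can also confirm from inside the cluster that the resources created in Step 1 still exist and line up. A minimal check, using the names created by the deploy command:

```bash
# Confirm the ServiceMonitor and the metrics Service exist in the default namespace
kubectl get servicemonitor sklearn-iris-svcmonitor -n default
kubectl get svc sklearn-iris-metric-svc -n default

# Inspect the ServiceMonitor's label selector and endpoint port in detail
kubectl get servicemonitor sklearn-iris-svcmonitor -n default -o yaml
```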