Container Service for Kubernetes: Configure Prometheus monitoring for KServe to monitor the performance and health of model services

Last Updated: Mar 26, 2026

KServe exposes a set of built-in Prometheus metrics for monitoring the performance and health of model services. This topic describes how to deploy a scikit-learn InferenceService with Prometheus monitoring enabled, generate inference traffic, and query the collected metrics in ARMS.

Prerequisites

Before you begin, make sure that you have:

Step 1: Deploy a KServe application

  1. Deploy a KServe application for scikit-learn:

    arena serve kserve \
        --name=sklearn-iris \
        --image=kube-ai-registry.cn-shanghai.cr.aliyuncs.com/ai-sample/kserve-sklearn-server:v0.12.0 \
        --cpu=1 \
        --memory=200Mi \
        --enable-prometheus=true \
        --metrics-port=8080 \
        "python -m sklearnserver --model_name=sklearn-iris --model_dir=/models --http_port=8080"

    The --enable-prometheus=true flag creates the following resources:

    Resource                  Type                      Description
    sklearn-iris-metric-svc   Kubernetes Service        Exposes the metrics endpoint on port 8080
    sklearn-iris              KServe InferenceService   The model serving resource
    sklearn-iris-svcmonitor   ServiceMonitor            Integrates with Alibaba Cloud Prometheus to scrape metrics from sklearn-iris-metric-svc

    Expected output:

    service/sklearn-iris-metric-svc created
    inferenceservice.serving.kserve.io/sklearn-iris created
    servicemonitor.monitoring.coreos.com/sklearn-iris-svcmonitor created
    INFO[0004] The Job sklearn-iris has been submitted successfully
    INFO[0004] You can run `arena serve get sklearn-iris --type kserve -n default` to check the job status
  2. Create a file named ./iris-input.json with the following content. This file is used as the payload of the inference requests.

    cat <<EOF > "./iris-input.json"
    {
      "instances": [
        [6.8,  2.8,  4.8,  1.4],
        [6.0,  3.4,  4.5,  1.6]
      ]
    }
    EOF
  3. Get the IP address of the NGINX Ingress gateway and the hostname of the InferenceService:

    NGINX_INGRESS_IP=$(kubectl -n kube-system get svc nginx-ingress-lb -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
    SERVICE_HOSTNAME=$(kubectl get inferenceservice sklearn-iris -o jsonpath='{.status.url}' | cut -d "/" -f 3)
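The hostname command works because splitting a URL on "/" puts the host in the third field. Both variables must be non-empty before the load test can reach the service. The sketch below illustrates the two ideas with hypothetical helper names (url_host, require_set) and a made-up URL. The real value of .status.url depends on your cluster domain.

```shell
# Hypothetical helper: extract the host part of a URL, the same way the
# command above processes the InferenceService .status.url value.
url_host() {
  printf '%s\n' "$1" | cut -d "/" -f 3
}

# Hypothetical helper: fail early if a required variable is empty.
# $1 = variable name (for the message), $2 = its current value
require_set() {
  if [ -z "$2" ]; then
    echo "error: $1 is empty" >&2
    return 1
  fi
}

# Example with a made-up URL (assumption, not a real endpoint).
url_host "http://sklearn-iris-default.example.com"
```

In a real session, calls such as require_set NGINX_INGRESS_IP "$NGINX_INGRESS_IP" could be run before the load test so that an empty value is reported immediately.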
  4. Use the hey stress testing tool to generate inference traffic:

    hey -z 2m -c 20 -m POST -host $SERVICE_HOSTNAME -H "Content-Type: application/json" -D ./iris-input.json http://${NGINX_INGRESS_IP}:80/v1/models/sklearn-iris:predict

    Expected output:

    Summary:
      Total:        120.0296 secs
      Slowest:      0.1608 secs
      Fastest:      0.0213 secs
      Average:      0.0275 secs
      Requests/sec: 727.3875
    
      Total data:   1833468 bytes
      Size/request: 21 bytes
    
    Response time histogram:
      0.021 [1]     |
      0.035 [85717] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
      0.049 [1272]  |■
      0.063 [144]   |
      0.077 [96]    |
      0.091 [44]    |
      0.105 [7]     |
      0.119 [0]     |
      0.133 [0]     |
      0.147 [11]    |
      0.161 [16]    |
    
    Latency distribution:
      10% in 0.0248 secs
      25% in 0.0257 secs
      50% in 0.0270 secs
      75% in 0.0285 secs
      90% in 0.0300 secs
      95% in 0.0315 secs
      99% in 0.0381 secs
    
    Details (average, fastest, slowest):
      DNS+dialup:  0.0000 secs, 0.0213 secs, 0.1608 secs
      DNS-lookup:  0.0000 secs, 0.0000 secs, 0.0000 secs
      req write:   0.0000 secs, 0.0000 secs, 0.0225 secs
      resp wait:   0.0273 secs, 0.0212 secs, 0.1607 secs
      resp read:   0.0001 secs, 0.0000 secs, 0.0558 secs
    
    Status code distribution:
      [200] 87308 responses
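The figures in the summary can be cross-checked with simple arithmetic: throughput is the number of responses divided by the test duration, the size per request is the total data divided by the number of responses, and for a closed-loop load generator the throughput is roughly the concurrency (-c 20) divided by the average latency. A minimal sketch using only the values printed above:

```shell
# Values copied from the hey summary above.
total_secs=120.0296
responses=87308
total_bytes=1833468
concurrency=20
avg_latency=0.0275

# Throughput = responses / test duration.
rps=$(awk -v n="$responses" -v t="$total_secs" 'BEGIN { printf "%.1f", n / t }')
# Average response size = total bytes / number of responses.
size=$(awk -v b="$total_bytes" -v n="$responses" 'BEGIN { printf "%d", b / n }')
# Rough estimate: concurrency / average latency.
estimate=$(awk -v c="$concurrency" -v l="$avg_latency" 'BEGIN { printf "%.0f", c / l }')

echo "requests/sec: $rps, bytes/response: $size, concurrency/latency estimate: $estimate"
```

The three results (about 727.4 requests per second, 21 bytes per response, and an estimate of about 727) agree with the reported Requests/sec and Size/request values.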
  5. (Optional) Verify that the pod exposes metrics before you query ARMS. The pod exposes metrics on port 8080, but the port is not reachable from outside the cluster. Use port forwarding to access it locally:

    # Get the pod name
    POD_NAME=$(kubectl get po | grep sklearn-iris | awk '{print $1}')
    # Forward port 8080 of the pod to localhost
    kubectl port-forward pod/$POD_NAME 8080:8080

    Expected output:

    Forwarding from 127.0.0.1:8080 -> 8080
    Forwarding from [::1]:8080 -> 8080

    Open a browser and go to http://localhost:8080/metrics to view the raw metrics.

    Expected output:

    # HELP python_gc_objects_collected_total Objects collected during gc
    # TYPE python_gc_objects_collected_total counter
    python_gc_objects_collected_total{generation="0"} 10298.0
    python_gc_objects_collected_total{generation="1"} 1826.0
    python_gc_objects_collected_total{generation="2"} 0.0
    # HELP python_gc_objects_uncollectable_total Uncollectable object found during GC
    # TYPE python_gc_objects_uncollectable_total counter
    python_gc_objects_uncollectable_total{generation="0"} 0.0
    python_gc_objects_uncollectable_total{generation="1"} 0.0
    python_gc_objects_uncollectable_total{generation="2"} 0.0
    # HELP python_gc_collections_total Number of times this generation was collected
    # TYPE python_gc_collections_total counter
    python_gc_collections_total{generation="0"} 660.0
    python_gc_collections_total{generation="1"} 60.0
    python_gc_collections_total{generation="2"} 5.0
    # HELP python_info Python platform information
    # TYPE python_info gauge
    python_info{implementation="CPython",major="3",minor="9",patchlevel="18",version="3.9.18"} 1.0
    # HELP process_virtual_memory_bytes Virtual memory size in bytes.
    # TYPE process_virtual_memory_bytes gauge
    process_virtual_memory_bytes 1.406291968e+09
    # HELP process_resident_memory_bytes Resident memory size in bytes.
    # TYPE process_resident_memory_bytes gauge
    process_resident_memory_bytes 2.73207296e+08
    # HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
    # TYPE process_start_time_seconds gauge
    process_start_time_seconds 1.71533439115e+09
    # HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
    # TYPE process_cpu_seconds_total counter
    process_cpu_seconds_total 228.18
    # HELP process_open_fds Number of open file descriptors.
    # TYPE process_open_fds gauge
    process_open_fds 16.0
    # HELP process_max_fds Maximum number of open file descriptors.
    # TYPE process_max_fds gauge
    process_max_fds 1.048576e+06
    # HELP request_preprocess_seconds pre-process request latency
    # TYPE request_preprocess_seconds histogram
    request_preprocess_seconds_bucket{le="0.005",model_name="sklearn-iris"} 259709.0
    ...
    # HELP request_predict_seconds predict request latency
    # TYPE request_predict_seconds histogram
    request_predict_seconds_bucket{le="0.005",model_name="sklearn-iris"} 259708.0
    ...
    # HELP request_explain_seconds explain request latency
    # TYPE request_explain_seconds histogram

    KServe exposes the following metrics:

    Metric                        Type        Labels       Description
    request_preprocess_seconds    Histogram   model_name   Pre-processing latency per request
    request_predict_seconds       Histogram   model_name   Prediction latency per request
    request_postprocess_seconds   Histogram   model_name   Post-processing latency per request
    request_explain_seconds       Histogram   model_name   Explain latency per request

    The model_name label lets you filter and aggregate metrics by model when multiple models run in the same cluster.
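The raw metrics can also be inspected from a terminal instead of a browser. The sketch below defines a hypothetical helper, filter_model, that drops the # HELP and # TYPE comment lines and keeps only the samples that carry a given model_name label. It runs on a few sample lines copied from the output of this step, so it does not need a cluster.

```shell
# Hypothetical helper: read Prometheus text-format lines on stdin and keep
# only the samples whose model_name label equals $1.
filter_model() {
  grep -v '^#' | grep -F "model_name=\"$1\""
}

# Sample lines copied from the metrics output of this step.
sample='# TYPE request_predict_seconds histogram
request_predict_seconds_bucket{le="0.005",model_name="sklearn-iris"} 259708.0
process_open_fds 16.0'

printf '%s\n' "$sample" | filter_model sklearn-iris

# Against the live endpoint (requires the port-forward to be running):
#   curl -s http://localhost:8080/metrics | filter_model sklearn-iris
```

Only the request_predict_seconds_bucket line is printed, because the comment line and the process_open_fds sample do not carry the model_name label.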

Step 2: Query KServe application metrics

  1. Log on to the ARMS console.

  2. In the left-side navigation pane, click Integration Management, and then click Query Dashboards.

  3. On the Dashboard List page, click the Kubernetes Pod dashboard to open the Grafana page.

  4. In the left-side navigation pane, click Explore. Enter the following query statement to query the metric values of the application:

    request_predict_seconds_bucket

    Note: Data collection lags by about 5 minutes after traffic is generated.
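The bucket series is cumulative, so in practice it is usually wrapped in the PromQL functions rate and histogram_quantile to obtain a latency percentile. The sketch below uses a hypothetical helper, predict_quantile_query, that only prints such an expression. The 0.95 quantile and the 5m window are illustrative choices, not values prescribed by this topic.

```shell
# Hypothetical helper: print a PromQL expression for a prediction latency quantile.
# $1 = quantile (for example 0.95), $2 = model name, $3 = rate window (for example 5m)
predict_quantile_query() {
  printf 'histogram_quantile(%s, sum by (le) (rate(request_predict_seconds_bucket{model_name="%s"}[%s])))\n' "$1" "$2" "$3"
}

# Prints an expression that can be pasted into the Explore query box.
predict_quantile_query 0.95 sklearn-iris 5m
```

Filtering on model_name keeps the result specific to one model, and sum by (le) merges the buckets of all pods that serve it before the quantile is estimated.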

FAQ

Question

How do I confirm that the request_predict_seconds_bucket metric is being collected?

Solution

Check the status of the scrape target in ARMS:

  1. Log on to the ARMS console.

  2. In the left-side navigation pane, click Integration Management. On the Integrated Environments page, click the Container Service tab, and then click the name of your cluster. Click the Self-Monitoring tab.

  3. In the left-side navigation pane, click Targets. If default/sklearn-iris-svcmonitor/0 (1/1 up) is listed, metrics are being collected correctly.

References