Others

[Model Studio] Upgrade of Model Observation

Model Observation is a built-in feature of Model Studio that lets you view call logs, token consumption, and performance metrics, including token latency, call duration, RPM (requests per minute), TPM (tokens per minute), and failure rate.

Changes

This upgrade includes:

  • In single-model monitoring, call statistics are now grouped into time buckets and cover the number of calls, number of failures, failure rate, and token usage. You can set the time granularity to hourly or daily.
  • Optimized charts for Basic and Advanced Monitoring:
    • Added failure information, including key error types such as 4xx, 5xx, rate limiting, and content moderation. Note: Only Advanced Monitoring displays failure details.
    • Improved the usage metric chart so that it displays total, input, output, and cached tokens simultaneously, without manual switching.
    • Added charts that display and compare average input and output tokens.
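
To make the time-based bucketing described above concrete, here is a minimal Python sketch of how call records could be grouped into hourly or daily buckets and reduced to the listed statistics. It is an illustration only: the `CallRecord` fields, the `aggregate` function, and the sample data are hypothetical and do not reflect the Model Studio log schema or any Model Studio API.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CallRecord:
    # Hypothetical record layout; not the actual Model Studio log schema.
    timestamp: datetime
    succeeded: bool
    input_tokens: int
    output_tokens: int


def bucket_start(ts: datetime, granularity: str) -> datetime:
    """Truncate a timestamp to the start of its hourly or daily bucket."""
    if granularity == "hourly":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "daily":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


def aggregate(records, granularity="hourly"):
    """Group records by time bucket and compute per-bucket statistics."""
    buckets = defaultdict(
        lambda: {"calls": 0, "failures": 0, "input_tokens": 0, "output_tokens": 0}
    )
    for r in records:
        b = buckets[bucket_start(r.timestamp, granularity)]
        b["calls"] += 1
        b["failures"] += 0 if r.succeeded else 1
        b["input_tokens"] += r.input_tokens
        b["output_tokens"] += r.output_tokens

    result = {}
    for start in sorted(buckets):
        b = buckets[start]
        result[start] = {
            **b,
            "total_tokens": b["input_tokens"] + b["output_tokens"],
            "failure_rate": b["failures"] / b["calls"],
            "avg_input_tokens": b["input_tokens"] / b["calls"],
            "avg_output_tokens": b["output_tokens"] / b["calls"],
        }
    return result


# Made-up sample data for illustration.
records = [
    CallRecord(datetime(2025, 8, 22, 10, 5), True, 100, 50),
    CallRecord(datetime(2025, 8, 22, 10, 40), False, 80, 0),
    CallRecord(datetime(2025, 8, 22, 11, 15), True, 120, 60),
]

hourly = aggregate(records, "hourly")
daily = aggregate(records, "daily")

for start, stats in hourly.items():
    print(start.isoformat(), stats)
```

With the sample data, the hourly view yields two buckets (10:00 and 11:00) and the daily view collapses all three calls into one bucket, which mirrors the effect of switching the granularity setting.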

Impact

  • After the feature goes live on August 22, 2025, go to Model Studio Console > Model Observation to use the upgraded capabilities.

Documentation