CloudMonitor: Custom assessment tasks

Last Updated: Jan 30, 2026

If the system's built-in evaluators, such as relevance, security, and duplication, do not meet your business needs, you can create a custom evaluator. A custom evaluator uses a prompt that you define to instruct a large language model (LLM) to act as a judge. The LLM then scores the output of your AI application based on the dimensions and standards that you specify.

Prerequisites

An AI application has been created and is reporting observability data.

Procedure

Step 1: Go to the Create Assessment Task page

  1. Log on to the CloudMonitor 2.0 console and select the target workspace.

  2. In the navigation pane on the left, under All Features, select AI Application Observability.

  3. Click Assessment. On the assessment list page, click Create Assessment Task.

Step 2: Configure basic information

In the Basic Information section, configure the following parameters:

• Task Name: Enter a name for the assessment task.

• Data Source: Select the source type for the evaluation data. Only Pipeline is currently supported.

• AI Application: Select the AI application to assess from the drop-down list.

• Time Range: Select the time range for the assessment data.

Step 3: Create a custom evaluator

  1. In the Select Evaluator section, expand the LLM as Judge tab.

  2. Click the Create Custom Evaluator card.

  3. In the configuration window that appears, configure the following parameters:

    • Evaluator Name (required): Enter a name that identifies the custom evaluator in the assessment task. Example: Technical Term Accuracy Assessment.

    • Metric Name (required): Define the metric ID under which the assessment result is displayed in the report. Use letters and underscores. Example: pro_term_accuracy.

    • Evaluation Prompt (optional): Write the judge prompt. This is the core configuration of the custom evaluator. Include the assessment dimensions, scoring criteria, and output requirements (see the prompt sketch after this list).

      • Assessment dimensions: Clearly tell the model what to check.

      • Scoring criteria: Define the scoring range, such as 0.0 to 1.0, and the meaning of each score.

      • Output requirements: Require the model to return JSON output that contains a score and an explanation (the reason for the score).

    • Configure Variable Mapping (optional): Map runtime variables from the application to placeholders in the prompt so that the evaluator can judge actual business data. For the available fields, see Configure variable mapping below.

    • Filter Assessment Data (optional): Use filter statements to define which data enters the assessment flow.

      • Scope: Select the data layer to which the assessment logic applies.

        • Span (default): Assesses a single operation node in the call chain.

        • Trace: Assesses the entire call chain.

        • Session: Assesses the entire session.

      • Filter statement: Use tags such as the service name and properties to precisely target the assessment object. Example: serviceName = "your-service-name".
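
    The following is a minimal sketch of such a judge prompt for the Technical Term Accuracy example, assuming two placeholders named {input} and {output}. It is illustrative only; you bind the placeholder names to span fields in Configure Variable Mapping.

      # Illustrative judge prompt sketch for a "Technical Term Accuracy" evaluator.
      # The {input} and {output} placeholder names are assumptions; bind them to
      # span fields in the Configure Variable Mapping step.
      import textwrap

      EVALUATION_PROMPT = textwrap.dedent("""\
          You are a strict technical reviewer.

          Assessment dimension:
          Check whether the technical terms in the answer are used accurately and
          consistently with the question.

          Scoring criteria (0.0 to 1.0):
          - 1.0: all technical terms are accurate.
          - 0.5: minor terminology errors that do not change the meaning.
          - 0.0: key technical terms are wrong or misleading.

          Question:
          {input}

          Answer:
          {output}

          Output requirements:
          Return a JSON object with two fields:
          - "score": a number between 0.0 and 1.0
          - "explanation": the reason for the score
          """)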

Configure variable mapping

Add mappings that bind fields from the span data to placeholder variables in the prompt. The following fields are available for mapping; a short mapping sketch follows the list.

• attributes.gen_ai.input.messages: Input messages

• attributes.gen_ai.output.messages: Output messages

• attributes.input.value: Input value

• attributes.output.value: Output value

• attributes.gen_ai.response.reasoning_content: Reasoning content

• attributes.retrieval.query: Retrieval query

• attributes.retrieval.document: Retrieval document

• attributes.reranker.input_document: Reranking input document

• attributes.reranker.output_document: Reranking output document

• attributes.gen_ai.tool.call.arguments: Tool call arguments

• attributes.gen_ai.tool.call.result: Tool call result

• attributes.gen_ai.tool.definitions: Tool definitions
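
For example, the two placeholders assumed in the prompt sketch in Step 3 could be bound to the message fields as follows. This is only a sketch; choose the fields that match where your application actually records its inputs and outputs.

  # Illustrative only: binds the assumed {input} and {output} placeholders to
  # span fields from the list above. Use different fields if your application
  # records inputs and outputs elsewhere (for example, attributes.input.value).
  VARIABLE_MAPPING = {
      "input": "attributes.gen_ai.input.messages",    # user question / input messages
      "output": "attributes.gen_ai.output.messages",  # model answer / output messages
  }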

  4. After you complete the configuration, review the filtered data in the Preview and Test area on the right to verify that the configuration is correct.

  5. Click OK to finish creating the custom evaluator.

Step 4: Save and run the assessment task

  1. Confirm that the newly created custom evaluator appears in the evaluator list.

  2. Select other built-in evaluators as needed.

  3. Click Save and Run to start the assessment task.

Preview and Test area description

When you configure a custom evaluator, the Preview and Test area on the right provides the following features:

• Number of data entries: Displays the total number of data entries that match the filter criteria.

• Data navigation: Browse data records by using the Previous and Next buttons.

• Current span information: View the detailed span properties of the currently selected data entry.

• Run test: After you enter the evaluation prompt, run a test to validate the assessment logic.

• Assessment result: View test results in list or JSON format (see the sketch below).
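
For instance, if the evaluation prompt follows the JSON output requirement described in Step 3 and the metric name is pro_term_accuracy, a single test result might resemble the following sketch. The values are illustrative, and the actual fields shown in the console may differ.

  # Illustrative sketch of a parsed test result, assuming the judge prompt
  # requires a JSON object with "score" and "explanation" and the metric name
  # is pro_term_accuracy. Actual console output fields may differ.
  example_result = {
      "metric": "pro_term_accuracy",
      "score": 0.5,
      "explanation": "The answer confuses 'latency' with 'throughput'.",
  }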