Activate and try the service online - Platform for AI

PAI provides an intuitive web interface for model evaluation. You do not need to write code or prepare a dataset: enter a question and a model response, and the judge model generates evaluation results. You can also adjust advanced settings to make the evaluation more precise.

Prerequisites

Activate PAI.

Activate the service

Log on to the PAI console. In the left-side navigation pane, choose Model Application > Model Evaluation (ModelEval).
Click PAI Judge Model, and then click Activate Now. Follow the on-screen instructions to activate the model service.
After activation, go to the Overview page to view the Host and Token access parameters, usage, and other details.

Try it online

Log on to the PAI console. In the left-side navigation pane, choose Model Application > Model Evaluation (ModelEval).

Click PAI Judge Model, and then switch to the Online Demo tab. Configure the following parameters.

Evaluation content

Parameter	Description
Evaluate model	The judge model to use. Supported models: pai-judge: a smaller, more cost-effective model. pai-judge-plus: a larger model with higher output quality.
Evaluation Mode	Supported evaluation modes: Single Model Evaluation: evaluates the response of a single model and generates a score. Dual Model Arena: compares the responses of two models to the same question and determines which one is better.
Evaluation Questions	Enter the questions to evaluate.
Model Answer	Enter the model response to the evaluation question. In Single Model Evaluation mode, enter one model response. In Dual Model Arena mode, enter two model responses.
Standard Answer	Enter a known reference answer. For questions with a definite answer, math problems, and translation tasks, a reference answer can improve evaluation accuracy.
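The relationship between Evaluation Mode and the number of model answers can be expressed as a simple validation rule. The following Python sketch is hypothetical: the class, field, and constant names mirror the console fields above but are not part of any PAI API or SDK, and treating Standard Answer as optional is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical names that mirror the console form; not a PAI API.
SUPPORTED_MODELS = {"pai-judge", "pai-judge-plus"}
SINGLE = "single"  # Single Model Evaluation
ARENA = "arena"    # Dual Model Arena


@dataclass
class EvaluationInput:
    model: str
    mode: str
    question: str
    answers: list[str]
    reference_answer: Optional[str] = None  # Standard Answer (assumed optional)

    def validate(self) -> None:
        """Raise ValueError if the fields do not fit the selected mode."""
        if self.model not in SUPPORTED_MODELS:
            raise ValueError(f"unsupported judge model: {self.model}")
        # Single mode takes one answer; arena mode compares exactly two.
        expected = {SINGLE: 1, ARENA: 2}.get(self.mode)
        if expected is None:
            raise ValueError(f"unknown evaluation mode: {self.mode}")
        if not self.question.strip():
            raise ValueError("question must not be empty")
        if len(self.answers) != expected:
            raise ValueError(
                f"{self.mode} mode needs {expected} answer(s), got {len(self.answers)}"
            )
```

For example, `EvaluationInput("pai-judge", ARENA, "Q", ["a"]).validate()` raises `ValueError`, because Dual Model Arena needs two responses to compare.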

(Optional) Advanced settings

Parameter	Description
Evaluation Scenario
Problem Scenario	Scenarios include text rewriting, role-playing, code generation, and more. Each scenario has its own evaluation criteria, which help the judge model produce more accurate scores. The system automatically assigns each question to a scenario, or you can specify the scenario manually.
Scenario Description	The description of the Problem Scenario.
Evaluation Criteria	The evaluation criteria for the Problem Scenario. You can customize the criteria.
Evaluation Score
Rating Range	Customizes the range of scores that the judge model can assign. Valid values: [2, 10]
Tier Definitions	The meaning of each score value.
Generation Parameters
Temperature	Controls the randomness of generated text. A lower value produces more conservative output, while a higher value produces more diverse output. Valid values: [0, 2)
Top_p	Controls the range of candidate tokens. The model samples the next token from the smallest set of most likely tokens whose cumulative probability reaches the Top_p value. Valid values: [0, 1]
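The Temperature and Top_p settings correspond to the standard temperature scaling and nucleus (top-p) sampling used by most language models. The following Python sketch illustrates that general technique on a made-up three-token vocabulary; it is not PAI's internal implementation, and the function names are hypothetical. The range checks mirror the valid values listed above, and treating a temperature of 0 as greedy decoding is an assumption of this sketch.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities, scaled by temperature.

    Assumption of this sketch: temperature == 0 means greedy decoding
    (all probability on the highest logit), since dividing by 0 is undefined.
    """
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_candidates(probs, top_p):
    """Indices of the smallest set of most likely tokens whose
    cumulative probability reaches top_p (nucleus sampling)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


def sample_next_token(logits, temperature, top_p, rng=random):
    """Sample one token index: temperature scaling, then top-p filtering."""
    if not 0 <= temperature < 2:
        raise ValueError("temperature must be in [0, 2)")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be in [0, 1]")
    probs = softmax_with_temperature(logits, temperature)
    kept = top_p_candidates(probs, top_p)
    # random.choices renormalizes the weights over the kept tokens
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

With logits `[2.0, 1.0, 0.1]` and Top_p 0.9, a temperature of 1.0 keeps two candidate tokens, while a temperature of 1.5 flattens the distribution enough that all three tokens stay in the candidate set. This is why raising either parameter tends to make the output more diverse.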

Click Evaluation. The judge model's output is displayed in real time on the Evaluation results tab. You can give feedback on the results to help improve the judge model.

On the Prompt Preview tab, the parameters you configured are automatically filled into the prompt template. View the complete prompt to understand how the judge model is instructed to evaluate responses.
You can also click Random Example to auto-fill the parameters for a quick trial of the judge model.
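To illustrate what filling parameters into a prompt template means, the following Python sketch substitutes console-style fields into a template with the standard library's `string.Template`. The template text and the `build_prompt` function are hypothetical; the actual prompt used by the judge model is the one shown on the Prompt Preview tab.

```python
from string import Template

# Hypothetical template for illustration only; the real PAI prompt is shown
# on the Prompt Preview tab and will differ.
JUDGE_TEMPLATE = Template(
    "You are evaluating a response in the $scenario scenario.\n"
    "Scenario description: $scenario_description\n"
    "Evaluation criteria: $criteria\n\n"
    "[Question]\n$question\n\n"
    "[Reference answer]\n$reference\n\n"
    "[Model answer]\n$answer\n"
)


def build_prompt(question, answer, scenario, scenario_description, criteria,
                 reference=None):
    """Fill the console fields into the hypothetical judge template."""
    return JUDGE_TEMPLATE.substitute(
        question=question,
        answer=answer,
        scenario=scenario,
        scenario_description=scenario_description,
        criteria=criteria,
        # Standard Answer may be left empty in the console.
        reference=reference if reference is not None else "(not provided)",
    )
```

`Template.substitute` raises `KeyError` if a placeholder has no value, which catches a missing field early. Values that themselves contain `$` are inserted literally and are not expanded again.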