Platform for AI (PAI) provides a user-friendly web interface that lets beginners try the judge model service without programming or preparing datasets. Simply enter a question and model answers to generate an evaluation result with a few clicks. The playground also supports advanced configurations for more accurate evaluations.
Prerequisites
Activate the service
Log on to the PAI console. On the Judge Model page, click Activate Now and follow the instructions in the console to activate the model service.
After you activate the model service, you can view the Host and Token information and call statistics on the Overview tab.
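If you later want to call the activated service from code, the Host and Token shown on the Overview tab serve as the endpoint and the credential. The following is a minimal sketch assuming a simple HTTP POST interface; the endpoint path, header format, and payload fields are illustrative assumptions, not the documented API contract, so check the service's API reference before use.

```python
import requests

HOST = "https://<your-host>"   # Host value from the Overview tab
TOKEN = "<your-token>"         # Token value from the Overview tab

# NOTE: the path "/evaluate" and the JSON fields below are hypothetical;
# they illustrate the call pattern, not the documented API schema.
response = requests.post(
    f"{HOST}/evaluate",
    headers={"Authorization": TOKEN},
    json={
        "model": "pai-judge",
        "question": "What is 2 + 2?",
        "answers": ["4"],
    },
    timeout=30,
)
response.raise_for_status()
print(response.json())
```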
Experience the service online
Log on to the PAI console. On the Judge Model page, switch to the Playground tab and configure the parameters described in the following tables.
Evaluation content

| Parameter | Description |
| --- | --- |
| Judge Model | The following two models are supported:<br>- pai-judge: a small model that offers better cost-effectiveness.<br>- pai-judge-plus: a large model that delivers better inference results. |
| Evaluation Mode | Select Single-answer Grading or Dual-model Competition. |
| Question | The question to be evaluated. |
| Model Response | The answers that models provide for the question.<br>- Single-answer Grading: enter the answer provided by one model.<br>- Dual-model Competition: enter the answers provided by two models. |
| Reference Answer | The known reference answer. For scenarios such as deterministic questions, mathematical problems, and translations, a reference answer can improve evaluation accuracy. |
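To make the two evaluation modes concrete, the sketch below shows how the parameters above might map onto request payloads. This is an illustration only: the field names (`mode`, `question`, `answers`, `reference_answer`) are assumptions, not the documented request schema.

```python
# Single-answer Grading: grade one model's answer, optionally
# against a known reference answer.
single_answer_grading = {
    "model": "pai-judge",                 # or "pai-judge-plus"
    "mode": "single_answer_grading",      # assumed field name and value
    "question": "Translate 'bonjour' into English.",
    "answers": ["Hello."],
    "reference_answer": "Hello.",         # helps on deterministic tasks
}

# Dual-model Competition: compare the answers of two models
# to the same question.
dual_model_competition = {
    "model": "pai-judge-plus",
    "mode": "dual_model_competition",     # assumed field name and value
    "question": "Summarize Hamlet in one sentence.",
    "answers": [
        "A Danish prince avenges his father's murder at great cost.",
        "Hamlet is a famous play.",
    ],
}
```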
(Optional) Advanced configurations
| Category | Parameter | Description |
| --- | --- | --- |
| Evaluation Scenario | Question Scenario | The scenario is automatically detected based on your question. You can also specify it manually. Scenarios include text rewriting, role assumption, code generation, code modification, code analysis, and others. Each scenario has its own evaluation criteria, which helps the judge model score more accurately. |
| Evaluation Scenario | Scenario Description | The description of the scenario. |
| Evaluation Scenario | Evaluation Criteria | The evaluation criteria. You can configure custom criteria. |
| Evaluation Score | Score Range | The score range that the judge model uses. Valid values: [2,10]. |
| Evaluation Score | Score Description | The meaning of each score in the range. |
| Generation Parameters | Temperature | Controls the randomness of the generated text. Smaller values make the output more deterministic; larger values make it more diverse. Valid values: [0,2). |
| Generation Parameters | Top_p | Controls the selection range of candidate words. The model randomly selects the next word from the smallest set of words whose cumulative probability reaches the Top_p value. Valid values: [0,1]. |
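The Top_p behavior described above is commonly known as nucleus sampling: candidates are ranked by probability, and the model samples only from the smallest set whose cumulative probability reaches Top_p. The following self-contained sketch illustrates the idea; it is a conceptual illustration, not PAI code.

```python
import random

def nucleus_sample(probs: dict[str, float], top_p: float) -> str:
    """Sample the next word from the smallest set of candidates
    whose cumulative probability reaches top_p (nucleus sampling)."""
    # Rank candidates from most to least likely.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for word, p in ranked:
        nucleus.append((word, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize within the nucleus and draw one word.
    total = sum(p for _, p in nucleus)
    return random.choices(
        [w for w, _ in nucleus],
        weights=[p / total for _, p in nucleus],
    )[0]

# Example: with top_p=0.8, only "yes" and "no" can be sampled;
# the low-probability tail ("maybe") is cut off.
print(nucleus_sample({"yes": 0.6, "no": 0.3, "maybe": 0.1}, top_p=0.8))
```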
Click Evaluate. The evaluation result of the judge model is generated in streaming mode on the Evaluation Result tab. You can provide feedback on the result to help us improve the judge model.
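The playground streams the result as it is generated. If the programmatic service behaves the same way (an assumption, since this section only covers the web console), a client could consume the stream roughly as follows; the endpoint, payload, and chunk format are again illustrative.

```python
import requests

# Hypothetical streaming request; path, payload, and chunk format are
# assumptions for illustration, not the documented API.
with requests.post(
    "https://<your-host>/evaluate",
    headers={"Authorization": "<your-token>"},
    json={"model": "pai-judge", "question": "What is 2 + 2?", "answers": ["4"]},
    stream=True,
    timeout=60,
) as response:
    response.raise_for_status()
    for line in response.iter_lines(decode_unicode=True):
        if line:
            print(line)  # each line is one streamed fragment of the result
```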
On the Prompt Preview tab, the parameters you configure are automatically inserted into the prompt template. You can view the complete prompt to better understand how the judge model works.
You can also click Fill In Random Example to automatically populate the parameters and quickly experience the capabilities of the judge model.