AI Application Observability provides a set of built-in evaluation templates for common scenarios. If you need custom evaluation metrics, you can use the real-time computing capabilities of the observability service to call a model directly from a query.
Query built-in evaluation templates
Use the following statement to query all Chinese evaluation templates. You can modify the built-in templates for specific scenarios.
* | select * from "resource.llm_evaluation" where lang='cn'
Access Alibaba Cloud Model Studio
http_call is a scalar function in SQL/Structured Process Language (SPL). It accepts information, such as the body and header, to access external services. You can use this function to access Alibaba Cloud Model Studio and process data.
Syntax
http_call(url, method, headers, params, body, timeout)
Parameters
The Alibaba Cloud observability product charges for token consumption. To use your own account, specify your Model Studio key as a bearer token in the headers parameter.
Parameter | Type | Description |
url | string | The URL to access. Only access to Alibaba Cloud Model Studio is supported. Enter the HTTPS address for Model Studio: |
method | string in JSON format | The HTTP request method. |
headers | string in JSON format | The header for accessing Alibaba Cloud Model Studio. Use your own Model Studio key in the header. |
params | string in JSON format | Used for GET requests. Leave this empty for POST requests. |
body | message in JSON format | The prompt for accessing Alibaba Cloud Model Studio. Specify the model to access in the prompt. |
timeout | number | The timeout period in milliseconds. |
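Because headers and body are passed as JSON strings, it can be easier to generate them with a JSON serializer than to write them by hand. The following Python sketch builds both strings offline. It assumes that Model Studio accepts the key as an `Authorization: Bearer <key>` header; `build_headers`, `build_chat_body`, and the placeholder key are hypothetical names used only for illustration.

```python
import json


def build_headers(api_key: str) -> str:
    """Return the JSON string to pass as the headers argument of http_call."""
    # Assumption: the key is sent as a standard bearer token.
    return json.dumps({
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    })


def build_chat_body(model: str, user_prompt: str,
                    system_prompt: str = "You are a helpful assistant.") -> str:
    """Return the JSON string to pass as the body argument of http_call."""
    return json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }, ensure_ascii=False)


headers = build_headers("sk-your-key")  # placeholder, not a real key
body = build_chat_body("qwen-plus", "Who are you?")
```

The resulting strings can then be supplied as the headers and body arguments of http_call.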
Example
http_call(
'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions',
'POST',
'{ "Content-Type":"application/json"}',
'',
'{
"model": "qwen-plus",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Who are you?"
}
]
}',
60 * 1000
)
Custom evaluation solutions
The following SQL statement summarizes data based on a custom evaluation template. You can specify the evaluation instructions in the SQL statement.
(* and id: 999 and type: dca)| set
session velox_use_io_executor = true;
with t1 as (
select
__time__,
"sentence1" as trans, -- The data to be processed.
'{{query}}' as targets, -- The placeholder in the evaluation template.
'{"model":"<QWEN_MODEL>","input":{"messages":[{"role":"system","content":"<SYSTEM_PROMPT>"},{"role":"user","content":"<USER_PROMPT>"}]}}' as body_template, -- The body template for accessing Alibaba Cloud Model Studio.
cast('Summarize the following content in one sentence: {{query}}' as varchar) as eval_prompt -- The evaluation template, which includes instructions and placeholders.
FROM log
),
t1_1 as (
select
__time__,
body_template,
eval_prompt,
replace(eval_prompt, targets, trans) as eval_content,
trans
FROM t1
),
t2 as (
select
__time__,
body_template,
eval_content,
trans
FROM t1_1
),
t3 as (
select
__time__,
trans as oldeval,
body_template,
replace(
replace(
replace(
replace(
replace(
replace(replace(eval_content, chr(92), '\\'), '"', '\"'),
chr(8),
'\b'
),
chr(12),
'\f'
),
chr(10),
'\n'
),
chr(13),
'\r'
),
chr(9),
'\t'
) as eval,
trans
FROM t2
),
t4 as (
select
__time__,
replace(
replace(
replace(body_template, '<QWEN_MODEL>', 'qwen-turbo'),
'<SYSTEM_PROMPT>',
'You are a helpful assistant.'
),
'<USER_PROMPT>',
eval
) as body,
oldeval,
trans
FROM t3
),
t5 as (
select
__time__,
http_call(
'https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation',
'POST',
'{ "Content-Type":"application/json"}',
'',
body,
60 * 1000
) as response,
body,
oldeval,
trans
FROM t4
),
t6 as (
select
__time__,
oldeval,
body,
response.code,
response.header,
response.body as body_res,
response.error_msg as error,
trans
FROM t5
)
select
__time__,
trans as "Original Text",
replace(
replace(
json_extract_scalar(body_res, '$.output.text'),
'```json',
' '
),
'```',
' '
) as "Summary",
error,
json_extract_scalar(body_res, '$.usage.total_tokens') as "Tokens Consumed"
FROM t6
You can also evaluate data using SPL:
.let t3=.logstore with(query='attributes.gen_ai.span.kind: LLM and attributes.input.value: * and *')
|extend trans = "attributes.input.value",targets = '{{query}}',body_template='{"model":"<QWEN_MODEL>","input":{"messages":[{"role":"system","content":"<SYSTEM_PROMPT>"},{"role":"user","content":"<USER_PROMPT>"}]}}' ,eval_prompt= cast('Summarize the following content in one sentence: {{query}}' as varchar)
|project __time__ ,trans,targets,body_template,eval_prompt;
$t3 |extend eval_content=replace(eval_prompt, targets, trans)
| extend eval = replace(replace(replace(replace(replace(replace(replace(eval_content, chr(92), '\\'), '"', '\"'), chr(8), '\b'), chr(12), '\f'), chr(10), '\n'), chr(13), '\r'), chr(9), '\t')
| extend body = replace(
replace(
replace(body_template, '<QWEN_MODEL>', 'qwen-max'),
'<SYSTEM_PROMPT>',
'You are a helpful assistant.'
),
'<USER_PROMPT>',
eval
)
| extend response = http_call(
'https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation',
'POST',
'{ "Content-Type":"application/json"}',
'',
body,
60 * 1000
)
| extend code=response.code, header=response.header,body_res = response.body,error = response.error_msg
| extend evaluationResult = replace(replace(json_extract_scalar(body_res ,'$.output.text'),'```json',' '),'```',' ') ,evaluationTemplate='mcp_tool_poisoning_attack_cn'
| limit 10000
| where evaluationResult <> 'null'
| project __time__,evaluationResult, evaluationTemplate, error ,body_res
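The nested replace calls in both queries escape the text so that it can be embedded in a JSON string, and the final step strips Markdown code fences from the model output. The Python sketch below mirrors that logic offline so that a template can be checked before it is used in a query. The function names are hypothetical, and the sample response is made up for illustration, shaped only after the `$.output.text` path used in the queries.

```python
import json

BODY_TEMPLATE = (
    '{"model":"<QWEN_MODEL>","input":{"messages":['
    '{"role":"system","content":"<SYSTEM_PROMPT>"},'
    '{"role":"user","content":"<USER_PROMPT>"}]}}'
)
EVAL_PROMPT = "Summarize the following content in one sentence: {{query}}"


def json_escape(text: str) -> str:
    """Escape text the same way as the nested replace() chain in the query."""
    replacements = [
        ("\\", "\\\\"),  # chr(92): the backslash must be escaped first
        ('"', '\\"'),
        ("\b", "\\b"),   # chr(8)
        ("\f", "\\f"),   # chr(12)
        ("\n", "\\n"),   # chr(10)
        ("\r", "\\r"),   # chr(13)
        ("\t", "\\t"),   # chr(9)
    ]
    for raw, escaped in replacements:
        text = text.replace(raw, escaped)
    return text


def build_body(text: str, model: str = "qwen-turbo",
               system_prompt: str = "You are a helpful assistant.") -> str:
    """Fill the placeholders in the same order as the query does."""
    eval_content = EVAL_PROMPT.replace("{{query}}", text)
    return (
        BODY_TEMPLATE
        .replace("<QWEN_MODEL>", model)
        .replace("<SYSTEM_PROMPT>", system_prompt)
        .replace("<USER_PROMPT>", json_escape(eval_content))
    )


def extract_result(response_body: str):
    """Read output.text and replace Markdown code fences with spaces."""
    text = json.loads(response_body).get("output", {}).get("text")
    if text is None:
        return None
    return text.replace("```json", " ").replace("```", " ")


sample = 'He said "hi"\n\tC:\\temp'
body = build_body(sample)
# The body parses as JSON only if the escaping is complete.
parsed = json.loads(body)
```

Escaping the backslash before the other characters matters: if it ran last, it would double the backslashes that the earlier replacements had just inserted.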