Call API operations - Platform For AI - Alibaba Cloud Documentation Center

This topic provides examples of how to make single (online) calls and multiple (offline batch) calls to a judge model.

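Prerequisites

The judge model feature is activated. You can try the feature online.

Obtain the values of the host and token parameters on the judge model page, and derive the endpoint of the judge model from the host value. You can then use the endpoint to call the judge model for evaluation.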

The following table lists the judge model endpoints for different usage scenarios.

Scenario	Feature	BASE_URL/endpoint
Call the judge model by using the SDK for Python		http://ai-service.ce8cc13b6421545749e7b4605f3d02607.cn-hangzhou.alicontainer.com/v1
Call the judge model over HTTP	Chat completions	https://aiservice.cn-hangzhou.aliyuncs.com/v1/chat/completions
	Files	https://aiservice.cn-hangzhou.aliyuncs.com/v1/files
	Batches	https://aiservice.cn-hangzhou.aliyuncs.com/v1/batches
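The three HTTP endpoints in the table share one BASE_URL and differ only in the path suffix. The following minimal sketch derives them from a base URL; `build_endpoints` is a hypothetical helper written for illustration, not part of any SDK.

```python
def build_endpoints(base_url: str) -> dict:
    """Derive the per-feature HTTP endpoints from a BASE_URL that ends in /v1."""
    root = base_url.rstrip("/")  # tolerate a trailing slash in the configured value
    return {
        "chat_completions": f"{root}/chat/completions",
        "files": f"{root}/files",
        "batches": f"{root}/batches",
    }


endpoints = build_endpoints("https://aiservice.cn-hangzhou.aliyuncs.com/v1")
print(endpoints["chat_completions"])
```

Keeping the base URL in one place makes it easier to switch hosts if the value shown on the judge model page differs from the one used here.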

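Supported judge models

The following table lists the supported judge models.

Model name	Description	Context length	Maximum input	Maximum output
pai-judge (Standard Edition)	A cost-effective, small-scale judge model.	32,768	32,768	32,768
pai-judge-plus (Advanced Edition)	A large-scale judge model with superior inference performance.	32,768	32,768	32,768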

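Single-call examples (online)

The judge model supports two built-in modes: single-model evaluation and dual-model competition. If neither mode meets your business requirements, you can use a custom template.

For more information, see "Input parameters" and "Response parameters".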
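In the sample requests in this topic, the two built-in modes differ only in the `mode` value (`single` or `pairwise`) and in the answer keys (`answer`, or `answer1` and `answer2`). The sketch below builds the `content` list for either mode from the number of answers supplied. `build_judge_content` is a hypothetical helper, and it assumes no fields are required beyond those shown in the samples.

```python
def build_judge_content(question, answers):
    """Return the `content` list for a user message in single or pairwise mode."""
    if len(answers) == 1:
        mode = "single"
        payload = {"question": question, "answer": answers[0]}
    elif len(answers) == 2:
        mode = "pairwise"
        payload = {"question": question, "answer1": answers[0], "answer2": answers[1]}
    else:
        raise ValueError("expected one answer (single) or two answers (pairwise)")
    return [{"mode": mode, "type": "json", "json": payload}]


single_content = build_judge_content("What is 2+2?", ["4"])
pairwise_content = build_judge_content("What is 2+2?", ["4", "5"])
print(single_content[0]["mode"], pairwise_content[0]["mode"])
```

The returned list can be used as the `content` value of a `{"role": "user", ...}` message.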

Single-model evaluation

This mode evaluates the quality of an answer from a single large language model (LLM).

Sample request

Python

import os
from openai import OpenAI


def main():
    base_url = "https://aiservice.cn-hangzhou.aliyuncs.com/v1"
    judge_model_token = os.getenv("JUDGE_MODEL_TOKEN")

    client = OpenAI(
        api_key=f'Authorization: Bearer {judge_model_token}',
        base_url=base_url
    )
    completion = client.chat.completions.create(
        model='pai-judge',
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "mode": "single",
                        "type": "json",
                        "json": {
                            "question": "According to the first couplet, give the second couplet. first couplet: To climb the mountain, reach the peak",
                            "answer": "To cross the river, find the creek."
                        }
                    }
                ]
            }
        ]
    )
    print(completion.model_dump())


if __name__ == '__main__':
    main()

curl

$ curl -X POST https://aiservice.cn-hangzhou.aliyuncs.com/v1/chat/completions \
  -H "Authorization: Bearer ${JUDGE_MODEL_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pai-judge",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "mode": "single",
                    "type": "json",
                    "json": {
                        "question": "According to the first couplet, give the second couplet. first couplet: To climb the mountain, reach the peak",
                        "answer": "To cross the river, find the creek."
                    }
                }
            ]
        }
    ]
}'

Sample response

{
    "id": "3b7c3822-1e51-4dc9-b2ad-18b9649a7f19",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null,
            "message": {
                "content": "I think the overall score of the answer is [[2]] due to the following reasons: \nAdvantages of the answer: \n1. Relevance: The answer directly addresses the question. This meets the relevance criteria. [[4]]\n2. Harmlessness: The answer is appropriate and does not contain offensive content. This meets the harmlessness criteria. [[5]]\n\nDisadvantages of the answer: \n1. Accuracy: The answer does not completely conform to the logical sequence of the question because the river and creek do not match the mountain and peak. This reduces the accuracy of the answer. [[2]]\n2. Completeness: The answer does not fully address all aspects of the question because the answer does not provide a complete story or completely align with the question. This affects the completeness of the answer. [[2]]\n3. Source reliability: The answer does not provide source information. Although the source information may not be necessary in some scenarios, the information can enhance the credibility of the answer. [[3]]\n4. Clarity and structure: Although the answer is simple in structure, its clarity and comprehensibility are affected because the answer does not fully correspond to the question. [[3]]\n5. Adaptability to the user level: Although the answer directly addresses the question, the answer may not be completely suitable for users who have a certain understanding of couplets or traditional literature due to inaccuracy. [[3]]\n\n In summary, although the answer performs well in relevance and harmlessness, the answer shows shortcomings in accuracy, completeness, source reliability, clarity and structure, and adaptability to the user level, which results in an overall rating of 2.",
                "role": "assistant",
                "function_call": null,
                "tool_calls": null,
                "refusal": ""
            }
        }
    ],
    "created": 1733260,
    "model": "pai-judge",
    "object": "chat.completion",
    "service_tier": "",
    "system_fingerprint": "",
    "usage": {
        "completion_tokens": 333,
        "prompt_tokens": 790,
        "total_tokens": 1123
    }
}
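The sample response embeds scores in double square brackets, such as `[[2]]`. The sketch below extracts them with a regular expression. It assumes, based only on this one sample, that the first bracketed number is the overall score and that later ones are per-criterion scores; `parse_single_scores` is a hypothetical helper, and real responses may be formatted differently.

```python
import re

# Matches an integer wrapped in double square brackets, e.g. [[4]].
SCORE_PATTERN = re.compile(r"\[\[(\d+)\]\]")


def parse_single_scores(content):
    """Return (overall_score, per_criterion_scores) parsed from the judge text."""
    scores = [int(value) for value in SCORE_PATTERN.findall(content)]
    if not scores:
        return None, []  # no bracketed score found
    return scores[0], scores[1:]


text = "I think the overall score of the answer is [[2]] ... Relevance [[4]] ... Harmlessness [[5]]"
overall, details = parse_single_scores(text)
print(overall, details)
```

With the OpenAI SDK call shown earlier, the text to parse would be `completion.choices[0].message.content`.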

Dual-model competition

This mode compares the quality of the answers that two LLMs give to the same question.

Sample request

Python

import os
from openai import OpenAI


def main():
    base_url = "https://aiservice.cn-hangzhou.aliyuncs.com/v1"
    judge_model_token = os.getenv("JUDGE_MODEL_TOKEN")

    client = OpenAI(
        api_key=f'Authorization: Bearer {judge_model_token}',
        base_url=base_url
    )
    completion = client.chat.completions.create(
        model='pai-judge',
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "mode": "pairwise",
                        "type": "json",
                        "json": {
                            "question": "According to the first couplet, give the second couplet. first couplet: To climb the mountain, reach the peak",
                            "answer1": "To cross the river, find the creek.",
                            "answer2": "To chase the dream, grasp the star."
                        }
                    }
                ]
            }
        ]
    )
    print(completion.model_dump())


if __name__ == '__main__':
    main()

curl

$ curl -X POST https://aiservice.cn-hangzhou.aliyuncs.com/v1/chat/completions \
  -H "Authorization: Bearer ${JUDGE_MODEL_TOKEN}"  \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pai-judge",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "mode": "pairwise",
                    "type": "json",
                    "json": {
                        "question": "According to the first couplet, give the second couplet. first couplet: To climb the mountain, reach the peak",
                        "answer1": "To cross the river, find the creek.",
                        "answer2": "To chase the dream, grasp the star."
                    }
                }
            ]
        }
    ]
}'

Sample response

{
    'id': 'a7026e5a-64c5-4726-9b10-27072ff34d46',
    'choices': [{
        'finish_reason': 'stop',
        'index': 0,
        'logprobs': None,
        'message': {
            'content': '***\n I regard [[the two answers as equivalent]]. The overall score of Answer 1 is [[4]] and the overall score of Answer 2 is [[4]] for the following reasons: \n1. Accuracy: The two answers accurately address the question and do not contain incorrect or misleading information. [[Rating for Answer 1: 5]] [[Rating for Answer 2: 5]]\n2. Relevance: The two answers directly address the question without including unnecessary information or background and completely meet the user requirements. [[Rating for Answer 1: 5]] [[Rating for Answer 2: 5]]\n3. Harmlessness: The two answers do not contain offensive content. The two answers are positive and appropriate in their expression and meet the requirements for appropriateness and cultural sensitivity. [[Rating for Answer 1: 5]] [[Rating for Answer 2: 5]]\n4. Completeness: The two answers completely provide a right couplet to the question without missing key points. [[Rating for Answer 1: 5]] [[Rating for Answer 2: 5]]\n5. Source reliability: Although the two answers do not cite external authoritative sources, the creation and sharing of couplets often do not require external validation in this scenario. In this case, the source reliability can be ignored. [[Rating for Answer 1: 4]] [[Rating for Answer 2: 4]]\n6. Clarity and structure: The two answers are concise, clearly structured, and easy to understand. [[Rating for Answer 1: 5]] [[Rating for Answer 2: 5]]\n7. Timeliness: This criteria is inapplicable in this scenario because couplet culture has a rich history, and both answers conform to traditional expressions. [[Rating for Answer 1: N/A]] [[Rating for Answer 2: N/A]]\n8. Adaptability to the user level: The two answers use simple and understandable language. In this case, both answers are suitable for users of any level. [[Rating for Answer 1: 5]] [[Rating for Answer 2: 5]]\n\n In summary, the two answers perform equally well across all criteria and adequately meet the user requirements. I regard the two answers as equivalent. \n***',
            'role': 'assistant',
            'function_call': None,
            'tool_calls': None,
            'refusal': ''
        }
    }],
    'created': 1734557,
    'model': 'pai-judge',
    'object': 'chat.completion',
    'service_tier': '',
    'system_fingerprint': '',
    'usage': {
        'completion_tokens': 408,
        'prompt_tokens': 821,
        'total_tokens': 1229
    }
}
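The pairwise response tags each criterion with `[[Rating for Answer N: X]]` and states an overall score per answer. The sketch below collects both. It assumes the English wording shown in this sample; the patterns and the `parse_pairwise` helper are illustrative, and a response phrased differently would need adjusted patterns.

```python
import re

# Patterns modelled on the wording of the sample response above.
OVERALL = re.compile(r"overall score of Answer (\d+) is \[\[(\d+)\]\]")
RATING = re.compile(r"\[\[Rating for Answer (\d+): (\d+|N/A)\]\]")


def parse_pairwise(content):
    """Return ({answer: overall_score}, {answer: [ratings]}); N/A becomes None."""
    overall = {int(answer): int(score) for answer, score in OVERALL.findall(content)}
    ratings = {}
    for answer, value in RATING.findall(content):
        parsed = None if value == "N/A" else int(value)
        ratings.setdefault(int(answer), []).append(parsed)
    return overall, ratings


text = (
    "The overall score of Answer 1 is [[4]] and the overall score of Answer 2 is [[4]] "
    "1. Accuracy: [[Rating for Answer 1: 5]] [[Rating for Answer 2: 5]] "
    "7. Timeliness: [[Rating for Answer 1: N/A]] [[Rating for Answer 2: N/A]]"
)
overall_scores, criterion_ratings = parse_pairwise(text)
print(overall_scores, criterion_ratings)
```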

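Custom template

When you call the judge model by using the preceding sample code, the system generates the corresponding prompt template. If that template does not meet your requirements, you can supply a custom evaluation template. In this example, a template for dual-model competition is generated.

Sample request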

Python

import os
from openai import OpenAI


def main():
    base_url = "https://aiservice.cn-hangzhou.aliyuncs.com/v1"
    judge_model_token = os.getenv("JUDGE_MODEL_TOKEN")

    client = OpenAI(
        api_key=f'Authorization: Bearer {judge_model_token}',
        base_url=base_url
    )

    system = "Please evaluate the quality of the following answers to the question from AI assistants as a judge. \n\n" \
        "The following description provides basic character introductions to the AI assistants:\n" \
        "AI assistants do not evaluate, compare, or do anything harmful to people. The AI assistants have a personality that leans towards being independent and autonomous. \n"
    user = \
        "Please score the following answer to the question on a scale of 1 to 5: \n" \
        "Question: What do you think is the effect of social media on relationships? \n" \
        "Answer: Social media allows people to easily keep in contact with each other. However, social media can also lead to alienation. \n" \
        "Scoring criteria: \n" \
        "1: The answer is completely irrelevant, has no content, or is completely incorrect. \n" \
        "2: Specific content of the answer is relevant. However, the content is superficial or excessively brief. \n" \
        "3: The answer is relevant and provides insights. However, the answer lacks in-depth analysis. \n" \
        "4: The answer is relevant, in-depth, and provides clear insights and examples. \n" \
        "5: The answer is very relevant and profound and provides comprehensive insights and rich examples."

    completion = client.chat.completions.create(
        model='pai-judge',
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user}
        ]
    )
    print(completion.model_dump())


if __name__ == '__main__':
    main()

curl

$ curl -X POST https://aiservice.cn-hangzhou.aliyuncs.com/v1/chat/completions \
  -H "Authorization: Bearer ${JUDGE_MODEL_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "pai-judge",
    "messages": [
        {"role": "system", "content": "Please evaluate the quality of the following answer to the question from AI assistants as a judge. \n\n The following section provides basic character introductions to the AI assistants: \n The AI assistants do not evaluate, compare, or do anything harmful to people. The AI assistants have a personality that leans towards being independent and autonomous. \n"},
        {"role": "user", "content": "Please score the following answer to the question on a scale of 1 to 5: \nQuestion: What do you think is the effect of social media on relationships? \nAnswer: Social media allows people to easily keep in contact with each other. However, social media can also lead to alienation. \nScoring criteria: \n1: The answer is completely irrelevant, has no content, or is completely incorrect. \n2: Specific content of the answer is relevant. However, the content is superficial or excessively brief. \n3: The answer is relevant and provides insights. However, the answer lacks in-depth analysis. \n4: The answer is relevant, in-depth, and provides clear insights and examples. \n5: The answer is very relevant and profound and provides comprehensive insights and rich examples."}
    ]
}'

Sample response

{
  "id": "e2f72777-ddf5-4ff8-b7dd-4ecefd6e4014",
  "object": "chat.completion",
  "created": 1153092,
  "model": "pai-judge",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Based on the provided scoring criteria, I rate this answer as 3. The answer \"Social media allows people to easily keep in contact with each other. However, social media can also lead to alienation.\" is explicit in relevance and directly addresses the effect of social media on relationships. The answer mentions two opposite effects: strengthening relationships and alienation, which reveals some insights. However, the answer is relatively brief, does not further develop the specific effect of the two aspects or provide examples to support its views, and lacks in-depth analysis. The answer is scored 3 based on the criteria. The answer is relevant to the question and provides insights. However, the answer lacks in-depth analysis.",
        "refusal": "",
        "function_call": null,
        "tool_calls": null},
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 910,
    "completion_tokens": 411,
    "total_tokens": 1321
  },
  "system_fingerprint": "",
  "service_tier": ""
}
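A custom template is an ordinary pair of system and user messages, so it can be assembled from its parts. The sketch below builds the user prompt in the same layout as the sample request (question, answer, then numbered scoring criteria). `build_custom_messages` and its parameters are hypothetical names chosen for illustration.

```python
def build_custom_messages(system_prompt, question, answer, criteria):
    """Build system/user messages for a custom scoring template.

    `criteria` maps each integer score to its description.
    """
    low, high = min(criteria), max(criteria)
    lines = [
        f"Please score the following answer to the question on a scale of {low} to {high}: ",
        f"Question: {question} ",
        f"Answer: {answer} ",
        "Scoring criteria: ",
    ]
    # List the criteria in ascending score order.
    lines += [f"{score}: {criteria[score]}" for score in sorted(criteria)]
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "\n".join(lines)},
    ]


messages = build_custom_messages(
    "Please evaluate the quality of the following answer as a judge.",
    "What is 2+2?",
    "4",
    {1: "The answer is incorrect.", 5: "The answer is correct."},
)
print(messages[1]["content"])
```

The resulting list can be passed as the `messages` argument of `client.chat.completions.create`, as in the Python sample above.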

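Multiple-call examples (offline)

Step 1: Prepare batch data

The batch file must meet the following requirements:

A single file cannot exceed 10 MB in size. Split a larger file into several smaller files.
The total size of all files uploaded by a single account cannot exceed 100 GB.
The API file for batch processing must be in the .jsonl format.
Each line of the file contains the details of a single API request and must include a body parameter and a unique custom_id value. For information about the supported body parameters, see "Input parameters".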

Sample file format:

{"custom_id": "request-1", "body": {"model": "pai-judge", "messages": [{"role": "user", "content": [{"mode": "single", "type": "json", "json": {"question": "According to the first couplet, give the second couplet. first couplet: To climb the mountain, reach the peak", "answer": "To cross the river, find the creek."}}]}]}}
{"custom_id": "request-2", "body": {"model": "pai-judge-plus", "messages": [{"role": "user", "content": [{"mode": "single", "type": "json", "json": {"question": "According to the first couplet, give the second couplet. first couplet: To climb the mountain, reach the peak", "answer": "To cross the river, find the creek."}}]}]}}
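A batch file in this format can be generated programmatically. The sketch below writes one JSON object per line, rejects duplicate custom_id values, and checks the file size afterwards. `write_batch_file` is a hypothetical helper, and reading the 10 MB limit as 10 × 1024 × 1024 bytes is an assumption.

```python
import json
import os
import tempfile

MAX_FILE_BYTES = 10 * 1024 * 1024  # assumption: the 10 MB limit counted as MiB


def write_batch_file(path, requests, model="pai-judge"):
    """Write (custom_id, question, answer) tuples as single-mode JSONL requests."""
    seen_ids = set()
    with open(path, "w", encoding="utf-8") as handle:
        for custom_id, question, answer in requests:
            if custom_id in seen_ids:
                raise ValueError(f"duplicate custom_id: {custom_id}")
            seen_ids.add(custom_id)
            content = [{"mode": "single", "type": "json",
                        "json": {"question": question, "answer": answer}}]
            body = {"model": model, "messages": [{"role": "user", "content": content}]}
            handle.write(json.dumps({"custom_id": custom_id, "body": body}) + "\n")
    size = os.path.getsize(path)
    if size > MAX_FILE_BYTES:
        raise ValueError(f"{path} is {size} bytes; split it into smaller files")
    return size


demo_path = os.path.join(tempfile.mkdtemp(), "input.jsonl")
demo_size = write_batch_file(demo_path, [
    ("request-1", "What is 2+2?", "4"),
    ("request-2", "What is 3+3?", "6"),
])
print(demo_path, demo_size)
```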

Step 2: Upload the batch data

Upload the batch data to the server through the judge model Files endpoint to obtain a unique file_id value.

Sample request

Python

import os
from openai import OpenAI


def main():
    base_url = "https://aiservice.cn-hangzhou.aliyuncs.com/v1"
    judge_model_token = os.getenv("JUDGE_MODEL_TOKEN")

    client = OpenAI(
        api_key=f'Authorization: Bearer {judge_model_token}',
        base_url=base_url
    )
    upload_files = client.files.create(
        file=open("/home/xxx/input.jsonl", "rb"),
        purpose="batch",
    )
    print(upload_files.model_dump_json(indent=4))


if __name__ == '__main__':
    main()

curl

$ curl -XPOST https://aiservice.cn-hangzhou.aliyuncs.com/v1/files \
  -H "Authorization: Bearer ${JUDGE_MODEL_TOKEN}" \
  -F purpose="batch"  \
  -F file="@/home/xxx/input.jsonl"

Sample response

{
    "id": "file-batch-EC043540BE1C7BE3F9F2F0A8F47D1713",
    "object": "file",
    "bytes": 698,
    "created_at": 1742454203,
    "filename": "input.jsonl",
    "purpose": "batch"
}

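Step 3: Create a batch job

After the file is uploaded, create a batch job based on the file_id.

In this example, the file_id is file-batch-EC043540BE1C7BE3F9F2F0A8F47D1713. The completion window can be set only to 24 hours (24h). After the job is created, a unique batch_id is returned.

Sample request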

Python

import os
from openai import OpenAI


def main():
    base_url = "https://aiservice.cn-hangzhou.aliyuncs.com/v1"
    judge_model_token = os.getenv("JUDGE_MODEL_TOKEN")

    client = OpenAI(
        api_key=f'Authorization: Bearer {judge_model_token}',
        base_url=base_url
    )
    create_batches = client.batches.create(
        endpoint="/v1/chat/completions",
        input_file_id="file-batch-EC043540BE1C7BE3F9F2F0A8F47D1713",
        completion_window="24h",
    )
    print(create_batches.model_dump_json(indent=4))


if __name__ == '__main__':
    main()

curl

$ curl -XPOST https://aiservice.cn-hangzhou.aliyuncs.com/v1/batches \
    -H "Authorization: Bearer ${JUDGE_MODEL_TOKEN}" \
    -H "Content-Type: application/json" \
    -d '{
        "input_file_id": "file-batch-EC043540BE1C7BE3F9F2F0A8F47D1713",
        "endpoint": "/v1/chat/completions",
        "completion_window": "24h"
    }'

Sample response

{
    "id": "batch_66f245a0-88d1-458c-8e1c-a819a5943022",
    "object": "batch",
    "endpoint": "/v1/chat/completions",
    "errors": null,
    "input_file_id": "file-batch-EC043540BE1C7BE3F9F2F0A8F47D1713",
    "completion_window": "24h",
    "status": "Creating",
    "output_file_id": null,
    "error_file_id": null,
    "created_at": 1742455213,
    "in_process_at": null,
    "expires_at": null,
    "FinalizingAt": null,
    "completed_at": null,
    "failed_at": null,
    "expired_at": null,
    "cancelling_at": null,
    "cancelled_at": null,
    "request_counts": {
        "total": 3,
        "completed": 0,
        "failed": 0
    },
    "metadata": null
}

Step 4: Check the job status

Query the execution status of the job based on the batch_id. When the status is Succeeded, the response includes the output_file_id parameter, which specifies the ID of the generated file.

Sample request

Python

import os
from openai import OpenAI


def main():
    base_url = "https://aiservice.cn-hangzhou.aliyuncs.com/v1"
    judge_model_token = os.getenv("JUDGE_MODEL_TOKEN")

    client = OpenAI(
        api_key=f'Authorization: Bearer {judge_model_token}',
        base_url=base_url
    )
    retrieve_batches = client.batches.retrieve(
        batch_id="batch_66f245a0-88d1-458c-8e1c-a819a5943022",
    )
    print(retrieve_batches.model_dump_json(indent=4))


if __name__ == '__main__':
    main()

curl

$ curl -XGET https://aiservice.cn-hangzhou.aliyuncs.com/v1/batches/batch_66f245a0-88d1-458c-8e1c-a819a5943022 \
    -H "Authorization: Bearer ${JUDGE_MODEL_TOKEN}"

Sample response

Description of the batch object

{
    "id": "batch_66f245a0-88d1-458c-8e1c-a819a5943022",
    "object": "batch",
    "endpoint": "/v1/chat/completions",
    "errors": null,
    "input_file_id": "file-batch-EC043540BE1C7BE3F9F2F0A8F47D1713",
    "completion_window": "24h",
    "status": "Succeeded",
    "output_file_id": "file-batch_output-66f245a0-88d1-458c-8e1c-a819a5943022",
    "error_file_id": null,
    "created_at": 1742455213,
    "in_process_at": 1742455640,
    "expires_at": 1742455640,
    "FinalizingAt": 1742455889,
    "completed_at": 1742455889,
    "failed_at": null,
    "expired_at": null,
    "cancelling_at": null,
    "cancelled_at": null,
    "request_counts": {
        "total": 3,
        "completed": 3,
        "failed": 0
    },
    "metadata": null
}
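Because a batch job moves from Creating to Succeeded over time, a client usually polls the status until the job finishes. The sketch below is one way to do that. Only the Creating and Succeeded statuses appear in this topic, so the loop does not recognize failure statuses and instead gives up after a fixed number of polls. `wait_for_batch` and its parameters are hypothetical.

```python
import time


def wait_for_batch(retrieve, batch_id, interval_s=30.0, max_polls=120,
                   done_statuses=("Succeeded",)):
    """Poll `retrieve(batch_id)` until the returned dict reports a done status.

    `retrieve` is any callable that returns the batch object as a dict.
    """
    for _ in range(max_polls):
        batch = retrieve(batch_id)
        if batch["status"] in done_statuses:
            return batch
        time.sleep(interval_s)  # wait before the next status query
    raise TimeoutError(f"batch {batch_id} did not finish after {max_polls} polls")
```

With the OpenAI SDK client from the sample request, the callable could be `lambda bid: client.batches.retrieve(batch_id=bid).model_dump()`, and the returned dict would carry the `output_file_id` needed in the next step.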

Step 5: Retrieve the job results

Query and download the content of the output file based on the output_file_id.

Sample request

Python

import os
from openai import OpenAI


def main():
    base_url = "https://aiservice.cn-hangzhou.aliyuncs.com/v1"
    judge_model_token = os.getenv("JUDGE_MODEL_TOKEN")

    client = OpenAI(
        api_key=f'Authorization: Bearer {judge_model_token}',
        base_url=base_url
    )
    content_files = client.files.content(
        file_id="file-batch_output-66f245a0-88d1-458c-8e1c-a819a5943022",
    )
    print(content_files)


if __name__ == '__main__':
    main()

curl

$ curl -XGET https://aiservice.cn-hangzhou.aliyuncs.com/v1/files/file-batch_output-66f245a0-88d1-458c-8e1c-a819a5943022/content \
    -H "Authorization: Bearer ${JUDGE_MODEL_TOKEN}" > output.jsonl

Sample result

{"id":"dcee3584-6f30-9541-a855-873a6d86b7d9","custom_id":"request-1","response":{"status_code":200,"request_id":"dcee3584-6f30-9541-a855-873a6d86b7d9","body":{"created":1737446797,"usage":{"completion_tokens":7,"prompt_tokens":26,"total_tokens":33},"model":"pai-judge","id":"chatcmpl-dcee3584-6f30-9541-a855-873a6d86b7d9","choices":[{"finish_reason":"stop","index":0,"message":{"content":"2+2 equals 4."}}]},"object":"chat.completion"},"error":null}
{"id":"dcee3584-6f30-9541-a855-873a6d86b7d9","custom_id":"request-2","response":{"status_code":200,"request_id":"dcee3584-6f30-9541-a855-873a6d86b7d9","body":{"created":1737446797,"usage":{"completion_tokens":7,"prompt_tokens":26,"total_tokens":33},"model":"pai-judge-plus","id":"chatcmpl-dcee3584-6f30-9541-a855-873a6d86b7d9","choices":[{"finish_reason":"stop","index":0,"message":{"content":"2+2 equals 4."}}]},"object":"chat.completion"},"error":null}
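Each line of the output file pairs a custom_id with one response, so results can be matched back to the original requests. The sketch below maps each custom_id to the judge text. It assumes every line is valid JSON with the `response.body.choices[0].message.content` layout shown in the sample result; `parse_batch_results` is a hypothetical helper.

```python
import json


def parse_batch_results(lines):
    """Map custom_id to the judge's message content; failed requests map to None."""
    results = {}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        record = json.loads(raw)
        if record.get("error") is not None:
            results[record["custom_id"]] = None
            continue
        choices = record["response"]["body"]["choices"]
        results[record["custom_id"]] = choices[0]["message"]["content"]
    return results
```

For a downloaded file, the function can be fed an open file handle, for example `parse_batch_results(open("output.jsonl", encoding="utf-8"))`.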
