Traffic routing: Manage LLM traffic by using ASM - Alibaba Cloud Service Mesh

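Alibaba Cloud Service Mesh (ASM) extends the standard HTTP protocol to better support large language model (LLM) requests, which gives you a simple and efficient way to manage LLM access. With ASM, you can implement canary access, weighted routing, and a range of observability features. This topic describes how to configure and use LLM traffic routing.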

Prerequisites

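A cluster is added to an ASM instance of version v1.21.6.88 or later.
A sidecar injection policy is configured so that the Pods of the sleep test application receive a sidecar proxy.
Alibaba Cloud Model Studio is activated and you have a valid API key. For more information, see "Obtain an API key".
The Moonshot AI API service is activated and you have a valid API key. For more information, see "Moonshot AI Open Platform".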
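If you automate the version check in the first prerequisite, compare the version components numerically. As plain strings, a hypothetical "v1.21.10.1" would sort before "v1.21.6.88" because the character '1' sorts before '6'. The following Python sketch assumes the version string has the form shown above (an optional leading v followed by dot-separated integers); the function names are hypothetical.

```python
def parse_asm_version(version: str) -> tuple[int, ...]:
    """Turn 'v1.21.6.88' into (1, 21, 6, 88) for numeric comparison."""
    return tuple(int(part) for part in version.lstrip("vV").split("."))


MIN_ASM_VERSION = "v1.21.6.88"  # minimum version stated in the prerequisites


def meets_minimum(version: str, minimum: str = MIN_ASM_VERSION) -> bool:
    # Tuples compare element by element, so (1, 21, 10, 1) > (1, 21, 6, 88).
    return parse_asm_version(version) >= parse_asm_version(minimum)


print(meets_minimum("v1.21.6.88"))   # True
print(meets_minimum("v1.21.10.1"))   # True (a string comparison would say False)
print(meets_minimum("v1.20.9.100"))  # False
```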

Setup

Step 1: Create the sleep test application

Create a file named sleep.yaml with the following content.

```
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sleep
---
apiVersion: v1
kind: Service
metadata:
  name: sleep
  labels:
    app: sleep
    service: sleep
spec:
  ports:
  - port: 80
    name: http
  selector:
    app: sleep
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleep
  template:
    metadata:
      labels:
        app: sleep
    spec:
      terminationGracePeriodSeconds: 0
      serviceAccountName: sleep
      containers:
      - name: sleep
        image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/curl:asm-sleep
        command: ["/bin/sleep", "infinity"]
        imagePullPolicy: IfNotPresent
        volumeMounts:
        - mountPath: /etc/sleep/tls
          name: secret-volume
      volumes:
      - name: secret-volume
        secret:
          secretName: sleep-secret
          optional: true
---
```

Run the following command against the cluster to create the sleep application.
```
kubectl apply -f sleep.yaml
```

Step 2: Create an LLMProvider for Model Studio

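Create a file named LLMProvider.yaml with the following content. Replace ${YOUR_DASHSCOPE_API_KEY} with your Model Studio API key; kubectl does not expand environment variables in manifest files.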

```
apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMProvider
metadata:
  name: dashscope-qwen
spec:
  host: dashscope.aliyuncs.com
  path: /compatible-mode/v1/chat/completions
  configs:
    defaultConfig:
      openAIConfig:
        model: qwen1.5-72b-chat  # Qwen open-source series model
        apiKey: ${YOUR_DASHSCOPE_API_KEY}
```

For more information about the open-source models, see "Text generation - Qwen - Open-source versions".

Run the following command with the kubeconfig of the ASM instance to create the LLMProvider.

```
kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMProvider.yaml
```

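Run the following command to test the configuration. The request is sent over plain HTTP and contains neither a model name nor an API key.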

```
kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' --header 'Content-Type: application/json' --data '{
    "messages": [
        {"role": "user", "content": "Please introduce yourself."}
    ]
}'
```

Expected output:

```
{"choices":[{"message":{"role":"assistant","content":"Hello! I am Qwen, a pre-trained language model developed by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1720680044,"system_fingerprint":null,"model":"qwen1.5-72b-chat","id":"chatcmpl-3608dcd5-e3ad-9ade-bc70-xxxxxxxxxxxxxx"}
```

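After the LLMProvider is created, the sleep Pod can send plain HTTP requests to dashscope.aliyuncs.com. The ASM sidecar automatically intercepts each request, converts it to the OpenAI-compatible LLM format (for example, it fills in the model configured in the LLMProvider), adds the API key, upgrades the connection to HTTPS, and forwards the request to the configured path on the external LLM provider's server. This works for Model Studio because Alibaba Cloud Model Studio is compatible with the OpenAI LLM protocol.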
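The rewrite described above can be modeled as a function from the client's plain request to the upstream URL, headers, and body. The following Python sketch is an illustration of that data flow, not ASM's implementation. It assumes that the configured model is always written into the body and that the API key is sent as an OpenAI-style `Authorization: Bearer` header; `Provider`, `OpenAIConfig`, `rewrite_request`, and the key `sk-example` are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class OpenAIConfig:
    model: str
    api_key: str


@dataclass
class Provider:
    host: str  # spec.host of the LLMProvider
    path: str  # spec.path of the LLMProvider
    default_config: OpenAIConfig


def rewrite_request(provider: Provider, body: str) -> tuple[str, dict, str]:
    """Hypothetical model of the sidecar rewrite; not ASM's actual code."""
    payload = json.loads(body)
    # Assumption: the model from the LLMProvider config is always applied.
    payload["model"] = provider.default_config.model
    # Assumption: OpenAI-style bearer-token authentication upstream.
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {provider.default_config.api_key}",
    }
    # Plain HTTP to the host becomes HTTPS to host + configured path.
    url = f"https://{provider.host}{provider.path}"
    return url, headers, json.dumps(payload, ensure_ascii=False)


dashscope = Provider(
    host="dashscope.aliyuncs.com",
    path="/compatible-mode/v1/chat/completions",
    default_config=OpenAIConfig(model="qwen1.5-72b-chat", api_key="sk-example"),
)
client_body = '{"messages": [{"role": "user", "content": "Please introduce yourself."}]}'
url, headers, body = rewrite_request(dashscope, client_body)
print(url)                        # https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
print(json.loads(body)["model"])  # qwen1.5-72b-chat
```

The point of the sketch is that the client never handles the model name, the API key, or TLS; all three come from the LLMProvider resource.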

Scenarios

Scenario 1: Route different users to different models

Create a file named LLMRoute.yaml with the following content.

```
apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMRoute
metadata:
  name: dashscope-route
spec:
  host: dashscope.aliyuncs.com # This must be unique across providers.
  rules:
  - name: vip-route
    matches:
    - headers:
        user-type:
          exact: subscriber  # Routing rule for subscribers; the provider supplies a route-specific configuration.
    backendRefs:
    - providerHost: dashscope.aliyuncs.com
  - backendRefs:  # Unnamed rule without matches: handles all other requests.
    - providerHost: dashscope.aliyuncs.com
```

With this configuration, requests that carry the header user-type: subscriber match the vip-route rule. All other requests fall through to the second, unnamed rule.
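The matching behavior can be sketched in Python to make the rule semantics concrete. This is an illustrative model, not ASM code: it assumes that rules are evaluated in the order listed and the first match wins, that header names compare case-insensitively as in HTTP, and that each rule has at most one match with exact header values. `Rule` and `select_rule` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rule:
    name: Optional[str]  # rules[].name; the fallback rule has no name
    exact_headers: dict = field(default_factory=dict)  # matches[].headers.<h>.exact


# Mirrors the two rules of the dashscope-route LLMRoute above.
RULES = [
    Rule(name="vip-route", exact_headers={"user-type": "subscriber"}),
    Rule(name=None),  # no conditions, so it matches every request
]


def select_rule(headers: dict, rules: list = RULES) -> Rule:
    # Assumption: ordered evaluation, first matching rule wins.
    lowered = {k.lower(): v for k, v in headers.items()}
    for rule in rules:
        if all(lowered.get(h.lower()) == v for h, v in rule.exact_headers.items()):
            return rule
    raise LookupError("no rule matched")


print(select_rule({"user-type": "subscriber"}).name)  # vip-route
print(select_rule({"user-type": "free"}).name)        # None
```

An empty condition set makes `all()` return True, which is why the unnamed rule acts as a catch-all; placing it first would shadow vip-route under the first-match assumption.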

Run the following command to create the LLMRoute.

```
kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMRoute.yaml
```

Update the LLMProvider.yaml file to add the following route-specific configuration for vip-route.

```
apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMProvider
metadata:
  name: dashscope-qwen
spec:
  host: dashscope.aliyuncs.com
  path: /compatible-mode/v1/chat/completions
  configs:
    defaultConfig:
      openAIConfig:
        model: qwen1.5-72b-chat  # Use the open-source model by default.
        apiKey: ${YOUR_DASHSCOPE_API_KEY}
    routeSpecificConfigs:
      vip-route:  # Configuration specific to subscribers.
        openAIConfig:
          model: qwen-turbo  # Subscribers use the qwen-turbo model.
          apiKey: ${YOUR_DASHSCOPE_API_KEY}
```
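The choice between defaultConfig and routeSpecificConfigs can be modeled as a lookup keyed by the name of the matched rule. The Python sketch below assumes that a route-specific entry replaces the default configuration for that route as a whole; the YAML repeats apiKey under vip-route, which is consistent with that reading, but the source does not state it. API keys are omitted, and `resolve_model` is a hypothetical name.

```python
from typing import Optional

# Mirrors spec.configs of the LLMProvider above (API keys omitted).
PROVIDER_CONFIGS = {
    "defaultConfig": {"openAIConfig": {"model": "qwen1.5-72b-chat"}},
    "routeSpecificConfigs": {
        "vip-route": {"openAIConfig": {"model": "qwen-turbo"}},
    },
}


def resolve_model(route_name: Optional[str], configs: dict = PROVIDER_CONFIGS) -> str:
    # Assumption: a route-specific entry fully replaces the default for that route;
    # any other route name (or an unnamed rule) falls back to defaultConfig.
    route_configs = configs.get("routeSpecificConfigs", {})
    chosen = route_configs.get(route_name, configs["defaultConfig"])
    return chosen["openAIConfig"]["model"]


print(resolve_model("vip-route"))  # qwen-turbo
print(resolve_model(None))         # qwen1.5-72b-chat
```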

Run the following command to apply the update to the LLMProvider.

```
kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMProvider.yaml
```

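Run the following commands to test the routing. The first request has no user-type header; the second sets user-type: subscriber.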

```
kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' --header 'Content-Type: application/json' --data '{
    "messages": [
        {"role": "user", "content": "Please introduce yourself."}
    ]
}'

kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' --header 'Content-Type: application/json' --header 'user-type: subscriber' --data '{
    "messages": [
        {"role": "user", "content": "Please introduce yourself."}
    ]
}'
```

Expected output:

```
{"choices":[{"message":{"role":"assistant","content":"Hello! I am Qwen, a pre-trained language model developed by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1720680044,"system_fingerprint":null,"model":"qwen1.5-72b-chat","id":"chatcmpl-1c33b950-3220-9bfe-9066-xxxxxxxxxxxx"}

{"choices":[{"message":{"role":"assistant","content":"Hello, I'm Qwen, a large language model from Alibaba Cloud. As an AI assistant, my goal is to help users get accurate and useful information, and to solve their problems and confusions. I can provide knowledge in various fields, engage in conversation, and even create text. Please note that all the content I provide is based on the data I was trained on and may not include the latest events or personal information. If you have any questions, feel free to ask me at any time!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":11,"completion_tokens":85,"total_tokens":96},"created":1720683416,"system_fingerprint":null,"model":"qwen-turbo","id":"chatcmpl-9cbc7c56-06e9-9639-a50d-xxxxxxxxxxxx"}
```

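The first response comes from qwen1.5-72b-chat, the model in defaultConfig. The second response comes from qwen-turbo, which shows that the request with the user-type: subscriber header matched vip-route and was served with its route-specific configuration.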
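When you check the results with a script rather than by eye, the top-level model field of each chat completion is the value to compare. The following minimal Python sketch uses abbreviated copies of the two responses above that keep only the object and model fields; `served_model` is a hypothetical helper name.

```python
import json


def served_model(raw_response: str) -> str:
    """Return the model name reported in an OpenAI-style chat completion."""
    return json.loads(raw_response)["model"]


# Abbreviated versions of the two responses above (other fields omitted).
default_reply = '{"object": "chat.completion", "model": "qwen1.5-72b-chat"}'
subscriber_reply = '{"object": "chat.completion", "model": "qwen-turbo"}'

assert served_model(default_reply) == "qwen1.5-72b-chat"
assert served_model(subscriber_reply) == "qwen-turbo"
print("routing verified")
```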

Scenario 2: Weighted routing across providers

Create a file named LLMProvider-moonshot.yaml with the following content. Replace ${YOUR_MOONSHOT_API_KEY} with your Moonshot AI API key.

```
apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMProvider
metadata:
  name: moonshot
spec:
  host: api.moonshot.cn # This must be unique across providers.
  path: /v1/chat/completions
  configs:
    defaultConfig:
      openAIConfig:
        model: moonshot-v1-8k
        stream: false
        apiKey: ${YOUR_MOONSHOT_API_KEY}
```

Run the following command to create the LLMProvider for Moonshot.
```
kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMProvider-moonshot.yaml
```

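Create a file named demo-llm-server.yaml with the following content. This Service is not meant to select any Pods (its selector is app: none); it provides the host name demo-llm-server that the LLMRoute in the next step targets.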

```
apiVersion: v1
kind: Service
metadata:
  name: demo-llm-server
  namespace: default
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: none
  type: ClusterIP
```

Run the following command against the cluster to create the demo-llm-server Service.
```
kubectl apply -f demo-llm-server.yaml
```

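Replace the contents of LLMRoute.yaml with the following. Because metadata.name is different, applying this file creates a new LLMRoute named demo-llm-server; the dashscope-route resource from Scenario 1 remains in place.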

```
apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMRoute
metadata:
  name: demo-llm-server
  namespace: default
spec:
  host: demo-llm-server
  rules:
  - backendRefs:
    - providerHost: dashscope.aliyuncs.com
      weight: 50
    - providerHost: api.moonshot.cn
      weight: 50
    name: migrate-rule
```

Run the following command to apply the LLMRoute.
```
kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMRoute.yaml
```

Run the following command several times.

```
kubectl exec deployment/sleep -it -- curl --location 'http://demo-llm-server' --header 'Content-Type: application/json' --data '{
    "messages": [
        {"role": "user", "content": "Please introduce yourself."}
    ]
}'
```

Sample output from two of the requests:

```
{"id":"cmpl-cafd47b181204cdbb4a4xxxxxxxxxxxx","object":"chat.completion","created":1720687132,"model":"moonshot-v1-8k","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! I am an AI language model named Kimi. My main function is to help people generate human-like text. I can write articles, answer questions, provide advice, and more. I am trained on a massive amount of text data, so I can generate a wide variety of text. My goal is to help people communicate more effectively and solve problems."},"finish_reason":"stop"}],"usage":{"prompt_tokens":11,"completion_tokens":59,"total_tokens":70}}

{"choices":[{"message":{"role":"assistant","content":"Hello! I am Qwen, a pre-trained language model developed by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1720687164,"system_fingerprint":null,"model":"qwen1.5-72b-chat","id":"chatcmpl-2443772b-4e41-9ea8-9bed-xxxxxxxxxxxx"}
```

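The output shows that requests to demo-llm-server are distributed between Moonshot and Alibaba Cloud Model Studio. With weights of 50 and 50, each provider should receive roughly half of the requests when you send many of them.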
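To show what weights of 50 and 50 mean over many requests, the following Python sketch simulates the migrate-rule above. It assumes that each request independently picks a backend with probability weight / (sum of weights); the source does not state which selection algorithm ASM actually uses, so treat this as a model of the expected proportions only. `pick_backend` and `simulate` are hypothetical names.

```python
import random
from collections import Counter

# backendRefs of the demo-llm-server LLMRoute: (providerHost, weight)
BACKENDS = [("dashscope.aliyuncs.com", 50), ("api.moonshot.cn", 50)]

# Default model of each LLMProvider, as configured earlier in this topic.
DEFAULT_MODEL = {
    "dashscope.aliyuncs.com": "qwen1.5-72b-chat",
    "api.moonshot.cn": "moonshot-v1-8k",
}


def pick_backend(rng: random.Random, backends: list = BACKENDS) -> str:
    # Assumption: independent weighted-random choice for every request.
    hosts = [host for host, _ in backends]
    weights = [weight for _, weight in backends]
    return rng.choices(hosts, weights=weights, k=1)[0]


def simulate(n: int, seed: int = 7) -> Counter:
    # A fixed seed makes the simulation repeatable.
    rng = random.Random(seed)
    return Counter(DEFAULT_MODEL[pick_backend(rng)] for _ in range(n))


counts = simulate(1000)
for model, count in counts.most_common():
    print(f"{model}: {count / 1000:.1%}")
```

Changing the weights in BACKENDS, for example to 90 and 10 for a gradual migration, shifts the simulated shares accordingly; individual short runs can still deviate noticeably from the configured ratio.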