Alibaba Cloud Service Mesh (ASM) enhances the standard HTTP protocol to better support Large Language Model (LLM) requests, providing a simple and efficient way to manage LLM access. You can use ASM to implement canary access, weighted routing, and a variety of observability capabilities. This topic describes how to configure and use LLM traffic routing.
Prerequisites
- You have added a cluster to the ASM instance, and the instance version is v1.21.6.88 or later.
- You have configured a sidecar injection policy.
- You have activated Alibaba Cloud Model Studio and obtained a valid API key. See Obtain an API Key.
- You have activated the Moonshot AI API service and obtained a valid API key. See Moonshot AI Open Platform.
Setup
Step 1: Create the sleep test application
- Create a file named sleep.yaml with the following content.
- Run the following command to create the sleep application.

  kubectl apply -f sleep.yaml
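The contents of sleep.yaml are not reproduced above. A minimal sketch, modeled on the widely used Istio sleep sample, might look like the following; the image and labels here are assumptions, and any image that includes curl works:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleep
  template:
    metadata:
      labels:
        app: sleep   # the sidecar injection policy must cover this workload
    spec:
      containers:
      - name: sleep
        image: curlimages/curl   # assumption: any curl-capable image
        command: ["/bin/sleep", "infinity"]
```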
Step 2: Create an LLMProvider for Model Studio
- Create a file named LLMProvider.yaml with the following content.

  apiVersion: istio.alibabacloud.com/v1beta1
  kind: LLMProvider
  metadata:
    name: dashscope-qwen
  spec:
    host: dashscope.aliyuncs.com
    path: /compatible-mode/v1/chat/completions
    configs:
      defaultConfig:
        openAIConfig:
          model: qwen1.5-72b-chat # An open-source model from the Qwen series.
          apiKey: ${YOUR_DASHSCOPE_API_KEY}

  For more information about the open-source models, see Text Generation-Qwen-Open-source Version.
- Run the following command to create the LLMProvider.

  kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMProvider.yaml

- Run the following command to test the configuration.

  kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' --header 'Content-Type: application/json' --data '{
      "messages": [
          {"role": "user", "content": "Please introduce yourself."}
      ]
  }'

  Expected output:
  {"choices":[{"message":{"role":"assistant","content":"Hello! I am Qwen, a pre-trained language model developed by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1720680044,"system_fingerprint":null,"model":"qwen1.5-72b-chat","id":"chatcmpl-3608dcd5-e3ad-9ade-bc70-xxxxxxxxxxxxxx"}

After the LLMProvider is created, you can send plain HTTP requests to dashscope.aliyuncs.com from the sleep pod. The ASM sidecar automatically intercepts each request, converts it into the OpenAI-compatible LLM format, adds the API key, upgrades the connection to HTTPS, and then forwards it to the external LLM provider's server. Alibaba Cloud Model Studio is compatible with the OpenAI LLM protocol.
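Conceptually, the rewrite the sidecar applies to an intercepted request resembles the sketch below. This is an illustration of the behavior described above, not ASM's actual implementation; the function name `rewrite_llm_request` and the dictionary shapes are assumptions for illustration only.

```python
def rewrite_llm_request(body: dict, provider: dict) -> dict:
    """Turn a plain chat request into an OpenAI-compatible HTTPS request
    using the settings from an LLMProvider-style config."""
    return {
        # The plaintext connection is upgraded to HTTPS toward the provider.
        "url": f"https://{provider['host']}{provider['path']}",
        "headers": {
            "Content-Type": "application/json",
            # The API key from the provider config is injected here.
            "Authorization": f"Bearer {provider['apiKey']}",
        },
        "body": {
            # The model comes from the LLMProvider config, not the caller.
            "model": provider["model"],
            "messages": body["messages"],
        },
    }

provider = {
    "host": "dashscope.aliyuncs.com",
    "path": "/compatible-mode/v1/chat/completions",
    "model": "qwen1.5-72b-chat",
    "apiKey": "sk-example",  # placeholder, not a real key
}
req = rewrite_llm_request(
    {"messages": [{"role": "user", "content": "Please introduce yourself."}]},
    provider,
)
print(req["url"])
# https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
```

This is why the curl command above can omit the model name and API key: the caller sends a bare message list over plain HTTP, and the sidecar supplies everything else.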
Scenarios
Scenario 1: Route users to different models
- Create a file named LLMRoute.yaml with the following content.

  apiVersion: istio.alibabacloud.com/v1beta1
  kind: LLMRoute
  metadata:
    name: dashscope-route
  spec:
    host: dashscope.aliyuncs.com # This value must be unique among providers.
    rules:
    - name: vip-route
      matches:
      - headers:
          user-type:
            exact: subscriber # Routing rule for subscribers. The specific configuration is provided in the provider.
      backendRefs:
      - providerHost: dashscope.aliyuncs.com
    - backendRefs:
      - providerHost: dashscope.aliyuncs.com

  This configuration routes requests that carry the user-type: subscriber header to the vip-route routing rule.

- Run the following command to create the LLMRoute.

  kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMRoute.yaml
- Update the LLMProvider.yaml file with the following route-level configuration:

  apiVersion: istio.alibabacloud.com/v1beta1
  kind: LLMProvider
  metadata:
    name: dashscope-qwen
  spec:
    host: dashscope.aliyuncs.com
    path: /compatible-mode/v1/chat/completions
    configs:
      defaultConfig:
        openAIConfig:
          model: qwen1.5-72b-chat # Use the open-source model by default.
          apiKey: ${YOUR_DASHSCOPE_API_KEY}
      routeSpecificConfigs:
        vip-route: # Specific configuration for subscribers.
          openAIConfig:
            model: qwen-turbo # Subscribers use the qwen-turbo model.
            apiKey: ${YOUR_DASHSCOPE_API_KEY}

  Run the following command to apply the update to the LLMProvider.

  kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMProvider.yaml
- Run the following commands to test the routing.

  kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' --header 'Content-Type: application/json' --data '{
      "messages": [
          {"role": "user", "content": "Please introduce yourself."}
      ]
  }'

  kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' --header 'Content-Type: application/json' --header 'user-type: subscriber' --data '{
      "messages": [
          {"role": "user", "content": "Please introduce yourself."}
      ]
  }'

  Expected output:

  {"choices":[{"message":{"role":"assistant","content":"Hello! I am Qwen, a pre-trained language model developed by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1720680044,"system_fingerprint":null,"model":"qwen1.5-72b-chat","id":"chatcmpl-1c33b950-3220-9bfe-9066-xxxxxxxxxxxx"}

  {"choices":[{"message":{"role":"assistant","content":"Hello, I'm Qwen, a large language model from Alibaba Cloud. As an AI assistant, my goal is to help users get accurate and useful information, and to solve their problems and confusions. I can provide knowledge in various fields, engage in conversation, and even create text. Please note that all the content I provide is based on the data I was trained on and may not include the latest events or personal information. If you have any questions, feel free to ask me at any time!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":11,"completion_tokens":85,"total_tokens":96},"created":1720683416,"system_fingerprint":null,"model":"qwen-turbo","id":"chatcmpl-9cbc7c56-06e9-9639-a50d-xxxxxxxxxxxx"}

  The output shows that requests from subscribers are routed to the qwen-turbo model.
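The selection logic in scenario 1 can be sketched as follows. This is an illustrative model of how a header match picks a rule and how routeSpecificConfigs overrides the default model; it is not ASM internals, and the names `ROUTE_RULES`, `CONFIGS`, and `select_model` are made up for this sketch.

```python
# Rules are checked in order; an empty match acts as the catch-all rule,
# mirroring the second, match-less rule in the LLMRoute above.
ROUTE_RULES = [
    {"name": "vip-route", "match": {"user-type": "subscriber"}},
    {"name": "default", "match": {}},
]

# Per-route model overrides, mirroring routeSpecificConfigs in the provider.
CONFIGS = {
    "default": {"model": "qwen1.5-72b-chat"},
    "vip-route": {"model": "qwen-turbo"},
}

def select_model(headers: dict) -> str:
    """Return the model a request with these headers would be served by."""
    for rule in ROUTE_RULES:
        if all(headers.get(k) == v for k, v in rule["match"].items()):
            return CONFIGS.get(rule["name"], CONFIGS["default"])["model"]
    return CONFIGS["default"]["model"]

print(select_model({"user-type": "subscriber"}))  # qwen-turbo
print(select_model({}))                           # qwen1.5-72b-chat
```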
Scenario 2: Weight-based routing across providers
- Create a file named LLMProvider-moonshot.yaml with the following content.

  apiVersion: istio.alibabacloud.com/v1beta1
  kind: LLMProvider
  metadata:
    name: moonshot
  spec:
    host: api.moonshot.cn # This value must be unique among providers.
    path: /v1/chat/completions
    configs:
      defaultConfig:
        openAIConfig:
          model: moonshot-v1-8k
          stream: false
          apiKey: ${YOUR_MOONSHOT_API_KEY}

- Run the following command to create the LLMProvider for Moonshot.

  kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMProvider-moonshot.yaml
Buat file bernama demo-llm-server.yaml dengan konten berikut.
apiVersion: v1 kind: Service metadata: name: demo-llm-server namespace: default spec: ports: - name: http port: 80 protocol: TCP targetPort: 80 selector: app: none type: ClusterIP -
Jalankan perintah berikut untuk membuat layanan demo-llm-server.
kubectl apply -f demo-llm-server.yaml -
Perbarui file LLMRoute.yaml dengan konten berikut.
apiVersion: istio.alibabacloud.com/v1beta1 kind: LLMRoute metadata: name: demo-llm-server namespace: default spec: host: demo-llm-server rules: - backendRefs: - providerHost: dashscope.aliyuncs.com weight: 50 - providerHost: api.moonshot.cn weight: 50 name: migrate-rule -
Jalankan perintah berikut untuk memperbarui aturan routing LLMRoute.
kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMRoute.yaml -
Jalankan perintah berikut beberapa kali.
kubectl exec deployment/sleep -it -- curl --location 'http://demo-llm-server' --header 'Content-Type: application/json' --data '{ "messages": [ {"role": "user", "content": "Please introduce yourself."} ] }'Output yang diharapkan:
{"id":"cmpl-cafd47b181204cdbb4a4xxxxxxxxxxxx","object":"chat.completion","created":1720687132,"model":"moonshot-v1-8k","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! I am an AI language model named Kimi. My main function is to help people generate human-like text. I can write articles, answer questions, provide advice, and more. I am trained on a massive amount of text data, so I can generate a wide variety of text. My goal is to help people communicate more effectively and solve problems."},"finish_reason":"stop"}],"usage":{"prompt_tokens":11,"completion_tokens":59,"total_tokens":70}} {"choices":[{"message":{"role":"assistant","content":"Hello! I am Qwen, a pre-trained language model developed by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1720687164,"system_fingerprint":null,"model":"qwen1.5-72b-chat","id":"chatcmpl-2443772b-4e41-9ea8-9bed-xxxxxxxxxxxx"}Output menunjukkan bahwa permintaan didistribusikan secara merata antara Moonshot dan Alibaba Cloud Model Studio.
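Weight-based selection like the 50/50 split above can be sketched with the standard library's weighted random choice. This illustrates the expected traffic split only; ASM's actual load-balancing algorithm may differ, and `BACKENDS` and `pick_backend` are names invented for this sketch.

```python
import random

# Backends and weights mirroring the migrate-rule in the LLMRoute above.
BACKENDS = [
    ("dashscope.aliyuncs.com", 50),
    ("api.moonshot.cn", 50),
]

def pick_backend(rng: random.Random) -> str:
    """Pick one provider host, with probability proportional to its weight."""
    hosts = [host for host, _ in BACKENDS]
    weights = [weight for _, weight in BACKENDS]
    return rng.choices(hosts, weights=weights, k=1)[0]

# Over many requests, each provider should receive roughly half the traffic.
rng = random.Random(0)  # seeded for reproducibility
picks = [pick_backend(rng) for _ in range(1000)]
share = picks.count("dashscope.aliyuncs.com") / len(picks)
print(round(share, 2))  # close to 0.5
```

Changing the weights (for example, 90/10 during a gradual migration) shifts the split accordingly without any change to the calling application, which keeps addressing http://demo-llm-server.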