Alibaba Cloud Service Mesh (ASM) enhances the standard HTTP protocol to better support Large Language Model (LLM) requests, providing a simple and efficient way to manage LLM access. You can use ASM to implement canary access, weighted routing, and a variety of observability capabilities. This topic describes how to configure and use LLM traffic routing.
Prerequisites
- You have added a cluster to the ASM instance, and the instance version is v1.21.6.88 or later.
- You have configured a sidecar injection policy.
- You have activated Alibaba Cloud Model Studio and obtained a valid API key. See Obtain an API Key.
- You have activated the Moonshot AI API service and obtained a valid API key. See Moonshot AI Open Platform.
Setup
Step 1: Create the sleep test application
- Create a file named sleep.yaml with the following content.
- Run the following command to create the sleep application.

  kubectl apply -f sleep.yaml
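The contents of sleep.yaml are not reproduced above. A minimal sketch, modeled on the widely used Istio sleep sample, might look like the following; the image and labels here are assumptions, and any image that includes curl works:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleep
  template:
    metadata:
      labels:
        app: sleep   # the sidecar injection policy must cover this workload
    spec:
      containers:
      - name: sleep
        image: curlimages/curl   # assumption: any curl-capable image
        command: ["/bin/sleep", "infinity"]
```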
Step 2: Create an LLMProvider for Model Studio
- Create a file named LLMProvider.yaml with the following content.

  apiVersion: istio.alibabacloud.com/v1beta1
  kind: LLMProvider
  metadata:
    name: dashscope-qwen
  spec:
    host: dashscope.aliyuncs.com
    path: /compatible-mode/v1/chat/completions
    configs:
      defaultConfig:
        openAIConfig:
          model: qwen1.5-72b-chat # An open-source model from the Qwen series.
          apiKey: ${YOUR_DASHSCOPE_API_KEY}

  For more information about the open-source models, see Text Generation-Qwen-Open-source Version.
- Run the following command to create the LLMProvider.

  kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMProvider.yaml

- Run the following command to test the configuration.

  kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' --header 'Content-Type: application/json' --data '{
      "messages": [
          {"role": "user", "content": "Please introduce yourself."}
      ]
  }'

  Expected output:
  {"choices":[{"message":{"role":"assistant","content":"Hello! I am Qwen, a pre-trained language model developed by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1720680044,"system_fingerprint":null,"model":"qwen1.5-72b-chat","id":"chatcmpl-3608dcd5-e3ad-9ade-bc70-xxxxxxxxxxxxxx"}

After the LLMProvider is created, you can send plain HTTP requests to dashscope.aliyuncs.com from the sleep pod. The ASM sidecar automatically intercepts each request, converts it into the OpenAI-compatible LLM format, adds the API key, upgrades the connection to HTTPS, and then forwards it to the external LLM provider's server. Alibaba Cloud Model Studio is compatible with the OpenAI LLM protocol.
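Conceptually, the rewrite the sidecar applies to an intercepted request resembles the sketch below. This is an illustration of the behavior described above, not ASM's actual implementation; the function name `rewrite_llm_request` and the dictionary shapes are assumptions for illustration only.

```python
def rewrite_llm_request(body: dict, provider: dict) -> dict:
    """Turn a plain chat request into an OpenAI-compatible HTTPS request
    using the settings from an LLMProvider-style config."""
    return {
        # The plaintext connection is upgraded to HTTPS toward the provider.
        "url": f"https://{provider['host']}{provider['path']}",
        "headers": {
            "Content-Type": "application/json",
            # The API key from the provider config is injected here.
            "Authorization": f"Bearer {provider['apiKey']}",
        },
        "body": {
            # The model comes from the LLMProvider config, not the caller.
            "model": provider["model"],
            "messages": body["messages"],
        },
    }

provider = {
    "host": "dashscope.aliyuncs.com",
    "path": "/compatible-mode/v1/chat/completions",
    "model": "qwen1.5-72b-chat",
    "apiKey": "sk-example",  # placeholder, not a real key
}
req = rewrite_llm_request(
    {"messages": [{"role": "user", "content": "Please introduce yourself."}]},
    provider,
)
print(req["url"])
# https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
```

This is why the curl command above can omit the model name and API key: the caller sends a bare message list over plain HTTP, and the sidecar supplies everything else.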
Scenarios
Scenario 1: Route users to different models
- Create a file named LLMRoute.yaml with the following content.

  apiVersion: istio.alibabacloud.com/v1beta1
  kind: LLMRoute
  metadata:
    name: dashscope-route
  spec:
    host: dashscope.aliyuncs.com # This value must be unique among providers.
    rules:
    - name: vip-route
      matches:
      - headers:
          user-type:
            exact: subscriber # Routing rule for subscribers. The specific configuration is provided in the provider.
      backendRefs:
      - providerHost: dashscope.aliyuncs.com
    - backendRefs:
      - providerHost: dashscope.aliyuncs.com

  This configuration routes requests that carry the user-type: subscriber header to the vip-route routing rule.

- Run the following command to create the LLMRoute.

  kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMRoute.yaml
- Update the LLMProvider.yaml file with the following route-level configuration:

  apiVersion: istio.alibabacloud.com/v1beta1
  kind: LLMProvider
  metadata:
    name: dashscope-qwen
  spec:
    host: dashscope.aliyuncs.com
    path: /compatible-mode/v1/chat/completions
    configs:
      defaultConfig:
        openAIConfig:
          model: qwen1.5-72b-chat # Use the open-source model by default.
          apiKey: ${YOUR_DASHSCOPE_API_KEY}
      routeSpecificConfigs:
        vip-route: # Specific configuration for subscribers.
          openAIConfig:
            model: qwen-turbo # Subscribers use the qwen-turbo model.
            apiKey: ${YOUR_DASHSCOPE_API_KEY}

  Run the following command to apply the update to the LLMProvider.

  kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMProvider.yaml
- Run the following commands to test the routing.

  kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' --header 'Content-Type: application/json' --data '{
      "messages": [
          {"role": "user", "content": "Please introduce yourself."}
      ]
  }'

  kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' --header 'Content-Type: application/json' --header 'user-type: subscriber' --data '{
      "messages": [
          {"role": "user", "content": "Please introduce yourself."}
      ]
  }'

  Expected output:

  {"choices":[{"message":{"role":"assistant","content":"Hello! I am Qwen, a pre-trained language model developed by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1720680044,"system_fingerprint":null,"model":"qwen1.5-72b-chat","id":"chatcmpl-1c33b950-3220-9bfe-9066-xxxxxxxxxxxx"}

  {"choices":[{"message":{"role":"assistant","content":"Hello, I'm Qwen, a large language model from Alibaba Cloud. As an AI assistant, my goal is to help users get accurate and useful information, and to solve their problems and confusions. I can provide knowledge in various fields, engage in conversation, and even create text. Please note that all the content I provide is based on the data I was trained on and may not include the latest events or personal information. If you have any questions, feel free to ask me at any time!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":11,"completion_tokens":85,"total_tokens":96},"created":1720683416,"system_fingerprint":null,"model":"qwen-turbo","id":"chatcmpl-9cbc7c56-06e9-9639-a50d-xxxxxxxxxxxx"}

  The output shows that requests from subscribers are routed to the qwen-turbo model.
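The selection logic in scenario 1 can be sketched as follows. This is an illustrative model of how a header match picks a rule and how routeSpecificConfigs overrides the default model; it is not ASM internals, and the names `ROUTE_RULES`, `CONFIGS`, and `select_model` are made up for this sketch.

```python
# Rules are checked in order; an empty match acts as the catch-all rule,
# mirroring the second, match-less rule in the LLMRoute above.
ROUTE_RULES = [
    {"name": "vip-route", "match": {"user-type": "subscriber"}},
    {"name": "default", "match": {}},
]

# Per-route model overrides, mirroring routeSpecificConfigs in the provider.
CONFIGS = {
    "default": {"model": "qwen1.5-72b-chat"},
    "vip-route": {"model": "qwen-turbo"},
}

def select_model(headers: dict) -> str:
    """Return the model a request with these headers would be served by."""
    for rule in ROUTE_RULES:
        if all(headers.get(k) == v for k, v in rule["match"].items()):
            return CONFIGS.get(rule["name"], CONFIGS["default"])["model"]
    return CONFIGS["default"]["model"]

print(select_model({"user-type": "subscriber"}))  # qwen-turbo
print(select_model({}))                           # qwen1.5-72b-chat
```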
Scenario 2: Weight-based routing across providers
- Create a file named LLMProvider-moonshot.yaml with the following content.

  apiVersion: istio.alibabacloud.com/v1beta1
  kind: LLMProvider
  metadata:
    name: moonshot
  spec:
    host: api.moonshot.cn # This value must be unique among providers.
    path: /v1/chat/completions
    configs:
      defaultConfig:
        openAIConfig:
          model: moonshot-v1-8k
          stream: false
          apiKey: ${YOUR_MOONSHOT_API_KEY}

- Run the following command to create the LLMProvider for Moonshot.

  kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMProvider-moonshot.yaml
Buat file bernama demo-llm-server.yaml dengan konten berikut.
apiVersion: v1 kind: Service metadata: name: demo-llm-server namespace: default spec: ports: - name: http port: 80 protocol: TCP targetPort: 80 selector: app: none type: ClusterIP -
Jalankan perintah berikut untuk membuat layanan demo-llm-server.
kubectl apply -f demo-llm-server.yaml -
Perbarui file LLMRoute.yaml dengan konten berikut.
apiVersion: istio.alibabacloud.com/v1beta1 kind: LLMRoute metadata: name: demo-llm-server namespace: default spec: host: demo-llm-server rules: - backendRefs: - providerHost: dashscope.aliyuncs.com weight: 50 - providerHost: api.moonshot.cn weight: 50 name: migrate-rule -
Jalankan perintah berikut untuk memperbarui aturan routing LLMRoute.
kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMRoute.yaml -
Jalankan perintah berikut beberapa kali.
kubectl exec deployment/sleep -it -- curl --location 'http://demo-llm-server' --header 'Content-Type: application/json' --data '{ "messages": [ {"role": "user", "content": "Please introduce yourself."} ] }'Output yang diharapkan:
{"id":"cmpl-cafd47b181204cdbb4a4xxxxxxxxxxxx","object":"chat.completion","created":1720687132,"model":"moonshot-v1-8k","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! I am an AI language model named Kimi. My main function is to help people generate human-like text. I can write articles, answer questions, provide advice, and more. I am trained on a massive amount of text data, so I can generate a wide variety of text. My goal is to help people communicate more effectively and solve problems."},"finish_reason":"stop"}],"usage":{"prompt_tokens":11,"completion_tokens":59,"total_tokens":70}} {"choices":[{"message":{"role":"assistant","content":"Hello! I am Qwen, a pre-trained language model developed by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1720687164,"system_fingerprint":null,"model":"qwen1.5-72b-chat","id":"chatcmpl-2443772b-4e41-9ea8-9bed-xxxxxxxxxxxx"}Output menunjukkan bahwa permintaan didistribusikan secara merata antara Moonshot dan Alibaba Cloud Model Studio.
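Weight-based selection like the 50/50 split above can be sketched with the standard library's weighted random choice. This illustrates the expected traffic split only; ASM's actual load-balancing algorithm may differ, and `BACKENDS` and `pick_backend` are names invented for this sketch.

```python
import random

# Backends and weights mirroring the migrate-rule in the LLMRoute above.
BACKENDS = [
    ("dashscope.aliyuncs.com", 50),
    ("api.moonshot.cn", 50),
]

def pick_backend(rng: random.Random) -> str:
    """Pick one provider host, with probability proportional to its weight."""
    hosts = [host for host, _ in BACKENDS]
    weights = [weight for _, weight in BACKENDS]
    return rng.choices(hosts, weights=weights, k=1)[0]

# Over many requests, each provider should receive roughly half the traffic.
rng = random.Random(0)  # seeded for reproducibility
picks = [pick_backend(rng) for _ in range(1000)]
share = picks.count("dashscope.aliyuncs.com") / len(picks)
print(round(share, 2))  # close to 0.5
```

Changing the weights (for example, 90/10 during a gradual migration) shifts the split accordingly without any change to the calling application, which keeps addressing http://demo-llm-server.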