Built on HTTP, Alibaba Cloud Service Mesh (ASM) provides enhanced support for LLM request protocols. It now supports the protocol standards of common LLM providers, which keeps integration simple and efficient. With ASM, you can implement canary access, proportional routing, and a range of observability features for LLMs. This topic describes how to manage LLM traffic in ASM from the perspective of traffic routing.
Prerequisites
A cluster is added to an ASM instance of version 1.21.6.88 or later.
Alibaba Cloud Model Studio is activated and a valid API_KEY has been obtained. For more information, see Obtain an API Key.
The Moonshot API service is activated and a valid API_KEY has been obtained. For more information, see Moonshot AI Open Platform.
Preparations
Step 1: Create a test application named sleep
Create a file named sleep.yaml with the following content.
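The content of sleep.yaml is not reproduced in this topic. As a stand-in, the following is a minimal sketch that writes a hypothetical sleep.yaml modeled on the sleep sample that ships with Istio. The image, the ServiceAccount, and the labels are assumptions, not the original file. The only things the later steps rely on are a Deployment named sleep whose container provides curl, and a namespace with sidecar injection enabled.

```shell
# Hypothetical sleep.yaml (assumption: modeled on the Istio sleep sample,
# not the file from the original topic).
cat > sleep.yaml <<'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sleep
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sleep
spec:
  replicas: 1
  selector:
    matchLabels:
      app: sleep
  template:
    metadata:
      labels:
        app: sleep
    spec:
      serviceAccountName: sleep
      containers:
      - name: sleep
        image: curlimages/curl   # assumption: any image that provides curl works
        command: ["/bin/sleep", "infinity"]
EOF
```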
Run the following command to create the sleep application.
kubectl apply -f sleep.yaml
Step 2: Create an LLMProvider for Alibaba Cloud Model Studio
Create a file named LLMProvider.yaml with the following content.
apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMProvider
metadata:
  name: dashscope-qwen
spec:
  host: dashscope.aliyuncs.com
  path: /compatible-mode/v1/chat/completions
  configs:
    defaultConfig:
      openAIConfig:
        model: qwen1.5-72b-chat  # Qwen open-source LLM
        apiKey: ${dashscope API_KEY}
For other open-source models, see Text Generation-Qwen-Open-source Version.
Run the following command to create the LLMProvider.
kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMProvider.yaml
Run the following command to test the LLMProvider.
kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {"role": "user", "content": "Please introduce yourself."}
    ]
}'
Expected output:
{"choices":[{"message":{"role":"assistant","content":"Hello! I am Qwen, a pre-trained language model developed by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1735021898,"system_fingerprint":null,"model":"qwen1.5-72b-chat","id":"chatcmpl-3608dcd5-e3ad-9ade-bc70-xxxxxxxxxxxxxx"}
After the LLMProvider is created, you can access dashscope.aliyuncs.com directly over HTTP from the pod in which the sleep application runs. The ASM sidecar automatically converts the request into the format required by the OpenAI LLM protocol (Alibaba Cloud Model Studio is compatible with this protocol), adds the API key to the request, upgrades HTTP to HTTPS, and then sends the request to the LLM provider's server outside the cluster.
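To make the sidecar's role concrete, the sketch below builds what the upstream request would presumably look like if a client sent it without the mesh. It uses the HTTPS endpoint formed from the host and path in the LLMProvider, takes the model name from defaultConfig, and sends the API key as a bearer token, as in the OpenAI protocol. The build_body helper and the DASHSCOPE_API_KEY variable are hypothetical names introduced for illustration. The exact transformation that ASM applies is not documented here, so treat this as an approximation.

```shell
#!/bin/sh
# Values taken from the LLMProvider above.
MODEL="qwen1.5-72b-chat"
URL="https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"

# build_body MODEL PROMPT: print an OpenAI-style chat completion body.
# Assumption: the prompt contains no characters that need JSON escaping.
build_body() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}' "$1" "$2"
}

BODY=$(build_body "$MODEL" "Please introduce yourself.")
echo "$BODY"

# Call the provider only when a key is available (hypothetical variable name).
if [ -n "${DASHSCOPE_API_KEY:-}" ]; then
  curl --silent --location "$URL" \
    --header "Authorization: Bearer ${DASHSCOPE_API_KEY}" \
    --header 'Content-Type: application/json' \
    --data "$BODY"
fi
```

Compared with the in-mesh test command, the client here has to know the endpoint path, the model name, and the key. With the LLMProvider in place, the application only needs the host name and the messages.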
Procedures in different scenarios
Scenario 1: Create an LLMRoute to use different models for different types of users
Create a file named LLMRoute.yaml with the following content.
apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMRoute
metadata:
  name: dashscope-route
spec:
  host: dashscope.aliyuncs.com  # Different providers cannot share the same host.
  rules:
  - name: vip-route
    matches:
    - headers:
        user-type:
          exact: subscriber  # Dedicated route for subscribers; its route-specific configuration is added to the provider later.
    backendRefs:
    - providerHost: dashscope.aliyuncs.com
  - backendRefs:
    - providerHost: dashscope.aliyuncs.com
With this configuration, requests that carry the user-type: subscriber header follow the vip-route routing rule.
Run the following command to create the LLMRoute.
kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMRoute.yaml
Update the LLMProvider.yaml file with the following content to add route-level configuration.
apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMProvider
metadata:
  name: dashscope-qwen
spec:
  host: dashscope.aliyuncs.com
  path: /compatible-mode/v1/chat/completions
  configs:
    defaultConfig:
      openAIConfig:
        model: qwen1.5-72b-chat  # The open-source model is used by default.
        apiKey: ${dashscope API_KEY}
    routeSpecificConfigs:
      vip-route:  # Dedicated route for subscribers.
        openAIConfig:
          model: qwen-turbo  # Subscribers use the qwen-turbo model.
          apiKey: ${dashscope API_KEY}
Run the following command to update the LLMProvider.
kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMProvider.yaml
Run the following commands to test the configuration:
kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {"role": "user", "content": "Please introduce yourself."}
    ]
}'
kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
--header 'Content-Type: application/json' \
--header 'user-type: subscriber' \
--data '{
    "messages": [
        {"role": "user", "content": "Please introduce yourself."}
    ]
}'
Expected output:
{"choices":[{"message":{"role":"assistant","content":"I am a pre-trained language model developed by Alibaba Cloud. I am Qwen. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1735021898,"system_fingerprint":null,"model":"qwen1.5-72b-chat","id":"chatcmpl-3608dcd5-e3ad-9ade-bc70-06aed84b6715"}
{"choices":[{"message":{"role":"assistant","content":"Hello! I am Qwen, a pre-trained language model developed by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1735021898,"system_fingerprint":null,"model":"qwen-turbo","id":"chatcmpl-3608dcd5-e3ad-9ade-bc70-06aed84b6715"}
The model field of the second response shows that the qwen-turbo model is used for subscribers, while requests without the header still use qwen1.5-72b-chat.
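The full responses are long, and the only field that distinguishes the two routes is model. As a rough sketch, the helper below prints just that field for each request type. The function names model_of and ask are hypothetical. The -it flags are dropped because the output is piped rather than read in a terminal.

```shell
#!/bin/sh
# Print the value of the "model" field found in a chat completion response on stdin.
model_of() {
  sed -n 's/.*"model"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Send the test prompt from the sleep pod; "$@" carries optional extra curl flags.
ask() {
  kubectl exec deployment/sleep -- curl --silent --location 'http://dashscope.aliyuncs.com' \
    --header 'Content-Type: application/json' "$@" \
    --data '{"messages": [{"role": "user", "content": "Please introduce yourself."}]}'
}

# Run the comparison only where kubectl is installed.
if command -v kubectl >/dev/null 2>&1; then
  echo "default:    $(ask | model_of)"
  echo "subscriber: $(ask --header 'user-type: subscriber' | model_of)"
fi
```

If the route and provider are applied as described, the first line should name the default model and the second the model configured under vip-route.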
Scenario 2: Configure LLMProvider and LLMRoute to distribute traffic proportionally
Create a file named LLMProvider-moonshot.yaml with the following content.
apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMProvider
metadata:
  name: moonshot
spec:
  host: api.moonshot.cn  # Different providers cannot share the same host.
  path: /v1/chat/completions
  configs:
    defaultConfig:
      openAIConfig:
        model: moonshot-v1-8k
        stream: false
        apiKey: ${Moonshot API_KEY}
Run the following command to create the LLMProvider for Moonshot.
kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMProvider-moonshot.yaml
Create a file named demo-llm-server.yaml with the following content.
apiVersion: v1
kind: Service
metadata:
  name: demo-llm-server
  namespace: default
spec:
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: none
  type: ClusterIP
Run the following command to create the demo-llm-server service.
kubectl apply -f demo-llm-server.yaml
Update the LLMRoute.yaml file with the following content.
apiVersion: istio.alibabacloud.com/v1beta1
kind: LLMRoute
metadata:
  name: demo-llm-server
  namespace: default
spec:
  host: demo-llm-server
  rules:
  - backendRefs:
    - providerHost: dashscope.aliyuncs.com
      weight: 50
    - providerHost: api.moonshot.cn
      weight: 50
    name: migrate-rule
Run the following command to update the LLMRoute routing rule.
kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMRoute.yaml
Run the following test several times.
kubectl exec deployment/sleep -it -- curl --location 'http://demo-llm-server' \
--header 'Content-Type: application/json' \
--data '{
    "messages": [
        {"role": "user", "content": "Please introduce yourself"}
    ]
}'
Expected output:
{"id":"chatcmpl-676a6599045dxxxxxxxxxxxx","object":"chat.completion","created":1735026073,"model":"moonshot-v1-8k","choices":[{"index":0,"message":{"role":"assistant","content":"Hello there! I'm Kimi, your AI assistant crafted by the innovative minds at Moonshot AI. I'm here to lend a digital hand with your queries, providing safe, helpful, and accurate responses. Whether it's a dash of information or a deep dive into data, I'm your go-to for a chat. Let's make today awesome! How can I assist you?"},"finish_reason":"stop"}],"usage":{"prompt_tokens":78,"completion_tokens":78,"total_tokens":156}}
{"choices":[{"message":{"role":"assistant","content":"Hello! I am Qwen, a pre-trained language model developed by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1735021898,"system_fingerprint":null,"model":"qwen1.5-72b-chat","id":"chatcmpl-3608dcd5-e3ad-9ade-bc70-06aed84b6715"}
The output shows that about 50% of the requests are sent to Moonshot and about 50% to Alibaba Cloud Model Studio.
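Checking the split by eye takes many manual runs. A minimal sketch for tallying it follows, assuming each response is a single-line JSON document with a model field as in the sample output above. The tally_models function and the sample size N are hypothetical.

```shell
#!/bin/sh
# Count how many responses (one JSON document per line on stdin) came from each model.
tally_models() {
  sed -n 's/.*"model"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | sort | uniq -c
}

N=20  # hypothetical sample size

# Run the sampling loop only where kubectl is installed.
if command -v kubectl >/dev/null 2>&1; then
  i=0
  while [ "$i" -lt "$N" ]; do
    kubectl exec deployment/sleep -- curl --silent --location 'http://demo-llm-server' \
      --header 'Content-Type: application/json' \
      --data '{"messages": [{"role": "user", "content": "Please introduce yourself"}]}'
    echo  # make sure every response ends with a newline
    i=$((i + 1))
  done | tally_models
fi
```

With a small sample the counts will rarely be exactly equal. A larger N should bring the ratio closer to the configured 50/50 weights.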