Alibaba Cloud Service Mesh: Traffic Routing: Manage LLM traffic with ASM

Last updated: Feb 28, 2026

Alibaba Cloud Service Mesh (ASM) extends the standard HTTP protocol to better support Large Language Model (LLM) requests, providing a simple and efficient way to manage LLM access. You can use ASM to implement canary access, weighted routing, and various observability capabilities. This topic describes how to configure and use LLM traffic routing.

Prerequisites

Setup

Step 1: Create the sleep test application

  1. Create a file named sleep.yaml with the following content.

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: sleep
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: sleep
      labels:
        app: sleep
        service: sleep
    spec:
      ports:
      - port: 80
        name: http
      selector:
        app: sleep
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: sleep
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: sleep
      template:
        metadata:
          labels:
            app: sleep
        spec:
          terminationGracePeriodSeconds: 0
          serviceAccountName: sleep
          containers:
          - name: sleep
            image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/curl:asm-sleep
            command: ["/bin/sleep", "infinity"]
            imagePullPolicy: IfNotPresent
            volumeMounts:
            - mountPath: /etc/sleep/tls
              name: secret-volume
          volumes:
          - name: secret-volume
            secret:
              secretName: sleep-secret
              optional: true
    ---
  2. Run the following command to create the sleep application.

    kubectl apply -f sleep.yaml

Step 2: Create an LLMProvider for Model Studio

  1. Create a file named LLMProvider.yaml with the following content.

    apiVersion: istio.alibabacloud.com/v1beta1
    kind: LLMProvider
    metadata:  
      name: dashscope-qwen
    spec:
      host: dashscope.aliyuncs.com
      path: /compatible-mode/v1/chat/completions
      configs:
        defaultConfig:
          openAIConfig:
            model: qwen1.5-72b-chat  # A model from the Qwen open-source series
            apiKey: ${YOUR_DASHSCOPE_API_KEY}

    For more information about the open-source models, see Text Generation - Qwen - Open-source Version.

  2. Run the following command to create the LLMProvider.

    kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMProvider.yaml
  3. Run the following command to test the configuration.

    kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' --header 'Content-Type: application/json' --data '{
        "messages": [
            {"role": "user", "content": "Please introduce yourself."}
        ]
    }'

    Expected output:

    {"choices":[{"message":{"role":"assistant","content":"Hello! I am Qwen, a pre-trained language model developed by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1720680044,"system_fingerprint":null,"model":"qwen1.5-72b-chat","id":"chatcmpl-3608dcd5-e3ad-9ade-bc70-xxxxxxxxxxxxxx"}

    After the LLMProvider is created, you can send plain HTTP requests to dashscope.aliyuncs.com from the sleep Pod. The ASM sidecar automatically intercepts the request, converts it to an OpenAI-compatible LLM format, adds the API key, upgrades the connection to HTTPS, and then forwards it to the external LLM provider's server. Alibaba Cloud Model Studio is compatible with the OpenAI LLM protocol.
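Conceptually, the sidecar's rewrite turns the plain HTTP request above into a direct OpenAI-style HTTPS call. The sketch below is a rough illustration of that transformation, not ASM's actual implementation; the API key is a placeholder, and the field names come from the LLMProvider above:

```shell
# Rough sketch of the upstream request the sidecar constructs (illustrative only).
API_KEY="sk-placeholder"            # injected from LLMProvider apiKey (placeholder value)
MODEL="qwen1.5-72b-chat"            # injected from defaultConfig.openAIConfig.model

# The sidecar adds the model field to the request body and the Authorization
# header, then upgrades the connection from HTTP to HTTPS.
BODY=$(printf '{"model": "%s", "messages": [{"role": "user", "content": "Please introduce yourself."}]}' "$MODEL")

echo "POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions"
echo "Authorization: Bearer $API_KEY"
echo "$BODY"
```

The original curl command from the sleep Pod carries neither the model nor the key; both are filled in from the LLMProvider configuration on the way out.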

Scenarios

Scenario 1: Route users to different models

  1. Create a file named LLMRoute.yaml with the following content.

    apiVersion: istio.alibabacloud.com/v1beta1
    kind: LLMRoute
    metadata:  
      name: dashscope-route
    spec:
      host: dashscope.aliyuncs.com # This value must be unique among providers.
      rules:
      - name: vip-route
        matches:
        - headers:
            user-type:
              exact: subscriber  # Routing rule for subscribers. The rule-specific configuration is supplied in the LLMProvider.
        backendRefs:
        - providerHost: dashscope.aliyuncs.com
      - backendRefs:
        - providerHost: dashscope.aliyuncs.com

    This configuration routes requests that carry the user-type: subscriber header to the vip-route routing rule.

  2. Run the following command to create the LLMRoute.

    kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMRoute.yaml
  3. Update the LLMProvider.yaml file with the following route-level configuration:

    apiVersion: istio.alibabacloud.com/v1beta1
    kind: LLMProvider
    metadata:  
      name: dashscope-qwen
    spec:
      host: dashscope.aliyuncs.com
      path: /compatible-mode/v1/chat/completions
      configs:
        defaultConfig:
          openAIConfig:
            model: qwen1.5-72b-chat  # Use the open-source model by default.
            apiKey: ${YOUR_DASHSCOPE_API_KEY}
        routeSpecificConfigs:
          vip-route:  # Rule-specific configuration for subscribers.
            openAIConfig:
              model: qwen-turbo  # Subscribers use the qwen-turbo model.
              apiKey: ${YOUR_DASHSCOPE_API_KEY}

    Run the following command to apply the update to the LLMProvider.

    kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMProvider.yaml
  4. Run the following commands to test the routing.

    kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' --header 'Content-Type: application/json' --data '{
        "messages": [
            {"role": "user", "content": "Please introduce yourself."}
        ]
    }'
    kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' --header 'Content-Type: application/json' --header 'user-type: subscriber' --data '{
        "messages": [
            {"role": "user", "content": "Please introduce yourself."}
        ]
    }'

    Expected output:

    {"choices":[{"message":{"role":"assistant","content":"Hello! I am Qwen, a pre-trained language model developed by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1720680044,"system_fingerprint":null,"model":"qwen1.5-72b-chat","id":"chatcmpl-1c33b950-3220-9bfe-9066-xxxxxxxxxxxx"}
    {"choices":[{"message":{"role":"assistant","content":"Hello, I'm Qwen, a large language model from Alibaba Cloud. As an AI assistant, my goal is to help users get accurate and useful information, and to solve their problems and confusions. I can provide knowledge in various fields, engage in conversation, and even create text. Please note that all the content I provide is based on the data I was trained on and may not include the latest events or personal information. If you have any questions, feel free to ask me at any time!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":11,"completion_tokens":85,"total_tokens":96},"created":1720683416,"system_fingerprint":null,"model":"qwen-turbo","id":"chatcmpl-9cbc7c56-06e9-9639-a50d-xxxxxxxxxxxx"}

    The output shows that requests from subscribers are routed to the qwen-turbo model, while other requests still reach the default qwen1.5-72b-chat model.
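The routing decision above can be mimicked locally. The following toy shell function is a hypothetical illustration, not part of ASM; it returns the model that the configuration above would select for a given user-type header value:

```shell
# Toy reproduction of the routing decision: vip-route vs. the default rule,
# using the model names from the LLMProvider configuration above.
select_model() {
  case "$1" in
    subscriber) echo "qwen-turbo" ;;        # matched vip-route -> routeSpecificConfigs
    *)          echo "qwen1.5-72b-chat" ;;  # no match -> defaultConfig
  esac
}

select_model ""            # prints qwen1.5-72b-chat
select_model "subscriber"  # prints qwen-turbo
```

Only an exact header match selects vip-route; any other user-type value (or a missing header) falls through to the catch-all rule, mirroring the second, match-less entry in the LLMRoute.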

Scenario 2: Weight-based routing across providers

  1. Create a file named LLMProvider-moonshot.yaml with the following content.

    apiVersion: istio.alibabacloud.com/v1beta1
    kind: LLMProvider
    metadata:  
      name: moonshot
    spec:
      host: api.moonshot.cn # This value must be unique among providers.
      path: /v1/chat/completions
      configs:
        defaultConfig:
          openAIConfig:
            model: moonshot-v1-8k
            stream: false
            apiKey: ${YOUR_MOONSHOT_API_KEY}
  2. Run the following command to create the LLMProvider for Moonshot.

    kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMProvider-moonshot.yaml
  3. Create a file named demo-llm-server.yaml with the following content. This Service selects no Pods (its selector matches no workload); it only provides an in-mesh hostname for the LLMRoute to attach to.

    apiVersion: v1
    kind: Service
    metadata:
      name: demo-llm-server
      namespace: default
    spec:
      ports:
      - name: http
        port: 80
        protocol: TCP
        targetPort: 80
      selector:
        app: none
      type: ClusterIP
  4. Run the following command to create the demo-llm-server Service.

    kubectl apply -f demo-llm-server.yaml
  5. Update the LLMRoute.yaml file with the following content.

    apiVersion: istio.alibabacloud.com/v1beta1
    kind: LLMRoute
    metadata:
      name: demo-llm-server
      namespace: default
    spec:
      host: demo-llm-server
      rules:
      - backendRefs:
        - providerHost: dashscope.aliyuncs.com
          weight: 50
        - providerHost: api.moonshot.cn
          weight: 50
        name: migrate-rule
  6. Run the following command to update the LLMRoute routing rules.

    kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMRoute.yaml
  7. Run the following command several times.

    kubectl exec deployment/sleep -it -- curl --location 'http://demo-llm-server' --header 'Content-Type: application/json' --data '{
        "messages": [
            {"role": "user", "content": "Please introduce yourself."}
        ]
    }' 

    Expected output:

    {"id":"cmpl-cafd47b181204cdbb4a4xxxxxxxxxxxx","object":"chat.completion","created":1720687132,"model":"moonshot-v1-8k","choices":[{"index":0,"message":{"role":"assistant","content":"Hello! I am an AI language model named Kimi. My main function is to help people generate human-like text. I can write articles, answer questions, provide advice, and more. I am trained on a massive amount of text data, so I can generate a wide variety of text. My goal is to help people communicate more effectively and solve problems."},"finish_reason":"stop"}],"usage":{"prompt_tokens":11,"completion_tokens":59,"total_tokens":70}}
    
    {"choices":[{"message":{"role":"assistant","content":"Hello! I am Qwen, a pre-trained language model developed by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1720687164,"system_fingerprint":null,"model":"qwen1.5-72b-chat","id":"chatcmpl-2443772b-4e41-9ea8-9bed-xxxxxxxxxxxx"}

    The output shows that requests are distributed evenly between Moonshot and Alibaba Cloud Model Studio, matching the 50/50 weights in migrate-rule.
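One way to picture weighted selection is a draw in [0, 100) compared against the cumulative backend weights. The function below is a hypothetical sketch of that idea, not ASM's actual load-balancing algorithm; the hostnames come from the LLMRoute above:

```shell
# Toy weighted pick mirroring migrate-rule: weight 50 for each provider host.
# $1 is the draw, an integer in [0, 100); draws below the first cumulative
# weight (50) go to the first backendRef, the rest to the second.
pick_provider() {
  if [ "$1" -lt 50 ]; then
    echo "dashscope.aliyuncs.com"   # first backendRef, weight 50
  else
    echo "api.moonshot.cn"          # second backendRef, weight 50
  fi
}

pick_provider 10   # prints dashscope.aliyuncs.com
pick_provider 75   # prints api.moonshot.cn
```

Adjusting the weights in the LLMRoute (for example 90/10 during a migration) shifts the threshold accordingly, which is why repeated curl requests alternate between the two providers in roughly the configured proportion.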