
Alibaba Cloud Service Mesh: Traffic Routing: Use ASM to Manage LLM Traffic

Last updated: Jul 02, 2025

Building on HTTP, Alibaba Cloud Service Mesh (ASM) provides enhanced support for LLM request protocols. It supports the protocol standards of common LLM providers, giving users a simple and efficient integration experience. With ASM, users can implement canary access, proportional traffic routing, and various observability features for LLMs. This topic describes how to manage LLM traffic in ASM from the perspective of traffic routing.

Prerequisites

Preparations

Step 1: Create a test application named sleep

  1. Create a file named sleep.yaml with the following content.

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: sleep
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: sleep
      labels:
        app: sleep
        service: sleep
    spec:
      ports:
      - port: 80
        name: http
      selector:
        app: sleep
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: sleep
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: sleep
      template:
        metadata:
          labels:
            app: sleep
        spec:
          terminationGracePeriodSeconds: 0
          serviceAccountName: sleep
          containers:
          - name: sleep
            image: registry-cn-hangzhou.ack.aliyuncs.com/ack-demo/curl:asm-sleep
            command: ["/bin/sleep", "infinity"]
            imagePullPolicy: IfNotPresent
            volumeMounts:
            - mountPath: /etc/sleep/tls
              name: secret-volume
          volumes:
          - name: secret-volume
            secret:
              secretName: sleep-secret
              optional: true
    ---
  2. Run the following command to create the application named sleep.

    kubectl apply -f sleep.yaml
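
    Optionally, confirm that the pod is running before continuing (a quick check; assuming automatic sidecar injection is enabled for the namespace, 2/2 in the READY column indicates the application container plus the ASM sidecar):

    kubectl get pods -l app=sleep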

Step 2: Create an LLMProvider for Alibaba Cloud Model Studio

  1. Create a file named LLMProvider.yaml with the following content.

    apiVersion: istio.alibabacloud.com/v1beta1
    kind: LLMProvider
    metadata:  
      name: dashscope-qwen
    spec:
      host: dashscope.aliyuncs.com
      path: /compatible-mode/v1/chat/completions
      configs:
        defaultConfig:
          openAIConfig:
            model: qwen1.5-72b-chat  # Qwen open-source LLM
            apiKey: ${dashscope API_KEY}

    For other open-source models, see Text Generation-Qwen-Open-source Version.

  2. Run the following command to create the LLMProvider.

    kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMProvider.yaml
  3. Run the following command to test the LLMProvider.

    kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
    --header 'Content-Type: application/json' \
    --data '{
        "messages": [
            {"role": "user", "content": "Please introduce yourself."}
        ]
    }'

    Expected output:

    {"choices":[{"message":{"role":"assistant","content":"Hello! I am Qwen, a pre-trained language model developed by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1735021898,"system_fingerprint":null,"model":"qwen1.5-72b-chat","id":"chatcmpl-3608dcd5-e3ad-9ade-bc70-xxxxxxxxxxxxxx"}%   

    After the LLMProvider is created, you can access dashscope.aliyuncs.com directly over HTTP from the pod where the sleep application runs. The ASM sidecar automatically converts the request into a format that conforms to the OpenAI LLM protocol (Alibaba Cloud Model Studio is compatible with the OpenAI LLM protocol), adds the API key to the request, upgrades the protocol from HTTP to HTTPS, and finally sends the request to the LLM provider's server outside the cluster.
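
    For comparison, the request that effectively reaches the provider is roughly equivalent to the following direct call (a minimal sketch, assuming the API key is exported in a DASHSCOPE_API_KEY environment variable; per the behavior described above, the sidecar fills in the path, the model, the Authorization header, and HTTPS on your behalf):

    curl --location 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions' \
    --header 'Content-Type: application/json' \
    --header "Authorization: Bearer ${DASHSCOPE_API_KEY}" \
    --data '{
        "model": "qwen1.5-72b-chat",
        "messages": [
            {"role": "user", "content": "Please introduce yourself."}
        ]
    }'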

Demonstrate the procedure in different scenarios

Scenario 1: Create an LLMRoute to apply different models to different types of users

  1. Create a file named LLMRoute.yaml with the following content.

    apiVersion: istio.alibabacloud.com/v1beta1
    kind: LLMRoute
    metadata:  
      name: dashscope-route
    spec:
      host: dashscope.aliyuncs.com # Different providers cannot have the same host.
      rules:
      - name: vip-route
        matches:
        - headers:
            user-type:
              exact: subscriber  # Dedicated route for subscribers; a route-specific configuration is added to the provider later.
        backendRefs:
        - providerHost: dashscope.aliyuncs.com
      - backendRefs:
        - providerHost: dashscope.aliyuncs.com

    With this configuration, requests that carry the user-type: subscriber header follow the vip-route routing rule.

  2. Create the LLMRoute with the following command.

    kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMRoute.yaml
  3. Update the LLMProvider.yaml file with the following content to add a route-level configuration.

    apiVersion: istio.alibabacloud.com/v1beta1
    kind: LLMProvider
    metadata:  
      name: dashscope-qwen
    spec:
      host: dashscope.aliyuncs.com
      path: /compatible-mode/v1/chat/completions
      configs:
        defaultConfig:
          openAIConfig:
            model: qwen1.5-72b-chat  # The open-source model is used by default.
            apiKey: ${dashscope API_KEY}
        routeSpecificConfigs:
          vip-route:  # The dedicated route for subscribers.
            openAIConfig:
              model: qwen-turbo  # The qwen-turbo model is used for subscribers.
              apiKey: ${dashscope API_KEY}

    Run the following command to apply the modified LLMProvider.

    kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMProvider.yaml
  4. Run the following test commands:

    kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
    --header 'Content-Type: application/json' \
    --data '{
        "messages": [
            {"role": "user", "content": "Please introduce yourself."}
        ]
    }'
    kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
    --header 'Content-Type: application/json' \
    --header 'user-type: subscriber' \
    --data '{
        "messages": [
            {"role": "user", "content": "Please introduce yourself."}
        ]
    }'

    Expected output:

    {"choices":[{"message":{"role":"assistant","content":"I am a pre-trained language model developed by Alibaba Cloud. I am Qwen. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1735021898,"system_fingerprint":null,"model":"qwen1.5-72b-chat","id":"chatcmpl-3608dcd5-e3ad-9ade-bc70-06aed84b6715"}%   
    {"choices":[{"message":{"role":"assistant","content":"Hello! I am Qwen, a pre-trained language model developed by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1735021898,"system_fingerprint":null,"model":"qwen-turbo","id":"chatcmpl-3608dcd5-e3ad-9ade-bc70-06aed84b6715"}%   

    The output shows that the qwen-turbo model is used for subscribers.
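
    As a quick negative test, a request whose user-type header does not exactly match subscriber falls through to the default rule and should still be answered by qwen1.5-72b-chat (a sketch; guest is just an arbitrary non-matching value):

    kubectl exec deployment/sleep -it -- curl --location 'http://dashscope.aliyuncs.com' \
    --header 'Content-Type: application/json' \
    --header 'user-type: guest' \
    --data '{
        "messages": [
            {"role": "user", "content": "Please introduce yourself."}
        ]
    }'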

Scenario 2: Configure an LLMProvider and an LLMRoute to distribute traffic proportionally

  1. Create a file named LLMProvider-moonshot.yaml with the following content.

    apiVersion: istio.alibabacloud.com/v1beta1
    kind: LLMProvider
    metadata:  
      name: moonshot
    spec:
      host: api.moonshot.cn # Different providers cannot have the same host.
      path: /v1/chat/completions
      configs:
        defaultConfig:
          openAIConfig:
            model: moonshot-v1-8k
            stream: false
            apiKey: ${Moonshot API_KEY}
  2. Run the following command to create the LLMProvider for Moonshot.

    kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMProvider-moonshot.yaml
  3. Create a file named demo-llm-server.yaml with the following content.

    apiVersion: v1
    kind: Service
    metadata:
      name: demo-llm-server
      namespace: default
    spec:
      ports:
      - name: http
        port: 80
        protocol: TCP
        targetPort: 80
      selector:
        app: none
      type: ClusterIP
  4. Run the following command to create the demo-llm-server Service.

    kubectl apply -f demo-llm-server.yaml
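
    The Service's selector (app: none) intentionally matches no pods: demo-llm-server only provides a resolvable in-mesh host name for the LLMRoute in the next step to bind to. As a quick sanity check (the endpoint list is expected to be empty):

    kubectl get endpoints demo-llm-server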
  5. Update the LLMRoute.yaml file with the following content.

    apiVersion: istio.alibabacloud.com/v1beta1
    kind: LLMRoute
    metadata:
      name: demo-llm-server
      namespace: default
    spec:
      host: demo-llm-server
      rules:
      - backendRefs:
        - providerHost: dashscope.aliyuncs.com
          weight: 50
        - providerHost: api.moonshot.cn
          weight: 50
        name: migrate-rule
  6. Run the following command to update the LLMRoute routing rules.

    kubectl --kubeconfig=${PATH_TO_ASM_KUBECONFIG} apply -f LLMRoute.yaml
  7. Run the following test command several times.

    kubectl exec deployment/sleep -it -- curl --location 'http://demo-llm-server' \
    --header 'Content-Type: application/json' \
    --data '{
        "messages": [
            {"role": "user", "content": "Please introduce yourself"}
        ]
    }' 

    Expected output:

    {"id":"chatcmpl-676a6599045dxxxxxxxxxxxx","object":"chat.completion","created":1735026073,"model":"moonshot-v1-8k","choices":[{"index":0,"message":{"role":"assistant","content":"Hello there! I'm Kimi, your AI assistant crafted by the innovative minds at Moonshot AI. I'm here to lend a digital hand with your queries, providing safe, helpful, and accurate responses. Whether it's a dash of information or a deep dive into data, I'm your go-to for a chat. Let's make today awesome! How can I assist you?"},"finish_reason":"stop"}],"usage":{"prompt_tokens":78,"completion_tokens":78,"total_tokens":156}}
    
    {"choices":[{"message":{"role":"assistant","content":"Hello! I am Qwen, a pre-trained language model developed by Alibaba Cloud. My purpose is to assist users in generating various types of text, such as articles, stories, poems, and answering questions by leveraging my extensive knowledge and understanding of context. Although I'm an AI, I don't have a physical body or personal experiences like human beings do, but I've been trained on a vast corpus of text data, which allows me to engage in conversations, provide information, or help with various tasks to the best of my abilities. So, feel free to ask me anything, and I'll do my best to provide helpful and informative responses!"},"finish_reason":"stop","index":0,"logprobs":null}],"object":"chat.completion","usage":{"prompt_tokens":12,"completion_tokens":130,"total_tokens":142},"created":1735021898,"system_fingerprint":null,"model":"qwen1.5-72b-chat","id":"chatcmpl-3608dcd5-e3ad-9ade-bc70-06aed84b6715"}%   

    The output shows that about 50% of the requests are sent to Moonshot and about 50% to Alibaba Cloud Model Studio.
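
    To observe the split more directly, you can tally the model field over a batch of requests with a small loop like the following (a sketch, assuming grep, sort, and uniq are available on the machine running kubectl):

    for i in $(seq 1 10); do
      kubectl exec deployment/sleep -- curl -s --location 'http://demo-llm-server' \
        --header 'Content-Type: application/json' \
        --data '{"messages": [{"role": "user", "content": "Please introduce yourself"}]}' \
        | grep -o '"model":"[^"]*"'
    done | sort | uniq -c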