全部产品
Search
文档中心

Container Service for Kubernetes:Bangun agen O&M Kubernetes dengan cepat menggunakan kagent, ACK Gateway, dan ACK MCP Server

更新时间:Dec 13, 2025

Topik ini menjelaskan cara membangun agen operasi Kubernetes dengan cepat menggunakan kagent, ACK Gateway, dan ACK MCP Server.

Pengenalan kagent

kagent adalah framework pemrograman agen open-source yang dirancang untuk lingkungan cloud-native. Framework ini mengintegrasikan kemampuan AI agent dengan rantai alat (toolchain), memungkinkan agen menangani tugas kompleks multi-langkah melalui interaksi bahasa alami dan mengubah insight AI menjadi operasi spesifik.

Keunggulan utama kagent

  1. Kemampuan inferensi tingkat lanjut: Berbeda dengan chatbot tradisional, kagent menggunakan inferensi tingkat lanjut dan perencanaan iteratif untuk menangani masalah kompleks multi-langkah secara otonom.

  2. Integrasi tool yang fleksibel: Mendukung integrasi dengan tool MCP, sehingga agen dapat berinteraksi dengan berbagai sistem dan layanan untuk memperluas kemampuannya.

  3. Arsitektur yang dapat diperluas: Dibangun di atas framework Google Agent Development Kit (ADK), menyediakan berbagai opsi kustomisasi serta mendukung eksekusi agen melalui antarmuka pengguna (UI) atau secara deklaratif.

  4. Mode kolaborasi tim: Agen dapat dikelompokkan dalam tim, di mana agen perencana membuat rencana dan menugaskan tugas kepada agen individual dalam tim tersebut.

  5. Pemrosesan tugas tujuan umum: Cocok untuk otomatisasi tugas dalam berbagai skenario, termasuk diagnosis masalah kompleks, analitik data, dan operasi sistem.

Persiapan

  1. Buat namespace kagent di kluster ACK Anda.

  2. Instal aplikasi kagent-crds dan kagent di namespace kagent. Anda dapat menginstalnya dari ACK Marketplace atau melalui Apps > Helm.

  3. Di kluster ACK Anda, buka halaman Add-ons untuk menginstal komponen Gateway API dan enable the experimental channel feature.

  4. Di kluster ACK Anda, buka Add-ons dan instal komponen Gateway with Inference Extension.

  5. Aktifkan Alibaba Cloud Model Studio dan dapatkan Kunci API.

Langkah 1: Deploy ACK MCP Server

  1. Buat kebijakan izin kustom.

    ACK MCP Server memerlukan izin read-only berikut.

    {
      "Version": "1",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "cs:Check*",
            "cs:Describe*",
            "cs:Get*",
            "cs:List*",
            "cs:Query*",
            "cs:RunClusterCheck",
            "cs:RunClusterInspect"
          ],
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": "arms:GetPrometheusInstance",
          "Resource": "*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "log:Describe*",
            "log:Get*",
            "log:List*"
          ],
          "Resource": "*"
        }
      ]
    }
  2. Instal ACK MCP Server. Untuk informasi selengkapnya, lihat Deploy and run ack-mcp-server.

    Setelah instalasi selesai, Anda dapat menjalankan perintah kubectl get --raw "/api/v1/namespaces/kube-system/services/ack-mcp-server/proxy/sse" --v=10 untuk memverifikasi keberhasilan instalasi.
  3. Deklarasikan ACK MCP Server di kluster.

    kubectl apply -f - <<EOF
    apiVersion: kagent.dev/v1alpha2
    kind: RemoteMCPServer
    metadata:
      name: ack-mcp-tool-server
      namespace: kagent
    spec:
      description: Official ACK tool server
      protocol: SSE
      sseReadTimeout: 5m0s
      terminateOnClose: true
      timeout: 30s
      # ACK MCP Server diinstal di namespace kube-system secara default. Jika Anda beralih ke namespace lain, ubah URL di sini.
      url: http://ack-mcp-server.kube-system:8000/sse
    EOF
  4. Periksa status resource RemoteMCPServer untuk mendapatkan tool ACK MCP.

    kubectl describe RemoteMCPServer ack-mcp-tool-server -n kagent

    Output yang diharapkan:

    ...
    status:
      conditions:
      - lastTransitionTime: "2025-XX-XXT11:35:29Z"
        message: ""
        observedGeneration: 2
        reason: Reconciled
        status: "True"
        type: Accepted
      discoveredTools:
      - description: Gets a list of all ACK clusters in all regions. By default, it returns a maximum of 10 clusters.
        name: list_clusters
      - description: Execute kubectl command with intelligent context management. Supports
          cluster_id for automatic context switching and creation.
        name: ack_kubectl
      - description: Queries the Alibaba Cloud Prometheus data of an ACK cluster.
        name: query_prometheus
      - description: Gets Prometheus metric definitions and best practices.
        name: query_prometheus_metric_guidance
      - description: "Diagnoses Kubernetes resources in an ACK cluster. Use this tool for in-depth diagnosis when you encounter problems that are difficult to locate. The supported resources include the following: \n1. **node**: K8s
          node\n2. **ingress**: Ingress\n3. **memory**: Node memory\n4. **pod**: Pod\n5. **service**: Service\n6.
          **network**: Network connectivity\n                        "
        name: diagnose_resource
      - description: Generates and queries the latest health inspection report for an ACK cluster.
        name: query_inspect_report
      - description: |-
          Query Kubernetes (k8s) audit logs.
    
              Function Description:
              - Supports multiple time formats (ISO 8601 and relative time).
              - Supports suffix wildcards for namespace, resource name, and user.
              - Supports multiple values for verbs and resource types.
              - Supports both full names and short names for resource types.
              - Allows specifying the cluster name to query audit logs from multiple clusters.
              - Provides detailed parameter validation and error messages.
    
              Usage Suggestions:
              - You can use the list_clusters() tool to view available clusters and their IDs.
              - By default, it queries the audit logs for the last 24 hours. The number of returned records is limited to 10 by default.
        name: query_audit_log
      - description: Gets the current time and returns it in ISO 8601 format and UNIX timestamp format.
        name: get_current_time
      - description: Queries the logs of control plane components in an ACK cluster. First, query the control plane log configuration to verify whether the component is enabled, and then query the corresponding SLS logs.
        name: query_controlplane_logs
      observedGeneration: 2

Langkah 2: Deploy gerbang dan konfigurasikan layanan model Model Studio

  1. Buat gerbang.

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: model-gateway
      namespace: kagent
    spec:
      gatewayClassName: ack-gateway
      infrastructure:
        parametersRef:
          group: gateway.envoyproxy.io
          kind: EnvoyProxy
          name: custom-proxy-config
      listeners:
      - name: http-bailian
        protocol: HTTP
        port: 8080
    ---
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: EnvoyProxy
    metadata:
      name: custom-proxy-config
      namespace: kagent
    spec:
      provider:
        type: Kubernetes
        kubernetes:
          envoyService:
            type: ClusterIP
    EOF
  2. Buat backend Model Studio.

    kubectl apply -f- <<EOF
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: Backend
    metadata:
      name: bailian
      namespace: kagent
    spec:
      endpoints:
        - fqdn:
            hostname: dashscope-intl.aliyuncs.com
            port: 443
    ---
    apiVersion: gateway.networking.k8s.io/v1alpha3
    kind: BackendTLSPolicy
    metadata:
      name: bailian-tls
      namespace: kagent
    spec:
      targetRefs:
      - group: gateway.envoyproxy.io
        kind: Backend
        name: bailian
      validation:
        hostname: dashscope-intl.aliyuncs.com
        wellKnownCACertificates: System
    EOF
  3. Buat aturan routing untuk mengarahkan permintaan tertentu ke backend Model Studio.

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: bailian-route
      namespace: kagent
    spec:
      parentRefs:
        - name: model-gateway
      rules:
        - backendRefs:
            - group: gateway.envoyproxy.io
              kind: Backend
              name: bailian
          filters:
            - type: URLRewrite
              urlRewrite:
                hostname: dashscope-intl.aliyuncs.com
                path:
                  type: ReplacePrefixMatch
                  replacePrefixMatch: /compatible-mode/v1
          matches:
            - path:
                type: PathPrefix
                value: /v1
          timeouts:
            backendRequest: 10m
            request: 10m
    EOF

Langkah 3: Gunakan ACK Gateway untuk mengelola Kunci API layanan Model Studio

Saat mengakses layanan model besar eksternal, Anda biasanya perlu menggunakan Kunci API untuk otorisasi. ACK Gateway mendukung injeksi dinamis Kunci API ke dalam permintaan, memungkinkan pengelolaan terpusat semua Kunci API untuk layanan model. Hal ini mengurangi kompleksitas maintenance dan meningkatkan keamanan kluster.

  1. Buat Secret untuk menyimpan Kunci API layanan Model Studio.

    export PROVIDER_API_KEY=${your_Model_Studio_API_key}
    kubectl create secret generic bailian-credential -n kagent --from-literal credential="Bearer $PROVIDER_API_KEY"
  2. Buat resource HTTPRouteFilter yang mereferensikan Secret ini.

    kubectl apply -f- <<EOF
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: HTTPRouteFilter
    metadata:
      name: credential-injection
      namespace: kagent
    spec:
      credentialInjection:
        overwrite: true
        credential:
          valueRef:
            name: bailian-credential
    EOF
  3. Modifikasi resource HTTPRoute untuk mengaktifkan injeksi Kunci API otomatis.

    kubectl apply -f- <<EOF
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: bailian-route
      namespace: kagent
    spec:
      parentRefs:
        - name: model-gateway
      rules:
        - backendRefs:
            - group: gateway.envoyproxy.io
              kind: Backend
              name: bailian
          filters:
            - type: URLRewrite
              urlRewrite:
                hostname: dashscope-intl.aliyuncs.com
                path:
                  type: ReplacePrefixMatch
                  replacePrefixMatch: /compatible-mode/v1
            # Ini adalah bagian utama yang ditambahkan
            - type: ExtensionRef
              extensionRef:
                group: gateway.envoyproxy.io
                kind: HTTPRouteFilter
                name: credential-injection
          timeouts:
            backendRequest: 10m
            request: 10m
          matches:
            - path:
                type: PathPrefix
                value: /v1
    EOF

Langkah 4: Konfigurasikan ModelConfig untuk Model Studio

  1. Dapatkan alamat gerbang.

    export GATEWAY_HOST=$(kubectl -n kagent get gateway/model-gateway -o jsonpath='{.status.addresses[0].value}')
    echo $GATEWAY_HOST
  2. Buat ModelConfig berikut.

    kubectl apply -f - <<EOF
    apiVersion: kagent.dev/v1alpha2
    kind: ModelConfig
    metadata:
      name: my-provider-config
      namespace: kagent
    spec:
      model: qwen-plus
      openAI:
        baseUrl: http://$GATEWAY_HOST:8080/v1
      provider: OpenAI
    EOF

Langkah 5: Buat agen

  1. Definisikan agen menggunakan YAML berikut.

    kubectl apply -f - <<EOF
    apiVersion: kagent.dev/v1alpha2
    kind: Agent
    metadata:
      name: my-ack-ops-agent
      namespace: kagent
    spec:
      declarative:
        deployment:
          env:
            - name: OPENAI_API_KEY
              value: placeholder
          replicas: 1
        modelConfig: my-provider-config
        stream: true
        systemMessage: |-
          # Role
    
          You are a professional ACK (Alibaba Cloud Container Service for Kubernetes) intelligent assistant. Your task is to accurately understand user requests about clusters and select the most appropriate tools to perform queries, diagnostics, or analysis.
    
          # Core Instructions
    
          1.  **Confirm the Target - The First Principle**:
              *   Before performing any operation, you must first confirm the cluster_id the user wants to operate on.
              *   If the user's query does not provide it, you **must** first call the list_clusters tool and ask the user which cluster they want to operate on.
    
          2.  **Tool Selection Strategy (by priority)**:
              *   **Complex Fault Diagnosis**: When encountering complex issues such as pod abnormalities, network failures, or NotReady nodes, **prioritize using diagnose_resource**.
              *   **Performance Metric Queries**: When the issue involves "high/low CPU/memory", "fast/slow", or "how much usage", **prioritize using query_prometheus**.
              *   **Security and Change Audits**: When the issue is about "who did what and when", **prioritize using query_audit_log**.
              *   **Overall Cluster Health**: When the user wants to know "if the cluster is healthy" or wants a "diagnostics report", **use query_inspect_report**.
              *   **Control Plane Issues**: When you suspect a problem with Kubernetes system components such as the API Server or Scheduler, **use query_controlplane_logs**.
              *   **General Queries**: For all other standard, explicit Kubernetes resource queries (such as get pods, describe service, logs <pod>), **use ack_kubectl as the default tool**.
    
          3.  **Security Red Lines**:
              *   Your primary responsibility is to query and diagnose. For any operation performed through ack_kubectl that **may modify the cluster state** (such as apply, delete, or creating a temporary pod for diagnosis), you **must** first explain to the user the command you will execute and its purpose, and only proceed after receiving **explicit authorization from the user**.
    
          4.  **Code of Conduct**:
              *   If the user's question is unclear, ask for clarification before acting.
              *   Respond in a friendly and enthusiastic manner.
              *   If you still cannot find the answer after using the tools, **never invent one**. Honestly reply: "Sorry, I cannot locate the problem with the available tools," and you can provide the findings you have.
    
          # Response Format
    
          *   **Always use Markdown format**.
          *   Your response must include a **summary of your actions** and an **analysis and recommendations** based on the results.
    
          ---
    
          ### Summary
    
          *(Summarize what you did and your key findings in one sentence.)*
        tools:
          - mcpServer:
              apiGroup: kagent.dev
              kind: RemoteMCPServer
              name: ack-mcp-tool-server
              toolNames:
                - list_clusters
                - ack_kubectl
                - query_prometheus
                - query_prometheus_metric_guidance
                - diagnose_resource
                - query_inspect_report
                - query_audit_log
                - get_current_time
                - query_controlplane_logs
            type: McpServer
      description: This agent can interact with ACK MCP Tools to get cluster information and operate the cluster.
      type: Declarative
    EOF
  2. Konfirmasi status pembuatan agen.

    kubectl get pod -n kagent

    Output yang diharapkan:

    NAME                                              READY   STATUS    RESTARTS   AGE
    my-ack-ops-agent-66b74675fc-rqwwx                 1/1     Running   0          2m6s
    ...

Langkah 6: Gunakan agen melalui UI

kagent menyediakan UI web default untuk berinteraksi langsung dengan agen.

  1. Teruskan layanan kagent-ui ke mesin lokal Anda menggunakan Penerusan port.

    kubectl port-forward -n kagent service/kagent-ui 8082:8080
  2. Buka browser dan akses agen di localhost:8082.

    1. Contoh Tanya Jawab (Q&A): Gunakan Prometheus untuk melihat metrik pod di namespace kagent kluster saat ini.

      image

      image