node-problem-detector (NPD) provides a unified approach to managing Kubernetes clusters that are deployed across different regions. This topic describes how to enable NPD in an external Kubernetes cluster.

Prerequisites

You have connected to an external Kubernetes cluster through Container Service for Kubernetes. For more information, see Register a cluster.

Procedure

  1. Log on to the Container Service console.
  2. In the left-side navigation pane, choose Marketplace > App Catalog. The Alibaba Cloud Apps tab appears.
  3. Choose Operations/Observability (6) > ack-node-problem-detector.
  4. Click the Parameters tab, set the parameters described in the following table, and then click Create.
    Parameter                Description
    alibaba_cloud_plugins    Delete ram_role_check. If the instance does not have GPU cards, also delete nvidia_gpu_check.
    serviceaccount           Specify a service account that has administrator privileges. To query service accounts, run kubectl -n kube-system get sa. For more information, see Use kubectl on Cloud Shell to manage Kubernetes clusters.
    env                      Set the AccessKeyId, AccessKeySecret, and RegionId parameters.
    sls
      enabled                To archive events in Log Service, set the value to true.
      topic                  Enter the name of the cluster.
      project                Specify the project in Log Service that corresponds to your cluster.
      logstore               Specify a Logstore in the project. To use the event center feature in Log Service, set the value to k8s-event.
      internal               If the cluster is connected to the VPC network over a leased line, set the value to true. Otherwise, set the value to false.
    dingtalk
      enabled                To send alerts to DingTalk groups, set the value to true.
      monitorkinds           Select the type of alert that you want to receive. Valid values:
                             • Node
                             • Pod
                             To send alerts to DingTalk only, set the value to Node.
      token                  Enter the token of a DingTalk chatbot. You can find the token in the webhook URL of the chatbot.
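
The dependencies between these parameters can be checked before you click Create. The following is a minimal sketch, not part of the chart itself: it assumes the parameters are held in a nested Python dictionary whose keys mirror the table, and the function name check_npd_parameters and all placeholder values are hypothetical. The layout that the chart actually expects on the Parameters tab may differ.

```python
# Hypothetical helper: the dictionary layout mirrors the parameter table,
# not necessarily the exact structure used by the chart.
ALLOWED_MONITOR_KINDS = {"Node", "Pod"}


def check_npd_parameters(params):
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # env must carry all three credentials/region values.
    env = params.get("env", {})
    for key in ("AccessKeyId", "AccessKeySecret", "RegionId"):
        if not env.get(key):
            problems.append(f"env.{key} is not set")

    if not params.get("serviceaccount"):
        problems.append("serviceaccount is not set")

    # The table says ram_role_check must be deleted.
    if "ram_role_check" in params.get("alibaba_cloud_plugins", []):
        problems.append("ram_role_check should be deleted from alibaba_cloud_plugins")

    # The sls fields only matter when archiving to Log Service is enabled.
    sls = params.get("sls", {})
    if sls.get("enabled"):
        for key in ("topic", "project", "logstore"):
            if not sls.get(key):
                problems.append(f"sls.{key} is not set")
        if not isinstance(sls.get("internal"), bool):
            problems.append("sls.internal must be true or false")

    # The dingtalk fields only matter when DingTalk alerts are enabled.
    dingtalk = params.get("dingtalk", {})
    if dingtalk.get("enabled"):
        if dingtalk.get("monitorkinds") not in ALLOWED_MONITOR_KINDS:
            problems.append("dingtalk.monitorkinds must be Node or Pod")
        if not dingtalk.get("token"):
            problems.append("dingtalk.token is not set")

    return problems


# Placeholder values only; replace them with the values for your cluster.
example_params = {
    "alibaba_cloud_plugins": ["nvidia_gpu_check"],  # ram_role_check deleted
    "serviceaccount": "<service-account-name>",
    "env": {
        "AccessKeyId": "<access-key-id>",
        "AccessKeySecret": "<access-key-secret>",
        "RegionId": "<region-id>",
    },
    "sls": {
        "enabled": True,
        "topic": "<cluster-name>",
        "project": "<log-service-project>",
        "logstore": "k8s-event",
        "internal": False,
    },
    "dingtalk": {"enabled": True, "monitorkinds": "Node", "token": "<chatbot-token>"},
}

print(check_npd_parameters(example_params))
```

The check treats the sls and dingtalk fields as required only when the corresponding enabled value is true, which matches how the table describes them.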

Result

A sample DingTalk alert is as follows: [Figure: cluster_NPD_03]
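
If no alert arrives, it can help to confirm that the chatbot token is correct. The following is a rough sketch under stated assumptions: it assumes the chatbot webhook URL carries the token in an access_token query parameter and that the chatbot accepts a JSON text message; the function names and the sample URL are hypothetical. The request is only built here, not sent, and the security settings of the chatbot (for example, keyword or signature checks) may still reject a real message.

```python
import json
from urllib.parse import parse_qs, urlparse
from urllib.request import Request


def token_from_webhook(webhook_url):
    """Extract the token from a chatbot webhook URL (assumed access_token parameter)."""
    query = parse_qs(urlparse(webhook_url).query)
    tokens = query.get("access_token", [])
    if len(tokens) != 1:
        raise ValueError("webhook URL does not contain exactly one access_token")
    return tokens[0]


def build_test_request(webhook_url, content):
    """Build (but do not send) a POST request that carries a plain text message."""
    body = json.dumps({"msgtype": "text", "text": {"content": content}}).encode("utf-8")
    return Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical webhook URL; use the URL shown in your chatbot settings.
sample_url = "https://oapi.dingtalk.com/robot/send?access_token=abc123"

token = token_from_webhook(sample_url)  # value to enter in the token parameter
request = build_test_request(sample_url, "NPD token check")

# To actually send the message, pass the request to urllib.request.urlopen:
# from urllib.request import urlopen
# with urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the sketch free of side effects, so the token extraction can be checked without posting to a group.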