Event monitoring is another monitoring method in Kubernetes. It compensates for the limitations of resource monitoring in terms of timeliness, accuracy, and scenario coverage. Kubernetes is designed around the concept of a state machine: transitions between states generate corresponding events. Specifically, a transition to a desired state generates a Normal event, and a transition to an unexpected state generates a Warning event. Developers can collect these events to diagnose cluster exceptions and problems in real time.
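Before configuring any sink, you can inspect these events directly with kubectl. A minimal sketch; the field selector values follow the standard Kubernetes event API:

```shell
# List all events in the cluster, sorted by the time they last occurred
kubectl get events --all-namespaces --sort-by=.lastTimestamp

# Show only Warning events, the level that alerting sinks typically watch
kubectl get events --all-namespaces --field-selector type=Warning
```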

Background information

Maintained by Alibaba Cloud Container Service, kube-eventer is an open-source event emitter that sends Kubernetes events to sinks such as DingTalk and Log Service. It also provides filters at different levels to enable real-time event collection, targeted alerting, and asynchronous archiving. For more information, see kube-eventer.

This topic describes how to configure event monitoring in the following three scenarios:

Scenario 1: Use DingTalk to implement monitoring and alerting of Kubernetes events

Using a DingTalk Chatbot to monitor Kubernetes events and send alerts when necessary is a typical ChatOps scenario. To configure event monitoring with DingTalk, follow these steps:

  1. Click Group Settings in the upper-right corner of the chat window of a DingTalk group. The Group Settings page appears.
  2. Click Group Assistant. On the Group Assistant page that appears, click the plus sign to go to the Chatbot page. In this example, select Custom.
    Add a custom Chatbot
  3. On the Robot details page, click Add to go to the Add Robot page.
    Add a Chatbot
  4. Set the parameters as shown in the following table, and then click Finish.
    Parameter Description
    • Edit Profile Picture: Optional. The avatar of the Chatbot.
    • Chatbot Name: The name of the Chatbot.
    • Add to Group: The group to which the Chatbot is added.
    • Enable the outgoing function: Optional. You can tag the Chatbot to send a message to an external service or return the response of the external service to the group.
      Note We recommend that you disable this feature.
    • POST Address: The HTTP address for receiving messages.
      Note This parameter is available only when you enable the outgoing feature.
    • Token: The token used to verify whether a request is sent from DingTalk.
      Note This parameter is available only when you enable the outgoing feature.
  5. On the Add Robot page, click Copy to copy the webhook URL.
    Copy the webhook URL
    Note On the Chatbot page, find the target Chatbot, and click the Settings button to:
    • Modify the avatar and name of the Chatbot.
    • Enable or disable message push.
    • Reset the webhook URL.
    • Remove the Chatbot.
    Modify the configurations of a Chatbot
  6. Log on to the Container Service console.
  7. In the left-side navigation pane, choose Marketplace > App Catalog. On the App Catalog page that appears, select ack-node-problem-detector.
    App Catalog
  8. On the App Catalog - ack-node-problem-detector page, click the Parameters tab and modify the following content.
    • In the npd section, set the enabled parameter to false.
    • In the dingtalk section, set the enabled parameter to true.
    • In the token field, enter the access token contained in the webhook URL that you copied in step 5.
    Enter the token
  9. On the right side of the page, select the target cluster, verify that the Namespace parameter is set to kube-system and the Release Name parameter is set to ack-node-problem-detector, and then click Create.
kube-eventer takes effect about 30 seconds after the deployment is complete. When an event matches or exceeds the configured alert level, you receive an alert in the DingTalk group, as shown in the following figure.
Message prompt
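You can also verify the webhook itself before deploying, by posting a test message directly to DingTalk. This is a sketch; the token value is a placeholder that you must replace with the access_token portion of the webhook URL copied in step 5, and depending on the Chatbot's security settings the message content may need to contain a configured keyword:

```shell
# Placeholder: replace with the access_token from your webhook URL
TOKEN="your-dingtalk-token"
URL="https://oapi.dingtalk.com/robot/send?access_token=${TOKEN}"

# DingTalk custom Chatbots accept a simple JSON message body
PAYLOAD='{"msgtype": "text", "text": {"content": "kube-eventer test alert"}}'

# The request fails harmlessly while the token is still the placeholder
curl -s --max-time 5 "$URL" -H 'Content-Type: application/json' -d "$PAYLOAD" || true
```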

Scenario 2: Use Log Service to store Kubernetes events

Log Service provides persistent storage for Kubernetes events, along with stronger event archiving and auditing capabilities. To configure event monitoring with Log Service, follow these steps:

  1. Create a project and a Logstore.
    1. Log on to the Log Service console.
    2. In the upper-right corner of the Projects section, click Create Project. On the Create Project page that appears, configure basic information about the project, and then click Create.
      In this example, a project named k8s-log4j is created in the China (Hangzhou) region, where the Kubernetes cluster is deployed.
      Note We recommend that you create the project in the region where the Kubernetes cluster is deployed. When the Log Service project and the Kubernetes cluster are located in the same region, log data is transmitted over the internal network, which avoids Internet bandwidth fees and the latency of cross-region transmission. This ensures real-time log collection and fast retrieval.
      Create a project
    3. Find the k8s-log4j project in the Projects section, and then click the project name to go to the project details page.
    4. Click Create in the upper-right corner of the Logstore page that appears.
      Create a Logstore named k8s-logstore
    5. On the Create Logstore page that appears, set the parameters, and then click Create.
      In this example, a Logstore named k8s-logstore is created.
      Create a Logstore
    6. Open the data import wizard after the Logstore is created.
      Message prompt
    7. On the Custom Data page, select log4jAppender, and then set related parameters as prompted.
      In this example, the default settings are used. You can set the parameters based on the specific scenario.
      Custom data
  2. Configure the k8s-log4j project in the Kubernetes cluster.
    1. Log on to the Container Service console.
    2. In the left-side navigation pane, choose Marketplace > App Catalog. On the App Catalog page that appears, select ack-node-problem-detector.
      App Catalog
    3. On the App Catalog - ack-node-problem-detector page, click the Parameters tab and modify the following content.
      • In the npd section, set the enabled parameter to false.
      • In the sinks section, set the sls.enabled parameter to true.
      • Specify the project and logstore from step 1 in the corresponding fields.
    4. Click Create to deploy the eventer to the target cluster.
  3. After an operation is performed on the cluster, such as deleting a pod or creating an application, an event is generated. You can log on to the Log Service console to view the collected log data. For more information, see Preview logs in the Log Service console.
    View the collected data
  4. Set indexes and offline archiving. For more information, see Enable and configure the index feature for a Logstore.
    1. Go to the Log Service console, find the target project, and click the project name.
    2. In the Logstores list, find the target Logstore, click the icon at the right side of the Logstore name, and select Search & Analysis.
    3. Click Enable in the upper-right corner.
      Enable indexing
    4. On the Search & Analysis page that appears, set the parameters based on your needs.
      Configure log query and analysis
    5. Click OK.
      The log query and analysis page appears.
      Log analysis
      Note
      • The index configurations take effect within one minute.
      • After you enable or modify indexes, the new configurations apply only to newly written data.
    6. If you need to implement offline archiving and computing, you can ship data from the Logstore to OSS. For more information, see Ship logs to OSS.
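After the deployment, you can confirm from the command line that the eventer is running and shipping events. The workload name below is an assumption based on the ack-node-problem-detector release name and may differ in your cluster:

```shell
# Check that the eventer pod is running in the kube-system namespace
kubectl -n kube-system get pods | grep node-problem-detector

# Tail the eventer logs to confirm that events reach the sls sink
# (workload name is an assumption; adjust it to your release)
kubectl -n kube-system logs deployment/ack-node-problem-detector-eventer --tail=20
```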

Scenario 3: Use node-problem-detector and kube-eventer to alert on node exceptions

node-problem-detector is a tool for diagnosing node exceptions in Kubernetes. It converts node exceptions into node events and works with kube-eventer to report alerts on them. Supported node exceptions include Docker engine hangs, Linux kernel hangs, outbound traffic exceptions, and file descriptor exceptions. To configure event monitoring with node-problem-detector, follow these steps:
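Once deployed, node-problem-detector surfaces problems both as node conditions and as node events, so you can inspect them with kubectl. A sketch; the node name is a placeholder:

```shell
# Node-level problems appear under the Conditions section of the node
kubectl describe node <node-name>

# Events reported for nodes, which kube-eventer can forward to a sink
kubectl get events --all-namespaces --field-selector involvedObject.kind=Node
```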

  1. Log on to the Container Service console.
  2. In the left-side navigation pane, choose Marketplace > App Catalog. On the App Catalog page that appears, select ack-node-problem-detector.
    App Catalog
  3. On the App Catalog - ack-node-problem-detector page, click the Parameters tab to view the default configurations of node-problem-detector.
    Cluster configuration page
    You can configure kube-eventer with the parameters in the following table.
    Table 1. Parameters
    • npd.image.repository: The URL of the node-problem-detector image. Default: registry.cn-beijing.aliyuncs.com/acs/node-problem-detector
    • npd.image.tag: The tag of the node-problem-detector image. Default: v0.6.3-16-g30dab97
    • alibaba_cloud_plugins: The list of Alibaba Cloud plug-ins to enable. Default: fd_check
    • eventer.image.repository: The URL of the kube-eventer image. Default: registry.cn-hangzhou.aliyuncs.com/acs/eventer
    • eventer.image.tag: The tag of the kube-eventer image. Default: v1.6.0-4c4c66c-aliyun
    • eventer.image.pullPolicy: The pull policy of the kube-eventer image. Default: IfNotPresent
    • eventer.sinks.sls.enabled: Specifies whether to enable Log Service as a sink of kube-eventer. Default: false
    • eventer.sinks.sls.project: The name of the Log Service project. No default value.
    • eventer.sinks.sls.logstore: The name of the Logstore in the Log Service project. No default value.
    • eventer.sinks.dingtalk.enabled: Specifies whether to enable DingTalk as a sink of kube-eventer. Default: false
    • eventer.sinks.dingtalk.level: The level of events at which alerts are sent to DingTalk. Default: warning
    • eventer.sinks.dingtalk.label: The labels of the alert events. No default value.
    • eventer.sinks.dingtalk.token: The token of the DingTalk Chatbot. No default value.
    • eventer.sinks.dingtalk.monitorkinds: The types of resources whose events are monitored. No default value.
    • eventer.sinks.dingtalk.monitornamespaces: The namespaces of the resources whose events are monitored. No default value.
  4. On the right of the App Catalog - ack-node-problem-detector page, select the target cluster and verify that the Namespace parameter is set to kube-system and the Release Name parameter is set to ack-node-problem-detector, and then click Create.
    In the left-side navigation pane, choose Applications > Pods. On the Pods page, select the target cluster and namespace, and verify that all the ack-node-problem-detector-daemonset pods are running properly.
    daemonset
When both node-problem-detector and kube-eventer are running properly, events are saved and alerts are reported based on the kube-eventer configurations.
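The console check above can also be performed from the command line. A sketch, assuming the default release name from step 4; the DaemonSet name may differ in your cluster:

```shell
# All pods of the DaemonSet should be in the Running state, one per node
kubectl -n kube-system get daemonset ack-node-problem-detector-daemonset
kubectl -n kube-system get pods -o wide | grep ack-node-problem-detector
```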