Container Service for Kubernetes: ack-node-problem-detector

Last Updated: Jun 17, 2026

ack-node-problem-detector detects node anomalies, provides an event center, and supports integration with third-party monitoring platforms for ACK clusters.

Overview

Built on the open-source Node Problem Detector (NPD), this component monitors node health and serves as an event center. It includes:

  • kube-event-init: Initializes the Simple Log Service (SLS) resources for the event center during installation, enabling ack-node-problem-detector-daemonset and kube-eventer to store, compute, and analyze event data.

  • ack-node-problem-detector-daemonset: Runs a pod on each eligible node to monitor node health and report cluster conditions and events. In this topic, the ack-node-problem-detector image address refers to the image for ack-node-problem-detector-daemonset.

    Note

    See the node-problem-detector open-source project.

  • kube-eventer: Reports cluster events to the SLS event center by default, providing event storage and analysis with a 90-day retention period, dashboards, alerting, and search. You can also configure kube-eventer to forward events to other systems such as DingTalk or EventBridge. See kube-eventer.

  • accel-health-monitor: Runs a pod on each eligible GPU node to monitor GPU device status and report node conditions and Kubernetes events. Image addresses are listed in the release notes below. For permissions and precautions, see GPU fault detection.
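To make "node conditions" concrete, the following minimal sketch scans a Node object, in the JSON shape returned by `kubectl get node <name> -o json`, and lists the conditions that signal a problem. It assumes the usual Kubernetes convention that `Ready` is healthy when its status is `True`, while every other condition type is healthy when its status is `False`. The sample data, including the `KernelDeadlock` condition and its reason and message, is illustrative and not taken from a real cluster.

```python
def unhealthy_conditions(node):
    """Return (type, reason, message) for each condition that signals a problem."""
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        ctype = cond.get("type")
        status = cond.get("status")
        # Assumption: Ready is healthy when "True"; any other type is healthy when "False".
        # A status of "Unknown" is therefore treated as a problem for every type.
        healthy = (status == "True") if ctype == "Ready" else (status == "False")
        if not healthy:
            problems.append((ctype, cond.get("reason", ""), cond.get("message", "")))
    return problems


# Illustrative sample only; real nodes carry more conditions and fields.
sample_node = {
    "status": {
        "conditions": [
            {"type": "Ready", "status": "True",
             "reason": "KubeletReady", "message": "kubelet is posting ready status"},
            {"type": "MemoryPressure", "status": "False",
             "reason": "KubeletHasSufficientMemory", "message": ""},
            {"type": "KernelDeadlock", "status": "True",
             "reason": "DockerHung", "message": "task blocked for more than 120 seconds"},
        ]
    }
}

for ctype, reason, message in unhealthy_conditions(sample_node):
    print(f"{ctype}: {reason} ({message})")
```

With the sample above, only the `KernelDeadlock` entry is reported. The function works on plain dictionaries, so it can be fed parsed `kubectl` output without a Kubernetes client library.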

Usage

For installation instructions, use cases, and plug-in features, see Event monitoring.

Release notes
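Many image addresses in these release notes contain the placeholder `__ACK_REGION_ID__`. Assuming it stands for the region ID of the cluster (the value `cn-hangzhou` below is only an illustration), a minimal sketch of the substitution might look like this:

```python
PLACEHOLDER = "__ACK_REGION_ID__"


def resolve_image(address, region_id):
    """Replace the region placeholder in an image address.

    Assumption: the placeholder stands for the cluster's region ID.
    Addresses without the placeholder are returned unchanged.
    """
    if not region_id:
        raise ValueError("region_id must be a non-empty string")
    return address.replace(PLACEHOLDER, region_id)


# Address copied from the release notes; the region ID is a hypothetical example.
print(resolve_image(
    "registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.14-4b806cb-aliyun",
    "cn-hangzhou",
))
# prints registry-vpc.cn-hangzhou.aliyuncs.com/acs/kube-eventer:v1.2.14-4b806cb-aliyun
```

The sketch handles only the `__ACK_REGION_ID__` form. Older entries that use a different placeholder syntax are left untouched.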

May 2026

Version | Image address | Release date | Description

1.2.35

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.14-4b806cb-aliyun

  • node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/ack-node-problem-detector:v0.8.17-952071f-aliyun

  • accel-health-monitor: registry-__ACK_REGION_ID__-vpc.ack.aliyuncs.com/acs/accel-health-monitor:v0.5.5-ad7ad729-aliyun

May 18, 2026

Note

This version is in a canary release. To use this version, submit a ticket.

February 2026

Version | Image address | Release date | Description

1.2.30

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.14-4b806cb-aliyun

  • node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/ack-node-problem-detector:v0.8.17-952071f-aliyun

  • accel-health-monitor: registry-__ACK_REGION_ID__-vpc.ack.aliyuncs.com/acs/accel-health-monitor:v0.5.4-4c80dfa0-aliyun

February 2, 2026

Note

This version is in a canary release. To use this version, submit a ticket.

  • Applied security hardening to ack-node-problem-detector-daemonset.

  • Applied security hardening to kube-eventer.

  • Added a console option to enable or disable GPU fault isolation file generation.

  • Changed the fencing strategy for some GPU detection items. See GPU fault detection.

  • Added support for eRDMA detection.
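Between versions 1.2.30 and 1.2.35, the only listed image tag that changes is the one for accel-health-monitor. When comparing entries like these, it can help to split a tag into its version and commit parts. The sketch below assumes a tag shape of `v<version>-<commit>-aliyun`, which is inferred from the tags in these release notes and not from a documented scheme. The function name `parse_image` is hypothetical.

```python
import re

# Assumed tag shape, inferred from the tags listed in these release notes:
# optional "v", dotted numeric version, "-", hex commit ID, optional "-aliyun".
TAG_RE = re.compile(
    r"^v?(?P<version>\d+(?:\.\d+)+)-(?P<commit>[0-9a-f]{7,})(?:-aliyun)?$"
)


def parse_image(ref):
    """Split 'repository:tag' into (repository, version tuple, commit ID)."""
    repo, _, tag = ref.rpartition(":")
    match = TAG_RE.match(tag)
    if not repo or match is None:
        # Tags that deviate from the assumed shape are rejected, not guessed at.
        raise ValueError(f"unrecognized image reference: {ref!r}")
    version = tuple(int(part) for part in match.group("version").split("."))
    return repo, version, match.group("commit")


old = parse_image(
    "registry-__ACK_REGION_ID__-vpc.ack.aliyuncs.com/acs/accel-health-monitor:v0.5.4-4c80dfa0-aliyun"
)
new = parse_image(
    "registry-__ACK_REGION_ID__-vpc.ack.aliyuncs.com/acs/accel-health-monitor:v0.5.5-ad7ad729-aliyun"
)

# Version tuples compare element by element, so (0, 5, 5) > (0, 5, 4).
print(new[1] > old[1])  # prints True
```

Comparing integer tuples avoids the pitfalls of string comparison, where `"1.2.9"` would sort after `"1.2.14"`. A tag such as `v0.6.3-28-160499f`, which has an extra numeric segment, does not match the assumed shape and raises `ValueError`.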

November 2025

Version | Image address | Release date | Description

1.2.29

  • accel-health-monitor: registry-__ACK_REGION_ID__-vpc.ack.aliyuncs.com/acs/accel-health-monitor:v0.5.3-bafb2ba5-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.14-315a7cb-aliyun

November 30, 2025

Note

This version is in a canary release. To use this version, submit a ticket.

  • Decoupled the GPU detection plug-in from ack-node-problem-detector-daemonset into a separate DaemonSet, ack-accel-health-monitor. For required permissions, see GPU fault detection.

  • Added software and device detection to the GPU plug-in, including nvidia-persistenced, nvidia-fabricmanager, and NVLink.

  • Fixed occasional GPU plug-in restarts caused by JSON serialization failure.

  • Enabled kube-eventer to report data to SLS over HTTPS.

July 2025

Version | Image address | Release date | Description

1.2.27

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.13-b4a3960-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.9-2b115d6-aliyun

July 24, 2025

Note

This version is in a canary release. To use this version, submit a ticket.

June 2025

Version | Image address | Release date | Description

1.2.26

  • ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.16-8d2193b-aliyun

  • npd-gpu: registry-__ACK_REGION_ID__-vpc.ack.aliyuncs.com/acs/npd-gpu-plugin:v0.4.1-7359b830-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.12-c7c1896-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.8-e43647f-aliyun

June 11, 2025

Note

This version is in a canary release. To use this version, submit a ticket.

  • Fixed an issue where the NvidiaDeviceRecovered event was not correctly emitted in some GPU self-healing scenarios.

  • Optimized the image size of ack-node-problem-detector.

Version | Image address | Release date | Description

1.2.25

  • ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.16-8ed7053-aliyun

  • npd-gpu: registry-__ACK_REGION_ID__-vpc.ack.aliyuncs.com/acs/npd-gpu-plugin:v0.4.0-e434dc36-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.12-c7c1896-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.8-e43647f-aliyun

June 6, 2025

Note

This version is in a canary release. To use this version, submit a ticket.

  • Added the npd-gpu container for GPU fault detection.

  • Added support for fencing specific GPUs when a fault is detected.

  • Added detection items including NvidiaXID44Error, NvidiaXID61Error, NvidiaXID62Error, and NvidiaXID69Error. See GPU fault detection and automatic fencing.

  • Added support for configuring which GPU detection items to enable by using ack-node-problem-detector-config.

  • Optimized the image size of ack-node-problem-detector.

August 2024

Version | Image address | Release date | Description

1.2.20

  • ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.14-3c6002c-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.11-0620284-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.8-e43647f-aliyun

August 20, 2024

  • Added support for GPU fault inspection on ECS nodes.

  • Upgraded kube-eventer to resolve performance bottlenecks when reporting large volumes of events.

  • Upgraded kube-eventer to support the V4 signature algorithm for SLS data transmission.

  • Added a parameter to configure the local port (20256 or 20257, disabled by default) for the ack-node-problem-detector DaemonSet pod.
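The entry above does not say what the local port serves. If the port has been enabled, one way to confirm that something is listening is a plain TCP connection attempt from the node. The sketch below uses only the Python standard library. The host `127.0.0.1`, the one-second timeout, and the function name `port_is_open` are assumptions made for illustration.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port numbers taken from the release note; both are disabled by default.
    for port in (20256, 20257):
        state = "open" if port_is_open("127.0.0.1", port) else "closed or unreachable"
        print(f"127.0.0.1:{port} is {state}")
```

A successful connection shows only that a listener exists, not which service answers, so treat the result as a first check and not as a health signal.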

December 2023

Version | Image address | Release date | Description

v1.2.18

  • ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.13-003ac31-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-27a468a-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun

December 18, 2023

  • Fixed false positive podOOMKilling events caused by cached historical kernel logs.

  • Enabled inheritance of custom component parameters when upgrading from older versions of ack-node-problem-detector.

August 2023

Version | Image address | Release date | Description

v1.2.17

  • ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.12-bf8aff8-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-27a468a-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun

August 24, 2023

  • Added support for updating SLS Project and Logstore configurations from the Add-ons page in the ACK console.

  • Added support for custom labels, such as a cluster name, when sending log data to SLS. These labels appear by default in the ACK event center.

June 2023

Version | Image address | Release date | Description

v1.2.16

  • ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.12-bf8aff8-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-019546c-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun

June 27, 2023

Added support for configuring component resource specifications on the ACK console Add-ons page.

v1.2.15

  • ack-node-problem-detector: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/node-problem-detector:v0.8.12-bf8aff8-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.8-019546c-aliyun

  • kube-event-init: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun

June 6, 2023

Reduced API server and etcd load from frequent podOOMKilling events in large clusters.

February 2023

Version | Image address | Release date | Description

v1.2.14

  • ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.11-edc7907-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.6-bbf76f7-aliyun

  • kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun

February 3, 2023

  • Improved the image pull speed for the component.

  • Added support for ACK Edge clusters.

September 2022

Version | Image address | Release date | Description

v1.2.11

  • ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.11-edc7907-aliyun

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer:v1.2.6-bbf76f7-aliyun

  • kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.7-48a2acc-aliyun

September 30, 2022

  • Optimized ack-node-problem-detector inspection logic to reduce load on core cluster components.

  • Applied image security hardening.

February 2022

Version | Image address | Release date | Description

v1.2.9

  • ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.10-e0ff7d2

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.6-f0efecf-aliyun

  • kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.6-a92aba6-aliyun

February 22, 2022

  • Added support for kernel inspection.

  • Applied security hardening.

January 2022

Version | Image address | Release date | Description

v1.2.8

  • ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.10-e0ff7d2

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.5-cc7ec54-aliyun

  • kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.6-a92aba6-aliyun

January 20, 2022

  • Added compatibility with different containerd modes.

  • Optimized QoS resource limits to improve component stability.

November 2021

Version | Image address | Release date | Description

v1.2.7

  • ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.8.10-e0ff7d2

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.5-cc7ec54-aliyun

  • kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:v1.6-a92aba6-aliyun

November 25, 2021

  • Added compatibility with system services on operating systems such as Alibaba Cloud Linux 3 and CentOS 8.

  • Added support for the ARM architecture.

April 2021

Version | Image address | Release date | Description

v1.2.5

  • ack-node-problem-detector: registry.aliyuncs.com/acs/node-problem-detector:v0.6.3-28-160499f

  • kube-eventer: registry-vpc.__ACK_REGION_ID__.aliyuncs.com/acs/kube-eventer-amd64:v1.2.4-0f5aaee-aliyun

  • kube-event-init: registry.{ .Values.controller.regionId }.aliyuncs.com/acs/kube-eventer-init:1.5-5e0e7c1-aliyun

April 25, 2021

July 2020

Version | Image address | Release date | Description

v0.6.3-28-160499f

registry.aliyuncs.com/acs/node-problem-detector:v0.6.3-28-160499f

July 27, 2020

  • Enhanced OOMKilling event messages to include the pod name, namespace, and UID.

  • Improved the execution efficiency of the check_fd plug-in.

  • Improved event notifications for node PID usage thresholds.

  • Upgraded the network problem detection plug-in.

  • Added a plug-in to trigger alerts for node system disk inode usage.