ack-node-problem-detector detects node anomalies, provides an event center, and supports integration with third-party monitoring platforms for ACK clusters.
Overview
Built on the open-source Node Problem Detector (NPD), this component monitors node health and serves as an event center. It includes:
- kube-event-init: Initializes the Simple Log Service (SLS) resources for the event center during installation, enabling ack-node-problem-detector-daemonset and kube-eventer to store, compute, and analyze event data.
- ack-node-problem-detector-daemonset: Runs a pod on each eligible node to monitor node health and report cluster conditions and events. In this topic, the ack-node-problem-detector image address refers to the image for ack-node-problem-detector-daemonset.
  Note: See the node-problem-detector open-source project.
- kube-eventer: Reports cluster events to the SLS event center by default, providing event storage and analysis with a 90-day retention period, dashboards, alerting, and search. You can also configure kube-eventer to forward events to other systems such as DingTalk or EventBridge. See kube-eventer.
- accel-health-monitor: Runs a pod on each eligible GPU node to monitor GPU device status and report node conditions and Kubernetes events. Image addresses are listed in the release notes below. For permissions and precautions, see GPU fault detection.
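The components above surface node problems as node conditions and Kubernetes events. As a rough illustration of how such conditions might be read back, the following sketch filters the `status.conditions` list of a Node object for unhealthy entries. The function name `problem_conditions`, the sample node, and the `KernelDeadlock` / `DockerHung` values are hypothetical and chosen for illustration only; the conditions actually reported depend on the plugins enabled in your cluster.

```python
def problem_conditions(node: dict) -> list[str]:
    """Return descriptions of unhealthy conditions found on a Node object (as a dict)."""
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        ctype = cond.get("type")
        status = cond.get("status")
        # "Ready" is healthy only when its status is "True".
        # Every other condition type is treated as a problem when its status is "True".
        if ctype == "Ready":
            unhealthy = status != "True"
        else:
            unhealthy = status == "True"
        if unhealthy:
            problems.append(f"{ctype}={status}: {cond.get('reason', '')}")
    return problems


# Hypothetical Node object, trimmed to the fields this sketch reads.
sample_node = {
    "status": {
        "conditions": [
            {"type": "Ready", "status": "True", "reason": "KubeletReady"},
            {"type": "MemoryPressure", "status": "False", "reason": "KubeletHasSufficientMemory"},
            {"type": "KernelDeadlock", "status": "True", "reason": "DockerHung"},
        ]
    }
}

print(problem_conditions(sample_node))
```

In practice the input could come from the JSON output of `kubectl get node <node-name> -o json`, parsed with `json.load`. Note that this sketch does not flag non-Ready conditions whose status is `Unknown`; whether to treat those as problems is a design choice left to the reader.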
Usage
Event monitoring covers installation, use cases, and plugin features.
Release notes
May 2026
| Version | Image address | Release date | Description |
| --- | --- | --- | --- |
| 1.2.35 | | May 18, 2026 | Note: This version is in a canary release. To use this version, submit a ticket. |
February 2026
| Version | Image address | Release date | Description |
| --- | --- | --- | --- |
| 1.2.30 | | February 2, 2026 | Note: This version is in a canary release. To use this version, submit a ticket. |
November 2025
| Version | Image address | Release date | Description |
| --- | --- | --- | --- |
| 1.2.29 | | November 30, 2025 | Note: This version is in a canary release. To use this version, submit a ticket. |
July 2025
| Version | Image address | Release date | Description |
| --- | --- | --- | --- |
| 1.2.27 | | July 24, 2025 | Note: This version is in a canary release. To use this version, submit a ticket. |
June 2025
| Version | Image address | Release date | Description |
| --- | --- | --- | --- |
| 1.2.26 | | June 11, 2025 | Note: This version is in a canary release. To use this version, submit a ticket. |
| 1.2.25 | | June 6, 2025 | Note: This version is in a canary release. To use this version, submit a ticket. |
August 2024
| Version | Image address | Release date | Description |
| --- | --- | --- | --- |
| 1.2.20 | | August 20, 2024 | |
December 2023
| Version | Image address | Release date | Description |
| --- | --- | --- | --- |
| v1.2.18 | | December 18, 2023 | |
August 2023
| Version | Image address | Release date | Description |
| --- | --- | --- | --- |
| v1.2.17 | | August 24, 2023 | |
June 2023
| Version | Image address | Release date | Description |
| --- | --- | --- | --- |
| v1.2.16 | | June 27, 2023 | Added support for configuring component resource specifications on the ACK console Add-ons page. |
| v1.2.15 | | June 6, 2023 | Reduced API server and etcd load from frequent |
February 2023
| Version | Image address | Release date | Description |
| --- | --- | --- | --- |
| v1.2.14 | | February 3, 2023 | |
September 2022
| Version | Image address | Release date | Description |
| --- | --- | --- | --- |
| v1.2.11 | | September 30, 2022 | |
February 2022
| Version | Image address | Release date | Description |
| --- | --- | --- | --- |
| v1.2.9 | | February 22, 2022 | |
January 2022
| Version | Image address | Release date | Description |
| --- | --- | --- | --- |
| v1.2.8 | | January 20, 2022 | |
November 2021
| Version | Image address | Release date | Description |
| --- | --- | --- | --- |
| v1.2.7 | | November 25, 2021 | |
April 2021
| Version | Image address | Release date | Description |
| --- | --- | --- | --- |
| v1.2.5 | | April 25, 2021 | |
July 2020
| Version | Image address | Release date | Description |
| --- | --- | --- | --- |
| v0.6.3-28-160499f | registry.aliyuncs.com/acs/node-problem-detector:v0.6.3-28-160499f | July 27, 2020 | |
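The version strings in these release notes are not uniform: some carry a `v` prefix, some do not, and the oldest entry has a build suffix. When comparing an installed version against the list, a plain string comparison can give the wrong order. The following is a minimal sketch of one way to normalize and sort them; the helper name `version_key` is hypothetical, and the sketch assumes every version starts with dot-separated integers.

```python
def version_key(version: str) -> tuple[int, ...]:
    """Convert 'v1.2.18' or '1.2.35' into a tuple of ints, ignoring any '-suffix'."""
    # Drop a leading "v" and anything after the first "-" (for example a build hash).
    core = version.lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


# A few version strings taken from the tables above.
versions = ["v1.2.18", "1.2.35", "v0.6.3-28-160499f", "1.2.20"]

# Sort numerically by component, newest first.
newest_first = sorted(versions, key=version_key, reverse=True)
print(newest_first)
# ['1.2.35', '1.2.20', 'v1.2.18', 'v0.6.3-28-160499f']
```

Comparing tuples of integers avoids the pitfall where `"1.2.9"` sorts after `"1.2.35"` as a string.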