Alibaba Cloud Linux: Using the SysAK toolkit

Last Updated: Mar 01, 2026

System Analyse Kit (SysAK) provides tools for routine monitoring, online issue diagnostics, and system failure recovery on Alibaba Cloud operating systems. SysAK was built from years of experience operating and maintaining millions of servers.

SysAK runs with minimal overhead. All tools combined use at most 3% CPU, and each individual tool uses at most 1% CPU. SysAK does not add noticeable system load or cause network jitter.

Warning

SysAK hooks kernel functions for runtime diagnostics and monitoring, which may cause system instability. Run diagnostic and monitoring commands during an appropriate maintenance window.

Quick start

After you install SysAK, try these common commands:

sysak list -a                        # List all available tools
sysak loadtask -s                    # Show current system load summary
sysak iofsstat -T 10                 # Monitor disk I/O for 10 seconds
sysak memleak -t slab -c             # Quick check for slab memory leaks
sysak nosched -t 20 -s 30            # Detect scheduling delays > 20ms for 30s
sysak irqoff -t 5 60                 # Detect interrupt-off periods > 5ms for 60s
sysak pingtrace -c <ip>              # Trace network latency to a host
sysak memgraph -g                    # Show memory usage chart
sudo sysak mservice -S               # Start system monitoring
sysak mservice -l                    # View monitoring data interactively

SysAK operates in two modes:

Mode | Behavior
Monitoring | Runs in the background. Collects and tracks system metrics continuously.
Diagnostics | Runs on demand. Analyzes root causes of system issues in real time.

O&M scenarios

SysAK covers three operations and maintenance (O&M) scenarios:

  • Routine monitoring: Monitor system resources, schedule and manage workloads, and control workload resources at fine granularity. Track interrupts and jitter in real time.

  • Issue diagnostics: Diagnose abnormal loads, network jitter, memory leaks, I/O hangs, and performance exceptions online.

  • Failure recovery: Isolate and recover from partial failures such as deadlocks and crashes.

Install SysAK

Prerequisites

  • Linux with kernel version 3.10 or later (Alibaba Cloud Linux 2, Alibaba Cloud Linux 3, Anolis OS 8.4 ANCK, or CentOS 7)

  • x86_64 architecture

Run uname -a to check the kernel version of your Elastic Compute Service (ECS) instance.
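
The prerequisites above can be checked with a short script before you install. This is a minimal sketch for a POSIX shell. The helper name kernel_at_least is hypothetical and not part of SysAK. It parses the major and minor numbers from uname -r, compares them with 3.10, and checks the architecture reported by uname -m.

```shell
# Return 0 if kernel release $1 (for example "5.10.134-16.al8.x86_64")
# is at least version $2.$3. Hypothetical helper, not shipped with SysAK.
kernel_at_least() {
    release=$1 want_major=$2 want_minor=$3
    major=${release%%.*}          # text before the first dot
    rest=${release#*.}            # text after the first dot
    minor=${rest%%[!0-9]*}        # leading digits of the remainder
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "${minor:-0}" -ge "$want_minor" ]; }
}

release=$(uname -r)
arch=$(uname -m)
if kernel_at_least "$release" 3 10 && [ "$arch" = "x86_64" ]; then
    echo "OK: kernel $release on $arch meets the SysAK prerequisites"
else
    echo "WARNING: kernel $release on $arch may not be supported by SysAK"
fi
```

The script only reports the result and does not check the distribution, so confirm the distribution against the list above separately.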

Alibaba Cloud Linux 2

Option 1: Install from the YUM repository (recommended)

  1. Check available versions:

       yum search sysak

  2. Install the latest version:

       sudo yum install -y sysak

Option 2: Install from an RPM package (if the Alibaba Cloud YUM repository is unavailable)

  1. Download the latest RPM package that matches your kernel version. Visit the Open Source Image Site to find RPM packages for your kernel version.

       wget https://mirrors.openanolis.cn/sysak/packages/sysak-1.3.0-2.x86_64.rpm

  2. Install the package:

       sudo rpm -ivh --nodeps sysak-1.3.0-2.x86_64.rpm

Anolis OS 8.4 ANCK

  1. Download the latest RPM package that matches your kernel version. Visit anolis / sysak to find RPM packages for your kernel version.

       wget https://mirrors.openanolis.cn/sysak/packages/sysak-1.3.0-2.x86_64.rpm

  2. Install the package:

       sudo rpm -ivh --nodeps sysak-1.3.0-2.x86_64.rpm

Other Linux distributions (kernel 3.10 or later)

For other Linux distributions such as CentOS 7, build SysAK from source. Only the open source version is available, and compatibility issues may occur. Visit anolis / sysak for build instructions.

Verify the installation

Confirm that SysAK is working:

sysak help

Expected output:

Usage: sysak [ cmd ] [ subcmd [ cmdargs ] ]

List all available tools:

sysak list -a

Common commands

Command | Description
sysak help | Display usage information. Syntax: sysak [cmd] [subcmd [cmdargs]]
sysak list -a | List all supported tools
sysak [subcmd] -h | Display help for a specific tool

The syntax elements are:

  • cmd: Management commands such as list and help.

  • subcmd: Tool-specific feature commands.

  • cmdargs: Arguments for tool commands.

System monitoring

Start monitoring

Start monitoring with one of the following methods:

  • Run the monitoring service directly:

      sudo sysak mservice -S

  • Add SysAK as a persistent system service that starts automatically on boot:

      sudo systemctl enable sysak
      sudo systemctl start sysak

Configure monitoring

The configuration file is located at /usr/local/sysak/sysakmon.conf. After modifying the file, restart the service:

sudo systemctl restart sysak

Configuration options:

Option | Description | Default
server_mode http,local | Monitoring mode. http: expose metrics over HTTP. local: store and view data locally. | -
cron_period 60 | Sampling period in local mode (seconds) | 60
output_file_path | Log storage path in local mode | /usr/local/sysak/tsar.data
mod_xxx on|off | Enable (on) or disable (off) a specific metric | -
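
Before you edit the configuration, it can help to see which options are currently in effect. The following is a minimal sketch, not a SysAK feature. The helper name active_options is hypothetical, and the script assumes that comment lines in sysakmon.conf start with #, which this document does not confirm.

```shell
# Print the active (non-blank, non-comment) lines of a config file.
# Assumption: comment lines start with '#'.
active_options() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}

conf=/usr/local/sysak/sysakmon.conf
if [ -r "$conf" ]; then
    active_options "$conf"
else
    echo "Config file not found or not readable: $conf" >&2
fi
```

Comparing this output before and after an edit is a quick way to confirm that only the intended option changed before you restart the service.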

View monitoring data

Mode | Command | Description
HTTP | curl http://127.0.0.1:9200/metrics/raw/ | System monitoring data
HTTP | curl http://127.0.0.1:9200/metrics/cgroup/raw | Control groups (cgroups) monitoring data
HTTP | curl http://127.0.0.1:9200/metrics/cgroup/$cgroupid/raw | Data for a specific cgroup
Local | sysak mservice -l | Interactive monitoring view

For HTTP mode, replace 127.0.0.1 with the IP address of the monitored ECS instance.
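
When you query several instances, building the URL in one place avoids typos in the host or path. The following is a minimal sketch. The helper name metrics_url and the address 192.0.2.10 are hypothetical, while the port and paths are the ones listed above. The curl call is shown as a comment because it needs a reachable instance running in HTTP mode.

```shell
# Build the SysAK metrics URL for a host. Scope is "system" or "cgroup".
# Hypothetical helper; port 9200 and the paths come from the table above.
metrics_url() {
    host=$1
    scope=${2:-system}
    case "$scope" in
        system) printf 'http://%s:9200/metrics/raw/\n' "$host" ;;
        cgroup) printf 'http://%s:9200/metrics/cgroup/raw\n' "$host" ;;
        *) echo "unknown scope: $scope" >&2; return 1 ;;
    esac
}

# Fetch with a timeout so an unreachable instance fails fast (not run here):
#   curl -sS --fail --max-time 5 "$(metrics_url 192.0.2.10 system)"
```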

Monitoring metrics reference

Metrics marked with a provider name are implemented by SysAK itself or by kernel features of Alibaba Cloud Linux and Anolis OS.

System resources

Computing resources

Category | Metric | Description
CPU | user | User-mode CPU utilization
CPU | sys | System-mode CPU utilization
CPU | hirq | CPU utilization servicing hardware interrupts
CPU | sirq | CPU utilization servicing software interrupts
LOAD | load* | Average system load over the past 1 minute, 5 minutes, or 15 minutes

Memory resources

Category | Metric | Description
Memory | free | Amount of unused memory
Memory | used | Amount of used memory
Memory | buffer | Amount of memory used as buffers
Memory | cache | Amount of memory used as cache
Memory | total | Total memory
Memory | mem.util | Memory usage percentage
Swap | swpin | Number of pages swapped in
Swap | swapout | Number of pages swapped out
Swap | total | Total swap pages
Swap | swap.util | Swap usage percentage

I/O resources

Category | Metric | Description
I/O access | rrqms | Merged read requests per second
I/O access | wrqms | Merged write requests per second
I/O access | rs | Read requests per second
I/O access | ws | Write requests per second
I/O access | rsecs | Sectors read per second
I/O access | wsecs | Sectors written per second
I/O access | rqsize | Average request size
I/O access | qusize | Average request queue length
I/O access | svctm | Average I/O service duration
I/O access | io.util | Percentage of elapsed time during which I/O requests are issued to the device
Disk space | bfree | Unused data blocks
Disk space | bused | Used data blocks
Disk space | btotl | Total data blocks
Disk space | patition.util | Partition usage
Disk space | ifree | Available inodes
Disk space | itotl | Total inodes
Disk space | iutil | Inode usage

Network resources

Category | Metric | Description
Network traffic | bytin | Received bytes
Network traffic | bytout | Sent bytes
Network traffic | pktin | Total received packets
Network traffic | pktout | Total sent packets
TCP | active | Active TCP connections
TCP | pasive | Passive TCP connections
TCP | iseg | Received TCP packets
TCP | outseg | Sent TCP packets
UDP | idgm | Received UDP packets
UDP | odgm | Sent UDP packets

System bottlenecks

I/O bottleneck

Category | Metric | Description
Read/write latency | await | Average I/O waiting time
Read/write latency | rawait | Average I/O read waiting time
Read/write latency | wawait | Average I/O write waiting time

Memory bottleneck

Category | Metric | Description
Cache reclaim and defragmentation | kswapd | Number of times Kernel Swap Daemon (kswapd) reclaims pages
Cache reclaim and defragmentation | pg_kr | Pages asynchronously reclaimed
Cache reclaim and defragmentation | pg_dr | Pages directly reclaimed
Cache reclaim and defragmentation | kcompd | Number of times kcompactd compacts memory
Cache reclaim and defragmentation | dc_all | Number of direct memory compaction events
Cache reclaim and defragmentation | dc_fin | Number of completed direct memory compactions
Cache reclaim and defragmentation | oom | Number of out-of-memory (OOM) errors

Network bottleneck

Category | Metric | Description
Network transmission | pkterr | Error packets
Network transmission | pktdrp | Dropped packets
Network transmission | EstReset | Resets during ESTABLISHED TCP connections
Network transmission | AtmpFail | Failed TCP connection attempts
Network transmission | retran | TCP retransmission rate
Network transmission | noport | Nonexistent UDP ports or addresses
Network transmission | idmerr | Invalid UDP packets

CPU bottleneck

Category | Metric | Description | Provided by
Multitask concurrency | cswch | Context switches on CPU resources | -
Multitask concurrency | proc | Number of fork system calls | -
Ready queue delays | rqslow.dltnum | Times the ready queue wait exceeded the threshold | SysAK
Ready queue delays | rqslow.dlttm | Total latency when ready queue wait exceeded the threshold | SysAK

System software bottleneck

Category | Metric | Description | Provided by
Kernel critical resources | noschd.dltnum | Times CPU system-mode duration exceeded the threshold | SysAK
Kernel critical resources | noschd.dlttm | Total latency when CPU system-mode duration exceeded the threshold | SysAK

System interruptions

Category | Metric | Description | Provided by
Interrupt disable latency | irqoff.dltnum | Times the interrupt disable period exceeded the threshold | SysAK
Interrupt disable latency | irqoff.dlttm | Total latency when the interrupt disable period exceeded the threshold | SysAK

Container metrics

These metrics are collected per container.

Computing resources

Category | Metric | Description | Provided by
CPU | usr/sys/hriq/sirq | CPU utilization in user mode, system mode, hardware interrupts, and software interrupts | -
Load | nrun | Ready tasks in the container | Alibaba Cloud Linux and Anolis OS
Load | nunint | Tasks in uninterruptible sleep (D state) in the container | Alibaba Cloud Linux and Anolis OS
Load | load* | Average container load over the past 1 minute, 5 minutes, or 15 minutes | Alibaba Cloud Linux and Anolis OS

Memory resources

Category | Metric | Description | Provided by
Memory | total/free/used/cache/buffer | Total, available, used, cache, and buffer memory in the container | -
Memory bottleneck | pgfault | Page faults in the container | -
Memory bottleneck | pgmajfault | Page faults due to disk swapping or file mappings | -
Memory bottleneck | mfailcnt | Failed memory allocation requests in the container | -
Memory bottleneck | drgl* | Global memory reclaim latency distribution | Alibaba Cloud Linux and Anolis OS
Memory bottleneck | drml* | Container memory reclaim latency distribution | Alibaba Cloud Linux and Anolis OS
Memory bottleneck | dcl* | Container memory compaction latency distribution | Alibaba Cloud Linux and Anolis OS

I/O resources

Category | Metric | Description | Provided by
I/O | riops | Read operations in the container | -
I/O | wiops | Write operations in the container | -
I/O | rbps | Bytes read from the container | -
I/O | wbps | Bytes written to the container | -
I/O | rwait | Read operation waiting time | Alibaba Cloud Linux and Anolis OS
I/O | wwait | Write operation waiting time | Alibaba Cloud Linux and Anolis OS
I/O | rsrv | Read service time | Alibaba Cloud Linux and Anolis OS
I/O | wsrv | Write service time | Alibaba Cloud Linux and Anolis OS
I/O | rioq | Queued read operations | Alibaba Cloud Linux and Anolis OS
I/O | wioq | Queued write operations | Alibaba Cloud Linux and Anolis OS
I/O | rioqsz | Bytes in queued read operations | Alibaba Cloud Linux and Anolis OS
I/O | wioqsz | Bytes in queued write operations | Alibaba Cloud Linux and Anolis OS
I/O | rarqsz | Average bytes per read operation | Alibaba Cloud Linux and Anolis OS
I/O | warqsz | Average bytes per write operation | Alibaba Cloud Linux and Anolis OS

Hardware resources

Category | Metric | Description
Resource bottleneck | llcref | Last Level Cache (LLC) accesses in the container
Resource bottleneck | llcmis | LLC misses in the container
Resource bottleneck | CPI | Cycles per instruction (CPI) in the container

Diagnostics tools

System scanning

ossre_client

Automatically scans for potential issues across the system.

sysak ossre_client [ -a ] [ -p ] [ -i ]

Option | Description
-a | Scan the entire system
-p | Scan for panic events only
-i | Scan for known issues only

Some options can be used with the ossre server.

CPU and scheduling issues

loadtask

Diagnoses system load by identifying the processes with the highest loads and their causes.

sysak loadtask [ -m maxload ] [ -i interval ] [ -f outfile ] [ -d ] [ -s ] [ -g ]

Option | Description | Default
-m maxload | Load threshold. Triggers automatic diagnostics when breached. If omitted, diagnoses immediately. | Immediate
-i interval | Scan interval in seconds (monitoring mode) | -
-f outfile | Output file path | /var/log/sysak/loadtask.log
-d | In monitoring mode: save all data when values exceed maxload (without -d, SysAK exits after the first detection) | Off
-s | Show load summary in the console | Off
-g | Generate a flame graph for the entire system | Off
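
A load threshold for -m is easier to reason about when it is tied to the number of CPUs. The following is a minimal sketch. The helper name load_threshold and the default factor of 2 are arbitrary assumptions for illustration, not SysAK recommendations, and the 5-second interval in the commented command is also an example value.

```shell
# Derive a load threshold as CPU count times a factor.
# Hypothetical helper; the default factor of 2 is an arbitrary choice.
load_threshold() {
    cpus=$1
    factor=${2:-2}
    echo $((cpus * factor))
}

cpus=$(getconf _NPROCESSORS_ONLN)
echo "Suggested -m value for $cpus CPUs: $(load_threshold "$cpus")"

# Monitoring-mode invocation built from the options above (not run here):
#   sysak loadtask -m "$(load_threshold "$cpus")" -i 5 -d -f /var/log/sysak/loadtask.log
```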

nosched

Diagnoses tasks that cannot be scheduled in a timely manner because the CPU has run in kernel mode for an extended period.

sysak nosched [--help] [-t THRESH(ms)] [-f LOGFILE] [-s duration(s)]

Option | Description | Default
-t THRESH | Threshold for unscheduled time (milliseconds). Events exceeding this value are recorded. | 10
-f LOGFILE | Log file path | /var/log/sysak/nosched/nosched.log
-s duration | Program run duration (seconds). Runs indefinitely if omitted. | Indefinite

irqoff

Diagnoses interrupts that are disabled for an extended period.

sysak irqoff [--help] [-t THRESH(ms)] [-f LOGFILE] [duration(s)]

Option | Description | Default
-t THRESH | Threshold for interrupt-disabled time (milliseconds). Events exceeding this value are recorded. | 10
-f LOGFILE | Log file path | /var/log/sysak/irqoff/irqoff.log
duration | Program run duration (seconds). Runs indefinitely if omitted. | Indefinite
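
Because irqoff writes events to a log file, a bounded run followed by a look at the log is a common pattern. The following is a minimal sketch. The helper name show_recent is hypothetical, and because this document does not describe the log format, the helper only prints the last lines without parsing them.

```shell
# Print the last N lines of a log file, or a note if nothing was logged.
# Hypothetical helper; N defaults to 20.
show_recent() {
    logfile=$1
    count=${2:-20}
    if [ -s "$logfile" ]; then
        tail -n "$count" "$logfile"
    else
        echo "No events recorded in $logfile"
    fi
}

# Typical sequence (not run here): record for 60 seconds, then inspect the log.
#   sudo sysak irqoff -t 5 -f /var/log/sysak/irqoff/irqoff.log 60
#   show_recent /var/log/sysak/irqoff/irqoff.log 10
```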

runqslower

Diagnoses high task scheduling latency.

sysak runqslower [-s SPAN] [-t TID] [-f LOGFILE] [-P] [THRESH]

Option | Description | Default
-s SPAN | Program run duration (seconds). Runs indefinitely if omitted. | Indefinite
THRESH | Threshold for preemption time (milliseconds). Events exceeding this value are recorded. | 50
-f LOGFILE | Log file path | /var/log/sysak/runqslow/runqslow.log
-t TID | Filter to a specific thread ID. Monitors all threads if omitted. | All threads
-P | Record the name and TID of the previously preempted task | Off

cpuirq

Shows interrupt binding and execution status for a CPU.

sysak cpuirq [-c cpu -b ] [ -t [ -i interval ] ]

Option | Description
-c cpu | Specify a CPU
-b | Show interrupt binding information for the specified CPU
-t | Show the request with the most interrupts over a time period
-i interval | Data collection interval

softirq

Records the running status (count or rate) of soft interrupts in the system.

sysak softirq [ option ] [ args ]

Option | Description
-s | Source file containing initial data
-r | Output file

Memory issues

memleak

Checks for kernel memory leaks (slab, vmalloc, and buddy allocator) and identifies where leaks occur.

sysak memleak [-t type] [-i interval] [-c]

Option | Description | Default
-t type | Memory leak type: slab, vmalloc, or page | -
-i interval | Diagnostic period (seconds) | 300
-c | Quick diagnostics mode. Determines whether memory is leaked without identifying exact locations. | Off
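
A two-step workflow follows from these options: run a quick check with -c first, then a full diagnostic only if a leak is suspected. The following is a minimal sketch. The helper name memleak_args and the mode names quick and full are hypothetical. The helper validates the leak type and prints the matching arguments.

```shell
# Print memleak arguments for a leak type and a mode.
# Hypothetical helper; "quick" uses -c, "full" uses the default 300-second period.
memleak_args() {
    kind=$1
    mode=${2:-quick}
    case "$kind" in
        slab|vmalloc|page) ;;
        *) echo "unsupported type: $kind" >&2; return 1 ;;
    esac
    case "$mode" in
        quick) echo "-t $kind -c" ;;
        full)  echo "-t $kind -i 300" ;;
        *) echo "unsupported mode: $mode" >&2; return 1 ;;
    esac
}

# Usage (not run here). The expansion is left unquoted on purpose so that
# the shell splits it into separate arguments:
#   sysak memleak $(memleak_args slab quick)
#   sysak memleak $(memleak_args slab full)
```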

mmaptrace

Identifies user-mode memory leak locations and provides call stacks for memory allocation requests.

The mmaptrace tool requires a separate component download. Run sysak list -a to check whether this tool is installed.

sysak mmaptrace [ option ] [ args ]

Option | Description
-p <pid> | Monitor memory allocation for a specific process
-l | Monitor memory sizes requested by malloc and mmap
-s | Show the call stack for user-mode memory requests
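
Checking for an optional tool such as mmaptrace can be scripted by searching the tool listing. The following is a minimal sketch. The helper name tool_listed is hypothetical, and it assumes that sysak list -a prints each tool name as a separate word, which this document does not confirm.

```shell
# Read a tool listing on stdin and succeed if the named tool appears as a whole word.
# Hypothetical helper; the listing format is an assumption.
tool_listed() {
    grep -qw -- "$1"
}

# Usage (not run here):
#   if sysak list -a | tool_listed mmaptrace; then
#       echo "mmaptrace is installed"
#   else
#       echo "mmaptrace is not installed; download the component first"
#   fi
```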

memgraph

Analyzes and visualizes memory usage.

sysak memgraph [ option ]

Option | Description
-g | Show the memory usage chart
-f | Show page cache details
-a | Show anonymous memory details
-k | Check for memory leaks
-l | Show memory usage by system threads
-c | Show memory usage by system cgroups

I/O issues

iosdiag

Diagnoses I/O latency and I/O hang conditions.

sysak iosdiag [ options ] subcmd [ cmdargs ]

Options:

Option | Description
-u url | Upload diagnostic logs to the specified URL using curl. Logs are not uploaded if omitted.
-s latency|hangdetect | Stop diagnostics for the specified subcommand

Subcommands:

Subcommand | Description
latency | Enable I/O latency diagnostics
hangdetect | Enable I/O hang diagnostics
-h | Show supported parameters (use after a subcommand)

iofsstat

Collects disk I/O information at process and file granularity.

sysak iofsstat [-h] [-T TIMEOUT] [-t TOP] [-u UTIL_THRESH] [-b BW_THRESH] [-i IOPS_THRESH] [-c CYCLE] [-d DEVICE] [-p PID] [-j] [-f]

Option | Description
-T TIMEOUT | Run duration (seconds)
-t TOP | Number of top I/O-consuming disks to display
-u UTIL_THRESH | I/O utilization threshold. Disks below this threshold are ignored.
-b BW_THRESH | Bandwidth threshold. Disks below this threshold are ignored.
-i IOPS_THRESH | IOPS threshold. Disks below this threshold are ignored.
-c CYCLE | Refresh interval (seconds)
-d DEVICE | Disk name to monitor
-p PID | Process ID to monitor
-j, --json | Output in JSON format
-f, --fs | Monitor and report partition information

Network issues

pingtrace

Detects and traces network latency.

sysak pingtrace [ options ]

Option | Description | Default
-v, --version | Show the version number | -
-h, --help | Show help information | -
-s, --server | Run in server mode | -
-c, --client ip | Run in client mode | -
-C, --count UINT | Number of probe packets | Unlimited
-i <interval_us> | Packet send interval (microseconds) | -
-t <UINT> | Program run duration (seconds) | -
-m, --maxdelay us | Ping latency threshold. Only packets exceeding this latency are recorded. | 0
-b <INT=556> | Probe packet size in bytes. Must be greater than 144. | 556
--log TEXT=./pingtrace.log | Log file name | ./pingtrace.log
--logsize INT | Maximum log file size | -
--logbackup INT=3 | Maximum number of log file backups | 3
--mode auto/pingpong/compact | PingTrace running mode | -
-o, --output image/json/log/imagelog | Output format | -
-n, --namespace | Check net namespace information | -
--nslocal | Indicate that client and server run on the same host (prevents redundant data when checking namespaces) | -
--userid UINT | Assign different user IDs per host to help resolve time desynchronization | -
--debug | Show debugging information (such as libbpf data) | -
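
The send interval is given in microseconds, and the us argument name of --maxdelay suggests the same unit, so values that you think of in milliseconds need converting. The following is a minimal sketch. The helper name ms_to_us is hypothetical, and treating --maxdelay as microseconds is an inference from its argument name rather than a documented fact.

```shell
# Convert whole milliseconds to microseconds for pingtrace arguments.
# Hypothetical helper, not part of SysAK.
ms_to_us() {
    echo $(($1 * 1000))
}

# Usage (not run here). Start the server on the target host first:
#   sysak pingtrace -s
# Then, on the client, send 100 probes and record only those slower than 2 ms:
#   sysak pingtrace -c <ip> -C 100 -m "$(ms_to_us 2)"
```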

skcheck

Checks for TCP and socket leaks.

sysak skcheck [ options ] [ cmdargs ]

Option | Description | Default
-s | Enable leak detection | -
-i | Socket enable threshold | 2000
-l | Socket disable threshold | 500

Performance analysis

numa_access

Shows process information for a specified PID and Non-Uniform Memory Access (NUMA) information for a CPU.

sysak numa_access [ options ] [ cmdargs ]

Option | Description
-p <pid> | Specify a process ID
-c <cpu> | Specify a CPU
-i <time> | Set a display interval

hw_event

Shows container hardware events.

The hw_event tool requires a separate component download. Run sysak list -a to check whether this tool is installed.

sysak hw_event [ options ] [ cmdargs ]

Option | Description | Default
-c <name> | Name of a container. If omitted, hardware events for all containers are displayed. | All containers
-s <time> | Run duration (seconds) | 5

syscall_slow

Analyzes lock contention among application threads when system call response times are slow.

sysak syscall_slow [-t THRESH(ms)] [-n sys_NR] <[-c COMM] [-p tid]> [-f LOGFILE] [duration(s)]

Option | Description | Default
-t THRESH | System call response time threshold (milliseconds). Events exceeding this value are recorded. | 10
-n sys_NR | Exclude specified system call IDs from tracing. Traces all system calls if omitted. | All
-c COMM / -p tid | Specify a task name (-c) or a thread ID (-p). One of the two is required, and they cannot be used together. | Required
-f LOGFILE | Log file path | /var/log/sysak/syscall_slow/syscall_slow.log
duration | Program run duration (seconds). Runs indefinitely if omitted. | Indefinite

Lock contention

ulockcheck

Analyzes lock contention among application threads.

The ulockcheck tool requires a separate component download. Run sysak list -a to check whether this tool is installed.

sysak ulockcheck -p <pid> | -s <thread pid> | -a | -t <0|1> | -d

Option | Description
-p <pid> | Monitor lock contention among threads of a specified process
-a | Show the current lock owner and the top five lock requesters
-s <thread pid> | Show lock contention status for a monitored thread
-t <0|1> | Enable output. If a thread waits for a lock for more than 100 milliseconds, display the user-mode call stack.
-d | Disable monitoring

Virtualization

kvmexittime

Traces and diagnoses VM-exit events.

sysak kvmexittime [--help] [-p PID] [-t TID] [interval]

Option | Description
-p <PID> | Specify a process ID
-t <TID> | Specify a thread ID
interval | Interval for tracing and analyzing VM-exit events
--help | Show help information