When I/O requests stall for a long time, the system can become unstable or go down before you have gathered enough information to diagnose the cause. Alibaba Cloud Linux 2 and Alibaba Cloud Linux 3 extend core kernel data structures to expose per-device hang counts, per-request details, and per-process wait information, all with low overhead. This topic describes these interfaces and shows how to use them.
How it works
The kernel tracks every I/O request from submission to completion. When a request exceeds the configured hang threshold, the kernel marks it as a hang. Query the interfaces in the following order to diagnose an I/O hang:
1. Check /sys/block/<device>/hang to confirm whether I/O hangs have occurred on the device.
2. Check /sys/kernel/debug/block/<device>/rq_hang to get details about each hanging request.
3. Check /proc/<pid>/wait_res or /proc/<pid>/task/<tid>/wait_res to identify which process or thread is blocked and what it is waiting for.
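The three-step order above can be scripted so that the later steps run only when the first one finds something. The following is a minimal sketch, not an Alibaba Cloud tool: the function name diagnose_io_hang and the SYSFS_ROOT and PROC_ROOT overrides (defaulting to /sys and /proc) are hypothetical, added so the logic can be tried against copies of the files. It assumes it runs as root and that debugfs is mounted at /sys/kernel/debug.

```shell
# Sketch only: diagnose_io_hang, SYSFS_ROOT, and PROC_ROOT are hypothetical
# names. The overrides default to the real /sys and /proc; rq_hang lives in
# debugfs under $SYSFS_ROOT/kernel/debug.
diagnose_io_hang() {
    local dev="$1" pid="$2"                  # e.g. vdb, and an optional PID
    local sys="${SYSFS_ROOT:-/sys}" proc="${PROC_ROOT:-/proc}"
    local hang="$sys/block/$dev/hang" reads writes

    # Step 1: hang counters, format "<read_count> <write_count>"
    if [ ! -r "$hang" ]; then
        echo "cannot read $hang" >&2
        return 2
    fi
    read -r reads writes < "$hang"
    echo "step1 $dev reads=$reads writes=$writes"
    if [ "$reads" -eq 0 ] && [ "$writes" -eq 0 ]; then
        echo "no I/O hangs recorded on $dev"
        return 0
    fi

    # Step 2: details of each hanging request
    echo "step2 $sys/kernel/debug/block/$dev/rq_hang"
    cat "$sys/kernel/debug/block/$dev/rq_hang"

    # Step 3 (optional): what the given process is waiting for
    if [ -n "$pid" ]; then
        echo "step3 pid=$pid wait_res: $(cat "$proc/$pid/wait_res")"
    fi
}
```

For example, diagnose_io_hang vdb 577 stops after step 1 if vdb reports 0 0; otherwise it also prints the rq_hang entries and the wait_res line of process 577.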
Interfaces
| Interface | Description | Unit |
|---|---|---|
| /sys/block/<device>/queue/hang_threshold | Query or set the threshold for I/O hangs. Default: 5000. | ms |
| /sys/block/<device>/hang | Query the number of read and write operations that exceeded the hang threshold. Output format: <read_count> <write_count>. | N/A |
| /sys/kernel/debug/block/<device>/rq_hang | Query details about hanging I/O requests. | N/A |
| /proc/<pid>/wait_res | Query the resources a process is waiting for. | N/A |
| /proc/<pid>/task/<tid>/wait_res | Query the resources a thread is waiting for. | N/A |
The following table describes the variables in the interface paths.
| Variable | Description |
|---|---|
| <device> | Name of the block device, such as vdb |
| <pid> | Process ID |
| <tid> | Thread ID |
Set the hang threshold
The default hang threshold is 5,000 ms. Adjust it to match your workload's expected I/O latency.
1. Write the new threshold to the interface. This example sets the threshold on the vdb disk to 10,000 ms.

```shell
echo 10000 > /sys/block/vdb/queue/hang_threshold
```

2. Verify the change.

```shell
cat /sys/block/vdb/queue/hang_threshold
```

Expected output:

```
10000
```
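The write-then-verify steps above can be wrapped in one helper so that a mistyped value is caught before it reaches sysfs. This is a minimal sketch under stated assumptions: set_hang_threshold and the SYSFS_ROOT override (default /sys) are hypothetical names, and rejecting zero and non-numeric values is a conservative choice of this sketch, not documented kernel behavior.

```shell
# Sketch only: set_hang_threshold and SYSFS_ROOT are hypothetical names.
# Writing the real /sys file requires root.
set_hang_threshold() {
    local dev="$1" ms="$2"
    local f="${SYSFS_ROOT:-/sys}/block/$dev/queue/hang_threshold" now

    # Assumption of this sketch: only positive integers (ms) are accepted
    case "$ms" in
        ''|*[!0-9]*) echo "invalid threshold: '$ms'" >&2; return 2 ;;
    esac
    if [ "$ms" -eq 0 ]; then
        echo "invalid threshold: 0" >&2
        return 2
    fi

    # Write the new value, then read it back to verify
    echo "$ms" > "$f" || return 1
    now=$(cat "$f")
    if [ "$now" != "$ms" ]; then
        echo "$dev: wrote $ms but read back $now" >&2
        return 1
    fi
    echo "hang_threshold on $dev = $now ms"
}
```

Run as root, set_hang_threshold vdb 10000 prints hang_threshold on vdb = 10000 ms when the value read back matches.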
Check the hang count
Read the hang interface to see how many read and write operations have exceeded the threshold on a device. This example uses the vdb disk.
```shell
cat /sys/block/vdb/hang
```

Sample output:

```
0 1
```

The output contains two values separated by whitespace:
| Field | Description |
|---|---|
| Left value | Number of read operations that caused I/O hangs |
| Right value | Number of write operations that caused I/O hangs |
In this output, 0 read hangs and 1 write hang have been recorded.
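To check every disk at once instead of one device at a time, a loop over /sys/block/*/hang can print only the devices whose counters are non-zero. A minimal sketch, assuming the two-field <read_count> <write_count> format described above; scan_hang_counts and the SYSFS_ROOT override (default /sys) are hypothetical names.

```shell
# Sketch only: scan_hang_counts and SYSFS_ROOT are hypothetical names.
scan_hang_counts() {
    local sys="${SYSFS_ROOT:-/sys}" f dev reads writes
    for f in "$sys"/block/*/hang; do
        [ -r "$f" ] || continue              # glob matched nothing, or unreadable
        dev=$(basename "$(dirname "$f")")    # .../block/<device>/hang -> <device>
        read -r reads writes < "$f"
        # Print only devices with at least one read or write hang
        if [ "$((reads + writes))" -gt 0 ]; then
            echo "$dev reads=$reads writes=$writes"
        fi
    done
}
```

With the sample above, the vdb line would read vdb reads=0 writes=1, and devices reporting 0 0 are omitted.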
Inspect hanging requests
Read the rq_hang interface to get details about each hanging request. This interface is provided through debugfs, so debugfs must be mounted at /sys/kernel/debug. This example uses the vdb disk.
```shell
cat /sys/kernel/debug/block/vdb/rq_hang
```

Sample output:

```
ffff9e50162fc600 {.op=WRITE, .cmd_flags=SYNC, .rq_flags=STARTED|ELVPRIV|IO_STAT|STATS, .state=in_flight, .tag=118, .internal_tag=67, .start_time_ns=1260981417094, .io_start_time_ns=1260981436160, .current_time=1268458297417, .bio = ffff9e4907c31c00, .bio_pages = { ffffc85960686740 }, .bio = ffff9e4907c31500, .bio_pages = { ffffc85960639000 }, .bio = ffff9e4907c30300, .bio_pages = { ffffc85960651700 }, .bio = ffff9e4907c31900, .bio_pages = { ffffc85960608b00 }}
```

Each entry represents one hanging I/O request. The key fields are:
| Field | Description |
|---|---|
| .op | I/O operation type, such as READ or WRITE |
| .cmd_flags | Command flags of the request, such as SYNC |
| .rq_flags | Request state flags, such as STARTED and IO_STAT |
| .state | Current state of the request in the block layer, such as in_flight |
| .tag | Hardware queue (driver) tag assigned to the request |
| .internal_tag | Internal tag assigned to the request by the I/O scheduler |
| .start_time_ns | Time at which the request was submitted to the block layer (nanoseconds) |
| .io_start_time_ns | Time at which the request was issued to the device driver (nanoseconds). A non-zero value means the request has been dispatched but was not completed in a timely manner, which indicates prolonged I/O time. |
| .current_time | Time at which the snapshot was taken (nanoseconds). Subtract .io_start_time_ns from .current_time to get the elapsed hang duration. |
| .bio | Address of an associated bio structure. A request can reference several bios. |
| .bio_pages | Addresses of the pages referenced by the preceding bio |
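As a worked example with the sample entry, .current_time minus .io_start_time_ns is 1268458297417 - 1260981436160 = 7476861257 ns, or about 7477 ms, which exceeds the default 5000 ms threshold. The following sketch applies the same subtraction to every entry. It assumes one entry per line, as in the sample; rq_hang_durations is a hypothetical name, and falling back to .start_time_ns when .io_start_time_ns is 0 is a choice of this sketch rather than documented behavior.

```shell
# Sketch only: rq_hang_durations is a hypothetical name. Reads rq_hang
# entries (assumed one per line) on stdin and prints the hang time in ms.
rq_hang_durations() {
    awk '
    # Return the value of ".key=" in line s, or "" if the key is absent
    function field(s, key) {
        if (match(s, "[.]" key "=[^,} ]*") == 0) return ""
        return substr(s, RSTART + length(key) + 2, RLENGTH - length(key) - 2)
    }
    NF > 0 {
        cur = field($0, "current_time") + 0
        io  = field($0, "io_start_time_ns") + 0
        # Assumption of this sketch: not yet dispatched, so use submit time
        if (io == 0) io = field($0, "start_time_ns") + 0
        printf "%s op=%s hang_ms=%.0f\n", $1, field($0, "op"), (cur - io) / 1000000
    }'
}
```

For example, rq_hang_durations < /sys/kernel/debug/block/vdb/rq_hang prints ffff9e50162fc600 op=WRITE hang_ms=7477 for the sample entry.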
Identify blocked processes
Read /proc/<pid>/wait_res to find out what resources a process is waiting for. This example checks process 577.
```shell
cat /proc/577/wait_res
```

Sample output:

```
1 0000000000000000 4310058496 4310061448
```

The output contains four space-separated fields:
| Field | Description |
|---|---|
| Field 1 | Type of resource the process is waiting for. 1 = page cache in the file system. 2 = block I/O layer. |
| Field 2 | Address of the resource that the process is waiting for (a page cache entry or a block I/O layer resource) |
| Field 3 | Time at which the process started waiting for the resource |
| Field 4 | Current time when the file is read. Subtract Field 3 from Field 4 to get the total wait time. |
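With the sample line, the wait time is 4310061448 - 4310058496 = 2952. The unit of Fields 3 and 4 is not stated in this topic, so the following sketch reports the difference in that same raw unit. decode_wait_res is a hypothetical name; type values other than 1 and 2 are labeled unknown because they are not documented here.

```shell
# Sketch only: decode_wait_res is a hypothetical name. Reads wait_res
# lines on stdin. Fields 3 and 4 share a unit this topic does not state,
# so the wait is reported in that same raw unit.
decode_wait_res() {
    awk '
    NF >= 4 {
        if ($1 == 1)      kind = "page_cache"    # waiting on file system page cache
        else if ($1 == 2) kind = "block_io"      # waiting in the block I/O layer
        else              kind = "unknown_" $1   # not documented in this topic
        # Field 4 (read time) minus Field 3 (wait start) = time spent waiting
        printf "type=%s addr=%s wait=%.0f\n", kind, $2, $4 - $3
    }'
}
```

For example, decode_wait_res < /proc/577/wait_res prints type=page_cache addr=0000000000000000 wait=2952 for the sample line.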
To check a specific thread, use /proc/<pid>/task/<tid>/wait_res with the same output format.
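When a process has many threads, reading each task/<tid>/wait_res file by hand is tedious; a loop over the task directory can do it. A minimal sketch: thread_wait_res and the PROC_ROOT override (default /proc) are hypothetical names, and printing only lines whose first field is 1 or 2 (the two documented resource types) is an assumption about what the file contains for threads that are not waiting.

```shell
# Sketch only: thread_wait_res and PROC_ROOT are hypothetical names.
# Assumption: threads that are not waiting do not report type 1 or 2,
# so only lines starting with "1 " or "2 " are printed.
thread_wait_res() {
    local pid="$1" f tid line
    for f in "${PROC_ROOT:-/proc}/$pid"/task/*/wait_res; do
        [ -r "$f" ] || continue              # no match, or the thread exited
        tid=$(basename "$(dirname "$f")")    # .../task/<tid>/wait_res -> <tid>
        line=$(cat "$f")
        case "$line" in
            "1 "*|"2 "*) echo "tid=$tid $line" ;;
        esac
    done
}
```

For example, thread_wait_res 577 lists each blocked thread of process 577 as tid=<tid> followed by its four wait_res fields.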