INFO: task blocked for more than 120 seconds.


This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


Under heavy IO load on servers you may see something like:

INFO: task nfsd:2252 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

...probably followed by a call trace that mentions your filesystem, and probably io_schedule and sync_buffer.

Don't worry about how serious such a trace looks: the message is purely informational (unless you have set the hung_task_panic sysctl, in which case your host is now panicked), but it is still probably something you want to do something about.
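For reference, the hung-task check is controlled by a handful of sysctls under kernel.hung_task_* (exactly which of them exist depends a little on your kernel version), so you can inspect or adjust it without rebooting. A rough sketch:

 # see the current settings (not every kernel exposes all of these)
 sysctl kernel.hung_task_timeout_secs kernel.hung_task_panic kernel.hung_task_warnings

 # raise the threshold to 300 seconds (0 disables the check, as the message itself points out)
 sysctl -w kernel.hung_task_timeout_secs=300

 # make sure it only warns rather than panics the host
 sysctl -w kernel.hung_task_panic=0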


The code for this sits in kernel/hung_task.c and is relatively new (added somewhere around 2.6.30?). It is a kernel thread (khungtaskd) that watches for tasks that stay in the D state for a while. The D state is uninterruptible sleep (TASK_UNINTERRUPTIBLE), a deeper sleep than interruptible sleep (which allows signaling); it is generally used only when signals cannot be handled (signal handling is userspace code), such as when a process is waiting on device IO. The checker complains when it sees a task that has sat in that state, without being scheduled, for more than 120 seconds (the default).
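If you want to see which tasks are currently stuck like that, something along these lines works (the ps column names are standard procps ones; the sysrq line needs root and sysrq to be enabled):

 # list tasks in uninterruptible sleep, plus the kernel function they are waiting in
 ps -eo state,pid,wchan:30,cmd | awk '$1 ~ /^D/'

 # or have the kernel dump all blocked tasks and their stacks to the kernel log
 echo w > /proc/sysrq-trigger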


Notes:

  • most likely to happen for a process that was ioniced into the idle class, in which case this message indicates intended, or at least expectable, behaviour for that process under constant IO load (see the first sketch after this list)
  • if not, this can easily mean your IO system is slower than your IO use -- often specifically because of overhead, such as that from head seeking.
  • tweaking the Linux IO scheduler for the device may help (see Computer hard drives#Drive_specifics, and the second sketch after this list)
    • if your load is fairly sequential, you may get some relief from using the noop IO scheduler instead of cfq -- though note that this means ionice no longer has an effect
    • if it's relatively random, upping the queue depth may help
  • if it happens nightly, it's probably load from some cron job, such as updatedb.
  • if it happens on a fileserver, you may want to consider spreading to more fileservers, or using a parallel filesystem
  • NFS seems to be a common culprit, probably because it's good at filling the writeback cache, which implies blocking while writeback happens -- and that is likely to block various other things related to the same filesystem. (verify)
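For the ionice case in the first note: you can check and set a process's IO scheduling class from the shell. A sketch (2252 is just the PID from the example message above; substitute whichever process the trace names):

 # check what IO class/priority a given process is running at
 ionice -p 2252

 # deliberately run a heavy job in the idle class, so it only gets disk time when nothing else wants it
 # (this only has an effect under the cfq scheduler)
 ionice -c 3 updatedb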
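And for the scheduler and queue depth notes: both are per-device knobs under /sys/block. A sketch assuming the busy device is sda (scheduler names vary a bit with kernel version):

 # show the schedulers available for this device; the active one is in brackets
 cat /sys/block/sda/queue/scheduler

 # switch to noop (remember this means ionice classes stop having an effect)
 echo noop > /sys/block/sda/queue/scheduler

 # let more requests queue at the block layer (default is often 128), which can help more random loads
 echo 512 > /sys/block/sda/queue/nr_requests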