INFO: task blocked for more than 120 seconds.

From Helpful
Revision as of 01:08, 21 August 2012 by Helpful (Talk | contribs)

This article/section is a stub: probably a pile of half-sorted notes that is not well checked, so it may have incorrect bits. (Feel free to ignore, fix, or tell me)


Under heavy IO load on servers you may see something like:

INFO: task nfsd:2252 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

...probably followed by a call trace that mentions your filesystem, and probably io_schedule and sync_buffer.
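
To see which tasks trigger the warning and how often, the log lines can be reduced to task name, PID and timeout. A minimal sketch, assuming the message format shown above; the function name summarise_hung_tasks is made up for this example:

```shell
# Reduce hung-task log lines on stdin to "name pid seconds".
# Lines that do not match the message format are dropped.
summarise_hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2 \3/p'
}

# Example: count warnings per task from the kernel ring buffer
# (reading dmesg may need root on some systems).
# dmesg | summarise_hung_tasks | sort | uniq -c | sort -rn
```

Feeding it the nfsd line from the example above gives "nfsd 2252 120".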


The code for this sits in kernel/hung_task.c, which runs as a kernel thread (khungtaskd) that detects tasks stuck in D state (uninterruptible sleep, usually meaning waiting for IO) that have not been scheduled for 120 seconds (the default). The code is relatively new, added somewhere around 2.6.30.
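
To check by hand which tasks are in D state right now, you can walk the per-thread status files under /proc. This is only a rough sketch of the idea, not what the kernel code does: it takes a single snapshot and knows nothing about how long a task has been stuck. The function name list_d_state and its optional argument are made up for this example.

```shell
# Print "tid name" for every thread currently in state D.
# The optional argument is the proc mount point; it defaults to /proc and is
# only a parameter so the function can be tried against a fake tree.
list_d_state() {
    procdir="${1:-/proc}"
    for status in "$procdir"/[0-9]*/task/[0-9]*/status; do
        [ -r "$status" ] || continue
        # Threads can exit while we iterate, so ignore read errors.
        awk '
            $1 == "Name:"  { name = $2 }
            $1 == "State:" { state = $2 }
            $1 == "Pid:"   { pid = $2 }
            END { if (state == "D") print pid, name }
        ' "$status" 2>/dev/null
    done
}
```

Running list_d_state a few times in a row gives an idea of whether the same tasks stay stuck. Note that only the first word of a task name is printed.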


The message is purely informational (unless you set the hung_task_panic sysctl, called sysctl_hung_task_panic in the kernel source, in which case your host has now panicked), but it is still probably something you want to do something about.
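
The current settings can be read from /proc/sys/kernel. A small sketch, assuming a kernel built with hung task detection; the function name show_hung_task_settings and its optional directory argument are made up for this example:

```shell
# Print the hung-task related sysctls, one "name = value" per line.
# The optional argument is the sysctl directory; it defaults to
# /proc/sys/kernel and is only a parameter to make the function testable.
show_hung_task_settings() {
    dir="${1:-/proc/sys/kernel}"
    for name in hung_task_timeout_secs hung_task_panic hung_task_warnings; do
        if [ -r "$dir/$name" ]; then
            printf '%s = %s\n' "$name" "$(cat "$dir/$name")"
        else
            # File missing: the kernel was probably built without the detector.
            printf '%s = (not available)\n' "$name"
        fi
    done
}
```

Values are changed by writing to the same files as root, as in the echo command the kernel message itself suggests.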


Notes:

  • if the mentioned process was ioniced as idle, this may be intended or at least expectable behaviour under load
  • if not, this often means your IO subsystem cannot keep up with the IO being asked of it -- often specifically because of overhead, such as that from head seeking
  • tweaking the Linux IO scheduler for the device may help (See Computer hard drives#Drive_specifics)
    • if your load is fairly sequential, you may get some relief from using the noop IO scheduler (instead of cfq)
    • if it's relatively random, upping the queue depth may help
  • if it happens nightly, it's probably load from some cron job, such as updatedb.
  • if it happens on a fileserver, you may want to consider spreading to more fileservers, or using a parallel filesystem
  • NFS seems to be a common culprit, probably because it's good at filling the writeback cache. A full cache implies blocking while writeback happens, which is likely to block various other things using the same filesystem. (verify)
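
Before changing the IO scheduler or queue depth mentioned in the notes above, it helps to see what each device currently uses. A minimal sketch, assuming that "queue depth" means the block layer's nr_requests (the device's own queue_depth, found under /sys/block/<dev>/device/ for SCSI devices, is a separate setting). The function name show_io_queue and its optional argument are made up for this example.

```shell
# Print the active IO scheduler and nr_requests for each block device.
# The optional argument is the sysfs block directory; it defaults to
# /sys/block and is only a parameter to make the function testable.
show_io_queue() {
    sysblock="${1:-/sys/block}"
    for q in "$sysblock"/*/queue; do
        [ -r "$q/scheduler" ] || continue
        dev=$(basename "$(dirname "$q")")
        # The active scheduler is the name in square brackets,
        # e.g. "noop deadline [cfq]" gives "cfq".
        active=$(sed 's/.*\[\(.*\)\].*/\1/' "$q/scheduler")
        depth=$(cat "$q/nr_requests" 2>/dev/null)
        printf '%s scheduler=%s nr_requests=%s\n' "$dev" "$active" "${depth:-?}"
    done
}
```

Switching is done as root by writing one of the listed names to the scheduler file, for example echo noop > /sys/block/sda/queue/scheduler, where sda is a placeholder for your device. Which scheduler names are offered depends on the kernel version, so check the file's contents first.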