INFO: task blocked for more than 120 seconds.

From Helpful
Revision as of 02:12, 23 August 2012

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


Under heavy IO load on servers you may see something like:

INFO: task nfsd:2252 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

...probably followed by a call trace that mentions your filesystem, and probably io_schedule and sync_buffer.

Don't worry about how serious such a trace looks. The message itself is purely informational (unless you set the kernel.hung_task_panic sysctl, which is sysctl_hung_task_panic in the kernel source, in which case your host is now panicked), but it still points at something you probably want to do something about.
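
To see how a host is currently configured, you can read the two settings mentioned above. The following is a minimal sketch, assuming a Linux host that exposes /proc/sys/kernel/hung_task_timeout_secs and /proc/sys/kernel/hung_task_panic; the helper names read_sysctl_int and describe are made up for illustration.

```python
def read_sysctl_int(name, root="/proc/sys/kernel"):
    """Read an integer sysctl value, e.g. hung_task_timeout_secs."""
    with open(f"{root}/{name}") as f:
        return int(f.read().strip())


def describe(timeout_secs, panic):
    """Summarise what the hung task check will do with these settings."""
    if timeout_secs == 0:
        # Writing 0 to hung_task_timeout_secs disables the message.
        return "hung task check is disabled"
    action = "panic the host" if panic else "only log a message"
    return f"tasks blocked for more than {timeout_secs}s will {action}"
```

On a host with the hung task check compiled in, `describe(read_sysctl_int("hung_task_timeout_secs"), read_sysctl_int("hung_task_panic"))` gives a one-line summary; on kernels built without it, the files will not exist and the read raises an error.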


The code for this sits in hung_task.c, which runs as a kernel thread (khungtaskd) that periodically looks for tasks stuck in D state, that is, uninterruptible sleep, usually waiting on IO. It reports any task that has not been scheduled for longer than the timeout, which is 120 seconds by default. The code is relatively new, added somewhere around 2.6.30.
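
You can get a rough userspace view of the same condition by listing processes that are in D state right now. This is only a sketch, not what the kernel does: it takes a single snapshot, so it cannot tell how long a task has been blocked, and it only looks at /proc/<pid>/stat for each process rather than at every thread. The function names are hypothetical.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left])
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def tasks_in_d_state(proc_root="/proc"):
    """Return a list of (pid, comm) for processes currently in D state."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were looking, or unreadable
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling `tasks_in_d_state()` a few times, some seconds apart, and seeing the same pid each time suggests that process is stuck in the way the kernel message describes.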


Notes:

  • Presumably most likely to happen for a process that was ioniced into the idle class, in which case this is the intended, or at least expectable, behaviour for that process under heavy IO load
  • if not, this can easily mean your IO system is slower than your IO use, often specifically because of overhead such as head seeking
  • tweaking the linux io scheduler for the device may help (See Computer hard drives#Drive_specifics)
    • if your load is fairly sequential, you may get some relief from using the noop io scheduler (instead of cfq)
    • if it's relatively random, upping the queue depth may help
  • if it happens nightly, it's probably a cron job, with load from something like updatedb
  • if it happens on a fileserver, you may want to consider spreading the load over more fileservers, or using a parallel filesystem
  • NFS seems to be a common culprit, probably because it's good at filling the writeback cache, which implies blocking while writeback happens, and that is likely to block various things related to the same filesystem. (verify)
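
Before tweaking the io scheduler or queue depth it helps to know what a device is currently using. The sketch below assumes the usual sysfs layout, where /sys/block/<device>/queue/scheduler lists the available schedulers with the active one in square brackets (for example "noop deadline [cfq]") and /sys/block/<device>/queue/nr_requests holds the block layer's request queue size. Treating nr_requests as "the queue depth" is an assumption; the available scheduler names vary with kernel version; and the function names are made up for illustration.

```python
import os
import re


def active_scheduler(text):
    """Return the scheduler name shown in [brackets], or None."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def available_schedulers(text):
    """Return all scheduler names listed, with the brackets removed."""
    return [name.strip("[]") for name in text.split()]


def queue_settings(device, sys_root="/sys/block"):
    """Read scheduler and request queue size for a device such as 'sda'."""
    queue_dir = os.path.join(sys_root, device, "queue")
    with open(os.path.join(queue_dir, "scheduler")) as f:
        sched_text = f.read()
    with open(os.path.join(queue_dir, "nr_requests")) as f:
        nr_requests = int(f.read().strip())
    return {
        "active": active_scheduler(sched_text),
        "available": available_schedulers(sched_text),
        "nr_requests": nr_requests,
    }
```

For example, `queue_settings("sda")` (the device name is a placeholder) returns the current values. Changing them means writing to the same files as root, which is best tried on one device at a time while watching whether the messages go away.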