Under heavy IO load on servers you may see something like:

 INFO: task nfsd:2252 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

...probably followed by a call trace that mentions your filesystem, and probably io_schedule and sync_buffer.


'''This message is not an error''', it's telling you that this process has not been scheduled on the CPU ''at all'' for 120 seconds, because it was in [[uninterruptable sleep]] state. {{comment|(The code behind this message sits in <tt>hung_task.c</tt> and was added somewhere around <tt>2.6.30</tt>. It is a kernel thread that detects tasks that stay in the [[D state]] for a while)}}
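For reference, the detector's threshold is tunable at runtime. A minimal shell sketch, assuming a kernel built with <tt>CONFIG_DETECT_HUNG_TASK</tt> (the file is simply absent otherwise):

```shell
# Inspect the hung-task detector's threshold.
f=/proc/sys/kernel/hung_task_timeout_secs
if [ -f "$f" ]; then
    cat "$f"    # the threshold in seconds; 120 is the usual default
else
    echo "hung-task detector not available on this kernel"
fi

# To raise the threshold rather than disable the check (needs root):
#   sysctl -w kernel.hung_task_timeout_secs=240
# Writing 0 disables it entirely, as the message itself notes.
```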


'''At the same time''', 120 real-world seconds is an ''eternity'' for the CPU, and most programs, and most users.


Not being scheduled for that long typically signals resource starvation, usually IO, often some disk API.

Which means you usually don't want to silence or ignore that message, because you want to find out when and why this happened, and probably avoid it in the future.



The stack trace can help diagnose what it was doing. {{comment|(which is not so informative of the ''reason'' - the named program is often the victim of another one misbehaving, though it is sometimes the culprit)}}
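If you catch the situation while it is happening, you can also look for tasks currently in that state yourself. A small sketch using procps <tt>ps</tt> (the <tt>wchan</tt> column hints at what each task is waiting on in the kernel):

```shell
# Show tasks in uninterruptible sleep (state 'D') right now; these are the
# candidates for this message if they stay that way for 120+ seconds.
# Keeps the header line (NR==1) and rows whose STAT column starts with D.
ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /^D/'
```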


Reasons include
* the system is heavily [[swapping]], possibly to the point of [[thrashing]], due to memory allocation issues
: could be any program

* the underlying IO system is very slow for some reason
:: I've seen mentions of this happening in VMs that share disks

* specific bugs (in kernel code, systemd) have caused this as a side effect
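A quick way to see which of these is in play is to watch <tt>vmstat</tt> for a few seconds; a sketch (assumes procps's <tt>vmstat</tt> is installed):

```shell
# Five one-second snapshots.
#  b     = processes blocked in uninterruptible sleep
#  si/so = swap-in / swap-out traffic (sustained nonzero means real swapping)
#  wa    = percentage of CPU time spent waiting on IO
vmstat 1 5
```

Sustained nonzero si/so points at memory pressure; a persistently high b or wa points at a saturated IO path.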


<!--
{{comment|(...though you can explicitly set <tt>sysctl_hung_task_panic</tt>, in which case your host is now panicked)}}
-->


Notes:
* if it happens constantly, your IO system is slower than your IO use


* can happen '''to''' a process that was [[ionice]]d into the idle class
: which means ionice is working as intended, because the idle class is meant as an extreme politeness thing. It just indicates something else is doing a consistent bunch of IO right now (for at least 120 seconds), and doesn't help find the actual cause
: e.g. [http://en.wikipedia.org/wiki/Locate_%28Unix%29 updatedb], which, when ioniced, may be the process this message names


* if it happens only nightly, look at your cron jobs

* a [[thrashing]] system can cause this, and then it's purely a side effect of a program using more memory than there is RAM

* being blocked by a desktop-class drive with bad sectors (because such drives retry for a long while)


* NFS seems to be a common culprit, probably because it's good at filling the writeback cache, something which implies blocking while writeback happens - which is likely to block various things related to the same filesystem. {{verify}}

* if it happens on a fileserver, you may want to consider spreading the load over more fileservers, or using a parallel filesystem


* tweaking the linux io scheduler for the device may help (see [[Computer_data_storage_-_General_%26_RAID_performance_tweaking#OS_scheduling]])
: if your load is fairly sequential, you may get some relief from using the <tt>noop</tt> io scheduler (instead of <tt>cfq</tt>), though note that that disables [[ionice]]
: if your load is relatively random, upping the queue depth may help
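Both knobs live in sysfs. A hedged sketch, where <tt>sda</tt> is a placeholder device name, and where the available scheduler names depend on your kernel (newer blk-mq kernels offer <tt>none</tt>/<tt>mq-deadline</tt>/<tt>bfq</tt>/<tt>kyber</tt> instead of <tt>noop</tt>/<tt>cfq</tt>):

```shell
dev=sda                       # placeholder; substitute your actual device
q=/sys/block/$dev/queue
if [ -d "$q" ]; then
    cat "$q/scheduler"        # the active scheduler is shown in [brackets]
    cat "$q/nr_requests"      # block-layer queue depth
fi

# As root, switching scheduler / deepening the queue would look like:
#   echo noop > /sys/block/$dev/queue/scheduler
#   echo 512  > /sys/block/$dev/queue/nr_requests
```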



[[Category:Unices]]
[[Category:Warnings and errors]]