Machine Check Events

From Helpful
Revision as of 14:08, 17 June 2024 by Helpful

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


A machine check exception (MCE) is the processor reporting to the OS a fault that it has detected.

This is frequently a sign of faulty hardware.


Whether a given event counts as a warning or an error varies: some are corrected errors that only get logged, others cannot be corrected.


In practice you will mostly see warnings, if only because for the more serious errors it may be a good idea to halt the still-running system. That halt happens just before the error might get logged, and well before you would get around to viewing it.


You're probably here because you saw syslog entries like:

[Hardware Error]: Machine check events logged
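To get a feel for how often this happens, you can count such entries in a log. The following is a minimal sketch, not a polished tool: the function names are made up for illustration, and the log path you pass in is up to you (it differs per distribution, e.g. /var/log/syslog or /var/log/messages).

```python
import re

# Matches the message quoted above. Whatever precedes it on the line
# (timestamp, hostname, "kernel:" prefix) is deliberately ignored.
MCE_PATTERN = re.compile(r"\[Hardware Error\]: Machine check events logged")


def count_mce_lines(lines):
    """Return how many of the given log lines report logged machine check events."""
    return sum(1 for line in lines if MCE_PATTERN.search(line))


def count_mce_lines_in_file(path):
    """Same count, for a log file on disk (path is supplied by the caller)."""
    # errors="replace" so that stray non-UTF-8 bytes in a log do not abort the scan
    with open(path, errors="replace") as f:
        return count_mce_lines(f)
```

Something like count_mce_lines_in_file("/var/log/syslog") would then give a rough count, assuming that is where your syslog ends up and that you are allowed to read it.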


For more detail, look at things like the mcelog package and its logfile, e.g. /var/log/mcelog.

These are often only warnings, but often still warnings you want to know about.

For example, in my case the CPU was being throttled because it was overheating (~90°C).
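If you suspect the same throttling cause, the kernel's own throttle counters are a quick cross-check. This is a sketch under assumptions: it assumes a Linux system whose CPU driver exposes thermal_throttle/core_throttle_count under /sys/devices/system/cpu/cpuN/ (typically Intel x86; other hardware may not have it), and the function name is made up for illustration.

```python
from pathlib import Path


def read_throttle_counts(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: core_throttle_count} for every CPU exposing the counter.

    The sysfs layout is an assumption; CPUs without the file are simply absent
    from the result, so an empty dict means "no counters found", not "no throttling".
    """
    counts = {}
    pattern = "cpu[0-9]*/thermal_throttle/core_throttle_count"
    for counter in sorted(Path(cpu_root).glob(pattern)):
        try:
            # counter is .../cpuN/thermal_throttle/core_throttle_count
            counts[counter.parent.parent.name] = int(counter.read_text().strip())
        except (OSError, ValueError):
            # unreadable or unexpected content: skip this CPU rather than fail
            continue
    return counts
```

Counts that keep increasing between two calls suggest the CPU is still being throttled; the values are cumulative since boot, so a single nonzero reading only says it happened at some point.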


See also:

http://www.mcelog.org/faq.html


CMCI storm detected: switching to poll mode

See Linux_admin_notes_-_health_and_statistics#EDAC