Machine Check Events

This article/section is a stub — probably a pile of half-sorted notes and is probably a first version, is not well-checked, so may have incorrect bits. (Feel free to ignore, or tell me)

Machine check exceptions (MCEs) are faults that the processor itself detects and signals, frequently because of faulty hardware.

Whether a given event is a warning (e.g. an error the hardware corrected by itself) or a fatal error varies.

You'll probably see more warnings, if only because they get logged on a still-running system, while various (fatal) errors hang the system and are at best shown on screen only at that moment.

You're probably here because you saw syslog entries like:

[Hardware Error]: Machine check events logged

For more detail, look at things like the mcelog package, and its logfile, e.g. /var/log/mcelog
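
To get a quick idea of how often this happens, you could count the relevant kernel messages in whatever log text you have. The following is a minimal sketch, not a replacement for mcelog: it only matches the two message fragments quoted on this page, and the function name and the labels in PATTERNS are made up for this example.

```python
import re
import sys
from collections import Counter

# Message fragments as quoted on this page.
# The labels (dictionary keys) are invented for this sketch.
PATTERNS = {
    "mce_logged": re.compile(r"Machine check events logged"),
    "cmci_storm": re.compile(r"CMCI storm detected"),
}


def count_mce_messages(lines):
    """Count how many of the given log lines match each pattern."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


if __name__ == "__main__":
    # Reads log text from stdin and prints one count per matched label.
    for label, n in count_mce_messages(sys.stdin).items():
        print(f"{label}: {n}")
```

Assuming you saved it as mce_count.py (a hypothetical name), you could feed it kernel messages with something like dmesg | python3 mce_count.py, or redirect your syslog file into it. Many hits in a short time are a reason to look at the mcelog output for details.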

These are often only warnings, but frequently ones you still want to know about.

For example, in my case the CPU was being throttled because it was overheating (~90°C).
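
If you suspect thermal throttling like this, the kernel's own throttle counters are a quick cross-check. The sketch below assumes a Linux system that exposes thermal_throttle/core_throttle_count under /sys/devices/system/cpu (typically Intel x86; other systems may not have these files, in which case you get an empty result). The function name and the cpu_root parameter are made up for this example.

```python
from pathlib import Path


def read_throttle_counts(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: throttle_count} for each CPU that exposes a counter.

    cpu_root is a parameter only so that the function can be pointed
    at a test directory; on a real system the default should do.
    """
    counts = {}
    pattern = "cpu[0-9]*/thermal_throttle/core_throttle_count"
    for counter_file in sorted(Path(cpu_root).glob(pattern)):
        try:
            # The file holds a single integer: throttle events since boot.
            cpu_name = counter_file.parent.parent.name
            counts[cpu_name] = int(counter_file.read_text().strip())
        except (OSError, ValueError):
            # Unreadable or unexpected content: skip this CPU.
            continue
    return counts


if __name__ == "__main__":
    for cpu, n in read_throttle_counts().items():
        print(f"{cpu}: {n} throttle events")
```

Counts that keep rising between runs suggest the CPU is still being throttled, which is a hint to check cooling before suspecting other hardware.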

See also:

CMCI storm detected: switching to poll mode

See Linux_admin_notes_-_health_and_statistics#EDAC