Troubleshooting when Windows spontaneously reboots

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Bluescreens

Not seeing a bluescreen

WHEA_UNCORRECTABLE_ERROR

Windows Hardware Error Architecture (WHEA)[1] amounts to specific hardware reporting issues to Windows.

This can include recoverable errors, and the causes include driver issues, software conflicts, and hardware issues.


WHEA in itself includes errors that are corrected in the moment, but if you see this bluescreen you have hit one that was not.

This may still include things that you can solve later, such as overheating hardware (monitor temperatures next time, and/or cool better), overclocking issues (don't overclock), bad drivers (uninstall, update/reinstall), and software conflicts (harder to figure out what, though).

STOP 7B

What and why

STOP 0x0000007B means Inaccessible boot device.
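
These notes refer to bug checks both in short form ("STOP 7B") and in full form ("STOP 0x0000007B"). As a minimal sketch, the lookup below normalizes either form and attaches the symbolic name. The function names are made up for illustration, and the table only covers the codes discussed on this page.

```python
import re

# Symbolic names for the bug check codes discussed in these notes only.
BUGCHECK_NAMES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x00000077: "KERNEL_STACK_INPAGE_ERROR",
    0x0000007A: "KERNEL_DATA_INPAGE_ERROR",
    0x0000007B: "INACCESSIBLE_BOOT_DEVICE",
    0x0000007E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
    0x000000D1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
    0x000000F4: "CRITICAL_OBJECT_TERMINATION",
    0x00000124: "WHEA_UNCORRECTABLE_ERROR",
    0xC000021A: "WINLOGON_FATAL_ERROR",
}

# Accepts "STOP 7B", "STOP: c000021a", "0x0000007B" or just "7b".
_STOP_RE = re.compile(r"(?:STOP:?\s*)?(?:0x)?([0-9a-f]{1,8})", re.IGNORECASE)


def parse_stop_code(text):
    """Return the bug check code as an int, or None if text is not a code."""
    match = _STOP_RE.fullmatch(text.strip())
    return int(match.group(1), 16) if match else None


def describe_stop_code(text):
    """Return the code in full 8-digit form, followed by its name if known."""
    code = parse_stop_code(text)
    if code is None:
        return "not a recognizable STOP code"
    name = BUGCHECK_NAMES.get(code, "(not covered in these notes)")
    return f"0x{code:08X} {name}"


print(describe_stop_code("STOP 7B"))
```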

As Microsoft puts it, "The 'Stop 7B' error occurs when your configuration is missing a component that is required to boot your device. Examples of these components include the PCI bus and the IDE controller." [2]


In practice, it seems to occur most often in situations where...

  • a motherboard is replaced (since this means replacing the IDE controller with one windows is not yet aware of)
  • sometimes when you move a hard disk to a different controller

If Windows can't figure out how to get to the boot device (the one which stores the windows system drive, presumably), it will give a bluescreen.


(If it happens in XP's setup, it's probably because your controller is in AHCI mode. Up to SP2, XP's setup will only work in IDE mode)



Possible solutions


  • If all you did to cause this is move your Windows drive from one working controller to another, you may be able to resolve this by booting the old way and changing the controller driver to a generic one for one boot, after which Windows will probably install the better driver again automatically.
I'm not sure, though - I have no idea how the Windows boot process works.


  • Clean reinstall
A full reinstall always works, but backing up data and reinstalling programs obviously takes time.
  • In-place upgrade, a.k.a. repair install (but this is XP only(verify))
  • Anticipate
One proposed solution suggests replacing the controller drivers with generic ones before the hardware change (for IDE, "Standard Dual-Channel PCI IDE Controller"). That requires a running system, i.e. the original hardware or an identical board, so it does not help if the motherboard is already broken and you don't have identical hardware lying around.


Unsorted:

There is also a suggestion to avoid the faster UDMA modes, by either forcing PIO mode or using a 40-conductor cable (which limits the drive to UDMA mode 2 and below). One possible reason this could help is a controller with a newer DMA mode that the driver doesn't know about, but which the BIOS enables because it and the drive support it. This did not work for me, so it may be random baloney, or maybe it only works for revisions of the same controller chipset.

STOP 7E

A bluescreen mentioning STOP 0x0000007E (SYSTEM_THREAD_EXCEPTION_NOT_HANDLED), and a driver filename.

what and why

This seems to mean a code exception that happened within a driver, and was not handled.

This means the driver assumed something specific could not happen, or is otherwise buggy.

Testing and solutions

If this was in response to doing something unusual (like unplugging something), it may be that the driver can't handle that properly. This is often easy to check by trying it intentionally. You can work around it by not doing that, and you probably want to complain to the vendor.


If you're not sure what caused it, think of hardware and/or drivers you installed recently - though the driver filename is usually a good indication.
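
Since the driver filename is usually the best lead, it can help to pull it out of text you copied down from the bluescreen. This is a minimal sketch, assuming the driver is mentioned with a .sys extension. The sample text and the name ExampleDrv.sys are made up, and the function name is only for illustration.

```python
import re

# Matches names such as "exampledrv.sys"; the extension check is case-insensitive.
_DRIVER_RE = re.compile(r"\b[\w\-]+\.sys\b", re.IGNORECASE)


def find_driver_names(bluescreen_text):
    """Return distinct .sys filenames, lowercased, in order of first appearance."""
    seen = []
    for name in _DRIVER_RE.findall(bluescreen_text):
        name = name.lower()
        if name not in seen:
            seen.append(name)
    return seen


# Hypothetical text, not copied from a real crash.
sample = (
    "STOP: 0x0000007E\n"
    "The problem seems to be caused by the following file: ExampleDrv.sys\n"
)
print(find_driver_names(sample))
```

The resulting name is what you would search for, together with the STOP code, to see whether it is a known problem.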


The easiest fix is often to find a better driver (perhaps check whether it's a known problem).

If you want to uninstall the driver, you may have to do so in safe mode.

If Windows automatically installs this failing driver, you may want to remove the corresponding hardware to check that the driver (or even the hardware) really is the problem. You could try to narrow down what information Windows uses for the automatic install, and remove that along with the specific driver, so at least it doesn't automatically install the bad version. This assumes that it isn't too hard, and that the driver really is at fault, of course.

STOP F4, STOP 77, and STOP 7A

STOP 0x000000F4 (CRITICAL_OBJECT_TERMINATION)
STOP 0x00000077 (KERNEL_STACK_INPAGE_ERROR)
STOP 0x0000007A (KERNEL_DATA_INPAGE_ERROR)

What and why

Apparently triggered by anything that can stop a page file operation from being successful (including 'resume from hibernation'(verify)).

Usually some hardware trouble. Various cases I've seen this in were related to hard drives.


These bluescreens often point to one of:

  • a failing hard drive (check SMART info, and do a drive check that tries to locate bad sectors)
  • a temporary failure, such as a bad or badly seated cable, an overheating drive, or similar
  • a marginal power supply could cause this
  • bad drivers (upgrading or downgrading may help)
  • motherboard / hard drive combinations that don't like each other






STOP D1 and STOP 0A (DRIVER_IRQL_NOT_LESS_OR_EQUAL)

STOP 0x0000000A IRQL_NOT_LESS_OR_EQUAL
STOP 0x000000D1 DRIVER_IRQL_NOT_LESS_OR_EQUAL


The general answer is "Some hardware, or the driver talking to it, is misbehaving."

Caused by various things, including:

  • a specific misbehaving driver
    • often video card drivers (try updating the drivers, or if you have the latest, using a slightly older version)
    • There are various specific driver bugs recorded at Microsoft support, related to ACPI, USB, SCSI emulation, Pentium M(obile) CPUs, and a specific Windows firewalling bug. Most of those won't be your problem because Windows Update has resolved them already (this is just to give you an idea).
  • Hardware being run faster and/or hotter than it can stably deal with
    • Maybe CPU, GPU
    • sometimes specifically interactions under strain
    • Sometimes some hardware/BIOS setting that interacts badly under some specific conditions
    • Very cheap memory may sometimes only be stable when run a little slower than its rated speed

Note that memory errors and CPU overheating can cause this crash even when no driver is buggy. The bluescreen may still mention a driver, simply because that driver happened to be running at the time.


Diagnosis/fixing:

  • If it occurred fairly soon after an update or driver install, it's likely a bad driver.
    • The simplest answer is to remove the hardware
    • The better one is to find out whether this is a known problem and whether there is a known solution. Search for it - if it's a specific driver then chances are that other people are complaining/asking about it in some forum or other. Don't expect a good solution there: forums are not known for clear explanations or simple fixes, but it happens.
  • If it happens when you're playing a game, doing video encodes, or something else CPU- and/or GPU-intensive, it may be related to load or heat.
    • To see if it is reproducible under strain, you can try Prime95 for your processor, FurMark for your GPU, and possibly memtest86 to test your memory.


STOP 124

What and why

STOP 0x00000124 (WHEA_UNCORRECTABLE_ERROR) typically reports a Machine Check Exception, which is the CPU detecting that some hardware (in or near the CPU) did something verifiably wrong.

See On MCE for more detail.


Possible solutions

STOP C000021A


C000021A itself just means 'fatal system error' and can indicate various distinct problems - but apparently(verify) all related to winlogon.exe or csrss.exe


One of them:

Autochk program not found - skipping AUTOCHECK

followed by

STOP: c000021a {Fatal System Error}
The Session Manager Initialization system process terminated unexpectedly
with a status of 0xc000003a (0x00000000 0x00000000).

Apparently related to partition table (changes), seemingly to the partition type of the windows installation, particularly incorrect changes.
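
The status value in that message is an NTSTATUS code. As a hedged sketch, the snippet below pulls the status out of the message text and splits it into the standard NTSTATUS bit fields (severity in the top two bits, facility in bits 16-27, code in the low 16 bits). The function names are illustrative. As far as I know, 0xC000003A is STATUS_OBJECT_PATH_NOT_FOUND, which would fit the 'Autochk program not found' line.

```python
import re

SEVERITY_NAMES = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def extract_status(message):
    """Return the 'status of 0x...' value from a bluescreen message, or None."""
    match = re.search(r"status of\s+0x([0-9a-f]{8})", message, re.IGNORECASE)
    return int(match.group(1), 16) if match else None


def decode_ntstatus(value):
    """Split a 32-bit NTSTATUS value into its bit fields."""
    return {
        "severity": SEVERITY_NAMES[(value >> 30) & 0x3],
        "customer_defined": bool((value >> 29) & 0x1),
        "facility": (value >> 16) & 0xFFF,
        "code": value & 0xFFFF,
    }


message = (
    "STOP: c000021a {Fatal System Error}\n"
    "The Session Manager Initialization system process terminated unexpectedly\n"
    "with a status of 0xc000003a (0x00000000 0x00000000).\n"
)
status = extract_status(message)
print(hex(status), decode_ntstatus(status))
```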



Another is related to winlogon.exe crashing -- and perhaps more commonly, a GINA extension failing. You may see a dialog mentioning:

The Windows Logon Process system process terminated unexpectedly 

...when the kernel notices (which may be at startup time). It will bluescreen at shutdown.


On MCE

Machine Check Events are reports from the CPU of an uncorrectable inconsistency in or near the CPU (this includes cache management, and some memory management).

In Windows, MCEs are received by WHEA (Windows Hardware Error Architecture). (Used to be called MCA)


Fatal MCEs point to hardware issues

  • may be caused by hardware that is broken/flaky
  • overclocking CPU and/or RAM can cause them to become flaky
  • If this only happens under full CPU or GPU load, you also want to check that
    • they are not overheating (check their specs for what temperatures they are comfortable with, and monitor with tools like Core Temp, SpeedFan, or such - there are many)
    • your PSU can deliver enough power to them (not often an issue, since many people overestimate the amount of power their computer takes)
    • and note that insufficient testing and incorrect CPU binning (basically meaning a CPU is sold as something it can't stably do) still happen, meaning that some CPUs may only be stable when underclocked and/or with cores disabled, and some may even be duds.

Tests you can do:

  • If you suspect RAM, run something like memtest86 (more thorough than something like Prime95)
  • if you suspect overheating of GPU, run a stress test (e.g. FurMark)
  • if you suspect overheating of CPU, run a stress test (e.g. Prime95)
  • if you suspect bad binning, underclock it and run a stress test
  • if you suspect power issues, you can estimate by adding up the TDPs of the CPU and GPU. But since it's sometimes hard to figure out how much a PSU lies about its rating, you can also just check whether it fails only when running high loads on GPU and CPU at the same time (e.g. both stress tests together), but not when loading each individually
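
The rough power estimate from the last point can be sketched as below. All numbers are assumptions for illustration: TDP is a thermal design figure rather than peak draw, the 75 W allowance for everything else and the 80% derating of the PSU label are arbitrary placeholders, and the function name is made up.

```python
def psu_headroom_watts(psu_rated_w, cpu_tdp_w, gpu_tdp_w,
                       other_w=75, derate_percent=80):
    """Estimate spare PSU capacity in watts (negative means likely too little).

    derate_percent is a guess at how much of the label rating the PSU can
    really sustain; other_w is a guess for drives, fans, board and RAM.
    """
    usable_w = psu_rated_w * derate_percent / 100
    estimated_load_w = cpu_tdp_w + gpu_tdp_w + other_w
    return usable_w - estimated_load_w


# Hypothetical system: 550 W PSU, 125 W CPU, 250 W GPU.
headroom = psu_headroom_watts(psu_rated_w=550, cpu_tdp_w=125, gpu_tdp_w=250)
print(f"estimated headroom: {headroom:.0f} W")
```

A negative result suggests the combined CPU and GPU stress test is worth running; a positive one does not prove the PSU is fine.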