Some explanation of some errors and warnings


General

0x80004005 (NS_ERROR_FAILURE) and other Firefox errors

...is a very general-purpose error (not even just in Mozilla).


When scripting triggers it, the error usually also points to a more specific problem. To figure out what failed, look for the originating function.


For example, in:

[Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE)
[nsIDOMHTMLSelectElement.selectedIndex]" nsresult: "0x80004005 (NS_ERROR_FAILURE)" 
location: "JS frame ::

...it's nsIDOMHTMLSelectElement.selectedIndex, a good thing to search for.

This particular error was caused by trying to set an out-of-bounds selectedIndex on a <SELECT> drop-down (and is an example from [1]).
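A minimal sketch of that trigger (hypothetical element id; older Gecko raised NS_ERROR_FAILURE here, where current browsers tend to just clear the selection):

// assumes a <select id="choices"> with only a few options
var sel = document.getElementById('choices');
sel.selectedIndex = 100;   // out of bounds for this list

The guard is equally simple: check the index against sel.options.length before assigning.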


nsIXMLHttpRequest.*

XMLHttpRequest-related errors will usually occur in either nsIXMLHttpRequest.open or nsIXMLHttpRequest.send.

The specific error is often the best indication of the problem.

The actual problem to fix (often in the scripting logic) is regularly one of:

  • stale references, often between documents (think (popup) windows, frames, iframes, and embeds), or the browser got confused about when it should clean up. For example, the document an XHR was created in may no longer exist, or you may be holding an old reference to a newly embedded plugin object (e.g. video)
  • that you violated the specs or did something nonsensical, e.g. trying to send() more than once (verify) or trying to set headers after sending data (verify) - see the sketch after this list
  • use of Firebug:
    • older versions of Firebug did not handle XHR in frames well; in those cases Firebug itself could be the sole cause of this error
    • triggering XHR from the Firebug command line is sandboxed, and may cause this in certain cases
  • trying cross-domain XHR, or using an absolute URL (verify)
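A minimal sketch of the send-more-than-once case from that list (hypothetical URL; current browsers report this as an InvalidStateError, and per the notes above older Gecko surfaced it as NS_ERROR_FAILURE):

var xhr = new XMLHttpRequest();
xhr.open('GET', '/some/url');   // hypothetical URL
xhr.send();
xhr.send();                     // a second send() on an already-sent request throws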


You could even look at the Firefox XHR source code, e.g. for send(), to see what cases trigger this error.


TODO: Read:


Unsorted

0x80040111 (NS_ERROR_NOT_AVAILABLE)

Reason

The direct reason is a missing object attribute - you are probably expecting an attribute to be present on an object when it isn't always.


When this happens around an XMLHttpRequest, one of the likeliest causes is an onerror handler that tries to read the result's status (or statusText).

The W3C specs tell you that you shouldn't try to read status in the onerror handler, because for some problems it may never have been set, and that accessing it must (!) then raise an exception.

In other words, this error is then correct behaviour.

This error is more specific to Gecko (Firefox, Mozilla) because Gecko follows this part of the spec closer to the letter. For portable code you should assume this behaviour everywhere.


The underlying cause is often that the browser never got a response with an HTTP status to parse out, for example because:

  • a connection broke before receiving a response at all, e.g. because of some connectivity problem
  • a request was broken off on the client (often specifically because of the next reason)
  • an ongoing AJAX call was canceled by page unload
    • (another somewhat common form of this is triggering AJAX from an input form that also causes a page-reload-style submission of the calling page (often a form with a submit-type button))
  • or sometimes a (seriously) malformed server response, such as
    • malformed data produced by a dynamic script
    • no data at all (no HTTP headers, no body)

Fix

If this happens in your own handler code, and you can't or don't want to remove the status check, the simplest workaround is usually to wrap this read in a try-catch, since the error handling would often be "oh well, forget it then" code anyway.
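A minimal sketch of that workaround, assuming a plain XMLHttpRequest and a hypothetical URL:

var xhr = new XMLHttpRequest();
xhr.open('GET', '/some/url');          // hypothetical URL
xhr.onerror = function () {
  var status = null;
  try {
    status = xhr.status;               // may throw when no response ever arrived
  } catch (e) {
    // "oh well, forget it then" - there is no HTTP status to report
  }
  console.log('request failed, status: ' + status);
};
xhr.send();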

If you use XHR from some library (and implicitly its error handler), it's a bug in that library that has not yet been fixed, so search around for a newer version, bother its creator, and/or fix it yourself.

If such a library lets you write your own callbacks and its documentation didn't warn you about this, you might wish to bother them about that - it's nice to be able to have code that can react to this if and when it chooses to.


When caused by the form submission problem, you can usually avoid it. One solution is to use only a <button>, <input type="button">, or anything else that looks clickable enough but does not submit the form (like a submit button would), so that the only event is the AJAXing (see the sketch below).

(A somewhat more hackish solution is to omit the <form>, so that a submit-type button wouldn't know where to go, so won't do anything -- but this may not work so predictably across all browsers.)
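A minimal sketch of the non-submitting button (hypothetical id and handler name):

// assumes markup like: <input type="button" id="lookup" value="Look up">
// type="button" never submits an enclosing form, so no page reload interferes
document.getElementById('lookup').onclick = function () {
  startAjaxCall();   // hypothetical function containing the actual XHR logic
};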




*nix

INFO: task blocked for more than 120 seconds.

Under heavy IO load on servers you may see something like:

INFO: task nfsd:2252 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

...probably followed by a call trace that mentions your filesystem, and probably io_schedule and sync_buffer.


This message is not an error; it's telling you that this process has not been scheduled on the CPU at all for 120 seconds, because it was in uninterruptible sleep state. (The code behind this message sits in hung_task.c and was added somewhere around kernel 2.6.30; it is a kernel thread that detects tasks that stay in the D state for a while.)


At the same time, 120 real-world seconds is an eternity for the CPU, for most programs, and for most users.

Not being scheduled for that long typically signals resource starvation, usually IO, often some disk API. That means you usually don't want to silence or ignore this message: you want to find out when and why it happened, and probably avoid it in the future.


The stack trace can help diagnose what the process was doing (which is not so informative of the reason - the named program is often the victim of another one misbehaving, though it is sometimes the culprit).


Reasons include:

  • the system is heavily swapping, possibly to the point of thrashing, due to memory allocation issues
    • this could be any program
  • the underlying IO system is very slow for some reason
    • I've seen mentions of this happening in VMs that share disks
  • specific bugs (in kernel code, systemd) have caused this as a side effect






Notes:

  • if it happens constantly, your IO system is slower than your IO use
  • it can happen to a process that was ioniced into the idle class, which means ionice is working as intended, because the idle class is meant as an extreme politeness thing. It just indicates something else is doing a consistent bunch of IO right now (for at least 120 seconds), and doesn't help find the actual cause
    • e.g. updatedb, which may be the recipient if it were ioniced
  • if it happens only nightly, look at your cron jobs
  • a thrashing system can cause this, and then it's purely a side effect of a program using more memory than there is RAM
  • it can be caused by being blocked on a desktop-class drive with bad sectors (because those retry for a long while)


  • NFS seems to be a common culprit, probably because it's good at filling the writeback cache, something which implies blocking while writeback happens - which is likely to block various things related to the same filesystem. (verify)
  • if it happens on a fileserver, you may want to consider spreading to more fileservers, or using a parallel filesystem


  • if your load is fairly sequential, you may get some relief from using the noop IO scheduler (instead of cfq), though note that doing so disables ionice
  • if your load is relatively random, upping the queue depth may help


BUG: soft lockup - CPU 1 stuck for 11s

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.
BUG: soft lockup - CPU#1 stuck for 11s! [nameofwhatwasblocked]

...with varying CPU number, length, and process name.

Will be followed by more process details, such as the PID, the modules loaded in, and the execution state of the process (registers and call trace).


Despite the call trace, this message is itself an informational warning, not an error or crash.


That said, it's roughly saying that the scheduler didn't get the process to budge towards doing anything at all for a couple billion cycles, and that is probably something you want to understand the cause of - and avoid in the future, if you can.


One possible cause is (extremely overcommitted memory leading to) swapping to the point of thrashing, basically meaning the CPU has to wait for disk before a scheduled process can properly continue.


It looks like certain buggy drivers (e.g. ndiswrapper stuff) can be reported as the process blocked waiting for that driver.

Specific hardware issues can do the same - hardware that is broken, or that interacts in a non-conforming way (the noapic kernel parameter sometimes helps, and a BIOS update can help).

Argument list too long

...in a linux shell, often happens when you used a * somewhere in your command.


The actual reason is a little lower level: shells will expand shell globs before executing a command, so e.g. cp * /backup/ might actually expand to a long list of files.

Either way, it may create a very large string to be handed to the exec().


You get this error when that argument list is too long for the chunk of kernel memory reserved for passing such strings - which is hard-coded in the kernel (MAX_ARG_PAGES, usually something like 128KB).

You can argue it's a design flaw, or that it's a sensible guard against a self-DoS, but either way, that limit is in place.


There are various workable solutions:

  • if you meant 'everything in a directory', then you can often specify the directory and a flag to use recursion
  • if you're being selective, then find may be useful, and it allows doing things streaming-style, e.g.
find . -name '*.txt' -print0 | xargs -0 echo (See also find and xargs)
  • recompiling the kernel with a larger MAX_ARG_PAGES - of course, you don't know how much you'll need, and this memory is permanently inaccessible for anything else, so just throwing a huge number at it is not ideal


Note

  • that most of these split the set of files into smaller sets, and execute something for each of these sets. In some cases this significantly alters what the overall command does.
You may want to think about it, and read up on xargs and its --replace.
  • for filename in `ls`; do echo $filename; done is not a solution, nor is it at all safe against special characters.
ls | while read filename ; do echo $filename; done (specifically for bourne-type shells) works better, but I find it harder to remember why exactly, so I use find+xargs.


Word too long

A csh error saying that a command is over 1024 characters long (1024 being the default, as of this writing at least).


It is usually caused by a long value, and often specifically by a line like:

setenv PATH ${PATH}:otherstuff

...with PATH and LD_LIBRARY_PATH being the usual suspects, as they are most easily already 1000-ish characters long.

You can check that with something like

echo $PATH | wc -c
echo $LD_LIBRARY_PATH | wc -c


In general, you have a few options:

  • switch to a shell that doesn't have this problem
  • recompile csh with a larger BUFSIZE
  • figure out the specific cause
    • typically: clean the path of long or unnecessary or duplicate entries

Warning: "MAX_NR_ZONES" is not defined

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

This is related to an (automatically generated) file called bounds.h in your kernel source, and probably means you need to do a make prepare in your /usr/src/linux.


Wrong ELF class: ELFCLASS32

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Usually means a 64-bit app is trying to load a 32-bit library.

In my case, a 64-bit binary was trying to load a 32-bit .so file, largely because LD_LIBRARY_PATH included the 32-bit but not the 64-bit library directory.

If this happens in a compiled application, it may mean you need to recompile it before it picks up the right one. (verify)

It can also be triggered by LD_PRELOAD tricks.


RTNETLINK answers: Invalid argument

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

...often in response to some use of the ip command, such as ip link set lo up


The cause varies with the specific command that failed.

From anecdotal evidence, a relatively common cause is interaction between having traffic shaping installed and certain versions of iproute2. If iproute2 is indeed the culprit, a slight upgrade or downgrade may fix it temporarily (usually enough until it is fixed in the next package version).


Authentication token lock busy

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


This usually happens when

  • you are trying to change a password
  • ...while the filesystem that contains /etc/passwd and /etc/shadow (usually the root filesystem) is mounted read-only.


...for example because you booted up using the init trick, are in some maintenance mode/runlevel, or intentionally made the root filesystem read-only.


If you're sure you want to (e.g. this isn't triggered by an automatic remount-readonly to prevent filesystem corruption), then you can do an in-place re-mount of a filesystem, which lets you change between read-only and read-write.

In the case of the root filesystem:

mount -o remount,rw /


LCP: timeout sending Config-Requests

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

In a direct sense, this means that pppd did not receive any LCP configuration requests from the peer, or was unable to agree on LCP parameters. Which doesn't mean much to me either.


If something else is failing - say, you see "cannot stat() /dev/pts/1" in your logs - then you have some other problem on the server end.

If not, it's likely to be pppd negotiation.



Another example: I had a problem connecting to my ISP with a USB ADSL modem. This was not likely to be modem driver trouble - more likely something going wrong while the connection was being established. Fiddle with your ppp peers files; chances are you're using a stock one that doesn't work with your ISP without some tweaking. In my case, the VPI and VCI were incorrect.



Windows

Display Driver Stopped Responding and Has Recovered

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

...in Vista and Win7.


The source of this message is TDR (Timeout Detection and Recovery), a watchdog that triggers when a driver doesn't finish an operation in (by default) two seconds.

Windows assumes the (graphics card) driver is hanging and resets the graphics subsystem, so that you don't have to restart the computer to recover.


When you see this, updating your video card drivers may well help - chances are its maker recently got a lot of bug reports and fixed the problem.


While two seconds is a lot (in terms of drivers), computers that are old, slow, and extremely busy may occasionally see false triggers.

You can change some registry values to set the timeout higher (TdrDelay) or even disable TDR entirely (TdrLevel), both under HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers - though I wouldn't recommend disabling it; waiting longer is still handier than a hard computer reset.


See also:

  • http://msdn.microsoft.com/en-us/windows/hardware/gg487368.aspx


The application failed to initialize properly

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


A very generic error, with an error code that tends to tell you more.

Common codes and their likely reasons include:

  • 0xc0000135 - means the .NET framework is not installed. Download a redistributable; the latest will probably do.

  • 0xC0000005 (STATUS_ACCESS_VIOLATION) - a memory access violation, often caused by a program bug, and possibly by DEP (Data Execution Prevention) being overly strict. If you think it's the latter, you could disable DEP for the program, or completely, to test that theory.

  • 0xc0000022 - no read access to a system file, probably because it was copied in badly (a bad installer, user error), possibly because a virus scanner is blocking it, etc.




The local device name is already in use

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Windows (XP, 2003, and likely most others) networking may give this for shares under, apparently, a few different conditions, including:

  • Server not reachable (right now/anymore) for reasons such as
    • because of a change in server configuration
    • because firewalls are blocking it
    • name does not resolve (anymore)
  • There is already a mapping to a specific UNC path (possibly some persistent mapping conflicting with manual maps)
  • The drive letter is already used for a mapping (verify)


(verify):

  • The server may have forgotten your credentials while your client is trying to reconnect assuming they are still valid. This may happen in cases like:
    • you are reconnecting from a different IP (verify)
    • your side disconnected, e.g. because of a network timeout, e.g. because the computer revived from hibernation (verify)

