Some explanation to some errors and warnings

From Helpful
Revision as of 13:50, 6 September 2024 by Helpful

General

Undefined symbol

An error, usually at build time, that roughly means that something was named, but not defined.

You can get this during compilation or during linking (see also Programming_notes/Compiling_and_linking), where it probably means you are forgetting to link one of your own object files, or forgetting to link in a library.


If you get it at runtime, it's about a shared object (.so, comparable to a .dll).


Among command-line tools, nm is probably handiest.
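
A rough sketch of how nm can be used for this, assuming GNU binutils nm (the -D, --undefined-only and --defined-only options). The function names and the paths in the usage comment are made up for illustration.

```shell
# Symbols a shared object (or object file) needs but does not define itself.
# $1 = path to the file to inspect
undefined_symbols() {
    nm -D --undefined-only "$1" | awk '{ print $NF }'
}

# Which libraries in a directory export a given symbol?
# $1 = symbol name, $2 = directory to search
who_defines() {
    for lib in "$2"/*.so*; do
        [ -f "$lib" ] || continue
        # strip a possible @VERSION suffix before comparing names
        if nm -D --defined-only "$lib" 2>/dev/null |
            awk -v sym="$1" '{ sub(/@.*/, "", $NF) } $NF == sym { hit = 1 } END { exit !hit }'
        then
            echo "$lib"
        fi
    done
}

# Usage (hypothetical names):
#   undefined_symbols ./libmine.so
#   who_defines some_function /usr/lib
```

If who_defines finds the symbol in a library you are not linking against (or not loading at runtime), that is probably the one you forgot.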


0x80004005 (NS_ERROR_FAILURE) and other Firefox errors

...is a very general-purpose error (not even just in mozilla)


When scripting triggers it, the error usually also points to a more specific problem. To figure out what failed, look for the originating function.


For example, in:

[Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE)
[nsIDOMHTMLSelectElement.selectedIndex]" nsresult: "0x80004005 (NS_ERROR_FAILURE)" 
location: "JS frame ::

...it's nsIDOMHTMLSelectElement.selectedIndex, a good thing to search for.

This particular error was caused by trying to set an out-of-bounds selectedIndex on a <SELECT> drop-down (and is an example from [1]).


nsIXMLHttpRequest.*

XMLHttpRequest-related errors will usually occur in either nsIXMLHttpRequest.open or nsIXMLHttpRequest.send.

The specific error is often the best indication of the problem.

The actual problem to fix (often in the scripting logic) is regularly one of:

  • stale references, often between documents (think (popup) windows, frames, iframes, and embeds), or the browser getting confused about when it should clean up. For example, the document an XHR was created in does not exist anymore, or you are using an old reference to a new embedded plugin object (e.g. video)
  • violating the specs or doing something nonsensical, e.g. trying to send() more than once (verify) or trying to set headers after sending data (verify)
  • use of Firebug:
    • older versions of Firebug did not handle XHR in frames well, in which case Firebug itself would be the sole cause of the error
    • triggering XHR from the Firebug command line is sandboxed, and may cause this in certain cases
  • trying cross-domain XHR, or using an absolute URL (verify)


You could even look at the Firefox XHR source code, e.g. for send(), to see what cases trigger this error.




Unsorted

0x80040111 (NS_ERROR_NOT_AVAILABLE)

Reason

The direct reason is a missing object attribute: you are probably expecting an attribute to always be present on an object, when it sometimes is not.


When this happens around an XMLHttpRequest

...then one of the likeliest causes is an onerror handler that tries to read the result's status (or statusText).

The W3C specs tell you that you shouldn't try to read status in the onerror handler, because for some error paths it may not be set, and that accessing it must (!) then raise an exception.

In other words, this error is then correct behaviour. For portable code you want to adhere to that always.

This error is slightly more specific to Gecko (Firefox, Mozilla) because it adheres to those specs closer to the letter in this regard.


The underlying cause of such a failed request is often that the browser never got a response with an HTTP status to parse out, for example because:

  • a connection broke before receiving a response at all, e.g. because of some connectivity problem
  • a request was broken off on the client (possibly specifically because of the next reason:)
  • an ongoing AJAX call is canceled by page unload
    • (another somewhat common form of this is when you trigger AJAX from an input form that also causes a page-reload-style submission of the calling page (often a form with a submit-type button))
  • a malformed server response, such as
    • no data at all (no HTTP headers, no body)
    • deformed data made by a dynamic script


Fix

If this happens in your own handler code, and you can't or don't want to remove the status check, the simplest workaround is usually to wrap this read in a try-catch, since the error handling would often be "oh well, forget it then" code anyway.


If you use XHR from some library (and implicitly its error handler), it's a bug in that library that has not yet been fixed, so search around for a newer version, bother its creator, and/or fix it yourself.

If such a library lets you write your own callbacks and its documentation didn't warn you about this, you might wish to bother them about that. It's nice to be able to have code that can react to this if and when it chooses to.


When caused by the form submission problem, you can usually avoid it. One solution is to use only a <button>, <input type="button">, or anything else that looks clickable enough but does not have the browser submit the form (like a submit button would), so that the only event is the AJAXing.

(A somewhat more hackish solution is to omit the <form>, so that a submit-type button wouldn't know where to go, so won't do anything -- but this may not work so predictably across all browsers.)



*nix

"umount: /mount/path: device is busy" and "Device or resource busy while trying to open ..."

Means what it says - programs have open handles to files or directories that come from this device.

This may be a shell, often its current directory or the one it was started from.


You can see which processes have things open, in this example for /mount/path, with a command like:

fuser -vm /mount/path

or

lsof | grep /mount/path



Linux ≥2.4.11 lets you do a lazy (-l) umount, which detaches the filesystem from the mount point immediately, but only does the full cleanup once the open handles are actually closed. If something is misbehaving this may easily be never, and it won't help you do an fsck or mount.


For the case of unreachable NFS there is -f (force).


Combined with an Input/output error this can be a bit of a catch-22, depending on the cause, as it may not let you do the thing that frees it up.

If you try to open a device with fdisk that was partition-scanned (present in /dev) and get something like

Unable to open /dev/sdb

...then the device dropped away, either because it was yanked, or e.g. a RAID driver deciding to do so(verify).


Name or service not known

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

...an error, with an errno of -2.

Typically a host name lookup problem, and one that can have many different causes.


Before digging in to diagnose the problem, check the same thing on a different computer. If any test fails on other computers on the same network, it's likely to be a problem external to you (e.g. DNS server or proxy failing, or such).



Possible causes / things to look at


Nonsense input

If a hostname lookup is being done that makes no sense, for example gethostbyname() on a URL instead of a hostname.


host can't look up itself

If one of the following fails (replacing hostx with your favorite resolving utility):

hostx localhost
hostx `hostname`

...then it's likely your /etc/hosts is broken, or possibly the hostname you have set (there are some extra details to this when you set an FQDN as your hostname).


nameserver / proxy trouble

  • If the above work but things like:
hostx google.com
hostx yourisp.com

...all fail, then it might be DNS trouble - including not having a DNS server set (in *nices, look at /etc/resolv.conf)
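
The checks above (local names first, then external ones) can be strung together roughly like this. A sketch only: getent hosts is an assumed default resolver command, and check_names and RESOLVE are names made up here.

```shell
# Lookup command to use; 'getent hosts' is an assumption, substitute
# host, dig +short, or whatever resolving utility you prefer.
RESOLVE=${RESOLVE:-getent hosts}

# Try each name in order and stop at the first one that fails to resolve,
# which narrows down where resolution breaks.
check_names() {
    for name in "$@"; do
        if $RESOLVE "$name" >/dev/null 2>&1; then
            echo "ok:   $name"
        else
            echo "FAIL: $name"
            return 1
        fi
    done
}

# Usage: local names first, then external ones
#   check_names localhost "$(hostname)" google.com
```

If the first FAIL is on localhost or your own hostname, look at /etc/hosts; if it is only on external names, look at the DNS configuration.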


nsswitch

  • Another reason for things not to work -- or sometimes for them to have weird patterns of working and not working -- can be that your /etc/nsswitch.conf is misconfigured, or configured to include something that does not work consistently (see in particular its hosts: line).

INFO: task blocked for more than 120 seconds.

Under heavy IO load on servers you may see something like:

INFO: task nfsd:2252 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

...probably followed by a call trace that mentions your filesystem, and probably things like io_schedule and sync_buffer.


This message is not an error. It's telling you that this process has not been scheduled on the CPU at all for 120 seconds (because it was in uninterruptible sleep state).

It's also trying to hint at what is happening -- the reason it's printing a call trace is to help you diagnose what that process was doing - or at least what process might be involved here (more below).

(The code behind this message sits in hung_task.c and was added somewhere around 2.6.30. This is a kernel thread that detects tasks that stay in the D state for a while)
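
To see which tasks are sitting in the D state right now, something like the following may help. A rough sketch that assumes procps ps (for the state, pid, wchan and comm columns); filter_d_state is a name made up here.

```shell
# Keep only rows whose first column (process state) starts with D,
# i.e. uninterruptible sleep. Expects ps-style rows on stdin.
filter_d_state() {
    awk '$1 ~ /^D/'
}

# Usage on a live system; wchan hints at what the task is waiting in:
#   ps -eo state,pid,wchan,comm | filter_d_state
```

Running it a few times in a row shows whether the same tasks stay stuck or whether it is a changing set of victims.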


While not an error, 120 real-world seconds is an eternity for the CPU (tens of billions of instructions later), and pretty long for most programs and most users.

You usually want to find out when this happened, why this happened, and probably avoid it in the future, so you usually don't want to silence or ignore that message.


Note that the call trace is often not so informative of the actual reason, because while sometimes the call trace will be the culprit holding things up, it perhaps more frequently names the victim of something else misbehaving.

Not being scheduled for 120 seconds typically signals resource starvation, usually IO.

Reasons include

  • the system is heavily swapping, possibly to the point of thrashing, due to memory allocation issues
    • could be any program, but probably one with high RES
  • the underlying IO system is very slow for some reason
    • I've seen mentions of this happening in VMs that share disks
  • specific bugs (in kernel code, systemd) have caused this as a side effect
  • being blocked by a drive with bad sectors (in particular desktop-class, because they retry for a long while)


Notes:

  • NFS seems to be a common culprit, probably because it's good at filling the writeback cache, something which implies blocking while writeback happens, which is likely to block various things related to the same filesystem. (verify)
  • can happen to a process that was ioniced into the idle class
    • which means ionice is working as intended, because idle-class is meant as an extreme politeness thing. It just indicates something else is doing a consistent bunch of IO right now (for at least 120 seconds), and doesn't help find the actual cause
    • e.g. updatedb, which may be the recipient of the message if it was ioniced
  • if it happens only nightly, look at your cron jobs
  • if it happens on a fileserver, you may want to consider spreading to more fileservers, or using a parallel filesystem
    • if your load is fairly sequential, you may get some relief from using the noop io scheduler (instead of cfq), though note that that disables ionice
    • if your load is relatively random, upping the queue depth may help


BUG: soft lockup - CPU#1 stuck for 11s!

BUG: soft lockup - CPU#1 stuck for 11s! [nameofwhatwasblocked]

...with varying CPU number, length, and process name.

Will be followed by more process details, such as the PID, the modules loaded in, and the execution state of the process (registers and call trace).


Despite the call trace, this message is itself an informational warning, not an error or crash.


That said, it's roughly saying that the scheduler didn't get it to budge towards doing anything at all for a couple billion cycles, and that is probably something you want to understand the cause of. And avoid in the future if you can.


One possible cause is (extremely overcommitted memory leading to) swapping to the point of thrashing, basically meaning the CPU has to wait for disk before a scheduled process can properly continue.


It looks like certain buggy drivers (e.g. ndiswrapper stuff) can be reported as the process blocked waiting for that driver.

Specific hardware issues could do the same, e.g. hardware that is broken or that interacts in a non-conforming way (noapic sometimes helps, a BIOS update can help).

(98)Address already in use

📃 These are primarily notes, intended to be a collection of useful fragments, that will probably never be complete in any sense.
  • usually accompanied by make_sock: could not bind to address
  • regularly also mentioning an IP/port combination


In general

Generally means something is already bound and listening on the relevant address+port combination.


But also, it can mean something used it in the last few minutes - and the system still marks it as in use for a few minutes.

Read up on the TIME_WAIT state of sockets to understand why, but the shortish version is that a TCP socket closed by this side will linger in this state for a minute or three, to ensure that possible delayed packets from the other end are properly ignored. This is largely to avoid accidental delivery of packets into a different program that took the same port (though this is much less likely than it sounds: it happens only for identical local_ip+local_port+remote_ip+remote_port combinations. A host will typically have at least thousands of ports it cycles through, so it's unlikely to get the same IP on the same port).


Anyway, this is relevant around daemon restart, as they listen on a port they cannot change.

Daemons can choose to ignore the potential and unlikely problem by using the SO_REUSEADDR option while opening the socket.



One way to figure out what is holding the port:

netstat -lnp | less

...and see whether anything has the port in the 'Local Address'
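
Rather than paging through the whole list, you can filter on the port. A small sketch, assuming the usual netstat -ln column layout where Local Address is the fourth field; listeners_on_port is a name made up here.

```shell
# Keep only rows whose Local Address (4th column) ends in :PORT.
# An exact suffix match, so port 80 does not also match 8080.
# $1 = port number; netstat-style rows are read from stdin
listeners_on_port() {
    awk -v suffix=":$1" '
        { n = length($4) - length(suffix) }
        n >= 0 && substr($4, n + 1) == suffix
    '
}

# Usage (run as root to also see the owning PID/program name):
#   netstat -lnp | listeners_on_port 80
```

No output means nothing is listening on that port right now, in which case lingering TIME_WAIT sockets are the more likely explanation.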


In Apache

If you get this error trying to start apache (and you've checked there is no other web server), it often means you have more than one Listen in your configuration, meaning apache fails because you told two parts to bind to the same (IP/)port.

Note the special case of 0.0.0.0 (and/or [::] when using IPv6) means 'bind on all interfaces (that are up)', so 0.0.0.0 plus a specific IP is also an effective duplicate.


To fix this, look/grep for Listen in all apache config (including vhost), and check that it's in there only once.
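
The grep could look something like the following. A sketch assuming a grep that supports -r (GNU and BSD grep do); find_listen is a made-up name, and the config directory differs per distribution (/etc/apache2 and /etc/httpd are common, but check your own).

```shell
# Print every active (uncommented) Listen directive under a config
# directory, prefixed with file name and line number.
# Directive names are case-insensitive in Apache config, hence -i.
# $1 = apache config directory
find_listen() {
    grep -rniE '^[[:space:]]*Listen[[:space:]]' "$1"
}

# Usage:
#   find_listen /etc/apache2
```

More than one hit for the same port, or a 0.0.0.0 entry plus a specific IP on the same port, is the duplicate to remove.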

(Double mentions also seem to cause the "Unable to open logs" apache error (perhaps also a double-open?)(verify))


Argument list too long

...in a Linux shell, often happens when you used a * somewhere in your command.


The actual reason is a little lower level: the shell expands globs before it executes a command, so e.g. cp * /backup/ might actually expand to a long list of files.

Either way, it may create a very large string to be handed to exec().


You get this error when that argument list is too long for the chunk of kernel memory reserved for passing such strings. In kernels before 2.6.23 that size is hard-coded (MAX_ARG_PAGES, usually something like 128KB); later kernels derive it from the stack size limit, and getconf ARG_MAX reports the current value.

You can argue it's a design flaw, or that it's a sensible guard against a self-DoS, but either way, that limit is in place.


There are various workable solutions:

  • if you meant 'everything in a directory', then you can often specify the directory and a flag to use recursion
  • if you're being selective, then find may be useful, and it allows doing things streaming-style, e.g.
find . -name '*.txt' -print0 | xargs -0 echo (See also find and xargs)
  • Recompiling the kernel with a larger MAX_ARG_PAGES. Of course, you don't know how much you'll need, and this memory is permanently inaccessible for anything else, so just throwing a huge number at it is not ideal


Note

  • that most of these split the set of files into smaller sets, and execute something for each of these sets. In some cases this significantly alters what the overall command does.
    You may want to think about it, and read up on xargs, and its --replace.
  • for filename in `ls`; do echo $filename; done is not a solution, nor is it at all safe against special characters.
    ls | while read filename ; do echo $filename; done (specifically for bourne-type shells) works better, but I find it harder to remember exactly why, so I use find+xargs.
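
As a concrete instance of the find+xargs approach, here is a sketch that counts lines over all .txt files in a tree. It assumes a find with -print0 and an xargs with -0 (GNU and BSD versions have both); count_txt_lines is a made-up name. It is a case where splitting into several invocations does not change the result, because the output of every cat run ends up in the same pipe.

```shell
# Count lines across all *.txt files under a directory without ever
# building one huge argument list.
# -print0 / -0 separate names with NUL bytes, so spaces and other
# special characters in file names are safe; xargs runs cat as many
# times as needed to stay under the argument size limit.
# $1 = directory to search
count_txt_lines() {
    find "$1" -type f -name '*.txt' -print0 | xargs -0 cat | wc -l
}

# Usage:
#   count_txt_lines /some/dir
# The limit on this system, in bytes:
#   getconf ARG_MAX
```

Something like wc -l run directly via xargs would instead print one total per batch, which is the kind of altered behaviour the first note warns about.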

Word too long

A csh error saying that a command is over 1024 characters long (1024 being the default, as of this writing at least).


This is usually caused by a long variable value.

And often specifically by a line like:

setenv PATH ${PATH}:otherstuff

...often specifically PATH or LD_LIBRARY_PATH as they are most easily already 1000ish characters long.

You can check that with something like

echo $PATH | wc -c
echo $LD_LIBRARY_PATH | wc -c


In general, you have a few options:

  • switch to a shell that doesn't have this problem
  • recompile csh with a larger BUFSIZE
  • figure out the specific cause
    • typically: clean the path of long or unnecessary or duplicate entries
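
Cleaning out duplicates can be scripted. The sketch below is Bourne-style sh rather than csh (so you would run it as a helper and feed the result back into setenv), and dedupe_path is a made-up name. Note that it also drops empty entries, which some shells interpret as the current directory.

```shell
# Remove duplicate and empty entries from a colon-separated list,
# keeping the first occurrence of each entry and the original order.
# $1 = the list, e.g. the value of PATH
dedupe_path() {
    printf '%s' "$1" |
        awk -v RS=: -v ORS=: '$0 != "" && !seen[$0]++' |
        sed 's/:$//'
}

# Usage:
#   dedupe_path "$PATH"
#   dedupe_path "$PATH" | wc -c     # how long is it now?
```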


Text file busy

Cannot create regular file: Text file busy

During installation, compilation and such, you may see:

cannot create regular file filename: Text file busy


Most likely, the file you are attempting to replace is an executable, and it is currently being run. (You could check this with fuser or lsof)


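This comes from the kernel rather than from cp being careful: Linux refuses to open a running executable for writing (ETXTBSY).

You can tell cp that yes, you want to replace it, by using cp -f (force), which removes the existing destination file and creates a new one instead of overwriting it in place.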


That won't have direct effect on the running process, because of OS and filesystem semantics: it mapped the old executable when starting, which will keep existing as a (probably now filenameless) inode until that file is closed (i.e. the process stops).


It could have indirect effects, e.g. if you replace its dynamically loaded dependencies.

It's up to your informed decision whether to force the copy, or not do the copy, kill the process, or whatnot.

bad interpreter: Text file busy

Pretty much the same situation as above.

But often specifically when a script is currently being written to. That is, chances are it's a script you are trying to run, you have it open in an editor, saved it, and switched to trying to run it so quickly that it wasn't done saving yet. (verify)

RPC: Program not registered


Likely means the portmapper on port 111 isn't reachable.

Often because it's not installed or is firewalled.


In my case (trying to use NFS) it borked as soon as a connection came in, because of bad error handling. I worked around it by installing a different version.


IRQF_DISABLED is not guaranteed on shared IRQs

SIOCADDRT: No such process


...related to networking.

Seems to be reported in cases where adding a default route (route add default gw someip) fails.

The reason for this is usually that the default route you specified did not belong to a known network. (I'm guessing route tries to complete the entry to be added by trying to find the device it should route to, by looking up the net the IP belongs to, but finding none.)


If that sounds plausible for your network config, the solution is correcting your network config.


Warning: "MAX_NR_ZONES" is not defined


This is related to an (automatically generated) file called bounds.h in your kernel source, and probably means you need to do a make prepare in your /usr/src/linux.

Wrong ELF class: ELFCLASS32



Usually means a 64-bit app is trying to load a 32-bit library.

In my case, a 64-bit binary trying to load a 32-bit .so file, largely because LD_LIBRARY_PATH included the 32-bit but not the 64-bit library directory.

If this is in a compiled application, it may mean you need to compile it from scratch before it notices the right one. (verify)

Could be triggered by LD_PRELOAD tricks.
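
To check which class a given binary or library actually is, you can look at the EI_CLASS byte of its ELF header (offset 4; 1 means ELFCLASS32, 2 means ELFCLASS64). A small sketch using only od and tr; elf_class is a made-up name. The file and readelf -h commands report the same thing if you have them.

```shell
# Report whether a file is a 32-bit or 64-bit ELF object.
# $1 = path to the binary or .so to inspect
elf_class() {
    # the first four bytes of any ELF file are 0x7f 'E' 'L' 'F'
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not ELF"
        return 1
    fi
    # byte at offset 4 is EI_CLASS
    case $(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n') in
        1) echo "ELFCLASS32" ;;
        2) echo "ELFCLASS64" ;;
        *) echo "unknown class" ;;
    esac
}

# Usage (hypothetical path):
#   elf_class /path/to/libsomething.so
```

Running this on the program and on each library directory listed in LD_LIBRARY_PATH shows quickly where the mismatch is.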

RTNETLINK answers: Invalid argument


...often in response to some use of the ip command, such as ip link set lo up


Cause can vary with the specific command that failed.

From anecdotal evidence, a relatively common cause is interaction between having traffic shaping installed and certain versions of iproute2. If iproute2 is indeed the culprit, a slight upgrade or downgrade may fix it temporarily (usually enough until it is fixed in the next package version).


Authentication token lock busy



This usually happens when

  • you are trying to change a password
  • ...while the filesystem that contains /etc/passwd and /etc/shadow (usually the root filesystem) is mounted read-only.


...for example because you booted up using the init trick, are in some maintenance mode/runlevel, or intentionally made the root filesystem read-only.


If you're sure you want to (e.g. this isn't triggered by an automatic remount-readonly to prevent filesystem corruption), then you can do an in-place re-mount of a filesystem, which lets you change between read-only and read-write.

In the case of the root filesystem:

mount -o remount,rw /
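
To confirm that a read-only mount is actually the cause before (and after) remounting, you can check the mount options. A sketch that reads a table in /proc/mounts format (device, mount point, type, options); mount_mode is a made-up name.

```shell
# Print "ro" or "rw" for a mount point. If a mount point is listed
# more than once, the last entry wins, since later mounts shadow
# earlier ones. Prints nothing if the mount point is not listed.
# $1 = mount point, $2 = mounts table (defaults to /proc/mounts)
mount_mode() {
    awk -v mp="$1" '
        $2 == mp {
            mode = "rw"
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro") mode = "ro"
        }
        END { if (mode) print mode }
    ' "${2:-/proc/mounts}"
}

# Usage:
#   mount_mode /
```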


LCP: timeout sending Config-Requests


In a direct sense, this means that pppd did not receive any LCP configuration requests from the peer, or was unable to agree on LCP parameters. Which doesn't mean much to me either.


If there is something else failing, say, you see "cannot stat() /dev/pts/1" in your logs, you have some other problem on the server end.

If not, it's likely to be pppd negotiation.



Another example: I had a problem connecting to my ISP with a USB ADSL modem. This was not likely to be modem driver trouble, more likely to be something while the connection is established. Fiddle with your ppp peers files. Chances are you're using a stock one that doesn't work with your ISP without some tweaking. In my case the VPI and VCI were incorrect.




XINE was unable to initialize any audio drivers



Diagnosing

The first test to do is probably checking whether you can get sound at all, when not using xine.


You probably want to check specifically with the underlying interface that xine is using (and with another, if available). For example, use mplayer with -ao oss and/or -ao alsa.


A fairly likely problem is user permissions (and/or user-specific configuration). Do the checks as the user that the player is using (your user login), and possibly also as root, to see whether there's a difference.

If there is, see if there's a system group called audio. When there is, it's used to control which users may use the audio device, and you need to be part of it. There's usually a GUI way of adding you to it. The command line way is gpasswd -a me audio


(verify):

Otherwise, when using ALSA, you may want to look at its configuration (/etc/asound.conf, ~/.asoundrc).


(verify):

In some cases, the device may be locked, or only one application can access it at a time (almost a fact on OSS, except with emu10k cards and a few others).


(verify):

When the problem is specifically with amarok (and with nothing else using the underlying audio interface), there may be some amarok configuration problem. Edit (or delete) ~/.kde/share/config/amarokrc and see whether that solves it. Alternatively, use the helix engine (realplayer) instead (or phonon, when that's available already).


Redhat System is booting up. See pam_nologin(8)


That man page explains:

When a file called /var/run/nologin or /etc/nologin exists, PAM blocks non-root logins, and sends the contents of this text file as an explanation.

This can be used to temporarily block new logins.


The real question is what creates this file, which can include but is not limited to

  • SELinux relabeling during boot
  • some past bugs related to systemd (verify)
https://bugzilla.redhat.com/show_bug.cgi?id=1043212

Windows

Plugged in, not charging

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Device Cannot Start (Code 10)

Code 10 basically means 'unknown error'. A more specific error may be shown, but regularly isn't.

Unless the Event Log has something more to say (it seems it usually doesn't), the driver fixes below and/or updating Windows in general (think service packs) are your best bet.


It's often a driver-level problem, and if so, it may be a workable solution to upgrade, downgrade, or even just reinstall the driver. Reinstalling is probably done most easily by uninstalling it (in the Device Manager) and letting Windows reinstall it, or sometimes by manually installing it, or by pointing the Windows hardware installer to a specific driver (it may be that there is more than one driver you can use; one of them will be preferred, and you may want to try the others).


All that said, the device that made me write this page still isn't working, even after trying five different driver variations (...but then, it's a known problem with what seems to be a cheap knockoff).



"The error code is 2908"

The installer has encountered an unexpected error installing this package. 
This may indicate a problem with this package. The error code is 2908.

or

Internal Error 2908



Installers seem likely to say this when your .NET framework installation is broken.

Removing and reinstalling .NET usually fixes it. There is probably a less dramatic, more specific fix.

QuickTime and a TIFF (LZW) decompressor are needed to see this picture

Problem

In a Microsoft Office document, you see one of the following:

  • "QuickTime(TM) and a TIFF (LZW) decompressor are needed to see this picture"
  • "QuickTime(TM) and a Photo - JPEG decompressor are needed to see this picture."
  • or translations like "Zur Anzeige wird der QuickTime Dekompressor "TIFF (LZW)" benötigt"

...instead of an image.


Reason

A combination of:

  • The Mac version of MS Office, which allows embedding images via Quicktime
  • Apple having omitted certain formats from Quicktime for Windows that are present in Quicktime for Mac (including PICT, possibly the most common problem)

This basically means that you can add images to Office documents on Mac that will not be viewable on non-Macs.


Workarounds

There is no immediate patch to read documents that show this problem, because this isn't fixed in any Quicktime version or patch (that I know of, to date).


One workaround is to go to the Mac you made the document on, and make sure the embedding is done with a more usual/portable image format.

Converting it while in the document may be nontrivial work, but you can often avoid re-embedding and re-positioning the image: Various in-document image-related actions automatically mean conversion to a more usual format as a side effect. When this works it's probably the simplest fix.


If you have no access to a Mac, you could choose to extract just the images via a trick: when you tell Office to export to a web page, images are either directly converted to something useful, or in the case of PICT/TIFF, saved to .pcz files, which are gzipped PICT files. (You can extract the image from that .pcz using winrar, 7zip, gzip for windows, or something like it).

You should probably rename it to give it the .pict extension. It may still be quite hard to view and convert this file. In my case, the Quicktime for Windows image viewer crashed on it, and photoshop complained that it could only open raster PICTs.

Irfanview managed to view and convert it -- once the optional irfanview plugins were installed.


Further notes:

One page suggests that this is caused specifically by pasting such images into a document, which seems to suggest that 'Insert image' may embed in a more portable way. I don't know how true that is (verify).




Display Driver Stopped Responding and Has Recovered


...in Vista and Win7.


The source of this message is TDR (Timeout Detection and Recovery), a watchdog that triggers when a driver doesn't finish an operation in (by default) two seconds.

Windows will assume the (graphics card) driver is hanging, will reset the graphics subsystem, so that hopefully you don't have to restart the entire computer to recover.


When you see this:

Updating your video card drivers might help -- because chances are its maker recently got a lot of bug reports and fixed it.


While two seconds is a lot (in terms of drivers), computers that are old, slow, and extremely busy may occasionally see false triggers.

You can change some registry values to set the timeout higher, or even disable TDR (I wouldn't recommend it - waiting longer is still handier than a hard computer reset).



The application failed to initialize properly



A very generic error, with an error code that tends to tell you more.

Common codes and their likely reasons include:

  • 0xc0000135 - Means the .NET framework is not installed. Download a redistributable. The latest will probably do.


  • 0xC0000005 (STATUS_ACCESS_VIOLATION): A memory access violation, often caused by a program bug, and possibly by DEP (Data execution prevention) being overly strict. If you think it's the latter, you could disable DEP for the program, or completely, to test that theory.


  • 0xc0000022 - no read access to a system file, probably because it was copied in badly (bad installer, user), possibly because a virus scanner is blocking it, etc.




Insufficient resources exist to complete the requested service


Often means you have one misbehaving process.


Regularly means that process leaks windows handles (rather than memory or such). To check whether this is the problem, and which process is misbehaving, you can use task manager. You'll have to add the column called something like 'handle count'.

For most sorts of processes, a few dozen to a few hundred is normal; some system processes and services use one or two thousand.


Apparently processes may be limited to something like 10000 handles, which means that a few types of processes may be limited, but it should be impossible to run the system out of handles.

However, various tests show the limit may also well be 2^24 (16 million), which is enough to strangle the system.

(Having that many handles also affects memory use, but not by that much, so it's more likely that the OS gets in trouble because of loads of handles than because of memory limits)


The local device name is already in use

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Windows (XP, 2003, and likely most others) networking may give this for shares under, apparently, a few different conditions, including:

  • Server not reachable (right now/anymore), for example
    • because of a change in server configuration
    • because firewalls are blocking it
    • name does not resolve (anymore)
  • There is already a mapping to a specific UNC path (possibly some persistent mapping conflicting with manual maps)
  • The drive letter is already used for a mapping(verify)


(verify):

  • The server may have forgotten your credentials while your client is trying to reconnect assuming they are still valid. May happen in cases like:
    • you are reconnecting from a different IP(verify)
    • your side disconnected, e.g. because of a network timeout, e.g. because the computer revived from hibernation(verify)


See also

CDBOOT: Cannot boot from CD - Code: 5

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

(Windows 7 installer)

Seems to mean the Win7 bootloader does not understand your drive controller, meaning it can't find its own CD/DVD, and gives up.


Mostly happens on computers that are older than, very approximately, 2005 -- the bootloader seems to just assume you have relatively recent and standard hardware.(verify)


Workarounds:

  • Install from USB (but if this is an old enough computer, it might not understand that either)
  • Boot from XP, then run the Win7 installer from there
    • No-brainer, but installing XP only to install Win7 will add ~40 minutes to the whole install process
  • Apparently you can load the DVD from a bootloader like gujin. Means a few extra details related to finding the DVD in the drive, but that's often easy enough
  • There are ways to alter the image so that it'll boot on older hardware
  • sometimes putting the drive on a different controller helps
  • probably more...


That darn IPC error

"The specified network name is no longer available"

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

This Windows error, most commonly received during network data copy, seems to be caused by the underlying network connection dropping, often for just a fraction of a second, sometimes more permanently.


Apparently, a common culprit is a duplex mismatch between networking ends: full duplex on one end and half on the other. This happens particularly on direct network card-to-card connections where these settings can be forced, but also when the network card or switch is just dumb about autodetection.


Other possibilities seem to include:

  • bad drivers,
  • a bad hub/switch, (e.g. a cheap one that freezes up sometimes)
  • bad wireless connections,
  • a damaged cable


See also


Windows handles, handle-related problem

A windows Handle is a reference to one of about three dozen types of resources (Kernel Objects) - specifically resources that are managed by windows itself, and used only through a windows API. (Handles themselves are uint32 numbers; they only have meaning as a (dynamically allocated) identifier)


The applicable types vary with the version of windows, and a little with how you count. Some of the more significant are event handles (for critical section type things), file handles, registry-key handles, section handles (shared memory stuff), thread handles, window handles, some other types relating to GUI elements and DirectX, and a few more of which there can easily be at least a few hundred system-wide.


Windows does not deal well with running out of allocatable handles, even in user-space programs. In theory it's hard to run out, because many of the types have limits (per process and/or system-wide(verify)).

Handle leaks of some of the types can easily crash the system, so some bugs can cause trouble in windows.


There is an overall handle limit per process (2^24, ~16 million) (though the amount of memory also limits how many you can actually allocate).


See also:


Windows file or directory can't be removed

Windows (XP?) has had a problem for years where you can't delete a file after some specific actions.

There are a few things that bring out this bug, which include:

  • Trying to delete it while it is still open. Sometimes this will cause it to be undeletable afterwards (because explorer.exe keeps it open)
  • Directories with files like this (implicitly, because it contains an open file)

Windows will say it is in use, but there's no application with it open.


The quick and dirty solution is rebooting, but there are other ways.

In particular, Process Explorer lets you both search for open handles by their path name and close them. Closing handles is a bad idea when they are actively in use, in that the program may not be expecting it and the file may be in a half-updated state. But if a handle is lingering only because of a bug, the difference between this and what will happen at the next shutdown/reboot isn't much.


Notes on handle leaks, and inspecting handles

Most processes tend to have a few dozen to a few hundred handles open, so a system with a bunch of programs installed and running may easily have 10000-25000 handles open in total.


One or two thousand is normal for explorer.exe, some services, 'System', and a few others (each partly because it's a bunch of things in one). Lots of apps open (e.g. 80 chrome tabs) also adds up. In some cases, up to 60Kish total isn't crazy.


More than a few thousand, and steady growth to thousands (over minutes or hours), is likely to indicate that a program doesn't close them as it should.

Eventually this will be a problem for the system, as the OS will run out of some resource or other.
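If you note down a process's handle count every so often, the "steady growth" test can be made a little more concrete. A minimal sketch, where the function name and both thresholds are illustrative assumptions rather than anything Windows defines:

```python
def looks_like_leak(samples, min_growth=1000, min_rising_fraction=0.8):
    """Guess whether a series of handle counts (oldest first) shows a leak.

    Both thresholds are arbitrary illustration values, not Windows limits.
    """
    if len(samples) < 3:
        return False                       # too little data to call it a trend
    steps = [later - earlier for earlier, later in zip(samples, samples[1:])]
    rising = sum(1 for step in steps if step > 0)
    grew_enough = samples[-1] - samples[0] >= min_growth
    mostly_rising = rising / len(steps) >= min_rising_fraction
    return grew_enough and mostly_rising

steady_climb = [420, 900, 1600, 2500, 3900]   # keeps growing between samples
fluctuating = [420, 510, 450, 480, 430]       # goes up and down around a level
print(looks_like_leak(steady_climb))
print(looks_like_leak(fluctuating))
```

A count that is high but stable (as is normal for explorer.exe or 'System') is deliberately not flagged here. Only sustained growth is.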


Inspecting

You can identify the count with just Windows's own Task Manager (Ctrl-Shift-Esc):

  • Recent windowses: Go to the Details tab, right click the header, 'Select columns', add the 'Handles' column
  • Older windowses: Go to the Processes tab, then via View → Select columns, add the 'Handles' column

In some cases, just knowing the process that is misbehaving is enough information to know what to update/downgrade/uninstall. For example, I had a webcam helper that had a handle leak related to it looking for its registry settings every second, but not closing that handle.


In other cases it's not so simple, particularly when the offending process is 'System' as you can't tell what specific part is misbehaving. It's likely to be some driver or other, but it's hard to tell from just the count.

If you want more detail:

  • On Windows Vista and Windows 7 there is "Resource Monitor" (part of(verify) and reachable via Performance Monitor).
    • In the CPU tab you can view the handles for the selected processes (seems to show only some types of handles(verify))
    • seems not to report on System process (verify)

Some usage notes for Handle.exe (a Sysinternals command-line tool):

  • without the -a option it shows only file (and section?) handles. To show all types, use -a.
  • invoked without options at all it shows (the file handles for) all processes.
  • you can inspect a specific process using -p, giving it a PID or a process name (partial is allowed, seems to be a starts-with test)
  • You can get an overall summary (count per type) by using handle.exe -s, though this doesn't mean much if you don't know how much of each type is to be expected
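The options above combine into command lines such as handle.exe -a -p explorer. A small sketch of building (and optionally running) those from a script. The helper names are invented for this example, it assumes handle.exe is on the PATH, and it treats -s and -a as alternatives, which is an assumption of this sketch:

```python
import subprocess

def handle_command(process=None, all_types=False, summary=False, exe="handle.exe"):
    """Build an argument list for Handle.exe from the options described above."""
    args = [exe]
    if summary:
        args.append("-s")                # count per handle type
    elif all_types:
        args.append("-a")                # all handle types, not only files
    if process is not None:
        args += ["-p", str(process)]     # PID, or the start of a process name
    return args

def run_handle(**options):
    """Run Handle.exe and return its text output (Windows only)."""
    result = subprocess.run(handle_command(**options), capture_output=True, text=True)
    return result.stdout

# Only builds the command line; call run_handle(...) on Windows to execute it.
print(handle_command(process="explorer", all_types=True))
```

Running it with elevated rights tends to matter, since handles of other users' and system processes are otherwise not accessible.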


In one case of the misbehaving System process, Handle.exe showed thousands of thread handles it couldn't access - but the process's thread count was low, which likely meant it leaked thread handles. It turned out to be a specific driver.

Error 10051

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


It seems that various software has a habit(verify) of defining its own errors starting above 10000 (presumably to not conflict with some more standard error codes around 0).

So coming from specific software, errors 10000 through... well, most software doesn't end up defining more than a few dozen... can mean anything and you may need to look it up in their manual.

That said, if it's related to networking, it's probably coming from winsock, and 10051 is WSAENETUNREACH, basically 'network unreachable'. This is fairly likely, just because winsock is common and most other software may not define ~50 errors.


But for example, this software uses 10051 to mean "can't open device", and it'll probably show another more meaningful error code from the actual API (e.g. 0x8889000f is WASAPI saying AUDCLNT_E_ENDPOINT_CREATE_FAILED).
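To make the two kinds of codes above a little more tangible: a short sketch with a few well-known Winsock codes, and a decoder that splits an HRESULT like 0x8889000f into its fields. The table is only a small excerpt and the function name is made up for this example:

```python
# A few Winsock error codes (an excerpt, not the full list).
WINSOCK_ERRORS = {
    10051: "WSAENETUNREACH (network is unreachable)",
    10054: "WSAECONNRESET (connection reset by peer)",
    10060: "WSAETIMEDOUT (connection timed out)",
    10061: "WSAECONNREFUSED (connection refused)",
}

def decode_hresult(value):
    """Split a 32-bit HRESULT into (failed, facility, code)."""
    value &= 0xFFFFFFFF
    failed = bool(value >> 31)           # top bit set means failure
    facility = (value >> 16) & 0x1FFF    # same mask as the HRESULT_FACILITY macro
    code = value & 0xFFFF                # low 16 bits: facility-specific code
    return failed, facility, code

print(WINSOCK_ERRORS.get(10051, "not a known Winsock code"))

failed, facility, code = decode_hresult(0x8889000F)
print(failed, hex(facility), hex(code))
```

For 0x8889000f the facility comes out as 0x889, which is the audio client (AUDCLNT) facility, and the code as 0xf. That is a quick way to tell which API an otherwise opaque number came from.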