When installs break other installs

From Helpful

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


When people give each other programs, and those are installed into a shared environment, there are some details you need to address.

A large one is the problem we call dependency hell. If

software loads other libraries,
works only with specific version ranges of each library,
and you have a lot of software,

then you will at some point depend on multiple versions of the same library,
and you had better have an answer to why that does not break anything.

And some answers to that make it worse. Say, a package that just puts its own binaries, libraries, and/or package paths in front of everything else. That makes that package more likely to work, and everything else more likely to break.
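
A small sketch of why "in front of everything else" is double-edged. Everything here is made up for illustration: two throwaway directories stand in for a system install and a vendor install, and both ship a command called `sometool`.

```shell
# Two hypothetical installs that both provide a command called "sometool".
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/system/bin" "$demo_root/vendor/bin"

printf '#!/bin/sh\necho "system sometool 1.0"\n' > "$demo_root/system/bin/sometool"
printf '#!/bin/sh\necho "vendor sometool 0.3"\n' > "$demo_root/vendor/bin/sometool"
chmod +x "$demo_root/system/bin/sometool" "$demo_root/vendor/bin/sometool"

# Vendor directory after the system one: the system copy still wins.
appended="$(PATH="$demo_root/system/bin:$demo_root/vendor/bin:$PATH"; sometool)"

# Vendor directory in front: its copy now shadows the system one for
# everything started from this environment, not just the vendor's own program.
prepended="$(PATH="$demo_root/vendor/bin:$demo_root/system/bin:$PATH"; sometool)"

echo "appended:  $appended"
echo "prepended: $prepended"
```

The same ordering logic applies to library and package search paths, which is why a blanket "prepend ours" fixes one program at the expense of others.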

A more common situation is a mess of instructions:

("yeah it overrides the system package directory, so install everything into that as well. Oh, the fix is simple, just learn this environment hacking tool" or "yeah you need to ensure it is not distracted by seeing a system MPI at install time but, then it will still work correctly at runtime, maybe we should fix that but - other than a build time error that we made a warning because it isn't really an error usually - it isn't broken sooo", or "oh it works on my system, which distribution do you use? Hmmm, which version? Hmm, what did you install before? Can you send me logs? Hmm, yeah I don't know maybe try uninstall and reinstall?")

And you might even manage that on your one workstation, in a "fiddle until it works and then don't touch it" way.

But when your everyday work involves distributing software to varied computers, a shared production environment where new versions come in steadily, a bunch of niche/custom software, or clusters?

Good luck with your sanity, and/or your helpdesk's sanity.

When you start addressing this, you find out that other interests share very similar problems:

  • installing in user environments
  • installing in dev environments
  • starting your own projects
  • packaging software for other people
  • scaling out to more computers

Solution 1: Avoid the problem by using someone else's solution

Software installed by your OS package manager tends not to conflict with other software installed by that same package manager

...in part because the packages have had to follow specific rules to become that package in the first place.
...in part because the package manager might still manage (or at least point out) minor conflicts that still happen

(until you start mixing package managers, but surely that's not everyday practice, right? /s)


  • generally just works,
because you are offloading a lot of design and consideration to package maintainers

  • the willingness of package maintainers to do this work is roughly proportional to personal and public benefit:
they are not going to do your work for you and your less-usual software.
Either become one of them, or use something else.

Solution 2: Do it yourself when you have to

Custom software, niche software, or anything else not in common package managers, though?

You have to do it yourself.

How do you install it for yourself?

How do you make it easier for others?

Technical background - what is there to do?

What are the concrete moving parts here? The bits we might want to patch up?

For Linux:

  • most of the things that get resolved at runtime from your system install are
  • there are further things, often picked up in their own ways. Consider:
    • compilers
    • other runtimes external to programs (mpi, java, nvidia stuff)
  • Scripting language runtimes could count as either one
and hashbangs may make things better - or worse, because they refer into an environment you also have to control (you can count on a distro being consistent-ish, but not in the long run)

We would like predictability for each of these.
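
For a single command, some of these moving parts can be inspected from the shell. A rough sketch, not a complete audit: `describe_command` is a made-up helper name, and it assumes `head -c` and (optionally) `ldd` are available, as they are on most Linux systems.

```shell
# Report what a command name resolves to in the current environment.
# describe_command is a hypothetical helper, not an existing tool.
describe_command() (
  cmd_path="$(command -v "$1")" || { echo "$1: not found on PATH"; exit 1; }
  echo "resolved to: $cmd_path"

  if [ "$(head -c 2 "$cmd_path" 2>/dev/null)" = '#!' ]; then
    # A script: its hashbang names an interpreter the environment must provide.
    echo "interpreter: $(head -n 1 "$cmd_path" | cut -c 3-)"
  elif command -v ldd >/dev/null 2>&1; then
    # A binary: list the shared libraries the dynamic loader would pick up.
    ldd "$cmd_path" 2>/dev/null || echo "(ldd could not inspect this file)"
  fi
)
```

Running it for the same command name in two differently set up shells is a quick way to see which of these parts differ between them.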

Solution 2.1: a quick fix for personal/local use

When the issue is mostly paths and libraries, and that you can't oversee when they get hooked in, a quick fix is to make the user responsible for doing that explicitly.

You've probably thought about writing things like:

function activate-myprog {
  # appended, so system binaries and libraries keep precedence
  export PATH="$PATH:/opt/myprog/bin"
  export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/opt/myprog/lib"
}

This is a halfway decent fix already.

Sure it's manual, sure it can still have issues, yet:

As an admin writing these on purpose, you've thought about the order in the paths, and
it loads nothing by default, while making it easy to have users choose one at a time, avoiding most conflicts
you can install multiple versions (mostly) without conflict
...or at least centralize your knowledge of the conflicts


For personal use, this works fairly well for something so basic.

For coworkers, the explanation is now at worst, "activate what you need, and please start a new shell for each of these" (maybe adding "You don't always need a new shell, but it avoids potential problems")

(Also, on various shells, if you start the name with something like activate-, you get tab completion of all your activate- things just because these functions exist).
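
The points above (nothing loaded by default, one choice at a time, multiple versions side by side) could be sketched roughly as follows. The versioned layout `/opt/myprog/<version>` and the marker variable `MYPROG_ACTIVE` are assumptions made for this example. The name uses an underscore so it also works in shells that reject hyphens in function names.

```shell
# Assumed layout: each version has its own prefix, /opt/myprog/<version>.
# MYPROG_ACTIVE is a made-up marker variable used to refuse a second activation.
activate_myprog() {
  if [ -n "${MYPROG_ACTIVE:-}" ]; then
    echo "myprog $MYPROG_ACTIVE is already active; start a new shell to switch" >&2
    return 1
  fi
  export MYPROG_ACTIVE="$1"
  # Appended, so system binaries and libraries keep precedence.
  export PATH="$PATH:/opt/myprog/$1/bin"
  export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/opt/myprog/$1/lib"
}
```

Usage would be something like `activate_myprog 1.2`. Refusing a second activation is one way to enforce the "new shell for each of these" advice rather than only explaining it.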


It doesn't solve cases like

  • running things on cluster nodes, because you probably can't cleanly do this from the batch scripts that the queue manager wants.
Particularly if it's not necessarily the same shell.
  • cases with deeper, external dependencies on (varying implementations and/or versions of) system-ish things, like MPI or a compiler
  • depending on specific versions of software, which can get hairy
  • ...and anywhere "don't touch my workstation y'bastards" does not work.

It may also not be trivial to explain to other people how to do this well.

So people have thought up some frameworks that stay cleaner over time.

Solution 2.2: the inbetween - custom, but within a more structured system

If you're geared to do this in a more automated, more modular way, then you might want to give a unique, controlled environment to individual programs, development environments for projects, nodes in clusters / swarms, and more.

These are sometimes called virtual environments.

This solves varied needs, but most commonly:

  • isolate specific libraries to just the program that needs them, so they do not accidentally conflict with others (as they might when installed system-wide)
  • install things into just a specific project, without needing permission to install system-wide
  • get build tools to create a specific environment (or run in one), making your dev setup easier to transplant
  • make software that relies on very specific (often older) versions a lot less fragile than installing everything system-wide and hoping for the best
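
The first of these needs can be approximated in plain shell by scoping the environment to one process instead of to the whole login shell. A minimal sketch, assuming a self-contained prefix with `bin/` and `lib/` directories. `run_in_prefix` is a made-up name, and real virtual environment tools do considerably more than this.

```shell
# Run one command with a prefix's bin/ and lib/ in front, without changing
# the calling shell. The function body is a subshell, so nothing leaks out.
run_in_prefix() (
  prefix="$1"
  shift
  export PATH="$prefix/bin:$PATH"
  export LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  exec "$@"
)
```

Usage would be something like `run_in_prefix /opt/myprog myprog --help`. Putting the prefix in front is less risky here than doing so globally, because only that one process (and its children) sees the changed paths.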

The above is intentionally still abstract/vague, because implementations vary.

For example,

C is decent at sharing libraries when people adhere to a versioning scheme,
and relies on things like LD_LIBRARY_PATH when people do not.
Python has the concept of system-shared libraries, but does not have good versioning.
You can isolate an environment by (roughly) pointing its package lookup at your own directory instead of the system's.
Java has only a "load extra classes you need from here", which is essentially a manual thing. By default it never shares, so every app is larger and independent.

(...yes, I know, all of those come with footnotes)
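
For the Python case, the standard library's `venv` module is one way to do that pointing. A small sketch, assuming `python3` with the `venv` module is installed and a Linux-style layout (`bin/python` inside the environment). `make_isolated_env` is a made-up helper name.

```shell
# Hypothetical helper: create an environment in directory $1, then ask its
# interpreter whether it considers itself separate from the base install.
make_isolated_env() {
  # --without-pip keeps the example quick; drop it to get pip in the environment.
  python3 -m venv --without-pip "$1" || return 1
  "$1/bin/python" -c 'import sys; print(sys.prefix != sys.base_prefix)'
}
```

Calling `make_isolated_env "$(mktemp -d)/demo-env"` should print `True`, since inside a virtual environment `sys.prefix` points at the environment rather than at the base installation.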


Package vendoring

Package vendoring, vendoring, vendored packages, or bundling all seem to mean "we copied in the (specific version of the) code we rely on, because we don't really trust the package management system / its dependencies to not break our builds / running system over time".

Sometimes combined with 'dependency isolation' to avoid other packages leaking in and causing trouble.

If you are doing that more consistently, what you are doing amounts to app images.