When installs break other installs
Intro
When people give each other programs, even ones with some sort of customized install script, a lot of those programs have the habit of just putting their own binaries, libraries, and/or package paths in front of everything else.
- Which can break other software (occasionally even system utilities)
- Alternatively, it creates a mess of instructions
- ("yeah it overrides the system package directory, so install everything into that as well. Oh, the fix is simple, just learn this environment hacking tool", or "yeah you need to ensure it is not distracted by seeing a system MPI at install time, but then it will still work correctly at runtime, maybe we should fix that, but other than an error that isn't really an error it isn't broken sooo", or "oh it works on my system, which distribution do you use? Hmmm, which version? Hmm, what did you install before? Hmm, yeah I don't know, maybe try uninstalling and reinstalling?")
And you might even manage that on your one workstation, in a "fiddle until it works" way.
But when your everyday work involves distributing software to varied computers, a shared production environment where new versions come in steadily, a bunch of niche/custom software, or clusters?
Good luck with your sanity, and/or your helpdesk's sanity.
When you start addressing this, you find out that other interests share the same problems:
- installing in user environments
- installing in dev environments
- starting your own projects
- packaging software for other people
Solution 1: Avoid the problem by using someone else's solution
Software installed by your OS package manager tends not to conflict with other software that the same package manager installed
- ...in part because the packages have had to follow specific rules to become that package in the first place.
- ...in part because the package manager might still manage (or at least point out) minor conflicts that still happen
(until you start mixing package managers, but surely that's not everyday practice, right? /s)
Upside:
- generally just works
- because you are offloading a lot of design and consideration to package maintainers
Limitation:
- the willingness of package maintainers to do this work is roughly proportional to personal and public benefit
- they are not going to do your work for you and your less-usual software
- Either become one of them, or use something else.
Solution 2: Do it yourself when you have to
Custom software, niche software, or anything else not in common package managers, though?
You have to do it yourself.
How do you install it for yourself?
How do you make it easier for others?
Technical background - what is there to do?
What are the concrete moving parts here? The bits we might want to patch up?
For Linux:
- most of the things that get resolved at runtime from your system install are
- non-absolute-path executable names, via PATH
- library names, via the runtime dynamic linker
- which most people take to mean 'change LD_LIBRARY_PATH' - workable though not the cleanest way to use it
- there are further things, often picked up in their own ways. Consider:
- compilers
- other runtimes external to programs (MPI, Java, NVIDIA stuff)
- Scripting language runtimes could count as either one
- and hashbangs may make things better - or worse, because they refer into an environment you also have to control (you can count on a distro being consistent-ish, but not in the long run)
We would like predictability for each of these.
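To see what currently gets resolved for a given command name, something like the following can help. This is a minimal sketch, assuming a Linux-like system; `show_resolution` is a made-up helper name, and `ldd` is only used when it is actually present.

```shell
# Hypothetical helper: show what a command name resolves to, and which
# shared libraries the dynamic linker would currently pick up for it.
show_resolution() {
    local resolved
    # command -v prints what the shell would run for this name
    # (for shell builtins and functions that is just the name itself)
    resolved="$(command -v "$1")" || { echo "$1: not found in PATH" >&2; return 1; }
    echo "executable: $resolved"
    # ldd lists the libraries as they resolve right now; skip it if missing
    if command -v ldd >/dev/null 2>&1; then
        ldd "$resolved" 2>/dev/null || echo "(not a dynamic executable)"
    fi
}
```

Running `show_resolution python3` before and after changing PATH or LD_LIBRARY_PATH is a quick way to check whether a change did what you intended.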
Solution 2.1: a quick fix for personal/local use
When the issue is mostly paths and libraries, and you can't oversee when they get hooked in, there is a quick fix: make the user responsible for doing that explicitly.
You've probably thought about writing things like:
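function activate-myprog {
    # appended, so system binaries and libraries still win on name clashes
    export PATH="$PATH:/opt/myprog/bin"
    export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/opt/myprog/lib"
}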
This is a halfway decent fix already.
Sure it's manual, sure it can still have issues, yet:
- As an admin writing these on purpose, you've thought about the order in the paths, and
- it loads nothing by default, while making it easy to have users choose one at a time, avoiding most conflicts
- you can install multiple versions (mostly) without conflict
- ...or at least centralize your knowledge of the conflicts
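A slightly more defensive sketch of the same idea, assuming the same /opt/myprog layout: it does not add a directory a second time when the function is called again, and it avoids an empty entry in LD_LIBRARY_PATH (which the dynamic linker treats as the current directory). The helper name `append_to_var` is made up for illustration, and the names use underscores because not every shell accepts hyphens in function names.

```shell
# Hypothetical helper: append directory $2 to the colon-separated variable
# named $1, unless that directory is already listed there.
append_to_var() {
    local var="$1" dir="$2" cur
    eval "cur=\${$var-}"                      # current value, empty if unset
    case ":$cur:" in
        *":$dir:"*) ;;                        # already present, do nothing
        *) eval "export $var=\"\${cur:+\$cur:}\$dir\"" ;;
    esac
}

activate_myprog() {
    append_to_var PATH /opt/myprog/bin
    append_to_var LD_LIBRARY_PATH /opt/myprog/lib
}
```

Whether to append or prepend is exactly the ordering decision mentioned above; this sketch appends, so the system's own versions keep priority.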
Upsides
For personal use, this works fairly well for something so basic.
For coworkers, the explanation is now at worst, "activate what you need, and please start a new shell for each of these" (maybe adding "You don't always need a new shell, but it avoids potential problems")
(Also, on various shells, if you start the name with something like activate-, you get tab completion of all your activate- things just because these functions exist).
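The "new shell for each" advice can also be followed for a single command: a subshell gives the same isolation without opening another terminal. A sketch, in which `activate_myprog` stands in for any such activation function and `with_myprog` is a made-up name:

```shell
activate_myprog() {
    export PATH="$PATH:/opt/myprog/bin"
}

# Run one command with myprog activated. The function body is a subshell
# ( ... ), so the changed PATH is gone again once the command finishes.
with_myprog() (
    activate_myprog
    "$@"
)
```

For example, `with_myprog sh -c 'echo "$PATH"'` shows the extended PATH, while the calling shell's PATH stays as it was.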
Limitations
It doesn't solve cases like
- running things on cluster nodes, because you probably can't cleanly do that from the batch scripts that the queue manager wants.
- Particularly if it's not necessarily the same shell.
- where there are deeper, external dependencies on (varying implementations and/or versions of) system-ish things, like MPI or a compiler
- depending on specific versions of software can get hairy
- ...and anywhere where "don't touch my workstation y'bastards" does not work.
It may also not be trivial to explain to other people how to do this well.
So people have thought up some frameworks that stay cleaner over time.
Solution 2.2: the in-between - custom, but within a more structured system
If you want to do this in a more automated, more modular way, you might want to give a unique, controlled environment to individual programs, to development environments for projects, to nodes in clusters / swarms, and more.
These are sometimes called virtual environments.
This solves varied needs, but most commonly:
- isolate specific libraries to just the program that needs them
- and not accidentally conflict with others (as it might when installed system-wide)
- ability to install things into just a specific project, without needing permissions to do so system-wide
- get build tools to create a specific environment (or run in one), making your dev more easily transplanted
- make software that relies on very specific (often older) versions a lot less fragile than installing everything system-wide and hoping for the best
The above is intentionally still abstract/vague, because implementations vary.
For example,
- C is decent for shared libraries when people adhere to a versioning system
- and relies on things like LD_LIBRARY_PATH when people do not.
- Python has the concept of system-shared libraries, but does not have good versioning.
- You can isolate an environment by (roughly) pointing Python at your own package directory instead of the system's.
- Java has only a "load extra classes you need from here", which is essentially a manual thing. By default it never shares, every app is larger and independent.
(...yes, I know, all of those come with footnotes)
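For the Python case, the standard library's `venv` module is one common way of doing that pointing for you. A minimal sketch, assuming `python3` is installed; the `make_env` helper and the directory name in the usage notes are made up for illustration, and `--without-pip` only keeps the example quick (drop it if you want to install packages into the environment).

```shell
# Hypothetical helper: create an isolated Python environment in directory $1.
make_env() {
    python3 -m venv --without-pip "$1"
}

# Usage sketch:
#   make_env ./myproject-env
#   ./myproject-env/bin/python -c 'import sys; print(sys.prefix)'
# The interpreter inside the environment reports the environment's directory
# as its prefix, and by default looks for installed packages there rather
# than in the system-wide site-packages.
```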
Jargon
Package vendoring
Package vendoring, vendoring, vendored packages, or bundling all seem to mean "we copied in the (specific version of the) code we rely on, because we don't really trust the package management system / its dependencies to not break our builds / running system over time".
Sometimes combined with 'dependency isolation' to avoid other packages leaking in and causing trouble.
If you are doing that more consistently, what you are doing amounts to app images.