When installs break other installs


This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


Intro

When people give each other programs, even ones with some sort of customized install script, a lot of them have the habit of just putting their own binary, library, and/or package paths in front of everything else.

Which can break other software (occasionally even system utilities).

Alternatively, it creates a mess of instructions, along the lines of:
  • "yeah, it overrides the system package directory, so install everything into that as well. Oh, the fix is simple, just learn this environment hacking tool"
  • "yeah, you need to ensure it is not distracted by seeing a system MPI at install time, but then it will still work correctly at runtime. Maybe we should fix that, but other than an error that isn't really an error, it isn't broken, sooo..."
  • "oh, it works on my system. Which distribution do you use? Hmm, which version? Hmm, what did you install before? Hmm, yeah, I don't know, maybe try uninstalling and reinstalling?"

And you might even manage that on your one workstation, in a "fiddle until it works" way.

But when your everyday involves distributing software to varied computers, a shared production environment where new versions come in steadily, a bunch of niche/custom software, or clusters?

Good luck with your sanity, and/or your helpdesk's sanity.



When you start addressing this, you find out that other use cases share the same problems:

  • installing in user environments
  • installing in dev environments
  • starting your own projects
  • packaging software for other people


Solution 1: Avoid the problem by using someone else's solution

Software installed by your OS package manager tends not to create conflicts within the set of software that same package manager installed...

...in part because those packages had to follow specific rules to become packages in the first place.
...in part because the package manager can still manage (or at least point out) the minor conflicts that do happen.

(until you start mixing package managers, but surely that's not everyday practice, right? /s)


Upside:

  • generally just works
because you are offloading a lot of design and consideration to package maintainers

Limitation:

  • the willingness of package maintainers to do this work is roughly proportional to the personal and public benefit
they are not going to do your work for you and your less-usual software.
Either become one of them, or use something else.

Solution 2: Do it yourself when you have to

Custom software, niche software, or anything else not in common package managers, though?

You have to do it yourself.

How do you install it for yourself?

How do you make it easier for others?



Technical background - what is there to do?

What are the concrete moving parts here? The bits we might want to patch up?

For Linux:

  • most of the things that get resolved at runtime from your system install are executables (found via PATH) and shared libraries (found by the dynamic linker, influenced by things like LD_LIBRARY_PATH)
  • there are further things, often picked up in their own ways. Consider:
    • compilers
    • other runtimes external to programs (MPI, Java, NVIDIA stuff)
  • scripting language runtimes could count as either one,
and shebangs (hashbangs) may make things better - or worse, because they refer into an environment you also have to control (you can count on a distro being consistent-ish, but not in the long run)

We would like predictability for each of these.
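
To get a feel for what currently gets resolved on a given machine, a few standard commands help (a sketch; python3 is just an arbitrary example of an installed program):

which python3            # which executable PATH currently resolves that name to
ldd "$(which python3)"   # which shared libraries the dynamic linker would load for it
echo "$PATH"             # the search order for executables
echo "$LD_LIBRARY_PATH"  # extra (often empty) search paths for shared libraries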



Solution 2.1: a quick fix for personal/local use

When the issue is mostly paths and libraries, plus the fact that you can't oversee when they get hooked in, then there is a quick fix: make the user responsible for hooking them in explicitly.

You've probably thought about writing things like:

function activate-myprog {
  # append rather than prepend, so the system's own binaries and libraries keep priority
  export PATH="$PATH:/opt/myprog/bin"
  export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/opt/myprog/lib"
}
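
If functions like this live in a file sourced from your shell startup (say ~/.bashrc - an assumption, the exact file varies per shell), use looks roughly like:

activate-myprog
myprog --help    # myprog is a made-up name, standing in for whatever /opt/myprog/bin actually provides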


This is a halfway decent fix already.

Sure it's manual, sure it can still have issues, yet:

  • as an admin writing these on purpose, you've thought about the order in the paths
  • it loads nothing by default, while making it easy to have users choose one at a time, avoiding most conflicts
  • you can install multiple versions (mostly) without conflict - see the sketch below
  • ...or at least centralize your knowledge of the conflicts
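
For instance, two parallel installs might each get their own function (a sketch; the version numbers and /opt paths are made up):

function activate-myprog-1.2 {
  export PATH="$PATH:/opt/myprog-1.2/bin"
  export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/opt/myprog-1.2/lib"
}

function activate-myprog-2.0 {
  export PATH="$PATH:/opt/myprog-2.0/bin"
  export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}/opt/myprog-2.0/lib"
}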


Upsides

For personal use, this works fairly well for something so basic.

For coworkers, the explanation is now, at worst, "activate what you need, and please start a new shell for each of these" (maybe adding "you don't always need a new shell, but it avoids potential problems").

(Also, on various shells, if you start the name with something like activate-, you get tab completion of all your activate- things just because these functions exist).


Limitations

It doesn't solve cases like

  • running things on cluster nodes, because you probably can't cleanly source these functions from the batch scripts that the queue manager wants - particularly if it's not necessarily the same shell
  • where there are deeper, external dependencies on (varying implementations and/or versions of) system-ish things, like MPI or a compiler
  • depending on specific versions of software can get hairy
  • ...and anywhere where "don't touch my workstation y'bastards" does not work.


It may also not be trivial to explain to other people how to do this well.


So people have thought up some frameworks that stay cleaner over time.

Solution 2.2: the inbetween - custom, but within a more structured system

If you're geared to do this in a more automated, more modular way, you might want to give a unique, controlled environment to individual programs, and development environments to projects, nodes in clusters / swarms, and more.


These are sometimes called virtual environments.


This solves varied needs, but most commonly:

  • isolating specific libraries to just the program that needs them,
so they don't accidentally conflict with others (as they might when installed system-wide)
  • the ability to install things into just a specific project without having permission to do so system-wide
  • getting build tools to create a specific environment (or run in one), making your dev setup more easily transplanted
  • making software that relies on very specific (often older) versions a lot less fragile than installing everything into the system and hoping for the best


The above is intentionally still abstract/vague, because implementations vary.

For example:

  • C is decent about shared libraries when people adhere to a versioning scheme,
and relies on things like LD_LIBRARY_PATH when people do not.
  • Python has the concept of system-shared libraries, but does not have good versioning.
You can isolate an environment by (roughly) pointing that lookup at your own set of packages instead of the system's (see the sketch below).
  • Java has only a "load extra classes you need from here" (the classpath), which is essentially a manual thing. By default it never shares; every app is larger and independent.

(...yes, I know, all of those come with footnotes)
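
To make the Python case concrete, creating and using an isolated environment can look like this (a sketch; the venv path and package name are made up):

python3 -m venv ~/venvs/myproject        # create an environment with its own package directory
source ~/venvs/myproject/bin/activate    # put its bin/ first on PATH for this shell
pip install somepackage==1.2             # installs into that environment, not system-wide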



Jargon

Package vendoring

Package vendoring, vendoring, vendored packages, or bundling all seem to mean "we copied in the (specific version of the) code we rely on, because we don't really trust the package management system / its dependencies not to break our builds / running system over time".


Sometimes combined with 'dependency isolation' to avoid other packages leaking in and causing trouble.
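
As a concrete illustration, Go has vendoring as a built-in workflow (a sketch; run inside a Go module, and other ecosystems do the same by hand or with their own tools):

go mod vendor    # copy the source of every dependency into ./vendor/
git add vendor/  # commit that copy, so builds stop depending on upstream staying available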

If you are doing that more consistently, what you are doing amounts to app images.