Docker notes

Notes related to (mostly whole-computer) virtualization, emulation and simulation.


Intro

What?

Not a separate machine, but processes within the host Linux kernel (since roughly 3.13) that happen to be isolated in all the ways that matter.

They are more lightweight than running a classical VM

in part because they virtualize the OS, not the hardware
in part because of practical details, e.g. how the images are created/used
in part because persistent storage is intentionally separated


Actually, the isolation is mostly recent kernel features; Docker is just one of various toolkits on top that make it practical, each with its own take on it. For some more background and comparison, see Virtualization,_emulation,_simulation#Linux_containers.

There are comparable things; SmartOS, for example, had been doing this before Linux, though more for reasons of operational efficiency and security, whereas docker started more from the microservice angle.

What's it useful for?

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Depends on who you ask.


Angles include:


Separating apps' environments to relieve dependency hell

Where historically, executables were often more monolithic things that controlled everything, modern systems tend to have applications live in an ecosystem of libraries and more, which keep working by merit of that environment staying sane.

The larger said ecosystem, the more friction and fragility there is, and the harder it is to administer: e.g. more chance of "oh god, I tried to update X and now everything is in a weird sorta-broken state that I don't understand."


Keeping things mixed works out as a good solution for the OS itself, largely because there are people putting time into keeping that well defined.

Apps on the other hand do what they want, and are sometimes best kept isolated from not only each other but also from the OS, basically whenever that ends up simplifying the management.


Note that docker's a little overkill for just this - there are other ways to do it, some simpler, some newer.


Portability

The just-mentioned separation also means each containerized app will run the same regardless of your hardware or OS. Same as a VM, really.


Useful layer of abstraction

The tech for OS containers had existed in a usable state for over a decade. Docker just made them a lot easier to actually use.

Or, as some other people put it, "docker is to apt what apt is to tar."

On top of that there are things like Docker Hub, repositories to share images people build, which makes it a software-service repository.

(Which confuses the above analogy in that it's like apt for docker. Also docker images are tarballs. Am I helping yet? :) )



Development and automating deployment

It's not too hard to set up so that a code push automatically triggers: testing, building an image, and starting it on a dev server.

Easy and fast container startup is pretty convenient when automating tests of a new version, both of the unit/regression sort and of some amount of larger deployment. It's also valid on just a single host.

docker diff can sometimes be rather useful in debugging works-for-me issues
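
For example (the container name here is just a placeholder):
docker diff my_webapp
lists the paths that container has changed relative to its image, prefixed with A (added), C (changed), or D (deleted).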


Large-scale deployment

In clusters it's very handy to have exactly reproducible environments, without fine management of each node and with minimal overhead.

When is it less useful?

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)
  • When there is no real reason to use it,
because you'd be adding complexity without any benefit,
and distributed software is, by nature, harder to design and a lot harder to debug than a monolithic setup.
  • if you think it's a scalability band-aid
Docker only makes the deployment step easier; efficiency and scalability are properties of design
  • it takes some relearning of how to do things like least-privilege access, restorable backups, and monitoring
  • if security is your main interest
you get isolation, yes, but it's not the primary goal, and there are footnotes you must know
VMs are arguably better there, at least until security becomes a central focus for docker
  • some potential upsides, like automated setup and provisioning, aren't automatic.
they only really happen when you think about how to do them for your case, and it has to make sense for your use case.

Single-purpose

Security model

Good and bad ideas

Technical Intro

Some concepts

An image file is a completely specified environment that can be instantiated.

It is basically a snapshot of a filesystem.
They are often layered on top of other images
which makes it easier to build, and easier to version-control each layer
and since layers are references (to things you also have) rather than copies, it makes heavily layered images much smaller
For example's sake you can rely on the default fetching of image files from Docker Hub.
(you can create image files yourself, and will eventually want to)
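
If you want to see that layering for yourself, docker history lists the layers an image is made of, e.g.:
docker history ubuntu:xenial
shows one line per layer, with the instruction that created it and its size.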


A container is an instance of an image file - either running, or stopped and not cleaned up yet.

Note that the changes to files from the image are not written to that image. They are copy-on-write to container-specific state.



You can refer to images either with an identifier (full or shortened hex form), or with an alias.

repository:tag
is mostly just a human-readable alias for an image id.

And yes, the naming is confusing:

  • alias and tag sometimes mean the same thing; tag also refers to the part after the colon (which is usually used for versioning)
  • repository is just a name, often grouping similar builds (and does not refer to the store of images as a whole, networkwise or not)


You can add tags with docker tag. Note that removing images by name will only actually remove the image if this alias was the only thing pointing at the underlying identifier (you can consider them similar to filesystem hardlinks).
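
For example (the myapp:1.0 name is made up for illustration):
docker tag ubuntu:xenial myapp:1.0     # adds an alias pointing at the same image id
docker rmi myapp:1.0                   # removes only that alias; ubuntu:xenial is untouched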


Also note that omitting the tag implies :latest, which is generally not what you want.

...because it refers to the last build (or tag?) command that did not give an explicit tag, not necessarily the most recent version. This is going to bite you if you don't always specify the tag. Simplest solution: always specify a tag.


For example, when docker images shows me I have

REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
ubuntu              xenial              d355ed3537e9        4 weeks ago         119 MB
I can run it with
docker run -it ubuntu:xenial bash
or
docker run -it d355ed3537e9 bash
, but
ubuntu
will cause a download because it implies
ubuntu:latest
which is not in the list.

Introduction by example

Install docker. Doing so via package management often does 90% of the setup work.

You may need/want to give specific users extra rights, so that you don't need to be root or use sudo all the time. But in a pinch that works too.


Instantiating things

root@host# docker run -i -t ubuntu /bin/bash
root@3b66e09b0fa2:/# ps faux
root         1  0.0  0.0  18164  1984 ?        Ss   12:09   0:00 bash
root        15  0.0  0.0  15560  1104 ?        R+   12:11   0:00 ps faux

What happened:

  • it found an image called ubuntu (downloaded it from docker hub if not present yet)
  • instantiated it, with bash as entry point / main process
  • you ran ps within it. (Yes, the only processes inside right then are that shell and that command)


Notes:

  • The entry point is the main process, and also determines the container lifetime.
You would generally use an independent, long-running command. Exactly what is up to you.
In the microservice philosophy often the actual service (sometimes a process monitor)
note that the main process's stdout will go to the logs
It is also valid to see a container as half of an OS, with a stack of software, plus perhaps an ssh daemon. It's often cleaner to avoid that if you can, but it can sometimes make a lot of sense.
note:
docker kill
kills the main process, and thereby the container.
In this example the main process is bash, purely as a "prove it's a separate thing" example, which means the container only lives until you log out.
also note that
-i -t
, for 'interactive' and 'allocate a tty', are only necessary because we want an interactive shell, which is not typical
  • If you didn't have the
    ubuntu
    image, then ubuntu:latest would have been downloaded from docker hub.
  • by default, the container id also becomes its hostname
  • the full container id is long. Most places where docker prints one or wants one need only a few bytes (docker usually shows six bytes, twelve hex characters), because that's almost always unique
  • in many cases, you can also use the name, but note that these need to be unique

Status

To see what's running (or, continuing the example above, that it's running), run
docker ps
on the host, which shows something like:
CONTAINER ID     IMAGE            COMMAND       CREATED          STATUS           PORTS      NAMES
3b66e09b0fa2     ubuntu:latest    "bash"        9 seconds ago    Up 8 seconds                drunk_mestorf


  • docker ps
    to list running containers
  • docker ps -a
    to list running and old containers


  • docker images
    lists present images
  • docker images -a
includes non-named ones as well

Cleanup

In the default config, container state sticks around after the container stops. This can be very useful for debugging, but you do eventually want to clean them up.

Roughly:

  • docker rm ids
    to remove their state (e.g. based on
    docker ps -a
    )
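
For example, to remove all stopped containers in one go (newer docker versions also have docker container prune for this):
docker rm $(docker ps -aq --filter status=exited)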


If you don't care about post-mortem debugging, you can start containers with
--rm
to have them immediately clean up after themselves.
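
For example, the interactive shell from earlier, with its state removed as soon as you exit:
docker run --rm -it ubuntu:xenial bash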


If you're done with images:

  • docker rmi ids
    to remove them, e.g. based on
    docker images
    .
It'll refuse when something currently depends on it.
  • docker image prune
    should clean up dangling images (or, with -a, unused)
dangling images are mostly build layers that nothing refers to (no repository and tag)

Storage

You may much prefer to keep all interesting data in a database (some NoSQL may be much nicer for general-purpose storage than RDBMSes).


If you want things you create in a container to persist, look at data volumes (see How does storage work? below).

Keep in mind that these are not automatically the answer to everything you want to deploy, scale, manage, or keep portable.
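
As a minimal sketch (the mydata volume name and /data path are made up):
docker volume create mydata
docker run -it -v mydata:/data ubuntu:xenial bash
Anything written under /data in that container now lives in the mydata volume, and survives the container being removed.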

On image building

The two basic ways

There are two basic ways to build an image:

  • manually: start with something close to what you want, make the changes you want
saving this container saves all filesystem changes within it
good for a one-time quick fix, less ideal for some habits you'll need at larger scale
  • automate: write a dockerfile (a minimal example follows this list)
docker build creates an image from a dockerfile - basically from a series of commands
faster to rebuild and transfer, since the resulting layers are cached
https://docs.docker.com/engine/reference/builder/
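
A minimal sketch of that route (the file contents, app.py, and the myapp:1.0 name are just an illustration):

FROM ubuntu:xenial
RUN apt-get update && apt-get install -y python3
COPY app.py /app/app.py
CMD ["python3", "/app/app.py"]

Saved as Dockerfile next to app.py, this builds with:
docker build -t myapp:1.0 .
Each instruction becomes a layer, which is what the caching just mentioned applies to.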


habits and tips

The microservice philosophy

"avoid sshd"

Resource management

How does storage work?

Limiting CPU, memory

Various commands

Starting, stopping

Practical sides to

docker security

GUI apps in docker

You can host a VNC, RDP, or similar screen if you want it to be an entirely independent instance.


If you're doing it for the don't-break-my-libraries reasons, then you may want to consider displaying directly to the host's X server.


Since X is a networked protocol in the first place, that connection is not very hard - the most you have to worry about is X authentication, and potentially about X extensions.
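
A common sketch of this (the image and program names are placeholders; the xhost line loosens X authentication for all local connections, which you may want to do more selectively):
xhost +local:
docker run --rm -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix some-gui-image some-gui-program
This hands the container the host's X socket and DISPLAY, so windows show up on the host's screen.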

Keep in mind that sound is a little extra work.

As is GPU access



microservices in docker

SELinux and docker

Semi-sorted

"Cannot connect to the Docker daemon. Is the docker daemon running on this host?" or "Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock"

Typically means you don't have permissions to
/var/run/docker.sock
.

That socket is often owned by root:docker, so while sudo works, the cleaner solution is to add the user (that you want to do this management as) to the docker group.
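
For example (myuser is a placeholder; you'll need to log out and back in, or use newgrp, for the group change to take effect):
sudo usermod -aG docker myuser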



Docker for windows

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Okay, so this one's confusing.

As far as I can tell, so far...


Originally, docker on windows meant "Sure you can run linux containers on windows: Install VirtualBox on windows, run linux in that, and docker in that". This was somewhat awkward, but only somewhat, and quite valid: the VM's overhead appears only once and the instances within it work as usual, and tooling was introduced to manage this from the windows side, meaning you could have linux docker basically as-is on windows (and of course run windows apps (or VMs) on the same metal and OS).

(Though you do need windows Pro, or Server, because you'll want Hyper-V which is missing in the most basic windows versions)


The confusion came later, when MS said "Now we can run docker containers natively" and 'does not require VirtualBox'.

Because what it actually means is "we have made our own thing that runs only on windows".


And called it docker, because you can build these with dockerfiles.

But the way these are run is completely unrelated to linux docker. There is zero technical overlap.


Sure, docker for windows is useful in itself: technically it's mostly some in-kernel isolation that imitates what linux did.


The ability to use windows containers alongside linux-in-a-VM is useful too, as is the fact that the tooling handles both, making it easier to manage a mix on the same metal, and in the same swarm.


It just makes a lot more sense to see docker windows as an alternative to Windows VMs, than to compare it to docker, or say it is docker.

It is confusing to call it the same thing just because the tools can handle both, particularly because that name by its nature indicates containers, portability, not having VM overhead, and such.

Well, used to. It now is an umbrella including some things you can only ever do via VMs, which is weird and confusing to the point it seems like bad marketing, or maybe really good marketing.

Confusing, because you'll notice we've extended the situation to half a dozen possible distinct concepts:

  • running linux containers on linux (= docker)
  • running linux containers on windows via a VM (= typical if you want the combination)
  • running linux containers on windows natively (MS have basically said this won't happen)
  • running windows containers on windows (= "docker windows")
  • running windows containers on linux natively (no, WSL cannot do that. And it can't really happen unless MS is fully on board with the idea, mainly because of licensing)
  • running windows containers on linux via a windows VM (you can, but I don't suspect many people do this)



People have raised the question of what MS is working towards in the long term.

It's not that MS doesn't understand the concept of a lightweight secure VM.

It's not that MS are bandwagoning for some open-source street cred.

They may just want to make sure they're somehow a player in a type of market that might exclude them.


Many have suggested this is also a clever but surreptitious marketing strategy: you get people to think they are not committing to a specific tech, while leading them to do exactly that.

If you get people to see docker as a mixed-OS thing, mainly due to the tooling, then if you squint you can say they do it a little better.

Something you can't say if it's a case of "windows also does something similar", if only because that would make it clearer just how little this is about interoperability.


Also, if you get people to see docker as a windows-based application packager, then the next server your business buys may be a licensed windows server for "you never know, it will run all these hip dockery things" reasons.

Unsorted