Docker notes
Notes related to (mostly whole-computer) virtualization, emulation and simulation.

Some overview · Docker notes · Qemu notes

Intro

What?

Practically

Not emulated, not even a hypervisor style VM, but processes that run directly in the host linux kernel (since approx 3.13) -- that happen to be isolated in all the ways that matter.

This is more lightweight than running a classical VM

in part because they virtualize the OS (interface), not the hardware
in part because of practical details, e.g. how the images are created/used
in part because persistent storage is intentionally separated


Technically

The isolation comes mostly from relatively recent kernel features - primarily namespaces, with cgroups also being quite practical.
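You can see this from the host: container processes are just regular host processes in different namespaces. A minimal sketch (the container name and command are just for illustration):

docker run -d --name demo ubuntu sleep 1000
docker inspect -f '{{.State.Pid}}' demo
ps -fp $(docker inspect -f '{{.State.Pid}}' demo)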


Docker is one of several toolkits using these that make them practical to actually use - one with a specific take on it (e.g. more microservice than fully interactive system).

For some more background and comparison, see linux containers.

There are comparable things, e.g. SmartOS had been doing this before Linux, though there for reasons of operational efficiency and security, whereas docker focuses more on the microservice angle.

What's it useful for?

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Depends on who you ask.


Angles include:


Separating apps to relieve dependency hell

Where historically, executables were often more monolithic things that controlled everything, modern systems tend to have applications live in an ecosystem of libraries and more, which keep working by merit of that environment staying sane.

The larger said ecosystem, the more friction and fragility there is, and the harder it is to administer - e.g. more chance of "oh god I tried to update X and now everything is in a weird sorta-broken state that I don't understand".


Keeping things mixed works out as a good solution for the OS itself, largely because there are people putting time into keeping that well defined.

This is why linux is generally quite good at this, with how sofiles work and all. There are still cases where isolation is useful, though - e.g. more custom, non-OS-packaged software, because some of it really just does what the hell it wants, and is sometimes best kept isolated not only from other software but also from the OS - basically whenever that ends up simplifying the management.


Note that docker's a little overkill for just this - there's other ways to do this, some simpler, some newer, some a little more convenient (with the details of breaking out X, audio, GPU, and such).


Portability

The just-mentioned separation also means each container app will run the same regardless of your hardware or OS. Same as a VM, really.

Basically, "runs identically in any place it runs at all". Which can be very useful for multiple reasons.


Useful layer of abstraction for software packages

The tech for OS containers had existed in a usable state for over a decade. Docker just made them a lot easier to actually use.

Or, as some other people put it, "docker is to apt what apt is to tar."

On top of that there are things like Docker Hub, repositories to share images people build, which makes it a software-service repository.

(Which confuses the above analogy in that it's like apt for docker. Also docker images are tarballs. Am I helping yet? :) )



Development and automating deployment

It's not too hard to set up so that a code push automatically triggers: testing, building an image, and starting it on a dev server.

Easy and fast container startup makes it pretty convenient to automate testing of a new version, both of the unit/regression sort and of some amount of larger deployment. This is also valid on just a single host.

docker diff can sometimes be rather useful in debugging works-for-me issues


Large-scale deployment

In clusters and swarms it's very handy to have exactly reproducible environments, without fine management of each node and with minimal overhead.

It e.g. lets admins patch the host OS without having to worry, each time, whether that might break some app due to some random dependency. Because we've essentially separated system environment from app environment.

When is it less useful?

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

When there is no justifying reason to use it.

because you'd be adding complexity without any reason.
and distributed software is, by nature, harder to design and a lot harder to debug than a monolithic setup.
also, it takes some relearning on how to do a lot of security things (e.g. least-privilege access) and admin things (e.g. restorable backups, monitoring). If there's no other benefit, this is just lost time.


If you think it's the hip scalability band-aid

Any container/VM/management layer only makes the deployment step easier. Actual efficiency and scalability are properties of your design.


If security is your main interest

you get isolation, yes, but it was never the primary goal, and there are footnotes you must know
VMs are arguably better there, at least until security becomes a central focus for docker


Some potential upsides, like automated setup and provisioning, aren't automatic.

they only really happen when you think about how to do them for your case, and it has to make sense for your use case.

Single-purpose

Security model

Good and bad ideas

Technical Intro

Some concepts

An image file is a completely specified environment that can be instantiated.

It is basically a snapshot of a filesystem.
They are often layered on top of other images
which makes it easier to build, and easier to version-control each layer
and since layers are references (to things you also have) rather than copies, it makes heavily layered images much smaller
For example's sake you can rely on the default fetching of image files from Docker Hub.
(you can create image files yourself, and will eventually want to)


A container is an instance of an image file - either running, or stopped and not cleaned up yet.

Note that the changes to files from the image are not written to that image. They are copy-on-write to container-specific state.




Images and containers have IDs. They show up in most tools as large hexadecimal strings (actually a smallish part of a larger (sha256) hash; see --no-trunc for the full thing).

Note that you only need to type as many characters as make it unique within what you have (which may be as few as one character).


It's still not what you want for larger-scale management, so

  • for images you often want to deal with repository-based aliases to images (which are shorthands that point at an image ID(verify)).
  • For containers there are names. The automatically generated ones (they look like unruffled_curran) are meant to make it easier for humans to communicate about them (than hexadecimal numbers are).
You can give them your own meaningful names -- but note they must be unique (so at scale you need some scheme)


More on tags below, which matters when you make your own repository.



A registry is a particular site that hosts images. Like docker hub, which is the default.

When a distinction is made between registry and index, the registry holds the assets, while the index is more about the accounts and search.


And note that since a repository is personal, a registry hosts many repositories.

Introduction by example

Install docker.

Doing so via package management often does 90% of the setup work.


You may want to give specific users extra rights, so that you don't need to do things as root (or via sudo) - see the note on docker.sock permissions near the end. (But for a quick first play-around that's fine, and I'm doing it in the examples below)


Instantiating things

root@host# docker run -i -t ubuntu /bin/bash
root@3b66e09b0fa2:/# ps faux
root         1  0.0  0.0  18164  1984 ?        Ss   12:09   0:00 bash
root        15  0.0  0.0  15560  1104 ?        R+   12:11   0:00 ps faux

What happened:

  • it found an image called ubuntu (downloaded it from docker hub if not present yet)
  • instantiated the image to a container, with its /bin/bash as entry point / main process
  • you manually ran ps within it. (Yes, the only processes inside right then are that shell and that command)


Notes:

  • The entry point is the main process, and also determines the container lifetime.
You would generally use an independent, long-running command. Exactly what is up to you.
In the microservice philosophy this is often the actual service (sometimes a process monitor).
note that the main process's stdout will go to the logs
It is also valid to see a container as half of an OS, with a stack of software, plus perhaps a ssh daemon. It's often cleaner to avoid that if you can, but it can sometimes make a lot of sense.
note: docker kill kills the main process, and thereby the container.
In this example the main process is bash, purely as a "prove it's a separate thing" example, and means the container only lives until you log out. A more typical, non-interactive run is sketched after these notes.
also note that -i -t, for 'interactive' and 'allocate a tty', are only necessary because we want an interactive shell, which is not typical
  • If you didn't have the ubuntu image, then ubuntu:latest would have been downloaded from docker hub.
  • by default, the container id also becomes its hostname
  • the full container id is long. Most places where docker prints one or wants one needs only a few bytes (docker usually shows six bytes, twelve hex characters), because that's almost always unique
  • in many cases, you can also use the name, but note that these need to be unique
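
A more typical use is a detached, long-running service. A minimal sketch, using the public nginx image purely for illustration:

docker run -d --name web -p 8080:80 nginx
docker logs web
docker kill web

The main process's output shows up in docker logs, and killing that process ends the container.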


Status and cleanup, of images and instances

Status

See existing images on the host:

  • docker images lists present images
  • docker images -a includes non-tagged ones:
REPOSITORY            TAG                 IMAGE ID            CREATED             SIZE
<none>                <none>              eda5290e904b        2 months ago        70.5MB
ubuntu                bionic-20190515     7698f282e524        3 months ago        69.9MB


See running container instances on the host:

  • docker ps to list running containers
  • docker ps -a to list running and old containers
CONTAINER ID     IMAGE            COMMAND       CREATED          STATUS           PORTS      NAMES
3b66e09b0fa2     ubuntu:latest    "bash"        9 seconds ago    Up 8 seconds                drunk_mestorf



Image cleanup

If you're done with images:

  • remove image by id (or tag), e.g. based on an entry from docker images:
docker rmi ids
It'll refuse when something currently depends on it.
  • Clean all dangling images (those without tags):
docker image prune
  • Clean all unused images (not just dangling; any image not used by a container):
docker image prune -a


Container cleanup

In default config, container instance state sticks around after the container stops.

This can be very useful during development for post-mortem debugging (while in production you may want --rm so that cleanup happens when they stop), but you do eventually want to clean them:

  • clean a specific stopped container:
docker rm contid
  • clean all stopped containers:
docker container prune



More cleanup

You may like docker system prune, which is roughly equivalent[1] to

docker container prune
docker image prune (with optional --all, see above)

and also

docker network prune
docker volume prune (if you add --volumes)


Further shell-fu:

  • Stop all containers
docker stop $(docker ps -a -q) 
  • bulk remove images via tag wildcard: consider things like
docker rmi $(docker images --filter=reference="user/spider:22*" -q)
note that not thinking this through may mean you do docker rmi $(docker images -a -q), i.e. try to remove all (currently unused) images


On image building

There are two basic ways to build an image:

  • manually: start with something close to what you want, make the changes you want
saving this container = saving all filesystem changes within it
good for a one-time quick fix, less ideal for some habits you'll need at larger scale
  • automate: write a dockerfile
docker build creates an image from a dockerfile - basically from a series of commands
faster to rebuild and transfer, since layers are cached
https://docs.docker.com/engine/reference/builder/


The hard way: manually

Say we want to build on the ubuntu image to make a web server image

root@host# docker run -i -t ubuntu /bin/bash
root@3b66e09b0fa2:/# apt-get install apache2
# ...and whatever else

To verify the difference to the original image, we can do:

root@host# docker diff 3b66e

Which'll print ~1200 filenames. Fine, let's roll that into a new image

root@host# docker commit 3b66e my_ubuntu_apache
cbbb61030ba24dda25f2cb27af41cc7a96a5ad9d23ef2bb500e9eaa2e16aa44d

and check that it's now known for us:

root@host# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED              VIRTUAL SIZE
my_ubuntu_apache    latest              cbbb61030ba2        About a minute ago   202.7 MB
ubuntu              latest              b7cf8f0d9e82        2 weeks ago          188.3 MB

Notes:

  • commits are to your local registry. (Don't worry, you won't accidentally write things on docker hub (or other linked site) until you are registered, you are linked, and do a docker push)
actually a commit is more like a snapshot / layer
if you want things reusable and configurable, there are various good practices about how configuration and environment is best handled. You'll read up when you need it.


The usual way: Dockerfiles
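
As a minimal sketch, a Dockerfile that does roughly what the manual example above did (the image, package, and tag names are just for illustration, not a recommendation):

# build on the ubuntu base image
FROM ubuntu
# avoid interactive prompts during package installs (see the DEBIAN_FRONTEND notes below)
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y apache2
# keep apache in the foreground so it stays the main process (and so the container stays up)
CMD ["apache2ctl", "-D", "FOREGROUND"]

You would then build and run it with something like:

docker build -t my_ubuntu_apache .
docker run -d -p 8080:80 my_ubuntu_apache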

multi-stage builds

Practical notes

general habits and tips

Some common bits

DEBIAN_FRONTEND=noninteractive

...On debian / ubuntu. The options to this are readline, dialog/whiptail, or noninteractive, and the last is interesting for automatic installs in that it will never pause on a question.

As-is it will often choose a hopefully-sensible default where possible (e.g. UTC timezone).


If you want non-defaults, things that are site-specific, or to answer the occasional thing that really is a freer-form field, you would use the above and store these values in the debconf database beforehand. Look around for guides with commands involving debconf-set-selections.


see man debconf and man debconf-set-selections for more details
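
For example, a Dockerfile fragment that preseeds tzdata's timezone questions before installing it - a sketch; the debconf keys and values here are just what tzdata happens to ask, so adapt them for your package:

ENV DEBIAN_FRONTEND=noninteractive
RUN echo "tzdata tzdata/Areas select Europe"           | debconf-set-selections && \
    echo "tzdata tzdata/Zones/Europe select Amsterdam" | debconf-set-selections && \
    apt-get update && apt-get install -y tzdata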


apt --no-install-recommends

tl;dr: Recommended packages are the things you probably want installed too. By saying "no, I'll do that explicitly", you can often remove a few things and slim down the image somewhat.


Recommended packages are those that you often probably want too. Weaker than Depends, stronger than Suggests. It's a gray area, but recommendations are the things most people probably actually want, for a thing to live up to expectations.

By default, recommended packages are installed. Disabling that effectively treats them as suggestions.
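
In a Dockerfile that tends to look something like the following (the package is just an example; removing the apt lists afterwards is a separate, common size-trimming habit):

RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc && \
    rm -rf /var/lib/apt/lists/*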


Examples:

gcc recommends libc-dev, and suggests things like autoconf, automake, flex, bison,
a media player might recommend the ability to play some less-common media sources, and suggest some rarer ones,
TeX may recommend a bunch of modern packages that most people use now,
a -dev package may well recommend the according -doc package,
things in general may recommend or suggest some contrib stuff.

To get more of an idea, try things like:

apt-cache depends gcc

Minimal images

Build systems?

Complex with and without: Versions and config

The microservice idea

"avoid sshd"

More on tags

Compose

Build systems

Resource management

How does storage work?

Container state

Bind mounts, data volumes

On permissions

VOLUME in Dockerfiles

Databases

How does networking work?

Limiting CPU, memory

Various commands

Starting, stopping

Practical sides to

docker security

GUI apps in docker

X

VNC, RDP

sound in dockerized apps

CUDA in dockerized apps

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


Operating docker

SELinux and docker

Semi-sorted

Where does the host store things?

Better known as "argh how do I make it not fill up my system drive?"


Linux:

  • Most things are stored in /var/lib/docker
most of the size will be in
the image filesystem layers (aufs)
data volumes
If you move the whole thing and then symlink to it, stop the daemon while doing that, or it'll get confused (see the sketch below)
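
A minimal sketch of doing that (assuming systemd; the target path is just an example - double-check before doing this on a host you care about):

systemctl stop docker
mv /var/lib/docker /data/docker
ln -s /data/docker /var/lib/docker
systemctl start docker

Setting data-root in /etc/docker/daemon.json is an alternative to the symlink.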

"Cannot connect to the Docker daemon. Is the docker daemon running on this host?" or "Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock"

Typically means you don't have permissions to /var/run/docker.sock.

Which is often owned by root:docker, so while sudo works, the cleaner solution is to add the user (that you want to do this management) to the docker group.
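
For example (the username is a placeholder; you'll need to log out and back in, or run newgrp docker, before the group change takes effect):

sudo usermod -aG docker yourusername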



Docker for windows

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

As far as I can tell (and yes, this gets confusing)...


Originally, docker on windows meant "Sure you can run linux containers on windows: Install VirtualBox on windows, recent linux in that, and docker in that".

This is a bit of initial setup (and you need to buy windows Pro or Server to do this, because you'll want Hyper-V), yet

since tooling was introduced to manage this from the windows host it's quite manageable,
since the VM overhead appears only once, instances within it work as you'd expect, and you don't need two servers to run windows+linux stuff (also same metal probably means lower latency than two hosts)

...it's quite a valid solution to a number of use cases.



The confusion started when MS later said "Now we can run docker containers natively" and 'does not require VirtualBox'.

What they meant was "linux had a good idea, we have added similar in-kernel isolation, so now we can do windows-only containers".

They weren't really clear about that, but the result is useful, and valid for the same 'run things on the same metal' implication as before. All fine.


The confusing part is that they called it docker.

Because there is zero overlap in what makes docker linux and docker windows actually run.


To be a little more complete, MS effectively extended the situation from one or two to roughly six distinct concepts:

  • running linux containers on linux (= docker)
  • running linux containers on windows within a VM (= typical if you want the combination)
  • running linux containers on windows natively (won't happen, MS have said so, and that makes technical sense)
No, WSL cannot do that - it's a translation layer (basically reverse wine) specifically without any linux kernel code, and while intended to be thinner, e.g. the filesystem speed is noticeably worse.
WSL2 is closer - because it's actually a Hyper-V VM running a modified linux kernel. Which makes it effectively the previous point.
  • running windows containers on windows (= "docker windows")
  • running windows containers on linux via a windows VM (you could, but I don't suspect many people do this)
  • running windows containers on linux natively (won't happen)


So in the end, the largest similarities between docker linux and docker windows are that you can also use dockerfiles to build windows images, and that the tooling was updated to manage points 2 and 4 above, on the implicit condition that your host is windows.


It's confusing (see forums full of questions) to the point it seems like bad marketing.


People have raised the question what MS is working towards in the long term.

It's not that MS doesn't understand the concept of a lightweight secure VM.

It's not that MS are bandwagoning for some open-source street cred again.

It's probably just that they made sure they were somehow a player in a market that might otherwise exclude them, or make them look like they're falling behind.


Various people have offered that maybe it's not bad marketing, it's actually cleverly surreptitious marketing: Making people think they are not committing to a specific tech -- while actually leading them to do so:

That is, docker linux was a "one thing runs everywhere".

Docker windows+linux is not; kinda-maybe on windows.

But if you get people to see docker as a mixed-OS thing, mainly due to the tooling, then if you squint you can say windows server does it better (who cares if that's because they don't allow the other option).


Also, if you get people to see docker as a windows-based application packager, then the next server your business buys may be a licensed windows server for "you never know, it will run all these hip dockery things" reasons.



Unsorted