Docker notes
Notes related to (mostly whole-computer) virtualization, emulation and simulation.


Intro

What?

Not a separate machine, but processes within a host linux kernel (since approx 3.13) -- that happen to be isolated in all the ways that matter.

They are more lightweight than running a classical VM:
  • in part because they virtualize the OS (interface), not the hardware
  • in part because of practical details, e.g. how the images are created/used
  • in part because persistent storage is intentionally separated


Actually, the isolation is mostly recent kernel features; Docker is one specific product/toolkit on top that makes it practical to actually use.

Docker has its own take on it. For some more background and comparison, see Virtualization,_emulation,_simulation#Linux_containers.

There are comparable things, e.g. SmartOS had been doing this before Linux, though more for reasons of operational efficiency and security, whereas docker focuses more on the microservice angle.
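
That this is kernel functionality rather than docker magic is easy to demonstrate. A minimal sketch using util-linux's unshare, which pokes at just one of those features (a PID namespace, with /proc remounted to match; needs root):

root@host# unshare --fork --pid --mount-proc bash
root@host# ps aux
USER       PID  ...  COMMAND
root         1  ...  bash
root         2  ...  ps aux

Docker's job is combining several such namespaces with cgroups, a layered filesystem, and tooling into something actually convenient.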

What's it useful for?

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Depends on who you ask.


Angles include:


Separating apps to relieve dependency hell

Where historically, executables were often more monolithic things that controlled everything, modern systems tend to have applications live in an ecosystem of libraries and more, which keep working by merit of that environment staying sane.

The larger said ecosystem, the more friction and fragility there is, and the harder it is to administer, e.g. more chance of "oh god I tried to update X and now everything is in a weird sorta-broken state that I don't understand."


Keeping things mixed works out as a good solution for the OS itself, largely because there are people putting time into keeping that well defined.

Apps on the other hand do what they want, and are sometimes best kept isolated from not only each other but also from the OS, basically whenever that ends up simplifying the management.


Note that docker's a little overkill for just this - there are other ways to do it, some simpler, some newer.


Portability

The just-mentioned separation also means each container'd app will run the same regardless of your hardware or OS. Same as a VM, really.


Useful layer of abstraction for software packages

The tech for OS containers had existed in a usable state for over a decade. Docker just made them a lot easier to actually use.

Or, as some other people put it, "docker is to apt what apt is to tar."

On top of that there are things like Docker Hub, repositories to share images people build, which makes it a software-service repository.

(Which confuses the above analogy in that it's like apt for docker. Also docker images are tarballs. Am I helping yet? :) )



Development and automating deployment

It's not too hard to set up so that a code push automatically triggers: testing, building an image, and starting it on a dev server.

Easy and fast container startup makes it pretty convenient to automate testing of a new version, both of the unit/regression sort and of some amount of larger deployment. It's also valid on just a single host.

docker diff can sometimes be rather useful in debugging works-for-me issues


Large-scale deployment

In clusters it's very handy to have exactly reproducible environments, without fine management of each node and with minimal overhead.

When is it less useful?

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)
  • When there is no actual reason to use it,
because you'd be adding complexity without any payoff,
and distributed software is, by nature, harder to design and a lot harder to debug than a monolithic setup.
  • if you think it's the scalability band-aid
Docker only makes the deployment step easier; efficiency and scalability are properties of design
  • it takes some relearning to do things like least-privilege access, restorable backups, and monitoring
  • if security is your main interest
you get isolation, yes, but it's not the primary goal, and there are footnotes you must know
VMs are arguably better there, at least until security becomes a central focus for docker
  • some potential upsides, like automated setup and provisioning, aren't automatic.
they only really happen when you think about how to do them, and it has to make sense for your use case.

Single-purpose

Security model

Good and bad ideas

Technical Intro

Some concepts

An image file is a completely specified environment that can be instantiated.

  • It is basically a snapshot of a filesystem.
  • They are often layered on top of other images,
which makes them easier to build, and each layer easier to version-control.
And since layers are references (to things you also have) rather than copies, it makes heavily layered images much smaller.
  • For example's sake you can rely on the default fetching of image files from Docker Hub.
(you can create image files yourself, and will eventually want to)
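
You can look at an image's layering with docker history, e.g. (assuming the ubuntu:xenial image from later examples is present):

root@host# docker history ubuntu:xenial

Each line it prints is a layer, most of which correspond to a single build step.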


A container is an instance of an image file - either running, or stopped and not cleaned up yet.

Note that the changes to files from the image are not written to that image. They are copy-on-write to container-specific state.
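
A quick way to see that container-specific state is docker diff, which lists filesystem changes relative to the image (A added, C changed, D deleted). A sketch:

root@host# docker run -it ubuntu bash
root@3b66e09b0fa2:/# touch /newfile
root@3b66e09b0fa2:/# exit
root@host# docker diff 3b66e
A /newfile

(you may see a few more entries, e.g. from the shell writing its history file)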




Images and containers have IDs. They show up in most tools as large hexadecimal strings, which are actually a smallish part of a larger (sha256) hash (see --no-trunc for the full thing).

Note that you only need to type as many characters as it takes to be unique among what you have (which may be as little as one character).


It's still not what you want for larger-scale management, so

  • for images you often want to deal with repository-based aliases to images (which are shorthands that point at an image ID(verify)).
  • For containers there are names. The automatically generated ones (look like unruffled_curran) are meant to make it easier for humans to communicate about them (than hexadecimal numbers are).
You can give them your own meaningful names -- but note they must be unique (so at scale you need some scheme)


See the More on tags section below, which matters when you make your own repository.
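
For example, naming a container at creation, then using that name in later commands (the nginx image is just for illustration):

root@host# docker run -d --name webtest nginx
root@host# docker logs webtest
root@host# docker stop webtest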



Introduction by example

Install docker.

Doing so via package management often does 90% of the setup work.


You may want to give specific users extra rights, so that you don't need to do things as root (or via sudo). (But for a quick first playing around that's fine, and I'm doing it in the examples below)


Instantiating things

root@host# docker run -i -t ubuntu /bin/bash
root@3b66e09b0fa2:/# ps faux
root         1  0.0  0.0  18164  1984 ?        Ss   12:09   0:00 bash
root        15  0.0  0.0  15560  1104 ?        R+   12:11   0:00 ps faux

What happened:

  • it found an image called ubuntu (downloaded it from docker hub if not present yet)
  • instantiated the image to a container, with bash as entry point / main process
  • you manually ran ps within it. (Yes, the only processes inside right then are that shell and that command)


Notes:

  • The entry point is the main process, and also determines the container lifetime.
You would generally use an independent, long-running command. Exactly what is up to you.
In the microservice philosophy it is often the actual service (sometimes a process monitor).
Note that the main process's stdout will go to the logs.
It is also valid to see a container as half of an OS, with a stack of software, plus perhaps a ssh daemon. It's often cleaner to avoid that if you can, but it can sometimes make a lot of sense.
Note that docker kill kills the main process, and thereby the container.
In this example the main process is bash, purely as a "prove it's a separate thing" example, and means the container only lives until you log out.
Also note that -i -t, for 'interactive' and 'allocate a tty', are only necessary because we want an interactive shell, which is not typical.
  • If you didn't have the ubuntu image, then ubuntu:latest would have been downloaded from docker hub.
  • by default, the container id also becomes its hostname
  • the full container id is long. Most places where docker prints one or wants one need only a few bytes (docker usually shows six bytes, twelve hex characters), because that's almost always unique
  • in many cases, you can also use the name, but note that these need to be unique


Status and cleanup, of images and instances

Status

See existing images on the host:

  • docker images lists present images
  • docker images -a also includes non-named ones:
REPOSITORY            TAG                 IMAGE ID            CREATED             SIZE
<none>                <none>              eda5290e904b        2 months ago        70.5MB
ubuntu                bionic-20190515     7698f282e524        3 months ago        69.9MB


See running container instances on the host:

  • docker ps lists running containers
  • docker ps -a lists running and old containers
CONTAINER ID     IMAGE            COMMAND       CREATED          STATUS           PORTS      NAMES
3b66e09b0fa2     ubuntu:latest    "bash"        9 seconds ago    Up 8 seconds                drunk_mestorf



Cleanup

If you're done with images:

  • docker rmi ids to remove them, e.g. based on docker images.
It'll refuse when something currently depends on it.
  • docker image prune should clean up dangling images (or, with -a, unused ones)
dangling images are mostly build layers that are no longer referred to (no repository and tag)
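
For example, a cleanup sweep you might do while experimenting (prune asks for confirmation; docker system df summarizes what is using space):

root@host# docker container prune
root@host# docker image prune
root@host# docker system df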


In default config, container instance state sticks around after the container stops.

This can be very useful for debug during development, but you do eventually want to clean them:

  • docker rm contid


If you don't care about post-mortem debugging, you can start containers with --rm to always have them immediately clean up after themselves.
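
For example, a throwaway variant of the earlier interactive run:

root@host# docker run --rm -it ubuntu bash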


On image building

There are two basic ways to build an image:

  • manually: start with something close to what you want, make the changes you want
saving this container = saving all filesystem changes within it
good for a one-time quick fix, less ideal for some habits you'll need at larger scale
  • automate: write a dockerfile
docker build creates an image from a dockerfile - basically from a series of commands
faster to transfer, since images are cached
https://docs.docker.com/engine/reference/builder/


The hard way: manually

Say we want to build on the ubuntu image to make a web server image

root@host# docker run -i -t ubuntu /bin/bash
root@3b66e09b0fa2:/# apt-get install apache2
# ...and whatever else

To verify the difference to the original image, we can do:

root@host# docker diff 3b66e

Which'll print ~1200 filenames. Fine, let's roll that into a new image

root@host# docker commit 3b66e my_ubuntu_apache
cbbb61030ba24dda25f2cb27af41cc7a96a5ad9d23ef2bb500e9eaa2e16aa44d

and check that it's now known for us:

root@host# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED              VIRTUAL SIZE
my_ubuntu_apache    latest              cbbb61030ba2        About a minute ago   202.7 MB
ubuntu              latest              b7cf8f0d9e82        2 weeks ago          188.3 MB

Notes:

  • commits are to your local repository. You won't accidentally write things on docker hub (or other linked site) until you are registered, you are linked, and do a docker push
actually a commit is more like a snapshot / layer
if you want things reusable and configurable, there are various good practices about how configuration and environment is best handled. You'll read up when you need it.


Okay, say we want to check the new image works - we can quit the old one and continue with the new one

root@3b66e09b0fa2:/# exit
root@host# docker run -it my_ubuntu_apache bash
root@d9bc62e40440:/# apt-get install apache2
Reading package lists... Done
Building dependency tree
Reading state information... Done
apache2 is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

Eventually, we'll want to docker run the container.


The usual way: Dockerfiles
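
To give the flavor, a minimal sketch that automates the apache image we built by hand above (the file must be named Dockerfile; the exact EXPOSE/CMD choices here are illustrative):

FROM ubuntu:xenial
RUN apt-get update && apt-get install -y apache2
EXPOSE 80
CMD ["apachectl", "-D", "FOREGROUND"]

...built and run with:

root@host# docker build -t my_ubuntu_apache:0.1 .
root@host# docker run -d -p 8080:80 my_ubuntu_apache:0.1

Each step becomes a cached layer, so rebuilds after small edits are fast. See also https://blog.docker.com/2019/07/intro-guide-to-dockerfile-best-practices/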

multi-stage builds
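
The idea: do the heavy building in one stage, then copy only the results into a slim final image, so the toolchain never ships. A sketch assuming a Go program (image tags and paths are illustrative):

FROM golang:1.12 AS build
WORKDIR /src
COPY . .
RUN go build -o /app .

FROM debian:stretch-slim
COPY --from=build /app /app
CMD ["/app"]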

habits and tips

More on tags

First, some more terminology.

Repository is intended to refer to a collection of docker images with the same name but different tags -- typically grouping similar builds, e.g. hello-world below.
It does not refer to the entire store of images you keep, networked or not.
But since this is a bad, confusing name for it, people use it that way too.

A tag refers to two things: the full reference, something like
registryhost/username/name:tag
and to that last part. Yes, a tag has a tag, and this unfortunate naming is why people refer to the whole thing as an alias (but docker tooling does not).

In various contexts there is also reference, which seems to mean "any pointer to an image, be it its hash or a tag/alias".

When a tag is used in a "look up an image" sense, e.g. when pulling one:
  • registryhost is by default configured to be registry.hub.docker.com (verify)
  • username can be skipped, meaning "don't care who" (verify)
  • tag defaults to :latest (verify)
...so only name is required.

Also, the public docker hub specifically enforces a two-level hierarchy (to avoid name clashes but also to keep it simple), so there the form is a little more specifically:
username/name[:tag]
(Other registries can be freer)

The above is why docker images's header says REPOSITORY where you might expect, say, 'image name'.

To steal an example from this answer: https://stackoverflow.com/questions/31115098/what-is-the-difference-between-an-image-and-a-repository -- you might have:

REPOSITORY          TAG        IMAGE ID
docker/whalesay     latest     fb434121fc77
hello-world         latest     91c95931e552
hello-world         v1.1       91c95931e552
hello-world         v1.0       1234abcd5678

What you really have at low level is images with IDs. This naming is applied on top.

Where do tags come from?

Tags usually enter into things when building towards a specific tag:

docker build -t username/image_name:tag_name .

Or when explicitly tagging later:

docker tag imageid name[:tag]

Both so that you can then

docker push name:tag

Notes:
  • you can have multiple tags
e.g. while building
or with multiple docker-tag commands
  • docker rmi alias will only actually remove anything from storage if this alias was the only thing pointing at the underlying identifier
you could consider them similar to filesystem hardlinks
  • :latest is not a special case at all. It's just
the default if omitted
likely to change meaning over time (any tag can, but typically doesn't)
this makes it a good habit to just always specify a tag
  • since omitting :tag implies :latest in most contexts
in some contexts (e.g. building varied versions) this is not what you want
in some other contexts it might be slightly surprising:

REPOSITORY          TAG                IMAGE ID            CREATED            SIZE
ubuntu              xenial             d355ed3537e9        4 weeks ago        119 MB

I can run this one image with docker run -it ubuntu:xenial bash or docker run -it d355ed3537e9 bash -- however, docker run -it ubuntu bash will cause a download, because it implies ubuntu:latest, which is not in the list. Note that this setup requires that I pulled it specifically as ubuntu:xenial.

The microservice philosophy

"avoid sshd"

Resource management

How does storage work?

How does networking work?

Limiting CPU, memory
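
The common knobs on docker run (the values here are illustrative):

root@host# docker run -it --cpus 1.5 --memory 512m ubuntu bash

Both are enforced via cgroups; --memory is a hard limit, --cpus throttles CPU time.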

Various commands

Starting, stopping

Practical sides to

docker security

GUI apps in docker

You can host a VNC, RDP, or similar screen if you want it to be an entirely independent instance.


If you're doing it just for the don't-break-my-libraries reasons, then you may want to consider displaying directly to the host's X server. (Also, take a look at snap)

Since X is a networked protocol in the first place, that connection is not very complex - the most you have to worry about is X authentication, and potentially about X extensions.


Keep in mind that sound is a little extra work. As is using your GPU.
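
For the X-socket route, a commonly seen sketch (note that this hands the container access to your X session, so it weakens the isolation; the image name is illustrative):

root@host# xhost +local:
root@host# docker run -it -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix some_gui_image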


microservices in docker

SELinux and docker

Semi-sorted

"Cannot connect to the Docker daemon. Is the docker daemon running on this host?" or "Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock"

Typically means you don't have permissions to /var/run/docker.sock.

Which is often owned by root:docker, so while sudo works, the cleaner solution is to add the user (that you want to do this management) to the docker group.
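
For example (the username is illustrative; the user has to log out and back in before the group change applies):

root@host# usermod -aG docker exampleuser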



Docker for windows

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

So, this one's confusing. As far as I can tell, so far...


Originally, docker on windows meant "Sure you can run linux containers on windows: Install VirtualBox on windows, run linux in that, and docker in that". This was somewhat awkward, but worked, and was potentially quite valid: the VM's overhead appears only once, the instances within it work as you'd expect, and tooling was introduced to manage this from the windows side, meaning you could have linux docker basically as-is on windows (and of course run windows apps (or VMs) on the same metal and OS).

(Though you do need windows Pro or Server to do this, because you'll want Hyper-V which is missing in the basic windows versions)


The confusion came later, when MS said "Now we can run docker containers natively" and 'does not require VirtualBox'.

Because what it basically means is "we have made our own thing that runs only on windows". It primarily points at some in-kernel isolation (that imitates what linux did), which is useful.


The confusing part is that they called it docker. Sure you can build these images with dockerfiles, but that's where the similarity begins and ends.

The way these are run are completely unrelated to linux docker. There is zero technical overlap.


Again, the 'run things on the same metal' implication is useful, which now amounts to having docker for linux in a VM as described earlier, and adding native windows containers alongside.

And it's nice that the tooling was updated to manage both these systems.


But the containers are fundamentally and thoroughly incompatible, so it's misleading to use Docker as an umbrella term when this very introduction included a lot of conditionals about how to run it.

...but the tooling's the same. Well, on one system.


You'll notice we've extended the situation to half a dozen possible distinct concepts:

  • running linux containers on linux (= docker)
  • running linux containers on windows via a VM (= typical if you want the combination)
  • running linux containers on windows natively (MS have basically said this will never happen)
  • running windows containers on windows (= "docker windows")
  • running windows containers on linux natively (no, WSL cannot do that. And it can't really happen unless MS is fully on board with the idea, mainly because licenses)
  • running windows containers on linux via a windows VM (you can, but I don't suspect many people do this)


It's confusing to the point it seems like bad marketing -- or maybe really intentional marketing.


People have raised the question what MS is working towards in the long term.

It's not that MS doesn't understand the concept of a lightweight secure VM.

It's not that MS are bandwagoning for some open-source street cred.

They may just want to make sure they're somehow a player in a type of market that might exclude them.


Many have suggested it's also a clever but surreptitious marketing strategy: you get people thinking they are not committing to a specific tech -- while leading them to do so:

If you get people to see docker as a mixed-OS thing, mainly due to the tooling, then if you squint you can say they do it a little better.

Something you can't say if it is a case of "windows also does something similar", more so because it'd make it clearer just how little this is about interoperability.


Also, if you get people to see docker as a windows-based application packager, then the next server your business buys may be a licensed windows server for "you never know, it will run all these hip dockery things" reasons.

Unsorted