Admittedly, this article is not too relevant for the average Docker user. It will demonstrate the internal structure of Docker images, how they consist of multiple layers, how those layers are stored on disk and why they are so efficient.
There are already a few articles on that topic, but many of them are no longer up to date because Docker changed its layering model some time ago, and it might change again at any time. This article describes the current state in February 2021.
General Image Structure
A Docker image consists of multiple read-only layers. When building an image from a Dockerfile, each Dockerfile instruction that modifies the filesystem of the base image creates a new layer. This new layer contains the actual modification to the filesystem, thus representing a diff to the previous state.
The Dockerfile instructions that modify the image filesystem are ADD, COPY and RUN.
A Docker image itself is merely a configuration object stored in JSON format. Besides image metadata such as the CMD instruction, this JSON object contains an ordered list of layers. Running docker image inspect prints those layers.

    {
      "RootFS": {
        "Type": "layers",
        "Layers": [
          "sha256:bcf2f368fe234217249e00ad9d762d8f1a3156d60c442ed92079fa5b120634a1",
          "sha256:aabe8fddede54277f929724919213cc5df2ab4e4175a5ce45ff4e00909a4b757",
          "sha256:fbe16fc07f0d81390525c348fbd720725dcae6498bd5e902ce5d37f2b7eed743"
        ]
      }
    }

As the output shows, each image layer is identified by a digest of the form <algorithm>:<hash value of the layer>, which will be important later. These digests were introduced in Docker 1.10 and are referred to as content-addressable IDs, because the hash value is computed from the layer's content.
Separating image objects from their layers was an important and deliberate decision, because it allows multiple images to reference one and the same layer. In doing so, a base image used for several custom images only has to be stored once on disk.
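To make the sharing idea concrete, here is a minimal Python sketch. It assumes two hypothetical images built on the same base; the three base digests are taken from the output above, while the helper names and the two extra digests are made up for illustration.

```python
import json

# Layer digests of the shared base image (taken from the inspect output above).
BASE_LAYERS = [
    "sha256:bcf2f368fe234217249e00ad9d762d8f1a3156d60c442ed92079fa5b120634a1",
    "sha256:aabe8fddede54277f929724919213cc5df2ab4e4175a5ce45ff4e00909a4b757",
    "sha256:fbe16fc07f0d81390525c348fbd720725dcae6498bd5e902ce5d37f2b7eed743",
]


def fake_inspect(extra_layers):
    """Build a JSON string shaped like the RootFS fragment shown above (hypothetical helper)."""
    return json.dumps({"RootFS": {"Type": "layers", "Layers": BASE_LAYERS + extra_layers}})


def layers_of(inspect_json):
    """Extract the ordered layer list from such a fragment."""
    return json.loads(inspect_json)["RootFS"]["Layers"]


def shared_layers(a, b):
    """Return the leading layers that both images have in common, in order."""
    shared = []
    for layer_a, layer_b in zip(layers_of(a), layers_of(b)):
        if layer_a != layer_b:
            break
        shared.append(layer_a)
    return shared


# Two made-up application images, each adding one layer of its own.
app_a = fake_inspect(["sha256:" + "a" * 64])
app_b = fake_inspect(["sha256:" + "b" * 64])

common = shared_layers(app_a, app_b)
unique_on_disk = set(layers_of(app_a)) | set(layers_of(app_b))
print(f"{len(common)} shared layers, {len(unique_on_disk)} layers to store in total")
```

Both images reference four layers each, but only five distinct layers would have to be stored, because the three base layers are identical by digest.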
Building an Image
The following Dockerfile pulls an official Python image from Docker Hub, copies all files to /code and launches some application.

    FROM python:3.4-alpine
    COPY . /code
    WORKDIR /code
    CMD ["python", "app.py"]

Building the corresponding image using docker image build --no-cache -t python-app . yields the following output:

    Status: Downloaded newer image for python:3.4-alpine
     ---> c06adcf62f6e
    Step 2/4 : COPY . /code
     ---> 35706ada4e09
    Step 3/4 : WORKDIR /code
     ---> Running in 9ef5d63297f7
    Removing intermediate container 9ef5d63297f7
     ---> c80bbfbdad34
    Step 4/4 : CMD ["python", "app.py"]
     ---> Running in bcd75eac5de0
    Removing intermediate container bcd75eac5de0
     ---> 35cfdf68d3a6
This output provides information on the structure of the built image. On local builds, a temporary image is created for each layer that gets committed to the final image.
Such temporary images are distinguishable by their IDs, e.g. c06adcf62f6e. Even Dockerfile instructions that do not modify the filesystem produce a temporary image. Temporary images are a particularity of local builds, allowing the utilization of the build cache for better performance.
Displaying the Individual Layers
All layers of the example image built above can be displayed using docker image history python-app. Even the layer sizes are listed.

    IMAGE          CREATED          CREATED BY                                      SIZE
    f07355217003   9 seconds ago    /bin/sh -c #(nop) CMD ["python" "app.py"]       0B
    ed9ac8b8ab0d   10 seconds ago   /bin/sh -c #(nop) WORKDIR /code                 0B
    0e81b30e3c9f   11 seconds ago   /bin/sh -c #(nop) COPY dir:9a64c4777f86fbfd1…   38MB
    c06adcf62f6e   11 months ago    /bin/sh -c #(nop) CMD ["python3"]               0B
    <missing>      11 months ago    /bin/sh -c set -ex; wget -O get-pip.py 'ht…     6.04MB
    <missing>      11 months ago    /bin/sh -c #(nop) ENV PYTHON_PIP_VERSION=19…    0B
    <missing>      11 months ago    /bin/sh -c cd /usr/local/bin && ln -s idle3…    32B
    <missing>      11 months ago    /bin/sh -c set -ex && apk add --no-cache --…    60.8MB
    <missing>      11 months ago    /bin/sh -c #(nop) ENV PYTHON_VERSION=3.4.10     0B
    <missing>      11 months ago    /bin/sh -c #(nop) ENV GPG_KEY=97FC712E4C024…    0B
    <missing>      11 months ago    /bin/sh -c apk add --no-cache ca-certificates   551kB
    <missing>      11 months ago    /bin/sh -c #(nop) ENV LANG=C.UTF-8              0B
    <missing>      11 months ago    /bin/sh -c #(nop) ENV PATH=/usr/local/bin:/…    0B
    <missing>      11 months ago    /bin/sh -c #(nop) CMD ["/bin/sh"]               0B
    <missing>      11 months ago    /bin/sh -c #(nop) ADD file:88875982b0512a9d0…   5.53MB
Two things are striking here. First, the first column heading is IMAGE, not LAYER. This has historical reasons: prior to Docker 1.10, every layer was associated with an image. It is also due to the above-mentioned particularity of local image builds: the layers actually are (temporary) images stored on the host system. Entries with <missing> as image name originate from the base image, which has not been built locally. Those 'images' are merely layers.
Second, some layers apparently have a size of 0 bytes while other layers are several megabytes in size. Only layers associated with a filesystem modification, i.e. those that originate from ADD, COPY or RUN, have a size. In this case, these are six layers in total: five from the base image plus the COPY layer added by our own Dockerfile. All the other Dockerfile instructions only change the image configuration and do not add a filesystem layer of their own.
You can verify this using docker image inspect python-app:

    {
      "RootFS": {
        "Type": "layers",
        "Layers": [
          "sha256:bcf2f368fe234217249e00ad9d762d8f1a3156d60c442ed92079fa5b120634a1",
          "sha256:aabe8fddede54277f929724919213cc5df2ab4e4175a5ce45ff4e00909a4b757",
          "sha256:fbe16fc07f0d81390525c348fbd720725dcae6498bd5e902ce5d37f2b7eed743",
          "sha256:58026b9b6bf1a7dbc0872462e9ea675cad54a45bc7682bd3631dd4f3c16b1332",
          "sha256:62de8bcc470aef81ddbec19b7f5aeed24d7b7ec1bff09422f7e0da3a4842d346",
          "sha256:8605394513ec8103a4b386e62f5dcca888651e770d36d4a58bc0f1a723526e1d"
        ]
      }
    }

Six layers are referenced here, one for each of the six history entries with a physical size. Each of them represents a diff to the filesystem, while the remaining instructions are recorded as metadata only.
The ImageDB
Images are stored on disk as JSON configuration files. Docker stores these configuration files in /var/lib/docker/image/<driver>/imagedb/content/sha256, where <driver> is the storage driver. Typically, the storage driver is overlay2.
After changing into that directory, all of its files, i.e. the images, can be displayed with ls -l. The temporary images mentioned before also appear in this list.

    6293 Jun 22 08:00 35706ada4e09a0a7f73e5c802b0cc1710203b8c7ca043396129b642e0d6831aa
    6546 Jun 22 08:00 35cfdf68d3a69aecd9a2fcb5322280437c86e8fac01465357ba650fcb402cf03
    6086 Jun 22 08:00 c06adcf62f6ef21ae5c586552532b04b693f9ab6df377d7ea066fd682c470864
    6433 Jun 22 08:00 c80bbfbdad34999518ef3caf434c13cbe0ede4dc224da400ba318fe32dc77c03
If you now print the contents of one of these files using cat, essentially the same information as with docker image inspect appears, only unformatted and with slightly different key names. This means that the layer digests are also displayed here:

    {
      "rootfs": {
        "type": "layers",
        "diff_ids": [
          "sha256:bcf2f368fe234217249e00ad9d762d8f1a3156d60c442ed92079fa5b120634a1",
          "sha256:aabe8fddede54277f929724919213cc5df2ab4e4175a5ce45ff4e00909a4b757",
          "sha256:fbe16fc07f0d81390525c348fbd720725dcae6498bd5e902ce5d37f2b7eed743",
          "sha256:58026b9b6bf1a7dbc0872462e9ea675cad54a45bc7682bd3631dd4f3c16b1332",
          "sha256:62de8bcc470aef81ddbec19b7f5aeed24d7b7ec1bff09422f7e0da3a4842d346",
          "sha256:b6ffd37affa5acc285f4fa06b2f93bef635ac774c6a49038a682b678f125e5dc"
        ]
      }
    }
Let's remember the first layer in this list, whose hash value starts with bcf2.
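The file names in the ImageDB listing are digests themselves: their first twelve characters match the short image IDs from the build output. The following Python sketch illustrates the underlying idea, assuming that an image ID is simply the SHA-256 hash of the configuration file's bytes. The configuration used here is a hypothetical, heavily shortened stand-in, so the resulting ID does not correspond to any real image.

```python
import hashlib
import json

# Hypothetical, heavily shortened image configuration. A real file contains
# far more metadata; only the "rootfs" structure mirrors the output above.
config_bytes = json.dumps(
    {
        "config": {"Cmd": ["python", "app.py"]},
        "rootfs": {
            "type": "layers",
            "diff_ids": [
                "sha256:bcf2f368fe234217249e00ad9d762d8f1a3156d60c442ed92079fa5b120634a1"
            ],
        },
    },
    separators=(",", ":"),
).encode("utf-8")

# Content addressing: the ID is derived from the bytes, not assigned separately.
image_id = hashlib.sha256(config_bytes).hexdigest()
short_id = image_id[:12]

# The layer digests can be read straight from the same bytes.
diff_ids = json.loads(config_bytes)["rootfs"]["diff_ids"]

# Any change to the configuration, even a single byte, yields a different ID.
changed_id = hashlib.sha256(config_bytes + b" ").hexdigest()

print("file name / image ID:", image_id)
print("short ID:            ", short_id)
print("first layer digest:  ", diff_ids[0])
print("ID after a change differs:", changed_id != image_id)
```

This is also why even instructions that only touch metadata, such as CMD, result in a new image ID: they change the configuration file and therefore its hash.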
The LayerDB
Internally, layers are stored similarly to images. However, a layer is not described by a single file. Instead, it has its own directory in /var/lib/docker/image/<driver>/layerdb/<algorithm>, where <algorithm> is the algorithm used for the layer digest. This is where the digest format from the beginning comes full circle: according to the layer list above, the algorithm is sha256.
And indeed: the directory /var/lib/docker/image/overlay2/layerdb/sha256 contains all the layer directories, named after their hash values.

    85 Jun 18 08:00 31af966d116c33f2120c0bbbb603026bc9aafd09f716e44feae9f8786e874da3
    85 Jun 18 08:00 62b3e01ef883f4dc5459318cff905442c1ab3db4a09fe32c7020a9aa2ab819fa
    71 Jun 18 08:00 bcf2f368fe234217249e00ad9d762d8f1a3156d60c442ed92079fa5b120634a1
    85 Jun 18 08:00 cd22abdaccb6b4ce8dc28afbf4c03b9711c99990068dcbddc476bebd4395e899
    85 Jun 18 08:00 cd296739505d41d98f1bcb846b11fa5655cc0a157851f8bb0e2a51451ea875a2
    85 Jun 18 08:00 d95b0c9211330afe1efda7c47eb3f45116a20e981fbd38e9d1259d2994f29a59
Notice that the remembered layer bcf2 also appears here, at the third position. Each of these layer directories contains some files with further information:
diff: Contains the hash value of the layer's content. For a bottom layer such as bcf2, it is identical to the directory name; the directories of layers stacked on top are named after a hash that also covers the layers beneath them
size: Contains the layer's physical size
cache-id: Contains the ID of the associated layer cache
The inconspicuous file cache-id is the key to the whole magic behind layers. For example, this file contains the following ID for our bcf2 layer:

    $ cat cache-id
    640a857fec521662acd5324b88340a1597bb53c56085176af729bb3021471c22
What makes up the layer is hidden behind the cache with this specific ID.
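Only bcf2 from the image's layer list shows up verbatim in the LayerDB listing above; the other directory names look unrelated. A likely explanation is the chain ID scheme described in the OCI image specification: the bottom layer keeps its own digest, while every further layer is identified by a hash over the chain ID beneath it and its own digest. The Python sketch below implements that rule. That Docker's LayerDB follows exactly this rule is an assumption here, and the function name chain_ids is made up for illustration.

```python
import hashlib

# The first three layer digests from the image configuration shown earlier.
DIFF_IDS = [
    "sha256:bcf2f368fe234217249e00ad9d762d8f1a3156d60c442ed92079fa5b120634a1",
    "sha256:aabe8fddede54277f929724919213cc5df2ab4e4175a5ce45ff4e00909a4b757",
    "sha256:fbe16fc07f0d81390525c348fbd720725dcae6498bd5e902ce5d37f2b7eed743",
]


def chain_ids(diff_ids):
    """Derive one chain ID per layer from an ordered list of layer digests.

    Rule (per the OCI image spec):
      chain(L0) = diff(L0)
      chain(Ln) = sha256(chain(Ln-1) + " " + diff(Ln))
    """
    chain = []
    for diff_id in diff_ids:
        if not chain:
            # The bottom layer has nothing beneath it and keeps its digest.
            chain.append(diff_id)
        else:
            data = f"{chain[-1]} {diff_id}".encode("ascii")
            chain.append("sha256:" + hashlib.sha256(data).hexdigest())
    return chain


ids = chain_ids(DIFF_IDS)
for diff_id, chain_id in zip(DIFF_IDS, ids):
    print(diff_id[7:19], "->", chain_id[7:19])
```

With such a scheme, the same diff stacked on two different parents gets two different directory names, which keeps each entry unambiguous about what lies beneath it.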
The Caches
Docker stores all caches in /var/lib/docker/<driver>, where <driver> is again the storage driver overlay2. Just as with layer directories, the directory name corresponds to the cache ID.

    72 Jun 22 08:00 1a75cb4ce44af189da03799c159344694ae7e7107479047d6f4e892e87b365ba
    72 Jun 22 08:00 2dfc6b8be7d1773c4d2387a8841f19bb6d173e202e361be6aef81547e6c3fffb
    47 Jun 22 08:00 640a857fec521662acd5324b88340a1597bb53c56085176af729bb3021471c22
    72 Jun 22 08:00 c168758d87097d29d5e8b005a0d1cf856434a392aa021e3dcc031d878bcfd45b
    72 Jun 22 08:00 c92c65c7461f85d4ad87e7ffc78bc72c6278db039457c3022395cd829733e1d1
    72 Jun 22 08:00 d05f04050b81cc53223cb87e2deb1fb7634e9fc2c045677731cb4ee953e2f2db

The directory contains six caches in total, as there is exactly one cache for each layer. The third directory in the list is the cache referenced in the cache-id file above.
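The lookup from a layer directory to its cache can be sketched in a few lines of Python. To keep the example runnable without root access or a Docker installation, it rebuilds a tiny stand-in for both directory trees in a temporary folder; the function name cache_dir_for and the folder names below the temporary root are hypothetical, while the two IDs are the ones shown above.

```python
import tempfile
from pathlib import Path

LAYER = "bcf2f368fe234217249e00ad9d762d8f1a3156d60c442ed92079fa5b120634a1"
CACHE_ID = "640a857fec521662acd5324b88340a1597bb53c56085176af729bb3021471c22"


def cache_dir_for(layer_dir: Path, cache_root: Path) -> Path:
    """Follow a layer directory's cache-id file to the matching cache directory."""
    cache_id = (layer_dir / "cache-id").read_text().strip()
    return cache_root / cache_id


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Stand-in for the layer directory containing the cache-id file.
    layer_dir = root / "layerdb" / "sha256" / LAYER
    layer_dir.mkdir(parents=True)
    (layer_dir / "cache-id").write_text(CACHE_ID)

    # Stand-in for the cache directory with its diff sub-directory.
    cache_root = root / "overlay2"
    (cache_root / CACHE_ID / "diff").mkdir(parents=True)

    resolved = cache_dir_for(layer_dir, cache_root)
    has_diff = (resolved / "diff").is_dir()
    print("cache directory:", resolved.name)
    print("has diff sub-directory:", has_diff)
```

On a real host, the two arguments would point to the layer and cache directories described in this article, and reading them typically requires root privileges.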
Next to some metadata, this cache directory contains a sub-directory called diff, which represents the modifications to the image filesystem. If a COPY instruction in the Dockerfile copies a file into the image filesystem, this file is located right here. The layer therefore only contains this specific file as a diff to the previous state of the filesystem.
All those caches along with their files add up to the final image progressively. You could also say that they merge together, which is why this type of filesystem is called a Union Filesystem. Accordingly, the diff directory for a layer that adds Linux to the image looks as follows:

    $ ls -l diff
    4096 Jun 4 2019 bin
    6 Jun 4 2019 dev
    4096 Jun 4 2019 etc
    6 Jun 4 2019 home
    185 Jun 4 2019 lib
    44 Jun 4 2019 media
    6 Jun 4 2019 mnt
    6 Jun 4 2019 opt
    ...
This highly flexible system allows Docker to build and store images very efficiently.
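The merging can be pictured with a small Python model in which each layer is a dictionary mapping paths to file contents. This is a deliberately simplified sketch: the paths and contents are made up, deletions and directories are ignored, and a real union filesystem combines the diff directories at mount time instead of copying anything.

```python
def merge_layers(layers):
    """Merge layer diffs from bottom to top.

    A path that appears in a higher layer hides the same path in the
    layers below it, just like an upper diff directory overlays a lower one.
    """
    merged = {}
    for diff in layers:
        merged.update(diff)
    return merged


# Hypothetical layer diffs, ordered from the base layer upwards.
layers = [
    {"/bin/sh": "shell from the base layer", "/etc/hostname": "base"},
    {"/usr/local/bin/python": "interpreter"},
    {"/code/app.py": "print('hello')", "/etc/hostname": "app"},
]

view = merge_layers(layers)
for path in sorted(view):
    print(path, "->", view[path])
```

The merged view contains four paths, and /etc/hostname resolves to the version from the topmost layer that provides it.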
The Writable Layer
All these layers are read-only layers, thus a container cannot modify a file from the image filesystem. The advantage of this restriction is that any number of containers can be started from one and the same image. Plus, the state of a freshly created container is predictable.
To grant at least some kind of write access to containers, Docker utilizes a mechanism called Copy on Write. When a container is started, a thin writable layer is laid on top of the read-only image layers. Once the container modifies a file from the image filesystem at runtime, the respective file is copied into that so-called container layer and will be modified there. From the container's point of view this is the original file, because the copied file overlays the file from the image.
Storing only modified files in a thin, ephemeral container layer enables short start-up times for containers. When a container is removed, the writable container layer disappears as well and the original image remains unchanged.
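The behaviour of the writable layer can be modelled with Python's collections.ChainMap, which looks keys up in a list of mappings but writes only to the first one. The sketch below is an analogy under simplifying assumptions, not Docker's implementation: files are plain dictionary entries, the paths are made up, and start_container is a hypothetical helper.

```python
from collections import ChainMap

# Read-only image view (hypothetical paths and contents).
image = {"/etc/motd": "welcome", "/code/app.py": "v1"}


def start_container(image_files):
    """Put an empty writable layer on top of the shared image files."""
    return ChainMap({}, image_files)


c1 = start_container(image)
c2 = start_container(image)

# Modifying a file in one container lands in that container's own layer.
c1["/etc/motd"] = "changed in c1"

print(c1["/etc/motd"])      # the container sees its own copy
print(c2["/etc/motd"])      # the other container still sees the image's file
print(image["/etc/motd"])   # the image itself is untouched
print(c1.maps[0])           # content of the thin writable layer of c1
```

Removing a container corresponds to throwing away its first mapping: everything written at runtime is gone, and the image dictionary is exactly as it was before.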