Docker Images and Their Layers Explained

Admittedly, this article is not too relevant for the average Docker user. It will demonstrate the internal structure of Docker images, how they consist of multiple layers, how those layers are stored on disk and why they are so efficient.

There already are a few articles on that topic, but many of them aren't up-to-date anymore because Docker changed their layering model some time ago, and it might change again at any time. This article describes the current state in February 2021.

General Image Structure

A Docker image consists of multiple read-only layers. When building an image from a Dockerfile, each Dockerfile instruction that modifies the filesystem of the base image creates a new layer. This new layer contains the actual modification to the filesystem, thus representing a diff to the previous state.

The Dockerfile instructions that modify the image filesystem are ADD, COPY and RUN.
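To make this rule tangible, here is a minimal Python sketch that tags each instruction of a Dockerfile according to the list above. The function name classify and the sample Dockerfile are only illustrations, and the parser is deliberately naive: it ignores line continuations and parser directives.

```python
# Instructions that, according to the text above, modify the image filesystem.
LAYER_INSTRUCTIONS = {"ADD", "COPY", "RUN"}


def classify(dockerfile_text):
    """Return (instruction, creates_layer) pairs for each Dockerfile line."""
    result = []
    for line in dockerfile_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        instruction = line.split(None, 1)[0].upper()
        result.append((instruction, instruction in LAYER_INSTRUCTIONS))
    return result


DOCKERFILE = """\
FROM python:3.4-alpine
COPY . /code
WORKDIR /code
CMD ["python", "app.py"]
"""

for instruction, creates_layer in classify(DOCKERFILE):
    print(f"{instruction:8} {'new layer' if creates_layer else 'no new layer'}")
```

Note that FROM is reported as "no new layer" because it only brings in the layers of the base image instead of creating one itself.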

Furthermore, a Docker image is merely a configuration object stored in JSON format. Besides image metadata like the CMD instruction, such a JSON object contains an ordered list of layers. Running docker image inspect will print those layers.

{
    "RootFS": {
        "Type": "layers",
        "Layers": [
            "sha256:bcf2f368fe234217249e00ad9d762d8f1a3156d60c442ed92079fa5b120634a1",
            "sha256:aabe8fddede54277f929724919213cc5df2ab4e4175a5ce45ff4e00909a4b757",
            "sha256:fbe16fc07f0d81390525c348fbd720725dcae6498bd5e902ce5d37f2b7eed743"
        ]
    }
}

As the output shows, each image layer is identified by a digest in the form <algorithm>:<hash value of the layer>, which will be important later. These digests were introduced in Docker 1.10 and are referred to as Content Addressable IDs, because the hash value corresponds to the layer's content.
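The idea of a content addressable ID can be sketched in a few lines of Python. This is only an illustration: the byte strings stand in for a layer's archive, and the helper names parse_digest and digest_of are made up for this example.

```python
import hashlib


def parse_digest(digest):
    """Split '<algorithm>:<hash value>' into its two parts."""
    algorithm, _, hex_value = digest.partition(":")
    if not algorithm or not hex_value:
        raise ValueError(f"not a digest: {digest!r}")
    return algorithm, hex_value


def digest_of(content, algorithm="sha256"):
    """Compute a content addressable ID for a byte string."""
    return f"{algorithm}:{hashlib.new(algorithm, content).hexdigest()}"


layer_a = digest_of(b"identical layer content")
layer_b = digest_of(b"identical layer content")
layer_c = digest_of(b"different layer content")

print(parse_digest(layer_a)[0])  # sha256
print(layer_a == layer_b)        # True: same content, same ID
print(layer_a == layer_c)        # False: different content, different ID
```

Because the ID is derived from the content alone, two layers with identical content end up with the same digest, no matter which image they belong to.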

Separating image objects from their layers was an important and deliberate decision, because it allows multiple images to reference one and the same layer. In doing so, a base image used for several custom images only has to be stored once on disk.
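A rough sketch of this sharing, assuming two hypothetical images that were built on the same base image. The dictionaries mimic the RootFS part of the docker image inspect output; the two additional digests are placeholders, not real layers.

```python
# The three base layers from the inspect output above.
BASE_LAYERS = [
    "sha256:bcf2f368fe234217249e00ad9d762d8f1a3156d60c442ed92079fa5b120634a1",
    "sha256:aabe8fddede54277f929724919213cc5df2ab4e4175a5ce45ff4e00909a4b757",
    "sha256:fbe16fc07f0d81390525c348fbd720725dcae6498bd5e902ce5d37f2b7eed743",
]

# Hypothetical images: both start from the same base and add one layer each.
image_a = {"RootFS": {"Type": "layers", "Layers": BASE_LAYERS + ["sha256:" + "a" * 64]}}
image_b = {"RootFS": {"Type": "layers", "Layers": BASE_LAYERS + ["sha256:" + "b" * 64]}}


def shared_layers(first, second):
    """Return the layers of `first` that `second` references as well."""
    other = set(second["RootFS"]["Layers"])
    return [layer for layer in first["RootFS"]["Layers"] if layer in other]


def layers_on_disk(*images):
    """Count distinct layers, since each one only has to be stored once."""
    return len({layer for image in images for layer in image["RootFS"]["Layers"]})


print(len(shared_layers(image_a, image_b)))  # 3 shared base layers
print(layers_on_disk(image_a, image_b))      # 5 instead of 4 + 4 = 8
```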

Building an Image

The following Dockerfile pulls an official Python image from Docker Hub, copies all files from the build context to /code and launches some application.

FROM python:3.4-alpine
COPY . /code
WORKDIR /code
CMD ["python", "app.py"]

Building the corresponding image using docker image build --no-cache -t python-app . yields the following output:

Status: Downloaded newer image for python:3.4-alpine
---> c06adcf62f6e

Step 2/4 : COPY . /code
---> 35706ada4e09

Step 3/4 : WORKDIR /code
---> Running in 9ef5d63297f7
Removing intermediate container 9ef5d63297f7
---> c80bbfbdad34

Step 4/4 : CMD ["python", "app.py"]
---> Running in bcd75eac5de0
Removing intermediate container bcd75eac5de0
---> 35cfdf68d3a6

This output provides information on the structure of the built image. On local builds, a temporary image is created for each layer that gets committed to the final image.

Such temporary images are distinguishable by their IDs, e.g. c06adcf62f6e. Even Dockerfile instructions that do not modify the filesystem produce a temporary image. Temporary images are a particularity of local builds; they allow Docker to use the build cache for better performance.

Displaying the Individual Layers

All the various layers from the example image built above can be displayed using docker image history python-app. Even the layer sizes are listed.

IMAGE               CREATED             CREATED BY                                      SIZE
f07355217003        9 seconds ago       /bin/sh -c #(nop)  CMD ["python" "app.py"]      0B
ed9ac8b8ab0d        10 seconds ago      /bin/sh -c #(nop) WORKDIR /code                 0B
0e81b30e3c9f        11 seconds ago      /bin/sh -c #(nop) COPY dir:9a64c4777f86fbfd1…   38MB
c06adcf62f6e        11 months ago       /bin/sh -c #(nop)  CMD ["python3"]              0B
<missing>           11 months ago       /bin/sh -c set -ex;   wget -O get-pip.py 'ht…   6.04MB
<missing>           11 months ago       /bin/sh -c #(nop)  ENV PYTHON_PIP_VERSION=19…   0B
<missing>           11 months ago       /bin/sh -c cd /usr/local/bin  && ln -s idle3…   32B
<missing>           11 months ago       /bin/sh -c set -ex  && apk add --no-cache --…   60.8MB
<missing>           11 months ago       /bin/sh -c #(nop)  ENV PYTHON_VERSION=3.4.10    0B
<missing>           11 months ago       /bin/sh -c #(nop)  ENV GPG_KEY=97FC712E4C024…   0B
<missing>           11 months ago       /bin/sh -c apk add --no-cache ca-certificates   551kB
<missing>           11 months ago       /bin/sh -c #(nop)  ENV LANG=C.UTF-8             0B
<missing>           11 months ago       /bin/sh -c #(nop)  ENV PATH=/usr/local/bin:/…   0B
<missing>           11 months ago       /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B
<missing>           11 months ago       /bin/sh -c #(nop) ADD file:88875982b0512a9d0…   5.53MB

Two things are striking here. First of all, the first column heading is IMAGE, not LAYER. This has historical reasons, because prior to Docker 1.10 every layer was associated with an image. It is also due to the above-mentioned particularity of local image builds: the layers actually are (temporary) images stored on the host system. Entries with <missing> as image ID originate from the base image, which has not been built locally. Those 'images' are merely layers.

Second, some layers apparently have a size of 0 bytes while other layers are several megabytes in size. Only layers associated with a filesystem modification (i.e. those that originate from ADD, COPY or RUN) have a size. In this case, these are six layers in total: five from the base image and one from the COPY instruction in our Dockerfile. All the other Dockerfile instructions only change the image configuration and do not add a layer of their own.

You can verify this using docker image inspect python-app:

{
    "RootFS": {
        "Type": "layers",
        "Layers": [
            "sha256:bcf2f368fe234217249e00ad9d762d8f1a3156d60c442ed92079fa5b120634a1",
            "sha256:aabe8fddede54277f929724919213cc5df2ab4e4175a5ce45ff4e00909a4b757",
            "sha256:fbe16fc07f0d81390525c348fbd720725dcae6498bd5e902ce5d37f2b7eed743",
            "sha256:58026b9b6bf1a7dbc0872462e9ea675cad54a45bc7682bd3631dd4f3c16b1332",
            "sha256:62de8bcc470aef81ddbec19b7f5aeed24d7b7ec1bff09422f7e0da3a4842d346",
            "sha256:8605394513ec8103a4b386e62f5dcca888651e770d36d4a58bc0f1a723526e1d"
        ]
    }
}

Six layers are referenced here, one for each history entry with a physical size. Each of them represents a diff to the filesystem. Instructions that only change metadata do not show up in this list.
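To cross-check the history output against the layer list, the SIZE column can be tallied with a short script. The following sketch uses the values from the history output above, top to bottom, and assumes that Docker prints sizes with decimal units (kB = 1000 bytes); parse_size is a helper made up for this example.

```python
import re

# Assumption: decimal units, as they appear in the history output.
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9}


def parse_size(text):
    """Convert a size such as '6.04MB' into a number of bytes."""
    match = re.fullmatch(r"([0-9.]+)(B|kB|MB|GB)", text)
    if match is None:
        raise ValueError(f"unexpected size: {text!r}")
    return float(match.group(1)) * UNITS[match.group(2)]


# SIZE column of `docker image history python-app`, top to bottom.
SIZES = [
    "0B", "0B", "38MB", "0B", "6.04MB", "0B", "32B", "60.8MB",
    "0B", "0B", "551kB", "0B", "0B", "0B", "5.53MB",
]

with_content = [size for size in SIZES if parse_size(size) > 0]

print(len(SIZES), "history entries")
print(len(with_content), "entries with a physical size:", ", ".join(with_content))
```

The number of entries with a physical size can then be compared with the number of digests under RootFS.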

The ImageDB

Images are physical JSON configuration objects. Docker stores these configuration objects in /var/lib/docker/image/<driver>/imagedb/content/sha256, where <driver> is the storage driver. Typically, the storage driver is overlay2. After changing into that directory, all of its files, i.e. the images, can be displayed with ls -l. The temporary images mentioned before also appear in this list.

6293 Jun 22 08:00 35706ada4e09a0a7f73e5c802b0cc1710203b8c7ca043396129b642e0d6831aa
6546 Jun 22 08:00 35cfdf68d3a69aecd9a2fcb5322280437c86e8fac01465357ba650fcb402cf03
6086 Jun 22 08:00 c06adcf62f6ef21ae5c586552532b04b693f9ab6df377d7ea066fd682c470864
6433 Jun 22 08:00 c80bbfbdad34999518ef3caf434c13cbe0ede4dc224da400ba318fe32dc77c03

If you now print the contents of one of these files using cat, nearly the same information as with docker image inspect appears, only unformatted and with lowercase key names. This means that the layer digests are also displayed here:

{
    "rootfs": {
        "type": "layers",
        "diff_ids": [
            "sha256:bcf2f368fe234217249e00ad9d762d8f1a3156d60c442ed92079fa5b120634a1",
            "sha256:aabe8fddede54277f929724919213cc5df2ab4e4175a5ce45ff4e00909a4b757",
            "sha256:fbe16fc07f0d81390525c348fbd720725dcae6498bd5e902ce5d37f2b7eed743",
            "sha256:58026b9b6bf1a7dbc0872462e9ea675cad54a45bc7682bd3631dd4f3c16b1332",
            "sha256:62de8bcc470aef81ddbec19b7f5aeed24d7b7ec1bff09422f7e0da3a4842d346",
            "sha256:b6ffd37affa5acc285f4fa06b2f93bef635ac774c6a49038a682b678f125e5dc"
        ]
    }
}

Let's remember the first layer in this list, whose hash value starts with bcf2.
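Reading the layer digests from such a configuration file programmatically could look like the following sketch. Since the real files usually require root access, the example writes a shortened stand-in configuration to a temporary directory first; read_diff_ids and the file name config.json are assumptions made for this illustration.

```python
import json
import tempfile
from pathlib import Path


def read_diff_ids(config_path):
    """Return the ordered layer digests from an image configuration file."""
    config = json.loads(Path(config_path).read_text())
    return config["rootfs"]["diff_ids"]


# Shortened stand-in for an image configuration file (first two layers only).
sample_config = {
    "rootfs": {
        "type": "layers",
        "diff_ids": [
            "sha256:bcf2f368fe234217249e00ad9d762d8f1a3156d60c442ed92079fa5b120634a1",
            "sha256:aabe8fddede54277f929724919213cc5df2ab4e4175a5ce45ff4e00909a4b757",
        ],
    }
}

with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "config.json"
    config_file.write_text(json.dumps(sample_config))
    diff_ids = read_diff_ids(config_file)

print(diff_ids[0])  # the bottom layer, the one to remember
```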

The LayerDB

Layers are stored similarly to images internally. However, a layer is not described in the form of a single file; it has its own directory in /var/lib/docker/image/<driver>/layerdb/<algorithm> instead. We've come full circle now: <algorithm> is the algorithm used for the layer digest. According to the layer list above, this is sha256.

And indeed: The directory /var/lib/docker/image/overlay2/layerdb/sha256 contains all the layer directories named after hash values.

85 Jun 18 08:00 31af966d116c33f2120c0bbbb603026bc9aafd09f716e44feae9f8786e874da3
85 Jun 18 08:00 62b3e01ef883f4dc5459318cff905442c1ab3db4a09fe32c7020a9aa2ab819fa
71 Jun 18 08:00 bcf2f368fe234217249e00ad9d762d8f1a3156d60c442ed92079fa5b120634a1
85 Jun 18 08:00 cd22abdaccb6b4ce8dc28afbf4c03b9711c99990068dcbddc476bebd4395e899
85 Jun 18 08:00 cd296739505d41d98f1bcb846b11fa5655cc0a157851f8bb0e2a51451ea875a2
85 Jun 18 08:00 d95b0c9211330afe1efda7c47eb3f45116a20e981fbd38e9d1259d2994f29a59

Notice that the remembered layer bcf2 also appears here at the third position. Each of these layer directories contains some files with further information:

diff: Contains the hash value of the layer's content, which for our bcf2 layer is identical to the directory name
size: Contains the layer's physical size
cache-id: Contains the ID of the associated layer cache
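Of the six digests from the image configuration, only bcf2 shows up verbatim among the directory names. A likely explanation, stated here as an assumption and not derived from the listings above, is that the directories are named after chain IDs as defined in the OCI image specification: the chain ID of the bottom layer equals its digest, and every further chain ID is the hash of the previous chain ID, a space and the digest of the next layer. A minimal sketch of that scheme:

```python
import hashlib


def chain_ids(diff_ids):
    """Compute chain IDs following the scheme of the OCI image specification."""
    result = []
    for diff_id in diff_ids:
        if not result:
            result.append(diff_id)  # bottom layer: chain ID equals the digest
        else:
            data = f"{result[-1]} {diff_id}".encode("ascii")
            result.append("sha256:" + hashlib.sha256(data).hexdigest())
    return result


# Layer digests from the image configuration shown earlier.
DIFF_IDS = [
    "sha256:bcf2f368fe234217249e00ad9d762d8f1a3156d60c442ed92079fa5b120634a1",
    "sha256:aabe8fddede54277f929724919213cc5df2ab4e4175a5ce45ff4e00909a4b757",
    "sha256:fbe16fc07f0d81390525c348fbd720725dcae6498bd5e902ce5d37f2b7eed743",
    "sha256:58026b9b6bf1a7dbc0872462e9ea675cad54a45bc7682bd3631dd4f3c16b1332",
    "sha256:62de8bcc470aef81ddbec19b7f5aeed24d7b7ec1bff09422f7e0da3a4842d346",
    "sha256:b6ffd37affa5acc285f4fa06b2f93bef635ac774c6a49038a682b678f125e5dc",
]

for chain_id in chain_ids(DIFF_IDS):
    print(chain_id)
```

Whether the printed values match the directory names on a given host depends on the Docker version and on the image actually stored there, so treat the result as something to compare, not as a guarantee.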

The inconspicuous file cache-id is the key to the whole magic behind layers. For example, this file contains the following ID for our bcf2 layer:

$ cat cache-id
  640a857fec521662acd5324b88340a1597bb53c56085176af729bb3021471c22

What makes up the layer is hidden behind the cache with this specific ID.
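The lookup from a layer directory to its cache can be sketched as follows. The example builds a small stand-in directory in a temporary location instead of touching Docker's real data, and it assumes that the size file holds a plain number of bytes; describe_layer, the directory names and the size value are hypothetical.

```python
import tempfile
from pathlib import Path


def describe_layer(layer_dir, cache_root):
    """Read the files of a layer directory and resolve its cache directory."""
    layer_dir = Path(layer_dir)

    def read(name):
        return (layer_dir / name).read_text().strip()

    return {
        "diff": read("diff"),
        "size": int(read("size")),  # assumption: an integer number of bytes
        "cache_dir": Path(cache_root) / read("cache-id"),
    }


with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the bcf2 layer directory with its three files.
    layer_dir = Path(tmp) / "layerdb" / "bcf2"
    layer_dir.mkdir(parents=True)
    (layer_dir / "diff").write_text(
        "sha256:bcf2f368fe234217249e00ad9d762d8f1a3156d60c442ed92079fa5b120634a1"
    )
    (layer_dir / "size").write_text("5530000")  # hypothetical value
    (layer_dir / "cache-id").write_text(
        "640a857fec521662acd5324b88340a1597bb53c56085176af729bb3021471c22\n"
    )
    info = describe_layer(layer_dir, Path(tmp) / "caches")

print(info["cache_dir"].name)
```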

The Caches

Docker stores all caches in /var/lib/docker/<driver>, where <driver> is the storage driver overlay2 again. Just as with layer directories, the directory name corresponds to the cache ID.

72 Jun 22 08:00 1a75cb4ce44af189da03799c159344694ae7e7107479047d6f4e892e87b365ba
72 Jun 22 08:00 2dfc6b8be7d1773c4d2387a8841f19bb6d173e202e361be6aef81547e6c3fffb
47 Jun 22 08:00 640a857fec521662acd5324b88340a1597bb53c56085176af729bb3021471c22
72 Jun 22 08:00 c168758d87097d29d5e8b005a0d1cf856434a392aa021e3dcc031d878bcfd45b
72 Jun 22 08:00 c92c65c7461f85d4ad87e7ffc78bc72c6278db039457c3022395cd829733e1d1
72 Jun 22 08:00 d05f04050b81cc53223cb87e2deb1fb7634e9fc2c045677731cb4ee953e2f2db

The directory contains six caches in total as there is exactly one cache for each layer. The third directory in the list is the cache referenced in the cache-id file above.

Besides some metadata, this cache directory contains a sub-directory called diff which represents the modifications to the image filesystem. If a COPY instruction in the Dockerfile copies a file into the image filesystem, this file is located right here. The layer therefore only contains this specific file as a diff to the previous state of the filesystem.

All those caches along with their files add up to the final image progressively. You could also say that they merge together, which is why this type of filesystem is called a Union Filesystem. Accordingly, the diff directory for a layer that adds Linux to the image looks as follows:

$ ls -l diff
  4096 Jun  4  2019 bin
     6 Jun  4  2019 dev
  4096 Jun  4  2019 etc
     6 Jun  4  2019 home
   185 Jun  4  2019 lib
    44 Jun  4  2019 media
     6 Jun  4  2019 mnt
     6 Jun  4  2019 opt
                    ...

This highly flexible system allows Docker to build and store images very efficiently.
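How the diff directories merge into one filesystem can be modelled with plain dictionaries. This is a strongly simplified sketch: the real merge is done by the storage driver, not by Docker copying files around, and the model ignores file deletions. All paths and contents are hypothetical.

```python
def union(layers):
    """Merge layer diffs bottom-up; files in upper layers hide lower ones."""
    merged = {}
    for diff in layers:
        merged.update(diff)
    return merged


# Hypothetical diffs, ordered from the bottom layer to the top layer.
base_layer = {"/bin/sh": "shell", "/etc/motd": "welcome"}
python_layer = {"/usr/local/bin/python3": "interpreter"}
app_layer = {"/code/app.py": "application", "/etc/motd": "welcome to python-app"}

filesystem = union([base_layer, python_layer, app_layer])

for path in sorted(filesystem):
    print(path, "->", filesystem[path])
```

Each diff stays untouched; only the merged view changes when another layer is stacked on top.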

The Writable Layer

All these layers are read-only, so a container cannot modify a file from the image filesystem directly. The advantage of this restriction is that any number of containers can be started from one and the same image. In addition, the state of a freshly created container is predictable.

To grant at least some kind of write access to containers, Docker utilizes a mechanism called Copy on Write. When a container is started, a thin writable layer is laid on top of the read-only image layers. Once the container modifies a file from the image filesystem at runtime, the respective file is copied into that so-called container layer and will be modified there. From the container's point of view this is the original file, because the copied file overlays the file from the image.

Storing only modified files in a thin, ephemeral container layer enables short start-up times for containers. When a container is removed, the writable container layer disappears as well and the original image remains unchanged.
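Python's collections.ChainMap happens to behave much like this stack of layers: lookups search the mappings from top to bottom, while writes only ever touch the first mapping. The sketch below uses it as a toy model of the container layer. The file contents are hypothetical, and a plain assignment stands in for "copy the file up, then modify it".

```python
from collections import ChainMap

# Read-only image layers, topmost first (hypothetical contents).
IMAGE_LAYERS = (
    {"/code/app.py": "print('v1')"},
    {"/etc/os-release": "Alpine"},
)


def start_container(image_layers):
    """Put an empty writable layer on top of the shared image layers."""
    return ChainMap({}, *image_layers)


first = start_container(IMAGE_LAYERS)
second = start_container(IMAGE_LAYERS)

first["/code/app.py"] = "print('v2')"  # lands in first's container layer only

print(first["/code/app.py"])   # modified copy from the container layer
print(second["/code/app.py"])  # still the file from the image
print(first.maps[0])           # contents of the writable container layer
```

Dropping the first mapping corresponds to removing the container: the image layers are left exactly as they were.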

