Docker Volumes: A Quick Summary

Volumes are another convenient way for decoupling data from containers. Compared to bind mounts, they offer some advantages you should be familiar with.

July 2, 2020 in Docker

We've discussed the fundamental motivation behind the decoupling of data from containers in Docker Bind Mounts: A quick summary: Containers are not able to permanently modify the filesystem of their underlying image, thus all modifications to the filesystem will be removed along with the container. If you want to persist data generated by a container, you have to outsource that data to the host system. The preferred way of doing this are Docker Volumes.

Differences and Advantages Over Bind Mounts

Unlike with bind mounts, the user doesn't specify a host directory that they want to mount into the container. Instead, the user specifies a directory in the container that should be integrated as a volume. The volume and its data is then stored in a Docker-managed area on the host system.

Docker Volumes concept

This concept brings some advantages compared to the more traditional bind mounts.

  • Volumes can be named, making them easy to handle.
  • Users don't have to manage the storage location themselves.
  • Volumes are first-class citizens in Docker and thereby manageable via the CLI.
  • For an arbitrary scenario, volumes can use a volume driver that fits best.
  • Docker completely manages volumes itself while considering custom settings.

Let me give an example: The official PostgreSQL image stores its data in /var/lib/postgres/data, which is mounted as a volume here. Syntactically, the corresponding command only differs from a bind mount in that the host directory is missing.

$ docker container run -d \
  --name database \
  -v /var/lib/postgres/data \
  postgres

Creating Named Volumes

In the example above, I've completely outsourced the PostgreSQL data into an own volume. This volume has been created as an anonymous volume because there was no volume name specified. Running the following command indicates that the volume got a random digest as name instead:

$ docker volume ls
  DRIVER              VOLUME NAME
  local                21aef22affde3ed4ede8c0d66092f73240147099856688389256c05ddd311da8

To simplify the management and maintenance, an explicit name can be assigned to a volume. When using the -v option, this can be accomplished by putting a name like pg-data in front of the filepath.

$ docker container run -d \
  --name database \
  -v pg-data:/var/lib/postgres/data \
  postgres

A named volume can also be created independently of a container:

$ docker volume create pg-data

docker volume create allows the user to specify arbitrary options, such as the use of a different volume driver. Once a volume is created, it is manageable via the Docker CLI as a first-class citizen. You can remove the created volume and its data using docker volume rm pg-data.

Why --mount Is Preferable

If the -v pg-data:/var/lib/postgres/data seems familiar to you, you have probably come into contact with bind mounts already. Unfortunately, the only syntactical difference between a named volume and a bind mount is that the syntax for a bind mount expects an absolute path. An option like -v $(pwd)/pg-data:/var/lib/postgres/data would have created a bind mount instead of a volume. This slight difference is confusing even for more experienced users.

To avoid unnecessary confusion at this point, it is very advisable to rely on the more readable --mount option with its self-explanatory syntax instead. Paying attention whether the first argument is a volume name or an absolute path isn't necessary here, because the mount type is specified explicitly.

$ docker container run -d \
  --name database \
  --mount type=volume,source=pg-data,destination=/var/lib/postgres/data \
  postgres

Create a Backup Using Temporary Containers

The fact that all volume data is exclusively managed by Docker doesn't mean that we cannot work with the data anymore. A volume's name provides a valid reference for working with that data.

Given the scenario that the PostgreSQL container database stores its data in a volume named pg-data and we want to create a backup in the local directory db-backup, there are a number of well-understood patterns for doing this. One of them is to create a temporary container which mounts the volume and copies data into a target directory.

$ docker container run -d \
  --rm \
  --mount type=volume,source=pg-data,destination=/from \
  --mount type=bind,source=$(pwd)/db-backup,destination=/to \
  ubuntu \
  tar cvf /to/backup.tar /from

After mounting the data volume into the /from directory and bind mounting db-backup into the /to directory, all that has to be done is copying the contents of /from into /to. This example command even goes a bit further as it packs the volume data in a TAR. We've obtained the volume data as an archive stored on the host system as a result.

Thus, volumes are a perfect fit for making data independent of the container working with it. Since they are highly flexible and even provide the possibility to store data on a different host or in a cloud storage like AWS S3, you will come into contact with them sooner or later for sure.

💫graph: A library for creating generic graph data structures and modifying, analyzing, and visualizing them.
💫
graph: A library for creating generic graph data structures and modifying, analyzing, and visualizing them.