Get started with the latest updates for Dockerfile syntax (v1.7.0) | Docker


Dockerfiles are fundamental tools for developers working with Docker, serving as a blueprint for creating Docker images. These text documents contain all the commands a user could call on the command line to assemble an image. Understanding and effectively utilizing Dockerfiles can significantly streamline the development process, allowing for the automation of image creation and ensuring consistent environments across different stages of development. Dockerfiles are pivotal in defining project environments, dependencies, and the configuration of applications within Docker containers.

With new versions of the BuildKit builder toolkit, Docker Buildx CLI, and Dockerfile frontend for BuildKit (v1.7.0), developers now have access to enhanced Dockerfile capabilities. This blog post delves into these new Dockerfile capabilities and explains how you can can leverage them in your projects to further optimize your Docker workflows.

Versioning

Before we get started, here’s a quick reminder of how Dockerfile is versioned and what you should do to update it. 

Although most projects use Dockerfiles to build images, BuildKit is not limited only to that format. BuildKit supports multiple different frontends for defining the build steps for BuildKit to process. Anyone can create these frontends, package them as regular container images, and load them from a registry when you invoke the build.

With the new release, we have published two such images to Docker Hub: docker/dockerfile:1.7.0 and docker/dockerfile:1.7.0-labs.

To use these frontends, you need to specify a #syntax directive at the beginning of the file to tell BuildKit which frontend image to use for the build. Here we have set it to use the latest of the 1.x.x major version. For example:

#syntax=docker/dockerfile:1

FROM alpine
...

This means that BuildKit is decoupled from the Dockerfile frontend syntax. You can start using new Dockerfile features right away without worrying about which BuildKit version you’re using. All the examples described in this article will work with any version of Docker that supports BuildKit (the default builder as of Docker 23), as long as you define the correct #syntax directive on the top of your Dockerfile.

You can learn more about Dockerfile frontend versions in the documentation. 

Variable expansions

When you write Dockerfiles, build steps can contain variables that are defined using the build arguments (ARG) and environment variables (ENV) instructions. The difference between build arguments and environment variables is that environment variables are kept in the resulting image and persist when a container is created from it.

When you use such variables, you most likely use ${NAME} or, more simply, $NAME in COPY, RUN, and other commands.

You might not know that Dockerfile supports two forms of Bash-like variable expansion:

  • ${variable:-word}: Sets a value to word if the variable is unset
  • ${variable:+word}: Sets a value to word if the variable is set

Up to this point, these special forms were not that useful in Dockerfiles because the default value of ARG instructions can be set directly:

FROM alpine
ARG foo="default value"

If you are an expert in various shell applications, you know that Bash and other tools usually have many additional forms of variable expansion to ease the development of your scripts.

In Dockerfile v1.7, we have added:

  • ${variable#pattern} and ${variable##pattern} to remove the shortest or longest prefix from the variable’s value.
  • ${variable%pattern} and ${variable%%pattern} to remove the shortest or longest prefix from the variable’s value.
  • ${variable/pattern/replacement} to first replace occurrence of a pattern
  • ${variable//pattern/replacement} to replace all occurrences of a pattern

How these rules are used might not be completely obvious at first. So, let’s look at a few examples seen in actual Dockerfiles.

For example, projects often can’t agree on whether versions for downloading your dependencies should have a “v” prefix or not. The following allows you to get the format you need:

# example VERSION=v1.2.3
ARG VERSION=${VERSION#v}
# VERSION is now '1.2.3'

In the next example, multiple variants are used by the same project:

ARG VERSION=v1.7.13
ADD https://github.com/containerd/containerd/releases/download/${VERSION}/containerd-${VERSION#v}-linux-amd64.tar.gz / 

To configure different command behaviors for multi-platform builds, BuildKit provides useful built-in variables like TARGETOS and TARGETARCH. Unfortunately, not all projects use the same values. For example, in containers and the Go ecosystem, we refer to 64-bit ARM architecture as arm64, but sometimes you need aarch64 instead.

ADD https://github.com/oven-sh/bun/releases/download/bun-v1.0.30/bun-linux-${TARGETARCH/arm64/aarch64}.zip /

In this case, the URL also uses a custom name for AMD64 architecture. To pass a variable through multiple expansions, use another ARG definition with an expansion from the previous value. You could also write all the definitions on a single line, as ARG allows multiple parameters, which may hurt readability.

ARG ARCH=${TARGETARCH/arm64/aarch64}
ARG ARCH=${ARCH/amd64/x64}
ADD https://github.com/oven-sh/bun/releases/download/bun-v1.0.30/bun-linux-${ARCH}.zip /

Note that the example above is written in a way that if a user passes their own --build-arg ARCH=value, then that value is used as-is.

Now, let’s look at how new expansions can be useful in multi-stage builds.

One of the techniques described in “Advanced multi-stage build patterns” shows how build arguments can be used so that different Dockerfile commands run depending on the build-arg value. For example, you can use that pattern if you build a multi-platform image and want to run additional COPY or RUN commands only for specific platforms. If this method is new to you, you can learn more about it from that post.

In summarized form, the idea is to define a global build argument and then define build stages that use the build argument value in the stage name while pointing to the base of your target stage via the build-arg name.

Old example:

ARG BUILD_VERSION=1

FROM alpine AS base
RUN …

FROM base AS branch-version-1
RUN touch version1

FROM base AS branch-version-2
RUN touch version2

FROM branch-version-${BUILD_VERSION} AS after-condition

FROM after-condition
RUN …

When using this pattern for multi-platform builds, one of the limitations is that all the possible values for the build-arg need to be defined by your Dockerfile. This is problematic as we want Dockerfile to be built in a way that it can build on any platform and not limit it to a specific set. 

You can see other examples here and here of Dockerfiles where dummy stage aliases must be defined for all architectures, and no other architecture can be built. Instead, the pattern we would like to use is that there is one architecture that has a special behavior, and everything else shares another common behavior.

With new expansions, we can write this to demonstrate running special commands only on RISC-V, which is still somewhat new and may need custom behavior:

#syntax=docker/dockerfile:1.7

ARG ARCH=${TARGETARCH#riscv64}
ARG ARCH=${ARCH:+"common"}
ARG ARCH=${ARCH:-$TARGETARCH}

FROM --platform=$BUILDPLATFORM alpine AS base-common
ARG TARGETARCH
RUN echo "Common build, I am $TARGETARCH" > /out

FROM --platform=$BUILDPLATFORM alpine AS base-riscv64
ARG TARGETARCH
RUN echo "Riscv only special build, I am $TARGETARCH" > /out

FROM base-${ARCH} AS base

Let’s look at these ARCH definitions more closely.

  • The first sets ARCH to TARGETARCH but removes riscv64 from the value.
  • Next, as we described previously, we don’t actually want the other architectures to use their own values but instead want them all to share a common value. So, we set ARCH to common except if it was cleared from the previous riscv64 rule. 
  • Now, if we still have an empty value, we default it back to $TARGETARCH.
  • The last definition is optional, as we would already have a unique value for both cases, but it makes the final stage name base-riscv64 nicer to read.

Additional examples of including multiple conditions with shared conditions, or conditions based on architecture variants can be found in this GitHub Gist page.

Comparing this example to the initial example of conditions between stages, the new pattern isn’t limited to just controlling the platform differences of your builds but can be used with any build-arg. If you have used this pattern before, then you can effectively now define an “else” clause, whereas previously, you were limited to only “if” clauses.

Copy with keeping parent directories

The following feature has been released in the “labs” channel. Define the following at the top of your Dockerfile to use this feature.

#syntax=docker/dockerfile:1.7-labs

When you are copying files in your Dockerfile, for example, do this:

COPY app/file /to/dest/dir/

This example means the source file is copied directly to the destination directory. If your source path was a directory, all the files inside that directory would be copied directly to the destination path.

What if you have a file structure like the following:

.
├── app1
│   ├── docs
│   │   └── manual.md
│   └── src
│       └── server.go
└── app2
    └── src
        └── client.go

You want to copy only files in app1/src, but so that the final files at the destination would be /to/dest/dir/app1/src/server.go and not just /to/dest/dir/server.go.

With the new COPY --parents flag, you can write:

COPY --parents /app1/src/ /to/dest/dir/  

This will copy the files inside the src directory and recreate the app1/src directory structure for these files.

Things get more powerful when you start to use wildcard paths. To copy the src directories for both apps into their respective locations, you can write:

COPY --parents */src/ /to/dest/dir/ 

This will create both /to/dest/dir/app1 and /to/dest/dir/app2, but it will not copy the docs directory. Previously, this kind of copy was not possible with a single command. You would have needed multiple copies for individual files (as shown in this example) or used some workaround with the RUN --mount instruction instead.

You can also use double-star wildcard (**) to match files under any directory structure. For example, to copy only the Go source code files anywhere in your build context, you can write:

COPY --parents **/*.go /to/dest/dir/

If you are thinking about why you would need to copy specific files instead of just using COPY ./ to copy all files, remember that your build cache gets invalidated when you include new files in your build. If you copy all files, the cache gets invalidated when any file is added or changed, whereas if you copy only Go files, only changes in these files influence the cache.

The new --parents flag is not only for COPY instructions from your build context, but obviously, you can also use them in multi-stage builds when copying files between stages using COPY --from

Note that with COPY --from syntax, all source paths are expected to be absolute, meaning that if the --parents flag is used with such paths, they will be fully replicated as they were in the source stage. That may not always be desirable, and instead, you may want to keep some parents but discard and replace others. In that case, you can use a special /./ relative pivot point in your source path to mark which parents you wish to copy and which should be ignored. This special path component resembles how rsync works with the --relative flag.

#syntax=docker/dockerfile:1.7-labs
FROM ... AS base
RUN ./generate-lot-of-files -o /out/
# /out/usr/bin/foo
# /out/usr/lib/bar.so
# /out/usr/local/bin/baz

FROM scratch
COPY --from=base --parents /out/./**/bin/ /
# /usr/bin/foo
# /usr/local/bin/baz

This example above shows how only bin directories are copied from the collection of files that the intermediate stage generated, but all the directories will keep their paths relative to the out directory. 

Exclusion filters

The following feature has been released in the “labs” channel. Define the following at the top of your Dockerfile to use this feature:

#syntax=docker/dockerfile:1.7-labs

Another related case when moving files in your Dockerfile with COPY and ADD instructions is when you want to move a group of files but exclude a specific subset. Previously, your only options were to use RUN --mount or try to define your excluded files inside a .dockerignore file. 

.dockerignore files, however, are not a good solution for this problem, because they only list the files excluded from the client-side build context and not from builds from remote Git/HTTP URLs and are limited to one per Dockerfile. You should use them similarly to .gitignore to mark files that are never part of your project but not as a way to define your application-specific build logic.

With the new --exclude=[pattern] flag, you can now define such exclusion filters for your COPY and ADD commands directly in the Dockerfile. The pattern uses the same format as .dockerignore.

The following example copies all the files in a directory except Markdown files:

COPY --exclude=*.md app /dest/

You can use the flag multiple times to add multiple filters. The next example excludes Markdown files and also a file called README:

COPY --exclude=*.md --exclude=README app /dest/

Double-star wildcards exclude not only Markdown files in the copied directory but also in any subdirectory:

COPY --exclude=**/*.md app /dest/

As in .dockerignore files, you can also define exceptions to the exclusions with ! prefix. The following example excludes all Markdown files in any copied directory, except if the file is called important.md — in that case, it is still copied.

COPY --exclude=**/*.md --exclude=!**/important.md app /dest/

This double negative may be confusing initially, but note that this is a reversal of the previous exclude rule, and “include patterns” are defined by the source parameter of the COPY instruction.

When using --exclude together with previously described --parents copy mode, note that the exclude patterns are relative to the copied parent directories or to the pivot point /./ if one is defined. See the following directory structure for example:

assets
├── app1
│   ├── icons32x32
│   ├── icons64x64
│   ├── notes
│   └── backup
├── app2
│   └── icons32x32
└── testapp
    └── icons32x32
COPY --parents --exclude=testapp assets/./**/icons* /dest/

This command would create the directory structure below. Note that only directories with the icons prefix were copied, the root parent directory assets was skipped as it was before the relative pivot point, and additionally, testapp was not copied as it was defined with an exclusion filter.

dest
├── app1
│   ├── icons32x32
│   └── icons64x64
└── app2
    └── icons32x32

Conclusion

We hope this post gave you ideas for improving your Dockerfiles and that the patterns shown here will help you describe your build more efficiently. Remember that your Dockerfile can start using all these features today by defining the #syntax line on top, even if you haven’t updated to the latest Docker yet.

For a full list of other features in the new BuildKit, Buildx, and Dockerfile releases, check out the changelogs:

Thanks to community members @tstenner, @DYefimov, and @leandrosansilva for helping to implement these features!

If you have issues or suggestions you want to share, let us know in the issue tracker.

Learn more



Source link