Optimize your Docker Images

Manjit Singh
6 min read · Jul 28, 2022

A guide to optimizing your Dockerfile and reducing the size of your Docker images

Docker Image Optimization

Containers have become an integral part of software development and deployment processes. Organizations across industries are moving to containers to optimize and streamline their software development lifecycle.

As more and more applications are deployed in containers, it is important to follow good practices when creating Docker images. Every application has dependencies it needs to run, and these dependencies can cause Docker images to become bloated.

Why Should We Optimize Images?

If an image is large, pulling and building it takes longer. And if the application is deployed at the edge, in a remote location with limited bandwidth, pulling a large Docker image becomes a real problem.

The core advantage of small Docker images is cost reduction: small images consume less disk space, can be downloaded (and extracted) faster, and thus reduce the container start time.

Choosing an Optimized Base Image

Using a minimal base image is the simplest and most common way to reduce image size. The base image is usually the first statement in a Dockerfile, introduced with the FROM keyword.

Some examples:
1) Alpine images: images based on Alpine Linux, a minimal distribution designed with containers in mind.
2) Slim images: smaller variants of the standard base image that ship with fewer packages. These can also be used to reduce the size of Docker images.
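
As a rough illustration, pulling a few common base images and comparing them with docker images shows the difference (the sizes below are approximate and vary by tag and architecture):

docker pull ubuntu:20.04       # ~70 MB
docker pull alpine:3.16        # ~5 MB
docker pull python:3.10        # ~900 MB
docker pull python:3.10-slim   # ~125 MB

docker images                  # compare the SIZE column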

Reducing the Number of Layers

Each instruction in a Dockerfile adds a new layer to the image, and each new layer adds to the build execution time and increases the size of the image.

One way to optimize a Dockerfile is to consolidate the RUN commands into one. The Dockerfile below is inefficient, as each RUN statement adds a new layer to the image and, in turn, to its total size.

FROM nvidia/cuda:11.2.0-runtime-ubuntu20.04
RUN apt-get update
RUN apt-get install python3.8 git -y
RUN apt-get update
RUN apt-get install ffmpeg libsm6 libxext6 -y
RUN apt-get install python3-pip -y

The statements above do not follow best practice. In the version below, the RUN commands are consolidated into one, which makes the Dockerfile more efficient:

FROM nvidia/cuda:11.2.0-runtime-ubuntu20.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3.8 git ffmpeg libsm6 libxext6 python3-pip && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

Make Use of Caching

Docker caches each layer of a build, which is helpful when we build the same image repeatedly with only slight modifications to the Dockerfile.

In most cases, it is the application code that changes between builds, and this code enters the image through a COPY or ADD instruction. Because of this, it is recommended to place COPY or ADD near the end of the Dockerfile, just before the CMD or ENTRYPOINT instruction.

The reason is that Docker can then cache the layers containing the required dependencies, and reuse that cache in subsequent builds when only the code has been modified. Let's understand this with an example:

Dockerfile1

FROM nvidia/cuda:11.2.0-runtime-ubuntu20.04
RUN apt-get update
RUN apt-get install python3.8 git -y
RUN apt-get update
RUN apt-get install ffmpeg libsm6 libxext6 -y
RUN apt-get install python3-pip -y
WORKDIR /app
COPY . .

Dockerfile2

FROM nvidia/cuda:11.2.0-runtime-ubuntu20.04
WORKDIR /app
COPY . .
RUN apt-get update
RUN apt-get install python3.8 git -y
RUN apt-get update
RUN apt-get install ffmpeg libsm6 libxext6 -y
RUN apt-get install python3-pip -y

In the above example, Dockerfile1 is more efficient than Dockerfile2. When the application code changes frequently and the COPY command sits near the top of the Dockerfile, as in Dockerfile2, every code change invalidates the cache for all the commands that follow it, forcing the apt-get layers to be rebuilt each time.
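
The same idea applies to dependency manifests. A common cache-friendly pattern, sketched here for a hypothetical Python application with a requirements.txt file, is to copy only the manifest first, install the dependencies, and copy the frequently changing code last:

FROM nvidia/cuda:11.2.0-runtime-ubuntu20.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3.8 python3-pip && \
    rm -rf /var/lib/apt/lists/*
WORKDIR /app

# Copy only the dependency manifest first; this layer stays cached
# until requirements.txt itself changes
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

# Copy the frequently changing application code last
COPY . .

With this ordering, a code-only change re-runs just the final COPY, while the expensive apt-get and pip install layers come straight from the cache.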

Install required packages only

Most Docker images need several packages to be installed. These packages take time to install and add to the overall size. We should make sure to install only the packages that are actually required.

As you can see in the snippet below, the --no-install-recommends flag tells apt-get to skip recommended (but not strictly required) packages. This flag should generally be used when installing packages in Docker images.

In addition, we remove the contents of /var/lib/apt/lists, which holds the package index files downloaded by apt-get update. These files are not needed at runtime, so deleting them in the same layer further reduces the image size.

FROM nvidia/cuda:11.2.0-runtime-ubuntu20.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends python3.8 git ffmpeg libsm6 libxext6 python3-pip python3-scipy && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

Use .dockerignore files

Docker can exclude files in the build context if they are listed in the .dockerignore file. This not only speeds up the build (less data is sent to the Docker daemon) but can also make the image smaller, since it prevents you from accidentally copying large files or folders into the image that your application does not need at runtime.
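
For example, a typical .dockerignore for a Node.js project might look like the following (the entries are illustrative; adjust them to your repository):

# Version control metadata
.git
.gitignore

# Dependencies that get reinstalled inside the image
node_modules

# Logs, build output and local environment files
*.log
dist
.env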

Use multistage Docker builds

Multi-stage builds in Docker can help you optimize your images. With this feature, we can split the build into two or more stages:

  1. A build (intermediate) stage that contains all the packages and dependencies and performs the compilation.
  2. A run stage into which we copy the application code and the compiled artifacts from the build stage.

Let’s understand this with an example. Below is a simple Dockerfile:

FROM node:16
WORKDIR /app
COPY . .
RUN npm install
EXPOSE 3000
CMD [ "node", "index.js" ]

The above Dockerfile pulls node:16 as the base image and runs npm install on the code. Although it looks simple and short, we can still optimize it by breaking it into stages. Let’s see how the Dockerfile would look as a multi-stage build:

FROM node:16 AS build
WORKDIR /app
COPY package.json index.js env ./
RUN npm install

FROM node:16-alpine AS main
WORKDIR /app
COPY --from=build /app ./
EXPOSE 3000
CMD [ "node", "index.js" ]

Although the above Dockerfile looks bigger, the resulting image will be much smaller than the original, because the build has been broken into two stages. The first stage is an intermediate step where we install the dependencies and build the application. In the final (main) stage, we use the much smaller Alpine-based image and copy over only the files needed to run the application. This leads to a significant reduction in image size, since everything else from the build stage is discarded.
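
To try this out, build the image and compare sizes with docker images (myapp is a placeholder tag):

# Build the multi-stage image; the intermediate stage runs automatically
docker build -t myapp:multistage .

# Optionally build only the intermediate stage, e.g. for debugging
docker build --target build -t myapp:builder .

# Compare the SIZE column of the two images
docker images | grep myapp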

Using Third-Party Tools to Optimize Docker Images

Docker-squash

docker-squash is a Python-based tool that squashes the last N layers of an image into a single layer. Squashing helps reduce image size when earlier layers create many (large) files or folders that a later layer then deletes.
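
A minimal invocation might look like this (the tags are placeholders; check docker-squash --help for the exact options in your version):

# Install the tool; it talks to the local Docker daemon
pip install docker-squash

# Squash an existing image into a new tag
docker-squash -t myimage:squashed myimage:latest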

Dive

Dive is a tool for exploring a Docker image and its layer contents, and for discovering ways to shrink the size of your Docker/OCI image.
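
Running it against an image opens an interactive view of every layer and the files it adds, changes, or deletes (the image name here is just an example):

# Analyze a local or remote image layer by layer
dive nginx:latest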

Docker Slim

DockerSlim is a tool for developers that provides a set of commands (build, xray, lint and others) to simplify and optimize your experience with containers. It makes your containers better, smaller, and more secure.

You don’t need to modify your Dockerfile: DockerSlim examines your existing Dockerfile/image and produces an optimized version of it.
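
A typical run looks like the following (the image name is a placeholder; DockerSlim observes the running container, so your application may need extra flags):

# Produce a minified copy of an existing image
docker-slim build my-app:latest

# Inspect an image's layers without modifying anything
docker-slim xray my-app:latest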

Summary

It is always worth optimizing your Docker images as much as possible: smaller images mean faster builds and deployments. You can use the methods above to review your Dockerfile and apply the best practices for creating lean images.
