How to use a Dockerfile to build Docker Images?

    Docker is one of the most popular containerization platforms, with a variety of tools that help developers design, build, test, and deploy applications in packaged, isolated environments called Docker containers. These containers are runtime instances of Docker images.

    A Docker image is a template of the environment that we want to create. It includes all the binaries, libraries, packages, dependencies, system files, etc. Images are read-only and are made up of multiple layers stacked on top of each other. We can either download an image directly from a Docker registry such as Docker Hub or use a file called a Dockerfile to specify instructions that build a customized Docker image on top of a base image.

    We can pull base images directly from Docker Hub, create containers from those images, and work in the command line to install packages, dependencies, and all other configuration required to deploy the application. The downside is that every time we need to make a change, we have to start the container, access its command line, make the changes, commit them to create a new image layer, and repeat the same process. However, there is another method of creating Docker images.

    For example, there are often a few commands, such as updating the OS or installing a few basic packages, that need to be executed only once when the environment is set up. Instead of creating a container first and then running these commands inside it, we can simply specify them inside a file called a Dockerfile. When we execute the docker build command, the Docker daemon looks for the Dockerfile in the build context and executes the instructions inside it one by one.

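    The first instruction is a FROM instruction, which is used to pull a base image; all the subsequent instructions are performed on top of this base image. Instructions that change the filesystem, such as RUN, COPY, and ADD, each create a new image layer that contains just the differences between this layer and the previous one, while the other instructions only record metadata. Because unchanged layers can be cached and shared between images, this reduces the amount of data that has to be rebuilt, stored, and transferred.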

    There is a plethora of Dockerfile instructions available, and you can use them to create your own customized image template and reuse it every time you create new containers. A typical development flow is as follows: first, the developer creates a new Dockerfile and pulls a base image using the FROM instruction, then executes commands by specifying them with the RUN instruction, copies files from the build context using either the ADD or COPY instruction, and specifies the default command and entry point using the CMD and ENTRYPOINT instructions.

    There are many other instructions as well that we will discuss later in this article. After creating the Dockerfile, the developer builds the Docker image using the docker build command and then runs a container from that image using the docker run command. This allows the developer to get hold of the container’s command line (for example, bash) and make changes in the environment, install packages and dependencies, build and run applications, and perform many other operations.
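
    The flow described above can be sketched as a minimal Dockerfile. This is only an illustration under stated assumptions: the base image, the installed package, the file app.py, and the image name my-app are hypothetical placeholders rather than anything prescribed by this article.

    ```dockerfile
    # Hypothetical example: base image, package, and file names are placeholders.
    FROM ubuntu:22.04

    # Execute commands at build time; the result is stored as a new layer.
    RUN apt-get update \
        && apt-get install -y --no-install-recommends python3 \
        && rm -rf /var/lib/apt/lists/*

    # Copy a file from the build context into the image.
    COPY app.py /opt/app/app.py

    # Default command executed when a container starts from this image.
    CMD ["python3", "/opt/app/app.py"]

    # Build and run, from the directory that contains this Dockerfile:
    #   docker build -t my-app .
    #   docker run --rm my-app
    ```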

    After that, the developer can save these changes by using the docker commit command, which creates a new image with an additional layer on top, and share this image with teammates for further development.

    In simpler words, a Dockerfile is simply a text file that, by default, is named exactly “Dockerfile” (with a capital D and without any extension). A file with a different name can be used by passing it to docker build with the -f option. Inside this Dockerfile, we mention instructions that allow us to create our own customized Docker images on top of a base image. The Docker daemon executes these instructions stepwise, one after the other.

    Understanding the Docker Build Cache

    Building a Docker image with the docker build command may take some time. The build downloads the base image if it is not already present on the local machine, copies the build context, runs the commands mentioned inside the Dockerfile, downloads and installs libraries, etc. We know that a Docker image consists of several layers stacked on top of each other, each one the result of an instruction and each one adding some functionality.

    When we build the image for the first time, all the instructions mentioned inside the Dockerfile get executed. However, if we build the same Dockerfile again, the build process checks whether each instruction (and, for ADD and COPY, the contents of the files being copied) has changed since the previous build. For all those layers that have not changed, it reuses the cache from the previous build.

    Once it finds a change at a particular instruction, the cache breaks for that instruction, and it and all the subsequent instructions are executed afresh. Hence, we need to order the instructions carefully, such that the least frequently changing instructions come first and the instructions that are likely to change more frequently come at the end of the Dockerfile.
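
    As a rough sketch of this ordering advice, the following Dockerfile copies the rarely changing dependency list before the frequently changing source code. The Python base image and the files requirements.txt and main.py are assumptions made for illustration only.

    ```dockerfile
    # Hypothetical Python project; requirements.txt and main.py are placeholders.
    FROM python:3.12-slim
    WORKDIR /app

    # Changes rarely: the dependency list. This layer and the install step
    # below are reused from the cache as long as requirements.txt is unchanged.
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    # Changes often: the application source. Editing it invalidates the cache
    # only from this instruction onwards.
    COPY . .

    CMD ["python", "main.py"]
    ```

    If the two COPY instructions were merged into a single COPY . . placed before the install step, every source edit would force the dependencies to be installed again.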

    Dockerfile Instructions

    Let’s go through some of the most commonly used instructions in a Dockerfile. Please note that the instructions in a Dockerfile are case-insensitive; however, the general convention is to write them in uppercase letters.

    1. FROM instruction

    The general syntax of the FROM instruction in a Dockerfile is -

    FROM <image>[:<tag>]

    This instruction sets the base image for the build, pulling it from a registry (by default, the official Docker registry called Docker Hub) if it is not available locally. All the subsequent instructions are executed on top of this base image and build new layers on it. FROM must be the first instruction of a Dockerfile; the only instruction that may precede it is ARG. Also, the FROM instruction may appear multiple times in the same Dockerfile if you are using multi-stage builds. Moreover, we can add a tag to specify the exact version of the image that we want to pull. If we don’t specify any tag, the latest tag is used by default.
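
    To illustrate FROM appearing more than once, here is a small multi-stage sketch. It assumes a hypothetical main.go that uses only the Go standard library; the stage name build and the output paths are likewise placeholders.

    ```dockerfile
    # Stage 1: compile the program using a full Go toolchain image.
    # main.go is a hypothetical source file in the build context.
    FROM golang:1.22 AS build
    WORKDIR /src
    COPY main.go .
    # CGO_ENABLED=0 produces a static binary that can run on Alpine.
    RUN CGO_ENABLED=0 go build -o /out/app main.go

    # Stage 2: start again from a small base image and copy in only the binary.
    FROM alpine:3.20
    COPY --from=build /out/app /usr/local/bin/app
    CMD ["app"]
    ```

    Only the last stage ends up in the final image, so the Go toolchain from the first stage is left behind.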

    2. ARG Instruction

    The general syntax of the ARG instruction is -

    ARG <name>[=<default value>]

    We can use the ARG instruction to define build-time variables. During the build, users can pass values to these variables using the --build-arg <variable>=<value> option. You can include multiple ARG instructions inside a Dockerfile. A simple example of using an ARG instruction inside a Dockerfile is -

    ARG CODE_VERSION=latest

    FROM base:${CODE_VERSION}

    ARG is the only instruction inside a Dockerfile that can precede a FROM instruction. In the above example, we have used the ARG instruction to define a variable called CODE_VERSION and assigned it a default value. We have then used this variable as the tag of the image that we pull using the FROM instruction just below. Note that an ARG declared before FROM is outside of any build stage, so it has to be declared again after FROM if later instructions need it.
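
    The following sketch shows one way to override such a variable at build time and to reuse it after FROM. The variable name BASE_TAG and the image name my-image are made up for this example.

    ```dockerfile
    # Declared before FROM: usable only in FROM lines.
    ARG BASE_TAG=22.04
    FROM ubuntu:${BASE_TAG}

    # Redeclare it (without a value) to make it visible inside the build stage.
    ARG BASE_TAG
    RUN echo "Built from ubuntu:${BASE_TAG}"

    # Override the default value when building:
    #   docker build --build-arg BASE_TAG=24.04 -t my-image .
    ```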

    3. ADD Instruction

    We can use the ADD instruction to add files and directories from the build context on our local machine to the image. The source can also be a URL or a tar file. If the source is a local tar archive in a recognized compression format, it is automatically extracted into the destination directory in the image; files downloaded from a URL are not extracted. The syntax of the ADD instruction is -

    ADD <src> <dest>

    4. COPY Instruction

    The COPY instruction in a Dockerfile performs the same basic function as the ADD instruction, with slight differences. It only accepts local files or directories as the source, not URLs. Also, if the source is a tar archive, COPY does not extract it and copies it as it is. The syntax of the COPY instruction is -

    COPY <src> <dest>
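
    The difference between the two instructions can be seen in a short sketch. It assumes a hypothetical archive named site.tar.gz in the build context; the destination paths are placeholders as well.

    ```dockerfile
    FROM ubuntu:22.04

    # COPY: the archive is copied unchanged, so the image contains
    # /opt/archives/site.tar.gz as a regular file.
    COPY site.tar.gz /opt/archives/

    # ADD: the same local archive is unpacked, so its contents end up
    # under /opt/site/.
    ADD site.tar.gz /opt/site/
    ```

    Because COPY has no such side effects, it is a reasonable default whenever automatic extraction or URL download is not needed.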

    5. LABEL Instruction

    We can use the LABEL instruction to add metadata to Docker images. This can include image version, description, author, etc. It’s simply a key-value pair. The syntax for the LABEL instruction is -

    LABEL <Key>=<Value> <Key>=<Value> <Key>=<Value> …

    We can include more than one label, and the labels that are specified in the base image are inherited by new images built on top of it. Also, if a label with a particular key already exists, the most recently applied value overrides the earlier one. We can use quotes or escape characters to include spaces in label values.
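
    A minimal sketch of these rules follows. The label keys and values, the e-mail address, and the image name my-image are illustrative assumptions.

    ```dockerfile
    FROM alpine:3.20

    # Several labels in one instruction; quotes allow spaces in a value.
    LABEL version="1.0" \
          description="Example image with a multi-word description" \
          org.opencontainers.image.authors="jane@example.com"

    # The same key set again: the most recent value (1.1) is the one kept.
    LABEL version="1.1"

    # Inspect the labels of the built image:
    #   docker image inspect --format '{{json .Config.Labels}}' my-image
    ```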

    6. MAINTAINER Instruction

    The MAINTAINER instruction inside a Dockerfile is used to define the author of the Docker image. However, this instruction is deprecated, and it’s wise to use LABEL to specify a maintainer instead because of the flexibility that it provides. The syntax of the MAINTAINER instruction is -

    MAINTAINER <name>

    7. WORKDIR Instruction

    This instruction sets the working directory for the instructions that follow it in the Dockerfile (such as RUN, CMD, ENTRYPOINT, COPY, and ADD); the last one set also becomes the default working directory of containers created from the image. We can include the WORKDIR instruction multiple times inside a Dockerfile. All the instructions after a WORKDIR instruction and before the next one execute inside the directory specified by the first; after that, the working directory changes to the one specified by the second WORKDIR instruction. If a relative path is given, it is resolved relative to the previous WORKDIR. Also, if the directory specified using the WORKDIR instruction does not exist, it is created automatically. The syntax of the WORKDIR instruction is -

    WORKDIR /path/to/workdir
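
    The behaviour of repeated WORKDIR instructions can be sketched as follows; the directory names are arbitrary placeholders.

    ```dockerfile
    FROM alpine:3.20

    # /app does not exist in the base image, so it is created automatically.
    WORKDIR /app
    # Prints /app during the build.
    RUN pwd

    # A relative path is resolved against the previous WORKDIR.
    WORKDIR logs
    # Prints /app/logs during the build.
    RUN pwd
    ```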

    8. ENV Instruction

    Like the ARG instruction, the ENV instruction is used to define variables, in this case environment variables. However, an ARG variable is only available while the image is being built, and we can’t access it from inside a running container. Environment variables declared with the ENV instruction, in contrast, persist in the image, so we can access them inside the Docker containers as well. Also, if a variable with the same name is defined using both the ARG and ENV instructions, the ENV-defined value always overrides the ARG one. Here also, if we want to include spaces in a value, we either need to escape them using backslashes or use quotes. The syntax of the ENV instruction is -

    ENV <key>=<value> …
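
    The difference in scope between ARG and ENV can be sketched like this. The variable names BUILD_ID, APP_ENV, and GREETING are invented for the example.

    ```dockerfile
    FROM alpine:3.20

    # Build-time only: not set in containers started from the image.
    ARG BUILD_ID=dev

    # Persisted in the image: available at build time and at run time.
    ENV APP_ENV=production \
        GREETING="hello world"

    # During the build, both kinds of variable are visible to RUN.
    RUN echo "build=${BUILD_ID} env=${APP_ENV}"

    # At run time only the ENV variables are set, so the shell falls back
    # to "unset" for BUILD_ID.
    CMD ["sh", "-c", "echo APP_ENV=$APP_ENV BUILD_ID=${BUILD_ID:-unset}"]
    ```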

    9. EXPOSE Instruction

    We can use the EXPOSE instruction to document the ports that the container listens on at runtime. By default, the port is assumed to use the TCP protocol, but we can also specify UDP. This instruction does not actually publish the ports or connect them to ports on the host machine. Instead, it tells the person running the container which ports are intended to be published, which is then done with the -p or -P option of the docker run command. The syntax of the EXPOSE instruction is -

    EXPOSE <port> [<port>/<protocol>...]
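
    As a small illustration, the sketch below declares one TCP and one UDP port and shows, in comments, how they might be published at run time. The port numbers and the image name my-image are assumptions, and no real service is started here.

    ```dockerfile
    FROM alpine:3.20

    # Documentation only: a hypothetical service listening on these ports.
    EXPOSE 8080/tcp
    EXPOSE 8125/udp

    # Publishing happens when the container is started:
    #   docker run -d -p 9090:8080 my-image   (host port 9090 -> container port 8080)
    #   docker run -d -P my-image             (all exposed ports -> random host ports)
    ```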

    10. RUN Instruction

    This is probably the most important and most frequently used instruction in a Dockerfile. We can use the RUN instruction to specify commands that are to be executed during the image build process. Every time a RUN instruction is executed, a new image layer is created. Hence, it’s usually a better practice to chain related commands in a single RUN instruction (for example, with &&) than to write a separate RUN instruction for each command. The syntax for the RUN instruction is -

    RUN <command>

    Or

    RUN ["executable", "param1", "param2"]

    The first syntax is the shell form and the second one is the exec form.
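
    Both forms, and the chaining of commands into a single layer, can be sketched as follows. The installed packages are only an example.

    ```dockerfile
    FROM ubuntu:22.04

    # Shell form: update, install, and clean up are chained with && so that
    # they produce a single layer and the package lists are not kept in it.
    RUN apt-get update \
        && apt-get install -y --no-install-recommends curl ca-certificates \
        && rm -rf /var/lib/apt/lists/*

    # Exec form: the program is started directly, without a shell, so shell
    # features such as && or variable expansion are not available.
    RUN ["curl", "--version"]
    ```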

    11. CMD Instruction

    This instruction is used to specify the default command that is executed when we run a container. It can be overridden by passing a command to the docker run command. Only one CMD instruction takes effect in a Dockerfile; if we specify more than one, only the last one is used. The syntax of the CMD instruction is -

    CMD ["executable","parameter1","parameter2"]

    Or

    CMD ["parameter1","parameter2"]

    Or

    CMD command parameter1 parameter2

    Examples are -

    CMD echo "Welcome"
    CMD ["/bin/bash", "-c", "echo Welcome"]

    The first form is the exec form, which is generally the preferred one. The second form of the CMD instruction is used to provide default arguments to an ENTRYPOINT instruction, which we will discuss next. The last one is the shell form.

    12. ENTRYPOINT Instruction

    It looks similar to the CMD instruction. However, the difference is that it is not replaced when you run a container with CLI arguments: with the exec form of ENTRYPOINT, the arguments passed to docker run are appended to the entry point, and the CMD instruction can be used to supply default arguments. If you use the shell form, it ignores CMD parameters and any CLI arguments. The syntax of the ENTRYPOINT instruction is -

    ENTRYPOINT command param1 param2

    Or

    ENTRYPOINT ["executable", "param1", "param2"]

    The first one is the shell form and the second one is the exec form.
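
    The interplay between an exec-form ENTRYPOINT and CMD can be sketched with a trivial example. The image name my-image is a placeholder.

    ```dockerfile
    FROM alpine:3.20

    # Fixed part of the command.
    ENTRYPOINT ["echo", "Hello"]

    # Default argument, replaced by any arguments given to docker run.
    CMD ["world"]

    #   docker run --rm my-image          prints: Hello world
    #   docker run --rm my-image Docker   prints: Hello Docker
    ```

    The entry point itself can still be replaced explicitly with the --entrypoint option of docker run.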

    Advantages of Dockerfile

    Let’s discuss some of the advantages of using a Dockerfile.

    1. It allows us to save a lot of time by automating the entire build process according to the instructions that we provide. We don’t have to execute commands manually and commit every time changes are made.
    2. We can keep Dockerfiles under version control with tools like Git.
    3. We can share them easily with others. All they need to do is run the docker build command and they will have the same environment that you have.
    4. It acts as a blueprint and allows us to look back at the commands that we have used to create the environment.
    5. Because each instruction is recorded as a build step, we can order the instructions to make better use of the build cache and improve the build time and the size of the images.

    Wrapping Up!

    To conclude, in this guide, we discussed how Docker leverages Dockerfiles to allow its users to create customized Docker images. We also discussed how it uses the build cache to build images quickly. Moving ahead, we discussed several important Dockerfile instructions such as FROM, ARG, ENTRYPOINT, CMD, RUN, ENV, EXPOSE, etc., along with their syntax. Finally, we discussed some of the advantages of using Dockerfiles over the conventional method of creating containers and images. We hope that this article gives you a clear explanation of Dockerfiles and their important instructions and helps you get hands-on with them.

    Happy Learning!