Getting to Know Dockerfile Instructions: Part 4

This set of tutorials focuses on giving you practical experience with Dockerfiles on Alibaba Cloud.

By Alwyn Botha, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud's incentive program to encourage the sharing of technical knowledge and best practices within the cloud community.

This tutorial differs from the first three: there is no copying and pasting, no shell, and no commands to run.

To follow the steps in this tutorial, make sure you have access to an Alibaba Cloud Elastic Compute Service instance with a recent version of Docker already installed. You can refer to this tutorial to learn how to install Docker on your Linux server.

This tutorial consists of these parts:

  1. A review of several snippets of Dockerfiles found at https://hub.docker.com
  2. You build a demo Dockerfile that summarizes insights from these snippets
  3. Dockerfile glossary summary texts
  4. Summary of Dockerfile best practices

The Dockerfile snippets are reviewed in no particular order.

Several of the snippets are very basic and offer only one or two insights each. Not to worry: when you combine many of these basic concepts, you can create a quality Dockerfile.

Review of Snippets of Dockerfiles - EXPOSE



# 7000: intra-node communication
# 7001: TLS intra-node communication
# 7199: JMX
# 9042: CQL
# 9160: thrift service
EXPOSE 7000 7001 7199 9042 9160
CMD ["cassandra", "-f"]

This snippet is taken from the very bottom of the Cassandra Dockerfile.

Placing EXPOSE at the bottom is useful: I do not have to read the whole Dockerfile just to find EXPOSE instructions scattered at random, as some other Dockerfiles do.

The port numbers are neatly in numerical order, and every port is briefly documented. Brief is perfect: I just need a one-word reminder of what each port does.

The documentation is correct and complete: the port numbers in the # comment lines match those in the EXPOSE line, with no inconsistencies.

Your EXPOSE instructions should look like this: professional and beginner-friendly.
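As a minimal sketch, here is the same pattern applied to a hypothetical application. The base image, the ports, and their purposes are illustrative assumptions, not taken from any real image:

```dockerfile
FROM alpine:3.19

# Hypothetical application; ports below are illustrative only
# 8080: HTTP web interface
# 8443: HTTPS web interface
# 9090: metrics endpoint
EXPOSE 8080 8443 9090
```

Keep in mind that EXPOSE only documents which ports the container listens on; a port is actually published to the host only when you run the container with -p (or -P).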

Just out of interest: what is Cassandra?

From https://en.wikipedia.org/wiki/Apache_Cassandra

Apache Cassandra is a free and open-source, distributed, wide column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure.

Cassandra offers robust support for clusters spanning multiple data centers, with asynchronous masterless replication allowing low latency operations for all clients.

Review of Snippets of Dockerfiles - User IDs



# explicitly set user/group IDs
RUN groupadd -r cassandra --gid=999 && useradd -r -g cassandra --uid=999 cassandra

RUN groupadd -r sonarqube && useradd -r -g sonarqube sonarqube

Neo4j is a highly scalable, robust native graph database.

RUN addgroup -S neo4j && adduser -S -H -h /var/lib/neo4j -G neo4j neo4j

Several other Dockerfiles I found all do this the same way: the group and the user are created on a single line.

Below, I split the Neo4j line over two lines; it just looks slightly better.

RUN addgroup -S neo4j \
  && adduser -S -H -h /var/lib/neo4j -G neo4j neo4j
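Combining the ideas above, a minimal sketch for a Debian-based image could create the user with explicit IDs, split over two lines, and then switch to it with USER. The name appuser and the ID 999 are assumptions chosen for illustration:

```dockerfile
FROM debian:stable-slim

# Explicitly set group/user IDs (999 is illustrative) so file ownership
# on bind-mounted volumes stays predictable across rebuilds
RUN groupadd -r --gid=999 appuser \
    && useradd -r -g appuser --uid=999 appuser

# All following instructions, and the running container, use this user
USER appuser
```

Creating the user alone is not enough: without the USER instruction, the container would still run as root.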

Review of Snippets of Dockerfiles - apt-get Install



Nextcloud: A safe home for all your data. Access & share your files, calendars, contacts, mail & more from any device, on your terms.

RUN set -ex; \
    apt-get update; \
    apt-get install -y --no-install-recommends \
        rsync \
        bzip2 \
        busybox-static \
    ; \
    rm -rf /var/lib/apt/lists/*

These are the most readable apt-get instructions I could find.

Compare them with the examples below to see why they are superior.


Apache Maven is a software project management and comprehension tool.

RUN apt-get update && \
    apt-get install -y \
      curl procps \
  && rm -rf /var/lib/apt/lists/*    

The indentation is inconsistent and unaligned.



Logstash is a tool that can be used to collect, process, and forward events and log messages.

# install plugin dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
        apt-transport-https \
        libzmq5 \
    && rm -rf /var/lib/apt/lists/*

apt-get update and apt-get install share one line. The Neo4j example further below has only one instruction per line; see how easily that reads.

From https://github.com/docker-library/openjdk/blob/89851f0abc3a83cfad5248102f379d6a0bd3951a/6-jdk/Dockerfile

Java is a concurrent, class-based, object-oriented language.

RUN apt-get update && apt-get install -y --no-install-recommends \
        bzip2 \
        unzip \
        xz-utils \
    && rm -rf /var/lib/apt/lists/*

Again, apt-get update and apt-get install share one line. Compare this with the Neo4j example below, which has only one instruction per line and reads much more easily.
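For comparison, here is a sketch of the same OpenJDK package list rewritten with one instruction per line. The debian:stable-slim base image is an assumption added only so the fragment stands on its own:

```dockerfile
FROM debian:stable-slim

# One instruction per line: update, install (sorted packages), clean up
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        bzip2 \
        unzip \
        xz-utils \
    && rm -rf /var/lib/apt/lists/*
```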

From https://github.com/neo4j/docker-neo4j-publish/blob/94477399f63ab99c035e50b46f642e791413dcaa/3.4.9/community/Dockerfile

Neo4j is a highly scalable, robust native graph database.

RUN apk add --no-cache --quiet \
    bash \
    curl \
    tini \
    su-exec \
    && curl --fail --silent --show-error --location --remote-name ${NEO4J_URI} \
    && echo "${NEO4J_SHA256}  ${NEO4J_TARBALL}" | sha256sum -csw - \
    && tar --extract --file ${NEO4J_TARBALL} --directory /var/lib \
    && mv /var/lib/neo4j-* /var/lib/neo4j \
    && rm ${NEO4J_TARBALL} \
    && mv /var/lib/neo4j/data /data \
    && chown -R neo4j:neo4j /data \
    && chmod -R 777 /data \
    && chown -R neo4j:neo4j /var/lib/neo4j \
    && chmod -R 777 /var/lib/neo4j \
    && ln -s /data /var/lib/neo4j/data \
    && apk del curl

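Perfectly aligned, with one instruction per line. Very readable.

The list of apk packages to install is almost sorted alphabetically; strictly sorted, su-exec would come before tini.

curl is installed, used on line 6 of the snippet to download the tarball, and then removed on the last line with apk del because it is no longer needed.

${NEO4J_TARBALL} is extracted on line 8 and deleted on line 10. Cleanup.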
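The same install, use, and remove idea can be sketched with apk's --virtual option, which groups temporary packages under one name so they can be deleted together. The download URL and the .fetch-deps name are hypothetical. Note that the cleanup must happen in the same RUN instruction: deleting a file in a later RUN hides it but does not shrink the earlier layer.

```dockerfile
FROM alpine:3.19

# Hypothetical archive URL; replace with a real artifact (and verify its checksum)
ARG APP_URL=https://example.com/app.tar.gz

# Install curl as a temporary virtual package, download and unpack the archive,
# then delete the archive and curl in the SAME layer
RUN apk add --no-cache --virtual .fetch-deps curl \
    && curl --fail --silent --show-error --location --output /tmp/app.tar.gz "${APP_URL}" \
    && tar -xzf /tmp/app.tar.gz -C /opt \
    && rm /tmp/app.tar.gz \
    && apk del .fetch-deps
```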

Review of Snippets of Dockerfiles - ini Files



What is Nextcloud?

A safe home for all your data. Access & share your files, calendars, contacts, mail & more from any device, on your terms.

RUN { \
        echo 'opcache.enable=1'; \
        echo 'opcache.enable_cli=1'; \
        echo 'opcache.interned_strings_buffer=8'; \
        echo 'opcache.max_accelerated_files=10000'; \
        echo 'opcache.memory_consumption=128'; \
        echo 'opcache.save_comments=1'; \
        echo 'opcache.revalidate_freq=1'; \
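    } > /usr/local/etc/php/conf.d/opcache-recommended.ini; \
    \
    echo 'apc.enable_cli=1' >> /usr/local/etc/php/conf.d/docker-php-ext-apcu.ini; \
    \
    echo 'memory_limit=512M' > /usr/local/etc/php/conf.d/memory-limit.ini; \
    \
    mkdir /var/www/data; \
    chown -R www-data:root /var/www; \
    chmod -R g=u /var/www

Very neat and professional: the first seven echo lines are grouped in a single { ... } block, and their combined output is redirected once into opcache-recommended.ini.

Lines containing only a backslash act as empty lines that cleanly separate the four different purposes: opcache settings, the APCu setting, the memory limit, and the directory setup.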
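A minimal sketch applying the same technique to a few other PHP settings, assuming the official php:8.2-apache image, which reads extra .ini files from /usr/local/etc/php/conf.d/. The file name custom-recommended.ini is hypothetical:

```dockerfile
FROM php:8.2-apache

# Related settings grouped in one { ... } block, written with a single redirect;
# a backslash-only line separates it from the next, unrelated step
RUN { \
        echo 'display_errors=Off'; \
        echo 'log_errors=On'; \
        echo 'upload_max_filesize=16M'; \
    } > /usr/local/etc/php/conf.d/custom-recommended.ini; \
    \
    echo 'memory_limit=256M' > /usr/local/etc/php/conf.d/memory-limit.ini
```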

Review of Snippets of Dockerfiles - ARG and ENV



Jenkins Continuous Integration and Delivery server.

FROM openjdk:8-jdk

RUN apt-get update && apt-get install -y git curl && rm -rf /var/lib/apt/lists/*

ARG user=jenkins
ARG group=jenkins
ARG uid=1000
ARG gid=1000
ARG http_port=8080
ARG agent_port=50000

ENV JENKINS_HOME /var/jenkins_home

# Jenkins is run with user `jenkins`, uid = 1000
# If you bind mount a volume from the host or a data container,
# ensure you use the same uid
RUN groupadd -g ${gid} ${group} \
    && useradd -d "$JENKINS_HOME" -u ${uid} -g ${gid} -m -s /bin/bash ${user}

# for main web interface:
EXPOSE ${http_port}

# will be used by attached slave agents:
EXPOSE ${agent_port}

Professional looking:

  1. ARG names all lower case
  2. ARG names grouped by function: user and group; uid and gid; two ports
  3. ENV names all upper case
  4. groupadd and useradd over 2 lines
  5. useradd aligns perfectly with groupadd
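The split between lower-case ARG names and upper-case ENV names reflects a real difference, sketched below: ARG values exist only while the image is being built and can be overridden with docker build --build-arg, whereas ENV values are stored in the image and visible inside every container started from it. The names appuser and APP_HOME are hypothetical:

```dockerfile
FROM debian:stable-slim

# Build-time only; override with: docker build --build-arg uid=1234 .
ARG user=appuser
ARG group=appuser
ARG uid=1000
ARG gid=1000

# Stored in the image; available in every running container
ENV APP_HOME=/var/app_home

RUN groupadd -g ${gid} ${group} \
    && useradd -d "$APP_HOME" -u ${uid} -g ${gid} -m -s /bin/bash ${user}

USER ${user}
WORKDIR ${APP_HOME}
```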

Review of Snippets of Dockerfiles - RUN


PostgreSQL object-relational database system

ENV PATH $PATH:/usr/lib/postgresql/$PG_MAJOR/bin
ENV PGDATA /var/lib/postgresql/data
RUN mkdir -p "$PGDATA" && chown -R postgres:postgres "$PGDATA" && chmod 777 "$PGDATA" # this 777 will be replaced by 700 at runtime (allows semi-arbitrary "--user" values)
VOLUME /var/lib/postgresql/data

A long RUN instruction is squashed between other lines, with a trailing comment that makes it even longer.

My improved version:

RUN mkdir -p "$PGDATA" \
    && chown -R postgres:postgres "$PGDATA" \
    # this 777 will be replaced by 700 at runtime (allows semi-arbitrary "--user" values)
    && chmod 777 "$PGDATA"

The rest of the PostgreSQL Dockerfile looks great, and it has more comments than most others.

Review of Snippets of Dockerfiles - Sorted apt-get Install

From https://hub.docker.com/_/httpd/

# install httpd runtime dependencies
# https://httpd.apache.org/docs/2.4/install.html#requirements
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        libapr1 \
        libaprutil1 \
        libaprutil1-ldap \
        libapr1-dev \
        libaprutil1-dev \
        liblua5.2-0 \
        libnghttp2-14=$NGHTTP2_VERSION \
        libpcre++0 \
        libssl1.0.0=$OPENSSL_VERSION \
        libxml2 \
    && rm -r /var/lib/apt/lists/*

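A nearly sorted list of apt packages to install; only the two -dev packages are grouped after the libapr runtime libraries instead of next to them.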

Here is the mess if unsorted:

RUN apt-get update \
&& apt-get install -y --no-install-recommends \
        libxml2 \
        libaprutil1 \
            libnghttp2-14=$NGHTTP2_VERSION \
        libapr1-dev \
      liblua5.2-0 \
        libaprutil1-dev \
        libpcre++0 \
        libapr1 \
        libssl1.0.0=$OPENSSL_VERSION \
    libaprutil1-ldap \
  && rm -r /var/lib/apt/lists/*

You do not have to sort a list like that by hand. Most editors can sort highlighted lines with a single command.

Even during development it helps to keep such lists sorted, so you can find a specific libapr package at a glance. The messy list, by contrast, must be read carefully from top to bottom to avoid missing anything. If that happens even once, the one-click sort has already been a sound time investment.

Building a Demo Dockerfile

Now that we have seen what good and bad Dockerfiles look like, let's create a Dockerfile to be proud of.

Your Dockerfile should use each of the instructions covered in this series at least once.

Your Dockerfile does not have to result in a cool finished application. You are just experimenting with the syntax and functionality of these instructions.

Run any commands, add any files, create a working directory, and define environment variables and build arguments. Expose some ports, label it all, and add an ENTRYPOINT. The purpose here is simply to become familiar with the instructions.

Copy any text snippets you can find at https://hub.docker.com/explore/

The more varied the software you use as input, the more interesting your learning process will be.
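As a possible starting point, here is a minimal sketch of such a practice Dockerfile. Every name, path, version, and port is an illustrative assumption, and hello.sh and demo-files.tar.gz are hypothetical files you would need to place in the build context yourself:

```dockerfile
FROM alpine:3.19

# Build-time argument; override with --build-arg APP_VERSION=...
ARG APP_VERSION=0.1

# Image metadata (keys follow the OCI annotation naming convention)
LABEL org.opencontainers.image.title="dockerfile-demo" \
      org.opencontainers.image.version="${APP_VERSION}"

# Environment variable stored in the image
ENV APP_HOME=/opt/demo

# Packages listed one per line, sorted alphabetically
RUN apk add --no-cache \
        bash \
        curl

# Unprivileged system user and group (home directory is APP_HOME)
RUN addgroup -S demo \
    && adduser -S -G demo -h "${APP_HOME}" demo

WORKDIR ${APP_HOME}

# COPY copies files as-is; ADD also unpacks a local tar archive
COPY hello.sh ./
ADD demo-files.tar.gz ./

# Data that should outlive the container
VOLUME /data

# 8080: demo HTTP port
EXPOSE 8080

USER demo
ENTRYPOINT ["bash", "./hello.sh"]
```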


Dockerfile Concepts

The text below lets you test how well you understand Dockerfile terminology.

It contains sentences stuffed with Dockerfile concepts.

It is not meant to teach you anything new.

If you can understand most of what is said below, you are comfortable with Dockerfile terminology.

Around 50% of the text below is from:


That text was then edited to include more Docker terms.

. . .

The docker build command builds Docker images using a Dockerfile.

A Dockerfile is a text file that contains all the Linux commands you would normally run at the Linux command shell in order to build a Docker image. Docker can build images by reading the instructions from a Dockerfile.

An image is a layered collection of all the software needed to run your application in an isolated container. An image is not running: it is just software files in directories.

A container is a runtime instance of a docker image. You can use the docker run command to create a container from an image.

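A Docker container consists of the Docker image used to create it plus a thin writable layer on top. A container is a bit like a lightweight VM, except that it shares the host's kernel instead of running its own.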

Docker Hub (https://hub.docker.com) is a website that stores Docker images.

A registry is a hosted service containing repositories of Docker images. Docker Hub is a registry.

The default registry, https://hub.docker.com, can be accessed with a browser or with the docker search command.

A repository is a set of Docker images. A repository can be shared by pushing it to a registry server with the docker push command. We did not use docker push in this series of tutorials.

Thousands of other people have used docker push to publish thousands of public images on https://hub.docker.com.

Docker images are the basis of containers. An image is a layered collection of root filesystem changes and the corresponding execution parameters for use within a container runtime.

If you want your Dockerfile to be runnable without specifying additional arguments to the docker run command, you must specify either ENTRYPOINT, CMD, or both.
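A minimal sketch of how the two work together, assuming an Alpine base image whose BusyBox ping accepts -c: ENTRYPOINT fixes the executable, and CMD supplies default arguments that anything after the image name in docker run replaces.

```dockerfile
FROM alpine:3.19

# Always run ping ...
ENTRYPOINT ["ping"]
# ... with these default arguments unless docker run supplies others
CMD ["-c", "3", "localhost"]
```

With this image, docker run <image> runs ping -c 3 localhost, while docker run <image> -c 1 example.com runs ping -c 1 example.com.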

A named volume is a volume that Docker manages: you can use docker volume ls to get a list of those volumes.

You can specify a friendly text name when you create a named volume.

An anonymous volume is similar to a named volume, but it has no friendly name, only a random ID that Docker assigns, so it can be difficult to refer to the same anonymous volume over time. Docker handles where the files are stored.
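A small sketch of where anonymous volumes typically come from: a VOLUME instruction. If nothing is mounted at that path when the container starts, Docker creates an anonymous volume for it. The path /data and the log file are illustrative:

```dockerfile
FROM alpine:3.19

# Mount point for persistent data; becomes an anonymous volume
# unless docker run mounts something else here
VOLUME /data

# Append a timestamp on each start and show the accumulated log
CMD ["sh", "-c", "date >> /data/log.txt && cat /data/log.txt"]
```

Running the image with -v demo-data:/data mounts a named volume called demo-data (a hypothetical name) instead. docker volume ls then lists it under that name, and the log keeps growing across every container that reuses it.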

Summary of Dockerfile Best Practices



Below are one-liner summaries of several Dockerfile best practices.

  1. Have a small build context ... https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#understand-build-context
  2. Use dockerignore ... https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#exclude-with-dockerignore
  3. Minimize the number of layers ... https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#minimize-the-number-of-layers
  4. Sort multi-line arguments ... https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#sort-multi-line-arguments
  5. Use Alpine FROM ... https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#from
  6. Add one label - just for practice ... https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#label
  7. Split long or complex RUN statements onto multiple lines - see examples above ... https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#run
  8. Expose any port - just for practice ... https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#expose
  9. Define some ENV variables ... https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#env
  10. Use ADD and COPY to show you understand their differences ... https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#add-or-copy
  11. Define a VOLUME ... https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#volume
  12. Add a demo test user - many examples at top of this tutorial ... https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#user
  13. Create a workdir ... https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#workdir

Phase 1: Review the demo Dockerfile you created. Fix things you did wrong.

Phase 2: Visit https://hub.docker.com/explore/

Click the first package name, nginx, then find the first Dockerfile listed there and click it to view that Dockerfile. Briefly scan it and see whether you can spot best practices being followed or ignored.

Also revisit the neat, professional snippets discussed above. Study their full Dockerfiles and see whether you can spot similar problems or good practices.

Do this for as many official packages as you have time for.

Also enter the name of your favourite Linux software in the search box at the top left of that page. Investigate how well 'your' software has been dockerized.

Your Turn

Now apply all you have learnt in these 4 tutorials at your workplace.

You are now ready to read the full Dockerfile reference at https://docs.docker.com/engine/reference/builder/

You should be familiar with nearly all the concepts mentioned there: in the first three tutorials you experimented hands-on with most Dockerfile instructions.

That should all be VERY easy reading now.

Everything just said applies equally well to the best practices below:


You can now start building cool apps with Docker containers on Alibaba Cloud Elastic Compute Service (ECS) instances.
