
Docker Container Resource Management: CPU, RAM and IO: Part 2

This tutorial gives you practical experience of using Docker container resource limitation functionalities on Alibaba Cloud ECS.

By Alwyn Botha, Alibaba Cloud Tech Share Author. Tech Share is Alibaba Cloud's incentive program to encourage the sharing of technical knowledge and best practices within the cloud community.

This tutorial aims to give you practical experience of using Docker container resource limitation functionalities on an Alibaba Cloud Elastic Compute Service (ECS) instance.

cpu-shares Proportional to Other Containers

In this test we show that --cpu-shares values are relative to those of other containers. The default value of 1024 has no intrinsic meaning.

If all containers have --cpu-shares=4, they all share CPU time equally.

This is identical to all containers having --cpu-shares=1024: they still share CPU time equally.

Run:

docker container run -d --cpu-shares=4 --name mycpu1024a alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'
docker container run -d --cpu-shares=4 --name mycpu1024b alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'
docker container run -d --cpu-shares=4 --name mycpu1024c alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'

Investigate logs:

docker logs mycpu1024a
docker logs mycpu1024b
docker logs mycpu1024c
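
Before pruning, you can optionally confirm the weight Docker recorded for each container. A minimal sketch - the format string simply extracts one field from docker inspect:

docker inspect --format '{{.HostConfig.CpuShares}}' mycpu1024a
docker inspect --format '{{.HostConfig.CpuShares}}' mycpu1024b
docker inspect --format '{{.HostConfig.CpuShares}}' mycpu1024c

Each of these should print 4.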

Prune the containers; we are done with them.

docker container prune -f 

Note that they all still took the same time to run. Setting shares to 4 instead of 1024 did not make them any slower.

cpu-shares: Only Enforced When CPU Cycles Are Constrained

CPU shares are only enforced when CPU cycles are constrained.

With no other containers running, defining --cpu-shares for a single container is meaningless.

docker container run -d --cpu-shares=4 --name mycpu1024a alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'

docker logs mycpu1024a
real    0m 12.67s
user    0m 0.00s
sys     0m 12.27s

Now increase the shares to 4000 and rerun: you will see zero difference in runtime.

One single container is using all available CPU time: no sharing needed.

docker container run -d --cpu-shares=4000 --name mycpu1024a alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'

Prune this one container; we are done with it.

docker container prune -f 
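
If you want to actually see --cpu-shares take effect, a minimal sketch is to force contention by pinning two containers to the same CPU and giving them very different weights. The container names here are illustrative, and --cpuset-cpus is covered in detail later in this tutorial:

docker container run -d --cpuset-cpus=0 --cpu-shares=4000 --name myshareshi alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'
docker container run -d --cpuset-cpus=0 --cpu-shares=400 --name myshareslo alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=100 | md5sum'

Compare docker logs myshareshi and docker logs myshareslo afterwards: the high-weight container should finish noticeably sooner. Prune both containers when done.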

--cpus= Defines How Much of the Available CPU Resources a Container Can Use

Specify how much of the available CPU resources a container can use. For instance, if the host machine has two CPUs and you set --cpus="1.5", the container is allowed to use at most one and a half of the CPUs.

Note the range of --cpus values we are using in the commands below. Run them:

docker container run -d --cpus=2 --name mycpu2 alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=50 | md5sum'
docker container run -d --cpus=1 --name mycpu1 alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=50 | md5sum'
docker container run -d --cpus=.5 --name mycpu.5 alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=50 | md5sum'
docker container run -d --cpus=.25 --name mycpu.25 alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=20 | md5sum'
docker container run -d --cpus=.1 --name mycpu.1 alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=10 | md5sum'

Investigate docker stats:

docker stats

Expected output:

CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
843bea7263fb        mycpu2              57.69%              1.258MiB / 985.2MiB   0.13%               578B / 0B           1.33MB / 0B         0
186ba15b8258        mycpu1              55.85%              1.25MiB / 985.2MiB    0.13%               578B / 0B           1.33MB / 0B         0
3bcc26eab1ac        mycpu.5             46.60%              1.262MiB / 985.2MiB   0.13%               578B / 0B           1.33MB / 0B         0
79d7d7e3c38c        mycpu.25            25.43%              1.262MiB / 985.2MiB   0.13%               508B / 0B           1.33MB / 0B         0
b4ba5503a048        mycpu.1             9.76%               1.328MiB / 985.2MiB   0.13%               508B / 0B           1.33MB / 0B         0

mycpu.1, mycpu.25 and mycpu.5 perfectly demonstrate the restrictions applied.

However, mycpu1 and mycpu2 do not have an additional 100% and 200% of CPU available to them. Therefore those settings cannot be fully used, and the two containers share the remaining CPU time equally.
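
While these containers are still running you can also adjust a limit on the fly with docker update and watch the effect in docker stats. A hedged sketch - it requires a reasonably recent Docker, the new value is arbitrary, and the container must not have exited yet:

docker container update --cpus=.5 mycpu2
docker stats --no-stream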

--cpus Number of CPUs

The --cpus setting defines the number of CPUs a container may use.

For the purposes of Docker and the Linux distros, CPUs are defined as:

CPUs = Threads per core X cores per socket X sockets

These are logical CPUs, not physical CPU chips.

Let's investigate my server to determine its number of CPUs.

lscpu | head -n 10
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel

With the information we do not need removed:

lscpu | head -n 10

CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             1

CPUs = Threads per core X cores per socket X sockets

CPUs = 1 x 2 x 1 = 2 CPUs

Confirm with:

grep -E 'processor|core id' /proc/cpuinfo

2 core id lines = 2 cores per socket
2 processor lines = 2 CPUs

processor       : 0
core id         : 0
processor       : 1
core id         : 1
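
Another quick cross-check, if you want one: nproc prints the number of logical CPUs available to the current process.

nproc

On this server it prints 2.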

OK, this server has 2 CPUs. Your server will probably be different, so keep that in mind when you investigate the tests done below.

As mentioned, the --cpus setting defines the number of CPUs a container may use.

Let's use both CPUs, then just one, then half and finally a quarter of a CPU, and record the runtimes for a CPU-heavy workload.

Note --cpus=2

docker container run --cpus=2 --name mycpu alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=30 | md5sum'

Expected output:

real    0m 3.61s
user    0m 0.00s
sys     0m 3.50s

We have nothing to compare against. Let's run the other tests.

docker container prune -f

Note --cpus=1

docker container run --cpus=1 --name mycpu alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=30 | md5sum'
real    0m 3.54s
user    0m 0.00s
sys     0m 3.37s
docker container prune -f

Note --cpus=.5

docker container run --cpus=.5 --name mycpu alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=30 | md5sum'
real    0m 9.97s
user    0m 0.00s
sys     0m 4.78s
docker container prune -f

Note --cpus=.25

docker container run --rm --cpus=.25 --name mycpu alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=30 | md5sum'
real    0m 19.55s
user    0m 0.00s
sys     0m 4.69s

--cpus=2 realtime: 3.6 sec
--cpus=1 realtime: 3.5 sec
--cpus=.5 realtime: 9.9 sec
--cpus=.25 realtime: 19.5 sec

Our simple benchmark does not effectively use 2 CPUs simultaneously.

Half a CPU runs twice as slowly, and a quarter of a CPU runs 4 times slower.

The --cpus setting works. If the applications inside your containers are unable to multithread or use more than 1 CPU effectively, allocate just one CPU.

--cpu-period and --cpu-quota

These are legacy options. If you use Docker 1.13 or higher, use --cpus instead.

--cpu-period Limit the CPU CFS (Completely Fair Scheduler) period
--cpu-quota Limit the CPU CFS (Completely Fair Scheduler) quota

Our exercises above clearly show how easy it is to use the --cpus setting instead.
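
For reference, --cpus is essentially shorthand for a CFS quota over a period. A sketch of the rough older equivalent of --cpus=.5, assuming the default period of 100000 microseconds:

docker container run --rm --cpu-period=100000 --cpu-quota=50000 --name mycpu alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=30 | md5sum'

The real time should land close to the 9 to 10 seconds we measured for --cpus=.5 earlier.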

--cpuset-cpus CPUs in Which to Allow Execution (0-3, 0,1)

--cpuset-cpus limits the CPUs on which a container may execute, for example 0-3 or 0,1.

Unfortunately my server only has 2 CPUs, and we saw moments ago that using more than 1 CPU has no effect (with THIS SPECIFIC benchmark).

If your server has several CPUs you can run much more interesting combinations of --cpuset-cpus settings. It still will not be useful here, though: THIS SPECIFIC benchmark uses only 1 thread.

Later in this tutorial there are tests using sysbench (an actual benchmark tool), which allows you to specify the number of threads.

Here are my results: no meaningful difference whether I used both CPUs, just CPU 0, or just CPU 1.

docker container run --rm --cpuset-cpus=0,1 --name mycpu alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=30 | md5sum'

Expected output:

real    0m 3.44s
user    0m 0.00s
sys     0m 3.35s
docker container run --rm --cpuset-cpus=0 --name mycpu alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=30 | md5sum'

Expected output:

real    0m 4.15s
user    0m 0.00s
sys     0m 4.00s
docker container run --rm --cpuset-cpus=1 --name mycpu alpine:3.8 /bin/sh -c 'time dd if=/dev/urandom bs=1M count=30 | md5sum'

Expected output:

real    0m 3.40s
user    0m 0.00s
sys     0m 3.28s
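
If your server has 4 or more CPUs, here is a sketch of a more interesting pinning comparison. It assumes CPUs 0-3 exist and uses the multi-threaded sysbench image built in the next section, so come back to it after the build:

docker run -it --rm --cpuset-cpus=0-3 --name mybench centos:bench /bin/sh -c 'time sysbench --threads=4 --events=8 --cpu-max-prime=100500 --verbosity=0 cpu run'
docker run -it --rm --cpuset-cpus=0 --name mybench centos:bench /bin/sh -c 'time sysbench --threads=4 --events=8 --cpu-max-prime=100500 --verbosity=0 cpu run'

With 4 threads spread over 4 CPUs the first run should be several times faster than the second, which is pinned to a single CPU.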

Test Container Limits Using Real Benchmark Applications

All tests done above were done using quick hacks.

To properly test the resource limits of containers we need real Linux bench applications.

I am used to using CentOS, so I will use it as the basis of our bench container. Both bench applications are also available on Debian / Ubuntu. You could easily translate the yum installs to apt-get installs and get identical results.

We need to install 2 bench applications in our container. The best way is to build an image with those applications included.

Therefore create a dockerbench directory:

mkdir dockerbench
cd dockerbench

nano Dockerfile
FROM centos:7

RUN set -x \
    && yum -y install https://www.percona.com/redir/downloads/percona-release/redhat/0.0-1/percona-release-0.0-1.x86_64.rpm \
    && yum -y install sysbench \
    && curl http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm -o epel-release-latest-7.noarch.rpm \
    && rpm -ivh epel-release-latest-7.noarch.rpm \
    && yum -y install stress

The first install adds the Percona yum repo - the home of sysbench.

Yum then installs sysbench.

The curl adds the EPEL yum repo - the home of stress.

Yum then installs stress.

Build our bench image. It will take about a minute, depending on the Internet yum downloads, yum dependency resolution and the usual other activities.

If you do not have CentOS 7 image downloaded already it may take another minute.

docker build --tag centos:bench --file Dockerfile  .

Now we have a CentOS bench image ready for repeated use (with 2 bench tools installed).

docker run -it --rm centos:bench /bin/sh
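
Instead of an interactive shell, you can also run a quick non-interactive smoke test to confirm both tools made it into the image - a sketch:

docker run --rm centos:bench /bin/sh -c 'sysbench --version && stress --version'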

--cpus Tested with the sysbench Tool

Syntax:

sysbench --threads=2 --events=4 --cpu-max-prime=800500 --verbosity=0 cpu run

  • --threads=2 ... run 2 threads so we can compare 2 CPUs versus 1 CPU
  • --events=4 ... do 4 runs
  • --cpu-max-prime=800500 ... calculate prime numbers up to 800500
  • --verbosity=0 ... do not show detailed output
  • cpu run ... run the test named cpu

Via experiments I determined 800500 to be a value that runs the tests quickly enough on my 10-year-old computer (CPUmark 700). I added the 5 in there since a long run of zero digits is difficult to read.
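
If you drop --verbosity=0, sysbench prints its own statistics (events per second, latency, total time) that you can read alongside the time output. A sketch, using a smaller prime limit so it finishes quickly:

docker run -it --rm --name mybench --cpus 2 centos:bench sysbench --threads=2 --events=4 --cpu-max-prime=100500 cpu run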

2 CPUs:

docker run -it --rm --name mybench  --cpus 2 centos:bench /bin/sh

sh-4.2# time sysbench --threads=2 --events=4 --cpu-max-prime=800500 --verbosity=0 cpu run

real    0m1.952s
user    0m3.803s
sys     0m0.029s
sh-4.2#
sh-4.2# exit
exit

Real is wall clock time - the time from start to finish of the sysbench run: 1.9 seconds.

User is the amount of CPU time spent in user-mode code (outside the kernel) within sysbench. 2 CPUs were used: each spent about 1.9 seconds of CPU time, so the total user time is the two CPUs' times added together.

The elapsed wall clock time is 1.9 seconds. Since the 2 CPUs worked simultaneously / concurrently, their summed time is shown as user time.

Sys is the amount of CPU time spent in the kernel doing system calls.

One CPU:

docker run -it --rm --name mybench  --cpus 1 centos:bench /bin/sh

sh-4.2# time sysbench --threads=2 --events=4 --cpu-max-prime=800500 --verbosity=0 cpu run

real    0m4.686s
user    0m4.678s
sys     0m0.026s
sh-4.2#
sh-4.2# exit
exit

A more convenient way to run these comparisons is to run the bench command right on the docker run line.

Let's rerun one CPU this way:

docker run -it --rm --name mybench  --cpus 1 centos:bench /bin/sh -c 'time sysbench --threads=2 --events=4 --cpu-max-prime=800500 --verbosity=0 cpu run'

real    0m4.659s
user    0m4.649s
sys     0m0.028s

Let's run half a CPU this way:

docker run -it --rm --name mybench  --cpus .5 centos:bench /bin/sh -c 'time sysbench --threads=2 --events=4 --cpu-max-prime=800500 --verbosity=0 cpu run'

real    0m10.506s
user    0m5.221s
sys     0m0.035s

Results make perfect sense:

  • 2 CPUs : real 0m1.952s
  • 1 CPU : real 0m4.659s
  • .5 CPU : real 0m10.506s

With sysbench in our image, such tests are very easy and quick. In mere seconds you gain experience limiting Docker containers' CPU usage.

Quite frankly, waiting 10.506s for the .5 CPU test is too long - especially on a busy server with many cores.

If you did this on a development server at work, the CPU load could change drastically over the course of a minute. Developers could be compiling during the 2-second 2-CPU run while the server is CPU-quiet during the longer 5-second 1-CPU run, totally skewing our numbers.

We need an approach that is somewhat robust against such changing circumstances. Every test must run as quickly as possible, and directly one after the other.

That sounds promising, so let's try it. Reduce the max prime number 100-fold.

Cut and paste all 3 of these instructions in one go and observe the results:

docker run -it --rm --name mybench  --cpus 2 centos:bench /bin/sh -c 'time sysbench --threads=2 --events=4 --cpu-max-prime=8005 --verbosity=0 cpu run'
docker run -it --rm --name mybench  --cpus .5 centos:bench /bin/sh -c 'time sysbench --threads=2 --events=4 --cpu-max-prime=8005 --verbosity=0 cpu run'
docker run -it --rm --name mybench  --cpus 1 centos:bench /bin/sh -c 'time sysbench --threads=2 --events=4 --cpu-max-prime=8005 --verbosity=0 cpu run'

Expected output:

2 CPUs
real    0m0.049s
user    0m0.016s
sys     0m0.021s

1 CPU
real    0m0.049s
user    0m0.019s
sys     0m0.020s

.5 CPU 
real    0m0.051s
user    0m0.020s
sys     0m0.019s

Benchmark startup overhead overwhelms the wall-clock real times. These tests are hopelessly too short.

After 3 private experiments, decreasing the original workload roughly 10-fold seems perfect.

docker run -it --rm --name mybench  --cpus 2 centos:bench /bin/sh -c 'time sysbench --threads=2 --events=4 --cpu-max-prime=100500 --verbosity=0 cpu run'
docker run -it --rm --name mybench  --cpus 1 centos:bench /bin/sh -c 'time sysbench --threads=2 --events=4 --cpu-max-prime=100500 --verbosity=0 cpu run'
docker run -it --rm --name mybench  --cpus .5 centos:bench /bin/sh -c 'time sysbench --threads=2 --events=4 --cpu-max-prime=100500 --verbosity=0 cpu run'

(The 5 in there is just to make the long string of zeros more readable.)

2 CPUs
real    0m0.152s
user    0m0.225s
sys     0m0.015s

1 CPU
real    0m0.277s
user    0m0.279s
sys     0m0.019s

.5 CPU
real    0m0.615s
user    0m0.290s
sys     0m0.024s

The ratios look perfect. Overall runtime is less than a second, which minimizes the effect of changing CPU load on the development server on our test timings.

Spend a few minutes playing on your server to get an understanding of what is explained here.

Note I used --rm on the run command. This auto-removes the container after it finishes the command handed to it via /bin/sh.
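
When you are done experimenting, a short cleanup sketch - keep the centos:bench image if you plan to reuse it for further benchmarking:

docker container prune -f
docker image rm centos:bench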
