
How to Run Google TensorFlow on Alibaba Cloud

This tutorial describes how to run TensorFlow on Alibaba Cloud using Docker containers for machine learning applications.

By Nikesh Gogia, Solution Architect

This tutorial is intended for any organization that wants to host TensorFlow on Alibaba Cloud using Docker containers. Solution Architects or Business Development teams can use it to build a proof of concept (POC) for a customer requirement, which can then gradually be turned into production-grade hosting.

We will run our Docker containers on an Alibaba Cloud Elastic Compute Service (ECS) virtual machine running Ubuntu 16.04 64-bit, with 4 CPU cores and 8 GB of RAM.

Hosting TensorFlow using Docker on Alibaba Cloud

TensorFlow is an open source library for numerical computation, specializing in machine learning applications. In this tutorial, you will learn how to install and run TensorFlow on a single machine, and you will train a simple classifier that tells whether the light in an image is switched on or off.

What Are We Going to Be Building?

In this lab, we will use transfer learning, which means we start with a model that has already been trained on another problem and then retrain it on a similar problem. Deep learning from scratch can take days, but transfer learning can be done in much less time.
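To make the idea concrete, here is a minimal sketch in plain Python of the part that transfer learning actually trains: a new final layer on top of feature vectors that a pretrained network has already computed. The two-number feature vectors and the helper names (train_final_layer, predict_on) are hypothetical stand-ins; the real retraining script works on much longer feature vectors produced by the pretrained model, and its exact training procedure may differ. Binary logistic regression is used here only because it is the simplest possible final layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_final_layer(features, labels, learning_rate=0.5, steps=500):
    """Fit only a logistic-regression output layer on fixed (frozen) features.

    The feature extractor is never updated; only the weights `w` and bias `b`
    of the new final layer are learned, which is what makes retraining fast.
    """
    n_features = len(features[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(features)
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(n_features):
                grad_w[i] += error * x[i]
            grad_b += error
        # Plain batch gradient descent on the average log-loss.
        w = [wi - learning_rate * g / m for wi, g in zip(w, grad_w)]
        b -= learning_rate * grad_b / m
    return w, b


def predict_on(w, b, x):
    """Probability that the image with feature vector `x` shows a light that is on."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical pre-computed features; label 1 = light on, 0 = light off.
FEATURES = [[0.9, 0.8], [0.8, 0.95], [0.1, 0.2], [0.15, 0.05]]
LABELS = [1, 1, 0, 0]

if __name__ == "__main__":
    w, b = train_final_layer(FEATURES, LABELS)
    print("P(light on) = {:.3f}".format(predict_on(w, b, [0.85, 0.9])))
```

Only len(w) + 1 numbers are learned here, which is why this kind of retraining finishes quickly compared with training every layer.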

In this lab, we will train a model to recognize whether a light is on or off, based on a demo I released on YouTube. We will start from an already trained model and retrain it to tell apart a small number of classes using our own example images.

What you will learn:

  1. How to use Python and TensorFlow to train an image classifier
  2. How to classify images with your trained classifier

What you need:

  1. A basic understanding of Linux commands
  2. An Alibaba Cloud account

1. Install Docker on Ubuntu 16.04

The Docker installation package available in the official Ubuntu 16.04 repository may not be the latest version. To get the latest and greatest version, install Docker from the official Docker repository. This section shows you how to do just that.

First, add the GPG key for the official Docker repository to the system:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

Add the Docker repository to APT sources:

sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
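The $(lsb_release -cs) part is shell command substitution: lsb_release -c prints the release codename and -s prints it in short form (xenial on Ubuntu 16.04). The space between lsb_release and -cs is required; without it the shell looks for a command named lsb_release-cs, which does not exist. As a small illustration, this Python sketch builds the same repository line. The function names are hypothetical, and current_codename assumes lsb_release is installed, as it is on standard Ubuntu images.

```python
import subprocess


def docker_repo_line(codename, arch="amd64"):
    """Build the APT source line that add-apt-repository receives above."""
    return "deb [arch={}] https://download.docker.com/linux/ubuntu {} stable".format(
        arch, codename)


def current_codename():
    """Return what $(lsb_release -cs) expands to, e.g. 'xenial' on Ubuntu 16.04."""
    # "-cs" is passed as a separate argument; "lsb_release-cs" is not a command.
    output = subprocess.check_output(["lsb_release", "-cs"], universal_newlines=True)
    return output.strip()
```

On the VM itself, docker_repo_line(current_codename()) returns the line for the running release.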

Next, update the package database with the Docker packages from the newly added repo:

sudo apt-get update

Make sure you are about to install from the Docker repo instead of the default Ubuntu 16.04 repo:

apt-cache policy docker-ce

You should see output similar to the following:

Output of apt-cache policy docker-ce
docker-ce:
  Installed: (none)
  Candidate: 17.03.1~ce-0~ubuntu-xenial
  Version table:
     17.03.1~ce-0~ubuntu-xenial 500
        500 https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages
     17.03.0~ce-0~ubuntu-xenial 500
        500 https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages

Notice that docker-ce is not installed, but the candidate for installation is from the Docker repository for Ubuntu 16.04. The docker-ce version number might be different.
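If you prefer to script this check, the following Python sketch parses apt-cache policy output and reports whether the candidate version is served from download.docker.com. It assumes the layout shown in the sample output above (an Installed line, a Candidate line, then a version table in which each version line is followed by its source lines). The helper names are hypothetical.

```python
DOCKER_REPO = "https://download.docker.com/linux/ubuntu"


def parse_policy(text):
    """Return (installed, candidate, sources) from `apt-cache policy` output.

    `sources` maps each version in the version table to the list of
    repositories (or the dpkg status file) that provide it.
    """
    installed = candidate = None
    sources = {}
    current = None
    in_table = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Installed:"):
            installed = line.split(":", 1)[1].strip()
        elif line.startswith("Candidate:"):
            candidate = line.split(":", 1)[1].strip()
        elif line.startswith("Version table:"):
            in_table = True
        elif in_table and line:
            parts = line.lstrip("*").split()
            if parts[0].isdigit() and len(parts) > 1:
                # "500 https://... xenial/stable amd64 Packages": a source line.
                sources.setdefault(current, []).append(parts[1])
            else:
                # "17.03.1~ce-0~ubuntu-xenial 500": a version line.
                current = parts[0]
                sources.setdefault(current, [])
    return installed, candidate, sources


def candidate_from_docker_repo(text):
    """True if the candidate version is offered by the Docker repository."""
    _, candidate, sources = parse_policy(text)
    return any(url.startswith(DOCKER_REPO) for url in sources.get(candidate, []))
```

On the VM, you could feed it the real output with subprocess.check_output(["apt-cache", "policy", "docker-ce"], universal_newlines=True).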

Finally, install Docker:

sudo apt-get install -y docker-ce

Docker should now be installed, the daemon started, and the process enabled to start on boot. Check that it's running:

sudo systemctl status docker

The output should be similar to the following, showing that the service is active and running:

Output

docker.service - Docker Application Container Engine
Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2016-05-01 06:53:52 CDT; 1 weeks 3 days ago
Docs: https://docs.docker.com
Main PID: 749 (docker)
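For scripted setups, the same check can be made from Python. This is a minimal sketch: parse_active_state pulls the state out of systemctl status text like the output above, and docker_is_active relies on systemctl is-active, which exits with status 0 only when the unit is active. Both helper names are hypothetical.

```python
import subprocess


def parse_active_state(status_text):
    """Extract e.g. 'active (running)' from `systemctl status` output."""
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("Active:"):
            # Drop the "since <timestamp>" tail that follows the state.
            return line[len("Active:"):].strip().split(" since ")[0]
    return None


def docker_is_active():
    """True if systemd reports the docker unit as active."""
    result = subprocess.run(["systemctl", "is-active", "docker"],
                            stdout=subprocess.PIPE, universal_newlines=True)
    return result.returncode == 0
```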

2. Set Up the TensorFlow Docker Image on Alibaba Cloud

To save you from setting environment variables and installing TensorFlow by hand, I have created a Docker image with TensorFlow already set up, which you can pull quickly.

On the Ubuntu ECS instance described above, once Docker is installed, run the following commands:

$ docker pull nikeshgogia/tensorflow:1.0
$ docker images

After the pull completes, running docker images should show output similar to the following:

root@iZt4neefbpoojkuy4fdvqzZ:~# docker images
REPOSITORY               TAG   IMAGE ID       CREATED          SIZE
nikeshgogia/tensorflow   1.0   ed0ee8133d06   26 minutes ago   1.62GB

Congratulations! You have successfully pulled the TensorFlow Docker image onto your Alibaba Cloud ECS instance.
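If you want to verify the pull from a script instead of reading the table, docker images --format can print one repository:tag per line, and that is easier to parse than the column layout. This Python sketch assumes that approach; the helper names are hypothetical.

```python
import subprocess

IMAGE = "nikeshgogia/tensorflow:1.0"


def image_present(format_output, reference=IMAGE):
    """True if `reference` appears as a line of `docker images --format` output."""
    return reference in {line.strip() for line in format_output.splitlines()}


def list_local_images():
    """Return one "repository:tag" per line for every local image."""
    return subprocess.check_output(
        ["docker", "images", "--format", "{{.Repository}}:{{.Tag}}"],
        universal_newlines=True)
```

On the VM, image_present(list_local_images()) returns True once the pull has finished.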

3. Starting and Entering TensorFlow Container

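Once the image has been pulled, execute the following steps.

First, create the tf_files directory on the host by running mkdir ${HOME}/tf_files. This is the folder that the command below mounts into the container as /tf_files. Then execute the following command: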

docker run -it --publish 6006:6006 -p 80:5000 \
  --volume ${HOME}/tf_files:/tf_files \
  --workdir /tf_files \
  nikeshgogia/tensorflow:1.0 bash

You should now be at the TensorFlow container's prompt, as shown below:

root@17e62932b5b5:/tf_files#
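To spell out what each option of the docker run command does, here is the same command assembled as an argument list in Python. -p is the short form of --publish, which maps HOST:CONTAINER ports. --volume shares the host folder ${HOME}/tf_files with the container as /tf_files, and --workdir sets the directory the shell starts in. Port 6006 is TensorBoard's default port. The tutorial does not say which service listens on container port 5000. The function name docker_run_args is hypothetical.

```python
import os


def docker_run_args(image="nikeshgogia/tensorflow:1.0", host_dir=None):
    """Build the argument list for the interactive `docker run` command above."""
    if host_dir is None:
        # Same as ${HOME}/tf_files in the shell command.
        host_dir = os.path.join(os.path.expanduser("~"), "tf_files")
    return [
        "docker", "run", "-it",
        "--publish", "6006:6006",  # host port 6006 -> container port 6006
        "--publish", "80:5000",    # host port 80 -> container port 5000 (same as -p 80:5000)
        "--volume", "{}:/tf_files".format(host_dir),  # host folder -> /tf_files
        "--workdir", "/tf_files",  # the shell starts in /tf_files
        image,
        "bash",                    # command run inside the container
    ]
```

Passing this list to subprocess.call on the VM would start the same interactive container. Because a list bypasses the shell, paths that contain spaces need no extra quoting.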

4. Training and Testing Your Model

Follow the steps below to train your lights model.

Run cd .. to move out of the tf_files folder to the container's root directory (/).

Copy the contents of the btf_files folder into tf_files by running cp -a btf_files/. tf_files/

Now change into tf_files and list its files:

root@17e62932b5b5:/tf_files# ls -l
total 16
drwxr-xr-x 4 root root 4096 Nov 16 03:37 lights
drwxr-xr-x 2 root root 4096 Nov 16 03:37 testimages
drwxr-xr-x 6 root root 4096 Nov 16 03:37 tf

The lights folder already contains sample images of lights switched on and off, which we use as training data. The testimages folder contains sample images that we will use to test the model after training. The tf folder contains the TensorFlow scripts.
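The retraining script takes its class labels from the names of the subfolders of --image_dir (here /tf_files/lights), one subfolder per class. The listing above does not show those subfolder names, so the light_on and Light-OFF names used below are hypothetical. The sketch assumes the bundled script follows the upstream TensorFlow retrain.py convention: folder names are lowercased, and each run of non-alphanumeric characters becomes a single space. That convention would produce labels such as light on, like the ones in the test output later in this tutorial. The helper names are also hypothetical.

```python
import os
import re

IMAGE_EXTENSIONS = (".jpg", ".jpeg")


def label_for_dir(dir_name):
    """Turn a class folder name such as 'Light_On' into the label 'light on'."""
    return re.sub(r"[^a-z0-9]+", " ", dir_name.lower())


def count_images_per_label(image_dir):
    """Map each class label under `image_dir` to its number of JPEG images."""
    counts = {}
    for entry in sorted(os.listdir(image_dir)):
        class_dir = os.path.join(image_dir, entry)
        if not os.path.isdir(class_dir):
            continue  # loose files at the top level do not form a class
        images = [name for name in os.listdir(class_dir)
                  if name.lower().endswith(IMAGE_EXTENSIONS)]
        counts[label_for_dir(entry)] = len(images)
    return counts
```

Inside the container, count_images_per_label("/tf_files/lights") would show how many training images each class has.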

Create the /trained_files directory by running mkdir /trained_files (make sure it is created at the root path, /).

To train your model, first set the following shell variables:

IMAGE_SIZE=224
ARCHITECTURE="mobilenet_0.50_${IMAGE_SIZE}"
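The ARCHITECTURE string combines two settings in one name. The command-line options used here match the upstream TensorFlow for Poets retrain.py. In that script, mobilenet_<width>_<size> selects a MobileNet with width multiplier <width> (1.0, 0.75, 0.50, or 0.25; smaller is faster but less accurate) and an input image size of <size> pixels (128, 160, 192, or 224). Assuming the bundled script uses the same convention, this Python sketch shows how such a name is built and validated. The function names are hypothetical.

```python
VALID_WIDTHS = ("1.0", "0.75", "0.50", "0.25")
VALID_SIZES = (128, 160, 192, 224)


def mobilenet_architecture(width="0.50", image_size=224):
    """Equivalent of ARCHITECTURE="mobilenet_0.50_${IMAGE_SIZE}" for other widths too."""
    return "mobilenet_{}_{}".format(width, image_size)


def parse_architecture(name):
    """Split a name like 'mobilenet_0.50_224' into its settings, or raise ValueError."""
    parts = name.split("_")
    if len(parts) != 3 or parts[0] != "mobilenet":
        raise ValueError("not a MobileNet architecture name: " + name)
    width, size = parts[1], int(parts[2])  # int() raises ValueError on bad sizes
    if width not in VALID_WIDTHS or size not in VALID_SIZES:
        raise ValueError("unsupported width or size in: " + name)
    return {"width": width, "image_size": size}
```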

After setting these variables, change into the scripts folder with cd /tf_files/tf and then run the following command:

python -m scripts.retrain \
--bottleneck_dir=/trained_files/bottlenecks \
--how_many_training_steps=500 \
--model_dir=/trained_files/models/ \
--summaries_dir=/trained_files/training_summaries/"${ARCHITECTURE}" \
--output_graph=/trained_files/retrained_graph.pb \
--output_labels=/trained_files/retrained_labels.txt \
--architecture="${ARCHITECTURE}" \
--image_dir=/tf_files/lights
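The --bottleneck_dir option points to a cache. Because the pretrained layers are frozen, the script can run each training image through them once, save the resulting feature vector (the "bottleneck"), and reuse it during the training steps instead of recomputing it. This Python sketch shows that caching pattern. The file layout and helper names are assumptions for illustration, not the script's actual format, and compute_features stands in for a forward pass through the pretrained network.

```python
import os


def bottleneck_path(bottleneck_dir, label, image_name, architecture):
    """Hypothetical cache location: one text file per (image, architecture)."""
    return os.path.join(bottleneck_dir, label,
                        "{}_{}.txt".format(image_name, architecture))


def get_or_create_bottleneck(path, compute_features):
    """Load cached features from `path`, or compute and cache them on first use."""
    if os.path.exists(path):
        with open(path) as f:
            return [float(value) for value in f.read().split(",")]
    values = compute_features()  # expensive: a forward pass through the frozen model
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(",".join(str(value) for value in values))
    return values
```

In the sketch, a second call with the same path reads the file instead of calling compute_features again, so only the first pass over the images pays for the forward passes.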

Once the model is trained, test it with a sample image:

python -m scripts.label_image \
--graph=/trained_files/retrained_graph.pb \
--image=/tf_files/testimages/l1.jpeg

You should see output similar to the following:

light on 0.999999
light off 1.37755e-06

This indicates that the image you provided shows a light that is switched on. You can test more images by copying them into the testimages folder and passing each one with --image. Make sure you run the Python scripts from inside the tf folder.
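The two scores behave like softmax probabilities, so they add up to 1 apart from rounding in the printed values, and the label with the highest score is the prediction. This Python sketch shows how raw output scores are turned into a ranked list like the one above. The logit values in the example call are made up for illustration, and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ranked_labels(labels, logits):
    """Return (label, probability) pairs, most likely first."""
    return sorted(zip(labels, softmax(logits)), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up scores for the two labels in retrained_labels.txt.
    for label, score in ranked_labels(["light off", "light on"], [-6.8, 6.8]):
        print(label, score)
```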

Conclusion

In this tutorial, we have successfully run TensorFlow on Alibaba Cloud using Docker containers for a sample machine learning application.

TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API.

TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well.
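To illustrate the data flow graph idea described above, here is a toy evaluator in plain Python. Each node holds an operation and its input nodes, and values flow along the edges when the graph is evaluated. Small nested lists stand in for tensors. This is not the TensorFlow API: the Node and Placeholder classes and the matvec, add, and relu operations are hypothetical and only mirror the concept.

```python
class Node:
    """A graph node: an operation applied to the outputs of its input nodes."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = inputs


class Placeholder:
    """A graph input whose value is supplied at evaluation time."""

    def __init__(self, name):
        self.name = name


def evaluate(node, feed):
    """Recursively compute `node`, pulling input values along the graph edges."""
    if isinstance(node, Placeholder):
        return feed[node.name]
    return node.op(*(evaluate(parent, feed) for parent in node.inputs))


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def relu(a):
    return [max(0, x) for x in a]


# y = relu(W x + b), built as a graph before any numbers are supplied.
W, x, b = Placeholder("W"), Placeholder("x"), Placeholder("b")
y = Node(relu, Node(add, Node(matvec, W, x), b))

if __name__ == "__main__":
    print(evaluate(y, {"W": [[1, 2], [3, 4]], "x": [1, 1], "b": [-4, 0]}))
```

Building the graph first and feeding values later is what lets a real framework decide where each operation runs (CPU, GPU, or mobile device) without changing the model definition.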


Comments

Raja_KT March 7, 2019 at 6:33 am

Good one. Will the images be compatible for both NVIDia and AMD? I got an error while executing..."sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release-cs) stable"....lsb_release-cs not found...