TensorFlow.js Helps Recognize Large Quantities of Icons in Milliseconds!

By Tianke

Background

During frontend development, we often need to reproduce the icons that appear in design mockups. In most cases, these icons come without a corresponding type field (the icon's name in the library), so developers have to pick out the right one from hundreds of icons by eye, which makes for a poor user experience.

Therefore, last year I submitted a pull request to the Ant Design open-source project. It adds a new feature based on deep learning: searching for icons with screenshots. Users upload an icon screenshot, taken from a design mockup or any other image, by clicking, dragging, or pasting, and get back the best-matching icons along with their matching scores. All recognition runs entirely in the frontend!

The following figure shows the effect:

You can also try it yourself on the official website: https://ant.design/components/icon/

How is this implemented? This article walks through the following steps:

Introduction to Deep Learning
Sample Generation
Model Training
Model Conversion and Compression
TensorFlow.js Recognition

Introduction to Deep Learning

As described in the preceding section, this feature is implemented based on deep learning. What is deep learning? Deep learning is a type of machine learning. Machine learning is the study of computer algorithms that automatically improve based on "experience."

The keyword here is experience. Humans have long addressed problems based on their experience. For example, as early as medieval times, someone estimated the average foot length of all men by measuring the average foot length of 16 men.

Here is another example. If you are given a lot of height and weight data as well as the height of a single person, can you estimate the person's weight?

Of course you can! By calculating the values of a and b in the formula y = ax + b shown in the preceding figure, you can estimate a person's weight from his or her height. In machine learning, a is called the weight and b is called the bias, and this particular technique is called linear regression.
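As a minimal sketch of how a and b can be calculated, the JavaScript below fits the line with ordinary least squares. The function name `fitLine` and the height and weight numbers are made up for illustration; they do not come from the article.

```javascript
// Fit y = a * x + b by ordinary least squares.
// xs: heights, ys: weights (any units, as long as they are consistent).
function fitLine(xs, ys) {
  const n = xs.length;
  if (n !== ys.length || n < 2) {
    throw new Error('Need at least two (x, y) pairs of equal length');
  }
  const meanX = xs.reduce((s, x) => s + x, 0) / n;
  const meanY = ys.reduce((s, y) => s + y, 0) / n;
  let cov = 0;  // sum of (x - meanX) * (y - meanY)
  let varX = 0; // sum of (x - meanX)^2
  for (let i = 0; i < n; i++) {
    cov += (xs[i] - meanX) * (ys[i] - meanY);
    varX += (xs[i] - meanX) ** 2;
  }
  if (varX === 0) throw new Error('All x values are identical');
  const a = cov / varX;         // weight (slope)
  const b = meanY - a * meanX;  // bias (intercept)
  return { a, b };
}

// Hypothetical data: height in cm, weight in kg.
const heights = [160, 165, 170, 175, 180];
const weights = [55, 59, 63, 67, 71];
const { a, b } = fitLine(heights, weights); // a ≈ 0.8, b ≈ -73
const predicted = a * 172 + b;              // ≈ 64.6 kg for a 172 cm person
```

Once a and b are known, every new estimate is a single multiplication and addition; "learning" here only means choosing a and b so that the line stays as close as possible to the known data points.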

A computer can learn numerical patterns. If we convert images, speech, or text into numbers, can a computer recognize their patterns too? Of course it can! However, the models involved are much more complicated.

Image Classification

Voice Assistant

We use a deep learning model called a convolutional neural network (CNN) to classify icon screenshots.

Whether we are talking about simple linear regression or complicated deep learning, the model learns from "experience." The "experience" here is called "samples" in machine learning. Therefore, we must first generate samples for machine learning.

Sample Generation

In this icon classification task, each sample consists of two parts:

Images
Labels of Images

A label is the category name of an image. For example, if you want to identify whether an image shows a cat or a dog, then "cat" and "dog" are the labels.
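Because a model only works with numbers, labels are usually mapped to integer indices and often one-hot encoded. The sketch below is illustrative only: the two-entry `LABELS` list (reusing the cat/dog example) and the `oneHot` helper are hypothetical, and the article does not say how its training script encodes labels internally.

```javascript
// Hypothetical label set; the real task uses the names of 300+ Ant Design icons.
const LABELS = ['cat', 'dog'];

// Map each label name to a fixed integer index (its position in LABELS).
const labelToIndex = new Map(LABELS.map((name, i) => [name, i]));

// Encode a label as a one-hot vector: 1 at the label's index, 0 elsewhere.
function oneHot(label) {
  const index = labelToIndex.get(label);
  if (index === undefined) throw new Error(`Unknown label: ${label}`);
  const vector = new Array(LABELS.length).fill(0);
  vector[index] = 1;
  return vector;
}

// A sample pairs an image (here just a file name) with its label.
const sample = { image: 'cat_001.png', label: 'cat' };
const target = oneHot(sample.label); // [1, 0]
```

The order of `LABELS` matters: the same order must be used when reading the model's output, so that output position i is interpreted as label i.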

In general, the more samples a deep learning model learns from, the better it performs. Therefore, we built a dedicated sample page and used Puppeteer together with FaaS (Function as a Service) to quickly generate tens of thousands of icon images and their corresponding labels. How does this work?

Write a Sample Page: We create a new frontend page that renders a single Ant Design icon. The icon can be any one of more than 300 Ant Design icons, and its size, color, and position are all randomized on each render.
Use Puppeteer to Take Screenshots Repeatedly: We use Puppeteer (a headless browser) to open the sample page and automatically repeat the refresh-and-screenshot cycle, generating tens of thousands of images.
Implement FaaS Concurrency: Generating tens of thousands of images on a single PC is too slow, so we want to take screenshots on 100 machines at once. We use FaaS to create 100 instances that take screenshots concurrently; in our measurements, they generated 20,000 images per minute.
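The randomized sample page and the FaaS fan-out can be sketched as two small functions. This is a hypothetical illustration, not the project's code: `ICON_TYPES` holds only four icon names, and the size, color, and position ranges in `randomSampleSpec` are assumptions. In the real setup, the page renders the icon in a browser and Puppeteer captures it; here only the random choices and the per-instance workload are computed.

```javascript
// Hypothetical subset of icon type names; the real page picks from 300+ icons.
const ICON_TYPES = ['home', 'setting', 'search', 'user'];

// Pick a random integer in [min, max] (inclusive) using rng() in [0, 1).
function randInt(rng, min, max) {
  return min + Math.floor(rng() * (max - min + 1));
}

// Describe one randomly rendered icon; `type` doubles as the sample's label.
// The size, color, and position ranges are illustrative assumptions.
function randomSampleSpec(rng = Math.random, canvasSize = 256) {
  const type = ICON_TYPES[randInt(rng, 0, ICON_TYPES.length - 1)];
  const fontSize = randInt(rng, 16, 128); // icon size in px
  const color = '#' + [0, 0, 0]
    .map(() => randInt(rng, 0, 255).toString(16).padStart(2, '0'))
    .join('');
  // Keep the whole icon inside the canvas.
  const left = randInt(rng, 0, canvasSize - fontSize);
  const top = randInt(rng, 0, canvasSize - fontSize);
  return { type, fontSize, color, left, top };
}

// Split `total` screenshots across `instances` workers as evenly as possible.
function shardCounts(total, instances) {
  const base = Math.floor(total / instances);
  const extra = total % instances;
  return Array.from({ length: instances }, (_, i) => base + (i < extra ? 1 : 0));
}

const perInstance = shardCounts(20000, 100);
```

For instance, 20,000 images per minute across 100 instances works out to 200 screenshots per instance per minute. Passing `rng` as a parameter keeps the generator testable with a fixed sequence of values.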

This gives us a large number of samples.

Model Training

Once the samples are ready, you can train the model. We use the TensorFlow framework, which provides a downloadable image classification (retraining) example. When you run it, set its parameters to point at the samples generated above: https://github.com/tensorflow/hub/tree/master/examples/image_retraining
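The linked retraining example expects one subfolder per label under its image directory, with the folder name used as the class name, and it splits the images into training, validation, and testing sets itself. The Node.js sketch below only illustrates those two ideas: the `samples` root folder is hypothetical, and `assignSplit` shows one common way to make a split deterministic by hashing each file name, so that regenerating the folder does not move images between sets.

```javascript
const crypto = require('crypto');

// Build the path where a sample should be stored: <root>/<label>/<file>.
// The root folder name "samples" is a hypothetical choice.
function samplePath(label, fileName, root = 'samples') {
  return `${root}/${label}/${fileName}`;
}

// Deterministically assign a file to a data split by hashing its name,
// so the same file always lands in the same set.
function assignSplit(fileName, validationPct = 10, testPct = 10) {
  const digest = crypto.createHash('sha1').update(fileName).digest();
  const bucket = digest.readUInt32BE(0) % 100; // 0..99
  if (bucket < validationPct) return 'validation';
  if (bucket < validationPct + testPct) return 'testing';
  return 'training';
}

const path = samplePath('home', 'home_0001.png'); // 'samples/home/home_0001.png'
```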

You can train the model on your own PC. Training is slow, but it can finish during a lunch break!

Alibaba Cloud Machine Learning Platform for AI (PAI) provides ready-made image classification algorithms as well as GPUs for accelerated training. I did not use PAI's built-in image classification algorithms; instead, I deployed the TensorFlow code to PAI to train the model. It's remarkably fast!

Model Conversion and Compression

Once trained, the model can recognize images. However, using it as-is would require deploying the Python code on a server, which has the following disadvantages:

Server Costs: Deploying a model requires servers. Ant Design is an open-source project, so we are unwilling to take on costs that grow linearly with usage.
Recognition Speed: Servers are centralized, so users far from them, for example outside China, would experience slower recognition.
Stability: Hundreds of thousands of developers use Ant Design. If the server has a problem, a large number of users may be affected.
Security: Ant Design is a static public website without any authentication or authorization, so an open API would inevitably pose security risks.

Given these disadvantages, we convert our model into a TensorFlow.js model and let users download it to their browsers for recognition. This brings the following benefits:

Edge Computing: Users' computers generally have a GPU, which TensorFlow.js can use from the browser through WebGL. Once the model is downloaded to each user's browser, recognition runs on the users' own hardware, which eliminates server costs as well as worries about server attacks and stability.
Fast Recognition: Because the model runs in the user's browser, recognition is almost real-time and involves no network round trip once the model has been downloaded.

Both model conversion and compression are done with tfjs-converter: https://github.com/tensorflow/tfjs/tree/master/tfjs-converter

We use MobileNet as the base model for transfer learning. The trained model is 16 MB; after compression it is about 3 MB, and it is published on jsDelivr, a CDN, for fast delivery worldwide.
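A common way to shrink a model is weight quantization, which stores each 32-bit float weight in fewer bits; tfjs-converter can quantize weights during conversion. The article does not state which settings were used, so the sketch below only illustrates affine 8-bit quantization, with hypothetical helper names `quantizeUint8` and `dequantizeUint8`: each weight tensor keeps its minimum and a scale, and each weight becomes one byte.

```javascript
// Affine (min/max) quantization of float weights to 8-bit integers.
function quantizeUint8(weights) {
  let min = Infinity;
  let max = -Infinity;
  for (const w of weights) {
    if (w < min) min = w;
    if (w > max) max = w;
  }
  // Avoid division by zero when all weights are equal.
  const scale = max > min ? (max - min) / 255 : 1;
  const q = new Uint8Array(weights.length);
  for (let i = 0; i < weights.length; i++) {
    q[i] = Math.round((weights[i] - min) / scale); // always within 0..255
  }
  return { q, min, scale };
}

// Recover approximate float values: w ≈ q * scale + min.
function dequantizeUint8({ q, min, scale }) {
  return Float32Array.from(q, (v) => v * scale + min);
}

const weights = Float32Array.from([-0.5, -0.1, 0, 0.25, 0.5]);
const packed = quantizeUint8(weights);
const restored = dequantizeUint8(packed);
// Storage: 4 bytes per float32 weight vs. 1 byte per uint8 weight.
const ratio = weights.byteLength / packed.q.byteLength; // 4
```

The trade-off is precision: each restored weight can differ from the original by up to half of `scale`, which image classifiers usually tolerate well.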

TensorFlow.js Recognition

Now, you need to write some TensorFlow.js code. First, load the model file. The code snippet is shown below:

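import * as tfconv from '@tensorflow/tfjs-converter';
const MODEL_PATH = 'https://cdn.jsdelivr.net/gh/lewis617/antd-icon-classifier@0.0.1/model/model.json';
// Top-level await: run inside an ES module or an async function
const model = await tfconv.loadGraphModel(MODEL_PATH);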

Next, convert icon screenshots into tensors.

A tensor is a type of data structure that is similar to a multidimensional array. In TensorFlow, the inputs and outputs of a model are tensors. Therefore, data must be converted into tensors before training and recognition.

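import * as tf from '@tensorflow/tfjs-core';
// With tfjs-core 2.x or later, also import a backend such as '@tensorflow/tfjs-backend-webgl'.
// imgEl is the <img> (or <canvas>) element holding the icon screenshot.
// IMAGE_SIZE is the model's input width/height, defined elsewhere in the project.

// Convert the image into a float32 tensor of shape [height, width, 3]
const img = tf.cast(tf.browser.fromPixels(imgEl), 'float32');

const offset = tf.scalar(127.5);
// Normalize pixel values from [0, 255] to [-1, 1]
const normalized = img.sub(offset).div(offset);

// Resize the image to the size the model expects
let resized = normalized;
if (img.shape[0] !== IMAGE_SIZE || img.shape[1] !== IMAGE_SIZE) {
  const alignCorners = true;
  resized = tf.image.resizeBilinear(
    normalized, [IMAGE_SIZE, IMAGE_SIZE], alignCorners,
  );
}

// Add a batch dimension to match the model input: [1, IMAGE_SIZE, IMAGE_SIZE, 3]
const batched = resized.reshape([-1, IMAGE_SIZE, IMAGE_SIZE, 3]);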
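To make the numeric effect of this preprocessing concrete, here is a plain-JavaScript sketch with no TensorFlow.js dependency. It assumes RGBA input in the layout of a canvas `ImageData` buffer, drops the alpha channel (as `tf.browser.fromPixels` does with its default of three channels), applies the same (x - 127.5) / 127.5 normalization, and returns the data in the [1, height, width, 3] (NHWC) layout that `reshape` produces. Resizing is omitted, and `toNormalizedNHWC` and `nhwcIndex` are hypothetical helpers.

```javascript
// Convert RGBA pixel data into a flat Float32Array in NHWC order with
// shape [1, height, width, 3], normalizing each channel from [0, 255] to [-1, 1].
function toNormalizedNHWC(rgba, width, height) {
  if (rgba.length !== width * height * 4) {
    throw new Error('Expected width * height * 4 RGBA values');
  }
  const out = new Float32Array(width * height * 3);
  for (let p = 0; p < width * height; p++) {
    for (let c = 0; c < 3; c++) { // keep R, G, B; skip alpha
      out[p * 3 + c] = (rgba[p * 4 + c] - 127.5) / 127.5;
    }
  }
  return { data: out, shape: [1, height, width, 3] };
}

// Flat index of pixel (y, x), channel c, for batch item 0 of an NHWC tensor.
function nhwcIndex(y, x, c, width) {
  return (y * width + x) * 3 + c;
}

// A 2x1 image: one white pixel followed by one black pixel.
const rgba = Uint8ClampedArray.from([255, 255, 255, 255, 0, 0, 0, 255]);
const { data, shape } = toNormalizedNHWC(rgba, 2, 1);
// data → [1, 1, 1, -1, -1, -1], shape → [1, 1, 2, 3]
```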

Then, run recognition on the image. The code snippet is shown below:

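// predict() returns per-class scores with a batch dimension of 1;
// squeeze() removes it, and arraySync() copies the scores into a plain array
const pred = model.predict(batched).squeeze().arraySync();
// Find the 5 categories with the highest matching scores.
// findIndicesOfMax and ICON_CLASSES (icon names in model output order)
// are not shown here; they come from the complete project code.
const predictions = findIndicesOfMax(pred, 5).map(i => ({
  className: ICON_CLASSES[i],
  score: pred[i],
}));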
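The recognition snippet relies on `findIndicesOfMax`, which the article does not show. A minimal stand-in, assuming it returns the indices of the k highest scores in descending order, might look like this; the four class names and scores below are made up.

```javascript
// Hypothetical stand-in for findIndicesOfMax: return the indices of the
// k largest scores, ordered from highest to lowest score.
function findIndicesOfMax(scores, k) {
  return Array.from(scores.keys())
    .sort((i, j) => scores[j] - scores[i] || i - j) // ties: lower index first
    .slice(0, k);
}

// Hypothetical class names and model scores.
const ICON_CLASSES = ['home', 'setting', 'search', 'user'];
const pred = [0.1, 0.6, 0.05, 0.25];
const predictions = findIndicesOfMax(pred, 2).map((i) => ({
  className: ICON_CLASSES[i],
  score: pred[i],
}));
// predictions → [{ className: 'setting', score: 0.6 }, { className: 'user', score: 0.25 }]
```

Sorting every index is cheap for a few hundred icon classes. Within TensorFlow.js, `tf.topk` offers a tensor-based alternative that returns both the top values and their indices.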

Finally, the top matches and their scores are displayed!

The complete code is available on GitHub.

Alibaba F(x) Team

You may also like

Comments

Adnan Zaidi December 17, 2020 at 1:15 pm

Alibaba F(x) Team

Related Products

Offline Visual Intelligence Software Packages

AIRec

Artificial Intelligence Service for Conversational Chatbots Solution

Log Management for AIOps Solution