Revealing Neural Network and QA system Behind Deep Learning

Revealing the Dark Magic Behind Deep Learning

Introduction

Training regimes, regularization schemes, and architecture enhancements are crucial for effective deep learning. We call them "Dark Magic Tricks". In this article, we review useful dark magic tricks, accompanied by examples and use cases. We also compile a generic checklist of tricks to test, which can be used to upgrade any existing deep learning repository.

Dark magic tricks are usually not backed by a deep mathematical foundation; they are empirical and heuristic. As a result, they are often overlooked and only briefly mentioned as implementation details in academic articles, and are sometimes visible only by inspecting the source code.

The lack of fundamental understanding and the heuristic nature of these tricks, combined with their importance and effectiveness, are the reasons we named them "Dark Magic".

Although these tricks are relatively disregarded, we argue that the difference between mediocre and top repositories is almost always due to better dark magic tricks. Even when an article claims to reach new state-of-the-art (SotA) scores through a unique novelty, in almost all cases the novelty is accompanied by very effective use of existing dark magic tricks.

No trick always works, and none has a single "correct" set of hyper-parameter values. You usually need to choose the relevant tricks for each problem and tune their hyper-parameters. However, knowing and understanding common best practices can dramatically increase the chance of using dark magic tricks effectively and getting top scores.

We have compiled a generic checklist of dark magic tricks and best practices that we test each time we start a new repository or upgrade an existing one. For each trick on the checklist, we try to recommend a specific default option instead of suggesting a long list of possibilities. Each recommendation is accompanied by examples and use cases that were thoroughly tested and reviewed. While our main focus was on classification datasets, we believe that our checklist has a good chance of generalizing to other tasks and datasets.

Tricks Testing Checklist

Training Schemes:

Which learning rate regime to choose
Which optimizer to choose
Should we always pretrain a model on ImageNet
Which batch size to use (-)

Regularization Tricks:

AutoAugment
Weight decay
Label smoothing
Scheduled regularizations
Mixup
Auxiliary loss (-)
Crop factor, padding and resizing schemes (-)
Drop path (-)
CutOut (-)

Architecture Enhancements:

Squeeze-And-Excite (SE) layers
Stem activation functions
Attention pooling (-)

While we cannot, of course, cover all existing dark magic tricks, we found the checklist above very useful. With it, we were able to improve our scores significantly on multiple datasets - CIFAR10, CIFAR100, ImageNet, Palitao-102K, Alicool, COCO keypoints, Market-1501, SVHN, Freiburg-grocery and more. In the following sections, we will dive in and analyze the different tricks on the checklist.

Due to space considerations, tricks marked with (-) are left for future posts.

Training Schemes

Which Learning Rate Regime to Choose
We experimented a lot with different kinds of learning rate regimes, including gamma decay, cosine annealing, heuristics ("reduce learning rate by factor 0.1 in epochs 50 and 75...") and more.
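
To make the regimes named above concrete, here is a minimal sketch of three of them as plain functions of the epoch index. It is an illustration only, not the authors' implementation: the function names, the base learning rate of 0.1, the 100-epoch budget, and the gamma value of 0.97 are all assumptions chosen for the example. Only the "factor 0.1 in epochs 50 and 75" heuristic comes from the text.

```python
import math


def step_decay_lr(epoch, base_lr=0.1, milestones=(50, 75), factor=0.1):
    """Heuristic regime: multiply the rate by `factor` at each milestone epoch."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** drops


def gamma_decay_lr(epoch, base_lr=0.1, gamma=0.97):
    """Gamma (exponential) decay: shrink the rate by `gamma` every epoch.

    The value 0.97 is an illustrative assumption, not a recommendation.
    """
    return base_lr * gamma ** epoch


def cosine_annealing_lr(epoch, total_epochs=100, base_lr=0.1, min_lr=0.0):
    """Cosine annealing: follow half a cosine wave from base_lr down to min_lr."""
    progress = min(epoch, total_epochs) / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the three schedules side by side at a few epochs.
    for epoch in (0, 25, 50, 75, 99):
        print(
            epoch,
            round(step_decay_lr(epoch), 5),
            round(gamma_decay_lr(epoch), 5),
            round(cosine_annealing_lr(epoch), 5),
        )
```

In a training loop, one of these functions would be called once per epoch and its result assigned to the optimizer's learning rate. Keeping each schedule as a pure function of the epoch makes it easy to plot and compare regimes before committing to a full training run.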

Related Blogs

Alibaba Open-Source and Lightweight Deep Learning Inference Engine - Mobile Neural Network (MNN)

Alibaba has made its lightweight mobile-side deep learning inference engine, Mobile Neural Network (MNN), open source to benefit more app and IoT developers.

Recently, Alibaba officially published the source code of its lightweight mobile-side deep learning inference engine - Mobile Neural Network (MNN) - on GitHub.

Jia Yangqing, a famous AI scientist, commented that "compared with general-purpose frameworks like TensorFlow and Caffe2 that cover both training and inference, MNN focuses on the acceleration and optimization of inference and solves efficiency problems during model deployment so that services behind models can be implemented more efficiently on the mobile side. This is actually in line with ideas in server-side inference engines like TensorRT. In large-scale machine learning applications, the number of computations for inference is usually 10+ times more than that for training. Therefore, optimization for inference is especially important."

How is the technical framework behind MNN designed? What are future plans regarding MNN? Today, let's have a closer look at MNN.

1. What Is MNN?

Mobile Neural Network (MNN) is a lightweight mobile-side deep learning inference engine that focuses on running deep neural network models and performing inference with them. MNN covers the optimization, conversion, and inference of deep neural network models. Currently, MNN has been adopted in more than 20 apps such as Mobile Taobao, Mobile Tmall, Youku, Juhuasuan, UC, Fliggy, and Qianniu, covering live broadcast, short video capture, search recommendation, product searching by image, interactive marketing, equity distribution, security risk control and other scenarios. MNN stably runs more than 100 million times per day. In addition, MNN is also applied in IoT devices like Cainiao will-call cabinets. During the 2018 Double 11 event, MNN was used in scenarios like smiley face red envelopes, scans, and a finger-guessing game.

MNN has already been made an open-source project on GitHub. Follow the official WeChat account "Alibaba Technology" and enter "MNN" in the dialog box to learn more and obtain the GitHub download link to this project.

2. Advantages of MNN

MNN loads network models, runs inference, makes predictions, and returns the relevant results. The inference process consists of loading and parsing models, scheduling computational graphs, and running models efficiently on heterogeneous back-end devices. MNN has four advantages: it is lightweight, versatile, high-performance, and easy to use.

QA Systems and Deep Learning Technologies – Part 1

QA systems can interpret a user's questions described in natural language and return concise and accurate matched answers by searching heterogeneous corpora or QA knowledge bases.

1. Introduction

Automatic question answering (QA) systems have been in use for decades. However, the success of Siri and Watson in 2011 captured the whole industry's attention. Since then, the automatic QA system has stepped further into the limelight as a standalone practical application.

This success can be attributed to the considerable progress of machine learning and natural language processing technology, as well as the emergence of large-scale knowledge bases, such as Wikipedia, and extensive network information. However, the problems facing existing QA systems are far from being solved. Analyzing the question and identifying the matching relationship between the question and the answer remain two key problems that restrict QA systems. This article is the first in a two-part series that explores the ins and outs, problems, and opportunities of QA systems.

2. Overview of QA Systems

QA systems can interpret a user's questions described in natural language and return concise and accurate matched answers by searching the heterogeneous corpora or, in more common terms, the QA knowledge bases. Compared with search engines, QA systems can better interpret the intended meaning of the user's questions and therefore can meet the user's information requirements more efficiently.

2.1 History of QA Systems

The Turing test is the earliest example of a QA system implementation and tests a machine's ability to exhibit human-like intelligence. The Turing test requires the computer to answer a series of questions asked by human testers within 5 minutes. As relevant technologies such as artificial intelligence (AI) and natural language processing developed, QA systems came to use different types of data. Due to the limitations of intelligent technologies and the scale of domain data, early QA systems were mainly restricted to AI systems or expert systems of a limited domain, such as the STUDENT [1] and LUNAR [2] systems. During this period, QA systems processed structured data. The system would translate the input question into a database query statement, run the query against the database, and return the result.

With the rapid development of the internet and the rise of natural language processing technology, QA systems entered an open-domain stage based on free-text data, represented by English QA retrieval systems such as Ask Jeeves (http://www.ask.com) and START (http://start.csail.mit.edu). The processing flow of such QA systems mainly includes question analysis, document and paragraph retrieval, candidate answer extraction, and answer validation. The introduction of the Question Answering Track (QA Track) at the Text Retrieval Conference (TREC) in 1999 promoted research and development based on natural language processing technology in the QA field.

Later, internet-based community question answering (CQA) supplied question-answer pairs (QA pairs) derived from massive user interactions, which provide a stable and reliable source of data for QA-pair-based systems. With the advent of Apple's Siri, QA systems entered the stage of intelligent interactive question answering, allowing users to experience more natural human-computer interactions and making information services more convenient and practical.

A QA system's data objects include the user's questions and the answers. According to the data domain of the user's questions, QA systems can be classified into those oriented to a restricted domain, an open domain, and frequently asked questions (FAQ). QA systems can also be categorized, according to how answers are generated and returned, into retrieval-type and generation-type systems. This paper mainly describes the processing frameworks of the various retrieval-type QA systems.
