AutoML Application Exploration

Everything in service of a smoother experience!

The Wufu (Five Fortunes) event is now five years old. Except for the first year, when we used the Xiuyixiu scheme to collect fortune cards, the collection of fortunes has always been image-based, and this is the fourth year that the AR scanning flow has served as the entrance for recognizing Fu characters. Some readers will ask: scanning already felt very smooth last year and recognition was very accurate, so why keep upgrading the model this year? This reminds me of what Xiaoyaozi said on the Double Eleven battlefield: "What I care most about this Double 11 is not the sales figures, but the technology at peak load." For the Fu-scanning scenario, the corresponding questions are: how to make the user's AR recognition experience even smoother, how to scan Fu characters fluently on extremely low-end phones, and how to raise the coverage of on-device computing so that it reaches more users, even approaching 100%. These are the directions we must refresh and break through every year.

What we accumulated in previous years

Looking back at the visual algorithm side of Fu scanning over the past three years, there have been two breakthroughs. The first came in 2018 with the introduction of the xNN deep learning engine, which enabled the Fu-recognition deep network to run on the device, significantly reducing the load on cloud services while greatly improving recognition accuracy. The second came in 2019, when the cloud service was upgraded to the xNN-x86 version, making true device-cloud integration possible and completely unifying the models on the device and in the cloud. These technical breakthroughs brought smoother interaction and more accurate recognition. As the animation below suggests, a "vulnerability hunter" who could still game the system in 2017 would find a sneak attack very hard to pull off from 2018 onward.

This year's breakthrough

This year, to make both the user experience and the R&D process smoother, we introduced AutoML: not just the search over training hyperparameters and network structures, but the automation of the entire model development process. The reasons come down to two points:

Incremental network performance gains come at high labor cost

This year, building on previous years' work, we needed further incremental improvements in network performance: reaching more users with on-device computing, consuming fewer on-device resources, and recognizing more accurately. But once the models of past years have reached a strong baseline, relying on manual design alone to push performance and accuracy much further demands ever more effort and labor cost.

KA merchant needs and public-opinion incidents require faster response

As the commercialization of AR scanning has grown richer, the models in the Fu-character recognition pipeline must also quickly accommodate KA merchants' individual needs for targeted recognition. Moreover, for a nationwide event like Fu scanning, the ability to detect public-opinion issues and rapidly iterate the model in an emergency is a must-have. We need to shorten the model iteration cycle as much as possible, make every operation, delay, and training time standardized and controllable, and strengthen the emergency response capability of model iteration.

Automation of Network Structure Design
Features of the AutoML Capability
We used the AutoML capability provided by the xNN-Cloud platform to complete this year's Fu-character model structure search. The capability needed the following characteristics:

1) Out of the box: the network can be searched without much modification to the original network structure code.

2) Aware of on-device metrics: when comparing the results of different search trials, comparison is not limited to accuracy; factors such as computation, parameter count, and latency must also be taken into account.

3) Friendly user interaction: users only need to care whether the search process meets expectations and whether result metrics can be compared quickly, without worrying about resource scheduling or GPU usage.

The figure below shows an example flow of an AutoML search task on xNN-Cloud, and how the metric results evolve during the search.

Network and Search Space Design
Recognition pipeline

The detection model first locates candidate target regions, and the recognition model then judges whether a region contains a Fu character. Because the detection and recognition networks are separate, in scenarios that require rapid model migration for customized recognition (for example, recognizing Mr. Ma's handwritten Fu), the detection model can stay fixed and only the recognition model needs migration training. So overall we have two models, detection and recognition, and both require AutoML structure search.

Introducing priors

For an AutoML search, given enough training resources and a long enough time span, we would gladly give the machine as much freedom as possible to search a more complex space. But for the Fu-scanning business, a temporary team is pulled together from various departments for only about a month each year, up to the point where the model enters testing and joint debugging. Within that short window, structure search had to be completed for both networks, while also guaranteeing smaller parameter and computation budgets than previous models and higher accuracy than previous years. We therefore made some prior choices about the network framework based on experience accumulated in similar businesses.

Detection model

Starting from a single-stage detector (SSD) structure, we added a Pyramid Pooling Network (PPN) to quickly capture features at different scales without adding extra parameters. On the box-regression/classification branches, weight-shared convolutions learn features across scales with the same weights, giving the network better performance on targets of different sizes. On this basis, we used MobileNet V1 as the initial backbone and searched the channel width and convolution kernel size of each layer; and so that the prediction head could better match the backbone features, the channel widths of the shared-tower regression layers were also included in the search space. As a result, both the feature extractor and the prediction head can adapt optimally to the Fu-character detection scene.
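To make the description concrete, here is a minimal sketch of the kind of search space described above. All names, layer counts, and candidate values are illustrative assumptions, not the actual xNN-Cloud configuration: per-layer channel-width multipliers and kernel sizes for a MobileNet-V1-style backbone, plus the channel width of the shared-tower regression layers.

```python
import random

BACKBONE_LAYERS = 13  # MobileNet V1 has 13 depthwise-separable blocks

# Hypothetical search space: choices per backbone layer, plus the head.
SEARCH_SPACE = {
    f"backbone_l{i}": {
        "width_mult": [0.5, 0.75, 1.0],  # channel-width multiplier choices
        "kernel": [3, 5],                # depthwise kernel size choices
    }
    for i in range(BACKBONE_LAYERS)
}
SEARCH_SPACE["shared_tower"] = {"channels": [64, 96, 128]}

def sample_trial(rng: random.Random) -> dict:
    """Sample one candidate architecture from the space."""
    return {
        name: {k: rng.choice(v) for k, v in choices.items()}
        for name, choices in SEARCH_SPACE.items()
    }

candidate = sample_trial(random.Random(0))
```

Each sampled `candidate` is one trial handed to the search algorithm; including the head channels in the same space is what lets the backbone and prediction head co-adapt.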

Recognition model

We started from the MobileNet V2 structure and, following the EfficientNet approach, searched under a smaller network, then scaled network width, depth, and input size so that the good structure found on the small model could quickly be expanded in capacity. Google's Searching for MobileNetV3 likewise builds on MobileNet V2 via search to push the mobile recognition model to the extreme, so we also compared our AutoML results against using the MobileNet V3 structure directly. The figure below shows the detection and recognition network structures obtained from the final structure search.
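The EfficientNet-style expansion mentioned above can be sketched as compound scaling: a single coefficient phi grows depth, width, and input resolution together. The base coefficients below are the ones reported in the EfficientNet paper; whether the Fu-recognition model used these exact values is an assumption on our part.

```python
# Compound-scaling sketch: alpha/beta/gamma are the EfficientNet base
# coefficients for depth, width, and resolution; phi scales capacity.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def scale(base_depth: int, base_width: float, base_res: int, phi: int):
    """Scale a small searched model up by compound coefficient phi."""
    depth = round(base_depth * ALPHA ** phi)        # more layers
    width = round(base_width * BETA ** phi, 2)      # wider channels
    res = int(round(base_res * GAMMA ** phi))       # larger input size
    return depth, width, res
```

With phi = 0 the searched small model is returned unchanged; increasing phi expands all three dimensions in a fixed ratio, which is what lets a structure searched cheaply on a small model be grown quickly.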

Search objective

Throughout the AutoML search, the results of individual trials must be compared so that a relatively better structure can be selected for further search or training. For the Fu-scanning task, the benchmark for this comparison cannot consider accuracy alone: because the model ultimately runs on the client, the model's computation (FLOPS) and parameter size (Model Size) are also metrics we must watch. When specifying the benchmark, we therefore designed an objective that takes multiple factors into account to compare different trials during the AutoML process, where T_f and T_s are the desired target values and w_f and w_s control the trade-off between FLOPS, Model Size, and Accuracy.
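The exact objective formula is not reproduced in the text, so here is a sketch of a common multi-objective form with the same parameters (in the style popularized by MnasNet); the formula shape, target values, and exponents are assumptions for illustration.

```python
def trial_objective(accuracy: float, flops: float, model_size: float,
                    t_f: float = 80e6, t_s: float = 400e3,
                    w_f: float = -0.07, w_s: float = -0.07) -> float:
    """Score one trial by accuracy, penalized for exceeding the FLOPS
    target T_f and size target T_s. Negative exponents w_f / w_s make
    larger-than-target models score lower (assumed values shown)."""
    return accuracy * (flops / t_f) ** w_f * (model_size / t_s) ** w_s
```

A trial exactly on target scores its raw accuracy; a trial with double the FLOPS budget is discounted, so the search is steered toward structures that balance all three metrics rather than chasing accuracy alone.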

Search strategy

Since search resources are always limited, we chose Hyperband, which is relatively optimal in terms of resource consumption and search efficiency. The algorithm first trains early trials for a small number of steps and compares the objective function, then progressively selects the better structures and trains them with more steps, striking a balance between the number of candidate structures explored and the overall resource budget. At the same time, to avoid getting stuck in a local optimum, multi-level random selection is used when initially sampling the parameter space. Overall, this preserves randomness while allotting more training steps to better structures, and finally yields the best-converged model.
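The core of Hyperband, successive halving, can be sketched in a few lines. This is a simplified single bracket, not the platform's implementation; `evaluate(trial, budget)` is a hypothetical callback returning the objective score after `budget` training steps.

```python
def successive_halving(trials, evaluate, budget_per_round=1, eta=3):
    """One Hyperband-style bracket: train every trial briefly, keep the
    top 1/eta by objective score, then give survivors eta times more
    training budget, until one trial remains."""
    budget = budget_per_round
    while len(trials) > 1:
        ranked = sorted(trials, key=lambda t: evaluate(t, budget),
                        reverse=True)
        trials = ranked[: max(1, len(ranked) // eta)]  # keep top 1/eta
        budget *= eta  # survivors earn a larger training budget
    return trials[0]

# Toy usage: trials are numbers, and the (budget-independent) objective
# prefers values close to 0.7.
best = successive_halving([0.1, 0.3, 0.7, 0.9, 0.5],
                          lambda t, budget: -abs(t - 0.7))
```

The geometric budget growth is what lets many candidates be screened cheaply while the promising few still receive enough steps to converge.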

On the implementation side, xNN-Cloud worked with the ALPS team of Ant's AI department to integrate the Python SDK of the ALPS-AutoML framework. Its hyperparameter-search component supports grid, random, Bayesian, RACOS, and other search algorithms; it also supports early-stopping mechanisms such as Hyperband and MedianStop, and flexible configuration of hyperparameter combinations. In other machine learning scenarios, ALPS-AutoML also provides rich automated machine learning capabilities, such as meta-learning for XGBoost models and efficient automatic feature engineering.

Model performance

Faster, smaller, and more accurate

We took the top 50 Android device models and the top 20 iOS device models of 2019 as the benchmark for comparing inference time across the two years. Judging from real online data from last year and this year, inference time on Android dropped by more than 50%, to an average under 100 milliseconds; on iOS it dropped by more than 30%, to an average under 40 milliseconds. Model size shrank by a further 80 KB compared with last year. And with both latency and parameter size below last year's, accuracy still improved by a further 1.6%. All of this comes from weighing accuracy, parameter count, and computation together during the AutoML search, so that the final searched structure performs well on all three fronts.

Recognition model vs. MobileNet V3

We also compared our searched recognition structure with the MobileNet V3 structure in the Fu-character recognition scene. With roughly comparable computation (mnv3-small-065), the parameter size of the structure found by our NAS is only 1/10 that of the mn-v3 structure. For a business scenario like Fu collection, where the on-device resource budget is strictly limited, the size of native mn-v3 is clearly unsuitable. And if, to minimize mn-v3's size, some advanced blocks are discarded and the width multiplier is reduced further (mnv3-small-minimalistic-025), the model's computation does indeed drop further, but accuracy suffers a very noticeable decline. This further demonstrates that the AutoML search result for Fu-character recognition balances accuracy, computation, and parameter size effectively, and that the searched network fits the business scenario better.

P.S. The recognition accuracy shown in the table above is image-level accuracy on 90,000 offline test cases at threshold = 0.9. In actual business calls, recognition runs on the video stream, with fusion and verification of multi-frame results, so overall online recognition accuracy is very close to 1. Throughout the entire Fu-scanning campaign, no large-scale public-opinion incident about misrecognized Fu characters arose or spread.

Automation of Algorithm R&D Process
Platform foundation

xNN-Cloud provides one-stop vision-algorithm development, with a rich set of functions and development features (shown in the figure above), which greatly facilitated automating the Fu-scanning model development process.

Algorithm development

Standard algorithm-package templates for classification, detection, OCR, and more are built in; with parameter configuration alone, high-quality models can be trained without writing extra code for a task. When customized optimization is needed, implementing a few abstract interfaces of the Common API is enough to apply a customized network structure, loss function, or optimization method to model training.
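The actual Common API of xNN-Cloud is not shown in this article, so the sketch below is purely illustrative: every class and method name here is hypothetical, meant only to convey the pattern of "implement a few abstract interfaces to plug in a custom network, loss, or optimizer."

```python
from abc import ABC, abstractmethod

class CommonAlgo(ABC):
    """Hypothetical abstract interface in the spirit of the Common API."""

    @abstractmethod
    def build_network(self, inputs):
        """Return model outputs for a customized network structure."""

    @abstractmethod
    def build_loss(self, outputs, labels):
        """Return a scalar loss for a customized training objective."""

    def build_optimizer(self):
        """Default optimizer; override to customize."""
        return "sgd"

class FuClassifier(CommonAlgo):
    def build_network(self, inputs):
        return [x * 2 for x in inputs]  # stand-in for a real forward pass

    def build_loss(self, outputs, labels):
        return sum((o - l) ** 2 for o, l in zip(outputs, labels)) / len(labels)

algo = FuClassifier()
```

The platform would then drive training through these hooks, so the user writes only the pieces that differ from the template.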

Model compression

Configurable model compression greatly reduces the cost users perceive in applying compression. The integrated xQueeze model compression toolchain provides powerful automatic pruning, quantization, and fixed-point capabilities.

Data preview

Data preview is not limited to viewing annotations; it can combine the model's evaluation results on a dataset for richer, multi-dimensional display. For classification tasks, you can quickly inspect the confusion between predicted and ground-truth categories; for detection tasks, you can visualize the IoU between multiple sets of predictions and the ground truth.
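The IoU metric used in the detection preview above is standard; here is a small self-contained helper (the function name and box convention are our own, not the platform's).

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Intersection rectangle (may be empty).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A prediction overlapping ground truth perfectly gives 1.0; disjoint boxes give 0.0, which is exactly the quantity the preview panel visualizes per box pair.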

Multi-machine, multi-card training

The platform minimizes the complexity of using multi-machine, multi-card training. By integrating the Kubeflow MPI Operator, it helps users run distributed training easily. Whatever the algorithm scenario, users only need to choose the number of cards to get multi-machine, multi-card training, without worrying about how the code runs across machines or how the gradients on each card are aggregated and applied.
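The gradient aggregation the platform hides can be illustrated with a toy simulation: in data-parallel training, each card computes gradients on its own mini-batch, and an allreduce averages them so every card applies the same update. (Pure-Python stand-in; real MPI/NCCL allreduce is assumed, not shown.)

```python
def allreduce_mean(per_card_grads):
    """Element-wise average of gradients across all cards."""
    n_cards = len(per_card_grads)
    return [sum(g) / n_cards for g in zip(*per_card_grads)]

# Simulated gradients from 4 cards for a 3-parameter model.
grads = [[1.0, 2.0, 3.0],
         [3.0, 2.0, 1.0],
         [2.0, 2.0, 2.0],
         [2.0, 2.0, 2.0]]
avg = allreduce_mean(grads)
```

After the averaged gradient is broadcast back, all replicas step identically, which is why the user never needs to reason about per-card state.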

Model evaluation

During the entire Fu-scanning period, thousands of models were produced through automated training; quickly selecting the better ones from among them saves substantial time. Through evaluation, you can observe accuracy results, PR curves, and more on an independent test set, and place the results of multiple models on the same panel for comparison.

Real-device testing

To run a model on the client, besides model size, metrics such as latency and CPU/memory usage determine whether it meets on-device performance requirements. By connecting to EdgePerf and the cloud testing platform, models can be quickly benchmarked on real devices, yielding realistic and intuitive runtime performance metrics.

Rapid model turnaround
During the Wufu event, after a model goes live, the actual pattern of users' scans may reveal that recognition of particular target categories needs strengthening, and the product side may want the new model live within an hour. From data organization to model training to format conversion to model evaluation, a single carelessly neglected manual step would pose a serious risk of delaying the model update. With the entire data-preparation, training, and evaluation pipeline platformized and automated, the chance of a mistake is greatly reduced compared with manual offline training.

Before the Fu-scanning model went live, we simulated this rapid-turnaround training process for targeted image categories on xNN-Cloud. One only needs to upload the targeted-recognition dataset, clone the previous simulation task, and add the new dataset; combined with the platform's multi-machine, multi-card capability, the final model training can be kept within 15 minutes. This both guarantees that the model's structure and size fully match expectations, and strengthens recognition of the targeted images while leaving other categories unaffected. The whole process is under control and free of human error; it only needs to be handed to the platform. The same capability was applied to migration training for recognizing Mr. Ma's Year-of-the-Rat Fu and merchants' specific Fu characters: in a very short time, the final online version was iterated from the benchmark model, completing a rapid model turnaround.


The training of the Fu-scanning model relies on the xNN-Cloud platform. This year it produced a better model than in previous years while saving substantial manpower, which is inseparable from the AutoML and automated-training capabilities the platform provides. xNN-Cloud is now also connecting real-device testing directly with the search process, using the runtime metrics closest to the real environment to guide the AutoML search.
