DAMO Academy Experts Take Up Image Matting
To outsiders, most members of DAMO Academy are eccentric experts doing mysterious, high-end research, like the sweeping monk of wuxia novels. But what happens when these mysterious experts stop being mysterious and start playing with image matting, and things take on a momentum of their own? What tricks can they pull off with matting?
As it turns out, everything can be matted!
Some images are taken from Taobao product listings.
Why did we start researching matting?
It starts with Luban, a design product developed in-house by the Alibaba Intelligent Design Lab. Luban's original goal was to change the traditional design workflow so that large volumes of banners, posters, and sales-venue images can be produced in a short time, improving work efficiency. The product images uploaded by merchants vary widely in quality, and publishing them directly gives poor results. Images generated by Luban guarantee a uniform visual style and high-quality presentation across a venue, making products more attractive, improving the buyer's visual experience, and ultimately raising conversion rates. In the course of image generation, we found that product matting is an unavoidable and tedious step: a fine portrait matte takes a designer more than two hours on average. Purely mechanical work like this, requiring no creativity, is exactly what AI should take over. Thus our matting algorithm was born.
In recent years, image matting algorithms have gradually come into the public eye, and the industries behind them (pan-entertainment, e-commerce, and verticals such as online catering, media, and education) carry commercial value that should not be underestimated, driving an ever-expanding demand for image production. Existing matting algorithms on the market handle portrait hair detail poorly and offer weak support for general scenarios such as e-commerce. To address these two problems, we designed a system with stronger generalization ability on the one hand, and on the other hand refined the algorithms for hairline detail and highly hollowed-out objects, achieving better results.
Difficulties encountered and solutions
When we first took on Luban's "batch matting" requirement, we found that the quality, source, and content of user-uploaded images vary widely, making it hard for a single model to meet the business goal once and for all. After extensive analysis of the scenarios and data, we customized the overall framework as follows:
It covers four main modules: filtering, classification, detection, and segmentation:
Filtering: filter out low-quality images (too dark, overexposed, blurry, occluded, etc.), mainly with classification models plus some basic image algorithms;
Classification: products such as bottled drinks and cosmetics have good connectivity, while 3C electronics, daily necessities, toys, and similar categories do not. Scene requirements (human heads, portraits, animals, etc.) also differ, so we design different segmentation models for different cases to improve results;
Detection: in the Luban scenario, user data mostly comes from product images, many of which are heavily designed: a single image may contain multiple products and multiple categories, with the subject occupying only a small fraction of the frame, plus redundant elements such as copywriting, decoration, and logos. Adding a detection-and-cropping step before segmentation makes the result more accurate;
Segmentation: first run a coarse segmentation to obtain a rough mask, then a fine segmentation to obtain an accurate one; this both speeds things up and allows hairline-level accuracy.
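The coarse-to-fine idea in the segmentation step can be sketched as follows. This is a minimal illustration, not the production pipeline: the model interfaces, the 0.25 downscale factor, and the nearest-neighbor resize are all assumptions for the sake of a runnable example.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Naive nearest-neighbor resize (a stand-in for a real resize op)."""
    h, w = img.shape[:2]
    ys = (np.arange(out_h) * h // out_h).clip(0, h - 1)
    xs = (np.arange(out_w) * w // out_w).clip(0, w - 1)
    return img[ys][:, xs]

def coarse_to_fine(image, coarse_model, fine_model, scale=0.25):
    """Two-stage matting: a cheap coarse mask at low resolution,
    then a refinement pass at full resolution guided by it."""
    h, w = image.shape[:2]
    small = resize_nearest(image, int(h * scale), int(w * scale))
    coarse = coarse_model(small)              # rough mask in [0, 1], fast
    coarse_up = resize_nearest(coarse, h, w)  # back to full resolution
    return fine_model(image, coarse_up)       # detail-level refinement
```

The point of the split is that the coarse model only pays for low-resolution input, while the fine model can concentrate its capacity on boundary detail around the rough mask.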
How to make the effect more accurate?
At present, classification and detection models are relatively mature. The evaluation model, however, must be customized for each scenario (e-commerce design images, natural photographs, etc.), and segmentation accuracy is still insufficient, making segmentation the weakest link among all the modules, so it became our main battlefield. The details are as follows:
Classification model: classification tasks usually require several rounds of data preparation, model optimization, and data cleaning before they are usable. We therefore built an automatic classification tool that integrates recent optimization techniques and borrows ideas from AutoML to search over parameters and models under limited GPU resources, reducing the manual effort involved and speeding up the deployment of classification tasks.
Evaluation model: directly fitting scores with regression trains poorly. Since this module acts as an upstream filter, treating it as a classification problem is more reasonable. In practice we also use some traditional algorithms to help judge under- and over-exposure.
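The "traditional algorithms" for under- and over-exposure can be as simple as grayscale statistics. A hedged sketch: the thresholds and pixel ratio below are illustrative values, not the ones used in production.

```python
import numpy as np

def exposure_flags(gray, dark_thresh=40, bright_thresh=215, ratio=0.7):
    """Flag an image as too dark / overexposed when most pixels fall
    below or above a brightness threshold (0-255 grayscale input).
    Thresholds here are illustrative, not production values."""
    pixels = gray.astype(np.float32).ravel()
    too_dark = (pixels < dark_thresh).mean() > ratio
    overexposed = (pixels > bright_thresh).mean() > ratio
    return bool(too_dark), bool(overexposed)
```

A cheap rule like this complements the learned classifier: it is fast, interpretable, and catches the unambiguous failure cases before any model runs.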
Detection model: It mainly draws on the FPN detection architecture.
For each feature map in the pyramid, features from the adjacent upper and lower layers are fused, giving the output features stronger representational power;
Predictions are made separately on each pyramid level, and the candidate anchors increase robustness to scale changes and improve recall on small regions;
Extra predictable scales are added to the candidate anchor settings, greatly improving generality for products with extreme aspect ratios.
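Extending the anchor set for extreme product shapes can be sketched like this. The base size, ratios, and scales below are assumed example values, not the configuration used in the system described above.

```python
import numpy as np

def make_anchors(base_size, ratios, scales):
    """Generate (w, h) anchor shapes: each aspect ratio r gives w/h = r
    at a fixed area, and each scale multiplies both sides."""
    anchors = []
    for s in scales:
        area = (base_size * s) ** 2
        for r in ratios:
            w = np.sqrt(area * r)   # so that w / h = r and w * h = area
            h = np.sqrt(area / r)
            anchors.append((w, h))
    return anchors

# Adding extreme ratios (e.g. 1/4 and 4) covers very tall or very wide
# products that a default {0.5, 1, 2} ratio set would miss.
anchors = make_anchors(16, ratios=[0.25, 0.5, 1.0, 2.0, 4.0], scales=[1, 2, 4])
```

Each ratio fixes the anchor's shape while the area stays constant, so widening the ratio set costs only a few extra anchors per location.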
Segmentation Fusion Model:
Unlike traditional image segmentation, which only needs to separate foreground from background, a high-precision matting algorithm must determine the exact transparency of each pixel, turning a discrete 0/1 classification problem into a regression problem over [0, 1].
In our work, for a pixel p in the image, we predict its transparency with the following formula:
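A sketch of the standard late-fusion formulation, which is consistent with the derivative argument in the fusion-network paragraph below; the symbols here are this sketch's names and are assumed, not taken from the original:

```latex
\alpha_p = \beta_p \, \bar{F}_p + (1 - \beta_p)\,(1 - \bar{B}_p)
```

where $\bar{F}_p$ is the predicted foreground probability at pixel $p$, $\bar{B}_p$ the predicted background probability, and $\beta_p$ the blending weight produced by the fusion network. Note that $\partial \alpha_p / \partial \beta_p = \bar{F}_p - (1 - \bar{B}_p)$, which vanishes whenever the two predictions already agree.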
Figure: the red region marks the pixels whose value is bracketed by the foreground and background probabilities.
Fusion network: several consecutive convolutional layers responsible for predicting the blending weights. Note that in solid regions of the image, a pixel's foreground and background predictions usually already agree, in which case the derivative with respect to the blending weight is always 0. This convenient property lets the fusion network automatically "focus" on the semi-transparent regions during training.
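The "automatic focus" property can be checked numerically. Assuming the blend takes the late-fusion form alpha = beta*f + (1-beta)*(1-b) (an assumption of this sketch, with f and b the foreground and background probabilities), the gradient with respect to the weight vanishes exactly when the two predictions agree:

```python
def blend_alpha(f, b, beta):
    """Late-fusion blend: alpha = beta*f + (1-beta)*(1-b)."""
    return beta * f + (1 - beta) * (1 - b)

def dalpha_dbeta(f, b):
    """d(alpha)/d(beta) = f - (1 - b): zero exactly when the
    foreground and background predictions already agree."""
    return f - (1 - b)

# Solid region: predictions agree, so beta receives no gradient.
assert dalpha_dbeta(1.0, 0.0) == 0.0
# Semi-transparent region: predictions disagree, so beta matters.
assert dalpha_dbeta(0.7, 0.6) != 0.0
```

In solid regions the blend output is also independent of beta (any weight yields the same alpha), which is why training signal concentrates on the semi-transparent boundary pixels.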
Applications, productization, and openness
The basis for commercial application is our set of single-point capabilities at the application layer: portrait/head/face/hair matting, product matting, and animal matting. Going forward we will gradually add support for cartoon-scene matting, clothing matting, panorama matting, and more. On this basis we have also done some productization work, such as Luban's batch white-background image feature and the ID photo / battle report / character background replacement in the Drawing Butterfly mini program (DingTalk -> My -> Discovery -> Mini Program -> Drawing Butterfly), among others.
Knowledge Base Team