New Exploration of AI·OS: An End-to-End Platform for Algorithm Engineering

This article focuses on deep learning algorithm engineering and uses Taobao's search, recommendation, and advertising practices to explain how to build an efficient end-to-end AI algorithm platform.

By ELK Geek, with special guest, Zhang Di, Senior Technical Expert of Alibaba Group

Today's topic is deep learning algorithm engineering. Drawing on the best practices of search, recommendation, and advertising on Taobao, we explain how Alibaba Cloud builds an efficient end-to-end AI algorithm platform.

AI Drives the In-Depth Technological Development of Search, Recommendation, and Advertising on Taobao

Much of the content presented on Taobao today is personalized for each individual user. Content delivery, with search, recommendation, and advertising at its core, plays an important role in this personalization. Over the past five years, AI technology such as deep learning has become the core driving force behind improvements in search, recommendation, and advertising on Taobao. The key elements of deep learning are computing power, algorithms, and data. How efficiently the end-to-end AI platform is built directly determines business scale and iteration efficiency.


The Increasing Demand for Computing Power

As AI algorithms become more intelligent, the demand for computing power continues to grow.

  • From the perspective of algorithms, engineers want to design and assemble models as freely as stacking building blocks, and to verify model effects quickly. As a result, the requirements for understanding sparse representations and characterizing continuous behaviors keep rising, as does the load from the various structures in fully connected networks.
  • From the perspective of computing scale, a model can contain tens of billions of features and hundreds of billions of parameters, with a size at the TB level. This scale poses challenges for both model training and online estimation.


Improved Algorithm Diversity

Algorithms are becoming more and more diverse. In addition to standard Deep Neural Network (DNN) models, technologies such as graph neural networks, reinforcement learning, and tree-based deep learning are widely used in Taobao's business.

  • Graph neural networks describe the connections between users and products and improve product recall through GraphEmbedding.
  • Reinforcement learning benefits Optimized Cost Per Click (OCPC) in advertising, so that every penny advertisers spend is targeted more accurately.
  • Tree-Based Deep Match technology brings powerful model representation capabilities to advertisement recall.

End-to-End Algorithm Platform

The increasing complexity and diversity of algorithms require support from an efficient end-to-end algorithm platform.

Optimization Objectives in Three Dimensions

1. Unlimited Demand for AI Computing Power

The platform should continuously unlock computing power for deep learning to improve algorithm effects.

2. Acceleration of Iteration Efficiency

The platform should provide a consistent end-to-end experience and ensure iteration efficiency across the whole algorithm process.

3. Support for Algorithm Innovation

The platform should be designed with high flexibility to support the continuous innovation of algorithms.

AI-OS: An Engineering Technology System for Big Data Deep Learning

As an engineering technology system for big data deep learning, AI-OS covers AIOfflinePlatform (a comprehensive modeling platform) and AIOnlineServing (an online AI service system). Together they form an end-to-end big data AI engine that connects offline and online systems seamlessly. AI-OS processes big data continuously to support the search, recommendation, and advertising business on Alibaba.com, and the transactions it guides make up a dominant share of Alibaba Group's e-commerce business. In addition, as the backbone of the middle platform technology, AI-OS has become the infrastructure of Alibaba Group, Alibaba.com, Alibaba Cloud, Youku, Cainiao, Freshippo, and DingTalk. More importantly, the AI-OS family of cloud products is available to developers around the world through Alibaba Cloud.


Problems Solved by Industrial Machine Learning

Industrial machine learning has to solve more than code development for algorithm models. It also has to cover the full procedure in an offline closed loop, including features, samples, and models.


All-in-One Modeling Platform

For scenarios such as search, recommendation, and advertising, Alibaba Cloud has developed an all-in-one modeling platform. The platform provides end-to-end services across the full procedure, including feature management, sample assembly, model training and evaluation, and model delivery.

Built on a cloud-native Kubeflow foundation at the bottom layer, the all-in-one modeling platform offers users both batch learning and online learning.


XFC provides standardized management and trends of features. Channel is an abstraction of sample computing, while the Model Center, provided by the model factory, handles model training, sharing, and delivery.

The model analysis system provides multi-dimensional visual analysis of models and verifies model security, so algorithm engineers do not need to care about how the underlying systems run. On this basis, they can edit the logic of an algorithm process to complete its development, deployment, and online O&M. The platform has built-in, unified lineage management for computation and storage. Based on this lineage and on analysis of the algorithm logic descriptions, the platform applies a set of optimization layers for computation and storage. These layers automatically optimize the sharing of features, samples, and model data, as well as computation and storage. For example, when two algorithm experiment processes overlap heavily, the system automatically merges their feature computation and storage, improving the storage efficiency of the entire platform.

With the all-in-one modeling platform, more business innovations can be tried, and engineering and effect verification cost less. Users can therefore iterate rapidly from product ideas and algorithms to engineering implementation.
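
The merging rule in the example above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's actual optimizer: the Jaccard overlap measure, the 0.5 threshold, the feature names, and the functions `overlap_ratio` and `plan_feature_jobs` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def overlap_ratio(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two feature sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def plan_feature_jobs(
    pipelines: Dict[str, Set[str]], threshold: float = 0.5
) -> List[Tuple[List[str], Set[str]]]:
    """Greedily group pipelines whose feature sets overlap enough to share one job."""
    groups: List[Tuple[List[str], Set[str]]] = []
    for name, features in pipelines.items():
        for members, merged in groups:
            if overlap_ratio(features, merged) >= threshold:
                members.append(name)
                merged |= features  # in-place: the shared job now covers both sets
                break
        else:
            # No existing group overlaps enough, so this pipeline gets its own job.
            groups.append(([name], set(features)))
    return groups


# Hypothetical experiments: exp_a and exp_b share most features, exp_c shares none.
experiments = {
    "exp_a": {"user_age", "item_price", "ctr_7d"},
    "exp_b": {"user_age", "item_price", "ctr_7d", "cvr_7d"},
    "exp_c": {"query_len"},
}
for members, features in plan_feature_jobs(experiments):
    print(members, sorted(features))
```

Each resulting group stands for one shared feature computation and one stored copy. A real system would derive the feature sets from lineage metadata rather than from hand-written dictionaries.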

Batch-Online-Integrated Deep Learning Solution

As services pursue ever greater timeliness, online deep learning becomes increasingly important. Alibaba Cloud therefore provides a batch-online-integrated deep learning solution, which allows models to be updated in real time and lets users capture changes in their customers' behavior in real time.

What Is Batch-Online Integration?

Batch-online integration means that users can complete daily batch learning and online real-time learning with a single set of algorithm logic. This reduces the complexity of the algorithm development process and ensures consistency between full-data models and real-time models.

  • Thanks to the strong computing power of Blink, batch-online integration provides highly reliable real-time computing of feature samples at millions of QPS. Through X-Deep Learning (XDL), users can implement real-time deep training. Batch-online integration also provides highly reliable real-time model verification. It pushes model changes in real time to online model services based on RTP, the distributed estimation service of AI-OS, and can deliver end-to-end model updates within minutes. In the search, recommendation, and advertising businesses of Alibaba Group, batch-online integration plays a significant role in improving the timeliness of the system.
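
One way to picture "a set of algorithm logic" shared by batch and online learning is a single training function that accepts any iterable of samples, whether a stored daily batch or a live stream. The sketch below is a toy under stated assumptions: the logistic model, the sample format, and the names `LogisticModel` and `train` are hypothetical and do not reflect the XDL or Blink APIs.

```python
import math
from typing import Dict, Iterable, Tuple

# Hypothetical sample format: (sparse feature values, click label).
Sample = Tuple[Dict[str, float], int]


class LogisticModel:
    """Tiny sparse logistic regression trained with per-sample SGD."""

    def __init__(self, lr: float = 0.1) -> None:
        self.lr = lr
        self.weights: Dict[str, float] = {}

    def predict(self, features: Dict[str, float]) -> float:
        z = sum(self.weights.get(k, 0.0) * v for k, v in features.items())
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features: Dict[str, float], label: int) -> None:
        error = self.predict(features) - label
        for k, v in features.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.lr * error * v


def train(model: LogisticModel, samples: Iterable[Sample]) -> int:
    """Shared logic: identical code path for a stored batch and a live stream."""
    seen = 0
    for features, label in samples:
        model.update(features, label)
        seen += 1
    return seen


model = LogisticModel(lr=0.5)

# Daily batch learning over a stored list of samples.
daily_batch = [({"item:1": 1.0}, 1)] * 20 + [({"item:2": 1.0}, 0)] * 20
train(model, daily_batch)

# Online learning: the same function consumes a generator that stands in for a stream.
live_stream = (({"item:2": 1.0}, 1) for _ in range(50))
train(model, live_stream)
```

Because both paths call the same `update` method, the full-data model and the real-time model cannot drift apart through duplicated logic, which is the consistency property described above.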


XDL: A Training Engine for High-Dimensional Sparse Scenarios

The improvement of computing power in deep learning mainly involves two key aspects:

  1. The Efficiency of Deep Model Training
  2. The Efficiency of Online Estimation of Deep Models

Search, recommendation, and advertising are high-dimensional and sparse scenarios, characterized by tens of billions of features and hundreds of billions of parameters. Models in these scenarios are both wide and deep, so computation must be optimized along both width and depth at the same time.


XDL is a distributed training framework for deep learning designed for high-dimensional and sparse scenarios.
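
To make "high-dimensional and sparse" concrete: the feature ID space is enormous, but each mini-batch touches only a handful of IDs. A common way to handle this, sketched below, is an embedding table that creates rows lazily and updates only the rows a batch actually used. The class `SparseEmbeddingTable` and its methods are hypothetical illustrations, not XDL's API.

```python
import random
from typing import Dict, List


class SparseEmbeddingTable:
    """Embedding rows are created on first use, so memory tracks the IDs actually seen."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        self.dim = dim
        self.rows: Dict[int, List[float]] = {}
        self._rng = random.Random(seed)

    def lookup(self, feature_id: int) -> List[float]:
        if feature_id not in self.rows:
            # Small random initialization for a feature ID seen for the first time.
            self.rows[feature_id] = [
                self._rng.uniform(-0.01, 0.01) for _ in range(self.dim)
            ]
        return self.rows[feature_id]

    def apply_gradients(self, grads: Dict[int, List[float]], lr: float = 0.1) -> None:
        """Update only the rows that appeared in the current mini-batch."""
        for feature_id, grad in grads.items():
            row = self.lookup(feature_id)
            for i in range(self.dim):
                row[i] -= lr * grad[i]


table = SparseEmbeddingTable(dim=4)
# IDs are drawn from a space of tens of billions, yet only two rows are materialized.
table.lookup(73_000_000_001)
table.lookup(12)
table.apply_gradients({12: [1.0, 1.0, 1.0, 1.0]})
print(len(table.rows))
```

In a distributed setting the dictionary would be sharded across parameter servers, which is where the optimizations listed below come in.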

  • XDL applies a large number of distributed-model optimizations and redesigns the parameter server for high performance. By placing parameters dynamically based on real-time statistics, XDL eliminates computing hot spots on parameter servers, and it implements robust distributed disaster recovery policies. Together these make high-concurrency training more efficient.
  • It accelerates computing through many optimizations of the distributed computing graph, including high-performance data processing pipelines, sparse operator fusion, merged communication for parameter exchange, and extreme asynchronization of computing and communication. Those optimizations enable XDL to support distributed high-dimensional and sparse training with tens of billions of features, hundreds of billions of parameters, and thousands of data. In addition, automatic distribution and automatic pipelining allow model developers to focus on their model logic without worrying about the details of the underlying layer.
  • In terms of high-level training paradigms, XDL supports structural training and streaming training. Structural training exploits the structure of samples in search, recommendation, and advertising scenarios to greatly reduce forward and backward computation, which improves training efficiency substantially.
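
The first bullet mentions placing parameters according to real-time statistics to remove hot spots. The article does not describe XDL's actual policy, so the sketch below substitutes a simple, well-known heuristic: sort parameter blocks by observed access count and always assign the hottest remaining block to the least-loaded server. The function `place_parameters` and the block names are hypothetical.

```python
import heapq
from typing import Dict, List


def place_parameters(access_counts: Dict[str, int], num_servers: int) -> List[List[str]]:
    """Greedy balancing: hottest block first, always onto the least-loaded server."""
    # Min-heap of (current load, server index); ties go to the lower index.
    heap = [(0, server) for server in range(num_servers)]
    heapq.heapify(heap)
    placement: List[List[str]] = [[] for _ in range(num_servers)]
    for name, count in sorted(access_counts.items(), key=lambda kv: -kv[1]):
        load, server = heapq.heappop(heap)
        placement[server].append(name)
        heapq.heappush(heap, (load + count, server))
    return placement


# Hypothetical access statistics collected over a recent time window.
stats = {"emb_user": 900, "emb_item": 600, "emb_query": 300, "dense": 100}
print(place_parameters(stats, num_servers=2))
```

Re-running the placement as statistics change is what would make it dynamic. A production system would also have to weigh the cost of migrating parameters between servers, which this sketch ignores.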

RTP: Distributed Estimation Service

As the distributed estimation service for deep learning in AI-OS, RTP modularizes the online estimation capability of machine learning and provides powerful orchestration capabilities for model applications. As a result, machine learning can be applied throughout the online service procedure of search, recommendation, and advertising, including recall, fine-grained ranking, shuffling, and summary selection.

RTP also supports switching of full data for distributed models and distributed features with consistency semantics, and it can serve TB-level models online.
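
The orchestration idea, where estimation is a reusable module that can be plugged into any stage, can be sketched as a pipeline of stages that each take a scoring function. The names `make_stage` and `run_pipeline`, the item fields, and the two stand-in scoring functions are assumptions for illustration and are not RTP interfaces.

```python
from typing import Callable, Dict, List

Item = Dict[str, float]
Stage = Callable[[List[Item]], List[Item]]


def make_stage(score: Callable[[Item], float], keep: int) -> Stage:
    """Wrap any scoring model as a stage that keeps its top `keep` items."""
    def stage(items: List[Item]) -> List[Item]:
        return sorted(items, key=score, reverse=True)[:keep]
    return stage


def run_pipeline(stages: List[Stage], items: List[Item]) -> List[Item]:
    """Apply the stages in order, each narrowing the candidate list."""
    for stage in stages:
        items = stage(items)
    return items


# Hypothetical candidates with precomputed scores standing in for two models.
candidates = [
    {"id": 1, "cheap": 0.9, "deep": 0.20},
    {"id": 2, "cheap": 0.8, "deep": 0.70},
    {"id": 3, "cheap": 0.1, "deep": 0.99},
    {"id": 4, "cheap": 0.7, "deep": 0.50},
]
recall = make_stage(lambda item: item["cheap"], keep=3)    # light model, wide cut
fine_rank = make_stage(lambda item: item["deep"], keep=2)  # heavy model, narrow cut
result = run_pipeline([recall, fine_rank], candidates)
print([item["id"] for item in result])
```

Because every stage has the same shape, adding a shuffling or summary-selection step is a matter of appending another stage to the list.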

  • The online inference efficiency of deep learning models is crucial because models must perform a large amount of deep learning computing within strict latency requirements. RTP therefore integrates a specialized acceleration engine for heterogeneous inference computing. This allows RTP to support various heterogeneous hardware, such as FPGA, CPU, GPU, and Alibaba NPU, and to provide strong computing power for deep learning businesses.
  • The delivery process from offline-trained models to online estimation models matters as well. During delivery, models are quantized and compressed, and the model computing graph is rewritten to maximize online inference efficiency.
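
As an illustration of the quantization step mentioned in the last bullet, the sketch below applies symmetric int8 quantization to a list of weights. The article does not say which scheme RTP uses, so the symmetric per-tensor scheme and the names `quantize_int8` and `dequantize` are assumptions.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale."""
    # Fall back to a scale of 1.0 when every weight is zero.
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Rounding bounds the per-weight error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, max_error <= scale / 2 + 1e-12)
```

Storing one byte per weight instead of four cuts model size to roughly a quarter, at the cost of the bounded rounding error checked above.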



To support fast algorithm iteration, Taobao built an end-to-end algorithm platform in an offline closed loop. This platform allows algorithm solutions to be quickly replicated and migrated between different scenarios. Built on the core training engine and estimation engine, the platform is deeply optimized for the high-dimensional and sparse scenarios of search, recommendation, and advertising. It also allows AI algorithms to make full use of computing power to maximize algorithm results.
