Dolphin intelligent computing engine based on Flink+Hologres practice

When users open Taobao App, there will be two types of requests in the background. The first type is natural recommendation to meet users' demands, and the second type is advertising recommendation to meet users' and businesses' comprehensive demands.

For example, when Kaitaobao sees a brand, it is because after the brand logs in with Alibaba Mama's marketing product, it will circle people for advertising, and the people who are selected will see the advertisement.

The main goal of merchant side marketing products is to serve advertisers and help advertisers to put people in, so as to improve business results. Such marketing products cover a wide range of scenarios, including crowd selection, insight analysis, lookalike, crowd recommendation and other scenarios.

These scenarios will have the basic capability requirements of OLAP analysis, AI algorithm and real-time feature computing. Based on such a comprehensive capability requirement of data+algorithm, Ali's mother has developed Dolphin engine by herself.

Dolphin engine is a super fusion engine integrating AI analysis. It has the capabilities of OLAP analysis and calculation, streaming stream calculation, batch calculation and AI algorithm calculation. These capabilities are based on SQL components and Index Build components.

The main capabilities of SQL components are SQL escape, routing, load balancing, and federated query. The Index Build component is mainly responsible for intelligent indexing, multi-level indexing, etc.

Dolphin engine mainly solves two problems: one is the performance problem of using general computing methods in large-scale scenarios; The second is to reduce the cost of using the engine for the business side, or even make it insensitive to the underlying engine., Many general engines cannot directly solve the performance problem of the business, so data needs to be indexed to achieve query optimization. In addition, Flink Hologres has certain learning costs. Therefore, we have built this set of business engines on Flink and Hologres, making it easier for users to use Flink and Hologres, and even unaware of the existence of Flink and Hologres engines.

Dolphin engine provides such functions as self research index, intelligent materialization, intelligent index selection, heterogeneous data source query and approximate calculation.

Intelligent materialization refers to the ability to automatically convert SQL into materialized views without manual operation. The implementation method is to analyze the business history query SQL. For example, which advertisers use data more frequently in which time periods, you can choose to materialize high-frequency usage data. The intelligent index will analyze the SQL query statements, judge the condition hit rate, and recommend different indexes. The recommended goal is to maximize the filtering amount of the index for data queries. Approximate calculation is mainly used to calculate the approximate results of large-scale data.

At present, our business scale is 2w+core, with 200 million requests per day and 3000+QPS, supporting millions of advertisers and petabytes of data.

In addition, the Dolphin engine is a business engine built on Flink+Hologres, so it can support other businesses very efficiently. Many businesses can directly use Flink+Hologres by directly connecting with the Dolphin engine. The cost of using it is very low. It externally supports 10 companies ->+U in Alibaba Group, and supports the ability of crowd selection and insight analysis.

Our requirements for the underlying engine are high performance and scalability. If the performance of the underlying engine is not high, the effect will be reduced when the upper layer makes further optimization; At the same time, the underlying engine must be scalable enough so that the upper layer can make more improvements and extensions based on the underlying engine and undertake new business scenarios. Flink and Hologres can meet the above two requirements.

The real-time engine Flink supports low latency, automatic tuning, and stable and mature technology. When the user submits SQL on the Dolphin Streaming platform, the SQL will be translated into Flink SQL and submitted to the Flink engine. Here, the automatic configuration and tuning function of Flink autoscale will be used to automatically adjust, reducing the user's learning costs and tuning and maintenance costs for Flink.

Dolphin engine and Hologres have achieved a lot of co construction and growth in the past four years. In 19 years, Dolphin engine and Hologres jointly built the bitmap computing power to achieve the largest scale, highest performance and lowest use cost in the industry's public information; In the past 20 years, it has jointly built a ten million level crowd center to achieve unified management of advertising crowd; In the past 21 years, Dolphin engine has supported the algorithm business scenario, using Hologres vector computing capability to support the algorithm business, mainly supporting rough sorting of recall calculation links in the recommendation algorithm; In the past 22 years, it has supported the real-time feature development capability of algorithms. Here, it is applied to Hologres' real-time writing and spot checking capabilities to achieve more efficient real-time development. Finally, all capabilities are integrated to form a super integration engine capability.

The core of OLAP computing is to solve large-scale problems. The advertising scenario has all the data of the whole group. In order to enable advertisers to have a better user experience, it needs to support more complex computing and faster speed. The business challenges we face include dozens of single SQL table joins, the maximum single table row size of one trillion, the maximum single table base of one million and the daily updating of ten thousand level labels.

There are two tables, the user base table (gender and age) and the user store table (users and store types). You need to query the number of 20 and 30 year old users who have visited a brand. If the amount of data is small, the general engine can quickly get results. However, if SQL involves dozens of join tables, and there may be trillions of tables, the general computing engine cannot complete the calculation in this case, so we have jointly built a set of bitmap computing scheme.

The query process of the scheme is: the user enters the logic to execute SQL, and the Dolphin engine translates the logic to physical SQL, and then transfers it to the Holo Master for execution.

The index building process of the solution is: MaxCompute preprocesses the tag data, builds it into a bitmap index, and then accesses it to the Hologres cluster to achieve the bitmap query.

Bitmap query has better performance and storage, QPS 200+, and average query performance of 100 milliseconds, meeting the user's demand for interactive analysis.

Open API can directly call Flink to submit jobs, suspend jobs, etc. in the form of service interface calls. In order to reduce the learning cost of users and improve the development efficiency, Dolphin has done a lot of practice with OpenAPI and developed a real-time development platform for Dolphin Streaming.

DolphinStreaming encapsulates Hologres and Flink to expose a simpler development interface, Dolphin SQL, to users. To submit Dolphin SQL, users only need to develop real-time jobs as if they were using a database. When using a database, users can directly create a table and then write to it, without considering storage, configuration authentication information, token information, etc.

The user submits Dolphin SQL on Alibaba Mama's interactive R&D platform, processes it through Dolphin Streaming, performs SQL parsing, translates metadata management, sends SQL to Flink through OpenAPI, and pulls up the job for execution. After execution, the data will be written to Hologres, and then the features written to Hologres will be directly queried through Dolphin SQL. The whole process is very smooth and simple.

Demo1: Calculate the last 50 user behavior sequences

The last 50 behavior sequences of users are common features in the algorithm sequence model. Generally, behavior sequence features need to be developed. If Dolphin Streaming is used for development, it only takes three simple steps:

Step 1: Define the data source table. Biztype can be directly filled in as tt, which is Alibaba's real-time data source. If you want to write Flink SQL here, you need to log in to the tt management platform, query topic, and subscribe to the subID.

Step 2: Define the output table. Biztype=feature means to write to Hologres, and then fill in the PK parameter.

Third, define the calculation logic. After the SQL is executed, the data is written continuously through select user_ id, product_ id from ** where user _ Id=* * to query user characteristics.

Demo2: Real time Debug function.

During real-time development, you often need to view the upstream data source. In the past, you usually need to define a print output source, then define the input source and execution logic, write the data to the standard output, and then view the log to obtain the upstream data source. And we implemented a simpler approach.

After registering a table in the form of create table, execute select user_ ID is from a table, and the results directly show the details of the table.

When using the marketing platform, advertisers often have no idea how to promote and through which promotion channels. We solved the problem for advertisers from the perspective of algorithm. When the advertiser clicks on certain information and points, judge their intentions, and recommend more effective babies and channels for them by combining their intentions with the advertiser's own store and baby information.

Based on the above requirements, we developed a set of jobs to capture the real-time behavior characteristics of users. The real-time behavior log of merchants is calculated through Flink. After it is stored in Hologres, the online model directly reads the features, and the recommendation effect of the model is improved through the real-time features.

This scheme mainly uses Flink real-time computing, Hologres real-time writing and row table spot checking capabilities, which improves the overall development efficiency by more than three times.

Everything in the algorithm can be represented by vectors. Especially in the recommendation algorithm and recall process, vector recall is often used to obtain Top K similar objects. Here, Hologres' vector recall capability is used.

Lookalike is an important ability of advertising products. Its core is to select similar people based on the characteristics of the seed population to bring new stores. The principle is to analyze the characteristics of users who have already acted in the store, find users with similar characteristics, and promote the baby to such users.

The implementation process of vector based Lookalike algorithm is as follows:

The first step is to create a seed audience for advertisers.

The second step is to calculate the center vector based on the seed population, and then recall users similar to Top K from the overall users through the center vector.

Under the traditional algorithm, Dolphin will use the traditional database without vector recall capability to import data into the database. First, Dolphin queries the database and calculates the seed population center vector. Then find out the Top K vector through Faiss. The overall operation, maintenance and management cost of the traditional scheme is high, so we have upgraded it. Directly use Dolphin to call Hologres, because Hologres can support both database functions and vector functions, simplifying the calculation process.

Based on Hologres' vector recall capability, we have developed real-time vector recall. Users can directly enter SQL to call the underlying Hologres. Dolphin encapsulates disaster tolerance, load balancing and other important capabilities. Simply fill in the parameters to complete the calculation of batch vector recall.

Based on the powerful capabilities of Flink+Hologres, we can build a Dolphin engine that is closer to the business field, mainly including high-performance OLAP computing based on bitmap, more simple and flexible real-time development capabilities, and the powerful AI vector recall capabilities based on Hologres.

In the future, we will continue to explore intelligence and integration, and constantly improve the user experience.

Q&A

Q: Is the optimized vector recall also based on Faiss?

A: No, it is based on the vector recall library Proxima developed by Alibaba.

Q: Is Proxima better than Faiss in performance?

A: On the whole, the performance of proxima is better. Hologres is chosen here not because of the performance of proxima, but because the Hologres technical solution architecture is simpler and the operation and maintenance management cost is lower.

Related Articles

Explore More Special Offers

  1. Short Message Service(SMS) & Mail Service

    50,000 email package starts as low as USD 1.99, 120 short messages start at only USD 1.00

phone Contact Us