During the 2020 Double 11 Global Shopping Festival, Alibaba’s cloud-native real-time data warehouse was first implemented in core data scenarios and reached a new record for the big data platform. This data warehouse is built based on Hologres and Realtime Compute for Apache Flink, and has set a new record for the big data platform. This article focuses on the best practices of Hologres in Taobao's marketing analysis scenarios. The article also discusses the technical challenges behind the first-time implementation of Apache Flink + Hologres stream-batch unification on the marketing analysis screen during Double 11.
Big promotions are crucial for business operations and user number growth for Taobao business operations. Marketing analysis services are the core data services used for decision-making and operation instructions during big promotions. They cover the analysis works of the whole process before, during, and after the promotions. Different requirements for data timeliness and flexibility must be met in different phases.
The traditional marketing analysis is based on conventional real-time offline data systems and FW product architectures. Many problems have been exposed in large and small activities in the past, and the core problems can be classified into three types:
Therefore, it is increasingly important to respond to the changing business requirements quickly and to process data efficiently during activities. The upgraded next-generation architecture for marketing analysis should have the following advantages:
Based on the background information listed above, it is necessary to reconstruct the current architecture and find alternatives to solve the business pain points. After a lot of attempts, the Apache Flink + Hologres + FBI (a visual analysis tool of Alibaba) solution was chosen to re-architect the architecture for the Tmall marketing analysis service.
After deeply analyzing business requirements for data and exploring data models multi-dimensionally, the overall technical framework for marketing analysis reconstruction is finally determined, as shown in the figure below. The core points of the framework include:
The following section will introduce the core technologies of the technical solution: stream-batch unification, Hologres, and FBI.
The following figure shows the traditional architecture of data warehouses. The core issues of the traditional data warehouse architecture are listed below:
The following figure shows the stream-batch unified architecture of data warehouses. The upgraded architecture has the following core features:
The stream-batch unification data service based on Apache Flink has made a breakthrough in marketing analysis scenarios and has withstood severe stability, performance, and efficiency tests. During 2020 Double 11, the peak value of records processed by Flink reached 4 billion per second, and the data volume reached 7 TB per second. High stability was achieved in the entire Flink streaming and batch processing tasks during that time. The entire procedure was free of problems in terms of capacity, single points of servers, and network bandwidth.
The entire data layer is unified through the stream-batch unification data architecture. Another product is also needed to unify the entire storage layer. This product needs to support high-concurrency writes, real-time queries, and OLAP analysis.
In the architecture of the earlier versions, each page module involved the data query of one or more databases, such as MySQL, Apache HBase, and AnalyticDB 3.0 (the original HybridDB). Most real-time data was stored in HBase because HBase supported high-concurrency writing and high-performance point query. With the easy management and simple query of MySQL tables, dimension table data and offline data is usually stored in MySQL. In addition, the data used by some product modules have a small volume and multiple dimensions, such as marketing method data. AnalyticDB is selected as the database for OLAP multi-dimensional analysis. As a result, we had two pain points: the separation of real-time data from offline data and the disordered management of multiple databases and instances.
The new marketing analysis service is constructed to achieve unified storage, reduce O&M costs, and improve R&D efficiency. It also aims to realize high performance, high stability, and low cost.
After comparing multiple products, Hologres was selected for the marketing analysis service. Hologres is an all-in-one real-time data warehouse and is compatible with the PostgreSQL 11 protocol. It seamlessly connects to the big data ecosystem and supports the high-concurrency and low-latency analysis and has the capability to process PB-level data efficiently. It can easily and economically use existing BI tools to perform multi-dimensional analysis and business exploration of data. Hologres is extremely advantageous in complex business scenarios.
Based on the in-depth analysis of the modules in the marketing analysis service and the business-side requirements for data timeliness, specific real-time procedure solutions are made for these modules.
Based on the point query and OLAP analysis required for marketing analysis, dt-camp
and dt-camp-olap
libraries are created for Tmall marketing analysis. Some historical data during the marketing activity needs to be stored in the dt_camp
library for a long time to make comparisons of different activities. The overall data volume is nearly 40 TB. What is stored in the OLAP library is the detailed data of the marketing methods, and the overall data volume is nearly 100 TB. As marketing methods require very high accuracy of the overall data, the inaccurate query method is not used. This puts forward higher requirements on the overall query performance of the database.
To improve the overall performance of Hologres, the following optimization strategies have been implemented for its database based on marketing analysis:
count(distinct user_id)
statement, the user_id is set as the distribution key. This way, the count distinct is performed parallelly in each shard of Hologres. This prevents large amounts of data from being shuffled and significantly improves query performance.GROUP BY
operation to reduce the cost of COUNT DISTINCT.UPDATE
operations on column storage tables. In Hologres, the UPDATE
operations find the corresponding uniqueid
based on the specified primary keys (PK). Then, they find the corresponding record mark for deletion based on the uniqueid
and finally insert a new record. In this case, if an incremental segment key can be set, the file can be located quickly based on the segment key during querying. This improves the record locating speed based on the PK, as well as writing performance. The peak writing traffic in the system stress testing of marketing analysis can reach 8 million records per second based on UPDATE
operations.Hologres performed stably during 2020 Double 11. The peak writing speed of point query in Hologres reached several hundreds of thousands of records per second, while millions of records were served per second. For OLAP, the peak writing speed reached four million records per second, and five million records were served per second. Meanwhile, the execution time of more than 99.7% of single point queries and OLAP queries was less than one millisecond. Therefore, the overall system performance was stable throughout Double 11 and supported fast point queries and OLAP analyses simultaneously.
FBI is the preferred data visualization platform in the Alibaba ecosystem. It can quickly create reports for data analysis and support fast access and expansion of multiple data sets. It also supports the construction of various analytical data products through the product construction feature.
During the core process of product construction in FBI, four core functions can be used to reduce the construction cost:
For the underlying data after stream-batch unified processing, users demand the flexible analysis of real-time data, real-time comparison, and hourly comparison. To meet users' demands, FBI abstracts a standard data model featuring online-offline unification. This model allows accurate comparison of real-time data. For trend analysis, minute-based trends are automatically routed to minute-based tables, and hour-based trends are directly routed to hour-based tables.
Complex metrics in FBI, such as channel proportion, category proportion, year-on-year contribution, and cumulative volume of activities, were all defined with complex SQL statements in the previous version. As a result, the SQL statements were too long, and the stability and maintainability of the product were also reduced. To solve this problem, FBI built a set of analytical DSL that is easy to learn and understand, called FAX functions. FAX functions include more than 20 analysis functions, such as year-on-year balance, contribution rate, and activity accumulation. Simple statements can be used to define a variety of complex metrics used in marketing analysis.
Product page construction is a crucial process. FBI's method to simplify user configuration is listed below:
To further ensure the quality of the marketing analysis service, the test side strictly compares and verifies data at the DWD layer, the DWS layer, and the product layer. A full range of monitoring is also carried out on the core data of the big promotion.
During 2020 Double 11, the test and inspection feature improved the ability to actively identify data problems and core issues in a timely manner. The quality and stability of the whole data product improved significantly.
During 2020 Double 11, the stream-batch unified marketing analysis service based on Apache Flink + Hologres supported an average of hundreds of PV access per capita for thousands of Tmall listings. The service also achieved zero P1/P2 failure. At the same time, several advantages of the service were shown during the activity compared with previous years:
Even though it was tested by 2020 Double 11, it is still necessary to continuously improve the technologies to deal with more complex business scenarios:
Second-Level Response Time Achieved in Fliggy’s Double 11 Real-Time Data Big Screen by Hologres
Hologres Supports AliExpress Upgrades of Real-time Data Warehouse during Double 11
Hologres - July 13, 2021
Hologres - May 31, 2022
Rupal_Click2Cloud - December 15, 2023
Alibaba EMR - March 18, 2022
Alibaba Cloud MaxCompute - January 22, 2021
Alibaba Cloud MaxCompute - December 6, 2021
Get started on cloud with $1. Start your cloud innovation journey here and now.
Learn MoreAlibaba Cloud provides big data consulting services to help enterprises leverage advanced data technology.
Learn MoreConduct large-scale data warehousing with MaxCompute
Learn MoreMore Posts by Hologres