Pre-computing technology + Hologres acquisition traffic analysis product practice

1、 Relation between precomputation and Hologres

As a query acceleration team, our vision is to build a low-cost, high-performance inclusive query acceleration engine that can support A+traffic analysis, business advisors, Huang Jince, Youmeng, FBI/QBI and other products with massive data interaction analysis. When the cost is controllable, AQP, FastMap precomputation and other technologies are used to greatly improve the query performance of massive data and reduce the response time to seconds or milliseconds.

The data product side at the top of the figure above supports such platforms as A+traffic collection analysis platform, Huang Jince (Alibaba's internal operation platform for refined population), Circle People (analysis of population characteristics), business advisors (analysis products for commercial data of businesses and stores), FBI/QBI (Alibaba's internal BI products and cloud BI products) and Youmeng data analysis platform. The middle layer is the acceleration layer, and the storage and computing under it are implemented by Hologres.

Upward pre calculation is OLAP online analysis processing. According to the engine, OLAP can be roughly divided into two categories: relational model - there is no data redundancy and preprocessing mechanism, which is more suitable for large data analysis such as Spark, Presto, etc; Multi dimensional model - Cbube model is formed in advance according to dimensions and measurements to achieve acceleration, which is more suitable for OLAP scenarios with large data volume and high performance requirements.

The traditional way to query the goods with the highest sales volume of cats on Double 11 is as follows: first, find out the data on Double 11; The second step is to aggregate commodities according to data. If there are 1 billion pieces of data, you need to aggregate them according to 1 billion pieces. The query time root is linear with the data volume.

If the sales volume can be calculated according to the time and commodity dimensions, the query will no longer be related to the data volume. The core of precomputation is to break the rule that query time increases linearly with the amount of data.

The result data of pre calculation needs a carrier to store and calculate, because pre calculation itself is still the dimension of the data model. Hologres is a database dimension, mainly used for data distribution, data storage method, index and query execution plan.

Pre calculation and Hologres are optimizations in different dimensions and do not conflict. Precomputing also requires a powerful operating carrier for storage and calculation, and its natural Cube model can play a more important role in Hologres.

From the perspective of query acceleration platform, Hologres is selected for the following three reasons:

First, operator ability. A lot of data will be turned into bitmap in pre calculation, so a strong bitmap engine is required. Secondly, Bitmap cannot solve all problems, such as the funnel of the conversion cycle within minutes, and the high base scenario of the amount frequency, but Hologres provides a special solution for it.

Second, unified query. The analysis platform has not only self-service analysis, but also high QPS scenarios such as Kanban that require millisecond level return. We hope to use a set of architectures to implement different analysis scenarios.

Third, cost based integration architecture. Hologres provides hot storage, cold storage, appearance and other solutions to help enterprises reduce costs and improve efficiency.

2、 Introduction to A+flow analysis

The A+traffic analysis platform is a unified global traffic analysis platform of Alibaba Group. With pages, small station activities, and apps as the entry points, the A+traffic analysis platform constructs macro overview data through buried point collection and calculation, such as pit effect, category conversion, transaction path analysis, and user segmentation. It is committed to creating a traffic data analysis loop, helping businesses find traffic problems, and improving traffic conversion.

For example, if you want to analyze the traffic of the lower left app page, you can view its PV, UV, average dwell time, direct guidance interface, full guidance PV indicators, etc.

In order to better integrate with the pre calculation model, we convert the indicators into pre calculation measurement forms. For example, UV is equivalent to weight removal, average residence time is equivalent to abg, and direct guidance interface is equivalent to four operations.

The analysis models generated for the demand in the traffic analysis field are event analysis, retention analysis, funnel analysis, transformation analysis, and path analysis.

Event analysis is the most important and extensive analysis method in the field of traffic analysis and solution. The core is to conduct multi-dimensional analysis around events. Events refer to a series of behaviors triggered by users during the use of products. In principle, events can be divided into several categories, such as browsing, clicking, and exposure. Alibaba Group uses the four segment SPM model to uniquely locate the site, page, block and location where the user triggered the event.

The event report interface is also relatively simple, such as the number of users browsing the event, user UV grouping, brand, time, etc. In order to combine with the pre calculation, we correspond the indicators with the pre calculation model. The number of users is equal to the number of duplicate users, the page and time are equal to the group by, and the brand filtering is equal to the dimension filtering.

Retention essentially has initial behavior and subsequent behavior. The analysis entity on the interface is the logged in user or device. When defining retention, you need to define two behaviors, which are calculated according to the time. For pre calculation, login users correspond to de duplication indicators, and filtering dimensions correspond to aggregation dimensions of different behaviors.

3、 Key business Hologres solution

At present, we can support a data level of less than 1t per day. The analysis granularity can query weeks, days and months. In this context, the problems are mainly as follows: First, performance problems. It takes several minutes on average to remove duplicate indicators, and the user experience is poor; Second, data error. Unable to meet customer requirements for data accuracy.

Event analysis is to calculate the cardinality of the user ID in the condition, and retention is to calculate the cardinality of the two behavior sets. It is easy to associate the intersection of two sets and the radix calculation with the data structure bitmap. The 0 or 1 of the bitmap corresponds to the existence or non existence of the person respectively. Secondly, bitmap is a sequence composed of 0 and 1, which can represent a set of people with certain conditions.

Therefore, we can use the high-performance computing of bitmap to solve the problem. But there are still two problems: first, although the original bitmap is only 1 bit, if there are 1 billion people, the data volume is still large; Second, with Bitmap, you also need storage for computing.

Therefore, we introduced Hologres Roaring Bitmap. Roaring bitmap can be understood as a special structure of bitmap with compression. Hologres Roaring bitmap is a distributed Roaring bitmap implementation.

On storage, Hologres implements Roaring bitmap data type. When creating tables, you can create bigint types, varchar types, and bitmap types. The bitmap level intersection, merge and difference operation is implemented on a single node. In addition, Hologres has implemented a set of distributed bitmap execution plans, so it also has a high-performance bitmap finger engine.

As shown in the above figure, different dimensions are distinguished by color, but the dimensions within the color are consistent. For example, yellow represents the person who browsed the AB page on September 1, and green represents the person who exposed the AC page on September 2. What should I do if I change the structure to Bitmap?

First, make the user ID have a unique subscript in the bitmap. For example, 1001 is 01004 is 4. The resulting bitmap structure is shown below. The same dimension has natural aggregation. The green representative visited the exposure of the AC page on September 2.

The above figure shows the architecture based on Hologres Roaring Bitmap.

On the left is DataWorks on the cloud, which is responsible for scheduling and is scheduled to the query acceleration engine system every day. Metadata is stored in the engine. MaxCompute is responsible for transforming business data into Bitmap data and importing it into Hologres. The Pangu strategy under Hologres has extremely fast import performance. Hologres is an MPP structure, which needs to be grouped according to the bitmap itself. It keeps self increasing in a certain segment, which also means that the number of people in the group is always fixed.

The right side of the above figure is the logical view of storage. Different groups have the same dimensions.

Suppose you want to query all UV values on September 1. First, the business layer sends SQL to the Hologres front-end node to generate an execution plan. Hologres is segmented by group, so the group is parallel. On September 1, there were two data. First, the Hologres index was used. After the two data were indexed, the rb was performed_ or_ Agg, make and operate Bitmap, and then call rb_ Cardinality function is used to calculate the cardinal number of the first group. The sixth group is the same. All groups are parallel, and only the merge big int value is required, which is very fast.

The retained calculation logic only needs to add one step, because it needs to be handed over to No. 2 data for retention. First, join, then rb_ and_ Cardinality operation. The parameters are two bitmaps. Finally, count is calculated.

After the improved query, the self-service analysis of dozens of T-level UV indicators takes only 5 seconds on average.

Circle people are also the application scenarios of Hologres Roaring Bitmap. In essence, it is also the intersection, merger and difference between bitmaps, but the result is not a single number, but a bitmap. Hologres can support SQL, so you can insert and select bitmaps, and naturally enclose people in Hologres after calculation. Subsequent portraits are also bitmaps in essence.

The QPS of the analytical scenario is not high, but in the business advisor scenario of Alibaba, one requirement is simple merger and difference. It needs to be analyzed, but cannot be made into KV. Therefore, we skillfully used non bitmap grouping, and used business material grouping plus Hologres to achieve 1000+QPS for simple analysis scenarios.

Hologres provides the ability to generate bitmaps in real time, and can import parts lists into Hologres. During query, bitmap is generated in real time and cross merge and offset is performed.

Building a unified query mainly comes from the following scenarios:

① Non accelerometer and precomputation table exception library. Pre calculation cannot solve 100% of the customer's problems, only the pain points. Therefore, we also need a very fast OLAP engine to support us.

② Pre calculation and wide table can give full play to Hologres performance. The wide table is directly placed in the OLAP engine. If the upper layer query is very free, the distribution and index cannot be determined. The pre calculated natural cuboid will become a table splitting form, which will be combined with Hologres to create higher performance.

③ Analytical scenarios include not only self-service analysis, Kanban analysis, high QPS and high performance scenarios, but also Hologres.

④ Precalculation category problem. Real time update scenarios. For example, if the dimension table needs to be updated in real time and the original dimension table cannot be imported, cross database problems may occur, so it needs to be in one database.

The figure above shows the data production process from left to right. The left side is the original table, and the middle platform is the model cube dimension. The table dimensions are city, event, page, and date. If you need to find a pv, Cuboid is 2n. After the Cuboids are generated, they are aggregated into the Hologres logical structure. One Cuboid corresponds to one table, which is naturally separated.

Cuboid naturally reduces dimensions. When distributing, you can use common queries as distribution keys. This layer determines the complete distribution of queries. The second step is to use the Hologres segment key field, which is usually a time sequence field with a date. The date determines the fast search of files in the shard. Then configure the Cluster key to the high-frequency filter index to quickly find the block. Finally, configure the non high frequency filter through the bitmap columns to quickly find data in each block using the bitmap index.

The average RT of the above query on the pre calculation self-service analysis platform is less than 1 second, and the KV index RT is about 10 ms, which can support millions of QPS.

Cost is the focus of enterprises. As a query acceleration platform, we maintain many types of tables, such as the lowest level detail table, sub table, sampling table, etc. The business system will make sub tables by itself, but general products such as BI products will not help the business to do sub tables. Tabulation is also a requirement in the field of OLAP query acceleration, and may be sampled through near real-time calculation.

The query types and service life of multiple tables are different, which is closely related to the cost. Hologres provides hot storage, cold storage, surface and other storage media, which reduces performance in turn and increases cost in turn. It can be freely selected according to the analysis requirements of the actual business table and the query requirements of the time cycle.


Q: If the Bitmap group is a 500 million yuan group, if the people in the first group are browsing and the people in the second group are paying in advance, how can the people in the two groups correspond?

A: Bitmap storage bits have no relationship with table dimensions. For example, the fifth person in the sixth group is always the fifth person.

Q: Is cold data stored in MaxComputer in the form of Hologres?

A: No. The difference between cold data and appearance is that the appearance is external storage, such as storage in MC, but the cold data is Hologres' own, only implemented through OSS.

Q: How are hot and cold data used?

A: Direct use. The cold storage data is managed by Hologres, but stored in the OSS in Hologres. SSDs are used for hot storage. The storage format is still Hologres' own internal format with indexes. In terms of usage mode, there is a partition table on which a policy can be set. You can specify a retention period of 7 days or 14 days. All data partitions over 14 days will be transferred to cold storage as a whole.

Q: When is it recommended to create partition tables in Hologres?

A: Partition tables are not required for pre calculation scenarios. If precomputation is not implemented, it is not recommended to create partitions for small data volumes, because more small partitions will result in more fragments and more files. In addition, it is recommended to create partitions for scenarios such as daily regular query of partition data or daily replacement. In general, it is recommended to build zoning if it is more than 100 million.

Related Articles

Explore More Special Offers

  1. Short Message Service(SMS) & Mail Service

    50,000 email package starts as low as USD 1.99, 120 short messages start at only USD 1.00

phone Contact Us