Interpretation of the Concept of Analysis Service Integration

Analysis service integration is an important capability innovation of Alibaba Cloud's integrated data warehouse.

Analysis is the process of making decisions through data. Common forms include multi-dimensional analysis, exploratory analysis, interactive analysis, and ad hoc analysis. They are typically used in internal business reports, leadership cockpits, indicator library platforms, and similar fields, and they are good at handling complex, flexible queries. Service is usually a term from the TP (transaction processing) field: it supports the high-performance, high-QPS data read and write requirements of online services and places high demands on SLA, availability, and latency. It differs from TP, however, in that its transactional requirements are weaker than its throughput and system-level requirements. It usually serves 2C (consumer-facing) scenarios, including online recommendation, online marketing, and risk control.

The underlying data sources of the analysis and service scenarios are unified and even support each other: service data is used for secondary analysis, and analyzed data can in turn be used for online services. An integrated architecture for analysis and services simplifies data exchange between systems and improves development efficiency.

What is an efficient, high-quality, and reliable real-time data warehouse?

In the field of processing, real-time data warehouses are becoming more agile, with more lightweight processing, more real-time data, and weaker data tiering. In the field of services, more and more big data teams directly serve the company's online business, moving from a cost center to a profit center, and must ensure the stability and efficiency of that business. In the field of architecture, the integration of analysis and services improves development efficiency and reduces operation and maintenance costs.

In response to trends in various fields, we have made a series of capability innovations.

The traditional Lambda architecture is "complicated": the pain of data warehouse construction

Traditionally, building a sound real-time big data warehouse is a complex project. It is usually based on the Lambda architecture, with a real-time processing layer, an offline processing layer, and sometimes a near-real-time processing layer. Data storage is divided into offline storage and online storage according to access characteristics, and the online part is further subdivided into an OLAP system and a KV system, which provide flexible analysis and high-performance online queries respectively. On the application side, most online systems are accessed through APIs, and most analysis systems are accessed through SQL. Different systems are connected to different storage engines.

This architecture works when business changes are few and data quality is high, but reality is often more complex: business changes keep getting more agile, and data quality cannot be guaranteed. Routine adjustments to data structures, data quality corrections, and backfills remain frequent and time-consuming tasks.

In addition, the architecture contains multiple sets of storage. Repeatedly synchronizing data between them makes business agility impossible, and IT engineers spend a lot of time on data troubleshooting and correction.

All in all, the silos in the architecture inevitably lead to difficult data synchronization, high resource consumption, high development costs, and difficulty in recruiting talent.

Data processing agility

There are two core points to making processing agile: one is to simplify state storage and reduce data redundancy, so that data development and correction only need to be carried out on one copy of the data; the other is to make the processing pipeline more lightweight.

In terms of state storage, Hologres provides good write and update capabilities for both real-time and batch workloads, supporting everything from flexible single-record updates to batch refreshes of hundreds of millions of records. A unified state layer can therefore be built on Hologres to reduce data movement.

Processing is divided into common layer processing and application layer processing. Common layer processing uses Flink together with Hologres Binlog to make the whole pipeline from ODS to DWD to DWS event-driven, covering both data writing and processing. At the application layer, encapsulating business logic reduces the number of intermediate tables to manage, and Hologres' distributed query capability gives the business layer good analytical flexibility, handing flexibility back from the engineer to the analyst.
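The event-driven flow from ODS to DWD to DWS can be illustrated with a small, self-contained sketch. It is a toy in-memory model, not the Flink or Hologres API: the `Table` class, its `subscribe` method, and the table and field names are all hypothetical, and the change callback only stands in for the role a binlog subscription plays. It shows how a correction written once at the ODS layer propagates through the downstream layers without a manual backfill.

```python
from typing import Callable, Dict, List, Optional

# Handler signature: (old_row or None, new_row). Hypothetical, for illustration only.
ChangeHandler = Callable[[Optional[dict], dict], None]


class Table:
    """Toy keyed table that notifies subscribers on every upsert (a stand-in for a binlog)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, dict] = {}
        self._subscribers: List[ChangeHandler] = []

    def subscribe(self, handler: ChangeHandler) -> None:
        self._subscribers.append(handler)

    def upsert(self, key: str, row: dict) -> None:
        old = self.rows.get(key)
        self.rows[key] = row
        for handler in self._subscribers:
            handler(old, row)


ods = Table("ods_orders")        # raw records
dwd = Table("dwd_orders")        # cleaned detail records
dws = Table("dws_city_amount")   # per-city summary


def ods_to_dwd(old: Optional[dict], new: dict) -> None:
    # Cleaning step: normalize the city name and cast the amount to a number.
    dwd.upsert(new["order_id"], {
        "order_id": new["order_id"],
        "city": new["city"].strip().lower(),
        "amount": float(new["amount"]),
    })


def dwd_to_dws(old: Optional[dict], new: dict) -> None:
    # Incremental aggregation: retract the old value first, then add the new one.
    if old is not None:
        prev = dws.rows[old["city"]]
        dws.upsert(old["city"], {"city": old["city"], "total": prev["total"] - old["amount"]})
    cur = dws.rows.get(new["city"], {"city": new["city"], "total": 0.0})
    dws.upsert(new["city"], {"city": new["city"], "total": cur["total"] + new["amount"]})


ods.subscribe(ods_to_dwd)
dwd.subscribe(dwd_to_dws)

ods.upsert("o1", {"order_id": "o1", "city": " Hangzhou ", "amount": "10.5"})
ods.upsert("o2", {"order_id": "o2", "city": "hangzhou", "amount": "4.5"})
# A correction to order o1 flows through DWD into DWS automatically.
ods.upsert("o1", {"order_id": "o1", "city": "Hangzhou", "amount": "20"})

print(dws.rows["hangzhou"]["total"])
```

In this sketch the correction to `o1` is applied once at the source, and the summary is adjusted by retracting the old detail row before adding the new one, which is the property the single-copy state layer is meant to provide.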

Online data service

Data has gradually expanded from internal decision-making scenarios (2B) to online business scenarios that serve consumers (2C), supporting real-time user profiles, real-time personalized recommendations, and real-time risk control. Improving online conversion through data places higher requirements on the efficiency and stability of system execution. It also turns the analysis system from a relatively peripheral system into a mission-critical one, which requires a data platform with higher availability, higher concurrency, low latency, and low jitter, along with cloud-native elastic scaling, hot upgrades and hot scaling of services, and more complete observability and operation and maintenance capabilities.

In response to these requirements, Hologres has made many innovations in its storage engine, execution engine, and operation and maintenance capabilities.

Reliability design for online services

On top of the original row storage and column storage, Hologres supports a row-column coexistence structure, so that a single table has both OLAP and key-value advantages. It also introduces shard-level multi-replica capability, so that QPS grows linearly as the number of replicas increases. In addition, combining row-column coexistence with shard-level multi-replica supports a new scenario: lookups on non-primary-key columns, which are widely used in order retrieval.
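To make the row-column coexistence idea concrete, here is a minimal sketch of one logical table that keeps both layouts. It is a toy model under stated assumptions, not how Hologres stores data: the `HybridTable` class, its method names, and the sample columns are hypothetical. Point reads by primary key go to the row layout, aggregations scan the column layout, and a lookup on a non-primary-key column scans one column to find matching keys and then fetches the full rows.

```python
from typing import Any, Dict, List, Optional


class HybridTable:
    """Toy table that maintains a row layout and a column layout for the same data."""

    def __init__(self, primary_key: str, columns: List[str]) -> None:
        self.primary_key = primary_key          # must be one of `columns`
        self.columns = columns
        self.row_store: Dict[Any, Dict[str, Any]] = {}                     # key -> full row
        self.column_store: Dict[str, List[Any]] = {c: [] for c in columns}  # column -> values
        self._position: Dict[Any, int] = {}                                # key -> column index

    def upsert(self, row: Dict[str, Any]) -> None:
        key = row[self.primary_key]
        self.row_store[key] = dict(row)
        pos = self._position.get(key)
        if pos is None:
            # New key: append one value to every column.
            self._position[key] = len(self.column_store[self.primary_key])
            for c in self.columns:
                self.column_store[c].append(row[c])
        else:
            # Existing key: overwrite the values in place.
            for c in self.columns:
                self.column_store[c][pos] = row[c]

    def get(self, key: Any) -> Optional[Dict[str, Any]]:
        """Key-value style point read, served from the row layout."""
        return self.row_store.get(key)

    def total(self, column: str) -> Any:
        """OLAP style aggregation, served from a single column."""
        return sum(self.column_store[column])

    def find_by(self, column: str, value: Any) -> List[Dict[str, Any]]:
        """Non-primary-key lookup: scan one column, then fetch whole rows by key."""
        keys = self.column_store[self.primary_key]
        hits = [keys[i] for i, v in enumerate(self.column_store[column]) if v == value]
        return [self.row_store[k] for k in hits]


orders = HybridTable("order_id", ["order_id", "customer", "amount"])
orders.upsert({"order_id": 1, "customer": "alice", "amount": 30})
orders.upsert({"order_id": 2, "customer": "bob", "amount": 20})
orders.upsert({"order_id": 3, "customer": "alice", "amount": 50})

print(orders.get(2))
print(orders.total("amount"))
print([r["order_id"] for r in orders.find_by("customer", "alice")])
```

The trade-off in this sketch is that every write updates two layouts, in exchange for serving both access patterns from one table instead of synchronizing two systems.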

In addition, a system inevitably needs operation and maintenance upgrades. Hologres has introduced hot upgrade capabilities to keep services uninterrupted during upgrades and reduce the impact of operation and maintenance on the business. At the same time, innovations in physical backup of metadata and lazy opening of data files speed up failure recovery.

Verification in real business workloads shows that, after optimization, recovery speed improved by more than 10 times and faults are recovered automatically within minutes, which minimizes their impact.

Analysis service integration simplifies the data service outlet

The integration of analysis and services is an important trend for simplifying the data platform and unifying the data service outlet, and it is also an important capability innovation of the storage and query engine. A single architecture supports two typical data scenarios: it can handle complex OLAP analysis and also meet the high-QPS, low-latency requirements of online services. For the business, it creates a unified data service outlet for users, which enables agile business response, supports self-service data analysis, avoids data silos, and simplifies operation and maintenance.

However, these requirements pose a significant challenge to the technical architecture. In storage, Hologres therefore provides row storage and column storage for different scenarios: row storage supports high-QPS online queries, and column storage supports OLAP scenarios. The same separation is carried into the compute layer, in order to support more fine-grained load isolation on top of shared data.

Resource isolation, high availability, unlimited elasticity, unified storage

Hologres offers a multi-instance, high-availability model based on shared storage. Users can create multiple instances, which represent different computing resources, but all instances share one copy of the data. One instance is the main instance, which supports both reads and writes, and the other instances are read-only sub-instances. The in-memory state of the data is synchronized between instances in real time, within milliseconds, and only one copy of the data is physically stored. In this solution the data is unified and so is the permission configuration, but computing loads are 100% isolated through separate physical resources, so read and write requests do not compete for resources, and fault isolation is also better.

A main instance currently supports mounting up to 4 sub-instances. If they are deployed in the same region, all instances share storage; if they are deployed in different regions, the data needs to be replicated into multiple copies.

This solution was repeatedly verified in a large number of scenarios during Double 11 in 2021 and has proven highly reliable. It is generally recommended to use the main instance for data writing and processing and the sub-instances for OLAP business analysis or external data services, so that different computing specifications can be allocated according to each scenario's computing power requirements.
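The recommended split between a main instance and read-only sub-instances can be sketched as a simple routing rule. This is an illustrative toy, assuming a single shared dictionary as the "one copy of data"; the `Instance` and `Deployment` classes and the workload names are hypothetical and are not part of any Hologres API. In the real system the in-memory state is synchronized within milliseconds, whereas here every instance simply references the same object.

```python
from typing import Dict, Optional

MAX_SUB_INSTANCES = 4  # limit stated in the text above


class Instance:
    """A compute instance; every instance references the same storage object."""

    def __init__(self, name: str, storage: Dict[str, object], read_only: bool) -> None:
        self.name = name
        self.storage = storage
        self.read_only = read_only

    def read(self, key: str) -> Optional[object]:
        return self.storage.get(key)

    def write(self, key: str, value: object) -> None:
        if self.read_only:
            raise PermissionError(f"{self.name} is a read-only sub-instance")
        self.storage[key] = value


class Deployment:
    """One main instance plus up to four read-only sub-instances over shared storage."""

    def __init__(self) -> None:
        self.storage: Dict[str, object] = {}   # the single physical copy of the data
        self.main = Instance("main", self.storage, read_only=False)
        self.subs: Dict[str, Instance] = {}

    def add_sub_instance(self, workload: str) -> Instance:
        if len(self.subs) >= MAX_SUB_INSTANCES:
            raise ValueError("a main instance supports at most 4 sub-instances")
        sub = Instance(f"sub-{workload}", self.storage, read_only=True)
        self.subs[workload] = sub
        return sub

    def route(self, workload: str, is_write: bool) -> Instance:
        # Writes always go to the main instance; reads go to the workload's
        # dedicated sub-instance when one exists, otherwise to the main instance.
        if is_write:
            return self.main
        return self.subs.get(workload, self.main)


deployment = Deployment()
deployment.add_sub_instance("olap")
deployment.add_sub_instance("serving")

deployment.route("etl", is_write=True).write("order:1", {"status": "shipped"})
row = deployment.route("serving", is_write=False).read("order:1")
print(row)
```

Because each workload reads through its own instance, a heavy OLAP query in this model would consume only the resources of the `olap` sub-instance, while writes and online serving continue on separate compute.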
Analysis service integration architecture upgrade: a case study

This case covers the real-time data warehouse architecture of a leading logistics company. Logistics companies have a strong demand for real-time decision-making and real-time analysis, and regular marketing promotions bring traffic peaks, so the system load fluctuates greatly. The system also needs to directly support many 2C scenarios, which requires highly responsive services.

Before the architecture upgrade, the company mostly relied on traditional relational databases to support real-time queries and real-time monitoring of online business, including scenarios such as refreshing the logistics status of each package.

However, this architecture was not real-time enough. Order data was updated inefficiently and the update pipeline was long, which could not meet the needs of real-time monitoring and also reduced the efficiency of logistics distribution. In addition, complex join calculations were often required across multiple indicators, and queries were slow, which could not meet the needs of real-time business decision-making.

Another pain point of the architecture was its lack of stability. When multiple services queried concurrently, overall latency increased and service stability suffered. Traffic during Double 11 is several times the daily traffic, and the original system could not withstand the sudden increase, which required a lot of additional manual operation and maintenance.
Therefore, we upgraded the real-time data warehouse architecture for the customer and replaced the original data warehouse architecture with Flink + Hologres. For frequently accessed service data, Flink consumes data from DataHub and stores the calculation results directly in Hologres. For data that requires complex analytical queries, DataWorks reads from the upstream RDS and builds layered ODS, DWD, and DWS data, so that the final summary data can be served to upper-layer applications with high concurrency and fast queries. The solution adopts a hybrid model of analysis service integration: it uses Flink's stream computing capabilities for business preprocessing and makes full use of Hologres' powerful complex, multi-dimensional query capabilities. It successfully replaced traditional software such as OLAP systems and RDS systems, which simplified the data architecture.

After the upgrade, the stability of the system improved greatly, for both real-time data writing and data reading. Throughout the Double 11 period, the system had zero failures, met real-time business needs, supported real-time dashboards for parcel collection, transfer operations, and in-warehouse allocation, and provided strong real-time data support for operations.

Overall effectiveness also improved significantly, giving users a good logistics experience and raising the company's service level.

In addition, traffic during the Double 11 peak is thousands of times higher than the daily traffic. Through the cloud-native elastic capability of Hologres, resources can be scaled out and in dynamically, which meets varying resource needs and reduces operation and maintenance costs.
