Flink x TiDB: PatSnap Builds a New Solution for Real-Time Analysis

1. Product Architecture

The figure above shows the product architecture of the PatSnap app. It includes a back-office management system, AI, a content engine, and a help center, and it provides customers with intellectual property information services and technology innovation information systems.

2. Technical Architecture

2.1 Original real-time analysis scheme

The figure above shows the original real-time analysis solution. Roughly, a customer submits search conditions, and the analysis API forwards those conditions to different search engines. This solution had four problems:

* Analysis affects retrieval performance;
* Complex analysis requires developing plug-ins;
* Analysis across multiple search engines is highly complex;
* Data of different dimensions cannot be stored.

Before building the real-time data warehouse, we collected the characteristics the business required of it:

* Second-level response;
* Quasi-real-time data updates;
* Support for a certain level of concurrency;
* Consistency with search engine data;
* Support for complex analysis;
* Support for unified usage and mainstream features;
* Support for interaction with search engines;
* Horizontal scalability of storage capacity.

The figure above is an overview of the data platform. From bottom to top:

* The bottom layer is the data foundation, covering data storage and data computation; the computation layer is built from Spark, Kafka, and Flink;
* The middle layer is the data platform, covering data development, data classification, data management, and data services;
* The top layer is data applications, mainly data business, external analysis services, and internal analysis business.

2.2 New real-time analysis solution

The new solution is built mainly on TiDB and consists of two parts: data storage and the data warehouse service. The data warehouse service is divided into security checks, driver table management, cache management, cluster load checks, and executors.
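To make the division of the data warehouse service more concrete, here is a minimal sketch of how a query might pass through a security check, a cache, a cluster load check, and an executor. Every name in it (`security_check`, `QueryCache`, `WarehouseService`, the load threshold, the TTL) is hypothetical and not taken from the actual system, and driver table management is left out because the article does not describe how it works.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical sketch: the article only names the components, not their logic.
FORBIDDEN_KEYWORDS = ("drop", "delete", "update", "insert", "truncate")


def security_check(sql: str) -> None:
    """Reject anything that is not a read-only SELECT statement."""
    lowered = sql.strip().lower()
    if not lowered.startswith("select"):
        raise PermissionError("only SELECT statements are allowed")
    tokens = lowered.replace(";", " ").split()
    if any(word in tokens for word in FORBIDDEN_KEYWORDS):
        raise PermissionError("statement contains a forbidden keyword")


@dataclass
class QueryCache:
    """Tiny TTL cache keyed by the SQL text (assumed design)."""
    ttl_seconds: float = 30.0
    _entries: Dict[str, Tuple[float, List[tuple]]] = field(default_factory=dict)

    def get(self, sql: str) -> Optional[List[tuple]]:
        entry = self._entries.get(sql)
        if entry is None:
            return None
        stored_at, rows = entry
        if time.monotonic() - stored_at > self.ttl_seconds:
            del self._entries[sql]  # expired entry
            return None
        return rows

    def put(self, sql: str, rows: List[tuple]) -> None:
        self._entries[sql] = (time.monotonic(), rows)


@dataclass
class WarehouseService:
    run_query: Callable[[str], List[tuple]]  # executor, e.g. a call into TiDB
    current_load: Callable[[], float]        # cluster load in [0, 1]
    max_load: float = 0.8                    # assumed threshold
    cache: QueryCache = field(default_factory=QueryCache)

    def execute(self, sql: str) -> List[tuple]:
        security_check(sql)                  # 1. security check
        cached = self.cache.get(sql)         # 2. cache management
        if cached is not None:
            return cached
        if self.current_load() > self.max_load:  # 3. cluster load check
            raise RuntimeError("cluster is overloaded, try again later")
        rows = self.run_query(sql)           # 4. executor
        self.cache.put(sql, rows)
        return rows
```

In this sketch the executor and the load probe are injected as plain callables, so the same flow can be exercised with stubs in a test or wired to a real database client.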

We chose TiDB because it is cloud-native, has an active community, serves both TP and AP business scenarios, offers a rich tool ecosystem and multi-platform support, is easy to use, and is compatible with MySQL and big data tooling.

We also chose Flink because it is an open-source big data computing engine with an active community and cloud-native support. It meets our data timeliness requirements, provides exactly-once semantics for consistency, and offers low latency with high throughput.

Online business data write path: changes to the source data are put into a message queue, the indexing program distributes the data to different search engines, and the search engines also send messages back to the indexing program.

Offline analysis technology stack: the offline analysis stack relies heavily on OSS. Daily incremental data is written offline to OSS, and complex analysis is run over the full data set.

Offline business data write path: data changes trigger a persistent stream into OSS, where they are merged with the historical stream so that a full copy of the data is kept in OSS.
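The merge of incremental changes into a full copy can be illustrated with a small sketch. It assumes a hypothetical change-record shape (`id`, `op`, `ts`, `data`) and an in-memory dictionary standing in for the full data set; the real job reads from and writes to OSS, which is not modeled here.

```python
from typing import Dict, Iterable

# Hypothetical change record:
#   {"id": str, "op": "upsert" | "delete", "ts": int, "data": dict or None}


def merge_snapshot(full: Dict[str, dict], changes: Iterable[dict]) -> Dict[str, dict]:
    """Return a new full snapshot with the changes applied in timestamp order."""
    merged = dict(full)  # leave the previous snapshot untouched
    # sorted() is stable, so changes with equal timestamps keep arrival order.
    for change in sorted(changes, key=lambda c: c["ts"]):
        key = change["id"]
        if change["op"] == "delete":
            merged.pop(key, None)
        else:
            merged[key] = change["data"]
    return merged


yesterday = {"p1": {"title": "A"}, "p2": {"title": "B"}}
today_changes = [
    {"id": "p2", "op": "delete", "ts": 2, "data": None},
    {"id": "p1", "op": "upsert", "ts": 1, "data": {"title": "A2"}},
    {"id": "p3", "op": "upsert", "ts": 3, "data": {"title": "C"}},
]
today = merge_snapshot(yesterday, today_changes)
# today == {"p1": {"title": "A2"}, "p3": {"title": "C"}}
```

Returning a new dictionary instead of mutating the old one mirrors the idea of keeping the previous full copy intact until the merged copy has been written.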

2.3 Original User Behavior Analysis Solution

The original user behavior analysis solution was very complicated. On the front end it used JS and Java APIs. The JS code sent users' event-tracking data to Segment, and the solution relied on two engines, Gainsight and Amplitude.

2.4 New User Behavior Analysis Solution

The new user behavior analysis solution is much simpler. User behavior data is collected and streamed to Flink through Kinesis. Flink then computes real-time indicators and stores the results in different tables, which we use for visualization development.
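As an illustration of what "computing real-time indicators" can mean, the sketch below counts user actions per fixed (tumbling) time window. It is a plain-Python stand-in for a windowed aggregation and does not use the Flink or Kinesis APIs; the event fields `ts` and `action` and the 60-second window are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: {"ts": epoch seconds (int), "action": str}


def tumbling_window_counts(
    events: Iterable[dict], window_seconds: int = 60
) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, action) using fixed-size windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for event in events:
        # Align the timestamp to the start of its window.
        window_start = event["ts"] - event["ts"] % window_seconds
        counts[(window_start, event["action"])] += 1
    return dict(counts)


events = [
    {"ts": 0, "action": "search"},
    {"ts": 30, "action": "search"},
    {"ts": 59, "action": "export"},
    {"ts": 61, "action": "search"},
]
indicators = tumbling_window_counts(events)
# indicators == {(0, "search"): 2, (0, "export"): 1, (60, "search"): 1}
```

Each `(window start, action)` pair maps naturally to one row of a result table, which is the kind of output a visualization layer can query directly.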

2.5 Flink + Iceberg Exploration

In our exploration of Flink + Iceberg, tables of several hundred gigabytes were streamed into Kafka and then pushed to OSS. Mature solutions are still lacking on the market, so this approach has not yet been applied in production.

3. Future plans

* Migrate to a cloud-native database architecture;
* Provide a more complete indicator and access system;
* Build full-link monitoring and early warning for data production;
* Support the company's data consumption and service capabilities;
* Continue evolving the online real-time analysis data warehouse and its data processing pipeline;
* Build a cloud-native data technology system and a new-generation big data platform.
