Scenario solution demos based on Realtime Compute for Apache Flink

Real-time application log analysis

Scenario description

The requirements of the first scenario are fairly common. In this scenario, an API for vehicle privacy protection is built. Backed by a deep learning model, the API applies privacy-preserving processing to vehicle photos uploaded by users.

The model is wrapped in an API and deployed on Alibaba Cloud ECS on the public cloud, where users all over the world can access it. The first thing that needs to be done for this API is to analyze its traffic: how many people are accessing it and how often, which countries or regions they come from, and what the characteristics of their access are. Is a given request an attack, or normal use?

To do this real-time analysis, the first requirement is to collect, in real time, the large volume of application logs that the API scatters across its servers. Collection alone is not enough; the logs also need to be processed in a timely, real-time manner. Processing includes dimension-table lookups, window aggregations, and other operations that are common in streaming computing. Finally, the processed results are written to a high-throughput, low-latency store so that downstream analysis systems can access the data in real time.

The entire pipeline is not complicated, but it represents a very important capability: by using Flink as the representative real-time computing and processing engine, it can give business decision makers data-driven decision support within seconds.

Demo solution architecture

Let's take a look at how this demo is implemented. There are several key components in this architecture.

First, in the upper right is the API environment, built with Flask and Python combined with the mainstream Nginx and Gunicorn stack. The API is packaged into a container image and deployed to Alibaba Cloud ECS from that image. For high concurrency and low latency, a Layer-7 load balancer is added, with an API Gateway in front of it to help users call the API.
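As a sketch of the API layer described above, here is a minimal WSGI application in the spirit of the Flask/Gunicorn setup. Everything here is illustrative: the `/blur` route and the `blur_vehicle_photo` stand-in are hypothetical, not the demo's actual code.

```python
# Minimal WSGI sketch of the vehicle-privacy API. In the real demo this
# is a Flask app served by Gunicorn behind Nginx; the route name and the
# model stand-in below are hypothetical.
import json


def blur_vehicle_photo(image_bytes: bytes) -> bytes:
    # Placeholder for the deep learning model inference that blurs the
    # background and masks license plates. Here it just echoes the bytes.
    return image_bytes


def app(environ, start_response):
    if environ["REQUEST_METHOD"] == "POST" and environ["PATH_INFO"] == "/blur":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(size)
        result = blur_vehicle_photo(body)
        start_response("200 OK", [("Content-Type", "application/octet-stream")])
        return [result]
    start_response("404 Not Found", [("Content-Type", "application/json")])
    return [json.dumps({"error": "not found"}).encode()]
```

Because it is plain WSGI, the same callable could be served by Gunicorn (`gunicorn module:app`) behind the Layer-7 load balancer.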

At the same time, the demo also provides a web app, so that users can access the API not only through code but also through a graphical interface. When front-end users call the API, the Simple Log Service (SLS) collects the real-time application logs from the API's own servers and, after some light processing, delivers them to Realtime Compute for Apache Flink.

Flink has a useful property here: it can subscribe to the logs delivered by Simple Log Service and, in streaming fashion, perform operations on them such as window aggregations and dimension-table joins. Another benefit is that it lets you express more complex business logic in familiar SQL.
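The window aggregation Flink performs can be illustrated with a small pure-Python sketch of a tumbling-window count per country. The event fields (`ts`, `country`) are hypothetical; in the demo this logic would be written as Flink SQL rather than Python.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size_s=60):
    """Count API calls per (window_start, country), mimicking a Flink
    tumbling-window GROUP BY. Events are dicts with hypothetical
    'ts' (epoch seconds) and 'country' fields."""
    counts = defaultdict(int)
    for e in events:
        # Align each event to the start of its window.
        window_start = e["ts"] - e["ts"] % window_size_s
        counts[(window_start, e["country"])] += 1
    return dict(counts)
```

In Flink SQL the same idea would be expressed declaratively with a tumbling window and `GROUP BY`, and the engine would emit results incrementally instead of over a finished list.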

After all the data has been processed, Flink writes the streaming data into Hologres as structured tables. Hologres is not only a data store but also an OLAP-style engine that powers downstream BI. Strung together, these components form the framework for this real-time big data log collection and analysis pipeline.

Solution analysis

Let's take a closer look at how each part is used.

Use the Vehicle Privacy API as a data source for real-time analytics
Through the web app, users can easily upload photos of their vehicles, and the API blurs them. In the screen recording, you can see that the background of the photo is blurred after being processed by the API, and the license plate and other private information are masked as well.

SLS Log Center
When a user accesses this API, the Simple Log Service in the background collects the access logs in real time.

After the logs are collected, Logtail's data transformation capabilities are used to parse and convert the raw logs to a certain extent, including resolving IP addresses into geographic information such as country, city, latitude, and longitude, so that this information is readily available for downstream analysis. Beyond these basic services, SLS also provides a very powerful graphical data analysis capability.
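The kind of parsing and geo enrichment Logtail performs can be sketched in plain Python. The log format, the regex, and the tiny in-memory geo lookup are all illustrative assumptions, not Logtail's actual configuration.

```python
import re

# Parse an Nginx-style access log line (format assumed for illustration).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) - - \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) \S+" '
    r'(?P<status>\d{3})')

# Hypothetical stand-in for Logtail's IP-to-geo enrichment; a real
# pipeline would use a proper GeoIP database.
GEO_DB = {"203.0.113.7": {"country": "Germany", "city": "Berlin",
                          "lat": 52.52, "lon": 13.40}}


def parse_and_enrich(line):
    """Turn a raw log line into a structured record with geo fields."""
    m = LOG_PATTERN.match(line)
    if not m:
        return None  # unparseable line
    rec = m.groupdict()
    rec.update(GEO_DB.get(rec["ip"], {}))
    return rec
```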

Real-time computing Flink version

Here you can do preliminary data analysis, or data exploration, and check whether the converted raw logs meet the requirements of the downstream business. After the logs are collected, converted, and processed, they are delivered through the LogHub to the stream processing center, that is, Realtime Compute for Apache Flink.

In fact, the term "delivery" is not entirely accurate: Flink actively subscribes to the processed log data in the Logstore held by the LogHub. One very nice aspect of Flink is that it lets you write business logic in common SQL, including transformations and conditional logic. Once the SQL is written, a single click publishes it as a packaged Flink job hosted in Flink's cluster, and inside the cluster the job can be managed very conveniently through the console.

What is the current load on the cluster? How is CPU usage, are there any anomalies or errors, and what does the overall delivery look like? All of this can be monitored directly through Flink's hosted console. This is a very big advantage: there is almost no operations and maintenance work to worry about.

Hologres (HSAP)

After Flink processing is complete, the stream data can be written directly into the storage system, Hologres, in a structured, table-like form through the interface Flink provides. A particularly notable feature of Hologres is that it serves both fast, transactional-style writes and OLAP-style analytics.

Specifically, it supports fast, high-throughput writes while simultaneously serving high-concurrency, low-latency analytical queries on the data just written. In other words, it combines serving and OLAP-engine capabilities in one system, which is why Hologres is also described as HSAP (Hybrid Serving and Analytical Processing).
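To illustrate this write-then-analyze pattern, the sketch below uses Python's built-in sqlite3 purely as a stand-in for Hologres (which in practice is accessed through a PostgreSQL-compatible endpoint); the `api_access` schema is hypothetical.

```python
import sqlite3

# SQLite stands in for Hologres here only to show the pattern: Flink
# streams rows into a structured table while the BI layer runs OLAP-style
# aggregations on the same table at the same time.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE api_access (
    ts INTEGER, country TEXT, status INTEGER, latency_ms REAL)""")

# Rows as Flink might write them from the processed log stream.
rows = [(1, "CN", 200, 35.0), (2, "US", 200, 41.5), (3, "CN", 500, 120.0)]
conn.executemany("INSERT INTO api_access VALUES (?, ?, ?, ?)", rows)

# An OLAP-style query the downstream dashboard might run.
stats = conn.execute("""
    SELECT country, COUNT(*), AVG(latency_ms)
    FROM api_access GROUP BY country ORDER BY country""").fetchall()
```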

DataV Dashboard
In this architecture, DataV is mainly used to present the processed data to downstream consumers, that is, end users; the business decision makers at the end of the chain see a real-time dashboard.

As the API is accessed, this real-time dashboard reflects the latest processed information with a delay of only seconds. The DataV real-time dashboard can therefore greatly reduce the delay before decision makers see the data.

With a traditional batch processing approach, each run may need to process terabytes of data and take several hours. With an end-to-end real-time computing solution built around Flink, this delay can be compressed from hours to a few seconds, or even under one second.

Vehicle engine real-time predictive maintenance

Scenario description

The second business scenario combines simulated IoT telemetry data to analyze whether the engine of a car on the road is showing abnormal signs, so as to determine in advance whether a problem may be developing. Left alone, a particular part might fail within, say, three months. This is a requirement that comes up often in practical applications; we call it predictive maintenance. In real scenarios, predictive maintenance can save customers a lot of money, because repairing things after they break is far less effective than replacing them before they fail.

Demo solution architecture

To make the scenario reasonably close to the real world, some research showed that on-board equipment includes a diagnostic system called OBD-II, which carries classic engine data. A subset of this data was selected, and a program was written to simulate the telemetry a real car engine would produce while running in a real environment.

Of course, it was not possible to actually drive a car on the road for this demo, so the simulation program uses various statistical methods to generate driving data that approximates the real thing as closely as possible.

The program delivers the simulated engine telemetry to Kafka. Realtime Compute for Apache Flink then subscribes to and consumes the Kafka topics, performing different streaming computations per topic. Part of the results is archived in OSS as historical data. The other part is fed, as a hot streaming data source, directly to the anomaly detection model deployed on PAI-EAS, which Flink can invoke directly.

After this machine learning step judges whether the current engine data shows abnormal signs, the result is written into the database for downstream applications to consume. In parallel, after Flink's real-time processing, part of the data is archived in OSS.

This archived data serves as historical data for modeling, and even re-modeling. Every so often, the characteristics of driving may change, commonly known as data drift; when that happens, the newly accumulated historical data can be used to retrain the model, and the retrained model can be deployed to PAI-EAS as a web service for Flink to call. With that, a Lambda-architecture big data solution is complete.

Solution analysis

Generate simulated driving data

First of all, simulated data has to be generated: OBD telemetry from the engine is simulated and delivered to the cloud for analysis. Function Compute is used here, and it is very convenient: it is a fully hosted, serverless service.

Secondly, you can copy locally developed Python scripts directly into Function Compute and let this hosted runtime execute the simulated data generation script, which is very convenient.

In this demo, the function runs every minute, generating a batch of telemetry data, and then delivers one record to Kafka every 3 seconds, to approximate the frequency at which data would be generated in a real environment.
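A minimal sketch of what such a generator might look like, assuming hypothetical OBD-style field names and normal ranges; the demo's actual Function Compute script is not shown in this article.

```python
import random


def generate_batch(n=20, anomaly_rate=0.1, seed=None):
    """Generate one batch of simulated OBD-II style engine telemetry.
    Field names and 'normal' distributions are hypothetical. In the demo,
    a batch like this is produced once a minute and then delivered to
    Kafka one record every 3 seconds."""
    rng = random.Random(seed)
    batch = []
    for _ in range(n):
        # Occasionally inject an out-of-range reading to simulate a fault.
        anomalous = rng.random() < anomaly_rate
        batch.append({
            "vehicle_id": f"car-{rng.randint(1, 5)}",
            "rpm": rng.gauss(2500, 300) + (2500 if anomalous else 0),
            "coolant_temp_c": rng.gauss(90, 5) + (40 if anomalous else 0),
            "engine_load_pct": min(100, max(0, rng.gauss(45, 10))),
        })
    return batch
```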

Collect/publish driving data

Kafka is a commonly used Pub/Sub system for big data. It is very flexible and scales extremely well. For Kafka on Alibaba Cloud, you can build a Kafka cluster on EMR, or you can use the hosted Message Queue for Apache Kafka service to get a complete managed setup.

This demo uses a hosted Kafka setup as its Pub/Sub system for convenience. The system's role here is simply to buffer the data generated upstream, that is, the engine telemetry delivered by the vehicles. In a real production environment there would not be just one car; there could be tens of thousands, even hundreds of thousands. With Kafka, capacity can be expanded very conveniently: whether there are 10 vehicles at the front end or 100,000, the overall architecture needs little change and can calmly cope with such elastic scaling demands.
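The topic and consumer-group semantics that make this decoupling possible can be modeled with a toy in-memory Pub/Sub class. This is an illustration of the concept only, not a Kafka client.

```python
from collections import defaultdict


class MiniPubSub:
    """Toy in-memory model of Kafka-style topic pub/sub: producers append
    to a topic's log; each consumer group reads the same log independently
    at its own offset (which is how Flink and an archiver can both consume
    the full telemetry stream)."""

    def __init__(self):
        self.topics = defaultdict(list)   # topic -> append-only log
        self.offsets = defaultdict(int)   # (group, topic) -> next offset

    def publish(self, topic, message):
        self.topics[topic].append(message)

    def poll(self, group, topic, max_records=10):
        key = (group, topic)
        log = self.topics[topic]
        start = self.offsets[key]
        records = log[start:start + max_records]
        self.offsets[key] = start + len(records)
        return records
```

Because each group tracks its own offset, adding a new downstream consumer never disturbs existing ones, which mirrors why the architecture needs little change as the fleet grows.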

Real-time calculation and exception analysis model call

For the real-time computing part, Flink's real-time computing system is still used, but this demo uses a Blink exclusive cluster, the so-called semi-managed real-time computing platform. In practice it is almost exactly the same as the fully managed mode used in the previous scenario.

It is just that when this demo was built, the fully managed version of Flink had not yet launched in some regions, so the service chosen was the Blink exclusive cluster, which belongs to the same Realtime Compute family. The way it is used is almost identical to fully managed Flink: developers only need to focus on writing the business logic and click to publish, and the rest is essentially managed entirely by Flink. You just monitor for anomalies and do some tuning, which is very convenient.

Worth mentioning here is that the interface for calling the PAI-EAS model is embedded in Flink, so that while Flink processes the streaming data in real time, it can also pass part of the data to PAI for model inference. The results are combined with the real-time streaming data and finally written to the downstream storage system, which reflects the extensibility and scalability of the Flink computing platform.
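The pattern of enriching a stream with model results can be sketched as follows, with `score_engine` as a stubbed stand-in for the REST call to the PAI-EAS service and a hypothetical temperature threshold.

```python
def score_engine(record):
    """Stand-in for the deployed anomaly model; in the demo Flink calls
    the PAI-EAS REST endpoint here instead. The 110 degree C threshold is
    a hypothetical illustration, not the model's real logic."""
    return 1 if record["coolant_temp_c"] > 110 else 0


def process_stream(records):
    """Mimic the Flink stage: enrich each streaming record with the
    model's anomaly flag before writing it downstream."""
    for r in records:
        yield {**r, "anomaly": score_engine(r)}
```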

Development of Anomaly Detection Models

This part shows how to use a graphical machine learning platform to design and develop a very simple binary classification model.

The binary classification model mainly learns, from the engine's historical data, which feature values indicate that the engine has a problem and which are relatively normal. With this model there is a basis for judging new engine data generated in the future, helping business staff predict problems in the current engine data ahead of time.
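As a rough stand-in for the classifier built in the graphical platform, here is a minimal threshold-based detector that learns per-feature "normal" bands from historical data. The feature names and the 3-sigma rule are illustrative assumptions, not the demo's actual model.

```python
import statistics


def fit_normal_ranges(history, k=3.0):
    """Learn per-feature mean +/- k*stdev bands from historical records
    that are labeled normal. A simple stand-in for the binary classifier
    developed in PAI's visual studio."""
    model = {}
    for f in history[0].keys():
        vals = [r[f] for r in history]
        mu, sigma = statistics.fmean(vals), statistics.pstdev(vals)
        model[f] = (mu - k * sigma, mu + k * sigma)
    return model


def predict(model, record):
    """Return 1 (abnormal) if any feature falls outside its learned band,
    else 0 (normal)."""
    return int(any(not (lo <= record[f] <= hi)
                   for f, (lo, hi) in model.items()))
```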

Model deployment and call service

The model has learned the relevant features and patterns from the historical data. The studio used throughout the model's development is built entirely by drag and drop; almost no code is written, and a model can be developed through clicks alone, which is very convenient and fast. Even better, once the model is developed, PAI's one-click deployment can package it as a REST API web service hosted on the PAI platform for users to call. After one-click deployment, it is very easy to make a test call to the deployed model service.
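Calling the deployed service from client code amounts to a JSON POST with an auth token. The sketch below builds, but does not send, such a request; the endpoint URL and header layout are hypothetical placeholders, not the documented PAI-EAS contract.

```python
import json
import urllib.request


def build_inference_request(payload, endpoint, token):
    """Build (without sending) an HTTP request to a deployed model web
    service. 'endpoint' and the auth header shape are assumptions for
    illustration; consult the service's own docs for the real contract."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={"Content-Type": "application/json",
                 "Authorization": token},
        method="POST",
    )
```

Sending it would just be `urllib.request.urlopen(req)`; the same payload shape is what Flink submits when it invokes the model mid-stream.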

High Throughput Structured Data Store (RDS)

Once the model is deployed, Flink can use it to judge whether there is any abnormality. After the stream data has been processed in real time, it is finally written into a MySQL database.

This database serves as the data source that supports the downstream real-time dashboard. In this way, business staff can see in real time, that is, every few seconds, whether any car currently on the road has a problem.

Near Realtime Dashboard

The DataV dashboard is preset to refresh every 5 seconds, meaning that every 5 seconds it fetches the latest telemetry data from the database, including the anomaly judgments, and displays it on the big screen.

Red represents data collected at that point in time that indicates a problem, and blue represents relatively normal data. What counts as normal is controlled entirely by the Function Compute script that generates the simulated data: some data that makes the engine look faulty is deliberately injected in the script's logic, so that the abnormal portion shows up a bit more prominently in this demo.
