The entrepreneurial team at SUYUN Technology focuses on supporting the operation of the hydrogen fuel cell eco-chain. Currently, our major business consists of real-time operations monitoring and analysis of new energy vehicles and hydrogen refueling stations, and vehicle safety operations support.
The major challenges that our Internet of Things (IoT) system is facing are the real-time parsing, storage, and analysis of high-frequency data. For example, during the real-time operations monitoring and analysis of vehicles, each vehicle reports its original packets at 1 kbit/s, and the system is required to parse, respond to, and store the packets with a delay of seconds. At the same time, the system needs to quickly query and further analyze the parsed packets of each vehicle at a transmission rate of 33 kbit/s. Considering the growing number of accessed vehicles in the future, we need to design the system in the most economical way while guaranteeing performance. Based on the transmission rate of 33 kbit/s for the parsed packets of each vehicle, each vehicle is expected to generate packet data of 30 GB per month (assuming that each vehicle is running for 10 hours per day).
Problems that exist in the original system are listed as follows:
Before optimizing the system, we need to reorganize the system architecture and clearly define the applicable scope of middleware products to be used in the data solution. The following figure shows the clear definition of data that flows in various phases:
We identified three key components for improvement, which include:
As a powerful data analysis tool of Alibaba Cloud, MaxCompute is familiar to us based on our previous cooperation. Therefore, we are giving more consideration to performance, costs, maintainability, and other aspects during this reconstruction.
First, let's talk about costs. Based on the frequency of data usage, we divide data into three types: online, offline, and archived. Regarded as archived data, packet data reported by vehicles is archived and stored on Object Storage Service (OSS). Online data has a preset life cycle of N months and mainly includes the parsed packet data that needs to be queried in real time. Offline data mainly includes various intermediate results and report data produced after the offline analysis and statistics of the parsed packet data.
After defining data types based on data usage scenarios, we mainly consider whether offline data is stored on OSS, MaxCompute, or OTS. In the following table, we roughly compare the storage and computing costs of these three data solutions.
We have already considered compressing data stored on OTS to reduce the storage costs. MaxCompute charges are calculated based on the actual data size after compressed storage. According to official MaxCompute documentation, the compression ratio is 5:1. Owing to the characteristics of our data, the actual compression ratio can be in the range of 7:1 to 8:1. As a result, the direct storage of offline data on MaxCompute prevails as the most cost-effective data solution. MaxCompute also complies with edge computing principles.
After testing, we found that OTS external tables are not satisfactory data carriers for computing. (Currently, the MapReduce computing of OTS external tables on MaxCompute is based on OTS fragments, but not partitions. The entire table is scanned each time, which can be observed from the task details of MaxCompute.)
After MaxCompute has been determined through technical selection, we need to think about how to use MaxCompute to provide reliable and stable data services for the business. We pay special attention to data warehouse modeling, data integration, and work maintenance.
Data integration is mainly reflected in the two-way synchronization between MySQL and MaxCompute. We need to take into account the design of repeated data synchronization. Work maintenance refers to the monitoring of task running status and support for tasks that are running for a second time.
For data warehouse modeling, we need to consider the reuse of costs and models. First, we apply layered modeling to massive and low-quality underlying data and ensure that the upper-level business model relies only on intermediate results. The direct benefit is the significant reduction in computing costs. (It breaks our hearts each time we discover that some development colleagues are querying data based on an original table of hundreds of GBs.) Second, the intermediate model improves the performance of the system data complement, especially in scenarios where some reports need to be run again based on business needs or data requirements. Otherwise, we have to scan the original data again, which is very costly and time-consuming.
After the reconstruction of offline statistical analysis, the system can make the best of MaxCompute's parallel computing capability and draw support from its powerful functions, especially window functions, to achieve impressive analysis performance. For example, when we help a customer with statistical analysis of a core component, a professional staff member used to spend a day in analyzing a part and it was easy to make mistakes. With the help of MaxCompute's analytical capability, we can finish the data analysis of nearly 1,000 components in 10 minutes. The following curve chart shows the average values during each data fluctuation period. These values could hardly be calculated manually before and required very complex Java coding. With the support of MaxCompute, the system can now accomplish this task with ease.
Alibaba Cloud MaxCompute - March 2, 2020
Alibaba Clouder - April 2, 2020
Alibaba EMR - November 4, 2020
Alibaba Cloud MaxCompute - January 7, 2019
Alibaba Clouder - May 24, 2019
Alibaba Clouder - October 1, 2019
Provides secure and reliable communication between devices and the IoT Platform which allows you to manage a large number of devices on a single IoT Platform.Learn More
Alibaba Cloud experts provide retailers with a lightweight and customized big data consulting service to help you assess your big data maturity and plan your big data journey.Learn More
Alibaba Cloud provides big data consulting services to help enterprises leverage advanced data technology.Learn More
ApsaraDB for HBase is a NoSQL database engine that is highly optimized and 100% compatible with the community edition of HBase.Learn More
More Posts by Alibaba Cloud MaxCompute