Solution highlights: petabyte-scale processing of raw data from new energy vehicles, high-performance data collection and storage, efficient data analysis, and low operations costs
In 2019, we migrated the big data platform for Shanghai new energy vehicles from self-managed Hadoop clusters to two Alibaba Cloud services: ApsaraDB for Lindorm (Lindorm) and the serverless Spark engine offered by Data Lake Analytics (DLA). These services resolved the difficulties we had in dynamically scaling out compute and storage resources. In addition, the middleware named Lindorm Tunnel Service (LTS) helps us separate cold data from hot data, which reduces our storage costs. The Alibaba Cloud ecosystem has helped us overcome many technical barriers so that we can concentrate on business development.
Shanghai Electric Vehicle Public Data Collecting, Monitoring and Research Center (hereinafter referred to as the Data Center) was established at the end of 2014 with the approval of Shanghai Community Administration and operates under the guidance of Shanghai Municipal Commission of Economy and Informatization. The Data Center is the first municipal supervision platform for new energy vehicles in China and the only one in Shanghai. The platform collects, analyzes, and applies the data of new energy vehicles promoted in Shanghai to help the public service sector make decisions and implement security supervision. As of January 31, 2021, the Data Center stored data for 418,000 vehicles of 777 models from 107 brands provided by 95 automakers. The volume of the stored data exceeds 1 PB, which places the Data Center in the top rank among cities worldwide.
Since the Data Center was established, significant efforts have been made to integrate and process data from multiple sources and to apply the resulting insights. The Data Center has also launched the following platforms: a big data platform for new energy vehicles in Shanghai, a platform for tracing and managing power batteries in Shanghai, a public data platform for hydrogen fuel stations and hydrogen fuel-cell vehicles, and the GEF6 Shanghai energy management center. The Data Center mines data that helps users supervise vehicle security, manage the lifecycle of batteries, and manage the subsidies for fuel-cell vehicles.
The number of new energy vehicles is increasing rapidly because of supportive national policies in China.
The data collection points of electric vehicles change constantly because electric vehicles are still in an early development phase.
The data collection frequency needs to change to meet analysis and upper-layer business requirements. Each increase in the data collection frequency multiplies the throughput and data volume, often doubling them or more.
China has formulated rules on the retention period of the data for electric vehicles. The retention period is measured in years.
Large amounts of data need to be archived in offline data warehouses in real time for data analysis.
Data analysis results need to be provided as a service. Therefore, the data analysis results need to be stored in online storage.
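The effect of collection frequency on write throughput and data volume noted above can be estimated with simple arithmetic. The following is a minimal sketch, not a sizing method used by the Data Center: the 418,000 vehicles figure comes from the text, while the 30-second and 10-second reporting intervals and the 1 KiB record size are hypothetical values chosen only for illustration.

```python
def daily_ingest(vehicles: int, interval_s: float, record_bytes: int) -> dict:
    """Estimate write throughput and raw daily volume for a vehicle fleet."""
    records_per_second = vehicles / interval_s
    bytes_per_day = records_per_second * record_bytes * 86_400  # seconds per day
    return {
        "records_per_second": records_per_second,
        "gib_per_day": bytes_per_day / 2**30,
    }


# 418,000 vehicles is taken from the text. The intervals and the record size
# are assumptions made for this example only.
baseline = daily_ingest(418_000, interval_s=30, record_bytes=1024)
faster = daily_ingest(418_000, interval_s=10, record_bytes=1024)

# Shortening the interval from 30 s to 10 s triples the write rate.
growth = faster["records_per_second"] / baseline["records_per_second"]
print(f"baseline: {baseline['records_per_second']:.0f} records/s")
print(f"faster:   {faster['records_per_second']:.0f} records/s ({growth:.1f}x)")
```

Because raw volume scales linearly with both fleet size and reporting frequency, and also grows when new collection points enlarge each record, the three factors compound. This is why elastic scaling of storage and compute matters for this workload.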
Lindorm was developed by a team of database specialists in China and has been tested in production by a large number of Alibaba Group services. The results of these large-scale tests show that Lindorm can serve as the foundation for online and offline storage and ensure the stability and reliability of services. Lindorm allows the customer to focus more on its business development.
Lindorm provides a wide table engine named LindormTable. LindormTable provides a batch commit feature that greatly improves throughput and shortens the response time for requests. After this feature is enabled, write performance improves by a factor of more than three.
LindormTable also provides a feature that optimizes data compression. After this feature is enabled, storage costs are significantly reduced.
The Lindorm big data storage solution provides a file engine named LindormDFS, which enables Lindorm to store files. This way, Lindorm meets the requirements for extract, transform, load (ETL) processing and analysis of large amounts of data.
The serverless Spark engine of DLA meets diverse business requirements such as those for online interactive searches, stream processing, batch processing, and machine learning.
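The idea behind batch commit, grouping many rows into one request so that per-request overhead is paid less often, can be illustrated with a small client-side buffer. This is a generic sketch and not the LindormTable API: the source does not describe how the feature is implemented, and the `BatchWriter` class, the `commit` callback, and the row fields (`vin`, `ts`, `soc`) are all hypothetical names introduced for this example.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class BatchWriter:
    """Buffers rows and hands them to `commit` in groups of `batch_size`."""

    def __init__(self, commit: Callable[[List[Row]], None], batch_size: int = 500):
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._commit = commit
        self._batch_size = batch_size
        self._buffer: List[Row] = []

    def write(self, row: Row) -> None:
        """Queue one row; commit automatically once the buffer is full."""
        self._buffer.append(row)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Commit whatever is buffered, if anything."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._commit(batch)

    def __enter__(self) -> "BatchWriter":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.flush()  # do not lose the final partial batch
        return False


# Stand-in for a storage client: it only records the size of each batch.
committed_batch_sizes: List[int] = []

with BatchWriter(lambda rows: committed_batch_sizes.append(len(rows)),
                 batch_size=500) as writer:
    for i in range(1_200):
        writer.write({"vin": f"VIN{i % 3}", "ts": i, "soc": 80})

print(committed_batch_sizes)
```

In this run, 1,200 rows reach the stand-in client in three commits instead of 1,200 single-row requests. The right batch size for a real deployment depends on row size and latency requirements and would need to be measured.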
The Lindorm wide table engine, LindormTable, provides batch write, efficient data compression, and linear scalability features. These features improve the performance of data collection and storage and reduce costs. LindormTable supports the customer's rapid business development and helps the customer manage traffic bursts caused by changes in data collection points and data collection frequencies.
Alibaba Cloud provides an end-to-end pipeline that covers data storage, real-time data archiving, data analysis, storage of analysis results in online storage, and queries on the analysis results. This pipeline helps the customer meet the requirements for business development. The data flows in the following sequence: applications > Lindorm > LTS for real-time archiving > Apache Parquet for columnar storage (provided by LindormDFS) > DLA Spark for data analysis > BulkLoad > Lindorm.
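The analysis step in this flow can be pictured as an aggregation over archived records whose output is keyed for online lookup. The sketch below uses plain Python on an in-memory list so that it runs without a cluster; in the flow described above, this work would be done by a DLA Spark job reading Parquet files. The record fields, the daily-distance metric, and the `<vin>#<day>` key format are hypothetical choices made for illustration and are not taken from the customer's implementation.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical archived telemetry. In the real flow these would be rows in
# Parquet files produced by the archiving step.
archived = [
    {"vin": "VIN_A", "ts": 1609459200, "mileage_km": 1000.0},
    {"vin": "VIN_A", "ts": 1609462800, "mileage_km": 1012.5},
    {"vin": "VIN_B", "ts": 1609459200, "mileage_km": 500.0},
    {"vin": "VIN_B", "ts": 1609466400, "mileage_km": 530.0},
]


def daily_distance(records):
    """Return {(vin, 'YYYY-MM-DD'): km driven that day} from odometer readings."""
    readings = defaultdict(list)
    for r in records:
        day = datetime.fromtimestamp(r["ts"], tz=timezone.utc).strftime("%Y-%m-%d")
        readings[(r["vin"], day)].append(r["mileage_km"])
    # Distance for a day is the spread between the lowest and highest reading.
    return {key: max(values) - min(values) for key, values in readings.items()}


results = daily_distance(archived)

# Shape the results as (row key, value) pairs sorted by key. Sorted, key-ordered
# output is the usual input for a bulk load into a key-ordered store.
rows_for_online_store = sorted(
    (f"{vin}#{day}", km) for (vin, day), km in results.items()
)
for key, km in rows_for_online_store:
    print(key, km)
```

Writing the results back in bulk rather than row by row keeps the analysis load off the online write path, which is the role BulkLoad plays at the end of the sequence.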