Petabyte-scale new energy vehicle (NEV) data collection, storage, and analysis for a municipal supervision platform — with high write throughput and low operational costs.
At a glance
| Organization | Shanghai Electric Vehicle Public Data Collecting, Monitoring and Research Center |
|---|---|
| Vehicles monitored | 418,000 |
| Vehicle models | 777 models from 107 brands, provided by 95 automakers |
| Data volume | Over 1 PB — top-ranked globally among cities for stored NEV data |
| Migration | 2019: self-managed Hadoop clusters → Alibaba Cloud Lindorm |
| Write performance gain | Over 3x improvement with LindormTable batch commit |
Customer feedback
In 2019, we migrated the big data platform for Shanghai new energy vehicles from self-managed Hadoop clusters to Lindorm databases provided by Alibaba Cloud. These Alibaba Cloud services help us address our difficulties in dynamically scaling out compute and storage resources. In addition, the Lindorm Tunnel Service (LTS) middleware helps us separate our cold data from hot data to reduce our storage costs. The powerful ecosystem of Alibaba Cloud helps us break through many technical barriers so that we can concentrate more on business development.
Customer profile
The Shanghai Electric Vehicle Public Data Collecting, Monitoring and Research Center (the Data Center) was established in late 2014 with approval from the Shanghai Community Administration, and is guided by the Shanghai Municipal Commission of Economy and Informatization. As the municipal supervision platform for new energy vehicles in Shanghai, the Data Center collects, analyzes, and applies NEV data to support public-sector decision-making and security supervision.
As of January 31, 2021, the Data Center had stored data for 418,000 vehicles spanning 777 models from 107 brands provided by 95 automakers. Total stored data exceeded 1 PB, ranking Shanghai first among cities worldwide for stored NEV data volume.
The Data Center operates four platforms built on this data:
- Big data platform for new energy vehicles in Shanghai
- Platform for tracking the sources of power batteries and managing power batteries in Shanghai
- Public data platform for hydrogen fuel stations and hydrogen fuel-cell vehicles
- GEF6 Shanghai energy management center
These platforms deliver insights for vehicle security supervision, battery lifecycle management, and fuel-cell vehicle subsidy management.
Challenges
The Data Center faced six interconnected data infrastructure challenges:
| Challenge | Details |
|---|---|
| Rapidly growing data volume | A surge in NEVs driven by national policies meant continuous growth in data scale. |
| Changing collection points | Electric vehicles are still maturing; data collection points shift frequently as the technology evolves. |
| Variable collection frequency | Collection frequency changes over time, and each increase multiplies both write throughput and total data volume. |
| Long data retention requirements | Chinese regulations mandate multi-year retention of NEV data. |
| Real-time offline archiving | Large volumes of data needed to be archived to offline data warehouses in real time for downstream analysis. |
| Online storage for analysis results | Analysis results had to be stored in an online, queryable form and exposed as a service. |
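
To make the effect of collection frequency concrete, the following back-of-the-envelope sketch estimates daily data volume for a fleet that reports at a fixed interval. Only the vehicle count comes from this case study. The record size, the reporting intervals, and the hours of activity per day are hypothetical values chosen for illustration, not figures published by the Data Center.

```python
def daily_volume_bytes(vehicles: int, interval_s: float,
                       record_bytes: int, active_hours: float = 24.0) -> float:
    """Estimate bytes collected per day for a fleet reporting at a fixed interval."""
    records_per_vehicle = active_hours * 3600 / interval_s
    return vehicles * records_per_vehicle * record_bytes


VEHICLES = 418_000      # vehicle count stated in the case study
RECORD_BYTES = 1_000    # hypothetical size of one collected record
ACTIVE_HOURS = 2.0      # hypothetical hours per day a vehicle reports data

for interval in (30, 10, 1):  # hypothetical reporting intervals in seconds
    gb = daily_volume_bytes(VEHICLES, interval, RECORD_BYTES, ACTIVE_HOURS) / 1e9
    print(f"interval {interval:>2}s -> {gb:,.1f} GB/day")
```

Under these assumptions, volume scales inversely with the reporting interval: moving from one record every 30 seconds to one every 10 seconds triples both the write rate and the amount of data that must be retained.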
Solution
Lindorm — developed by a team with deep database expertise and hardened across Alibaba Group's large-scale services — provides the end-to-end data infrastructure the Data Center needed. Raw vehicle data flows from collection applications through to queryable analysis results without leaving the Alibaba Cloud ecosystem:
applications → LindormTable → LTS (real-time archiving) → Apache Parquet columnar storage (LindormDFS) → LDPS → BulkLoad → Lindorm
High-throughput write with LindormTable
LindormTable, Lindorm's wide table engine, addresses the throughput and latency demands of high-frequency NEV data collection through its batch commit feature. With batch commit enabled, write performance improves by more than 3x, giving the Data Center headroom to absorb both routine growth and sudden spikes caused by changes in collection frequency or collection points.
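
The case study does not describe how batch commit works internally. As a general illustration of why grouping writes helps, the sketch below buffers rows on the client and hands them to a sink in batches, so the fixed cost of each commit is shared across many rows. `BatchingWriter`, the `sink` callable, and the field names `vin`, `ts`, and `soc` are hypothetical and are not part of any Lindorm API.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class BatchingWriter:
    """Buffers rows and passes them to `sink` in groups of up to `batch_size`."""

    def __init__(self, sink: Callable[[List[Row]], None], batch_size: int = 500) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._sink = sink              # hypothetical stand-in for a database client call
        self._batch_size = batch_size
        self._buffer: List[Row] = []

    def write(self, row: Row) -> None:
        """Queue one row; flush automatically when the buffer is full."""
        self._buffer.append(row)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send any buffered rows to the sink as a single batch."""
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._sink(batch)

    def __enter__(self) -> "BatchingWriter":
        return self

    def __exit__(self, *exc_info: object) -> None:
        # Flush the remaining partial batch when the block ends.
        self.flush()


# Record the size of every batch the sink receives.
calls: List[int] = []
with BatchingWriter(lambda batch: calls.append(len(batch)), batch_size=500) as writer:
    for i in range(1200):
        writer.write({"vin": f"VIN{i % 3}", "ts": i, "soc": 80})

print(calls)  # [500, 500, 200]
```

Here 1,200 rows reach the sink in three calls instead of 1,200. The 3x figure quoted above is the vendor's measurement for LindormTable and is not something this sketch reproduces.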
LindormTable's data compression also significantly reduces storage costs, making multi-year data retention economically viable.
Large-scale data archiving and analysis with LindormDFS
LindormDFS, Lindorm's file engine, stores files natively — enabling extract, transform, load (ETL) processing and analysis across petabytes of NEV data. Data archived in real time via LTS lands in LindormDFS as Apache Parquet columnar storage, where it is ready for batch processing, stream processing, machine learning, and online interactive queries through the LDPS compute engine.
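
Columnar formats such as Apache Parquet store the values of each field together rather than storing whole records one after another, which suits analytical scans over a few fields. The plain-Python sketch below shows only that row-to-column pivot. It does not write Parquet files, and the function name `to_columns` and the sample fields are hypothetical.

```python
from typing import Dict, List


def to_columns(rows: List[Dict[str, object]]) -> Dict[str, List[object]]:
    """Pivot row records into one list per field; missing fields become None."""
    names: List[str] = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)  # keep first-seen field order
    return {name: [row.get(name) for row in rows] for name in names}


# Hypothetical vehicle records; the third adds a newly collected field.
rows = [
    {"vin": "VIN-A", "ts": 1, "speed_kmh": 42.0},
    {"vin": "VIN-B", "ts": 1, "speed_kmh": 0.0},
    {"vin": "VIN-A", "ts": 2, "speed_kmh": 45.5, "soc_pct": 79},
]

columns = to_columns(rows)

# An aggregate over one field touches a single list instead of every record.
avg_speed = sum(columns["speed_kmh"]) / len(columns["speed_kmh"])
print(columns["soc_pct"])
print(round(avg_speed, 2))
```

A field that appears only in later records, as happens when collection points change, simply becomes a column with empty values for the earlier rows.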
Benefits
Scalable, cost-efficient data collection and storage. LindormTable's batch write, efficient data compression, and linear scalability handle the Data Center's rapid business growth and absorb traffic bursts from shifting data collection points and frequencies — all while keeping storage costs under control.
End-to-end data pipeline on a single platform. Raw data flows from collection applications through LindormTable, into LTS for real-time archiving, into LindormDFS as Apache Parquet columnar storage, through LDPS for processing, and back into Lindorm via BulkLoad for online queries. The unified Alibaba Cloud platform eliminates integration overhead and lets the Data Center focus on analysis and service delivery.