
Empower Big Data Storage with HBase for IoV

This article describes big data storage solutions for problems you may encounter in Internet of Vehicles (IoV) scenarios.

When designing a cloud-based architecture for the Internet of Vehicles (IoV), you may encounter several problems. This article discusses solutions for big data storage. To remove the big data storage bottleneck and make big data development and analysis less difficult, we turn to the MaxCompute and HBase cloud services.

Issue 1: A self-built big data platform is costly and difficult to maintain, and big data engineers are in short supply.

Solution: MaxCompute + DataWorks + ApsaraDB for HBase

MaxCompute provides improved data import solutions and a variety of typical distributed computing models. Furthermore, DataWorks works well with MaxCompute and provides MaxCompute with an all-in-one toolkit for data synchronization, task development, data workflow development, data management, and data O&M.

In the traditional architecture, MongoDB stores the original data reported by vehicles, which supports data tracing in special cases and data compensation when data is lost. Generally, the volume of data written is larger than the volume of data read. Because of a lack of deep understanding and incorrect use of MongoDB, the MongoDB cluster suffered a sharp performance drop once it reached a certain scale. The MongoDB clusters are now replaced with ApsaraDB for HBase. With its support for highly concurrent access to large data volumes, ApsaraDB for HBase is well suited to massive data storage, business dashboards, security risk control, and search.
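For a write-heavy workload like the one described above, row key design largely determines how evenly writes spread across an HBase cluster. The following is a minimal sketch of one common pattern (a salt prefix plus a reversed timestamp), not the design used by the original platform. The names `make_row_key`, `SALT_BUCKETS`, and the `|` separator are assumptions made for illustration.

```python
import hashlib
import struct

MAX_TS_MS = 2**63 - 1  # largest signed 64-bit value, used to reverse timestamps
SALT_BUCKETS = 16      # hypothetical bucket count; tune to the number of regions


def make_row_key(vehicle_id: str, ts_ms: int) -> bytes:
    """Build a row key: salt byte + vehicle id + separator + reversed timestamp."""
    # A hash-derived salt spreads different vehicles across key ranges so that
    # heavy write traffic does not concentrate on a single range.
    digest = hashlib.md5(vehicle_id.encode("utf-8")).digest()
    salt = digest[0] % SALT_BUCKETS
    # Reversing the timestamp makes the newest report sort first per vehicle.
    reversed_ts = MAX_TS_MS - ts_ms
    return (
        bytes([salt])
        + vehicle_id.encode("utf-8")
        + b"|"
        + struct.pack(">q", reversed_ts)  # big-endian keeps byte order == numeric order
    )
```

Because the salt is derived from the vehicle ID, all reports of one vehicle share a prefix and can still be read with a single prefix scan, newest first.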

An intelligent IoV platform collects massive amounts of vehicle driving data every day, such as engine state, driving behavior, fuel consumption, mileage, and travel track. This data must be processed and analyzed, for example to prepare daily mileage statistics, fuel consumption statistics, and monthly driving behavior reports. At the early stage the data volume is small, so Kettle is used for data extraction and related work, and most ETL work is completed in a MySQL-based data warehouse. Multiple data sources use a Presto cluster as the query middleware for data analysis. However, as the business grows rapidly, the data becomes more and more complex: the size of a single data table reaches several terabytes and the disk size reaches several hundred gigabytes. At that point the MySQL-based data warehouse no longer meets the requirements. Queries usually take a long time or even fail because memory is exhausted, which greatly reduces work efficiency. Therefore, we use MaxCompute to build a big data development and analysis platform.
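The daily statistics mentioned above are, in essence, a group-by over vehicle and day. On a data warehouse this would typically be expressed as a SQL aggregation; the sketch below shows the same shape in plain Python so the logic is easy to follow. The record layout `(vehicle_id, epoch_seconds, km, fuel_l)` and the function name `daily_stats` are assumptions, not the platform's actual schema.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_stats(reports):
    """Sum mileage and fuel per (vehicle, UTC day).

    `reports` is an iterable of (vehicle_id, epoch_seconds, km, fuel_l) tuples.
    """
    totals = defaultdict(lambda: {"km": 0.0, "fuel_l": 0.0})
    for vehicle_id, ts, km, fuel_l in reports:
        # Bucket each report by its calendar day in UTC.
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        bucket = totals[(vehicle_id, day)]
        bucket["km"] += km
        bucket["fuel_l"] += fuel_l
    return dict(totals)
```

A monthly driving behavior report would follow the same pattern with a coarser time bucket.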

Issue 2: Most IoV application scenarios place high requirements on real-time data. However, because database write performance is insufficient, write latency often occurs when massive amounts of data are collected.

Solution: Use Alibaba Cloud High-Performance Time Series Database (HiTSDB) to solve the write latency of massive data. According to tests by relevant authorities, one connected vehicle can collect 25 GB of data per hour. Conventional databases are not designed to process data at this scale, and relational databases perform poorly on big data sets. NoSQL databases handle large volumes of data well, but they are not as good as databases that are fine-tuned for time series data. Time series databases, by comparison, are optimized for this purpose.
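To get a feel for what 25 GB per hour means for write throughput, the figure can be converted into a sustained rate. The sketch below is a back-of-the-envelope estimate only. It assumes decimal units (1 GB = 10^9 bytes), a constant data rate, and a hypothetical fleet size; none of these are stated in the tests cited above.

```python
GB = 10**9  # decimal gigabyte; an assumption, the cited figure does not specify the unit


def ingest_estimate(gb_per_hour: float, vehicles: int) -> dict:
    """Estimate the sustained write rate and daily volume for a fleet."""
    bytes_per_second = gb_per_hour * GB / 3600 * vehicles
    return {
        "mb_per_second": bytes_per_second / 10**6,
        "tb_per_day": gb_per_hour * 24 * vehicles / 1000,
    }
```

Under these assumptions, `ingest_estimate(25, 1)` gives roughly 6.9 MB/s and 0.6 TB per day for a single vehicle, so even a modest fleet quickly reaches write volumes that a general-purpose database struggles to absorb.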

For other issues you may encounter when designing a cloud-based architecture for IoV, see Designing a Cloud-based Architecture for Internet of Vehicles: IoV Series (II).

Related Blog Posts

ApsaraDB for HBase Publishes Full-Text Indexing Service to Handle Complex Queries

ApsaraDB for HBase has released a full-text indexing service. For ApsaraDB for HBase instances created after January 25, 2019, the service can be enabled free of charge in the console. With this service, users can build more feature-rich search on HBase without being limited to simple KV queries, worrying about row key design, or struggling with ever-changing complex query requirements on HBase. The full-text indexing service is designed to enhance the query capabilities of ApsaraDB for HBase and to synchronize data automatically, so users can focus on enriching their service architecture with powerful retrieval functions.

The full-text indexing service is designed to enhance HBase's query capability. It complements HBase's powerful KV capability with richer queries under complex conditions. Specifically, it targets the following scenarios:

  1. Arbitrary query with complex conditions
  2. Multi-dimensional sorting
  3. Complex conditional paging
  4. Keyword queries with word segmentation
  5. Classification of matching result sets
  6. Common stats, such as min, max, avg, and sum

The full-text indexing service of ApsaraDB for HBase is easy to use. You only need to create an index in the DDL phase; data and indexes are then synchronized automatically.
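To illustrate what some of the scenarios listed above involve (keyword queries, sorting, and paging), here is a toy in-memory inverted index. It is a conceptual sketch only and does not reflect how the ApsaraDB for HBase service is implemented or what its API looks like. The class `ToyIndex` and its methods are hypothetical, and whitespace splitting stands in for real word segmentation.

```python
from collections import defaultdict


class ToyIndex:
    """In-memory inverted index: token -> set of row keys."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.rows = {}

    def put(self, row_key: str, text: str, **fields):
        # Store the row and index each whitespace-separated token.
        # Note: re-putting an existing row does not remove its old tokens.
        self.rows[row_key] = {"text": text, **fields}
        for token in text.lower().split():
            self.postings[token].add(row_key)

    def search(self, keywords: str, sort_by: str, offset: int = 0, limit: int = 10):
        """Return keys of rows containing every keyword, sorted by a field, one page at a time."""
        tokens = keywords.lower().split()
        if not tokens:
            return []
        # Intersect posting lists so that every keyword must match.
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        # Break ties on the row key to keep paging stable.
        ordered = sorted(hits, key=lambda k: (self.rows[k][sort_by], k))
        return ordered[offset:offset + limit]
```

With a plain KV store, the same query would require either a row key designed for exactly this access path or a full scan, which is the limitation the indexing service is meant to remove.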

Big Data Storage and Spark on Kubernetes

This article discusses big data storage and how Alibaba Cloud container services and Spark on Kubernetes can address several different storage scenarios.

To achieve larger storage capacities, we may consider external storage, which allows unlimited storage space and more options. In the age of big data, however, data has more dimensions and is more heterogeneous, which brings more challenges to data storage methods and types. Data storage systems and connectors such as HDFS, HBase, and Kafka alone cannot meet our requirements. For example, time-series storage is a preferred option for offline storage of data collected from Internet of Things (IoT) devices, and structured databases are better for storing data generated by upstream and downstream applications. As the number of data sources and connections grows, so do the underlying infrastructures and dependencies of big data platforms. Alibaba Cloud provides a variety of storage services to meet data processing requirements in different big data scenarios. In addition to traditional storage services like HDFS, HBase, Kafka, OSS, NAS, and CPFS, Alibaba Cloud provides storage services including MNS, TSDB, and OAS (Open Archive Service). These storage services allow big data platforms to focus on business development instead of the underlying O&M of the architecture. Beyond larger capacity, they also make data storage more secure and cost-effective.

In Spark Streaming scenarios, we often use MNS or Kafka, and sometimes Elasticsearch and HBase. These services are also supported by Alibaba Cloud. Developers can focus more on data modeling through the integration and use of these cloud services.

Related Documentation

Back up HBase

The HBase cluster created in E-MapReduce allows you to use the snapshot feature integrated into HBase to back up HBase tables, and export the backup to OSS.
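As a rough illustration of the snapshot-and-export flow described above, the sketch below builds the HBase shell statement that takes a snapshot and the argument list for the `ExportSnapshot` tool that ships with HBase. It only constructs the commands and does not run them. The helper names, the mapper count, and any destination URI you pass in are assumptions; consult the linked documentation for the exact OSS destination format.

```python
def snapshot_statement(table: str, snapshot: str) -> str:
    """HBase shell statement that takes a snapshot of a table."""
    return f"snapshot '{table}', '{snapshot}'"


def export_snapshot_args(snapshot: str, copy_to: str, mappers: int = 4) -> list:
    """Argument list for HBase's ExportSnapshot tool.

    `copy_to` is the destination URI; its exact form for OSS is not shown here
    and must be taken from the E-MapReduce documentation.
    """
    return [
        "hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot,
        "-copy-to", copy_to,
        "-mappers", str(mappers),  # hypothetical default; size to the cluster
    ]
```

The returned list can be passed to `subprocess.run` on a cluster node once the destination has been filled in.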

Create an HBase cluster and use the HBase storage service

This topic describes how to create and configure an HBase cluster and use the HBase storage service.

Related Products


E-MapReduce

E-MapReduce (EMR) is an all-in-one, enterprise-ready big data platform that provides cluster, job, and data management services based on open-source ecosystems, such as Hadoop, Spark, Kafka, Flink, and Storm.

ApsaraDB for MongoDB

Alibaba Cloud ApsaraDB for MongoDB is a secure, reliable, and elastically scalable cloud database service. It currently supports the ReplicaSet and Sharding architectures and can be quickly deployed in just a few steps.

In the big data service scenario, data is imported to cloud databases in real time. The cloud databases then deliver the data to the compute engine. The compute engine analyzes the data and returns the analysis results to the databases. This enables your business to quickly obtain the analysis results.


MaxCompute

MaxCompute (previously known as ODPS) is a general-purpose, fully managed, multi-tenant data processing platform for large-scale data warehousing. MaxCompute supports various data import solutions and distributed computing models, enabling users to effectively query massive datasets, reduce production costs, and ensure data security.


Alibaba Clouder
