Technical Architecture of a Big Data Platform

Learn how to build a big data platform architecture with Alibaba Cloud's highly reliable big data services, such as MaxCompute and E-MapReduce.

How should we build the architecture of a big data platform? Are there any good use cases for this architecture? This article studies the case of OpSmart Technology to elaborate on the business and data architecture of the Internet of Things for enterprises, as well as considerations during the technology selection process.

Based on the "Internet + big data + airport" model, OpSmart Technology provides on-the-go wireless network connectivity to 640 million users every year. As the business expanded, OpSmart Technology faced ever-growing volumes of data. To cope with this, OpSmart Technology took the lead in building an industry-leading big data platform with Alibaba Cloud products in 2016.

Below are some tips shared by OpSmart Technology's big data platform architect:

Business architecture

Figure: Business architecture

OpSmart Technology's business architecture is shown in the figure above. Our primary business model is to collect data through our own devices, explore value in the data, and then apply the data to our business.

On the data collection layer, we founded China's first official airport Wi-Fi brand, "Airport-Free-WiFi", which covers 25 hub airports and 39 hub high-speed rail stations nationwide and provides on-the-go wireless network services to 640 million people each year. We also operate the nation's largest Wi-Fi network for driving schools, which is expected to cover more than 1,500 driving schools by the end of 2017. We are the Wi-Fi provider for China's four major auto shows (Beijing, Shanghai, Guangzhou, and Chengdu), serving more than 1.2 million people. In addition, we run the Wi-Fi networks of more than 2,000 gas stations and more than 600 automobile 4S (sales, spare parts, service, survey) stores across the country.

On the data application layer, we connected online and offline behavioral data to build user profiles, enabling more efficient and precise ad targeting across SSP (supply-side platform), DSP (demand-side platform), DMP (data management platform), and RTB (real-time bidding) services. We also worked with the Ministry of Public Security to counter public network security threats.

OpSmart Technology's big data and advertising platforms also provide enterprises with the technical capabilities to build their own big data platforms and to use a wealth of quantitative data to manage their operations more efficiently.
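
Connecting online and offline behavioral data generally means joining records from different sources on a shared identifier. The sketch below is a minimal illustration under stated assumptions, not OpSmart Technology's implementation: it assumes both sources carry a common `device_id` (for example, an identity captured at Wi-Fi login), and the record formats, venue names, tags, and the helper names `build_profiles` and `select_audience` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical merged profile for one device."""
    device_id: str
    # Offline signal: venues (airports, stations, auto shows, ...) where the device connected.
    venues: set[str] = field(default_factory=set)
    # Online signal: interest tag -> number of times it was observed.
    interests: dict[str, int] = field(default_factory=dict)


def build_profiles(offline_events, online_events):
    """Join offline and online behavior on a shared device_id.

    offline_events: iterable of (device_id, venue) tuples (hypothetical format)
    online_events:  iterable of (device_id, interest_tag) tuples (hypothetical format)
    """
    profiles = {}

    def profile_for(device_id):
        # Create the profile the first time either source mentions the device.
        if device_id not in profiles:
            profiles[device_id] = UserProfile(device_id)
        return profiles[device_id]

    for device_id, venue in offline_events:
        profile_for(device_id).venues.add(venue)
    for device_id, tag in online_events:
        interests = profile_for(device_id).interests
        interests[tag] = interests.get(tag, 0) + 1
    return profiles


def select_audience(profiles, venue, tag):
    """Return device IDs seen offline at `venue` that also showed online interest in `tag`."""
    return sorted(
        p.device_id
        for p in profiles.values()
        if venue in p.venues and p.interests.get(tag, 0) > 0
    )


offline = [("d1", "PEK"), ("d2", "PVG"), ("d1", "Beijing Auto Show")]
online = [("d1", "cars"), ("d1", "cars"), ("d2", "travel")]
profiles = build_profiles(offline, online)
print(select_audience(profiles, "Beijing Auto Show", "cars"))  # ['d1']
```

Neither signal alone identifies the audience in this toy example: only the joined profile shows that `d1` both attended the auto show and browsed car content. A production system would also need identity resolution and privacy controls, which this sketch leaves out.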

Data architecture

Figure: Data architecture

We abstracted our data architecture into several core elements, as shown in the figure. The subject in the figure can be understood as a user, and the object as a thing. Subjects and objects are connected in various ways; these connections take place in time and space and are carried over computer and telecommunication networks. The subject has its own reflection in this network, which can be understood as a virtual identity (an avatar). The object also has its own reflection in the network, such as the Wikipedia description of a topic or a commercialized product or service. Advertising then packages these reflections into an advertising image. All of these are object images (mirrors). The interaction between a subject and an object is therefore really an interaction between the subject image and the object image, and every such interaction leaves traces in both time and space.
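
The abstraction above can be written down as a small data model. This is a minimal sketch of one possible reading of the figure, not OpSmart Technology's schema: the class names follow the text (subject, object, subject image, object image, interaction), while the fields `avatar_id`, `kind`, and `location`, the example values, and the helper `traces_of` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Subject:
    """A user in the real world."""
    subject_id: str


@dataclass(frozen=True)
class Object:
    """A thing in the real world (a venue, product, or service)."""
    object_id: str


@dataclass(frozen=True)
class SubjectImage:
    """The subject's reflection in the network: its virtual identity (avatar)."""
    subject: Subject
    avatar_id: str  # hypothetical, e.g. an account or a hashed device ID


@dataclass(frozen=True)
class ObjectImage:
    """The object's reflection in the network: a description, a listing, or an ad."""
    target: Object
    kind: str  # hypothetical labels, e.g. "description", "product", "advertisement"


@dataclass(frozen=True)
class Interaction:
    """Subject and object interact only through their images; each interaction
    leaves a trace in time (timestamp) and space (location)."""
    subject_image: SubjectImage
    object_image: ObjectImage
    timestamp: datetime
    location: str


def traces_of(subject, interactions):
    """Return the (timestamp, location) traces a subject left, oldest first."""
    return sorted(
        (i.timestamp, i.location)
        for i in interactions
        if i.subject_image.subject == subject
    )


user = Subject("u1")
avatar = SubjectImage(user, "wifi-login-001")
ad = ObjectImage(Object("PEK-T3"), "advertisement")
events = [
    Interaction(avatar, ad, datetime(2017, 5, 2, 18, 0), "PVG"),
    Interaction(avatar, ad, datetime(2017, 5, 1, 9, 30), "PEK"),
]
print(traces_of(user, events))
```

Note that `Interaction` references only the two images and never `Subject` or `Object` directly, which mirrors the point that subject-object interaction happens between their reflections in the network.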

The individual and group characteristics of subjects and objects, together with the relationships between subjects and objects, all constitute big data. Through in-depth data mining and machine learning, this information can yield powerful insights of great value to businesses.
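
As a rough illustration of the three kinds of characteristics named above, the sketch below derives individual features, group features, and subject-object relationship weights from interaction traces. It is a minimal sketch under stated assumptions: plain counting stands in for the "in-depth mining and learning" the text describes, and the input format, the group labels, and the function name `characterize` are hypothetical.

```python
from collections import Counter, defaultdict


def characterize(interactions, group_of):
    """Derive individual, group, and relationship features from interaction traces.

    interactions: iterable of (subject_id, object_id) pairs (hypothetical format)
    group_of:     dict mapping subject_id -> group label (hypothetical segments)
    """
    subject_activity = Counter()              # individual: how active each subject is
    object_popularity = Counter()             # individual: how often each object is reached
    group_preferences = defaultdict(Counter)  # group: objects each subject group interacts with
    relationships = Counter()                 # relationship: subject-object edge weights
    for subject_id, object_id in interactions:
        subject_activity[subject_id] += 1
        object_popularity[object_id] += 1
        group_preferences[group_of.get(subject_id, "unknown")][object_id] += 1
        relationships[(subject_id, object_id)] += 1
    return {
        "subject_activity": subject_activity,
        "object_popularity": object_popularity,
        "group_preferences": dict(group_preferences),
        "relationships": relationships,
    }


pairs = [("u1", "car-ad"), ("u1", "car-ad"), ("u2", "hotel-ad"), ("u3", "car-ad")]
groups = {"u1": "driving-school", "u2": "airport", "u3": "driving-school"}
features = characterize(pairs, groups)
print(features["group_preferences"]["driving-school"].most_common(1))  # [('car-ad', 3)]
```

The `relationships` counter is effectively a weighted bipartite graph between subjects and objects, which is the usual starting point for the kind of mining the text refers to, such as recommendation or audience expansion.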

Alibaba Clouder
