InMobi best practices based on open source big data services

1、 Company Profile

InMobi is a global mobile advertising and marketing technology platform driven by AI and performance. Drawing on the large number of apps and users it connects worldwide, it provides mobile advertising and marketing technology services to domestic brands and apps, and monetization services to app developers. The platform was founded in 2007 and entered the Chinese market in 2011. It is R&D-driven and is a major player in the mobile advertising platform industry, both globally and in China. Through localized service teams in 23 countries and regions, InMobi reaches more than 1 billion monthly active unique users and offers tens of thousands of fine-grained audience categories, thousands of dimension labels, data from tens of millions of user-defined sample libraries, and precise mobile advertising based on LBS.

As a globally leading technology company, InMobi was named one of CNBC's "Top 50 Global Disruption Companies" in 2019 and one of Fast Company magazine's "most innovative" companies in 2018.

2、 InMobi China Big Data Solution

The above figure shows InMobi's original big data cluster architecture in China, which is divided into four layers: data ingestion, storage, computing, and reporting. Advertising data from the ad-serving front end, in particular RR data, is first collected through the data ingestion layer and stored in the offline HDFS big data cluster. The computing cluster then runs the data processing tasks, and the results are presented to end users as reports.

While operating and maintaining the big data cluster, several problems gradually emerged:

• The big data cluster is built in an IDC, which makes it hard to scale resources up or down

When computing resources run short, some tasks have to be rescheduled or even suspended so that important tasks can run first, which delays report generation

• Data reports are not timely enough

Reports are not refreshed quickly enough to meet the business side's requirement for minute-level presentation

• The Vertica database used to process real-time report data is relatively expensive

3、 InMobi China Big Data Cluster Optimization Scheme

Optimization of big data cluster

To address the three problems above, InMobi considered the following optimizations:

• Build a hybrid cloud architecture and introduce Alibaba Cloud big data services to make storage and computing resources scalable

Provision additional big data service nodes on the cloud and use the elasticity of the service to cover shortfalls in computing and storage capacity, especially in temporary peak scenarios such as 618 and Double 11, when resources are tight.

• Replace the Vertica database with EMR ClickHouse to speed up real-time report queries and reduce costs

ClickHouse is an open source product that has been deployed at scale across many business scenarios at Internet companies in China

• Build a real-time data warehouse based on Flink+EMR ClickHouse to resolve the latency problem of data reports

Report latency should reach at least the minute level, and the second level for reports with special requirements.
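To make the minute-level target concrete, the following is a minimal sketch of the kind of aggregation a real-time report pipeline performs: events are grouped into one-minute tumbling windows and summed per campaign. It is plain Python rather than Flink code, and the `AdEvent` fields and function names are illustrative assumptions, not InMobi's actual schema.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

WINDOW_SECONDS = 60  # minute-level report granularity


class AdEvent(NamedTuple):
    """Hypothetical ad event; the field names are illustrative only."""
    ts: int           # event time in Unix seconds
    campaign: str
    impressions: int
    clicks: int


def window_start(ts: int, size: int = WINDOW_SECONDS) -> int:
    """Return the start of the tumbling window that contains ts."""
    return ts - ts % size


def aggregate(events: Iterable[AdEvent]) -> List[Tuple[int, str, int, int]]:
    """Sum impressions and clicks per (window, campaign), sorted by key."""
    totals: Dict[Tuple[int, str], List[int]] = defaultdict(lambda: [0, 0])
    for e in events:
        key = (window_start(e.ts), e.campaign)
        totals[key][0] += e.impressions
        totals[key][1] += e.clicks
    return [(w, c, imp, clk) for (w, c), (imp, clk) in sorted(totals.items())]


events = [
    AdEvent(ts=120, campaign="a", impressions=1, clicks=0),
    AdEvent(ts=150, campaign="a", impressions=1, clicks=1),
    AdEvent(ts=185, campaign="a", impressions=1, clicks=0),
    AdEvent(ts=130, campaign="b", impressions=1, clicks=0),
]
rows = aggregate(events)
```

Each output row corresponds to one line of a minute-level report. A second-level report would use the same logic with a smaller `WINDOW_SECONDS`.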

Specific optimization scheme of big data cluster

• Decoupling of real-time data warehouse and offline data warehouse

• In the IDC big data cluster, completely decouple offline report resources from real-time report resources

• In the IDC big data cluster, completely decouple offline report tasks from real-time report tasks

• Reconstruction of real-time data warehouse

• Migrate Kafka log cluster to Alibaba Cloud

• On Alibaba Cloud, rebuild the real-time data warehouse cluster based on Flink+EMR ClickHouse

• Migrate the original Storm tasks in the IDC to the new real-time data warehouse cluster

• Optimize offline data warehouse

• Optimize and reclaim HDP big data cluster resources in the IDC to save costs

• Establish an offline data warehouse on Hive

• Start new data nodes on Alibaba Cloud and join them to the offline big data cluster to expand storage and computing resources

• Build a new Flume cluster on Alibaba Cloud to land the raw data from Kafka in HDFS
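As an illustration of the last step, raw records landed in HDFS are commonly laid out in time-partitioned directories so that offline jobs can read one day or hour at a time. The sketch below shows that layout logic in plain Python. The base path, topic name, and `dt=`/`hour=` directory scheme are assumptions for illustration. In a real deployment the Flume HDFS sink would do this work through its own path configuration.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Tuple


def partition_path(base: str, topic: str, ts: int) -> str:
    """Build an hourly partition directory from an event time in Unix seconds (UTC)."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return f"{base}/{topic}/dt={dt:%Y-%m-%d}/hour={dt:%H}"


def group_by_partition(
    records: Iterable[Tuple[int, str]], base: str, topic: str
) -> Dict[str, List[str]]:
    """Group (timestamp, payload) records by the partition they belong to."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for ts, payload in records:
        groups[partition_path(base, topic, ts)].append(payload)
    return dict(groups)


# Hypothetical base path and topic name, used only for this example.
records = [(0, "r1"), (10, "r2"), (90000, "r3")]
layout = group_by_partition(records, "/data/raw", "rr_logs")
```

Here the first two records fall into the same hourly directory and the third into a directory for the following day.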

Optimized big data cluster architecture

As shown in the figure above, the optimized big data cluster architecture is mainly divided into two parts:

• AliCloud (Real Time): mainly responsible for real-time data processing.

RR logs are read from Kafka and written to real-time reports through ClickHouse. Other useful data is read from Kafka into MySQL and PostgreSQL according to business requirements.

• IDC (Offline): mainly responsible for offline data processing and offline reports.

Through Flume, the raw data in Kafka is written in full to the HDFS cluster for storage, where data analysis and data regulation are then performed. On the offline big data cluster, all offline report workloads run as Spark tasks, and their results are written back to ClickHouse for presentation in offline data reports.
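When batch results are written back to ClickHouse, fewer and larger inserts generally work better than many single-row inserts. The sketch below shows one way to chunk result rows into batches before handing them to a writer. The `sink` callable is a hypothetical stand-in for whatever client performs the actual insert, and the row shape is an assumption.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Row = Tuple[int, str, int, int]  # hypothetical (window, campaign, impressions, clicks)


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def write_all(rows: Iterable[Row], size: int, sink: Callable[[List[Row]], None]) -> int:
    """Send rows to `sink` in batches and return the number of batches written."""
    count = 0
    for batch in batches(rows, size):
        sink(batch)
        count += 1
    return count


written: List[List[Row]] = []  # in-memory stand-in for the database
result_rows = [(0, "a", i, 0) for i in range(5)]
n_batches = write_all(result_rows, size=2, sink=written.append)
```

In practice the batch size would be far larger than in this example and tuned against the target table.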

4、 Further technology exploration and adoption

Build a real-time data warehouse integrating streaming and batch based on Flink+Hologres

Hologres uses an architecture that separates storage from compute. Compute is deployed entirely on K8s, and storage can use shared storage: HDFS or OSS on the cloud can be chosen according to business needs. This allows resources to scale out and in elastically and addresses the concurrency problems caused by insufficient resources, which makes it a good fit for InMobi's advertising business.

In addition, Flink performs ETL on both stream and batch data and writes the processed data into Hologres for unified storage and query. The business side can then connect directly to Hologres to serve online requests, which greatly improves production efficiency.
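The idea behind integrating stream and batch can be sketched as one transform and one target store shared by both input modes. The example below is plain Python, not Flink or Hologres code. The record fields, the `transform` rules, and the dictionary standing in for the unified store are all assumptions for illustration.

```python
from typing import Dict, Iterable, Iterator, Optional


def transform(raw: dict) -> Optional[dict]:
    """Shared ETL step: drop malformed records and normalise the campaign id."""
    campaign = str(raw.get("campaign") or "").strip().lower()
    if not campaign:
        return None
    return {"campaign": campaign, "clicks": int(raw.get("clicks", 0))}


def load(records: Iterable[dict], store: Dict[str, int]) -> None:
    """Apply the shared transform and accumulate clicks per campaign in the store."""
    for raw in records:
        row = transform(raw)
        if row is not None:
            store[row["campaign"]] = store.get(row["campaign"], 0) + row["clicks"]


def live_feed() -> Iterator[dict]:
    """Stand-in for an unbounded stream; here it yields only two records."""
    yield {"campaign": "A ", "clicks": 2}
    yield {"clicks": 5}  # malformed: no campaign, dropped by transform


store: Dict[str, int] = {}  # stand-in for the unified storage and query layer
load([{"campaign": "a", "clicks": 1}, {"campaign": "b", "clicks": 3}], store)  # batch input
load(live_feed(), store)  # streaming input, same code path
```

Because both inputs pass through the same `transform` and land in the same store, queries see consistent results regardless of how the data arrived.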

This concludes InMobi's best practices based on Alibaba Cloud's open source big data services.
