Solution highlights: millions of transactions per second (TPS) at peak hours for stream processing and 10 times higher performance than Greenplum
Some malicious users exploit credit card or other vulnerabilities to make in-game purchases on behalf of other users while paying little or no money. To identify and block these illegal transactions, the system must query diverse types of log records, such as logon records and in-game purchase records, in real time.
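One common way to catch such transactions is to correlate purchase records with logon records in real time. The sketch below is a minimal, self-contained illustration of that idea in plain Python; the record fields (`account`, `device`, `ts`) and the matching rule are illustrative assumptions, not the company's actual schema or detection logic.

```python
from datetime import datetime, timedelta

# Hypothetical record shapes; field names are illustrative assumptions.
logons = [
    {"account": "u1", "device": "d1", "ts": datetime(2023, 5, 1, 10, 0)},
    {"account": "u2", "device": "d2", "ts": datetime(2023, 5, 1, 10, 1)},
]
purchases = [
    {"account": "u1", "device": "d1", "ts": datetime(2023, 5, 1, 10, 2)},
    # A purchase from a device that never logged on to this account:
    {"account": "u2", "device": "d9", "ts": datetime(2023, 5, 1, 10, 3)},
]

def flag_suspicious(purchases, logons, window=timedelta(minutes=30)):
    """Flag purchases with no matching logon from the same account and
    device within the lookback window."""
    flagged = []
    for p in purchases:
        matched = any(
            l["account"] == p["account"]
            and l["device"] == p["device"]
            and timedelta(0) <= p["ts"] - l["ts"] <= window
            for l in logons
        )
        if not matched:
            flagged.append(p)
    return flagged

print([p["account"] for p in flag_suspicious(purchases, logons)])  # ['u2']
```

In production this correlation runs against Phoenix SQL point queries rather than in-memory lists, which is what makes the millisecond-level blocking described later feasible.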
Due to rapid business development, the number of behavioral log records grows quickly. To mine the value of this data, the company needs to analyze a large volume of clickstream and activation log records. For example, the analysis results can help the company estimate the ad conversion rate, the activation rate, and the cost of acquiring each user who installs its game.
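The metrics named above reduce to simple ratios over aggregated log counts. The following sketch shows one plausible set of definitions; the exact formulas the company uses are not stated in the source, so these are assumptions.

```python
def funnel_metrics(clicks, installs, activations, ad_spend):
    """Basic marketing funnel metrics from aggregated log counts.
    The exact definitions here are illustrative assumptions."""
    return {
        # Share of ad clicks that lead to an install.
        "conversion_rate": installs / clicks if clicks else 0.0,
        # Share of installs that become activated users.
        "activation_rate": activations / installs if installs else 0.0,
        # Ad spend divided by installs.
        "cost_per_install": ad_spend / installs if installs else 0.0,
    }

m = funnel_metrics(clicks=20_000, installs=1_600, activations=1_200,
                   ad_spend=4_800.0)
print(m)  # conversion_rate 0.08, activation_rate 0.75, cost_per_install 3.0
```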
The company previously used Greenplum for stream processing and statistical analysis. However, Greenplum cannot keep pace with the company's rapid business growth due to the following limits:
1. The architecture of Greenplum is not suitable for processing increasingly complex computing tasks.
2. Greenplum imposes an upper limit on the number of partitions in a single table and does not fully support multi-level partitioning. As a result, Greenplum is not suitable for scenarios in which log records must be retained permanently and a single table stores a very large amount of data. If a partitioned table grows too large, query performance may fail to meet business requirements.
3. Each time a Greenplum cluster is scaled out, data must be redistributed, which consumes a large amount of time.
4. Greenplum is not suitable for processing unstructured data.
The company uses the Phoenix SQL service of Lindorm and the serverless Spark engine of Data Lake Analytics (DLA) to build a stream processing and data warehouse solution.
Phoenix SQL offers online data query capabilities. The serverless Spark engine provides stream processing and complex data analysis capabilities. Phoenix SQL and the serverless Spark engine of DLA are used together to meet the business requirements of the company.
Business data is organized into the following layers before it is stored in Phoenix SQL and the serverless Spark engine.
1. Operation data layer: stores raw log records generated by the mobile game clients, user center, ad monitoring platforms, and game servers.
2. Data detail layer: stores the data that is processed from the operation data layer. Services such as Spark Streaming remove dirty data, deduplicate data, and format data before the data is written to Phoenix SQL. Alternatively, Java Database Connectivity (JDBC) can be used to write data to Phoenix SQL.
3. Data aggregation layer: Data at the data detail layer can be periodically (such as daily or hourly) associated with dimension tables, filtered, aggregated, or processed by using other extract, transform, load (ETL) operations. Then, the result data is written to the destination at the data aggregation layer. Data at the data aggregation layer can be used for subsequent complex data analysis.
4. Application data layer: After data is processed at the data aggregation layer, the result data is written to Phoenix SQL and made available for external queries. This supports upper-layer applications such as big data risk control, ad recommendation, and fine-grained operations.
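The four layers above can be sketched end to end as a small pipeline. This is a conceptual illustration in plain Python, assuming hypothetical record fields; in the actual solution the detail and aggregation steps run on Spark Streaming and the serverless Spark engine, and the results are written to Phoenix SQL.

```python
from collections import Counter

# Operation data layer: raw logs, possibly dirty or duplicated.
# Field names are illustrative assumptions.
ods = [
    {"user": "u1", "event": "purchase", "hour": "2023-05-01T10"},
    {"user": "u1", "event": "purchase", "hour": "2023-05-01T10"},  # duplicate
    {"user": None, "event": "purchase", "hour": "2023-05-01T10"},  # dirty
    {"user": "u2", "event": "purchase", "hour": "2023-05-01T11"},
]

# Data detail layer: drop dirty records and deduplicate.
detail = []
seen = set()
for rec in ods:
    if rec["user"] is None:
        continue
    key = (rec["user"], rec["event"], rec["hour"])
    if key not in seen:
        seen.add(key)
        detail.append(rec)

# Data aggregation layer: periodic (here, hourly) purchase counts.
agg = Counter(r["hour"] for r in detail if r["event"] == "purchase")

# Application data layer: the result table served for external queries.
app = dict(agg)
print(app)  # {'2023-05-01T10': 1, '2023-05-01T11': 1}
```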
Identifies and blocks, within a few milliseconds, illegal in-game purchases that are made on behalf of other users: Phoenix SQL of Lindorm supports highly concurrent real-time reads and writes. It supports millions of TPS and hundreds of thousands of concurrent requests. In addition, it can return the result of a simple query in only a few milliseconds.
Excels in computing performance due to the serverless Spark engine: For jobs that analyze large amounts of data, the serverless Spark engine combined with the Apache Parquet columnar storage format provides 10 times higher performance than a Greenplum cluster. This helps the company meet computing requirements, such as calculating the ad conversion rate, the activation rate, and the cost of each user who installs the game. This way, the company can perform data-driven and fine-grained operations.
Offers a one-stop solution: The native Spark service allows users to execute SQL statements to read data from Phoenix SQL. The native Spark service also provides features such as column pruning, predicate pushdown, and partition pruning for performance optimization. After the data stored in an online database has been extracted, transformed, and loaded, the company can archive the data to Spark data warehouses in an efficient manner.
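Column pruning and predicate pushdown both shrink the data before it leaves the storage layer. The toy scan below illustrates the concept in plain Python; it is not the Spark-Phoenix connector API, and the table contents are made up for illustration.

```python
def scan(rows, columns=None, predicate=None):
    """Toy scan that applies predicate pushdown and column pruning
    at read time, so downstream operators receive less data.
    Conceptual illustration only; not the Spark-Phoenix connector API."""
    for row in rows:
        # Predicate pushdown: rows are filtered inside the scan,
        # before they are handed to the query engine.
        if predicate is not None and not predicate(row):
            continue
        # Column pruning: only requested columns are materialized.
        if columns is not None:
            row = {c: row[c] for c in columns}
        yield row

table = [
    {"user": "u1", "country": "US", "amount": 5},
    {"user": "u2", "country": "DE", "amount": 9},
    {"user": "u3", "country": "US", "amount": 2},
]

result = list(scan(table, columns=["user", "amount"],
                   predicate=lambda r: r["country"] == "US"))
print(result)  # [{'user': 'u1', 'amount': 5}, {'user': 'u3', 'amount': 2}]
```

The design benefit is the same in the real system: the less data crosses the boundary between Phoenix SQL and Spark, the faster the ETL and archiving jobs run.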
Allows the company to focus on its business development: The fully managed Spark service ensures that jobs can run as expected and minimizes the workload of O&M personnel. Lindorm also provides a workbench that allows the company to reduce the costs of managing Spark jobs.