
E-MapReduce: Scenarios

Last Updated: Dec 12, 2025

Serverless StarRocks meets a wide range of enterprise analytics requirements. This topic describes the usage scenarios of Serverless StarRocks and the solutions it provides.

Scenarios


OLAP multidimensional analysis

  • User behavior analysis
  • User persona analysis, tag analysis, and target user identification
  • High-dimensional business metric reporting
  • Self-service reporting platform
  • Business problem identification and analysis
  • Cross-theme business analysis
  • Financial reporting
  • System monitoring analysis

Real-time data warehousing

  • Data analysis for e-commerce promotion activities
  • Result analysis for live streaming in the education industry
  • Waybill analysis in logistics
  • Performance analysis and metric calculation in the financial industry
  • Advertising analysis
  • Cockpit management
  • Application performance management (APM)

High-concurrency queries

  • Report analysis for advertisers
  • Analysis for sales-channel personnel in the retail industry
  • Client-based reporting in the software as a service (SaaS) industry
  • Multi-page analysis on dashboards

Unified analysis

  • An all-in-one system provides multidimensional analysis, high-concurrency queries, pre-computation, and real-time analysis and queries. This reduces system complexity and the development and O&M costs of maintaining multiple technology stacks.

  • StarRocks is used to manage data lakes and data warehouses in a centralized manner. You can store business data that has high requirements on concurrency and timeliness in StarRocks for analysis. You can also use external catalogs and external tables to analyze data in data lakes.
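The pre-computation capability mentioned above can be sketched with an asynchronous materialized view. The table and column names below (`orders`, `dt`, `channel`, `amount`) are illustrative assumptions, not part of this topic:

```sql
-- Minimal sketch: an asynchronous materialized view that
-- pre-aggregates a hypothetical orders table once per hour,
-- so that high-concurrency dashboard queries read pre-computed results.
CREATE MATERIALIZED VIEW daily_gmv
REFRESH ASYNC EVERY (INTERVAL 1 HOUR)
AS
SELECT dt, channel, SUM(amount) AS gmv
FROM orders
GROUP BY dt, channel;
```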

Solutions in typical scenarios

Solution in common OLAP scenarios

  • This solution can be used in various business scenarios, such as scenarios related to gross merchandise value (GMV), orders, logistics, customer analysis, recommendation systems, and user personas. You can import data in offline or real-time mode for processing.

  • The original solution uses multiple OLAP engines to meet the requirements of different scenarios. These engines form silos, which increases O&M complexity and consumes significant time and effort.

  • A unified OLAP engine is used to meet various analysis requirements. This solution uses the MySQL protocol to connect to various BI tools. This helps you quickly analyze and process data. Compared with the original solution, this solution simplifies O&M.
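Because StarRocks speaks the MySQL protocol, BI tools can issue standard SQL for multidimensional analysis. A minimal sketch, assuming a hypothetical `orders` table with `region`, `channel`, and `amount` columns:

```sql
-- Illustrative multidimensional aggregation: CUBE produces subtotals
-- for every combination of the listed dimensions in one query.
SELECT region, channel, SUM(amount) AS gmv
FROM orders
GROUP BY CUBE(region, channel);
```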

Real-time data analytics solution


Procedure:

  1. Real-time ingestion: Data is read directly from Kafka. The Flink connector writes Flink data streams with exactly-once semantics. The Flink change data capture (CDC) connector captures updates to transactional data and writes them to StarRocks in real time.

  2. Data analytics: Data generated during real-time analysis can be served directly, integrating real-time and offline data.

  3. Real-time data modeling: Real-time data modeling aggregation tables are provided to support real-time aggregation. Powerful engines and optimizers ensure high efficiency in real-time data modeling of databases.

  4. Real-time update: The delete-and-insert update policy is used, so reads do not need to merge rows that share a primary key. Delete-and-insert delivers 3 to 15 times the performance of the merge-on-read (unique key) policy.
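Steps 1 and 4 above can be sketched in SQL. All object names (`demo_db`, `orders`, the Kafka broker and topic) are placeholders, not values from this topic:

```sql
-- Hypothetical primary key table: rows with the same order_id are
-- updated with the delete-and-insert policy described in step 4.
CREATE TABLE orders (
    order_id    BIGINT,
    order_state TINYINT,
    amount      DECIMAL(18, 2)
) PRIMARY KEY (order_id)
DISTRIBUTED BY HASH(order_id);

-- Hypothetical Routine Load job that continuously ingests the
-- Kafka topic from step 1 into the table above.
CREATE ROUTINE LOAD demo_db.orders_load ON orders
COLUMNS TERMINATED BY ","
PROPERTIES ("format" = "csv")
FROM KAFKA (
    "kafka_broker_list" = "broker1:9092",
    "kafka_topic"       = "orders_topic"
);
```

For Flink-based ingestion, the Flink connector for StarRocks can be used instead of Routine Load; the choice depends on whether the pipeline already runs on Flink.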

Data lakehouse analytics solution

  • Query layer: StarRocks uses its cost-based optimizer (CBO) and query engine, delivering query and computing performance 3 to 5 times that of Trino.

  • Metadata management:

    • Supports multi-catalog management, seamless integration with the Hive Metastore service (HMS), and custom catalogs to facilitate connection with data lake formation services of cloud vendors.

    • Supports standard formats such as Parquet, ORC, and CSV, with late materialization and the merging of small files on read and write.

    • Supports various data lake formats, such as Hudi, Iceberg, Delta Lake, and Paimon.
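The multi-catalog management described above can be sketched with an external catalog. The catalog name, database name, and metastore URI are illustrative assumptions:

```sql
-- Minimal sketch: register a hypothetical Hive metastore as an
-- external catalog so that lake tables become queryable in place.
CREATE EXTERNAL CATALOG hive_catalog
PROPERTIES (
    "type" = "hive",
    "hive.metastore.uris" = "thrift://metastore-host:9083"
);

-- Lake tables are then addressed as catalog.database.table.
SELECT COUNT(*) FROM hive_catalog.sales_db.orders;
```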


Procedure:

  1. Federated analysis: The details of underlying data sources are abstracted away. Joint analysis of data from heterogeneous data sources is supported, as is joint analysis of real-time and offline data.

  2. Query acceleration: Computation is pushed close to the data by using expression pushdown and aggregation pushdown, together with optimizations of the distributed read mode and the data sources. Techniques such as vectorized reading of ORC and Parquet data, dictionary-based filtering, and late materialization are supported.

  3. Test results: In TPC-H and Hive query tests under identical conditions, performance is 3 to 5 times that of Presto, and the same performance can be achieved with only one third of the Presto resources.
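The joint analysis of internal and lake data described in step 1 can be sketched as a single federated query. The table, column, and catalog names are hypothetical:

```sql
-- Illustrative federated query: joins a hypothetical internal
-- StarRocks table (orders) with a hypothetical Hive dimension table
-- reached through an external catalog, so real-time and lake data
-- are analyzed together in one statement.
SELECT o.order_id, o.amount, d.category
FROM orders AS o
JOIN hive_catalog.sales_db.dim_product AS d
  ON o.product_id = d.product_id
WHERE o.dt = '2025-01-01';
```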