This topic describes the architecture of E-MapReduce (EMR) Serverless StarRocks.
Architecture of EMR Serverless StarRocks
The architecture of EMR Serverless StarRocks consists of the following modules from the bottom up:
Storage layer:
Compute-storage integration: StarRocks internal tables are stored in the StarRocks Table format on cloud disks or local disks.
Compute-storage separation: StarRocks internal tables are stored in the StarRocks Table format in data lake storage services, such as Object Storage Service (OSS) and Hadoop Distributed File System (HDFS).
Data lake analysis: StarRocks external tables are used to read Hive data or data in open lake formats from data lake storage services such as OSS and HDFS.
StarRocks instances:
All instances are hosted on the cloud and are O&M-free. Each instance consists of frontend (FE) nodes and backend (BE) nodes or compute nodes (CNs).
Warehouses are used to implement flexible configuration and isolation of resources.
Elastic scalability keeps resource usage cost-effective.
A cache mechanism can significantly accelerate data queries in compute-storage separation and data lake analysis scenarios. In addition, the built-in cache management feature helps you optimize data caching performance.
Product capabilities:
Instance O&M: StarRocks provides instance management features that require no O&M. This improves the O&M efficiency and system stability. The instance management features include resource and configuration management, alerting, automatic generation of health reports, and automatic upgrade.
Data O&M: Out-of-the-box data management capabilities are provided, including visualized code development in an SQL editor, import task management, slow query analysis, data audit, metadata management, and permission configuration.
The preceding product capabilities allow you to focus on your own business applications, such as operations analysis, user personas, self-service reports, order analysis, and user report generation.
StarRocks system architecture
The StarRocks system architecture consists of only FE nodes and BE nodes or CNs. This facilitates deployment and maintenance. Nodes can be added online. Replication mechanisms are provided for metadata and business data to prevent single points of failure (SPOFs). StarRocks supports the MySQL protocol and standard SQL syntax. You can use a MySQL client to query and analyze data in StarRocks.
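Because StarRocks speaks the MySQL protocol, any MySQL-compatible client or driver can connect to an FE node. The following Python sketch uses a hypothetical FE endpoint and database name; 9030 is the default FE query port, but verify the port and credentials for your own deployment.

```python
# Connection parameters for a StarRocks FE node. The host and database
# names below are placeholders; 9030 is the default FE query port.
STARROCKS = {
    "host": "fe.example.internal",  # hypothetical FE endpoint
    "port": 9030,
    "user": "root",
    "password": "",
    "database": "example_db",
}

def connect_and_query(sql: str):
    """Open a MySQL-protocol connection and run a query.

    Requires any MySQL-protocol driver; PyMySQL is used here as an example.
    """
    import pymysql  # imported lazily so the module loads without the driver
    conn = pymysql.connect(**STARROCKS)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()
```

The standard `mysql` command-line client works the same way: point it at the FE host and query port and issue standard SQL.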
With the development of StarRocks, the system architecture has evolved from compute-storage integration to compute-storage separation.
In versions earlier than StarRocks 3.0, the compute-storage integration architecture is used, in which BE nodes are used for data storage and computing, and all data access and analysis operations are directly performed on local nodes to ensure fast response to query requests.
In StarRocks 3.0 and later, the compute-storage separation architecture is used, which changes the data storage method. Original BE nodes are upgraded and transformed into stateless CNs, and the data is persistently stored in a remote object storage service or HDFS. In the new architecture, the local disks of CNs are mainly used to cache the data that is frequently accessed to improve the query speed. The compute-storage separation architecture supports the dynamic addition or removal of CNs, which implements flexible and efficient scaling.
The following figure shows the evolution from the compute-storage integration architecture to the compute-storage separation architecture.

The images and specific information in this topic are from Architecture of open source StarRocks.
Compute-storage integration
As a massively parallel processing (MPP) database, StarRocks uses the compute-storage integration architecture in versions earlier than StarRocks 3.0. In this architecture, BE nodes handle both data storage and computing, so a query can read local data directly for computing. This significantly improves query speed and minimizes data transmission and copying latency. The architecture also supports multi-replica data storage, which enhances the capability to process high-concurrency queries and ensures data reliability. It is suitable for scenarios that require optimal query performance.
In the compute-storage integration architecture, StarRocks consists of two types of nodes: FE and BE.
FE
FE nodes are used for metadata management, client connection management, and query planning and scheduling. Each FE node retains a complete replica of metadata in memory to ensure service consistency.
| Role | Metadata management | Leader election | Description |
| --- | --- | --- | --- |
| Leader | Read and write | Automatically elected | After the leader FE node reads and writes metadata, it uses Berkeley DB Java Edition (BDB JE) to synchronize the metadata changes to follower and observer FE nodes. The leader is elected from follower FE nodes. If the leader fails, the remaining followers start another round of leader election. |
| Follower | Read only | Participates | Follower FE nodes can only read metadata. They asynchronously replay the metadata logs of the leader FE node. Followers participate in leader election, which requires more than half of the followers in a cluster to be active. Elections are performed through BDB JE. |
| Observer | Read only | Does not participate | Observer FE nodes have the same read permissions as followers and also asynchronously replay the leader's metadata logs, but they do not participate in leader election. Observers are mainly used to increase the query concurrency of a cluster without adding election overhead. |
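The majority requirement for follower-based election can be illustrated with a minimal sketch (not StarRocks code):

```python
def has_election_quorum(total_followers: int, alive_followers: int) -> bool:
    """An election can succeed only if more than half of the follower
    FE nodes in the cluster are alive (a strict majority), as in BDB JE
    replication groups."""
    return alive_followers > total_followers // 2

# With 3 followers, 2 alive nodes form a quorum; 1 does not.
# With 4 followers, 3 alive nodes are required.
```

This is why production clusters typically deploy an odd number of followers: 3 followers tolerate 1 failure, while 4 followers still tolerate only 1.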
BE
BE nodes are used for data storage and SQL execution, and adopt local storage and multi-replica mechanisms to improve system availability.
Data storage: BE nodes provide equivalent data storage capabilities. FE nodes distribute data to BE nodes based on predefined policies. The BE nodes then convert the data into a storable format and generate indexes for it.
SQL execution: FE nodes parse an SQL query into a logical execution plan based on its semantics, and then transform the logical plan into physical execution plans that can be executed on BE nodes. The BE nodes that store the destination data execute the query locally. This eliminates the need for data transmission and copying and substantially boosts query performance.
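The distribution of data across BE nodes can be sketched as a simple hash-bucketing scheme. This is an illustration of the idea only, not the actual StarRocks placement logic, and the function names are made up:

```python
import zlib

def bucket_for(key: str, num_buckets: int) -> int:
    """Map a distribution-key value to a bucket (tablet) by hashing,
    loosely mirroring hash bucketing over a table's distribution key."""
    return zlib.crc32(key.encode()) % num_buckets

def be_for_bucket(bucket: int, be_nodes: list) -> str:
    """Assign buckets to BE nodes round-robin (a simplified stand-in
    for the FE's placement policy)."""
    return be_nodes[bucket % len(be_nodes)]

# The same key always hashes to the same bucket, so the FE can route a
# query for that key straight to the BE node that stores the data.
```

Deterministic placement is what lets queries run where the data lives, avoiding cross-node data movement.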
Although the compute-storage integration architecture has significant advantages in query performance, it also has the following limits:
High costs: To ensure data reliability, BE nodes must use a multi-replica mechanism, typically three replicas. As the data volume grows, storage must be expanded together with compute, which can waste computing resources.
Complex architecture: Maintaining consistency across multiple replicas makes the architecture more complex and increases the difficulty of management and maintenance.
Insufficient scalability: Scaling operations trigger data rebalancing, which can degrade the user experience.
Compute-storage separation
The compute-storage separation architecture of StarRocks decouples computing resources from storage resources. In this architecture, data is persistently stored in cost-effective, reliable remote storage services such as OSS or HDFS. The local disks of CNs are mainly used to cache frequently accessed data to accelerate queries. If the local cache is hit, the compute-storage separation architecture provides a query speed comparable to that of the compute-storage integration architecture.
In the compute-storage separation architecture, you can dynamically add or remove CNs to implement scaling within seconds. This effectively reduces the costs of data storage and resource expansion, and facilitates resource isolation and auto scaling of computing resources. The compute-storage separation architecture is similar to the compute-storage integration architecture in the node composition. The entire system consists of two types of nodes: FE and CN. The only difference is that you must specify a backend storage service.
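The benefit of stateless CNs for scaling can be illustrated with a toy consistent-hash ring: adding a CN remaps only a fraction of tablets, so most local caches stay warm. This is an illustration under simplified assumptions, not the actual StarRocks scheduling algorithm:

```python
import bisect
import hashlib

class CNRing:
    """Toy consistent-hash ring mapping tablets to stateless CNs.

    When a CN is added, only the tablets that now hash to the new CN
    move; every other tablet keeps its CN, so its cached data remains
    useful. (Real systems also use virtual nodes for balance.)"""

    def __init__(self, cns):
        self._ring = sorted((self._h(cn), cn) for cn in cns)

    @staticmethod
    def _h(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def cn_for(self, tablet_id: str) -> str:
        # Route the tablet to the first CN clockwise on the ring.
        keys = [h for h, _ in self._ring]
        i = bisect.bisect(keys, self._h(tablet_id)) % len(self._ring)
        return self._ring[i][1]
```

Because CNs hold no persistent state, removing one loses only cached copies; the data itself stays intact in OSS or HDFS.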
In the compute-storage separation architecture, FE nodes retain the same functionality. BE nodes are upgraded to stateless CNs, which cache only hot data and handle tasks such as data import, query processing, and management of cached data.
Storage
The following backend storage solutions are supported for the compute-storage separation architecture of StarRocks. You can select a solution based on your business requirements.
Alibaba Cloud OSS
HDFS, which is deployed in a self-managed Hadoop cluster or an Alibaba Cloud EMR DataLake cluster
The data file format in the compute-storage separation architecture remains consistent with that in the compute-storage integration architecture. Various indexing technologies are supported, and metadata such as tablet meta information is redesigned to better adapt to the object storage environment.
Cache
To optimize query performance, StarRocks provides a multi-tiered data cache system. Hot data is stored in memory to ensure quick access. Warm data is stored in local disks. Cold data is stored in a remote object storage service. Data flows in the three storage layers based on the access frequency.
In a query request, hot data is directly obtained from the cache, and cold data needs to be read from the backend object storage service and locally cached to accelerate subsequent access. StarRocks provides a multi-tiered data access system that encompasses memory, local disks, and remote storage. You can specify rules for hot and cold data to meet various business requirements. This achieves high-performance computing and cost-effective storage.
You can determine whether to enable the caching feature when you create tables. If the caching feature is enabled, data is simultaneously stored in the local disk and backend object storage service during the writing process. During queries, CNs first read data from the local disk. If the local cache is not hit, the CNs obtain raw data from the backend object storage service and cache the raw data to the local disk to accelerate subsequent access. StarRocks optimizes the access to cold data that is not cached based on policies, such as read-ahead and parallel scanning, in combination with the access mode of applications. This reduces the frequency of access to remote object storage services and further improves query efficiency.
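The read path described above (check the local cache first, then fall back to remote storage and populate the cache on a miss) can be sketched as follows. This is a conceptual illustration, not StarRocks internals:

```python
class TieredReader:
    """Illustrative cache-aside read path for a CN."""

    def __init__(self, remote_store: dict):
        self.remote = remote_store   # stands in for OSS/HDFS
        self.local_cache = {}        # stands in for the CN's local disk
        self.hits = 0
        self.misses = 0

    def read(self, block_id: str):
        # Fast path: the block is already cached on the local disk.
        if block_id in self.local_cache:
            self.hits += 1
            return self.local_cache[block_id]
        # Slow path: fetch from remote object storage, then cache the
        # block locally to accelerate subsequent reads.
        self.misses += 1
        data = self.remote[block_id]
        self.local_cache[block_id] = data
        return data
```

Read-ahead and parallel scanning, mentioned above, would sit on the slow path: they batch or prefetch remote reads to cut the number of round trips to object storage.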