
ApsaraDB for SelectDB: Architecture

Last Updated: Jan 06, 2025

ApsaraDB for SelectDB is a real-time data warehousing service built on Apache Doris. It uses a cloud-native architecture that separates compute from storage. This topic describes the architecture and basic principles of ApsaraDB for SelectDB.

Architecture diagram

[Figure: Architecture of ApsaraDB for SelectDB]

Components

Application systems or clients

An application system or a client is a service or tool that you use to access ApsaraDB for SelectDB. ApsaraDB for SelectDB is compatible with the MySQL connection protocol and standard SQL syntax. You can use tools such as the MySQL CLI, Java Database Connectivity (JDBC) drivers, Open Database Connectivity (ODBC) drivers, and visualization tools to access ApsaraDB for SelectDB instances.

Note

To reduce the impact of network latency and instability, we recommend that you deploy your application or client in the same region as your ApsaraDB for SelectDB instance.

Instances

An ApsaraDB for SelectDB instance is the basic unit that you purchase and use to manage resources in ApsaraDB for SelectDB. After you purchase an instance, the resources of the instance and its clusters belong to your account.

The compute-storage separation architecture consists of instances, clusters, and storage. An instance receives requests and contains a group of frontends (FEs). A cluster is a distributed system that processes requests and contains a group of backends (BEs). Object Storage Service (OSS) is used to store data. The FEs of an instance are hosted by ApsaraDB for SelectDB and are scaled based on your business requirements, so you do not need to manage them. The resources of different instances are physically isolated, which makes separate instances suitable for business scenarios that must be completely independent of each other or that have different data sensitivity requirements.

An ApsaraDB for SelectDB instance processes write and query requests as follows:

  • Write request: You can initiate a write request by using the write interface provided by ApsaraDB for SelectDB or an existing import tool. The instance forwards the request to the specified cluster, which writes the data to the OSS bucket and to its cache. The write is reported as successful only after the data is persisted in the OSS bucket.

  • Query request: You can initiate a query request by executing an SQL statement. The instance parses the statement, uses an intelligent optimizer to generate an efficient query execution plan, and sends the plan to the specified cluster. The cluster executes the query by using massively parallel processing (MPP), reads data from its cache or, when the data is not cached, from the OSS bucket, and returns the results over the MySQL protocol. To accelerate queries, the cluster uses a pipeline execution architecture together with indexing, caching, and vectorized execution. The following figure shows how ApsaraDB for SelectDB processes a query request.

    [Figure: How ApsaraDB for SelectDB processes a query request]
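The MPP step can be pictured as a scatter-gather aggregation: each BE aggregates the rows it holds, and the partial results are then merged into the final answer. The following Python sketch illustrates only that idea. Threads stand in for BEs, and the function names `be_partial_sum` and `mpp_sum_by_key` are hypothetical; the real engine's plan fragments, data exchange, pipeline execution, and vectorization are not modeled.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: each inner list holds the (key, value) rows stored on
# one backend (BE).
Shard = list[tuple[str, int]]


def be_partial_sum(rows: Shard) -> Counter:
    """Partial aggregation that one BE runs over its local rows."""
    partial: Counter = Counter()
    for key, value in rows:
        partial[key] += value
    return partial


def mpp_sum_by_key(shards: list[Shard]) -> dict[str, int]:
    """Scatter partial aggregation to every BE in parallel, then merge.

    Roughly corresponds to: SELECT key, SUM(value) FROM t GROUP BY key
    """
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(be_partial_sum, shards))
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # adds the partial sums key by key
    return dict(merged)


if __name__ == "__main__":
    shards = [
        [("beijing", 3), ("hangzhou", 5)],
        [("beijing", 2)],
        [("hangzhou", 1), ("shanghai", 4)],
    ]
    print(mpp_sum_by_key(shards))
```

Because each BE returns only one partial sum per key, the amount of data sent back for merging depends on the number of distinct keys rather than on the number of scanned rows.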

Cluster

A cluster in ApsaraDB for SelectDB is a distributed system that contains one or more BEs. Each BE provides computing resources and cache resources. Because accessing OSS has much higher latency than accessing local resources, ApsaraDB for SelectDB introduces caching to accelerate data access and supports multi-level caches, such as in-memory and disk-based caches. Clusters can be scaled flexibly. During scaling, cached data is prefetched and migrated so that analysis workloads are not disrupted.
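A read-through lookup across cache levels can be sketched as follows. This is a minimal illustration, assuming a small in-memory LRU tier in front of a larger disk tier, with a plain dictionary standing in for OSS. The class names `LRU` and `TieredCache`, the capacities, and the promotion policy are hypothetical and do not describe SelectDB's actual cache implementation; prefetching and migration during scaling are not modeled.

```python
from collections import OrderedDict


class LRU:
    """Fixed-capacity least-recently-used map; stands in for one cache tier."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value) -> None:
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry


class TieredCache:
    """Hypothetical memory -> disk -> object storage read path."""

    def __init__(self, object_store: dict, mem_capacity: int = 2, disk_capacity: int = 8):
        self.memory = LRU(mem_capacity)
        self.disk = LRU(disk_capacity)
        self.object_store = object_store  # stands in for OSS: durable but slow
        self.hits = {"memory": 0, "disk": 0, "oss": 0}

    def read(self, key):
        value = self.memory.get(key)
        if value is not None:
            self.hits["memory"] += 1
            return value
        value = self.disk.get(key)
        if value is not None:
            self.hits["disk"] += 1
            self.memory.put(key, value)  # promote hot data to memory
            return value
        value = self.object_store[key]  # slowest path: fetch from OSS
        self.hits["oss"] += 1
        self.disk.put(key, value)
        self.memory.put(key, value)
        return value
```

For example, with `mem_capacity=1`, reading "a", then "b", then "a" again serves the second read of "a" from the disk tier, because "b" has displaced it from memory.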

ApsaraDB for SelectDB supports the multi-cluster architecture. An instance can contain multiple clusters, which are similar to compute queues or groups in a classic distributed architecture. Multiple clusters in the same instance have the following features:

  • Data sharing: Multiple clusters share and access the underlying OSS data. This eliminates the need for redundant data storage.

  • Computing isolation: The computing and cache resources of different clusters are completely independent, so you can use separate clusters to isolate different workloads. You can purchase computing and cache resources of different specifications for each cluster based on your business requirements, and each cluster caches data based on its own access patterns.

  • Concurrent reads and writes: Different clusters can read and write data at the same time. After data is written, it can be queried immediately from all clusters.
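The three properties above can be illustrated with a toy model in which every cluster reads the same shared storage but owns a private cache. This is a minimal sketch, assuming that written data is stored as immutable segments so that a cached segment never goes stale; `SharedStorage`, `Cluster`, and their methods are hypothetical names, and the real system's metadata, transaction, and visibility mechanisms are not modeled.

```python
import itertools


class SharedStorage:
    """Stands in for the OSS data shared by all clusters of one instance."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.segments = {}  # segment id -> immutable tuple of rows
        self.manifest = {}  # table name -> ordered list of segment ids

    def persist(self, table, rows) -> int:
        seg_id = next(self._ids)
        self.segments[seg_id] = tuple(rows)
        self.manifest.setdefault(table, []).append(seg_id)
        return seg_id


class Cluster:
    """A compute cluster: private cache, shared storage."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage
        self.cache = {}  # isolated per cluster: segment id -> rows
        self.oss_reads = 0

    def write(self, table, rows) -> str:
        seg_id = self.storage.persist(table, rows)
        # Reached only after persist() returns, i.e. after data is durable.
        self.cache[seg_id] = self.storage.segments[seg_id]
        return "OK"

    def query(self, table) -> list:
        result = []
        for seg_id in self.storage.manifest.get(table, []):
            if seg_id not in self.cache:  # cache miss: read from shared storage
                self.oss_reads += 1
                self.cache[seg_id] = self.storage.segments[seg_id]
            result.extend(self.cache[seg_id])
        return result
```

Because both clusters consult the same segment list on every query, a segment written through one cluster is visible to the other on its next query, while each cluster warms its own cache independently.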

Because of these features, the multi-cluster architecture is commonly used to isolate data reads from data writes, online workloads from offline workloads, or the production environment from the test environment.

Storage

ApsaraDB for SelectDB uses the highly reliable and cost-effective OSS service for persistent data storage. Because OSS provides high data reliability on its own, ApsaraDB for SelectDB does not need to maintain multiple data replicas within the data warehouse. Because OSS is also inexpensive, the per-unit storage cost of ApsaraDB for SelectDB is more than 90% lower than that of traditional data warehousing solutions.

When you use ApsaraDB for SelectDB, you do not need to reserve storage resources. You are charged for storage resources on a pay-as-you-go basis. You can also use storage plans to further reduce your storage costs.

To improve analysis performance, the storage and compute layers of ApsaraDB for SelectDB are tightly integrated in the following ways:

  • Data organization: To make data access efficient, ApsaraDB for SelectDB organizes its underlying data at a fine granularity.

    • Data partitioning: Data is partitioned, typically by time, and distributed into buckets by hash value. This spreads data across the cluster to make full use of its distributed processing power, and it lets queries skip partitions and buckets that cannot contain matching data.

    • Hybrid row-column storage: Column storage is the default and supports efficient analysis of large amounts of data. Row storage can be enabled based on your business requirements to support high-performance point queries.

    • Extensive indexes: Various index types, combined with filter conditions, let queries locate data precisely, which can improve query performance by orders of magnitude.

  • Data models: ApsaraDB for SelectDB provides data models optimized for typical data analysis scenarios.

    • Unique models: These models are suitable for scenarios that require unique primary keys or efficient updates. For example, you can use unique models in data analysis scenarios, such as analysis of e-commerce orders and user attribute data.

    • Aggregation models: These models are suitable for aggregation statistics scenarios that use pre-aggregation to improve query performance. For example, you can use aggregation models in data analysis scenarios such as website traffic analysis and custom reports.

    • Duplicate models: These models are suitable for scenarios in which all original data records are retained. For example, you can use duplicate models in detailed data analysis such as log analysis and bill analysis.
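The partitioning described under Data organization can be sketched as a two-level layout: rows are first assigned to a partition by date, then to a bucket by the hash of a key column, and a filter on the date prunes whole partitions before any rows are read. This is a minimal Python sketch under those assumptions; daily partitions, `NUM_BUCKETS = 4`, `zlib.crc32` as the hash function, and the names `bucket_of`, `layout`, and `query_sum` are hypothetical choices, not SelectDB's actual partitioning or hashing scheme.

```python
import zlib
from collections import defaultdict
from datetime import date

NUM_BUCKETS = 4  # hypothetical bucket count


def bucket_of(user_id: str) -> int:
    # Deterministic hash, so a given key always maps to the same bucket.
    return zlib.crc32(user_id.encode()) % NUM_BUCKETS


def layout(rows):
    """Arrange (day, user_id, amount) rows as {day: {bucket: [rows]}}."""
    parts = defaultdict(lambda: defaultdict(list))
    for day, user_id, amount in rows:
        parts[day][bucket_of(user_id)].append((day, user_id, amount))
    return parts


def query_sum(parts, start, end, user_id=None):
    """Roughly: SELECT SUM(amount) WHERE day BETWEEN start AND end
    [AND user_id = ...]. Returns (total, buckets_scanned)."""
    total = 0
    buckets_scanned = 0
    for day, buckets in parts.items():
        if not (start <= day <= end):
            continue  # partition pruning: the whole day is skipped
        # Bucket pruning: an equality filter on the hash column maps to one bucket.
        candidates = [bucket_of(user_id)] if user_id is not None else list(buckets)
        for b in candidates:
            if b not in buckets:
                continue
            buckets_scanned += 1
            for _, uid, amount in buckets[b]:
                if user_id is None or uid == user_id:
                    total += amount
    return total, buckets_scanned


if __name__ == "__main__":
    rows = [
        (date(2025, 1, 1), "u1", 10),
        (date(2025, 1, 1), "u2", 20),
        (date(2025, 1, 2), "u1", 5),
    ]
    parts = layout(rows)
    print(query_sum(parts, date(2025, 1, 2), date(2025, 1, 2)))
```

Hashing spreads rows for different keys across buckets, which is what lets a distributed cluster process buckets in parallel, while the date filter never touches out-of-range partitions.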
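The trade-off behind hybrid row-column storage can be shown with two in-memory layouts of the same table: a column layout lets an aggregation read only the one column it needs, while a row layout returns a whole record for a key lookup in one step. This Python sketch is purely illustrative and assumes that the first column is a unique key; the `HybridTable` class and its methods are hypothetical and say nothing about SelectDB's on-disk format.

```python
class HybridTable:
    """Keeps the same rows in a column layout and a row layout (hypothetical)."""

    def __init__(self, columns):
        self.key = columns[0]  # assumed to be a unique key
        self.columns = {name: [] for name in columns}  # column store
        self.rows_by_key = {}  # row store, keyed by the key column

    def insert(self, row: dict) -> None:
        for name, values in self.columns.items():
            values.append(row[name])
        self.rows_by_key[row[self.key]] = dict(row)

    def column_sum(self, name: str):
        # Analytical scan: touches one column and ignores all others.
        return sum(self.columns[name])

    def point_get(self, key):
        # Point query: a single lookup returns the complete row, or None.
        return self.rows_by_key.get(key)
```

Maintaining both layouts costs extra storage and write work, which is consistent with row storage being an option you enable for point-query workloads rather than the default.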
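The three data models differ in what happens when ingested rows share the same key columns. The sketch below mimics that behavior in plain Python: the Unique model keeps the latest row per key, the Aggregation model merges value columns with an aggregation function (SUM here), and the Duplicate model keeps every row. The function names and the choice of SUM are assumptions for illustration; in SelectDB, as in Apache Doris, the model, the key columns, and the aggregation function of each value column are declared when the table is created.

```python
def unique_model(rows, key):
    """Unique model: a later row with the same key replaces the earlier one."""
    table = {}
    for row in rows:
        table[key(row)] = row
    return list(table.values())


def aggregate_model(rows, key, value):
    """Aggregation model: rows with the same key are pre-aggregated (SUM here)."""
    table = {}
    for row in rows:
        k = key(row)
        table[k] = table.get(k, 0) + value(row)
    return sorted(table.items())


def duplicate_model(rows):
    """Duplicate model: every ingested row is retained as-is."""
    return list(rows)


if __name__ == "__main__":
    rows = [("u1", 3), ("u2", 5), ("u1", 4)]  # (key, value) pairs
    print(unique_model(rows, key=lambda r: r[0]))
    print(aggregate_model(rows, key=lambda r: r[0], value=lambda r: r[1]))
    print(duplicate_model(rows))
```

Pre-aggregating at write time is why the Aggregation model suits reporting queries, and keeping every row is why the Duplicate model suits detailed log and bill analysis.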

External ecosystem

ApsaraDB for SelectDB can be integrated with third-party data sources and visualization tools in the external ecosystem. This helps you improve the efficiency of data analysis.

  • Various data import tools: You can use these tools to import data from various data sources, such as Alibaba Cloud data sources and self-managed data sources, to ApsaraDB for SelectDB. ApsaraDB for SelectDB provides stable, efficient, and easy-to-use data integration solutions. For more information, see Data import tools.

  • Visualization tools: ApsaraDB for SelectDB integrates with MySQL-compatible visualization tools, which improves the efficiency of data development and visual analysis. For more information, see Data visualization.

  • Federated queries: ApsaraDB for SelectDB can be integrated with external data lakes and databases through its federated query capability, and it can read data from and write data to these external systems. This provides a data lakehouse analysis experience and reduces the resource and maintenance costs of your data analysis technology stack. For more information, see Lake warehouse.