Best Practice of Cross-border Data Warehouse Migration: MaxCompute's Nearline Query Solution Boosts Query Efficiency for Real-time Scenarios

This article introduces how MaxCompute's MaxQA solution significantly boosts query efficiency for real-time scenarios by optimizing architecture and leveraging dedicated resources.

By Alibaba Cloud MaxCompute Team

This series details the migration journey of a leading Southeast Asian technology group from Google BigQuery to MaxCompute, highlighting key challenges and technical innovations. This fifth installment analyzes the performance optimization technologies of cross-border data warehouse migration.

Note: The customer is a leading Southeast Asian technology group, referred to as GoTerra in this article.

1. Business Background and Pain Points

GoTerra, an Indonesian technology group, is one of the most influential Internet giants in Southeast Asia. GoTerra operates across various sectors, including ride-hailing, e-commerce, food delivery, logistics, and financial payments. Before migrating to MaxCompute, GoTerra utilized BigQuery of Google Cloud Platform (GCP) as its core big data platform. BigQuery offers robust capabilities, including SQL syntax compatibility, automated migration support, streaming ingestion, metadata upgrades, and intelligent resource scheduling.

However, three line-of-business projects of GoTerra demand exceptionally low latency:

1) Business intelligence (BI) reporting: GoTerra's customers rely heavily on BI reports for decision-making, requiring fast and stable query responses, typically within 30 seconds.

2) Ad hoc queries of customer services: In multiple real-time customer service scenarios, customer service representatives need to temporarily trace a large amount of historical data based on different customer requirements to address customer inquiries and resolve issues. Query performance directly impacts service quality and customer satisfaction. The previous BigQuery-based solution achieved an average query response time of approximately 5 seconds.

3) Data pipelines: These automated data processing workflows are critical for core business operations, with downstream processes heavily reliant on their output. Strict performance requirements necessitate sub-30-second timeout thresholds for most jobs to ensure efficient and stable execution.

These demanding performance and stability requirements present a significant challenge during the migration to MaxCompute. Matching BigQuery's performance has become one of the core challenges of the project.

2. Benefits

To meet the demanding latency and stability requirements of MaxCompute users, particularly in matching BigQuery's performance, Alibaba Cloud introduces MaxCompute Query Accelerator (MaxQA), a nearline query solution.

Building upon MaxCompute's core capabilities, MaxQA features significant architectural upgrades and optimizations. By leveraging dedicated control plane components and a query-specific computing resource pool, MaxQA comprehensively optimizes the control plane, connection protocol, query optimizer, execution engine, storage engine, I/O paths, and distributed caching mechanism. These enhancements deliver lower query latency, higher concurrency, and improved system stability, making MaxQA ideal for business scenarios that require faster response speeds and higher concurrency, such as real-time BI report refreshes, interactive ad hoc analysis, and near real-time data warehousing.

Furthermore, MaxQA maintains full compatibility with the MaxCompute SQL dialect. Existing MaxCompute SQL workloads can be seamlessly migrated to MaxQA without code modifications, achieving substantial performance gains. Combined with flexible time-based resource policies and elastic scaling capabilities, MaxQA not only boosts performance but also reduces overall costs, achieving a balance of both.

3. Technical Solution Overview

3.1. Overall MaxQA Architecture

The following figure shows the overall architecture of MaxQA.

The core modules are described below, from top to bottom:

• BI layer: MaxQA is compatible with various BI tools. For more information, see BI Tools.

• Control layer:

TopConsole: provides management capabilities such as MaxCompute project management, quota management (including MaxQA instance management), and tenant management. TopConsole also offers observability features at both the job level (for job O&M) and the MaxQA instance level.
DataWorks: MaxCompute is deeply integrated with Alibaba Cloud DataWorks, enabling scheduled execution of MaxQA ad hoc queries and recurring nodes.

• User interfaces and access layer:

Users can connect to a MaxQA instance by using a client, the SDK for Java, Python, or Go, or Java Database Connectivity (JDBC).
The access layer primarily handles user authentication and forwards job requests to MaxQA instances.

• MaxQA instance layer:

Comprises the control layer, compute layer, and storage layer. The control and compute layers are isolated at the instance level, while the storage layer is shared at the cluster level.

3.2. MaxQA Instance Layer Architecture

A MaxQA instance consists of several machines. Each machine is equipped with hardware resources such as CPUs, memory, and SSDs. The aggregate CPU and memory resources across all machines in a MaxQA instance correspond to the purchased compute unit (CU) capacity.

The SSDs on each machine contribute to both an instance-level distributed cache layer and a memory spill pool:

• Distributed cache layer:

Leverages the SSDs across multiple machines to form a unified cache pool, accelerating access to hot data and improving overall efficiency.

• Memory spill pool:

Provides a mechanism to spill data to SSDs when an SQL operator or shuffle agent encounters memory pressure, preventing out-of-memory (OOM) errors.

A MaxQA instance includes the following core modules:

1. Coordinator:

This component serves as the entry point for the instance and encompasses several key modules:
API layer: handles incoming user requests and responses.
SQLTask module: is responsible for SQL parsing, query plan generation and optimization, permission verification, source table splitting, and operator code generation.
JobDriver module: orchestrates job execution based on the generated execution plan.

2. Pre-launched worker pool:

Upon instance creation, a pool of worker processes is pre-launched on each machine. Each worker integrates the SQL execution engine and storage engine, providing a ready execution environment.
Workers collaborate closely with the Coordinator's JobDriver module, executing the SQL operator directed acyclic graph (DAG) based on the physical execution plan.

3. MaxQAAdmin:

Manages the worker pool and facilitates seamless hot upgrades by launching new workers and gracefully decommissioning older versions, ensuring uninterrupted services.
Allocates worker resources to the Coordinator for job execution based on predefined scheduling policies, optimizing resource utilization.

4. Shuffle agent:

Handles the reading, writing, and transfer of shuffled data between tasks, enabling efficient and reliable distributed data exchange across nodes.

5. Cache agent:

Implements data caching during query execution, storing intermediate results and hot data to minimize access to underlying storage. This significantly improves query performance and reduces I/O overhead while maintaining data consistency.
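
The relationship between the pre-launched worker pool and MaxQAAdmin can be illustrated with a toy model. This is only a sketch under stated assumptions: the class name `WorkerPool`, its methods, and the worker naming are hypothetical and are not MaxQA APIs. It shows the one idea the text describes, namely that workers exist before any job arrives and are handed out and returned rather than started per job.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WorkerPool:
    """Toy model of a pre-launched worker pool (all names are hypothetical)."""

    size: int
    idle: deque = field(init=False)

    def __post_init__(self) -> None:
        # Workers are started when the instance is created, not when a job arrives.
        self.idle = deque(f"worker-{i}" for i in range(self.size))

    def allocate(self, wanted: int) -> list:
        """Hand out up to `wanted` idle workers, never more than are available."""
        count = min(wanted, len(self.idle))
        return [self.idle.popleft() for _ in range(count)]

    def release(self, workers: list) -> None:
        """Return workers to the pool so that later jobs can reuse them."""
        self.idle.extend(workers)


pool = WorkerPool(size=4)
job_a = pool.allocate(3)   # gets three workers
job_b = pool.allocate(3)   # only one worker is still idle
pool.release(job_a)        # job A finishes and its workers become idle again
```

A real scheduler would apply the predefined scheduling policies mentioned above instead of this first-come, first-served hand-out.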

3.3. MaxQA Usage

MaxCompute quotas are organized into two levels: Level 1 and Level 2.

• Level 1 quotas: These represent the top-level resource allocation and are categorized by billing method:

Pay-as-you-go: Billing is based on actual usage. Suitable for unpredictable workloads or scenarios with fluctuating demands.
Subscription: Users purchase a fixed amount of resources for a specific duration, often at a lower cost. Ideal for stable and predictable workloads.

• Level 2 quotas: Subscription-based Level 1 quotas can be further subdivided into Level 2 quotas:

Batch processing quotas: designed for periodic, large-scale data processing tasks such as extract, transform, and load (ETL) and offline computing.
Interactive quotas: intended for interactive workloads that require low latency, such as BI reporting and ad hoc queries.

The following figure shows the job submission processes for different types of quotas.

After a job is submitted to the specified quota group, MaxCompute routes it to the appropriate resource pool or instance based on the quota type:

• Subscription quotas:

Batch processing quotas: Jobs are submitted to MaxCompute's serverless resource pool.
Interactive quotas: Each interactive quota group corresponds to a dedicated MaxQA instance where jobs are executed.

• Pay-as-you-go quotas:

Batch processing quotas: Jobs are submitted to MaxCompute's serverless resource pool.
Interactive quotas: All users' interactive quota groups share a single, MaxCompute-managed MaxQA instance where jobs are executed.
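
The routing rules above can be summarized as a small decision function. This is a minimal sketch: the function name, the string labels, and the `quota_group` parameter are hypothetical and only mirror the wording of this section.

```python
def route_job(billing: str, quota_type: str, quota_group: str) -> str:
    """Return a label for where a job runs, following the rules in this section."""
    if billing not in ("subscription", "pay-as-you-go"):
        raise ValueError(f"unknown billing method: {billing}")
    if quota_type == "batch":
        # Batch jobs go to the serverless resource pool under both billing methods.
        return "serverless-resource-pool"
    if quota_type == "interactive":
        if billing == "subscription":
            # Each subscription interactive quota group has its own MaxQA instance.
            return f"dedicated-maxqa-instance:{quota_group}"
        # Pay-as-you-go interactive quota groups share one managed MaxQA instance.
        return "shared-maxqa-instance"
    raise ValueError(f"unknown quota type: {quota_type}")


target = route_job("subscription", "interactive", "bi_reports")
```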

For more information, see the "MaxQA connection methods" section of the MaxQA Operation Guide topic on the Alibaba Cloud official website.

3.4. MaxQA Job Execution Process

The following figure shows the differences in job processing between batch processing quota groups and interactive quota groups.

In the figure, black lines depict the job process of batch processing quota groups and orange lines depict the job process of interactive quota groups. The following sections describe the two job processes.

3.4.1. Job Process of a Batch Processing Quota Group

1. Submit an SQL request: After user authentication is complete in the frontend, an SQL request is forwarded to the framework service of the control cluster.

2. Implement request scheduling and throttling: The framework service executes logic such as throttling and queuing, and selects an arbitrary process from the SQLTask service for processing.

3. Generate an execution plan and submit a job: SQLTask parses the SQL request, generates an execution plan, and then submits an SQL deployment to the compute cluster.

4. Start JobMaster to drive job execution: The compute cluster starts JobMaster to apply for worker resources in the serverless environment and drive the job execution.

5. Obtain query results from the client: The client sends a result query request.

6. Return request routing and results: The frontend forwards the result request to the framework service or Tunnel service. Then, the framework service or Tunnel service reads the query results and returns the results to the client.

3.4.2. Job Process of an Interactive Quota Group

1. Submit an SQL request: After user authentication is complete in the frontend, an SQL request is directly forwarded to the coordinator module of the corresponding MaxQA instance.

2. Generate an execution plan and drive job execution: The coordinator module generates an execution plan and drives the job execution in the isolated MaxQA environment.

3. Directly return query results (small result set): If the job is a query job and the result set is small, the query results are directly encapsulated in the response of the client and are returned without additional intermediate forwarding steps.

Process dimension	Batch processing quota (black line)	Interactive quota (orange line)
Job type	Periodic and large-scale tasks	BI reports and ad hoc queries
Resource pool	Serverless resource pool	MaxQA instance (single-tenant isolation)
Processing path	Frontend → Control cluster → SQLTask → Compute cluster	Frontend → MaxQA instance
Result return	Multiple steps are performed to return results.	Small results are directly returned.
Response latency	Higher	Lower

3.5. Core Optimization Items of MaxQA

MaxQA uses a dedicated control layer and a query-specific computing resource pool to optimize the query access methods, interaction protocol, query optimizer, execution engine, storage engine, I/O paths, and distributed cache mechanism. The following sections describe the core optimization items.

3.5.1. Exclusive Access Layer and Isolated Computing Resource Pool

• Single-tenant isolation

An interactive quota group corresponds to an independent MaxQA instance. Each MaxQA instance has an exclusive access layer and a computing resource pool. This implements complete resource isolation in the architecture dimension and prevents mutual interference in multi-tenant environments.

• Query in direct connection mode

Based on single-tenant isolation, MaxQA optimizes query access paths. Requests are directly forwarded from the frontend to the coordinator module of the corresponding MaxQA instance. This greatly reduces the complexity and latency of the query path.

• More simplified interaction protocol

If a query runs for less than 5 seconds and its result set is no larger than 10 MB, a single request both submits the query and returns the query result.

• Asynchronization of time-consuming paths

Synchronous write operations on instance metadata and task metadata are changed to asynchronous write operations, so that a job enters its key processing steps sooner. Non-critical operations that follow a successful job run, such as generating a summary and preparing the results that LogView displays, are processed asynchronously in the background and no longer add to the end-to-end (E2E) time of queries.
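
The simplified interaction protocol comes down to a threshold check. The sketch below is illustrative only: the names are hypothetical, and treating 10 MB as an upper bound measured in binary megabytes is an assumption, since the text gives only the two figures.

```python
from dataclasses import dataclass

INLINE_MAX_SECONDS = 5.0              # execution-time threshold from the text
INLINE_MAX_BYTES = 10 * 1024 * 1024   # 10 MB, assumed to be an upper bound


@dataclass
class QueryOutcome:
    elapsed_seconds: float
    result_bytes: int


def returns_inline(outcome: QueryOutcome) -> bool:
    """True if the result can ride back in the response to the submit request."""
    return (
        outcome.elapsed_seconds < INLINE_MAX_SECONDS
        and outcome.result_bytes <= INLINE_MAX_BYTES
    )


fast_small = returns_inline(QueryOutcome(elapsed_seconds=1.2, result_bytes=2 * 1024 * 1024))
slow_small = returns_inline(QueryOutcome(elapsed_seconds=8.0, result_bytes=1024))
```

Queries that fail the check need at least one further request to fetch their results.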

3.5.2. Engine Optimization

• Better concurrency calculation method

The concurrency of each vertex is calculated based on the available resources of the MaxQA instance. This minimizes the scheduling overhead and idle resources caused by a mismatch between concurrency and resources.

• Memory shuffling mode preferentially used

Only the shuffle agent in the current MaxQA instance is used to shuffle data, which provides higher stability.
The memory shuffling mode is preferentially used. If the memory is insufficient, the SSDs of the current machine are preferentially used. If the SSD space is insufficient, the distributed file system in the cluster is used.

• SQL operator memory preferentially spilled to I/O of the current machine

If SQL operator execution runs out of memory, the system first performs a spill operation to write data to an SSD of the current machine. If the SSD space is insufficient, the system performs a spill operation to write data to the distributed file system.

• Adaptive execution mode selection

A faster execution mode or a compression algorithm with a higher compression ratio is adaptively selected based on the estimated execution time.
Codegen execution or column-oriented execution can be adaptively selected based on the amount of data processed.

• Downstream read-ahead mode

If the upstream output data is accumulated to a specific amount and no other jobs are to be executed, descendant nodes are started in advance and start to read the upstream output data. This reduces the E2E latency.

• Better control layer

SQLTask and JobMaster are integrated into the coordinator module to reduce component communication overheads and scheduling latency.
Split operations on source tables, operator Codegen processes, and resource initialization operations can be asynchronously executed.

• End-to-end caching

Metadata caching: caches metadata such as table schemas, partition information, and statistics information. This reduces frequent access to remote metastores and accelerates query plan construction.
Authentication caching: caches the permission verification results of the same user on tables, fields, resources, and user-defined functions (UDFs). This prevents repeated authentication operations within a short period of time.
Caching of split results of source tables: caches split results of source tables. This prevents repeated scan of the file system and improves split building efficiency.
Caching of operator Codegen results: caches the dynamic libraries generated by SQL operator Codegen. This prevents repeated compilation of the same operator.
Caching of source table data: The size and access frequency of source tables are adaptively identified. If a source table of an appropriate size is read multiple times, the source table is stored in the cache. This reduces repeated access to the distributed file system of the cluster and improves stability while shortening latency.
Query result caching: caches query output results. If a query is executed again, the cached query result is directly returned. If the following information appears in the job details of LogView, the query result cache is hit.

Caching of shuffled results: intelligently analyzes the repeated shuffling process of different jobs in a MaxQA instance and caches the shuffled data. The shuffled data can be shared among different jobs. This reduces repeated computing.
Resource caching: caches the resource files on which UDFs and handlers for external tables depend. This prevents repeated download operations.
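
The memory-first fallback order described earlier in this section for shuffled data and SQL operator spill (memory, then the SSDs of the current machine, then the distributed file system) can be sketched as follows. The tier names and the `choose_tier` helper are hypothetical, and real placement decisions would also account for concurrent jobs.

```python
# Tiers in order of preference; the last one is the fallback of last resort.
TIERS = ("memory", "local_ssd", "distributed_fs")


def choose_tier(needed_bytes: int, free_bytes: dict) -> str:
    """Pick the first tier with enough free space, else the distributed file system."""
    for tier in TIERS[:-1]:
        if free_bytes.get(tier, 0) >= needed_bytes:
            return tier
    return TIERS[-1]


free = {"memory": 512, "local_ssd": 4096}
placements = [choose_tier(n, free) for n in (256, 1024, 8192)]
```

In this example, the 256-byte request fits in memory, the 1,024-byte request falls back to the local SSD, and the 8,192-byte request falls back to the distributed file system.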

3.6. Flexible Elastic Resource Features

3.6.1. Scheduled Scaling Plan

Interactive quota groups are mainly used to run BI reports and ad hoc queries, and such workloads usually show clear peaks and valleys over time. Purchased subscription quota groups are often sized for the maximum demand of batch jobs, so resources frequently sit idle and utilization is low during off-peak hours.

You can configure time-based scaling to flexibly allocate quotas of interactive quota groups and batch processing quota groups based on different periods of time. This implements dynamic resource scheduling in the time dimension, thereby efficiently utilizing the overall resources, reducing redundancy costs, and improving resource utilization.

As shown in the following figure, the blue line indicates the number of CUs of a purchased subscription level-1 quota group. The orange square indicates the CU quota of an interactive quota group, whereas the green square indicates the CU quota of a batch processing quota group. After time-based scaling is configured, the system automatically adjusts the number of CUs of the interactive quota group and batch processing quota group based on the preset period of time. This realizes intelligent scheduling and efficient utilization of resources within different periods of time.
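
Time-based scaling can be pictured as a schedule that splits a fixed Level 1 CU total between the interactive and batch processing quota groups. The sketch below is only an illustration: the `Window` type, the `split_cu` function, and the sample numbers are hypothetical and do not reflect the actual MaxCompute configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    start_hour: int       # inclusive, 0-23
    end_hour: int         # exclusive, 1-24
    interactive_cu: int   # CUs given to the interactive quota group in this window


def split_cu(total_cu: int, windows: list, default_interactive_cu: int, hour: int) -> tuple:
    """Return (interactive_cu, batch_cu) for the given hour of the day."""
    interactive = default_interactive_cu
    for window in windows:
        if window.start_hour <= hour < window.end_hour:
            interactive = window.interactive_cu
            break
    if interactive > total_cu:
        raise ValueError("interactive CUs cannot exceed the purchased total")
    # Whatever the interactive group does not use in this period goes to batch.
    return interactive, total_cu - interactive


plan = [Window(start_hour=9, end_hour=18, interactive_cu=70)]
daytime = split_cu(total_cu=100, windows=plan, default_interactive_cu=20, hour=10)
night = split_cu(total_cu=100, windows=plan, default_interactive_cu=20, hour=2)
```

With this sample plan, the interactive group holds 70 of the 100 CUs during business hours and 20 CUs otherwise, and the batch group receives the remainder in each period.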

3.6.2. Elastic CU

A scheduled scaling plan is suitable for scenarios in which purchased subscription quota groups have sufficient CUs. Flexibly adjusting resource allocation within different periods of time improves the overall resource utilization.

If the purchased CUs are insufficient and you need more resources within specific periods of time, you can use the elastic CU feature. This feature automatically adds CUs within the specified period of time, and you are charged only for the temporarily generated CUs. This meets the computing requirements during peak hours without increasing long-term costs, and balances flexibility against cost.

As shown in the following figure, the blue line indicates the number of CUs of a purchased subscription level-1 quota group. The orange square indicates the CU quota of an interactive quota group, whereas the cyan square indicates the number of elastic CUs. After an elastic scaling policy with a preset period of time is configured, the system automatically generates elastic CUs at the specified point of time and adjusts the overall resource allocation. This implements flexible scheduling and on-demand increase of resources within different periods of time.
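
Because only the temporarily generated CUs are charged, the billable quantity of an elastic plan can be estimated from the configured windows alone. The sketch below assumes that usage is metered in CU-hours and that each window is a (start hour, end hour, elastic CUs) tuple. Both are illustrative assumptions, and no prices are implied.

```python
def elastic_cu_hours(windows: list) -> int:
    """Sum CU-hours over (start_hour, end_hour, elastic_cu) windows within one day."""
    total = 0
    for start_hour, end_hour, elastic_cu in windows:
        if not 0 <= start_hour < end_hour <= 24:
            raise ValueError("window must satisfy 0 <= start < end <= 24")
        total += (end_hour - start_hour) * elastic_cu
    return total


# 50 extra CUs from 09:00 to 12:00 and 30 extra CUs from 20:00 to 22:00.
daily = elastic_cu_hours([(9, 12, 50), (20, 22, 30)])
```

In this example, the plan adds 3 × 50 + 2 × 30 = 210 elastic CU-hours per day on top of the subscription quota.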

3.6.3. Auto Scaling

Both the scheduled scaling and elastic CU features depend on effective time periods that you configure in advance. If business demand grows unexpectedly, the reserved resources may be insufficient to respond in time, which results in task accumulation and queuing.

To resolve the preceding issues, the auto scaling feature is introduced. You can use this feature to detect load changes in MaxQA instances in real time and automatically trigger resource scaling based on the preset scaling rule. This helps you quickly respond to unexpected resource requirements. You are charged based on the actual usage duration of CUs. This implements on-demand increase of resources and precise cost control while ensuring the stability and timeliness of job execution.

As shown in the following figure, the CU usage curve does not exceed the maximum CU of the interactive quota within most periods of time. However, within some specific periods of time, the CU usage obviously increases and exceeds the current quota. After the auto scaling feature is used, the system automatically generates CU resources during peak hours to meet unexpected resource requirements.

Note: The auto scaling feature is not yet available and is coming soon.

4. Business Values

MaxQA helps GoTerra smoothly migrate multiple business lines from GCP BigQuery to MaxCompute. Throughout the migration process, the changes to the underlying architecture are transparent to GoTerra users. Business continuity is not affected by the migration, and the experience remains stable.

After MaxQA is used, the performance of GoTerra's core business scenarios is significantly improved. The overall query efficiency is doubled. This demonstrates the technical advantages of MaxQA in high performance, low latency, and high stability. This also reflects the implementation capabilities and user experience assurance capabilities of MaxQA in large-scale and complex business migration.

5. Future Outlook

In terms of performance improvement, MaxQA will upgrade the multi-threaded pipeline architecture of the execution engine and use parallel and pipelined task processing to significantly improve query efficiency and resource utilization. Scheduling overheads will be further reduced, so that MaxQA better meets the business requirements of high concurrency and low latency.

In terms of stability assurance, MaxQA will build the capability of proactively identifying abnormal jobs and isolate abnormal jobs in a timely manner after they are detected. This prevents the abnormal jobs from affecting the availability of all MaxQA instances. In addition, MaxQA will support the transparent fallback mechanism. If an exception occurs, MaxQA automatically switches the abnormal task to a serverless resource pool. This minimizes the impacts on user business continuity and job performance.

In terms of intelligent optimization, MaxQA will introduce the intelligent data prefetching feature. This feature allows you to load hot data to the cache in advance based on the historical status of jobs in the MaxQA instance. At the same time, the system will improve the recognition accuracy of shuffled data cache and efficiently utilize distributed cache resources to reduce I/O pressure and improve the overall execution efficiency.

With these continuous optimization items, MaxQA will not only improve the query performance and system robustness, but also provide users with more intelligent, efficient, and stable all-in-one big data query experience, thereby helping enterprises achieve more agile data-driven decision-making.
