
Best Practice of Cross-border Data Warehouse Migration: Resource Consumption Reduced by 50% Based on Performance Optimization Technologies

This article introduces how MaxCompute reduced resource consumption by 50% through performance optimization technologies during a cross-border data warehouse migration.

By Alibaba Cloud MaxCompute Team

This series details the migration journey of a leading Southeast Asian technology group from Google BigQuery to MaxCompute, highlighting key challenges and technical innovations. This third installment analyzes the performance optimization technologies that MaxCompute uses to support this cross-border data warehouse migration.

Note: The customer is a leading Southeast Asian technology group, referred to as GoTerra in this article.

Background Information

As a core participant in the digital economy of Southeast Asia, GoTerra requires its data architecture to support transactions, high-concurrency risk control, and cross-region logistics scheduling for hundreds of millions of users. Although the original BigQuery data warehouse has powerful analysis capabilities, costs increase sharply once the business scale exceeds certain thresholds. By migrating its data warehouse from BigQuery to MaxCompute, GoTerra expects to reduce costs and improve efficiency through remodeling of the underlying architecture and performance optimization. With technical issues such as SQL syntax conversion and storage formats resolved, performance optimization has become the critical bottleneck of the migration project and presents three core challenges:

  • Complex business scripts: The project involves more than 10,000 production-level SQL scripts, covering core business lines such as payment, logistics, and risk control. These scripts follow diverse patterns and have different performance objectives and cost requirements, which makes them difficult to optimize.
  • Challenging incremental features: MaxCompute delivered more than 600 new features, such as Append 2.0 and UNNEST, during the migration. These incremental features make optimization more complex.
  • Tight delivery timeline: All business data must be smoothly migrated to MaxCompute within less than four months in compliance with a high-standard service level agreement (SLA), resulting in low fault tolerance.

Performance Optimization Methodology: From Extensive Optimization to Precise Optimization

In response to the preceding challenges, the migration team abandons the traditional extensive optimization strategy and establishes a data-driven, hierarchical governance-based optimization framework. The core logic is to invest limited resources in critical bottlenecks by using intelligent classification. Based on the Pareto principle, 80% of issues or resources are generated or consumed by 20% of queries, which are high-frequency or inefficient queries. To accurately identify issues, an automated classification tool is used to classify queries into the following types:

  • High-frequency queries, with the highest optimization benefits.
  • Inefficient queries, such as full table scan and unexpected CROSS JOIN, with potential optimization benefits.
  • Key business queries, with the SLA preferentially guaranteed.

In addition, the following methods are used:

  • Establishes performance baselines to analyze the trends of duration and resource consumption for each type of query.
  • Evaluates optimization effects to compare the metric changes of the same type of query before and after optimization.
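This classification can be driven directly by job metadata. The following is a minimal sketch of the idea rather than the team's actual tool: it assumes a job-history view similar to the TASKS_HISTORY view in MaxCompute Information Schema, and the column names and thresholds are illustrative only.

-- Minimal classification sketch over job history; the view and column names
-- (signature, cost_cpu, start_time) and the thresholds are assumptions.
SELECT signature,                               -- normalized fingerprint of the SQL text
       COUNT(*)      AS run_cnt,                -- how often this query pattern runs
       SUM(cost_cpu) AS total_cpu,              -- aggregated resource consumption
       CASE
         WHEN COUNT(*) >= 100           THEN 'high_frequency'
         WHEN SUM(cost_cpu) >= 1000000  THEN 'inefficient_candidate'
         ELSE 'other'
       END AS query_class
FROM information_schema.tasks_history
WHERE start_time >= dateadd(getdate(), -30, 'dd')
GROUP BY signature;

Queries that touch key business tables can then be flagged separately so that their SLA is guaranteed first.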

The following figure shows an example of query classification in the initial stage of performance optimization. Partition pruning, complex data types, Append 2.0, and UNNEST are the major issues that need to be resolved.

Figure 1

Key Optimizations

Automatic Partitioning

Pain Points

MaxCompute uses auto-partitioned tables to implement a feature similar to the time-unit column partitioning feature of BigQuery. Unlike traditional static partitioned tables, auto-partitioned tables dynamically generate the partition key column by using the trunc_time function, as shown in the following figure.

Figure 2

The automatic partitioning feature effectively improves the flexibility of partition management. However, the partition pruning process of static partitioned tables cannot be directly reused because the partition key column cannot be directly mapped to query conditions. This makes partition pruning ineffective and causes two core issues:

  1. Deteriorated read performance: A query requires a full scan of all partitions, resulting in high latency.
  2. Serious waste of resources: Redundant data scans consume more computing resources.

Solution

To support the partition pruning feature of auto-partitioned tables, MaxCompute designs a solution to perform dynamic partition pruning based on expressions. The purpose is to minimize the amount of scanned data while ensuring the correctness of query semantics. For example, execute the following statements to create an auto-partitioned table and query data from the table:

create table auto_partition_table
(
  key string, 
  value bigint, 
  ts timestamp
) 
partitioned by trunc_time(ts, 'day') as pt
;
 
SELECT * FROM auto_partition_table 
WHERE ts > timestamp'2024-09-25 01:12:15';

The essence of partition pruning based on the filter conditions of the ts column is to convert expressions on the ts column into expressions on the pt column. Both equivalent and non-equivalent conversions are supported, but the rows selected by the converted filter must include all rows selected by the original filter. The following three scenarios are supported:

Basic Pruning

If the filter conditions of the ts column do not include functions, expressions can be directly converted. However, the filter ranges of expressions before and after conversion are not equivalent. Therefore, the original expressions must be retained. The following figure shows the derivation process.

Figure 3
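The derivation can be illustrated with the preceding query. The rewritten form below is only a sketch of what the pruning logic derives internally; no manual rewrite is required.

-- Because pt = trunc_time(ts, 'day'), the filter ts > '2024-09-25 01:12:15'
-- implies pt >= '2024-09-25 00:00:00', which is enough to prune partitions.
-- The implied condition is weaker than the original one, so the original
-- filter on ts must still be evaluated on the remaining rows.
SELECT * FROM auto_partition_table
WHERE pt >= timestamp'2024-09-25 00:00:00'   -- derived, used for partition pruning
  AND ts > timestamp'2024-09-25 01:12:15';   -- original filter is retained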

Non-equivalent Conversion

If the filter conditions of the ts column include functions such as time zone conversion or date format conversion, expressions cannot be directly converted. In this case, built-in functions are used to fold the functions in expressions. For example, execute the following statement to query data:

select * from auto_partition_table 
where TODATE(FROM_UTC_TIMESTAMP(ts, 'Asia/Jakarta')) = date '2024-09-14'

The following figure shows the derivation process during partition pruning.

Figure 4

In this case, the results of the original filter conditions are a subset of the results of partition pruning. Therefore, the derivation process applies only to partition pruning, and the original filter conditions are retained.
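To make the derivation concrete, the following sketch assumes that ts is stored in UTC and that Asia/Jakarta is UTC+7; the engine performs this folding internally.

-- TODATE(FROM_UTC_TIMESTAMP(ts, 'Asia/Jakarta')) = date '2024-09-14' holds
-- only when ts falls in [2024-09-13 17:00:00, 2024-09-14 17:00:00) UTC,
-- which spans two day partitions. Pruning therefore keeps only these
-- partitions, and the original filter is still applied to their rows.
SELECT * FROM auto_partition_table
WHERE pt >= timestamp'2024-09-13 00:00:00'   -- derived lower bound for pruning
  AND pt <= timestamp'2024-09-14 00:00:00'   -- derived upper bound for pruning
  AND TODATE(FROM_UTC_TIMESTAMP(ts, 'Asia/Jakarta')) = date '2024-09-14';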

Equivalent Conversion

If the functions in the filter conditions of the ts column are semantically equivalent to the function that defines the partition key, such as datetrunc(ts, 'day') and trunc_time(ts, 'day'), expressions can be equivalently converted. Built-in functions are also used in this case. For example, execute the following statement to query data:

select * from auto_partition_table 
where datetrunc(ts, 'day') >= timestamp '2024-09-14 10:01:01';

In the preceding query statement, datetrunc(ts, 'day') is semantically equivalent to trunc_time(ts, 'day'). The following figure shows the derivation process.

Figure 5

In this case, the original expression is converted into an equivalent expression. Therefore, after partition pruning is complete, the original filter conditions are not retained.
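A sketch of the equivalent rewrite for the preceding query is shown below; again, the engine performs this conversion automatically.

-- datetrunc(ts, 'day') always returns the value that is stored in pt, so the
-- filter can be rewritten on pt directly and the original expression can be
-- dropped. Because every pt value is a day boundary, this keeps the
-- partitions of 2024-09-15 and later.
SELECT * FROM auto_partition_table
WHERE pt >= timestamp '2024-09-14 10:01:01';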

Benefits

With the expression-based dynamic partition pruning solution, auto-partitioned tables achieve the high-performance, cost-effective data processing of static partitioned tables while retaining flexible partition management.

UNNEST for Complex Data Types

Pain Points

In Google BigQuery, UNNEST is a core operation for processing arrays. It expands the elements of a nested array into multiple rows, as shown in the following figure.

Figure 6

GoTerra extensively uses UNNEST in queries. After these queries are migrated, MaxCompute initially simulates UNNEST by using LATERAL VIEW + EXPLODE or CROSS JOIN, which causes the following performance issues:

  • Redundant execution plans: A single UNNEST operation is split into multiple table scans and join operations, consuming more resources.
  • Data bloat risk: Cartesian products trigger an explosion of intermediate results.
  • Feature limits: Multi-layer nested arrays cannot be efficiently processed.

The following statements provide a simplified example:

create table foo (
  creation_date date,
  params array<struct<user_id:string, tags:string>>
);

select creation_date,
  (select count(*) from unnest(params) where tags = 'Mac') as m_cnt,
  (select count(*) from unnest(params) where tags = 'Linux') as l_cnt
from foo;

The following figure shows the execution plan.

Figure 7

As shown in the preceding figure, the execution plan has the following issues (the sketch after this list illustrates where they come from):

  1. The table is scanned three times.
  2. The array expansion (UNNEST) operation is performed twice.
  3. Multiple join operations are performed.
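For reference, the following sketch shows how a single UNNEST over the params column is simulated with LATERAL VIEW + EXPLODE before the native support described below. It is illustrative only and is not the exact rewrite generated by the engine; each correlated counting subquery in the preceding example needs its own copy of this pattern plus a join back to foo, which is where the extra scans and joins come from.

-- Illustrative simulation of one UNNEST: expand the params array into one
-- row per element and access the struct fields of each element.
SELECT creation_date, p.user_id, p.tags
FROM foo
LATERAL VIEW EXPLODE(params) t AS p;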

Solution

To resolve the preceding issues, MaxCompute redesigns the execution framework of UNNEST and introduces a general-purpose framework that runs sub-plans through the APPLY operator. This framework provides common subquery execution capabilities and better extensibility. The following figure shows the optimized execution plan for UNNEST in the new architecture.

Figure 8

The optimization process consists of the following items:

  • Internal plans are introduced. An internal plan is also an operator tree, which represents the internal computing logic of each row of data in the outer layer. The compiler generates initial plans based on the preceding statements.
  • The optimizer performs further optimizations based on the following items:

    • Internal plans cannot affect the optimization of external plans, including pushdown and column pruning.
    • Internal plans are optimized after external plans are optimized.
    • The adjacent TableFunctionScan operations are merged. Then, a subtree merge is performed on the merged internal plans.
  • The executor runs both the internal and external plans.

Benefits

After the execution framework of UNNEST is reconstructed and upgraded, both the performance and stability of UNNEST improve: performance increases by up to 10 times, and out-of-memory (OOM) errors are prevented in scenarios where a large number of UNNEST operations are performed. This lays a solid foundation for complex data analysis scenarios.

Ultra-large Queries

Pain Points

GoTerra runs a large number of ultra-large queries whose execution plans are much larger than those of regular large queries. These plans usually contain a large number of operators, deeply nested subqueries, and complex data types, such as a single STRUCT column with thousands of fields. The following table compares the two query types.

Query type | Number of data types processed in a step | Number of relational operators of an execution plan | Nesting depth of subqueries | Number of relational operators after the execution plan is expanded
Large query | < 1,024 | < 10,000 | < 50 | < 100,000
Ultra-large query | > 5,000 | > 100,000 | > 100 | > 100,000,000

When the optimizer processes the execution plans of ultra-large queries, issues such as high memory usage and slow graph traversal may occur.

Solution

Graph Optimization

Each execution plan is a directed acyclic graph (DAG) composed of relational operators. The traversal direction of the graph is from the output to the input. The optimizer matches the operator modes in the graph and replaces the old operator structure with the new operator structure.

Technical point | Before optimization | After optimization
Graph reuse among multiple stages | A graph is built separately in multiple optimization stages, which is costly. | Only one graph is built and shared among multiple optimization stages.
Bidirectional graph structure | Traversal in the direction from the input has O(n) complexity. | Bidirectional traversal has O(1) complexity.
Graph garbage cleaning | Subgraphs that need to be deleted after replacement are still scanned and computed, with O(n) complexity. | Unreferenced subgraphs are deleted, with O(1) complexity.
Traversal iterator for large graphs | Iterator construction requires the pre-traversal of child nodes, with O(n) complexity. | Iterator construction does not require pre-traversal, with O(1) complexity.
Traversal after deduplication | A single node is traversed multiple times in scenarios where subqueries are deeply nested. | Traversal after deduplication eliminates amplification effects.

Global Reuse of Digests

A digest expresses the complete information about any part of a relational operator, and the optimizer identifies a relational operator based on its digest. Each time a relational operator is generated, a relational operator digest must be constructed, together with a type information digest and a scalar function digest.

MaxCompute supports the caching and partial reuse of these three types of digests. Because execution plans usually contain a large number of similar information structures, reuse reduces the complexity of hash calculation and equality comparison of the related objects and also reduces the required memory space.

The following table uses the type information digest as an example.

Aspect | Optimization
Memory space | Nested type information is globally registered and reused. For multiple similar structures, only the distinct parts are stored for future reference.
Equality comparison | Only references are compared for equality, with O(1) complexity.
Hash calculation | The hash value of each layer of a nested type is calculated and cached for future use, with O(1) complexity.

Benefits

Based on the optimization of ultra-large queries, MaxCompute breaks through the bottlenecks of the optimizer. It supports the efficient processing of execution plans that contain deeply nested subqueries, complex data types, and millions of operators. In typical large queries, the duration is reduced from more than 15 minutes to 1 minute, and the maximum memory usage is reduced from more than 5 GB to 1 GB.

Intelligent Tuning

Pain Points

Big data computing scenarios often involve issues such as missing statistics, a large number of user-defined functions (UDFs) that are black boxes to the optimizer, and complex jobs with thousands of lines of SQL code. It is therefore difficult for the optimizer to generate an optimal execution plan based on accurate cardinality estimates or to make an ideal resource allocation plan for each stage in advance.

As a result, the system misses many optimization opportunities, and important jobs must be manually optimized, for example, by adjusting stage parallelism and adding tuning flags. Manual optimization is time-consuming and labor-intensive, and must be repeated whenever the data or jobs change.

Intelligent tuning resolves the preceding pain points. Recurring jobs with similar computing logic can make full use of historical execution information to guide future execution. This enables automatic system optimization and improves overall job performance.

Solution

MaxCompute collects various job statistics. After real-time feedback or offline training and analysis, the optimizer can learn the historical execution status of jobs and more intelligently understand the data and jobs. The optimizer injects historical statistics into the cost-based optimizer (CBO) framework to generate a globally optimal execution plan. Also, the optimizer intelligently adjusts stage parallelism based on the calculation amount and running speed of each stage to reduce the consumption of worker resources, and automatically accelerates the stages with low parallelism.

Figure 9

Architecture

The following figure shows the optimization capabilities of intelligent tuning, including the optimization of execution plans, resource allocation, execution mode selection, and runtime operator execution. Intelligent tuning has framework-based optimization and extension capabilities. It can automatically use the CBO framework to apply some optimization rules, and also continuously optimize all aspects of job operations.

Figure 10

Benefits

As the second growth curve of performance optimization, intelligent tuning greatly reduces manual job optimization and enhances automatic optimization. By using intelligent tuning, GoTerra optimizes resource allocation to save up to 87% of resources, and uses more intelligent execution mode selection to prevent online jobs from falling back to offline execution, accelerating job execution by up to 45%. The optimization capabilities of intelligent tuning will be continuously extended to provide better engine performance and ease of use.

Summary and Future Outlook

For MaxCompute, the data migration project of GoTerra is not only a benchmark practice for a leading Asia-Pacific customer, but also a test of performance limits under ultra-large workloads. During the four-month project, MaxCompute made paradigm-level breakthroughs in resource efficiency and computing performance through kernel-level reconstruction and upgrades.

1. Resource utilization

  • Financial extract, transform, and load (ETL) scenarios: CU consumption is only 50% of that in BigQuery.
  • Business intelligence (BI) analysis scenarios: The end-to-end query duration is reduced by up to 83%, which fully meets business requirements.

2. Technologies

  • New features: MaxCompute provides more than 600 new features, including automatic partitioning, UNNEST, and Append 2.0, to ensure seamless migration from BigQuery in terms of syntax and performance.
  • Optimization for ultra-large execution plans: MaxCompute can parse execution plans that contain millions of operators. The query duration is reduced from more than 15 minutes to 1 minute.
  • Deep optimization for complex data types: MaxCompute optimizes the processing of complex data types by using technologies such as column pruning, predicate pushdown, and zero copy. The performance of typical cases is improved by 20 times.

In the data migration project of GoTerra, MaxCompute continuously optimizes performance by stage and scenario, effectively improving query performance in BI and ETL scenarios and successfully achieving the migration goals. In the future, MaxCompute will take into account the business requirements of GoTerra and industry technology trends, and continue to improve efficiency, cost-effectiveness, and stability.

  • Faster: continuous unlocking of performance potential to achieve ultimate optimization

    • Incremental update: uses the incremental update mechanism to significantly reduce the consumption of computing resources, speed up the availability of data, and improve user experience.
    • Native support for geospatial data types: enhances the native support for geospatial data types and uses spatial indexes to greatly improve the efficiency of complex spatio-temporal data queries and enhance the core competitiveness of geospatial business.
    • Integration with heterogeneous computing power: explores the deep integration with heterogeneous computing power such as GPUs to further accelerate the processing of key analysis tasks.
  • More cost-effective: fine-grained resource management and cost optimization

    • Intelligent auto scaling: develops and improves the elastic resource scheduling mechanism based on real-time loads. This mechanism can automatically scale up and down resources during peak and off-peak hours, significantly improving resource utilization and reducing costs.
    • Pay-as-you-go and cost monitoring: introduces a finer-grained multidimensional resource metering system and proactive cost alerting to help select resources as needed, allocate budgets in a reasonable manner, and prevent resource waste.
  • More stable: future-oriented high availability and observability

    • Disaster recovery and high availability: builds a data redundancy and fault tolerance mechanism that supports active geo-redundancy to improve system continuity and reliability in the face of hardware failures, traffic spikes, and exceptions.
    • Improved observability: enhances the end-to-end performance monitoring, SQL diagnostics, and automatic health check capabilities to detect and identify faults at the earliest opportunity and enable rapid self-healing. This ensures the stable operations of core business.

Conclusion

The cross-border data warehouse migration project of GoTerra not only sets a technological benchmark for the upgrade and performance optimization of data platforms in the industry, but also lays a solid foundation for MaxCompute to deliver world-class performance. Throughout the project, the cross-border migration team has accumulated performance optimization methodologies, automated toolchains, and general practical experience for large-scale and complex scenarios. This provides a mature template and reusable experience for the subsequent migration of cross-region, ultra-large data warehouse systems.

Looking to the future, MaxCompute will remain committed to performance-driven technological innovation. On the one hand, MaxCompute will focus on AI-driven intelligent optimization and automated O&M, continuously improve the adaptive resource scheduling, performance monitoring, and exception self-healing capabilities, and further improve development and O&M efficiency. On the other hand, MaxCompute will actively explore new directions such as the integration of data lakes and data warehouses, native geospatial and unstructured data types, and incremental computing to continuously expand the boundaries of performance optimization. Through continuous technological evolution, MaxCompute will provide stronger performance assurance and higher O&M efficiency for enterprises in large-scale data scenarios, accelerate the unlocking of business value, and help customers cope with more complex big data challenges in the future.
