Extreme performance is one of the core competencies of AnalyticDB for MySQL. Since April 2019, AnalyticDB for MySQL has ranked first on the TPC-DS list evaluated by world-renowned institutions, with a cost-effectiveness ratio four times that of the runner-up, and its performance continues to improve every year. These achievements result from the continuous iteration of leading technologies in Xihe, the analytical computing engine developed by Alibaba Cloud.

Asynchronous execution engine

Xihe uses a fully asynchronous execution model. Asynchronous execution is more complex than synchronous execution, but because Xihe manages concurrency in user mode, scheduling tasks itself instead of leaving threads blocked while they wait, it uses the system's CPUs more efficiently for parallel execution. This is the underlying capability that lets Xihe deliver extreme performance.
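The idea of user-mode scheduling can be illustrated with a minimal Python sketch. It is not Xihe's implementation: the task and scheduler names (`scan_task`, `run`) and the simulated page reads are hypothetical. Each operator task is a generator that hands control back to a round-robin scheduler at every point where it would otherwise wait, so several tasks make progress on a single OS thread.

```python
from collections import deque
from typing import Dict, Generator, List


def scan_task(name: str, pages: List[int], log: List[str]) -> Generator[None, None, int]:
    """Hypothetical operator task: sums page values, pausing at each simulated page read."""
    total = 0
    for page in pages:
        # Instead of blocking an OS thread on I/O, hand control back to the scheduler.
        yield
        total += page
        log.append(f"{name} processed page {page}")
    return total


def run(tasks: Dict[str, Generator[None, None, int]]) -> Dict[str, int]:
    """Round-robin user-mode scheduler: all tasks share one OS thread."""
    ready = deque(tasks.items())
    results: Dict[str, int] = {}
    while ready:
        name, task = ready.popleft()
        try:
            next(task)                  # resume the task until its next wait point
            ready.append((name, task))  # not finished yet: requeue it
        except StopIteration as done:
            results[name] = done.value  # the generator's return value
    return results


log: List[str] = []
results = run({
    "scan_a": scan_task("scan_a", [1, 2, 3], log),
    "scan_b": scan_task("scan_b", [10, 20], log),
})
print(results)
print(log)
```

After the run, `results` maps `scan_a` to 6 and `scan_b` to 30, and `log` shows the pages of the two scans interleaved even though only one thread executes them. A real engine would requeue a task only when its I/O completes; this sketch requeues immediately to stay short.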

Vectorized execution model

Xihe builds a vectorized query execution model on top of its fully asynchronous execution engine. Computation is organized around operators, each of which processes a batch of values at a time, and this is friendlier to modern CPUs than computation organized around individual data items. The resulting code is cache-friendly and lets the CPU's out-of-order execution (OoOE) overlap independent instructions, which raises instruction-level parallelism. Single instruction, multiple data (SIMD) instructions raise data-level parallelism, so Xihe makes full use of the computing capabilities of modern CPUs.
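The batch-at-a-time structure of a vectorized pipeline can be sketched in Python. This is an illustration under stated assumptions, not Xihe's code: the operator names (`scan`, `filter_gt`, `sum_agg`) and the tiny `BATCH_SIZE` are hypothetical, and pure Python cannot show SIMD or cache effects. What it does show is that each operator is invoked once per batch rather than once per row and does its work in a tight loop over that batch; in a compiled engine those loops run over contiguous arrays, which is what lets the compiler emit SIMD instructions.

```python
from typing import Iterator, List

BATCH_SIZE = 4  # hypothetical; real engines use much larger batches


def scan(column: List[int], batch_size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Scan operator: emit the column in fixed-size batches (vectors)."""
    for start in range(0, len(column), batch_size):
        yield column[start:start + batch_size]


def filter_gt(batches: Iterator[List[int]], threshold: int) -> Iterator[List[int]]:
    """Filter operator: one tight loop per batch builds a selection vector."""
    for batch in batches:
        selection = [i for i, v in enumerate(batch) if v > threshold]
        yield [batch[i] for i in selection]


def sum_agg(batches: Iterator[List[int]]) -> int:
    """Aggregate operator: consumes whole batches instead of single rows."""
    total = 0
    for batch in batches:
        total += sum(batch)
    return total


prices = [5, 12, 7, 30, 2, 18, 25, 1, 9, 40]
# Equivalent to: SELECT SUM(price) FROM t WHERE price > 10
result = sum_agg(filter_gt(scan(prices), threshold=10))
print(result)  # 125
```

The per-call overhead of moving data between operators is paid once per batch, so it shrinks as the batch grows, while the inner loops stay simple enough for the hardware to pipeline.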

Query execution for hybrid workloads

Xihe is the computing core of AnalyticDB for MySQL. A cloud-native data warehouse that handles large amounts of data must serve a range of analysis scenarios, including online reporting, online interactive analytics, and extract, transform, load (ETL) jobs. To adapt to each scenario, the analytical computing engine applies different query optimization techniques, including on-demand dynamic compilation, CPU-friendly memory layouts, and adaptive parallelism.
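How an engine might pick a strategy per query can be sketched as follows. This is a hypothetical model, not the logic Xihe uses: the thresholds (`COMPILE_THRESHOLD_ROWS`, `ROWS_PER_TASK`), the `Plan` type, and the functions are assumptions made for illustration, and Python's built-in `compile` stands in for a real code generator. Small interactive queries are interpreted on one task because compiling would cost more than it saves, while large ETL-style queries are compiled once and spread across more tasks.

```python
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
COMPILE_THRESHOLD_ROWS = 1_000  # compile only when enough rows amortize the cost
ROWS_PER_TASK = 250             # target number of rows per parallel task


@dataclass
class Plan:
    compiled: bool
    dop: int  # degree of parallelism


def choose_plan(estimated_rows: int, max_dop: int) -> Plan:
    """Pick an execution strategy from the estimated input size."""
    compiled = estimated_rows >= COMPILE_THRESHOLD_ROWS
    # ceil(estimated_rows / ROWS_PER_TASK), clamped to the range [1, max_dop]
    dop = max(1, min(max_dop, -(-estimated_rows // ROWS_PER_TASK)))
    return Plan(compiled=compiled, dop=dop)


def make_evaluator(expr: str, compiled: bool):
    """Return a function that evaluates `expr` (written in terms of x) for one value.

    `expr` is assumed to be trusted; never eval untrusted input.
    """
    if compiled:
        # On-demand compilation: translate the expression once, reuse it for every row.
        return eval(compile(f"lambda x: {expr}", "<expr>", "eval"))
    # Interpretation: re-parse and evaluate the expression text for every row.
    return lambda x: eval(expr, {}, {"x": x})


small = choose_plan(estimated_rows=100, max_dop=8)      # interactive query
large = choose_plan(estimated_rows=100_000, max_dop=8)  # ETL-style query
evaluator = make_evaluator("x * 2 + 1", large.compiled)
print(small, large, [evaluator(v) for v in [1, 2, 3]])
```

Both evaluation paths return the same values; only the cost profile differs, which is why the choice can be made per query from cost estimates.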

Data awareness and storage awareness

Unlike pure computing engines, Xihe is aware of how data is distributed and stored, which gives it the following advantages:
  • Uses knowledge of data distribution to compute on data where it resides, avoiding the overhead of exchanging data and command streams between nodes in a distributed system.
  • Pushes predicates and aggregate operators down to the storage layer, using its capabilities to accelerate near-storage computing.
  • Optimizes query execution by exploiting the dependencies implied by the normal forms of the data model, as well as numeric data types.
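The first two advantages can be combined in one small sketch. It is a simplified model under stated assumptions, not AnalyticDB's implementation: the shard count, the `shard_of` hash function, the `(customer_id, amount)` table, and all function names are hypothetical, and the shards are plain lists in one process rather than separate storage nodes. Each shard applies the predicate and a partial `SUM ... GROUP BY customer_id` next to the data, so only one row per group leaves it. Because the grouping key is also the distribution key, no group spans two shards, and the per-shard results are already final and can be concatenated without a shuffle.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # hypothetical table: (customer_id, amount)
NUM_SHARDS = 3         # hypothetical cluster size


def shard_of(key: str) -> int:
    """Hypothetical deterministic hash distribution on the distribution key."""
    return sum(key.encode()) % NUM_SHARDS


def distribute(rows: List[Row]) -> List[List[Row]]:
    """Place each row on the shard chosen by its distribution key."""
    shards: List[List[Row]] = [[] for _ in range(NUM_SHARDS)]
    for row in rows:
        shards[shard_of(row[0])].append(row)
    return shards


def scan_filter_aggregate(shard: List[Row], min_amount: int) -> Dict[str, int]:
    """Runs next to storage: predicate and aggregate are evaluated on the shard."""
    partial: Dict[str, int] = defaultdict(int)
    for customer_id, amount in shard:
        if amount >= min_amount:            # pushed-down predicate
            partial[customer_id] += amount  # pushed-down aggregate
    return dict(partial)


def query(rows: List[Row], min_amount: int) -> Dict[str, int]:
    """SELECT customer_id, SUM(amount) ... WHERE amount >= min_amount GROUP BY customer_id"""
    result: Dict[str, int] = {}
    for shard in distribute(rows):
        # GROUP BY key == distribution key, so per-shard results never overlap.
        result.update(scan_filter_aggregate(shard, min_amount))
    return result


rows = [("c1", 50), ("c2", 5), ("c1", 20), ("c3", 70), ("c2", 40), ("c3", 8)]
result = query(rows, min_amount=10)
print(result)
```

If the query grouped by a column other than the distribution key, the same group could appear on several shards, and the partial results would have to be exchanged and merged before producing the final answer.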