AIACC Graph Speeding (AIACC-AGSpeed, or AGSpeed) is an optimizing compiler for AI training developed by Alibaba Cloud. It optimizes the computing performance of PyTorch models on Alibaba Cloud GPU-accelerated compute-optimized instances. AGSpeed can be considered an improved version of the original AIACC, and is an independent product that delivers transparent computing optimization without requiring code changes.

Introduction

AGSpeed is an in-house optimizing compiler for AI training developed by Alibaba Cloud. It provides significant performance advantages in PyTorch model training scenarios.

The following figure shows the service architecture of AGSpeed.

AGSpeed consists of the following components:

  • Frontend: The AGSpeed frontend integrates a version of TorchDynamo that has been optimized by the AIACC training performance and acceleration team. This allows AGSpeed to capture the computation graph directly from the PyTorch eager API without requiring you to modify your code, and then pass the graph to the AGSpeed Backend Autotuner, which automatically selects the optimal backend implementation for your use case.
  • Backend: The AGSpeed backend integrates an in-house intermediate representation (IR) optimization pass plugin developed on top of TorchScript IR, which enables more operator fusion to improve performance. The backend also integrates an optimized version of NVFuser that is more robust and provides better performance than the native NVFuser.
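The "no code modification" claim above can be illustrated with a minimal sketch. The `agspeed` import name is an assumption based on the `agspeed.optimize()` call used later in this topic; the sketch falls back to a no-op wrapper when AGSpeed is not installed, so only the one wrapping line differs from a plain eager training loop.

```python
import torch
import torch.nn as nn

try:
    import agspeed  # import name is an assumption based on agspeed.optimize()
    optimize = agspeed.optimize
except ImportError:
    optimize = lambda m: m  # no-op fallback so the sketch runs without AGSpeed

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))
model = optimize(model)  # frontend captures the graph; Autotuner picks a backend

# The rest of the training loop is unchanged eager-mode PyTorch.
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
opt.step()
```

Because the optimized model keeps the eager-mode calling convention, no other line of the training script needs to change.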

Limits

If you use dynamic tensor shapes in the AGSpeed frontend, operations such as re-capture, re-optimization, and re-compilation are triggered. This may compromise the optimization performance of AGSpeed. We recommend that you use agspeed.optimize() to optimize only the static part of the model. The following sections describe the causes and suggestions.
Note A static shape is a shape that you provide to a tensor, or a shape of an intermediate variable that is inferred during the computation of the model, and that remains unchanged across iterations.

Causes

  • If you use dynamic tensor shapes in the frontend, TorchDynamo may repeatedly capture the computation graph and perform the convert frame operation, which greatly reduces the effect of the optimization.
  • If you use dynamic tensor shapes in the backend, TorchScript repeatedly performs graph specialization and re-runs all optimization passes. In addition, NVFuser may recompile a new kernel for each new tensor shape, which greatly reduces performance.
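The cost of dynamic shapes can be sketched in plain Python (illustrative only, not AGSpeed's actual internals): a compile cache keyed by tensor shape, where every previously unseen shape forces another "compilation".

```python
# Minimal sketch of shape-specialized compilation (NOT AGSpeed's implementation):
# the cache is keyed by the input shape, so every new shape is a cache miss.
compiled_cache = {}
compile_count = 0

def run(shape):
    """Look up a 'compiled kernel' for this shape, compiling on a cache miss."""
    global compile_count
    if shape not in compiled_cache:
        compile_count += 1  # a miss triggers re-capture/re-compilation
        compiled_cache[shape] = f"kernel_for_{shape}"
    return compiled_cache[shape]

# Static shapes: every step after the first hits the cache.
for _ in range(100):
    run((32, 128))
print(compile_count)  # 1 compilation for 100 steps

# Dynamic shapes: each new sequence length forces another compilation.
for seq_len in range(100, 200):
    run((32, seq_len))
print(compile_count)  # 101 compilations in total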

Suggestions

You can use agspeed.optimize() to optimize only the static part of the model, which effectively avoids the preceding consequences. For example, if the head of the model produces dynamic shapes during computation and degrades performance, you can apply agspeed.optimize() only to the backbone of the model and leave the head unoptimized.
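The backbone/head split above can be sketched as follows. The model is a toy, and the `agspeed` import name is an assumption based on the `agspeed.optimize()` call in this topic; a no-op fallback keeps the pattern runnable without AGSpeed.

```python
import torch
import torch.nn as nn

try:
    import agspeed  # import name is an assumption based on agspeed.optimize()
    optimize = agspeed.optimize
except ImportError:
    optimize = lambda m: m  # no-op fallback so the pattern runs without AGSpeed

class Backbone(nn.Module):
    """Static-shape feature extractor: a good candidate for agspeed.optimize()."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 256))
    def forward(self, x):
        return self.layers(x)

class Head(nn.Module):
    """Produces a dynamically shaped output, so it is left unoptimized."""
    def forward(self, feats):
        scores = feats.mean(dim=1)
        return feats[scores > 0]  # row count varies per batch -> dynamic shape

backbone = optimize(Backbone())  # optimize only the static part
head = Head()                    # keep the dynamic part on the eager path

x = torch.randn(32, 128)
out = head(backbone(x))
```

Because the dynamic shape originates in the head, confining agspeed.optimize() to the backbone keeps the optimized graph's shapes static and avoids repeated re-capture and re-compilation.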

Contact us

If you need assistance with AIACC, join the Alibaba Cloud AIACC support group for external users (Group ID: 33617640). (Download DingTalk.)