
Learning about AIACC-ACSpeed | AIACC-ACSpeed Performance Data

This article describes the performance data of AIACC 2.0-AIACC Communication Speeding (AIACC-ACSpeed) when it is used to train models. Compared with training that uses native PyTorch DDP, ACSpeed delivers significantly better performance.

Background Information

This article uses, as an example, the performance data of multi-instance training with AIACC-ACSpeed (ACSpeed) V1.0.2 enabled on eight-GPU ECS instances. The example tests the performance of ACSpeed when it is used to train models in different scenarios.

Tested Versions

  • ACSpeed: V1.0.2
  • CUDA: V11.1
  • Torch: 1.8.1+cu111
  • Instance type: an eight-GPU ECS instance
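The Torch version string above encodes both the framework version and the CUDA build it was compiled against. As a quick sanity check of a local environment, a small helper (hypothetical, not part of ACSpeed) can parse such a string, for example the value reported by `torch.__version__`:

```python
# Parse a PyTorch version string such as "1.8.1+cu111" to confirm it
# matches the tested configuration (Torch 1.8.1 built against CUDA 11.1).
def parse_torch_version(version: str):
    base, _, cuda_tag = version.partition("+")
    major, minor, patch = (int(x) for x in base.split("."))
    # "cu111" -> "111"; a build without a CUDA tag yields None
    cuda = cuda_tag.lstrip("cu") if cuda_tag.startswith("cu") else None
    return (major, minor, patch), cuda

print(parse_torch_version("1.8.1+cu111"))  # ((1, 8, 1), '111')
```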

Test Results

ACSpeed shows significant performance improvements across multiple model trainings, ranging from 5% to 200%. As the results show, the improvement from ACSpeed is more pronounced in cases where PyTorch DDP scales poorly, because the performance of ACSpeed is not degraded by scaling. The following figure shows the test results.

(Figure 1: Performance improvement)

The following list describes the terms used in this test:

  • ddp_acc (x-axis): the scalability of PyTorch DDP across multiple multi-GPU instances, expressed as multi-instance linearity. A smaller linearity value indicates poorer scalability. The linearity is calculated based on the following formula: Linearity = multi-instance performance / single-instance performance / the number of clusters.
  • acc_ratio (y-axis): the ratio by which ACSpeed improves on PyTorch DDP in the measured performance metrics. For example, a value of 1.25 means that the performance of ACSpeed is 1.25 times that of PyTorch DDP, that is, a 25% improvement.
  • Dots: the size of the cluster.
    • Blue dot: 1 cluster.
    • Orange dot: 2 clusters.
    • Red dot: 4 clusters.
    • Green dot: 8 clusters.
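The two metrics defined above can be expressed directly in code. The following sketch is purely illustrative: the function names and the throughput numbers are made up, not taken from the article's measurements.

```python
# Illustrative computation of the two metrics plotted in the figure.
# Inputs are throughput values (e.g. samples/sec).

def ddp_linearity(multi_instance_perf, single_instance_perf, num_clusters):
    """Scalability of PyTorch DDP: 1.0 means perfect linear scaling."""
    return multi_instance_perf / single_instance_perf / num_clusters

def acc_ratio(acspeed_perf, ddp_perf):
    """Improvement ratio of ACSpeed over native PyTorch DDP."""
    return acspeed_perf / ddp_perf

# Example: 4 instances deliver only 2.8x single-instance throughput under DDP
print(ddp_linearity(2800.0, 1000.0, 4))  # 0.7 -> poor scalability
print(acc_ratio(3500.0, 2800.0))         # 1.25 -> 25% improvement
```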

Performance Data of Example Models

This section shows only the performance data of the example models that were tested. The performance improvements vary across scenarios because the ratio of communication to computation differs from model to model. The following section shows the performance data of the specific test models.

  • Scenario 1: Training an alexnet model

    • Model: alexnet
    • Domain: COMPUTER_VISION
    • Subdomain: CLASSIFICATION
    • Batch size: 128
    • Precision: Automatic mixed precision (AMP)

The following figure shows the performance data in this training scenario:

(Figure 2: alexnet training performance)

  • Scenario 2: Training a resnet18 model

    • Model: resnet18
    • Domain: COMPUTER_VISION
    • Subdomain: CLASSIFICATION
    • Batch size: 16
    • Precision: AMP

The following figure shows the performance data in this training scenario:

(Figure 3: resnet18 training performance)

  • Scenario 3: Training a resnet50 model

    • Model: resnet50
    • Domain: COMPUTER_VISION
    • Subdomain: CLASSIFICATION
    • Batch size: 32
    • Precision: AMP

The following figure shows the performance data in this training scenario:

(Figure 4: resnet50 training performance)

  • Scenario 4: Training a vgg16 model

    • Model: vgg16
    • Domain: COMPUTER_VISION
    • Subdomain: CLASSIFICATION
    • Batch size: 64
    • Precision: AMP

The following figure shows the performance data in this training scenario:

(Figure 5: vgg16 training performance)

  • Scenario 5: Training a timm_vovnet model

    • Model: timm_vovnet
    • Domain: COMPUTER_VISION
    • Subdomain: CLASSIFICATION
    • Batch size: 32
    • Precision: AMP

The following figure shows the performance data in this training scenario:

(Figure 6: timm_vovnet training performance)

  • Scenario 6: Training a timm_vision_transformer model

    • Model: timm_vision_transformer
    • Domain: COMPUTER_VISION
    • Subdomain: CLASSIFICATION
    • Batch size: 8
    • Precision: AMP

The following figure shows the performance data in this training scenario:

(Figure 7: timm_vision_transformer training performance)

  • Scenario 7: Training a pytorch_unet model

    • Model: pytorch_unet
    • Domain: COMPUTER_VISION
    • Subdomain: CLASSIFICATION
    • Batch size: 1
    • Precision: AMP

The following figure shows the performance data in this training scenario:

(Figure 8: pytorch_unet training performance)

  • Scenario 8: Training an hf_Bart model

    • Model: hf_Bart
    • Domain: NLP
    • Subdomain: LANGUAGE_MODELING
    • Batch size: 4
    • Precision: AMP

The following figure shows the performance data in this training scenario:

(Figure 9: hf_Bart training performance)

  • Scenario 9: Training an hf_Bert model

    • Model: hf_Bert
    • Domain: NLP
    • Subdomain: LANGUAGE_MODELING
    • Batch size: 4
    • Precision: AMP

The following figure shows the performance data in this training scenario:

(Figure 10: hf_Bert training performance)

  • Scenario 10: Training a speech_transformer model

    • Model: speech_transformer
    • Domain: SPEECH
    • Subdomain: RECOGNITION
    • Batch size: 32
    • Precision: AMP

The following figure shows the performance data in this training scenario:

(Figure 11: speech_transformer training performance)

  • Scenario 11: Training a tts_angular model

    • Model: tts_angular
    • Domain: SPEECH
    • Subdomain: SYNTHESIS
    • Batch size: 64
    • Precision: AMP

The following figure shows the performance data in this training scenario:

(Figure 12: tts_angular training performance)

Alibaba Cloud Community