Profile CUDA Kernels & Analyze GPU Performance with NVIDIA Nsight on DSW - Platform for AI

DSW supports the NVIDIA Nsight performance analysis tools. You can use them to visualize and analyze performance data, and to identify and optimize performance bottlenecks.

Limits

DSW instance type: The instance must use a Lingjun AI Computing Service instance type with at least one NVIDIA GPU.
Non-Hopper architecture: On instance types with a non-Hopper GPU architecture, pause the AMPerf service before you use Nsight. Because of an NVIDIA hardware limitation, only one process at a time can collect GPU performance metrics, so Nsight cannot collect metrics while AMPerf is collecting them.
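To decide in a script whether the pause step is needed, you can check the GPU's compute capability. The following is a minimal Bash sketch, assuming that Hopper GPUs (such as H20) report compute capability 9.x and that the installed driver's `nvidia-smi` supports the `compute_cap` query field, which older drivers lack. The function name `needs_amperf_pause` is hypothetical. If the query fails, the function errs on the side of pausing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: needs_amperf_pause [COMPUTE_CAPABILITY]
# Succeeds (returns 0) when AMPerf should be paused before using nsys/ncu,
# that is, when the GPU is NOT Hopper. Assumption: Hopper reports 9.x.
needs_amperf_pause() {
  local cap=${1:-}
  if [[ -z $cap ]]; then
    # Query the first GPU; the compute_cap field needs a recent driver.
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1)
  fi
  # Keep only the major version, e.g. "8.6" -> "8".
  local major=${cap%%.*}
  # An empty or unknown value also counts as "pause needed".
  [[ $major != 9 ]]
}
```

For example, `if needs_amperf_pause; then /run/amperf/bin/amperfd profmetric --pause -t 600; fi` runs the pause command from step 2 only on non-Hopper instances.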

Procedure

1. Create a DSW instance

Create a DSW instance using a supported GPU instance type.

2. Pause AMPerf collection

You can skip this step for instance types with a Hopper architecture, such as H20. For instance types with a non-Hopper architecture, run the `amperfd` command to pause AMPerf metric collection. After you pause collection, you can use tools such as nsys and ncu.

# Pause AMPerf collection
/run/amperf/bin/amperfd profmetric --pause -t 600

# Resume AMPerf collection
/run/amperf/bin/amperfd profmetric --resume

Important

Impact on monitoring: AMPerf does not collect performance metrics while it is paused, so the instance's metrics on the Cloud Monitor dashboard may be incomplete or inaccurate for that period.
Pause duration: To prevent extended loss of monitoring data, you can pause the AMPerf service for at most 10 minutes (600 seconds). Collection resumes automatically after this period, or you can resume it manually earlier. After the service resumes, you must wait 1 minute before you can pause it again.
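If a profiling script fails or is interrupted after pausing AMPerf, collection stays paused until the pause period ends. The following is a minimal Bash sketch, not part of DSW, that always resumes collection when the profiled command finishes. The function name `with_amperf_paused` and the `AMPERFD` override variable are hypothetical; the `amperfd` invocations are the ones shown above. The sketch does not enforce the 1-minute wait between pauses.

```shell
#!/usr/bin/env bash
# Path of the amperfd binary from the commands above; AMPERFD can be
# overridden (for example, with a stub when testing).
AMPERFD=${AMPERFD:-/run/amperf/bin/amperfd}

# Hypothetical helper: with_amperf_paused SECONDS CMD [ARGS...]
# Pauses AMPerf collection, runs CMD, and resumes collection when CMD
# exits, fails, or is interrupted. Returns CMD's exit status.
with_amperf_paused() {
  local secs=$1
  shift
  if (( secs > 600 )); then
    secs=600  # AMPerf pauses are limited to 600 seconds
  fi
  "$AMPERFD" profmetric --pause -t "$secs" || return 1
  (
    # The EXIT trap lives in this subshell, so it does not affect the caller.
    trap '"$AMPERFD" profmetric --resume' EXIT
    trap 'exit 130' INT   # turn Ctrl-C into a normal exit so EXIT fires
    trap 'exit 143' TERM
    "$@"
  )
}
```

For example, `with_amperf_paused 600 nsys profile -o run1 python train.py` profiles a training script and resumes AMPerf afterwards, even if the script crashes.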

3. Install and use Nsight

Download, install, and use Nsight by following the official NVIDIA documentation.

Nsight Compute (ncu): A dedicated tool for profiling CUDA kernel performance. It supports fine-grained metric analysis, such as instruction-level execution time and memory bandwidth utilization. For more information, see the Nsight Compute Command-Line Interface Guide.
Nsight Systems (nsys): A system-level performance analysis suite. It captures CPU and GPU execution traces and resource usage across the entire software stack. For more information, see the Nsight Systems Manual.
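As a starting point, the following hypothetical Bash wrapper functions show common invocations of the two command-line tools. The options come from the NVIDIA guides referenced above; verify them against the Nsight version installed in your instance. The function names, report names, kernel-name pattern, and target command are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: profile_timeline REPORT CMD [ARGS...]
# Nsight Systems: record a timeline of CUDA API calls, NVTX ranges,
# and OS runtime calls for the whole run of CMD.
profile_timeline() {
  local report=$1
  shift
  nsys profile --trace=cuda,nvtx,osrt --output="$report" \
    --force-overwrite=true "$@"
}

# Hypothetical helper: profile_kernels REPORT KERNEL_REGEX COUNT CMD [ARGS...]
# Nsight Compute: collect the full metric set for the first COUNT launches
# of kernels whose names match KERNEL_REGEX.
profile_kernels() {
  local report=$1 kernel_regex=$2 count=$3
  shift 3
  ncu --set full --kernel-name "regex:$kernel_regex" --launch-count "$count" \
    --export "$report" --force-overwrite "$@"
}
```

On a non-Hopper instance, run these between the pause and resume commands from step 2. Recent versions of `nsys` write a `.nsys-rep` report and `ncu` writes an `.ncu-rep` report, which you can open in the Nsight Systems and Nsight Compute GUIs. Because `ncu` replays each profiled kernel several times to collect the full metric set, a small `--launch-count` helps keep the run within the pause window.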

Recommendations:

Segment long-running tasks: Because AMPerf can be paused for at most 10 minutes at a time, profile long-running tasks in segments that each fit within one pause window, and wait at least 1 minute after each resume before you pause again.
Resume promptly: Resume the AMPerf service as soon as profiling finishes so that the monitoring data for your instance on the cloud platform stays complete.
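The segmenting recommendation comes down to simple arithmetic: each profiling window can last at most 600 seconds, and each new pause must start at least 60 seconds after the previous resume. The hypothetical Bash helper below prints the resulting windows as offsets in seconds from the first pause. It only plans the schedule; it does not call `amperfd` or any Nsight tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: plan_profiling_windows TOTAL_SECONDS [SEGMENT_SECONDS]
# Prints one "start end" line per profiling window. Windows are capped at
# 600 s (the AMPerf pause limit) and separated by the 60 s wait that is
# required after each resume.
plan_profiling_windows() {
  local total=$1
  local seg=${2:-600}
  local max_pause=600 cooldown=60
  local start=0 remaining=$total len

  if (( seg > max_pause )); then
    seg=$max_pause
  fi
  if (( seg <= 0 || total <= 0 )); then
    echo "usage: plan_profiling_windows TOTAL_SECONDS [SEGMENT_SECONDS]" >&2
    return 1
  fi
  while (( remaining > 0 )); do
    # The last window may be shorter than a full segment.
    len=$(( remaining < seg ? remaining : seg ))
    echo "$start $(( start + len ))"
    remaining=$(( remaining - len ))
    # The next pause can start only after the cooldown.
    start=$(( start + len + cooldown ))
  done
}
```

For example, `plan_profiling_windows 1300` prints `0 600`, `660 1260`, and `1320 1420`, so covering 1,300 seconds of profiling takes about 1,420 seconds of wall-clock time.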