DSW supports the NVIDIA Nsight performance analysis tools. You can use them to analyze performance visually and to identify and optimize bottlenecks.
Limits
DSW instance type: The instance must be a Lingjun AI Computing Service instance type and be equipped with at least one NVIDIA GPU.
Non-Hopper architecture: For instance types with a non-Hopper architecture, pause the AMPerf service before you use Nsight, because an NVIDIA hardware limitation allows only one process to collect GPU performance metrics at a time.
Procedure
1. Create a DSW instance
Create a DSW instance using a supported GPU instance type.
2. Pause AMPerf collection
You can skip this step for instance types with a Hopper architecture, such as H20. For instance types with a non-Hopper architecture, run the `amperfd` command to pause AMPerf metric collection. After you pause collection, you can use tools such as nsys and ncu.
# Pause AMPerf collection
/run/amperf/bin/amperfd profmetric --pause -t 600
# Resume AMPerf collection
/run/amperf/bin/amperfd profmetric --resume
Impact on monitoring: Performance metrics are not collected while the AMPerf service is paused. During this period, the accuracy of metrics on the Cloud Monitor dashboard for the instance is affected.
Pause duration: To prevent extended loss of monitoring data, you can pause the AMPerf service for a maximum of 10 minutes (600 seconds). Collection automatically resumes after this period. You can also resume collection manually. After the service resumes, you must wait 1 minute before you can pause it again.
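The pause and resume commands above can be wrapped so that collection is resumed even if the profiling command fails. The following is a minimal sketch, not part of AMPerf: the function name `with_amperf_paused` and the overridable `AMPERFD` variable are assumptions introduced for illustration, and only the two `amperfd profmetric` invocations come from this document.

```shell
# Path to amperfd as used in this document; override AMPERFD if yours differs.
AMPERFD="${AMPERFD:-/run/amperf/bin/amperfd}"

# with_amperf_paused is a hypothetical helper, not an AMPerf command.
# Usage: with_amperf_paused <command> [args...]
with_amperf_paused() {
    local status=0
    # Request the maximum pause of 600 seconds; stop if the request fails.
    "$AMPERFD" profmetric --pause -t 600 || return 1
    # Run the profiling command that was passed as arguments.
    "$@" || status=$?
    # Resume collection whether or not the command succeeded.
    "$AMPERFD" profmetric --resume
    # Report the exit status of the profiling command to the caller.
    return "$status"
}
```

A call might look like `with_amperf_paused nsys profile -o report python train.py`, where `train.py` and `report` are placeholders for your own workload and output name. The helper does not enforce the 10-minute limit, so the wrapped command should still finish within the pause window.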
3. Install and use Nsight
Download, install, and use Nsight by following the official NVIDIA documentation.
Nsight Compute (ncu): A dedicated tool for profiling CUDA kernel performance. It supports fine-grained metric analysis, such as instruction-level execution time and memory bandwidth utilization. For more information, see the Nsight Compute Command-Line Interface Guide.
Nsight Systems (nsys): A system-level performance analysis suite. It captures GPU and CPU execution traces and resource usage across the entire call stack. For more information, see the Nsight Systems Manual.
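To make the difference between the two tools concrete, the sketch below prints a typical command line for each one without running it, so you can review the options first. The helper name `nsight_cmd` is hypothetical, and the chosen options (`--trace=cuda,nvtx,osrt` for nsys, `--set full` for ncu) are one reasonable starting point rather than a recommendation from this document. Check them against the NVIDIA guides for your installed version.

```shell
# nsight_cmd is a hypothetical helper that only prints a command line.
# Usage: nsight_cmd <nsys|ncu> <report-name> <application> [args...]
nsight_cmd() {
    local tool=$1 report=$2
    shift 2
    case "$tool" in
        # System-level timeline: trace CUDA API calls, NVTX ranges, and OS runtime calls.
        nsys) echo "nsys profile --trace=cuda,nvtx,osrt --output=$report $*" ;;
        # Kernel-level metrics: collect the full section set for each profiled kernel.
        ncu)  echo "ncu --set full --export $report $*" ;;
        # Anything else is rejected with a non-zero status.
        *)    echo "unknown tool: $tool" >&2; return 1 ;;
    esac
}
```

For example, `nsight_cmd nsys report python train.py` prints the nsys command for a placeholder workload `train.py`. Copy the printed line and run it once AMPerf collection is paused, if your instance type requires that.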
Recommendations:
Segment long-running tasks: For long-running tasks, run the analysis in segments to work around the AMPerf pause duration limit.
Resume promptly: Once the analysis is completed, resume the AMPerf service immediately to ensure the integrity of your monitoring data on the cloud platform.
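When planning segmented analysis, it can help to estimate how many pause windows a task needs. The sketch below uses only the limits stated in this document (600 seconds maximum pause, 1 minute wait before the next pause). The helper names `windows_needed` and `cooldown_total` are hypothetical. The arithmetic treats each 600-second window as fully usable, which slightly overestimates the time available for profiling because tool startup also falls inside the window.

```shell
# Limits taken from the pause rules in this document.
MAX_PAUSE=600   # longest single pause, in seconds
COOLDOWN=60     # wait after a resume before the next pause, in seconds

# windows_needed (hypothetical): number of pause windows required to cover
# $1 seconds of profiling, using integer ceiling division.
windows_needed() {
    echo $(( ($1 + MAX_PAUSE - 1) / MAX_PAUSE ))
}

# cooldown_total (hypothetical): minimum total wait between those windows.
cooldown_total() {
    local n
    n=$(windows_needed "$1")
    if [ "$n" -gt 0 ]; then
        echo $(( (n - 1) * COOLDOWN ))
    else
        echo 0
    fi
}
```

For a task that needs 1500 seconds of profiling, `windows_needed 1500` prints 3 and `cooldown_total 1500` prints 120, so plan for three separate runs with at least two minutes of waiting in total between them.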