Monitoring Metrics

Fan Speed

GPU or chassis cooling fan speed as a percentage of maximum RPM.

What it is

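Fan speed is the operating speed of the GPU's cooling fans as a percentage of maximum, reported via DCGM_FI_DEV_FAN_SPEED (range 0-100%). NVML's documentation for the equivalent query, nvmlDeviceGetFanSpeed, describes the value as the intended fan speed rather than a measured one, so a physically blocked fan can still report a normal percentage. Passively cooled GPUs have no onboard fans to report. This includes SXM modules (A100 SXM, H100 SXM) and most data-center PCIe cards, such as L40S and A30, which rely on chassis fans managed by the server's BMC (typically over IPMI). On these GPUs the field may read 0% or be unavailable. Actively cooled cards with onboard fans, such as workstation GPUs like the RTX A6000, report fan speed directly. Typical readings are roughly 30-50% at idle and 60-80% under sustained compute load, though exact values depend on the model and its fan curve.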
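As a minimal sketch of how a single reading might be interpreted, the hypothetical helper below maps one sample to a coarse label. The band edges come from the typical ranges quoted above, while the function name and labels are our own; real fan curves differ by model, so every threshold here is an assumption to tune.

```python
from typing import Optional

# Assumed bands, taken from the typical ranges quoted above; tune per GPU model.
IDLE_BAND = (30.0, 50.0)
LOAD_BAND = (60.0, 80.0)
MARGINAL_ABOVE = 90.0


def classify_fan_reading(percent: Optional[float], passively_cooled: bool) -> str:
    """Map one DCGM_FI_DEV_FAN_SPEED sample to a coarse label (hypothetical helper)."""
    if percent is None:
        # Field not reported at all, which can happen on passively cooled GPUs.
        return "unavailable"
    if percent == 0:
        # 0% is expected when chassis fans do the cooling; on a GPU with
        # onboard fans it means the fan is stopped or not being driven.
        return "chassis-managed" if passively_cooled else "stopped"
    if percent > MARGINAL_ABOVE:
        return "marginal"
    if IDLE_BAND[0] <= percent <= IDLE_BAND[1]:
        return "idle-range"
    if LOAD_BAND[0] <= percent <= LOAD_BAND[1]:
        return "load-range"
    # Readings below 30% or between the quoted bands are left unclassified.
    return "between-bands"
```

For example, `classify_fan_reading(70.0, passively_cooled=False)` returns `"load-range"`, while the same call with `0.0` returns `"stopped"`. Keeping the form factor as an explicit argument avoids treating a normal 0% on an SXM module as a fan failure.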

Why it matters

Fan speed persistently above 90% means the cooling system is near maximum effort and has little headroom left: a further rise in heat load is likely to push the GPU into thermal throttling. A sudden drop to 0%, or a percentage that stays fixed while GPU temperature climbs, suggests a fan or fan-control failure; under sustained load the GPU will then throttle and can eventually reach thermal shutdown. On an actively cooled GPU, fan speed climbing from 65% to 95% over 30 minutes while ambient temperature and workload stay stable points to a clogged heatsink or degraded thermal paste that needs physical maintenance.
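The clogged-heatsink signature described above (a large fan ramp that neither the room nor the workload explains) can be sketched as a simple check. This assumes an inlet or ambient temperature is available from the chassis BMC, since the DCGM fields named on this page do not include one, and uses power draw as a stand-in for workload. The `ThermalSample` type, the `cooling_degradation` function, and all thresholds are hypothetical; comparing only the first and last samples in the window keeps the sketch short, and a real rule would more likely fit a trend.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ThermalSample:
    t_s: float      # seconds, monotonically increasing
    fan_pct: float  # DCGM_FI_DEV_FAN_SPEED
    inlet_c: float  # ambient/inlet temperature from the chassis BMC (assumed source)
    power_w: float  # DCGM_FI_DEV_POWER_USAGE, used as a proxy for workload


def cooling_degradation(samples: Sequence[ThermalSample],
                        window_s: float = 1800.0,
                        min_fan_rise_pct: float = 25.0,
                        max_inlet_drift_c: float = 2.0,
                        max_power_drift_w: float = 25.0) -> bool:
    """Return True if fan speed rose sharply within the trailing window while
    inlet temperature and power draw stayed roughly flat (hypothetical helper)."""
    if len(samples) < 2:
        return False
    end = samples[-1].t_s
    window = [s for s in samples if s.t_s >= end - window_s]
    if len(window) < 2:
        return False
    first, last = window[0], window[-1]
    fan_rise = last.fan_pct - first.fan_pct
    inlet_drift = abs(last.inlet_c - first.inlet_c)
    power_drift = abs(last.power_w - first.power_w)
    # A large rise that neither a warmer room nor a heavier workload explains
    # points at the heat path itself (dust, degraded thermal interface).
    return (fan_rise >= min_fan_rise_pct
            and inlet_drift <= max_inlet_drift_c
            and power_drift <= max_power_drift_w)
```

With samples at t=0 (65%, 24.0 °C inlet, 300 W) and t=1800 (95%, 24.5 °C, 305 W), the function returns `True`; if the inlet temperature had instead risen to 30 °C over the same window, it would return `False`.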

How to monitor

Track DCGM_FI_DEV_FAN_SPEED alongside DCGM_FI_DEV_GPU_TEMP and DCGM_FI_DEV_POWER_USAGE. Alert when fan speed stays fixed while temperature or power changes (possible fan failure) or stays above 90% for a sustained period (marginal cooling). On passively cooled GPUs, where the field reads 0% or is unavailable, rely on GPU temperature and chassis telemetry instead. Factryze correlates fan speed trends with temperature and power draw to distinguish expected, workload-driven ramp-up from anomalous cooling degradation, and triggers preemptive power capping before throttling occurs.
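The two alert rules above could be expressed roughly as follows. This is a minimal sketch, assuming samples of the three DCGM fields have already been collected into a time-ordered list per GPU; the `FanSample` type, the `fan_alerts` function, the 300 s hold time, and the swing thresholds are all hypothetical choices, not Factryze's implementation or a DCGM API.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class FanSample:
    t_s: float      # seconds, monotonically increasing
    fan_pct: float  # DCGM_FI_DEV_FAN_SPEED
    temp_c: float   # DCGM_FI_DEV_GPU_TEMP
    power_w: float  # DCGM_FI_DEV_POWER_USAGE


def fan_alerts(samples: Sequence[FanSample],
               high_pct: float = 90.0,
               high_hold_s: float = 300.0,
               stuck_temp_swing_c: float = 10.0,
               stuck_power_swing_w: float = 100.0) -> List[str]:
    """Evaluate the stuck and sustained-high rules over a history (hypothetical helper).

    Intended for GPUs with onboard fans: on passively cooled GPUs the fan
    field sits at 0%, so the stuck rule would fire spuriously.
    """
    alerts: List[str] = []
    if len(samples) < 2:
        return alerts

    # Sustained high: history covers the hold period and every sample in the
    # trailing hold window is above the threshold.
    end = samples[-1].t_s
    recent = [s for s in samples if s.t_s >= end - high_hold_s]
    if end - samples[0].t_s >= high_hold_s and all(s.fan_pct > high_pct for s in recent):
        alerts.append("fan-sustained-high")

    # Stuck: the fan reading never changes even though temperature or power
    # swings enough that a working fan curve should have responded.
    temps = [s.temp_c for s in samples]
    powers = [s.power_w for s in samples]
    fan_constant = len({s.fan_pct for s in samples}) == 1
    if fan_constant and (max(temps) - min(temps) >= stuck_temp_swing_c
                         or max(powers) - min(powers) >= stuck_power_swing_w):
        alerts.append("fan-stuck")
    return alerts
```

Requiring a temperature or power swing before calling a constant reading "stuck" avoids false alarms during steady workloads, where a flat fan speed is normal.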

Figure: Fan Speed - Thermal Management Feedback Loop
DCGM Metric Field
DCGM_FI_DEV_FAN_SPEED

Monitor this automatically

Factryze correlates GPU signals in real time: errors, clocks, and fabric health.
