Fan Speed
GPU or chassis cooling fan speed as a percentage of maximum RPM.
What it is
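Fan speed is the operating speed of the GPU's onboard cooling fans expressed as a percentage, reported via DCGM_FI_DEV_FAN_SPEED (nominally 0-100%). NVML's fan-speed query, on which DCGM builds, is documented as returning the intended (requested) fan speed relative to the product's maximum noise-tolerance fan speed. As a result, the value can occasionally exceed 100% and does not by itself confirm that the fan is physically spinning. Passively cooled GPUs have no onboard fans. These include SXM-form-factor parts (A100 SXM, H100 SXM) and data center PCIe cards such as the L40S and A30. On these GPUs the field may report 0% or be unavailable, because the chassis fans that cool them are controlled by the BMC over IPMI. Actively cooled cards with their own fans, such as workstation GPUs like the RTX A6000, report a meaningful value that DCGM monitors directly. On such cards, typical readings are roughly 30-50% at idle and 60-80% under sustained compute load, though the exact figures depend on the card's fan curve and the airflow around it.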
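The following is a minimal sketch of how a collector might interpret a single reading against the typical bands above. It assumes the value has already been read from DCGM_FI_DEV_FAN_SPEED and that the caller knows whether the card has its own fans. The function name `describe_fan_reading` and the band constants are hypothetical. The bands are the rough typical figures quoted here, not vendor specifications.

```python
from typing import Optional

# Hypothetical bands taken from the rough typical figures above; real fan
# curves vary by card model, firmware, and chassis airflow.
IDLE_BAND = (30.0, 50.0)
LOAD_BAND = (60.0, 80.0)


def describe_fan_reading(
    fan_pct: Optional[float], actively_cooled: bool, under_load: bool
) -> str:
    """Classify one fan-speed sample (percent) against the typical bands."""
    if not actively_cooled:
        # SXM and passive PCIe parts: the GPU field may read 0% or be missing;
        # the chassis fans are managed by the BMC over IPMI instead.
        return "not applicable: check chassis fans via BMC/IPMI"
    if fan_pct is None:
        return "missing: field unavailable on an actively cooled card"
    low, high = LOAD_BAND if under_load else IDLE_BAND
    if fan_pct < low:
        return "below typical band"
    if fan_pct > high:
        return "above typical band"
    return "within typical band"


print(describe_fan_reading(72.0, actively_cooled=True, under_load=True))
# within typical band
print(describe_fan_reading(0.0, actively_cooled=False, under_load=True))
# not applicable: check chassis fans via BMC/IPMI
```

The passive-cooling check runs first, so a 0% reading from an SXM GPU is never mistaken for a failed fan.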
Why it matters
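Persistently high fan speeds above 90% indicate the cooling system is near maximum effort. Little headroom remains, so any further temperature rise is likely to push the GPU into thermal throttling. A sudden drop to 0%, or a reading that stays flat while temperature climbs, can indicate a failed fan or fan controller. Under load, this can lead to throttling and, if unaddressed, to thermal shutdown within minutes. A mechanically seized fan may not show up as 0% at all, because the driver can keep reporting the requested speed. In that case, the symptom is temperature rising while fan speed sits near maximum. On an actively cooled GPU, fan speed climbing from 65% to 95% over 30 minutes, while ambient temperature and power draw stay steady, points to reduced cooling efficiency. Likely causes include a clogged heatsink or degraded thermal paste, which require physical maintenance.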
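The clogged-heatsink pattern can be sketched as a check for a fan ramp that power draw does not explain. This is a hypothetical heuristic, not Factryze's method. The function name and its thresholds (a 20-point fan rise, at most 10% power change) are illustrative assumptions. Ambient temperature is assumed to be stable and is not checked.

```python
from statistics import mean
from typing import Sequence


def looks_like_cooling_degradation(
    fan_pct: Sequence[float],
    power_w: Sequence[float],
    min_fan_rise: float = 20.0,
    max_power_change: float = 0.10,
) -> bool:
    """Flag a fan ramp that a change in power draw does not explain.

    Compares the mean of the first and last quarter of the window so a single
    noisy sample cannot trigger the check. Ambient temperature is assumed to
    be stable over the window; this function does not verify that.
    """
    if len(fan_pct) != len(power_w) or len(fan_pct) < 4:
        raise ValueError("need two equal-length series of at least 4 samples")
    q = len(fan_pct) // 4
    fan_rise = mean(fan_pct[-q:]) - mean(fan_pct[:q])
    p_start, p_end = mean(power_w[:q]), mean(power_w[-q:])
    if p_start <= 0:
        return False  # no usable power baseline
    power_change = abs(p_end - p_start) / p_start
    return fan_rise >= min_fan_rise and power_change <= max_power_change


# 31 one-minute samples spanning 30 minutes: fan climbs 65% -> 95% at a flat 300 W.
fan = [65.0 + i for i in range(31)]
print(looks_like_cooling_degradation(fan, [300.0] * 31))  # True
# Same fan ramp, but power rises 200 W -> 320 W: workload-driven, not flagged.
print(looks_like_cooling_degradation(fan, [200.0 + 4 * i for i in range(31)]))  # False
```

Comparing window quarters instead of single endpoints trades some responsiveness for resistance to sampling noise.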
How to monitor
Track DCGM_FI_DEV_FAN_SPEED alongside DCGM_FI_DEV_GPU_TEMP and DCGM_FI_DEV_POWER_USAGE. Alert when fan speed stays stuck at a constant value while temperature changes (possible fan failure) or stays above 90% (marginal cooling). Factryze correlates fan speed trends with temperature and power draw to distinguish expected workload-driven ramp-up from anomalous cooling degradation. It then triggers preemptive power capping before throttling occurs.
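The two alert rules can be expressed over one window of aligned fan and temperature samples. This is a minimal sketch: `fan_alerts` is a hypothetical helper, and its thresholds (90%, a 0.5-point flatness tolerance, a 5 C temperature swing) are illustrative assumptions. A production setup would more likely evaluate equivalent rules in its existing alerting system.

```python
from typing import List, Sequence


def fan_alerts(
    fan_pct: Sequence[float],
    temp_c: Sequence[float],
    high_pct: float = 90.0,
    stuck_tolerance: float = 0.5,
    min_temp_swing: float = 5.0,
) -> List[str]:
    """Evaluate the sustained-high and stuck-fan rules over one window."""
    if not fan_pct or len(fan_pct) != len(temp_c):
        raise ValueError("need two non-empty, equal-length series")
    alerts = []
    # Rule 1: every sample above the threshold -> marginal cooling.
    if min(fan_pct) > high_pct:
        alerts.append("sustained_high")
    # Rule 2: fan reading flat while temperature moved -> suspect fan failure.
    # A fan pinned near maximum is already covered by rule 1, so skip it here.
    fan_flat = max(fan_pct) - min(fan_pct) <= stuck_tolerance
    temp_swing = max(temp_c) - min(temp_c)
    if fan_flat and temp_swing >= min_temp_swing and max(fan_pct) < high_pct:
        alerts.append("stuck_fan")
    return alerts


print(fan_alerts([93.0, 95.0, 94.0], [80.0, 82.0, 81.0]))  # ['sustained_high']
print(fan_alerts([55.0, 55.0, 55.0], [60.0, 68.0, 75.0]))  # ['stuck_fan']
```

Rule 2 only sees what the driver reports. A seized fan whose requested speed keeps tracking temperature will not look flat, so it needs the temperature and power correlation described above.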