GPU Utilization
Percentage of time GPU streaming multiprocessors are actively executing kernels.
What it is
GPU utilization measures the percentage of time during a sampling window in which the GPU's streaming multiprocessors are actively executing at least one kernel, reported as a 0-100% value via DCGM_FI_DEV_GPU_UTIL. It measures temporal occupancy, not computational efficiency -- a single kernel occupying one SM counts the same as a kernel saturating all of them, and a memory-bound kernel can show 100% utilization while leaving the majority of CUDA cores idle. Average GPU utilization across production data center clusters typically ranges from 30% to 60%.
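As an illustration, the underlying counter can be sampled directly through NVML, which backs the same device-level utilization figure DCGM exports as DCGM_FI_DEV_GPU_UTIL. This is a minimal Python sketch, assuming a host with NVIDIA GPUs and the nvidia-ml-py bindings installed; the one-second loop is an arbitrary choice of sampling window.

```python
# Minimal sketch: sample device-level GPU utilization via NVML, the
# counter that DCGM surfaces as DCGM_FI_DEV_GPU_UTIL.
# Assumes the nvidia-ml-py package: pip install nvidia-ml-py
import time
import pynvml

pynvml.nvmlInit()
try:
    handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
               for i in range(pynvml.nvmlDeviceGetCount())]
    for _ in range(10):                      # ten one-second samples
        for idx, handle in enumerate(handles):
            rates = pynvml.nvmlDeviceGetUtilizationRates(handle)
            # rates.gpu is temporal occupancy (0-100): the share of the
            # window in which any kernel was resident, not how many
            # CUDA cores that kernel actually kept busy.
            print(f"gpu{idx}: util={rates.gpu}%")
        time.sleep(1.0)
finally:
    pynvml.nvmlShutdown()
```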
Why it matters
Sustained GPU utilization below 15% on an allocated GPU indicates workload misconfiguration, a stalled training process, or a GPU that has silently dropped out of a distributed job while its process remains alive. During a 256-GPU training run, one GPU falling from 95% to 8% while its peers hold at 95% signals a data pipeline stall or NCCL hang -- and because collective operations block on the slowest rank, every second the stall goes undetected wastes the time of all 256 GPUs. High utilization does not guarantee efficiency, either: a throttled GPU can report 95-100% utilization while delivering 30% less throughput than a healthy peer.
How to monitor
Track DCGM_FI_DEV_GPU_UTIL continuously and alert on intra-job utilization divergence -- a single rank dropping 20+ percentage points below its peers is a reliable straggler signal. Correlate with DCGM_FI_DEV_SM_CLOCK to distinguish true idle from throttled states: low utilization at a healthy clock suggests a stalled data pipeline, while high utilization at a depressed clock suggests thermal or power throttling. Factryze's Performance Agent monitors utilization in real time, correlates it with SM clock and memory bandwidth, and alerts teams to anomalies that indicate wasted capacity or degraded jobs.
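As a concrete illustration, that rule can be written as a small classifier. The Python sketch below is assumption-laden: the classify_ranks function, the 20-point divergence threshold, and the 90%-of-median clock cutoff are hypothetical choices for illustration, not part of DCGM or of Factryze's alerting logic.

```python
# Hedged sketch of the divergence rule above -- not Factryze's actual
# logic. Thresholds are illustrative assumptions; inputs are per-rank
# samples of DCGM_FI_DEV_GPU_UTIL and DCGM_FI_DEV_SM_CLOCK.
from statistics import median

DIVERGENCE_PTS = 20        # flag a rank 20+ points below the peer median
THROTTLE_CLOCK_FRAC = 0.9  # flag a clock below 90% of the peer median

def classify_ranks(util_pct: dict[int, float],
                   sm_clock_mhz: dict[int, float]) -> dict[int, str]:
    """Label each rank 'ok', 'stalled', or 'throttled'."""
    util_med = median(util_pct.values())
    clock_med = median(sm_clock_mhz.values())
    labels = {}
    for rank in util_pct:
        if util_pct[rank] < util_med - DIVERGENCE_PTS:
            labels[rank] = "stalled"    # idle: data stall or NCCL hang
        elif sm_clock_mhz[rank] < THROTTLE_CLOCK_FRAC * clock_med:
            labels[rank] = "throttled"  # busy but down-clocked
        else:
            labels[rank] = "ok"
    return labels

# Rank 3 has stalled; rank 1 is throttled despite high utilization.
print(classify_ranks(
    {0: 95, 1: 96, 2: 94, 3: 8},
    {0: 1980, 1: 1400, 2: 1980, 3: 1975},
))
# -> {0: 'ok', 1: 'throttled', 2: 'ok', 3: 'stalled'}
```

In production the same logic is more naturally expressed as an alerting rule over exported DCGM metrics than as application code, but the decision structure is the same.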
Related terms
SM clock -- GPU core compute clock frequency in MHz, scaling between base and boost.
GPU memory utilization -- percentage of GPU framebuffer memory allocated by active workloads.
GPU monitoring -- continuous tracking of GPU health, thermals, errors, and performance metrics.
Monitor this automatically
Factryze correlates GPU signals in real time: errors, clocks, and fabric health.