Memory Utilization
Percentage of GPU framebuffer memory allocated by active workloads.
What it is
Memory utilization is the percentage of the GPU's total framebuffer memory currently allocated by running applications and the driver, calculated from DCGM_FI_DEV_FB_USED and DCGM_FI_DEV_FB_FREE. Total capacity is 80 GB for H100 SXM and A100 80GB, 48 GB for L40S. Memory utilization measures allocation, not access intensity -- a model may allocate 75 GB while memory bandwidth utilization is low during idle inference periods.
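The allocation percentage follows directly from the two DCGM fields. A minimal sketch, assuming FB values are reported in MiB (as DCGM does) and using made-up sample readings:

```python
def memory_utilization_pct(fb_used_mib: float, fb_free_mib: float) -> float:
    """Allocation percentage from the DCGM framebuffer fields (MiB)."""
    total = fb_used_mib + fb_free_mib
    return 100.0 * fb_used_mib / total

# Hypothetical sample: 75 GB allocated of 80 GB total on an H100.
print(round(memory_utilization_pct(75 * 1024, 5 * 1024), 1))  # → 93.8
```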
Why it matters
Memory utilization creeping above 90% on inference GPUs creates OOM risk: many frameworks allocate memory dynamically, and a burst of concurrent long-context requests can exhaust the remaining headroom in seconds, killing the serving process and dropping all in-flight requests. A vLLM server running a 70B model on an 80 GB H100 at 92% allocation leaves only 6.4 GB of free headroom -- a very thin margin for activation spikes or further KV cache growth. Allocation that grows over hours without a workload change indicates a memory leak that will eventually crash the process.
How to monitor
Track DCGM_FI_DEV_FB_USED and DCGM_FI_DEV_FB_FREE to compute allocation percentage and trend over time. Alert on allocation above 90% for inference workloads and on monotonically increasing allocation over multi-hour windows. Factryze distinguishes between high-but-stable allocation patterns (normal for large model serving) and growing memory leaks, alerting before exhaustion causes production outages.
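The two alert conditions above can be sketched as a simple check over a time-ordered window of allocation percentages. This is an illustrative simplification, not Factryze's actual logic; the 90% threshold comes from the text, and the strictly-increasing test stands in for real trend detection:

```python
from typing import Sequence

ALERT_THRESHOLD_PCT = 90.0  # inference-workload allocation ceiling (from the text)

def allocation_alerts(samples_pct: Sequence[float]) -> list[str]:
    """Flag high allocation and monotonic growth in a window of samples."""
    alerts: list[str] = []
    # Condition 1: latest sample above the inference alert threshold.
    if samples_pct and samples_pct[-1] > ALERT_THRESHOLD_PCT:
        alerts.append("high_allocation")
    # Condition 2: strictly increasing over the window -> possible leak.
    increasing = all(b > a for a, b in zip(samples_pct, samples_pct[1:]))
    if len(samples_pct) >= 3 and increasing:
        alerts.append("possible_leak")
    return alerts

# Hypothetical hourly samples drifting toward exhaustion.
print(allocation_alerts([70.0, 74.5, 79.0, 83.5, 91.2]))
```

In production the window would come from a metrics store (e.g. a DCGM exporter scraped by Prometheus), and trend detection would tolerate noise rather than require strict monotonicity.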
Metrics: DCGM_FI_DEV_FB_USED, DCGM_FI_DEV_FB_FREE
Related terms
GPU Utilization: Percentage of time GPU streaming multiprocessors are actively executing kernels.
GPU Monitoring: Continuous tracking of GPU health, thermals, errors, and performance metrics.
Memory Clock: GPU HBM/GDDR memory frequency in MHz that determines memory bandwidth.