Power Capping

Limiting GPU power draw below TDP to control thermals and rack density.

What it is

Power capping sets a GPU's enforced power limit below its default TDP to reduce power consumption and thermal output, trading peak performance for improved power efficiency and thermal headroom. Caps are configured via nvidia-smi -pl or the DCGM API and take effect immediately. The configured limit is reported in DCGM_FI_DEV_ENFORCED_POWER_LIMIT and the actual draw in DCGM_FI_DEV_POWER_USAGE. The performance impact is non-linear: cutting an H100 from 700 W to 600 W (a 14% power reduction) typically costs only 5-8% of throughput for memory-bound LLM inference.
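The non-linear trade-off can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the GPU draws exactly its enforced limit both before and after the cap, which real workloads only approximate.

```python
def power_cut_fraction(default_w: float, capped_w: float) -> float:
    """Fraction of the default power limit removed by the cap."""
    if default_w <= 0 or capped_w <= 0:
        raise ValueError("power limits must be positive")
    return (default_w - capped_w) / default_w


def perf_per_watt_gain(default_w: float, capped_w: float,
                       throughput_loss: float) -> float:
    """Relative change in throughput per watt after capping.

    throughput_loss is a fraction (0.05 means 5% slower).
    Assumption: draw equals the limit in both configurations.
    """
    if default_w <= 0 or capped_w <= 0:
        raise ValueError("power limits must be positive")
    relative_throughput = 1.0 - throughput_loss
    relative_power = capped_w / default_w
    return relative_throughput / relative_power - 1.0


if __name__ == "__main__":
    cut = power_cut_fraction(700, 600)
    best = perf_per_watt_gain(700, 600, 0.05)
    worst = perf_per_watt_gain(700, 600, 0.08)
    print(f"power cut: {cut:.1%}")
    print(f"throughput per watt gain: {worst:.1%} to {best:.1%}")
```

Under that assumption, the 700 W to 600 W example works out to roughly 7 to 11 percent more throughput per watt, which is why a modest cap is often worth the throughput cost.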

Why it matters

Power capping is a critical lever in dense GPU deployments where rack power limits constrain how many GPUs can run at full TDP simultaneously. A 1000-GPU cluster running at 600 W instead of 700 W per GPU saves 100 kW of GPU power plus the corresponding cooling load, freeing room for additional rack density within the same facility power budget. When a GPU's actual draw consistently sits at the enforced limit, the cap is actively constraining performance, and any further reduction has a disproportionately larger throughput impact.
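The cluster-level arithmetic can be sketched as follows. The helper names are hypothetical, and the calculation counts GPU power only; host CPUs, fans, networking, and cooling overhead are deliberately left out, so real headroom will differ.

```python
def cluster_savings_kw(gpu_count: int, default_w: float, capped_w: float) -> float:
    """GPU power saved across the cluster, in kilowatts."""
    return gpu_count * (default_w - capped_w) / 1000.0


def extra_gpus_within_budget(gpu_count: int, default_w: float,
                             capped_w: float) -> int:
    """How many more capped GPUs fit in the original GPU power budget.

    Assumption: the budget equals gpu_count * default_w and only GPU
    power is counted.
    """
    budget_w = gpu_count * default_w
    return int(budget_w // capped_w) - gpu_count


if __name__ == "__main__":
    print(cluster_savings_kw(1000, 700, 600))        # kW saved
    print(extra_gpus_within_budget(1000, 700, 600))  # additional GPUs
```

For the 1000-GPU example this gives 100 kW saved, or room for 166 more GPUs at the 600 W cap inside the original GPU power envelope.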

How to monitor

Compare DCGM_FI_DEV_POWER_USAGE against DCGM_FI_DEV_ENFORCED_POWER_LIMIT to detect whether a cap is binding. To separate power-driven from thermal-driven throttling, check the DCGM_FI_DEV_CLOCK_THROTTLE_REASONS bitmask: the software power cap flag is bit 2 (0x4), while thermal slowdown is reported in bit 5 (0x20, software) and bit 6 (0x40, hardware). Factryze uses power capping as an automated remediation action during thermal emergencies, dynamically lowering limits on overheating GPUs and restoring full power once thermal conditions stabilize.
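A minimal sketch of that check, operating on values already scraped from DCGM rather than calling DCGM itself. The bit values are assumed to match NVML's clocks throttle reason constants (software power cap 0x4, software thermal slowdown 0x20, hardware thermal slowdown 0x40). The function name and the 3% tolerance are hypothetical choices for illustration.

```python
# Throttle-reason bits, assumed to follow NVML's clocks throttle reasons.
SW_POWER_CAP = 0x4          # bit 2: clocks reduced to stay under the power limit
SW_THERMAL_SLOWDOWN = 0x20  # bit 5: software thermal slowdown
HW_THERMAL_SLOWDOWN = 0x40  # bit 6: hardware thermal slowdown


def classify_gpu(power_usage_w: float, enforced_limit_w: float,
                 throttle_reasons: int, tolerance: float = 0.03) -> dict:
    """Summarize whether a power cap is binding and why clocks are throttled.

    tolerance is a hypothetical margin: draw within 3% of the limit is
    treated as "at the limit" to absorb sampling noise.
    """
    if enforced_limit_w <= 0:
        raise ValueError("enforced limit must be positive")
    thermal_mask = SW_THERMAL_SLOWDOWN | HW_THERMAL_SLOWDOWN
    return {
        "cap_binding": power_usage_w >= enforced_limit_w * (1.0 - tolerance),
        "power_throttled": bool(throttle_reasons & SW_POWER_CAP),
        "thermal_throttled": bool(throttle_reasons & thermal_mask),
    }


if __name__ == "__main__":
    # Example sample: 598 W draw against a 600 W limit, power cap bit set.
    print(classify_gpu(598.0, 600.0, 0x4))
```

A single sample is noisy, so in practice this would be evaluated over a window of samples before concluding that a cap is binding.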

DCGM Metric Field

DCGM_FI_DEV_POWER_USAGE / DCGM_FI_DEV_ENFORCED_POWER_LIMIT

Related terms

TDP (Thermal Design Power)

Maximum sustained GPU power dissipation rating, measured in watts.

Thermal Throttling

Automatic GPU clock reduction when die temperature exceeds its safe limit, typically 83-90°C.

GPU Utilization

Percentage of time GPU streaming multiprocessors are actively executing kernels.
