PDU Sizing
A power distribution unit takes a building feed and splits it into branch circuits, each protected by a breaker. The breaker rating is what gets stenciled on the chassis. It is not the wattage you can run continuously, and it is not the wattage you can run safely under GPU transient load. Sizing PDUs from breaker rating alone is one of the most common ways to ship a rack that trips under training.
The derating staircase
Three numbers describe the same circuit:
- Nameplate. A 415 V three-phase circuit on a 30 A breaker is rated 21.6 kW (√3 × 415 V × 30 A). This is the marketing number.
- NEC continuous. Article 210 of the National Electrical Code (and equivalent codes in most jurisdictions) requires that any load running for three hours or more be limited to 80 % of the breaker rating. Training is a continuous load by any reasonable definition. The usable continuous wattage is 17.3 kW.
- Usable under transient peaks. GPUs do not draw their TDP. They draw whatever the workload needs at the moment, and that number includes brief excursions well above TDP. After leaving 30 to 50 % headroom for those, you are working with closer to 11.5 to 13 kW per branch in a fleet you intend to keep in production.
Each step down is real. Each one has cost a deployment that was sized to the next number up.
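The staircase can be sketched in a few lines of Python. This is a minimal illustration, not a sizing tool: the function name `derating_staircase` is hypothetical, the nameplate uses the standard balanced three-phase formula P = √3 × V × I at unity power factor, and "transient margin" is interpreted here as requiring load × (1 + margin) to fit inside the continuous figure, which is one reasonable reading among several.

```python
import math


def derating_staircase(nameplate_w, continuous_pct=80, transient_margins=(0.30, 0.50)):
    """Return (nameplate, continuous, usable_at_low_margin, usable_at_high_margin) in watts.

    Assumption: headroom is modelled as load * (1 + margin) <= continuous.
    """
    continuous_w = nameplate_w * continuous_pct / 100
    low_margin, high_margin = transient_margins
    usable_at_low_margin_w = continuous_w / (1 + low_margin)
    usable_at_high_margin_w = continuous_w / (1 + high_margin)
    return nameplate_w, continuous_w, usable_at_low_margin_w, usable_at_high_margin_w


# Balanced three-phase circuit, 415 V line-to-line, 30 A breaker, unity power factor.
nameplate_w = math.sqrt(3) * 415 * 30

labels = ("nameplate", "continuous (80 %)", "usable, 30 % margin", "usable, 50 % margin")
for label, watts in zip(labels, derating_staircase(nameplate_w)):
    print(f"{label:>20}: {watts / 1000:.1f} kW")
```

Changing `continuous_pct` or `transient_margins` shows how sensitive the bottom step is to the headroom you choose.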
The transient problem
The H100 SXM5 has a published 700 W TDP. Independent power-monitor measurements show transient draws of roughly 1.1 kW for tens of milliseconds, especially during certain operations: tensor-core warmup phases, NVLink traffic bursts paired with compute, and the start of a new training step after a checkpoint flush. The average comes back down, but the breaker does not act on averages over seconds: the magnetic element of a thermal-magnetic breaker reacts within milliseconds once instantaneous current crosses its trip threshold, and the thermal element accumulates heat from every excursion.
A breaker sized for 8× 700 W = 5.6 kW continuous looks safe on the spec sheet. The same breaker sees 8× 1.1 kW = 8.8 kW transient and trips. The whole rack goes black mid-step. Every GPU in the rack just lost the work since the last checkpoint. This is a gang failure caused by a single breaker that nobody specified for the workload's actual current shape.
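The arithmetic in this example can be checked with a short sketch. The function `branch_check` and its parameters are hypothetical names for illustration. The pass/fail rule (peak above breaker rating means a trip) is the simplified model used in the text above; a real breaker's behaviour depends on its trip curve.

```python
def branch_check(breaker_rating_w, n_gpus, tdp_w=700.0, transient_w=1100.0, continuous_pct=80):
    """Compare sustained and transient GPU draw against one branch breaker.

    Simplified model: sustained draw must fit the continuous derate,
    and transient draw must stay at or below the breaker rating.
    """
    continuous_limit_w = breaker_rating_w * continuous_pct / 100
    sustained_w = n_gpus * tdp_w
    peak_w = n_gpus * transient_w
    return {
        "continuous_limit_w": continuous_limit_w,
        "sustained_w": sustained_w,
        "peak_w": peak_w,
        "sustained_ok": sustained_w <= continuous_limit_w,
        "peak_ok": peak_w <= breaker_rating_w,
    }


# A breaker whose 80 % continuous budget is exactly 5.6 kW has a 7 kW rating.
result = branch_check(breaker_rating_w=7000.0, n_gpus=8)
for key, value in result.items():
    print(f"{key}: {value}")
```

With the figures from the text, the sustained check passes and the peak check fails, which is exactly the trap: the spec-sheet view and the transient view disagree about the same circuit.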
Three-phase math, briefly
At hyperscale density it is worth knowing the math. Single-phase distribution at 208 V is 6.2 kW per 30 A circuit nameplate (208 V × 30 A), dropping to roughly 5 kW after the NEC continuous derate. Three-phase 415 V at 30 A is 21.6 kW nameplate (√3 × 415 V × 30 A), and the continuous derate gives 17.3 kW. That is why every modern AI datacenter is on three-phase 415 V (or its 400/480 V equivalents): not for fashion, but because single-phase cannot reach the watts a GPU rack needs.
The deeper math is whether the building can deliver three-phase to the rack. Older facilities have step-down transformers sized for 208 V and not all of them are easily upgraded. This is part of why the 10-year facility decision cuts so deep: the breaker panel and the upstream transformers were chosen for an older density profile, and replacing them is not a software upgrade.
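The per-circuit comparison in this section can be reproduced with a small helper. This is a sketch under stated assumptions: `circuit_kw` is a hypothetical name, the load is balanced, the voltage is line-to-line for the three-phase case, and power factor is taken as 1.

```python
import math


def circuit_kw(volts, amps, phases=1):
    """Nameplate power of one circuit in kW (unity power factor assumed).

    Single-phase: P = V * I. Balanced three-phase: P = sqrt(3) * V_line_to_line * I.
    """
    factor = math.sqrt(3) if phases == 3 else 1.0
    return factor * volts * amps / 1000


for label, volts, phases in (
    ("208 V single-phase", 208, 1),
    ("415 V three-phase", 415, 3),
):
    nameplate_kw = circuit_kw(volts, 30, phases)
    continuous_kw = 0.8 * nameplate_kw  # NEC continuous derate
    print(f"{label}: {nameplate_kw:.1f} kW nameplate, {continuous_kw:.1f} kW continuous")
```

On the same 30 A breaker, the three-phase 415 V circuit carries roughly three and a half times the power of the single-phase 208 V circuit.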
Practical guidance
- Size every branch circuit to 80 % of the breaker rating for continuous load, then leave a further 30 to 50 % margin for transients. Yes, that is conservative. The cost of getting it wrong is a tripped breaker mid-training, not a warning.
- Instrument per-branch current and power. PDU vendors (Vertiv, Raritan, Eaton, APC) all ship metered units; the data is invaluable for catching the workload that drifted past its budget before the breaker does it for you.
- If the GPU peak draw is the binding number and you cannot move the rack, power capping is the software lever to keep the actual current under the breaker. It costs throughput; tripping costs everything since the last checkpoint.
- Match the PDU breaker curve to the load. A breaker with a fast magnetic trip will fire on transients that a slow-trip variant absorbs. The right curve depends on your workload shape; consult the PDU vendor.
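As an illustration of the instrumentation point above, here is a minimal sketch that checks a series of per-branch current readings against the breaker. It does not use any vendor API: `branch_report` is a hypothetical name and the readings are made-up sample values. Note that meters polled at second-scale intervals will likely miss millisecond excursions, so a clean report here is not proof of transient headroom.

```python
from statistics import mean


def branch_report(samples_a, breaker_a, continuous_pct=80):
    """Summarise metered branch current (amps) against the breaker rating."""
    continuous_limit_a = breaker_a * continuous_pct / 100
    mean_a = mean(samples_a)
    peak_a = max(samples_a)
    return {
        "continuous_limit_a": continuous_limit_a,
        "mean_a": mean_a,
        "peak_a": peak_a,
        # Sustained draw above the continuous derate means the branch is oversubscribed.
        "over_continuous": mean_a > continuous_limit_a,
        # Any sample above the rating is a candidate for power capping or rebalancing.
        "peak_over_breaker": peak_a > breaker_a,
    }


# Hypothetical readings from one 30 A branch, in amps.
readings = [21.5, 22.0, 23.8, 31.2, 22.4, 21.9]
report = branch_report(readings, breaker_a=30)
for key, value in report.items():
    print(f"{key}: {value}")
```

In this made-up series the mean sits just under the 24 A continuous limit while one sample exceeds the 30 A rating, the kind of drift the metered data is meant to catch early.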
PDU sizing is unglamorous and one of the highest-impact decisions in fleet design. A breaker that holds is invisible. A breaker that does not is a Slack thread.
Updated 2026-05-09