InfiniBand NDR vs HDR
The line on a cluster procurement spec that reads "8x ConnectX-7 NDR" is doing more work than it looks like. It is choosing the per-port bandwidth, the per-lane signaling rate, the switch radix, the rack power budget, the cable type, and the price per GPU all at once. The HDR-versus-NDR decision is the single largest knob on inter-node bandwidth in current procurement.
What changed and what did not
HDR (Mellanox's "High Data Rate" generation) runs at 200 Gb/s per port. Each port carries 4 SerDes lanes at 50 Gb/s each, PAM4 modulated (two bits per symbol, ~26.6 GBd). NDR ("Next Data Rate") doubles that to 400 Gb/s per port, with the same 4 lanes per port, at 100 Gb/s per lane, still PAM4 but at roughly twice the symbol rate (~53.1 GBd). The lane count, the modulation, and the port physical layout did not change. The SerDes generation, and with it the per-lane signaling rate, did.
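As a sanity check on the lane arithmetic, here is a minimal sketch using the nominal per-lane data rates above (FEC and encoding overhead ignored):

```python
# Per-port bandwidth from lane count and per-lane rate. Nominal data
# rates; both generations use PAM4, NDR at roughly twice the symbol rate.
GENERATIONS = {
    # name: (lanes per port, Gb/s per lane)
    "HDR": (4, 50),   # ~26.6 GBd PAM4 per lane
    "NDR": (4, 100),  # ~53.1 GBd PAM4 per lane
}

def port_gbps(gen: str) -> int:
    lanes, per_lane = GENERATIONS[gen]
    return lanes * per_lane

for gen in GENERATIONS:
    print(f"{gen}: {port_gbps(gen)} Gb/s per port")
# HDR: 200 Gb/s per port
# NDR: 400 Gb/s per port
```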
The doubling shows up linearly in the per-node aggregate. An H100 system typically provisions one IB port per GPU (8 ports total). With HDR, that is 8 x 200 Gb/s = 1.6 Tb/s = 200 GB/s of aggregate inter-node bandwidth. With NDR, it is 8 x 400 Gb/s = 3.2 Tb/s = 400 GB/s. For an inter-node all-reduce that is bandwidth-bound, the NDR cluster finishes in half the time. For tensor parallelism that has been forced to cross node boundaries, NDR makes the difference between "tractable" and "not".
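To see how the per-port doubling flows into collective time, here is a sketch of the bandwidth-bound lower bound for a ring all-reduce across nodes. The 10 GB payload and the 32-node count are illustrative assumptions, not figures from the text:

```python
# Bandwidth-bound ring all-reduce across nodes: each node sends and
# receives 2*(n-1)/n times the payload over its injection bandwidth.
PORTS_PER_NODE = 8  # one IB port per GPU, as in the text

def node_bw_gbyte_s(port_gbps: int) -> float:
    """Aggregate inter-node bandwidth per node, GB/s (8 bits per byte)."""
    return PORTS_PER_NODE * port_gbps / 8

def ring_allreduce_s(payload_gbyte: float, nodes: int, port_gbps: int) -> float:
    traffic = 2 * (nodes - 1) / nodes * payload_gbyte
    return traffic / node_bw_gbyte_s(port_gbps)

for gen, gbps in (("HDR", 200), ("NDR", 400)):
    t_ms = 1e3 * ring_allreduce_s(payload_gbyte=10, nodes=32, port_gbps=gbps)
    print(f"{gen}: {node_bw_gbyte_s(gbps):.0f} GB/s per node, "
          f"10 GB all-reduce >= {t_ms:.0f} ms")
# HDR: 200 GB/s per node, 10 GB all-reduce >= 97 ms
# NDR: 400 GB/s per node, 10 GB all-reduce >= 48 ms
```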
What changes downstream
Switch radix is the first downstream change. NDR switches support 64 ports of 400G (Quantum-2 series), up from 40 ports of 200G on the HDR Quantum chassis. The higher radix means flatter fat-trees: a two-tier design reaches more end nodes before you need a third level. And if you keep the topology fixed (same fat-tree levels, same oversubscription, same number of node ports), the per-port doubling alone is a 2x bump in per-leaf and bisection bandwidth for the same network architecture.
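A sketch of both effects, radix and per-port rate, assuming a non-blocking two-tier fat-tree; the switch port counts are the chassis figures cited above, and the 1024-port fabric size is illustrative:

```python
# Two effects of the generation change: fabric scale and bisection bandwidth.
def max_two_tier_endpoints(radix: int) -> int:
    """Max end-node ports in a non-blocking two-tier fat-tree:
    radix/2 down-links per leaf, radix^2/2 end-node ports total."""
    return radix * radix // 2

def bisection_tbps(node_ports: int, port_gbps: int) -> float:
    """Non-blocking bisection bandwidth in Tb/s for a given port count."""
    return node_ports * port_gbps / 2 / 1000

print(max_two_tier_endpoints(40))   # HDR Quantum: 800 end-node ports
print(max_two_tier_endpoints(64))   # NDR Quantum-2: 2048 end-node ports

# Holding topology fixed at 1024 node ports:
print(bisection_tbps(1024, 200))    # HDR: 102.4 Tb/s
print(bisection_tbps(1024, 400))    # NDR: 204.8 Tb/s
```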
Rack power is the second. NDR optical transceivers run hotter (100 Gb/s-per-lane PAM4 has more demanding signal integrity and higher SerDes power), and per-port power is roughly 25 W for NDR versus 15 W for HDR. For an 8-port-per-node configuration across 32 nodes, that is 8 x 32 x 10 W = 2.56 kW of additional NIC power per rack. Add the switch power (an NDR Quantum-2 chassis draws roughly 1.5x what an HDR Quantum does) and you are looking at 4-6 kW more per leaf rack. On a direct liquid cooling deployment that is a rounding error; on an air-cooled deployment it is a redesign.
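The NIC-side delta in code, using the rough per-port figures and the node count from the example above:

```python
# Additional NIC-side power per rack when moving from HDR to NDR optics.
PORTS_PER_NODE = 8
NODES = 32                               # the example above
W_PER_PORT = {"HDR": 15.0, "NDR": 25.0}  # rough figures from the text

delta_kw = PORTS_PER_NODE * NODES * (W_PER_PORT["NDR"] - W_PER_PORT["HDR"]) / 1000
print(f"{delta_kw:.2f} kW extra NIC power per rack")  # 2.56 kW
```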
Cable type and reach are the third. HDR commonly used DAC (direct-attach copper) for short rack runs and AOC (active optical cables) past 5 m. NDR effectively requires AOC or transceivers + fiber for any run past 1-2 m, because PAM4 signal integrity at 100 Gb/s per lane does not survive copper at meaningful distances. That changes both the cost (optics are pricier than copper) and the failure mode (optical transceivers fail more often than DAC cables).
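Encoded as a rule of thumb (the copper reach limits, 5 m for HDR and ~1.5 m for NDR, are the rough figures from this section, not vendor specs):

```python
# Media selection by reach, per generation. Thresholds are rough.
COPPER_LIMIT_M = {"HDR": 5.0, "NDR": 1.5}

def media(gen: str, reach_m: float) -> str:
    if reach_m <= COPPER_LIMIT_M[gen]:
        return "DAC (passive copper)"
    return "AOC or transceiver + fiber"

print(media("HDR", 3.0))  # DAC (passive copper)
print(media("NDR", 3.0))  # AOC or transceiver + fiber
```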
What this means in practice
- For any new H100/B200 build at scale, NDR is the default. HDR clusters are still in production and are not bad, but the bandwidth ceiling per dollar has moved.
- The 2x in per-port bandwidth flows linearly into bisection bandwidth on a non-blocking rail-optimized fat-tree. For all-to-all collectives (mixture-of-experts, tensor-parallel-by-IB), this is a direct 2x speedup.
- The switch is the same physical envelope but draws more power and runs hotter. Plan for it in the PDU sizing and the rack thermal budget.
- The cable change (copper to optical) increases per-link failure rate. Optical transceivers and AOC cables are a meaningful fraction of silent data corruption sources at the link layer; CRC-induced retries are visible in the port error counters (perfquery or ibqueryerrors). A rough fleet-level sizing sketch follows this list.
- For rear-fed rack architectures (where IB cables route along the back of each rack), NDR's higher per-port power means the rear thermal envelope tightens. Some operators deploy NDR with shorter cable runs on principle, even at the cost of more leaf switches per rack.
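A minimal sketch of the fleet-level failure shift the copper-to-optical bullet describes. The per-link annual failure rates and the cluster size here are loud assumptions for illustration, not measured numbers:

```python
# Expected link failures per year as the fabric shifts from DAC to optics.
ANNUAL_FAILURE_RATE = {"DAC": 0.001, "AOC/optics": 0.010}  # assumed, per link

def expected_failures(links: int, media: str) -> float:
    return links * ANNUAL_FAILURE_RATE[media]

links = 512 * 8  # assumed cluster: 512 nodes x 8 ports
for m in ANNUAL_FAILURE_RATE:
    print(f"{m}: ~{expected_failures(links, m):.0f} failures/year")
# DAC: ~4 failures/year
# AOC/optics: ~41 failures/year
```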
NDR is not magic. It is one SerDes generation. The 2x ripples through the rest of the cluster, and most of the operational complexity lives in those ripples.
Updated 2026-05-10