Scale Atlas · Chapter 1 · 88 terms · Updated 2026-05-10

Compute

Peak FLOPs are easy. Achieved FLOPs are everything. The compute story starts at the 32-thread warp, climbs through SM occupancy and tensor-core utilization, and ends at the cluster-scale roofline. FP8 doubles throughput when scaling holds. MIG and MPS slice one GPU when one job cannot fill it. Every term in this chapter is a gap between the marketed peak and what your training step actually delivers.
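The gap between marketed peak and delivered training throughput can be put in a single number: model FLOPs utilization (MFU), achieved FLOPs per second divided by aggregate hardware peak. A minimal sketch; the ~6 FLOPs-per-parameter-per-token approximation and every number below are illustrative assumptions, not measurements from the source:

```python
def mfu(params, tokens_per_step, step_time_s, peak_tflops_per_gpu, num_gpus=1):
    """Model FLOPs utilization: achieved throughput vs. aggregate peak.

    Uses the common ~6 FLOPs per parameter per token approximation
    for a dense transformer's forward + backward pass (an assumption,
    not a universal rule; MoE and recompute change the constant).
    """
    achieved_flops_per_s = 6.0 * params * tokens_per_step / step_time_s
    peak_flops_per_s = peak_tflops_per_gpu * 1e12 * num_gpus
    return achieved_flops_per_s / peak_flops_per_s

# Illustrative: a 7e9-parameter model, 1M tokens per step, 30 s per step,
# on 8 GPUs at a dense BF16 peak of 989 TFLOPS each (H100 SXM5).
print(f"MFU: {mfu(7e9, 1e6, 30.0, 989.0, num_gpus=8):.1%}")  # → MFU: 17.7%
```

Everything the chapter catalogs, from warp divergence to collective wait, shows up as the distance between this ratio and 1.0.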

[Figure: "Peak FLOPs are easy. Achieved FLOPs are everything." Waterfall chart in TFLOPS for an H100 SXM5: FP8 ceiling at 1979 TF, peak BF16 at 989 TF, then successive losses from warp divergence (stall), the occupancy gap (idle SMs), per-kernel launch cost (FLOPs per launch), and collective wait (sync stall), down to the achieved real step.]