
Scale Atlas · Chapter 4 of 87 terms · Updated 2026-05-10

Storage

Feed the GPUs without starving them. Storage is a memory hierarchy: HBM at the top, NVMe and parallel filesystems in the middle, object stores at the bottom. Reads stream up the hierarchy on every epoch; writes flow down on every checkpoint. The job is to match each stage of the data lifecycle to the cheapest tier that still keeps the GPUs fed.

Reads → S3 / object (~5 GB/s) → NVMe cache (~50 GB/s) → Parallel FS (~80 GB/s) → GPUDirect Storage (~200 GB/s) → HBM (4.8 TB/s). Each tier feeds the next; HBM is the only one that talks to compute.

Writes → HBM (4.8 TB/s) → sharded write (~100 GB/s) → Parallel FS (~80 GB/s) → S3 archive (~5 GB/s). Sharded writes parallelize across the PFS; the archive lives on object store.

Two flows, one fabric: each tier is the cheapest one that can still feed the next.
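The "keep the GPUs fed" constraint is just an arithmetic check per tier. A minimal sketch, using the illustrative bandwidth figures above; the 2 GB/s per-GPU ingest rate is an assumed workload, not from the source:

```python
# Back-of-envelope check: how many GPUs can each read tier stream to
# before it becomes the bottleneck? Bandwidths are the illustrative
# figures from the diagram; the per-GPU ingest rate is an assumption.

TIERS_GBPS = {  # aggregate read bandwidth per tier, in GB/s
    "S3 / object": 5,
    "NVMe cache": 50,
    "Parallel FS": 80,
    "GPUDirect Storage": 200,
}

def gpus_fed(tier_gbps: float, per_gpu_gbps: float) -> int:
    """Number of GPUs this tier can feed without starving any of them."""
    return int(tier_gbps // per_gpu_gbps)

per_gpu = 2.0  # assumed GB/s of training data consumed per GPU
for tier, bw in TIERS_GBPS.items():
    print(f"{tier:>18}: feeds up to {gpus_fed(bw, per_gpu)} GPUs")
```

Read the output bottom-up: any tier that feeds fewer GPUs than the job uses must be hidden behind a faster cache above it, which is exactly the role the NVMe and parallel-FS layers play.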