Scale Atlas · Chapter 6 · Updated 2026-05-10
Orchestration
Idle GPUs are the most expensive thing in the building. The scheduler is what keeps them busy and keeps tenants from stepping on each other. Kubernetes sees GPUs through device plugins; Slurm and Volcano coordinate gangs that must start as one; fair-share queues and preemption decide who waits and who gets evicted; multi-tenant isolation contains the blast when something breaks.
Fair-Share Queues
Slurm and Volcano scheduling policies that allocate GPU time across teams over a sliding window. Teams that burned heavily yesterday see their priority dampened today; light users get a boost.
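A toy sketch of the idea, not Slurm's or Volcano's actual implementation: recorded usage decays over the window, and a team's priority factor falls off as its decayed usage exceeds its share (the classic 2^(-usage/share) shape). The half-life, shares, and numbers below are assumptions for illustration.

```python
import math

HALF_LIFE_HOURS = 24.0  # assumed decay half-life for the sliding window

def decayed_usage(usage_gpu_hours: float, hours_since_use: float) -> float:
    """Exponentially decay recorded GPU-hours as they age."""
    return usage_gpu_hours * 0.5 ** (hours_since_use / HALF_LIFE_HOURS)

def fair_share_factor(decayed: float, share: float, cluster_usage: float) -> float:
    """2^(-U/S) shape: near 1.0 for under-served teams, near 0.0 far over share."""
    if cluster_usage == 0:
        return 1.0
    normalized_usage = decayed / cluster_usage  # this team's slice of recent usage
    return 2.0 ** (-normalized_usage / share)

# Team A burned 900 GPU-hours yesterday with a 50% share; Team B used 50.
total = decayed_usage(900, 24) + decayed_usage(50, 24)
print(fair_share_factor(decayed_usage(900, 24), 0.5, total))  # dampened (~0.27)
print(fair_share_factor(decayed_usage(50, 24), 0.5, total))   # boosted (~0.93)
```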
Kubernetes GPU Scheduling
Device plugins surface GPUs as schedulable resources. The NVIDIA GPU Operator wires up the driver, the device plugin, DCGM, and MIG so kube-scheduler can match Pods to silicon.
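A minimal sketch with the official kubernetes Python client: the Pod requests one nvidia.com/gpu, an extended resource that exists only because the device plugin (installed by the GPU Operator) advertises it on each node. The pod name and image are placeholders.

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in-cluster

# Sketch only: the key line is the extended-resource limit "nvidia.com/gpu",
# which kube-scheduler uses to match this Pod to a node with a free GPU.
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="cuda-smoke-test"),
    spec=client.V1PodSpec(
        restart_policy="Never",
        containers=[
            client.V1Container(
                name="cuda",
                image="nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04",  # placeholder image
                command=["nvidia-smi"],
                resources=client.V1ResourceRequirements(
                    limits={"nvidia.com/gpu": "1"}  # whole GPUs only, no fractions
                ),
            )
        ],
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```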
Multi-Tenant Isolation
Stacked boundaries that keep one tenant's GPU faults from affecting another's: namespace and RBAC, network policy, resource quota, MIG or MPS partitioning. Each layer catches a different blast type.
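One layer of that stack, sketched with the kubernetes Python client: a per-namespace ResourceQuota that caps how many GPUs a tenant can request. The namespace name and the cap are illustrative.

```python
from kubernetes import client, config

config.load_kube_config()

# One isolation layer: a ResourceQuota capping a tenant namespace at 8
# requested GPUs. "requests.nvidia.com/gpu" is the quota key for the
# extended resource; namespace and limit are illustrative.
quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name="gpu-quota", namespace="team-vision"),
    spec=client.V1ResourceQuotaSpec(
        hard={"requests.nvidia.com/gpu": "8"}
    ),
)

client.CoreV1Api().create_namespaced_resource_quota(
    namespace="team-vision", body=quota
)
```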
Preemption Strategies
When and how to interrupt a running job for a higher-priority workload. Three modes: kill (state lost), checkpoint-and-evict (state saved), demote (stays running at lower priority).
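A hypothetical decision sketch, not any real scheduler's API, showing how the three modes might be chosen: demote when the priority gap is small, checkpoint-and-evict when the victim can save its state, kill as the last resort.

```python
from enum import Enum, auto
from dataclasses import dataclass

class Action(Enum):
    KILL = auto()                  # state lost
    CHECKPOINT_AND_EVICT = auto()  # state saved, job requeued
    DEMOTE = auto()                # keeps running at lower priority

@dataclass
class Job:
    priority: int
    checkpointable: bool

def preemption_action(victim: Job, incoming_priority: int) -> Action:
    if incoming_priority <= victim.priority:
        raise ValueError("only higher-priority work may preempt")
    # Small priority gap: let the victim keep running, just deprioritized.
    if incoming_priority - victim.priority <= 1:
        return Action.DEMOTE
    # Victim can save its state: evict without losing work.
    if victim.checkpointable:
        return Action.CHECKPOINT_AND_EVICT
    # Last resort: kill and restart from scratch.
    return Action.KILL

print(preemption_action(Job(priority=3, checkpointable=True), incoming_priority=9))
```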
Slurm Gang Scheduling
Reserve every node a multi-rank job needs and start all ranks at once. Without it, the ranks that launch first stall in MPI_Init waiting for peers still in the queue, burning GPU-hours on idle reservations.
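A sketch of a whole-node, gang-started submission: the #SBATCH directives reserve four nodes exclusively and srun launches all ranks together. The script body, node counts, and train.py are illustrative; it is submitted from Python via sbatch reading the script on stdin.

```python
import subprocess

# Illustrative job script: 4 nodes x 8 GPUs, reserved exclusively so all 32
# ranks start together instead of trickling in. train.py is a placeholder.
job_script = """#!/bin/bash
#SBATCH --job-name=gang-train
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=8
#SBATCH --gres=gpu:8
#SBATCH --exclusive
#SBATCH --time=04:00:00

# srun launches every rank at once; MPI_Init sees all peers immediately.
srun python train.py
"""

# sbatch accepts a batch script on stdin when no file is given.
result = subprocess.run(["sbatch"], input=job_script, text=True,
                        capture_output=True, check=True)
print(result.stdout.strip())  # e.g. "Submitted batch job 12345"
```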
Volcano Scheduler
Batch-oriented, gang-aware Kubernetes scheduler widely used for AI training. Brings PodGroup, queue, and minAvailable semantics that the default kube-scheduler lacks.
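A sketch that creates a Volcano PodGroup through the kubernetes client's generic CustomObjectsApi: the PodGroup carries the gang constraint (spec.minMember) and the queue binding, while Volcano Jobs expose the related minAvailable field. Names, namespace, and the member count are placeholders.

```python
from kubernetes import client, config

config.load_kube_config()

# Sketch only: a Volcano PodGroup declaring that 16 pods must be schedulable
# before any of them start (the gang constraint), billed against a named queue.
pod_group = {
    "apiVersion": "scheduling.volcano.sh/v1beta1",
    "kind": "PodGroup",
    "metadata": {"name": "llm-pretrain", "namespace": "team-nlp"},
    "spec": {
        "minMember": 16,      # gang size: schedule all 16 or none
        "queue": "research",  # fair-share queue this group draws from
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="scheduling.volcano.sh",
    version="v1beta1",
    namespace="team-nlp",
    plural="podgroups",
    body=pod_group,
)
```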