
Fair-Share Queues

Slurm and Volcano scheduling policies that allocate GPU time across teams over a sliding window. Yesterday's heavy users see today's priority dampened; light users get a boost.
Window: typically 7-30 days
Decay: exponential, with a half-life measured in days
Slurm knob: PriorityType=priority/multifactor
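
For concreteness, a minimal slurm.conf excerpt wiring these knobs together. The parameters are real Slurm options, but the weight values are illustrative placeholders, not tuning advice:

    # slurm.conf (excerpt): multifactor priority with a 7-day fair-share half-life
    PriorityType=priority/multifactor
    PriorityDecayHalfLife=7-0        # days-hours; 7-0 (7 days) is the default
    PriorityWeightFairshare=100000   # make fair-share the dominant factor
    PriorityWeightAge=10000          # queued wait time still counts
    PriorityWeightPartition=10000
    PriorityWeightQOS=10000
    PriorityWeightJobSize=1000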

Without fair-share, the scheduler is FIFO with maybe a priority class on top: whoever submits first gets the GPUs until they release them. In a fleet with one user, that policy is fine. In a fleet with twelve teams competing for 200 H100s, FIFO turns into "whichever team has a script that submits jobs every 60 seconds wins everything." Fair-share is the rule that breaks the tie: instead of looking only at the queue, look at historical usage and dampen priority for whoever has been consuming the lion's share.

What fair-share actually computes

The classical fair-share formula tracks per-account (or per-user) usage in a sliding decay window. Slurm's multifactor priority plugin uses it; Volcano's proportion plugin implements the same idea on Kubernetes.

The mechanics:

  1. Each account has a RawShare (the entitlement) configured by the admin: maybe 30% for the production team, 50% for research, 20% for shared tooling.
  2. The scheduler tracks each account's actual GPU-hours consumed, decayed exponentially. A job that ran 30 days ago counts much less than one that ran yesterday.
  3. The fair-share factor for each job is approximately RawShare / NormalizedUsage (sketched in code after this list). Under target use: factor over 1, priority boosted. Over target use: factor under 1, priority dampened.
  4. Fair-share is one component of the multifactor priority; the others are wait time, partition priority, QoS, and job size. Final priority = weighted sum.
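
A minimal Python sketch of that arithmetic. The names, weights, and zero-usage cap are illustrative, not Slurm internals; Slurm's plugin actually uses a bounded variant (roughly 2^(-usage/share)) rather than the raw ratio, but the direction of the adjustment is the same:

    # Sketch of the fair-share factor and the multifactor sum described above.
    # All names and weights here are illustrative, not Slurm's internals.

    def fairshare_factor(raw_share: float, normalized_usage: float) -> float:
        """RawShare / NormalizedUsage: >1 boosts an under-target account,
        <1 dampens an over-target one."""
        if normalized_usage == 0:
            return 2.0  # never-used account: cap the boost at an arbitrary value
        return raw_share / normalized_usage

    def job_priority(fs, age, partition, qos, size,
                     weights=(1000, 100, 100, 100, 10)):
        """Final priority = weighted sum of the per-factor scores."""
        w_fs, w_age, w_part, w_qos, w_size = weights
        return (w_fs * fs + w_age * age + w_part * partition
                + w_qos * qos + w_size * size)

    # Production team over its 30% entitlement (45% of decayed GPU-hours):
    print(fairshare_factor(0.30, 0.45))  # ~0.67 -> dampened
    # Research team under its 50% entitlement (20% of decayed GPU-hours):
    print(fairshare_factor(0.50, 0.20))  # 2.5  -> boosted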

Decay is the key knob. Slurm's PriorityDecayHalfLife defaults to 7 days; values from 1 day (very reactive, jobs from yesterday matter most) to 30 days (slow-moving, captures monthly usage patterns) all show up in production configs. A shorter half-life means a team's overnight training burst gets penalized this morning; a longer one means consistent overconsumers carry a penalty for weeks.
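
The decay itself is just a half-life weight on each usage sample. A quick illustration of how the half-life choice changes what "recent" means (the numbers are invented):

    # Exponentially decayed usage: a GPU-hour consumed age_days ago
    # contributes 0.5 ** (age_days / half_life_days) of its face value.

    def decayed_usage(samples, half_life_days=7.0):
        """samples: list of (gpu_hours, age_days) pairs."""
        return sum(hours * 0.5 ** (age / half_life_days)
                   for hours, age in samples)

    samples = [(100, 0.5), (100, 7), (100, 28)]  # today, last week, four weeks ago
    print(decayed_usage(samples, 7))    # ~151: the four-week-old burst barely counts
    print(decayed_usage(samples, 30))   # ~236: a long half-life still remembers it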

[Figure: a 7-day sliding window (Mon-Sun). Team A, over its share, has priority dampened; Team B, under its share, has priority boosted.]

Why FIFO is not enough at fleet scale

Two failure modes that fair-share addresses:

Submission-rate gaming. A team that submits a job every 60 seconds (for legitimate reasons: hyperparameter sweeps, debug iterations) can monopolize a FIFO queue. Fair-share dampens their priority once they cross their entitlement, regardless of how often they submit.

Long-running starvation. A team running multi-day training jobs holds its nodes for the duration, and everything submitted later queues behind them. Without fair-share, smaller teams' shorter jobs can wait days for a window. Fair-share boosts the underused teams' priority so their jobs slip in as capacity frees up.

The combined effect: high-share teams get most of the fleet, but no team can starve another. Fair-share trades absolute throughput for predictable per-team latency, which is usually the right trade in a multi-tenant cluster.

What fair-share does not solve

Fair-share is a priority adjustment, not a quota. It does not stop a team from consuming more than their share; it just makes new jobs from over-share teams less likely to dispatch ahead of under-share teams. Three operational realities follow:

  1. Hard quotas need separate enforcement. If you genuinely cannot let team A use more than 30% of the fleet, you need a hard limit (Slurm's MaxJobs and MaxNodes limits per partition; Volcano's queue capacity caps), not just fair-share. A queue sketch follows this list.

  2. Idle capacity is still up for grabs. If team A is over share but the fleet is otherwise idle, team A's jobs still run. Fair-share dampens priority during contention; it does not gate idle capacity. This is usually what you want (you do not want GPUs sitting idle to enforce a policy), but operators sometimes expect quota-style behavior.

  3. Fair-share interacts with preemption. A high-priority team's job may preempt a low-priority team's running job; without preemption, fair-share only affects which queued job runs next, not what is currently running. Fair-share + preemption is the closest thing to "real-time fair allocation" most clusters get.
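
For point 1 above, a sketch of a Volcano queue that combines a proportional weight (the fair-share analogue consumed by the proportion plugin) with a hard capability cap (the quota). The queue name and GPU count are invented:

    # Volcano Queue: weight matters under contention; capability is a
    # hard ceiling that applies even when the rest of the fleet is idle.
    apiVersion: scheduling.volcano.sh/v1beta1
    kind: Queue
    metadata:
      name: team-a                # hypothetical team queue
    spec:
      weight: 3                   # relative share vs other queues
      capability:
        nvidia.com/gpu: 60        # team-a can never hold more than 60 GPUs

The weight behaves like RawShare under contention; the capability line is the quota-style ceiling that, as noted above, fair-share alone does not give you.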

Practical guidance

  • Configure fair-share with a 7-14 day half-life in production. Shorter for fast-moving teams, longer for slower-cycle workloads.
  • Keep RawShare allocations roughly proportional to budget contribution or headcount; teams that pay more should get more.
  • Pair fair-share with gang scheduling and preemption; fair-share alone does not handle multi-rank atomicity or evict over-share running jobs.
  • Monitor sshare (Slurm) or queue allocation reports (Volcano) weekly. Drift between actual share and configured share signals a misconfigured workload or a team gaming submission patterns; a minimal drift check is sketched below.
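
A trivial sketch of that weekly drift check. All numbers are invented; in practice they would come from sshare output or Volcano queue status:

    # Weekly drift check: configured share vs observed share of decayed usage.
    # Numbers are illustrative; source them from `sshare -l` (Slurm) or
    # Volcano queue allocation reports.

    configured = {"prod": 0.30, "research": 0.50, "tooling": 0.20}
    observed   = {"prod": 0.48, "research": 0.41, "tooling": 0.11}

    for team, target in configured.items():
        drift = observed[team] - target
        if abs(drift) > 0.10:  # alert threshold: 10 percentage points
            print(f"{team}: configured {target:.0%}, observed {observed[team]:.0%} "
                  f"(drift {drift:+.0%})")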

The takeaway: fair-share is what stops one team from owning the fleet in a FIFO world. It does not enforce quotas, but it makes priority a function of past use rather than past submission time, which is usually what teams actually mean by "fair."

Updated 2026-05-10