Errors & Failures

GPU Fallen Off Bus

Xid 79 error: GPU completely disconnects from the PCIe bus.

What it is

"GPU fallen off the bus" is the failure condition reported as Xid 79: the GPU stops responding to the host system over the PCIe bus, so the driver can no longer reach it. Common root causes include PCIe link instability, power delivery problems, thermal damage, and hardware defects in the GPU or the motherboard slot.

Why it matters

This is one of the most disruptive GPU failure modes: every workload running on the affected device is killed immediately, and nvidia-smi can no longer communicate with the GPU. A software-level reset via nvidia-smi -r generally cannot succeed because the device is unreachable on the bus. Repeated occurrences point to a hardware fault that requires replacing the GPU or riser card.
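One way to notice that a device has dropped out is to compare the GPUs a node is expected to have against what nvidia-smi still enumerates. The sketch below is a minimal illustration, not a definitive implementation: the function names and the idea of keeping a per-node list of expected UUIDs are assumptions, while the `--query-gpu=uuid --format=csv,noheader` flags are standard nvidia-smi options.

```python
import subprocess


def parse_gpu_uuids(csv_text):
    """Collect GPU UUIDs from `nvidia-smi --query-gpu=uuid --format=csv,noheader` output.

    Lines that do not look like a UUID (for example error messages) are ignored.
    """
    return {
        line.strip()
        for line in csv_text.splitlines()
        if line.strip().startswith("GPU-")
    }


def missing_gpus(expected_uuids, csv_text):
    """Return the expected UUIDs that nvidia-smi did not enumerate, sorted."""
    return sorted(set(expected_uuids) - parse_gpu_uuids(csv_text))


def query_nvidia_smi(timeout_s=30):
    """Run nvidia-smi and return its stdout, or '' if it is absent or hangs.

    Assumption: treating a timeout like "nothing enumerated" is acceptable here,
    since the tool can stall when a device is unreachable.
    """
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=uuid", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return ""
    return result.stdout
```

In use, `missing_gpus(expected, query_nvidia_smi())` returns a non-empty list when an expected GPU has disappeared. A missing device is only a hint; the Xid 79 entry in the kernel log remains the primary evidence.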

How to monitor

Watch dmesg for Xid 79 events and confirm with DCGM. Stale or missing telemetry from one GPU UUID while its peers continue reporting is a secondary signal. nvidia-smi will either list the GPU as Unknown or fail to enumerate it entirely. Recovery requires at minimum a cold reboot; Factryze automatically escalates through GPU reset, driver reload, and BMC-level power cycle when Xid 79 is detected.
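The two signals described above can be sketched in a few lines of Python. This is a hedged illustration rather than how any particular monitoring product works: the function names, the 60-second staleness threshold, and the `last_seen` mapping (GPU UUID to the timestamp of its last telemetry sample) are all assumptions, and the regular expression assumes the usual `NVRM: Xid (PCI:<address>): <code>, ...` layout of driver messages in the kernel log.

```python
import re

# Assumed kernel log layout: "NVRM: Xid (PCI:0000:3b:00): 79, pid=..., <message>"
XID_RE = re.compile(r"NVRM: Xid \(PCI:(?P<bdf>[0-9a-fA-F:.]+)\): (?P<xid>\d+)\b")


def find_xid_events(lines, xid=79):
    """Return the PCI addresses of every log line reporting the given Xid code."""
    hits = []
    for line in lines:
        match = XID_RE.search(line)
        if match and int(match.group("xid")) == xid:
            hits.append(match.group("bdf"))
    return hits


def stale_gpus(last_seen, now, max_age_s=60.0):
    """Return UUIDs whose telemetry is stale while at least one peer is fresh.

    last_seen maps GPU UUID -> timestamp of the last sample (None if never seen).
    If no GPU is fresh, the collector itself is the likelier culprit, so
    nothing is flagged.
    """
    fresh = {
        uuid
        for uuid, ts in last_seen.items()
        if ts is not None and now - ts <= max_age_s
    }
    if not fresh:
        return []
    return sorted(uuid for uuid in last_seen if uuid not in fresh)
```

The log lines can come from the output of `dmesg` split on newlines, which may require elevated privileges on some systems. Requiring a fresh peer before flagging a GPU avoids mistaking a stopped telemetry collector for a bus failure.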

[Figure: GPU Fallen Off Bus - Xid 79: GPU Completely Unreachable]
