PCIe (PCI Express)
The host bus connecting GPUs to CPUs and other system devices.
What it is
PCIe (PCI Express) is the bus standard connecting GPUs to the host CPU, system memory, network adapters, and storage. A PCIe Gen5 x16 link provides roughly 64 GB/s in each direction (about 128 GB/s bidirectional) -- sufficient for GPU-to-host transfers but a bottleneck for GPU-to-GPU communication compared to NVLink. DCGM exposes the current PCIe link generation and width for monitoring.
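The bandwidth figure above follows from the per-lane signalling rate and line encoding of each PCIe generation. The sketch below works through that arithmetic; the function name `pcie_bandwidth_gb_per_s` is illustrative, and the result is a theoretical ceiling before packet and protocol overhead, so measured throughput will be lower.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per PCIe generation.
# Gen1/Gen2 use 8b/10b encoding; Gen3 through Gen5 use 128b/130b.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def pcie_bandwidth_gb_per_s(gen: int, width: int) -> float:
    """Theoretical bandwidth in one direction, in GB/s, before protocol overhead."""
    rate_gt_per_s, efficiency = PCIE_GENERATIONS[gen]
    # One transfer carries one bit per lane; divide by 8 to convert bits to bytes.
    return rate_gt_per_s * efficiency * width / 8


x16 = pcie_bandwidth_gb_per_s(5, 16)  # about 63 GB/s per direction
x8 = pcie_bandwidth_gb_per_s(5, 8)    # exactly half of the x16 figure
print(f"Gen5 x16: {x16:.1f} GB/s, Gen5 x8: {x8:.1f} GB/s per direction")
```

Halving the width halves the ceiling, which is why a link that trains down from x16 to x8 loses half its bandwidth.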
Why it matters
PCIe link width degradation (an x16 link training down to x8) is a common hardware fault that silently halves bandwidth without generating Xid events or DCGM alerts in default configurations. A halved PCIe link can cause data-loading pipelines to starve the GPU, dropping utilization (for example, from 95% to 60%) with no obvious error. "GPU has fallen off the bus" (Xid 79) is the extreme case, where PCIe connectivity fails entirely.
How to monitor
Track DCGM_FI_DEV_PCIE_TX_THROUGHPUT and DCGM_FI_DEV_PCIE_RX_THROUGHPUT and compare them against the expected throughput for the negotiated link generation and width. Confirm the negotiated link width with nvidia-smi --query-gpu=pcie.link.width.current --format=csv. Factryze detects bandwidth anomalies by comparing actual PCIe throughput against the expected rate for the reported link width, and flags degradation that default monitoring silently misses.
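The link-width check can be scripted. The following is a minimal sketch, not Factryze's detection logic: it assumes `nvidia-smi` is on the PATH and treats any GPU whose current width is below its reported maximum as degraded. The helper names `read_link_state` and `find_degraded_links` are hypothetical.

```python
import csv
import io
import subprocess

# nvidia-smi query fields: GPU index, current link generation,
# current link width, and the maximum width the GPU and slot support.
QUERY = "index,pcie.link.gen.current,pcie.link.width.current,pcie.link.width.max"


def read_link_state() -> str:
    """Run nvidia-smi and return its CSV output (requires an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def find_degraded_links(csv_text: str) -> list[dict]:
    """Return one record per GPU whose current width is below its maximum."""
    degraded = []
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        index, gen, width, max_width = (field.strip() for field in row)
        if not (index.isdigit() and width.isdigit() and max_width.isdigit()):
            continue  # skip fields reported as unavailable
        if int(width) < int(max_width):
            degraded.append(
                {"gpu": int(index), "gen": gen,
                 "width": int(width), "max_width": int(max_width)}
            )
    return degraded
```

On a GPU host, `find_degraded_links(read_link_state())` returns an empty list when every link is at full width. The check compares width only, because the current link generation can drop while a GPU is idle to save power; sample it under load before reading anything into a lower generation.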