GPUDirect RDMA
Direct GPU memory access across the network, bypassing CPU copies.
What it is
GPUDirect RDMA is a technology that lets RDMA-capable network adapters (InfiniBand HCAs or RoCE NICs) read from and write to GPU memory directly, without staging data through host CPU memory, eliminating two host-memory copies per transfer. It requires a peer-memory kernel module (nvidia-peermem on current driver stacks), performs best when the GPU and NIC sit under the same PCIe switch or root complex, and benefits from NUMA-aware process placement.
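The NUMA-aware placement above can be sketched with the standard Linux numactl tool. The node index and training command below are placeholders, not values from this document; read the real GPU/NIC affinity from nvidia-smi topo -m before pinning.

```shell
# Sketch: pin the training process to the NUMA node local to GPU 0 and its NIC.
# Node index 0 is an assumption; ./train is a hypothetical training command.
numa_node=0
launch_cmd="numactl --cpunodebind=$numa_node --membind=$numa_node ./train"
echo "would launch: $launch_cmd"
```

Binding both CPUs and memory to the NIC-local node keeps any residual host-memory traffic off the inter-socket link.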
Why it matters
GPUDirect RDMA reduces AllReduce latency by 30-50% compared to CPU-staged transfers, and is essential for achieving peak inter-node bandwidth in distributed training. Without it, every inter-node NCCL collective incurs unnecessary CPU memory staging that saturates the host memory bus. Misconfigurations that silently fall back to CPU-staged transfers appear as unexpected inter-node bandwidth degradation.
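One way to catch a silent fallback in a training job is to make NCCL log its transport selection. The environment variables below are real NCCL knobs; the exact log wording varies by NCCL version, so treat the expected strings as a rough guide rather than a contract.

```shell
# Sketch: surface GPUDirect RDMA status in NCCL init logs before launching a job.
export NCCL_DEBUG=INFO              # print transport selection at init
export NCCL_DEBUG_SUBSYS=INIT,NET   # limit log noise to init and network paths
export NCCL_NET_GDR_LEVEL=PXB       # allow GPUDirect RDMA when GPU and NIC share a PCIe switch
# With GPUDirect RDMA active, inter-node channels typically log a GDRDMA-suffixed
# NET/IB path; a plain NET/IB path with no GDRDMA marker suggests CPU staging.
```

Setting NCCL_NET_GDR_LEVEL too permissively (e.g. SYS) can force GPUDirect RDMA across the inter-socket link, which often performs worse than staged copies.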
How to monitor
Verify GPUDirect RDMA is active by checking for the nvidia-peermem kernel module (lsmod | grep nvidia_peermem) and confirming GPU-NIC PCIe affinity via nvidia-smi topo -m. Track DCGM_FI_DEV_PCIE_TX_THROUGHPUT alongside inter-node NCCL bandwidth to detect silent fallback to CPU-staged transfers. Factryze validates GPUDirect RDMA configuration as part of node health checks.
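The checks above can be combined into a small script. This is a sketch assuming a Linux host with the NVIDIA driver stack; it only reports state and degrades gracefully when the tools are absent.

```shell
#!/bin/sh
# Sketch of a GPUDirect RDMA health check: module presence plus PCIe affinity.

# 1. Peer-memory module: without it, transfers silently stage through host memory.
if lsmod 2>/dev/null | grep -q nvidia_peermem; then
  peermem_status="loaded"
else
  peermem_status="missing"
fi
echo "nvidia_peermem: $peermem_status"

# 2. GPU-NIC PCIe affinity: in the topology matrix, PIX/PXB entries mean the
#    GPU and NIC share a PCIe switch; NODE/SYS entries mean traffic crosses
#    the CPU or the inter-socket link, where GPUDirect RDMA is disabled or slow.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi topo -m
else
  echo "nvidia-smi not found; skipping topology check"
fi
```

Running this in a node health check turns the "silent" fallback into an explicit, alertable signal.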