Memory Clock
GPU HBM/GDDR memory frequency in MHz that determines memory bandwidth.
What it is
The memory clock is the operating frequency of the GPU's memory subsystem (HBM2e, HBM3, or GDDR6X), measured in MHz and reported via DCGM_FI_DEV_MEM_CLOCK. It directly determines the memory bandwidth available to feed the streaming multiprocessors. On H100 SXM, HBM3 runs at 2619 MHz for 3.35 TB/s peak; on the 80 GB A100 SXM, HBM2e runs at 1593 MHz for about 2.0 TB/s, while the 40 GB A100 SXM runs HBM2 at 1215 MHz for about 1.6 TB/s. Unlike SM clocks, data center GPU memory clocks are typically locked at rated speed during active operation.
Why it matters
Because the memory clock should not vary under normal conditions, any reduction is a significant anomaly: bandwidth scales in proportion to the clock, so even a 50-100 MHz drop costs a matching share of bandwidth. A 10% memory clock reduction means 10% less bandwidth, which directly cuts throughput for memory-bandwidth-bound workloads such as LLM inference and attention computation. An A100 dropping from 1215 MHz to 1100 MHz during inference will see roughly 10% higher latency in every bandwidth-bound layer, and therefore roughly 10% higher latency across the whole model.
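The arithmetic behind that example can be sketched in a few lines of Python. This is an illustrative calculation only: the function name bandwidth_impact is hypothetical, and it assumes the workload is entirely memory-bandwidth-bound, so that bandwidth scales linearly with the memory clock and latency scales with the inverse of bandwidth.

```python
def bandwidth_impact(rated_mhz: float, observed_mhz: float) -> tuple[float, float]:
    """Return (bandwidth_loss, latency_increase) as fractions.

    Assumes bandwidth is proportional to memory clock and that the
    workload is fully bandwidth-bound (latency ~ 1 / bandwidth).
    """
    if rated_mhz <= 0 or observed_mhz <= 0:
        raise ValueError("clock frequencies must be positive")
    bandwidth_loss = 1.0 - observed_mhz / rated_mhz
    latency_increase = rated_mhz / observed_mhz - 1.0
    return bandwidth_loss, latency_increase


# The A100 example from the text: 1215 MHz rated, 1100 MHz observed.
loss, slowdown = bandwidth_impact(1215, 1100)
print(f"bandwidth loss: {loss:.1%}, latency increase: {slowdown:.1%}")
# prints "bandwidth loss: 9.5%, latency increase: 10.5%"
```

The latency increase is slightly larger than the bandwidth loss because it is measured against the reduced bandwidth. Real workloads that are only partly bandwidth-bound would see a smaller slowdown.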
How to monitor
Track DCGM_FI_DEV_MEM_CLOCK and alert on any deviation from the GPU's rated frequency. Correlate memory clock drops with DCGM_FI_DEV_MEMORY_TEMP and DCGM_FI_DEV_ECC_SBE_VOL_TOTAL to distinguish thermal protection response from failing HBM hardware. Factryze monitors memory clock continuously and correlates anomalies with HBM temperature and ECC error rates to classify the root cause.
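As a rough illustration of that correlation, the sketch below classifies a memory clock drop from two consecutive samples of the three DCGM fields. It is a minimal example under stated assumptions, not Factryze's or DCGM's actual logic: the Sample type, the classify function, the 5 MHz tolerance, and the 95 C temperature threshold are all hypothetical, and the samples are assumed to be collected elsewhere while the GPU is under load.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    mem_clock_mhz: float     # value of DCGM_FI_DEV_MEM_CLOCK
    memory_temp_c: float     # value of DCGM_FI_DEV_MEMORY_TEMP
    sbe_volatile_total: int  # value of DCGM_FI_DEV_ECC_SBE_VOL_TOTAL


def classify(prev: Sample, cur: Sample, rated_mhz: float,
             tolerance_mhz: float = 5.0, hot_temp_c: float = 95.0) -> str:
    """Classify the current sample relative to the rated memory clock.

    tolerance_mhz and hot_temp_c are placeholder thresholds; tune them
    for the specific GPU model and cooling setup.
    """
    if cur.mem_clock_mhz >= rated_mhz - tolerance_mhz:
        return "ok"
    # Clock is below rated speed: look for a likely cause.
    if cur.memory_temp_c >= hot_temp_c:
        return "thermal"  # checked first, so heat takes priority over ECC
    new_sbe = cur.sbe_volatile_total - prev.sbe_volatile_total
    if new_sbe > 0:
        return "suspect_hardware"  # new single-bit errors without heat
    return "unexplained"


prev = Sample(mem_clock_mhz=1215, memory_temp_c=70, sbe_volatile_total=10)
cur = Sample(mem_clock_mhz=1100, memory_temp_c=72, sbe_volatile_total=14)
print(classify(prev, cur, rated_mhz=1215))  # prints "suspect_hardware"
```

Comparing the ECC counter against the previous sample, rather than against zero, keeps old errors from being blamed for a new clock drop.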
DCGM_FI_DEV_MEM_CLOCK
Related terms
GPU core compute clock frequency in MHz, scaling between base and boost.
Percentage of GPU framebuffer memory allocated by active workloads.
Automatic GPU clock reduction when die temperature exceeds 83-90 °C safe limits.