MFU · Model FLOPs Utilization
Achieved model FLOP/s divided by the hardware's theoretical peak FLOP/s in a training run; 35-55% is good at scale, eroded by collectives and stragglers.
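A minimal sketch of the ratio, not a definitive recipe. It assumes the common approximation of about 6 FLOPs per parameter per token for a dense transformer's forward and backward pass, which ignores attention FLOPs; the entry itself does not specify how achieved FLOPs are counted. The function name, parameter names, and run figures are hypothetical, and the per-device peak should come from the vendor datasheet for the precision in use.

```python
def model_flops_utilization(n_params, tokens_per_second, n_devices,
                            peak_flops_per_device):
    """Return MFU: achieved model FLOP/s over aggregate peak FLOP/s."""
    # Assumption: ~6 FLOPs per parameter per token (forward + backward),
    # ignoring attention; this is an approximation, not an exact count.
    achieved_flops_per_second = 6.0 * n_params * tokens_per_second
    peak_flops_per_second = n_devices * peak_flops_per_device
    return achieved_flops_per_second / peak_flops_per_second


# Hypothetical run: 7B-parameter model, 64 devices at 312 TFLOP/s peak each,
# measured end-to-end throughput of 200,000 tokens per second.
mfu = model_flops_utilization(
    n_params=7e9,
    tokens_per_second=200_000,
    n_devices=64,
    peak_flops_per_device=312e12,
)
print(f"MFU = {mfu:.1%}")  # prints "MFU = 42.1%"
```

Because throughput is measured end to end, time spent waiting on collectives or on the slowest device lowers tokens per second and therefore lowers MFU directly.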