Replicating the full model on every GPU, with each replica processing a different shard of the data and gradients synchronized via all-reduce so that every replica applies the same update.
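A minimal sketch of the idea in plain Python, simulating the workers in a single process. No real GPUs or communication library are involved. The one-parameter model `y = w * x`, the toy data, and the helper names (`local_gradient`, `all_reduce_mean`, `data_parallel_step`) are all assumptions made for illustration.

```python
# Simulated data parallelism: each "worker" is just an entry in a list.

def local_gradient(w, shard):
    """Gradient of mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(values):
    """Stand-in for all-reduce: every worker receives the same average."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def data_parallel_step(replicas, shards, lr):
    """One step: local gradients, all-reduce, then identical updates."""
    grads = [local_gradient(w, shard) for w, shard in zip(replicas, shards)]
    synced = all_reduce_mean(grads)
    return [w - lr * g for w, g in zip(replicas, synced)]

# Toy dataset drawn from y = 2x (hypothetical values).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
num_workers = 2

# Each worker gets a different, equally sized shard of the data.
shards = [data[i::num_workers] for i in range(num_workers)]

# Every worker starts from an identical copy of the model parameter.
replicas = [0.0] * num_workers

for _ in range(50):
    replicas = data_parallel_step(replicas, shards, lr=0.05)
```

Because the shards are the same size, the averaged gradient equals the gradient over the full batch, so the replicas stay identical after every step. In practice the averaging would be carried out by a collective communication library rather than a Python list.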