Using a small draft model to propose several tokens ahead, which a larger model then verifies together in one parallel pass, speeding up generation.
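The draft-then-verify loop can be sketched as follows. This is a minimal illustration, not a production implementation: `target_model` and `draft_model` are hypothetical toy functions standing in for real networks, and verification uses simple greedy exact-match acceptance, whereas practical systems typically apply a probabilistic accept/reject rule over the two models' distributions. The verification step is written as a loop here; in a real system it would be a single batched forward pass of the large model.

```python
from typing import Callable, List

# A "model" here is any function mapping a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    # Hypothetical toy "large" model: next token is (last + 1) mod 10.
    return (prefix[-1] + 1) % 10


def draft_model(prefix: List[int]) -> int:
    # Hypothetical toy "small" model: agrees with the target,
    # except that it guesses wrong after token 4.
    last = prefix[-1]
    return 0 if last == 4 else (last + 1) % 10


def speculative_step(prefix: List[int], draft: Model, target: Model, k: int) -> List[int]:
    # 1. The draft model proposes k tokens, one at a time.
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(prefix + proposed))

    # 2. The target model gives its own next token at every proposed position
    #    (simulated sequentially here; parallel in a real system).
    verified = [target(prefix + proposed[:i]) for i in range(k + 1)]

    # 3. Keep the longest run of proposals the target agrees with, then
    #    append one token from the target: the correction at the first
    #    mismatch, or a bonus token if every proposal was accepted.
    accepted: List[int] = []
    for i in range(k):
        if proposed[i] != verified[i]:
            break
        accepted.append(proposed[i])
    accepted.append(verified[len(accepted)])
    return accepted


def generate(prompt: List[int], draft: Model, target: Model, max_new: int, k: int = 4) -> List[int]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        tokens.extend(speculative_step(tokens, draft, target, k))
    return tokens[: len(prompt) + max_new]


def greedy_generate(prompt: List[int], target: Model, max_new: int) -> List[int]:
    # Baseline: ordinary one-token-at-a-time decoding with the target model.
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target(tokens))
    return tokens
```

Under greedy acceptance, every step emits at least one token chosen by the large model, so the output matches what the large model would have produced on its own. The speedup comes from the steps in which several drafted tokens are accepted at once.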