A widely used open-source LLM inference and serving engine known for PagedAttention, which manages the KV cache in fixed-size blocks, and for high-throughput continuous batching.