A target for a service metric such as latency or availability that an inference fleet is sized to meet.
← All terms