Time to first token
Measures the time until the first response token for the prompt "Hi" (9 tokens).
0 ms
Time for prefill 1K tokens
Measures time for the inference engine to process 1K tokens and produce the first response token.
0 ms
Token generation performance
Measures tokens per second for the generation phase of the inference engine.
0 tps
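As a rough illustration of how these three figures could be collected from a streaming inference engine, here is a minimal Python sketch. It is not the benchmark's actual harness. The names `measure_stream` and `StreamTiming` are hypothetical, and the sketch assumes that the engine yields one token per iteration and that the first token is counted toward prefill rather than generation.

```python
import time
from typing import Callable, Iterable, NamedTuple


class StreamTiming(NamedTuple):
    ttft_ms: float         # time from request start to the first token, in ms
    generation_tps: float  # tokens per second after the first token
    token_count: int       # total tokens received, including the first


def measure_stream(tokens: Iterable[str],
                   clock: Callable[[], float] = time.perf_counter) -> StreamTiming:
    """Time a token stream (hypothetical helper, not part of any engine API)."""
    start = clock()
    first = None
    last = start
    count = 0
    for _ in tokens:
        now = clock()
        if first is None:
            first = now  # arrival of the first response token
        last = now
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")

    ttft_ms = (first - start) * 1000.0
    gen_seconds = last - first
    # Assumption: the first token is attributed to prefill, so the
    # generation rate covers only the tokens that follow it.
    generation_tps = (count - 1) / gen_seconds if gen_seconds > 0 else 0.0
    return StreamTiming(ttft_ms, generation_tps, count)
```

Under these assumptions, passing the stream for the "Hi" prompt gives the time to first token in `ttft_ms`, passing the stream for a 1K-token prompt gives the prefill time in the same field, and `generation_tps` corresponds to the token generation figure. The `clock` parameter is there only so the timing logic can be checked with a deterministic clock.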