Model: Available

Time to first token

Measures the time until the first response token arrives for the prompt "Hi" (9 tokens).
After a fresh browser start, this includes the time to load the model.
On a warm start, it reflects the time to create an inference engine session.
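The measurement described above amounts to starting a clock, issuing the prompt, and stopping the clock when the first streamed token arrives. The sketch below illustrates this under stated assumptions: `start_stream` is a hypothetical stand-in for whatever streaming API the page actually calls, and the clock is injectable so the function can be tested. Any model loading or session creation that happens inside `start_stream` falls inside the timed interval, which is why a cold start reports a larger value.

```python
import time
from typing import Callable, Iterable


def time_to_first_token_ms(
    start_stream: Callable[[str], Iterable[str]],
    prompt: str = "Hi",
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Return milliseconds from issuing `prompt` until the first token arrives.

    `start_stream` is hypothetical: it is assumed to (lazily) load the model
    or create a session if needed, then yield response tokens one by one.
    """
    start = clock()
    stream = start_stream(prompt)
    for _token in stream:
        # Stop timing as soon as the first token is produced.
        return (clock() - start) * 1000.0
    raise RuntimeError("stream produced no tokens")
```

With a real engine you would pass its streaming call as `start_stream`; the remaining tokens are deliberately ignored because only the first one matters for this metric.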

0 ms

Time to prefill 1K tokens (replay)

Measures the time for the inference engine to process a 1K-token prompt and produce the first response token.

Run 1 2 3 4 5
Time (ms) 0 0 0 0 0
0 ms
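The prefill benchmark is repeated over five runs and reported as a single figure. A minimal sketch of that replay loop follows, assuming each run is an independent timing call (for example a time-to-first-token measurement with a roughly 1K-token prompt). The page does not say whether its summary value is a mean, a median, or something else, so both `replay_runs_ms` and `summarize_ms` are hypothetical helpers that report several candidates.

```python
import statistics
from typing import Callable, Dict, List


def replay_runs_ms(measure_once: Callable[[], float], runs: int = 5) -> List[float]:
    """Call a timing function `runs` times and return each result in ms."""
    return [measure_once() for _ in range(runs)]


def summarize_ms(samples: List[float]) -> Dict[str, float]:
    """Summarize per-run timings.

    Assumption: the page's summary statistic is unknown, so the mean,
    median, and best (minimum) run are all returned.
    """
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "min": min(samples),
    }
```

The median is often preferred for small run counts because a single slow run (for example, one that triggers a model reload) skews the mean.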

Token generation performance (replay)

Measures the generation-phase speed of the inference engine in tokens per second (tps).
Tokens 10-50 and tokens 50-100 are timed separately, because generation speed can vary with response length.

Run 1 2 3 4 5
Tokens 10-50 (tps) 0 0 0 0 0
Tokens 50-100 (tps) 0 0 0 0 0
 
0 tps
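Windowed generation speed can be derived from the arrival time of each streamed token. The sketch below assumes that "tokens 10-50" means the tokens generated between the arrival of token 10 and the arrival of token 50 (40 tokens), and that `arrival_s` holds one timestamp in seconds per token; both are interpretations rather than something the page states. Starting the window at token 10 keeps first-token latency (prefill, session setup) out of the generation-speed figure.

```python
from typing import Sequence


def window_tps(arrival_s: Sequence[float], first: int, last: int) -> float:
    """Tokens per second between token `first` and token `last` (1-based).

    Assumes arrival_s[i] is the time, in seconds, at which token i + 1
    arrived, and counts the (last - first) tokens generated after `first`.
    """
    if not (1 <= first < last <= len(arrival_s)):
        raise ValueError("window outside the generated response")
    elapsed = arrival_s[last - 1] - arrival_s[first - 1]
    if elapsed <= 0:
        raise ValueError("timestamps must be increasing")
    return (last - first) / elapsed
```

For example, `window_tps(ts, 10, 50)` and `window_tps(ts, 50, 100)` give the two rows of the table for one run, provided the response is at least 100 tokens long.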