Prompt API - Speed Benchmark

Time to first token

Sends the prompt "Hi" (9 tokens) and measures the time until the first response token arrives.
After a fresh browser start, this measures the time to load the model.
On a warm start, it measures the inference engine's session creation time.

0 ms

Time for prefill 1K tokens

Measures the time for the inference engine to process a 1K-token prompt and produce the first response token.

Run	1	2	3	4	5
ms	0	0	0	0	0

0 ms
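
The page does not say how the five runs are combined into the single figure shown under the table. One common choice for latency measurements is the median, since it is less sensitive to a single slow run than the mean. The sketch below assumes that choice; the function name `median` and the sample values are hypothetical and are not taken from the benchmark.

```typescript
// Hypothetical helper: combines per-run latencies (in ms) into one summary value.
// Assumption: the summary is the median; the benchmark page does not confirm this.
function median(values: readonly number[]): number {
  if (values.length === 0) {
    throw new RangeError("need at least one run");
  }
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  const upper = sorted[mid] as number;
  const lower = sorted[mid - 1] as number;
  // Odd count: the middle element. Even count: average of the two middle elements.
  return sorted.length % 2 === 1 ? upper : (lower + upper) / 2;
}

// Made-up latencies for five runs, with one outlier.
const exampleRunsMs = [120, 95, 110, 300, 100];
console.log(`${median(exampleRunsMs)} ms`);
```

With an odd number of runs, as in the table above, the median is simply the middle value after sorting, so one unusually slow run does not shift the summary.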

Token generation performance

Measures tokens per second for the generation phase of the inference engine.
Throughput is reported separately for tokens 10-50 and tokens 50-100, as generation speed can vary with response length.
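
As a rough illustration of how a per-window throughput figure can be derived, the sketch below computes tokens per second between two token positions from a list of arrival timestamps. It assumes one timestamp is recorded per generated token (a streaming API may deliver chunks that contain more or fewer than one token), and the names `tokensPerSecond` and `arrivalMs` are hypothetical, not part of the benchmark or of the Prompt API.

```typescript
// Hypothetical helper: throughput between two token positions.
// arrivalMs[i] is the arrival time in milliseconds of the i-th generated token.
// Assumption: exactly one timestamp per token.
function tokensPerSecond(
  arrivalMs: readonly number[],
  from: number,
  to: number,
): number {
  if (from < 0 || to <= from) {
    throw new RangeError("expected 0 <= from < to");
  }
  const start = arrivalMs[from];
  const end = arrivalMs[to];
  if (start === undefined || end === undefined) {
    throw new RangeError("token index is outside the recorded timestamps");
  }
  const elapsedMs = end - start;
  if (elapsedMs <= 0) {
    throw new RangeError("timestamps must be strictly increasing over the window");
  }
  // (to - from) tokens were produced during elapsedMs milliseconds.
  return ((to - from) * 1000) / elapsedMs;
}

// Made-up timeline: 20 ms per token up to token 50, then 40 ms per token.
const arrivalMs = Array.from({ length: 101 }, (_, i) =>
  i <= 50 ? i * 20 : 1000 + (i - 50) * 40,
);
console.log(tokensPerSecond(arrivalMs, 10, 50)); // 50
console.log(tokensPerSecond(arrivalMs, 50, 100)); // 25
```

Starting the first window at token 10 instead of token 0 keeps prefill and startup effects out of the generation figure, which matches the ranges used in the table below.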

Run	1	2	3	4	5
Token 10-50 tps	0	0	0	0	0
Token 50-100 tps	0	0	0	0	0

0 tps