Model: Available

Quality Benchmark

This test runs through a tiny subset of the MMLU dataset, prompts the language model with each question, and tallies the number of correct single-letter answers.
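The tallying described above can be sketched roughly as follows. This is a minimal illustration, not the page's actual implementation: the names `Tally`, `parse_letter`, `run_benchmark`, and the `ask_model` callback are hypothetical, the question format (a dict with `prompt` and `answer` keys) is assumed, and it is assumed that invalid responses still count toward the total and are scored as incorrect.

```python
import re
from dataclasses import dataclass

@dataclass
class Tally:
    """Counters mirroring the dashboard fields (hypothetical structure)."""
    correct: int = 0
    total: int = 0
    invalid: int = 0

    @property
    def score_percent(self) -> float:
        # Avoid division by zero before any question has been asked.
        return 100.0 * self.correct / self.total if self.total else 0.0


def parse_letter(response: str):
    """Return the letter if the response is a single letter A-D, else None.

    Assumption: surrounding whitespace, parentheses, and one trailing
    '.' or ':' are tolerated; anything else counts as invalid.
    """
    match = re.fullmatch(r"\s*\(?([A-Da-d])\)?[.:]?\s*", response)
    return match.group(1).upper() if match else None


def run_benchmark(questions, ask_model) -> Tally:
    """Prompt the model with each question and tally the results.

    `questions` is an iterable of dicts with "prompt" and "answer" keys;
    `ask_model` is any callable that maps a prompt string to a reply string.
    """
    tally = Tally()
    for question in questions:
        letter = parse_letter(ask_model(question["prompt"]))
        tally.total += 1
        if letter is None:
            tally.invalid += 1  # assumed to be scored as incorrect
        elif letter == question["answer"]:
            tally.correct += 1
    return tally
```

With a stub such as `ask_model = lambda prompt: "A"`, `run_benchmark` returns a `Tally` whose `correct`, `total`, `invalid`, and `score_percent` values correspond to the four counters shown on the page.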

Correct Answers: 0
Total Questions: 0
Score (%): 0
Invalid Responses: 0
Progress: 0 questions completed

Test Log