Quality Benchmark
This test runs through a tiny subset of the MMLU dataset, prompts the language model with each question, and tallies the number of correct single-letter answers.
Correct Answers: 0
Total Questions: 0
Score (%): 0
Invalid Responses: 0
Progress
0 questions completed
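
The tallying described above can be sketched roughly as follows. This is a minimal illustration, not the page's actual implementation: the names `Tally`, `format_prompt`, `parse_answer`, `run_benchmark`, and the `ask_model` callback are all hypothetical, as is the question layout (a dict with `question`, `choices`, and `answer` keys). It also assumes four choices labeled A to D and that an invalid response still counts toward the total.

```python
import re
from dataclasses import dataclass

LETTERS = "ABCD"  # assumption: four answer choices per question


@dataclass
class Tally:
    correct: int = 0
    total: int = 0
    invalid: int = 0

    @property
    def score_pct(self) -> float:
        # Score (%) = correct / total; 0 when nothing has been answered yet.
        return 100.0 * self.correct / self.total if self.total else 0.0


def format_prompt(q: dict) -> str:
    """Render one question with lettered choices (hypothetical layout)."""
    lines = [q["question"]]
    lines += [f"{letter}. {choice}" for letter, choice in zip(LETTERS, q["choices"])]
    lines.append("Answer with a single letter (A, B, C, or D).")
    return "\n".join(lines)


def parse_answer(response: str):
    """Return the letter if the reply is a lone A-D, else None (invalid).

    Tolerates surrounding whitespace, optional parentheses, and one
    trailing '.' or ':'.
    """
    m = re.fullmatch(r"\s*\(?([A-Da-d])\)?[.:]?\s*", response)
    return m.group(1).upper() if m else None


def run_benchmark(questions, ask_model) -> Tally:
    """Prompt the model once per question and tally the results.

    `ask_model` is any callable taking a prompt string and returning the
    model's reply as a string.
    """
    tally = Tally()
    for q in questions:
        letter = parse_answer(ask_model(format_prompt(q)))
        tally.total += 1
        if letter is None:
            tally.invalid += 1  # assumption: invalid replies count as wrong
        elif letter == q["answer"]:
            tally.correct += 1
    return tally
```

Passing the model as a plain callable keeps the scoring logic independent of any particular inference backend, so the same loop can be exercised with a stub that returns canned replies.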