Enkokilish Bench
Enkokilish (እንቆቅልሽ) are Ethiopian riddles in Amharic. They are difficult, require an in-depth understanding of the language, and are often used to test one's knowledge and reasoning skills. Based on Enkokilish, we've made Enkokilish Bench to evaluate the ability of Large Language Models (LLMs) to understand, reason about, and solve Amharic riddles.
This benchmark uses Evalite as the evals framework and the AI SDK to call a variety of models through Vercel AI Gateway. It is completely free and open-source, from the dataset and eval code to this visualization site. To get set up quickly, clone the repo, set your AI Gateway API key in the .env file, run pnpm eval:dev, then open localhost:3006 and explore. You can also run the evals in node mode (by running node main.ts), which lets you export the results in JSON format or run the evals in a CI/CD pipeline.
December 07, 2025 | 00:02:42
System Prompt
You are an Amharic riddle solver. Try your best to solve every riddle you are asked. Respond with one word or phrase only!
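Because the system prompt asks for a single word or phrase, a model's reply can be compared directly against a reference answer. How the benchmark actually scores replies is not described here, so the following is only a minimal sketch under the assumption of exact matching after light normalization. The names normalizeAnswer and isCorrect are hypothetical and are not taken from the benchmark's code.

```typescript
// Hypothetical helpers: an assumed exact-match scorer, NOT the benchmark's own.

// Normalize an answer: unify Unicode form, replace common Latin and
// Ethiopic punctuation (U+1361..U+1368, e.g. "።") with spaces,
// collapse whitespace, trim, and lowercase any Latin letters.
function normalizeAnswer(answer: string): string {
  return answer
    .normalize("NFC")
    .replace(/[.,!?;:"'()\u1361-\u1368]/g, " ")
    .replace(/\s+/g, " ")
    .trim()
    .toLowerCase();
}

// Returns true when the model's reply matches the expected answer
// after normalization.
function isCorrect(modelAnswer: string, expected: string): boolean {
  return normalizeAnswer(modelAnswer) === normalizeAnswer(expected);
}
```

Under this assumption, a reply such as " እንቁላል። " would count as matching the reference "እንቁላል", while a paraphrase or a longer explanation would not. A stricter or looser scorer (for example one that accepts synonyms) would change the success counts.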
| Model | Success | Fail | Avg score | Input tokens | Output tokens | Total tokens | Cost (USD) | Duration (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| gemini-2.0-flash-lite | 2 | 220 | 0.01 | 1,171 | 125 | 1,296 | $0.00012532 | 2,284,750 |
| gemini-2.5-flash | 5 | 217 | 0.02 | 1,274 | 97 | 27,547 | $0.0660647 | 2,467,312 |
| gemini-2.5-flash-lite | 2 | 220 | 0.01 | 552 | 36 | 588 | $0.0000696 | 2,292,801 |
| grok-4-fast-non-reasoning | 3 | 219 | 0.01 | 39,790 | 1,590 | 41,380 | $0.0039806 | 1,998,036 |
| nova-lite | 3 | 219 | 0.01 | 14,639 | 2,041 | 16,680 | $0.00136818 | 735,325 |
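The avgScore values in the results above are consistent with success / (success + fail) over 222 riddles per model, rounded to two decimals. That reading is an assumption inferred from the numbers, not something stated by the benchmark. A short sketch that recomputes it (ModelResult and successRate are illustrative names):

```typescript
// Success and fail counts copied from the results above.
interface ModelResult {
  model: string;
  success: number;
  fail: number;
}

const results: ModelResult[] = [
  { model: "gemini-2.0-flash-lite", success: 2, fail: 220 },
  { model: "gemini-2.5-flash", success: 5, fail: 217 },
  { model: "gemini-2.5-flash-lite", success: 2, fail: 220 },
  { model: "grok-4-fast-non-reasoning", success: 3, fail: 219 },
  { model: "nova-lite", success: 3, fail: 219 },
];

// Assumption: the average score is the fraction of riddles solved.
function successRate(r: ModelResult): number {
  const total = r.success + r.fail;
  return total === 0 ? 0 : r.success / total;
}

for (const r of results) {
  const total = r.success + r.fail;
  console.log(`${r.model}: ${successRate(r).toFixed(2)} (${r.success}/${total})`);
}
```

For example, gemini-2.5-flash solves 5 of 222 riddles, which is about 0.0225 and rounds to the reported 0.02.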
[Charts: Success Rates and Fail Rates by model]
Eval Summary
Total Models: 5
Total Cost: $0.072
Total Duration: 9,778,224 ms
Total Input Tokens: 57,426
Total Output Tokens: 3,889
Total Tokens: 87,491
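The summary figures can be reproduced by summing the per-model columns. The sketch below does that with values copied from the results; the type and helper names (UsageRow, sumBy) are illustrative and not part of the benchmark code. Note that the total of 87,491 tokens is the sum of the per-model total-token column, which is larger than input plus output tokens (61,315) because the total reported for gemini-2.5-flash exceeds its input and output counts; the reason for that gap is not stated here.

```typescript
// Usage figures copied from the per-model results.
interface UsageRow {
  model: string;
  inTok: number;
  outTok: number;
  totTok: number;
  costUsd: number;
  durationMs: number;
}

const usage: UsageRow[] = [
  { model: "gemini-2.0-flash-lite", inTok: 1171, outTok: 125, totTok: 1296, costUsd: 0.00012532, durationMs: 2284750 },
  { model: "gemini-2.5-flash", inTok: 1274, outTok: 97, totTok: 27547, costUsd: 0.0660647, durationMs: 2467312 },
  { model: "gemini-2.5-flash-lite", inTok: 552, outTok: 36, totTok: 588, costUsd: 0.0000696, durationMs: 2292801 },
  { model: "grok-4-fast-non-reasoning", inTok: 39790, outTok: 1590, totTok: 41380, costUsd: 0.0039806, durationMs: 1998036 },
  { model: "nova-lite", inTok: 14639, outTok: 2041, totTok: 16680, costUsd: 0.00136818, durationMs: 735325 },
];

// Sum one numeric field across all rows.
function sumBy(rows: UsageRow[], pick: (r: UsageRow) => number): number {
  return rows.reduce((acc, r) => acc + pick(r), 0);
}

const totals = {
  models: usage.length,
  inTok: sumBy(usage, (r) => r.inTok),
  outTok: sumBy(usage, (r) => r.outTok),
  totTok: sumBy(usage, (r) => r.totTok),
  costUsd: sumBy(usage, (r) => r.costUsd),
  durationMs: sumBy(usage, (r) => r.durationMs),
};

console.log(totals);
console.log(`Total cost (rounded): $${totals.costUsd.toFixed(3)}`);
```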