Enkokilish Bench
Enkokilish (እንቆቅልሽ) are Ethiopian riddles in Amharic. They are difficult, require an in-depth understanding of the language, and are often used to test one's knowledge and reasoning skills. Enkokilish Bench builds on them to evaluate how well Large Language Models (LLMs) understand, reason about, and solve Amharic riddles.
This benchmark is built with Evalite as the evals framework and the AI SDK, which calls a variety of models through Vercel AI Gateway. It is completely free and open source, from the dataset and eval code to this visualization site. To set it up quickly, clone the repo, set your AI Gateway API key in the .env file, run pnpm eval:dev, then open localhost:3006 and explore. You can also run the evals in node mode (by running node main.ts), which lets you export the results in JSON format or run the evals in a CI/CD pipeline.
December 17, 2025 | 21:44:13
System Prompt
You are a riddle solver. Try your best to solve every riddle you are asked. Respond with one word or phrase only!
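Because the system prompt asks for a single word or phrase, one plausible way to grade a response is a normalized exact match against the accepted answers. The sketch below is an assumption about how such a scorer could look, not the benchmark's actual scoring code; normalizeAnswer and exactMatchScore are hypothetical names.

```typescript
// Hypothetical scorer sketch; the benchmark's real scoring logic may differ.

// Normalize an answer: canonical Unicode form, lowercase, punctuation and
// symbols replaced by spaces (this includes the Ethiopic full stop "።"),
// whitespace collapsed and trimmed.
function normalizeAnswer(text: string): string {
  return text
    .normalize("NFC")
    .toLowerCase()
    .replace(/[\p{P}\p{S}]/gu, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Return 1 if the model output matches any accepted answer after
// normalization, otherwise 0.
function exactMatchScore(output: string, accepted: string[]): 0 | 1 {
  const got = normalizeAnswer(output);
  return accepted.some((answer) => normalizeAnswer(answer) === got) ? 1 : 0;
}

console.log(exactMatchScore("  An Egg! ", ["an egg"]));
console.log(exactMatchScore("እንቁላል።", ["እንቁላል"]));
```

An exact match is strict: a correct answer phrased differently from every accepted answer would still score 0, so a real scorer might accept several spellings per riddle.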
Amharic Riddle Results
English Riddle Results
model                      success  fail  avgScore  InTok   OutTok   TotTok   Cost         Date
gemini-2.5-flash           40       182   0.18      9,617   739      177,250  $0.4219676   2025-12-07
gemini-3-flash             29       193   0.13      4,575   410      61,500   $0.1730625   2025-12-17
gemini-2.0-flash-lite      16       206   0.07      9,653   880      10,533   $0.00098797  2025-12-07
gemini-2.5-flash-lite      14       208   0.06      9,653   628      10,281   $0.0012165   2025-12-07
deepseek-v3.2              6        216   0.03      16,664  1,476    18,140   $0.00528584  2025-12-07
grok-4-fast-non-reasoning  3        219   0.01      54,220  2,232    56,452   $0.0057224   2025-12-07
llama-3.1-8b               2        220   0.01      21,588  3,449    25,037   $0.00550814  2025-12-07
nova-lite                  2        220   0.01      14,639  2,068    16,707   $0.00137466  2025-12-07
gpt-oss-20b                1        221   0         23,790  400,899  424,689  $0.121935    2025-12-07
trinity-mini               0        222   0         14,747  154,095  168,842  $0.02377786  2025-12-07
olmo-3.1-32b-think         0        222   0         6,359   191,023  197,382  $0           2025-12-27
devstral-2512              0        222   0         512     43       555      $0           2025-12-27
glm-4.5-air                0        222   0         0       0        0        $0           2025-12-27
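The avgScore column appears to be the fraction of riddles solved, success / (success + fail), rounded to two decimals; every row in the table is consistent with that reading. A minimal sketch, assuming that is the definition (the function name avgScore is hypothetical):

```typescript
// Assumed definition: share of riddles solved, rounded to two decimals.
function avgScore(success: number, fail: number): number {
  const total = success + fail;
  if (total === 0) return 0; // avoid dividing by zero when nothing was run
  return Math.round((success / total) * 100) / 100;
}

console.log(avgScore(40, 182)); // gemini-2.5-flash: 40 of 222 riddles
console.log(avgScore(1, 221)); // gpt-oss-20b: 1 of 222 riddles
```

Under this reading, a single correct answer out of 222 rounds down to 0, which is why gpt-oss-20b shows an avgScore of 0 despite one success.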
[Charts: Success Rates, Fail Rates, Cost, and Total Tokens per model, for the Amharic and English riddle results]
Eval Summary
Total Models: 13
Total Cost: $0.761
Total Duration: 51,437,033 ms
Total Input Tokens: 186,017
Total Output Tokens: 757,942
Total Tokens: 1,167,368
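The summary figures can be reproduced by summing the columns of the results table. The sketch below uses the table values as data; the ModelRow type and the sum helper are hypothetical names introduced for illustration. Note that Total Tokens is the sum of the TotTok column rather than input plus output: for gemini-2.5-flash and gemini-3-flash the reported total exceeds InTok + OutTok, possibly because it includes tokens counted in neither column (an assumption, the source does not say).

```typescript
// One row per model, values copied from the results table.
interface ModelRow {
  model: string;
  inTok: number;
  outTok: number;
  totTok: number;
  costUsd: number;
}

const rows: ModelRow[] = [
  { model: "gemini-2.5-flash", inTok: 9617, outTok: 739, totTok: 177250, costUsd: 0.4219676 },
  { model: "gemini-3-flash", inTok: 4575, outTok: 410, totTok: 61500, costUsd: 0.1730625 },
  { model: "gemini-2.0-flash-lite", inTok: 9653, outTok: 880, totTok: 10533, costUsd: 0.00098797 },
  { model: "gemini-2.5-flash-lite", inTok: 9653, outTok: 628, totTok: 10281, costUsd: 0.0012165 },
  { model: "deepseek-v3.2", inTok: 16664, outTok: 1476, totTok: 18140, costUsd: 0.00528584 },
  { model: "grok-4-fast-non-reasoning", inTok: 54220, outTok: 2232, totTok: 56452, costUsd: 0.0057224 },
  { model: "llama-3.1-8b", inTok: 21588, outTok: 3449, totTok: 25037, costUsd: 0.00550814 },
  { model: "nova-lite", inTok: 14639, outTok: 2068, totTok: 16707, costUsd: 0.00137466 },
  { model: "gpt-oss-20b", inTok: 23790, outTok: 400899, totTok: 424689, costUsd: 0.121935 },
  { model: "trinity-mini", inTok: 14747, outTok: 154095, totTok: 168842, costUsd: 0.02377786 },
  { model: "olmo-3.1-32b-think", inTok: 6359, outTok: 191023, totTok: 197382, costUsd: 0 },
  { model: "devstral-2512", inTok: 512, outTok: 43, totTok: 555, costUsd: 0 },
  { model: "glm-4.5-air", inTok: 0, outTok: 0, totTok: 0, costUsd: 0 },
];

// Add up a list of numbers.
const sum = (values: number[]): number =>
  values.reduce((acc, value) => acc + value, 0);

const summary = {
  totalModels: rows.length,
  totalInputTokens: sum(rows.map((r) => r.inTok)),
  totalOutputTokens: sum(rows.map((r) => r.outTok)),
  totalTokens: sum(rows.map((r) => r.totTok)),
  // Cost rounded to three decimals, as displayed in the summary.
  totalCostUsd: sum(rows.map((r) => r.costUsd)).toFixed(3),
};

console.log(summary);
```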