Enkokilish Bench

December 17, 2025 | 21:44:13

Enkokilish (እንቆቅልሽ) are Ethiopian/Amharic riddles. They are difficult, require an in-depth understanding of the language, and are often used to test a person's knowledge and reasoning skills. Enkokilish Bench builds on them to evaluate how well Large Language Models (LLMs) understand, reason about, and solve Amharic riddles.

This benchmark is built with Evalite as the evals framework and AI-SDK, which calls a variety of models through Vercel AI Gateway. It is completely free and open source, from the dataset to the eval code to this visualization site. To get set up quickly, clone the repo, set your AI Gateway API key in the .env file, run pnpm eval:dev, then open localhost:3006 and explore. You can also run the evals in node mode (by running node main.ts), which lets you export the results in JSON format or run the evals in a CI/CD pipeline.

System Prompt

You are a riddle solver. Try your best to solve every riddle you are asked. Respond with one word or phrase only!
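
The system prompt asks for a single word or phrase, which suggests answers can be compared after light normalization. The following is a minimal sketch under that assumption, not the benchmark's actual scorer: `buildMessages`, `normalizeAnswer`, and `scoreAnswer` are hypothetical names, and the sample answers are made up for illustration rather than taken from the dataset.

```typescript
// Hypothetical sketch: the real eval code may build prompts and score differently.
const SYSTEM_PROMPT =
  "You are a riddle solver. Try your best to solve every riddle you are asked. Respond with one word or phrase only!";

type ChatMessage = { role: "system" | "user"; content: string };

// Pair the system prompt with a single riddle as the user message.
function buildMessages(riddle: string): ChatMessage[] {
  return [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: riddle.trim() },
  ];
}

// Lowercase, strip common ASCII punctuation and Ethiopic punctuation
// (U+1362 to U+1367, e.g. the full stop "።"), and collapse whitespace.
function normalizeAnswer(text: string): string {
  return text
    .normalize("NFC")
    .toLowerCase()
    .replace(/[.,!?;:"'\u1362-\u1367]/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Assumed binary scoring: 1 for an exact match after normalization, else 0.
function scoreAnswer(modelOutput: string, expected: string): 0 | 1 {
  return normalizeAnswer(modelOutput) === normalizeAnswer(expected) ? 1 : 0;
}
```

Exact matching is strict: a correct synonym or an alternative spelling would score 0, so a real scorer might accept a list of valid answers instead.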

Amharic Riddle Results

English Riddle Results

Model	Success	Fail	Avg Score	Input Tokens	Output Tokens	Total Tokens	Cost (USD)	Date
gemini-2.5-flash	40	182	0.18	9,617	739	177,250	$0.4219676	2025-12-07
gemini-3-flash	29	193	0.13	4,575	410	61,500	$0.1730625	2025-12-17
gemini-2.0-flash-lite	16	206	0.07	9,653	880	10,533	$0.00098797	2025-12-07
gemini-2.5-flash-lite	14	208	0.06	9,653	628	10,281	$0.0012165	2025-12-07
deepseek-v3.2	6	216	0.03	16,664	1,476	18,140	$0.00528584	2025-12-07
grok-4-fast-non-reasoning	3	219	0.01	54,220	2,232	56,452	$0.0057224	2025-12-07
llama-3.1-8b	2	220	0.01	21,588	3,449	25,037	$0.00550814	2025-12-07
nova-lite	2	220	0.01	14,639	2,068	16,707	$0.00137466	2025-12-07
gpt-oss-20b	1	221	0	23,790	400,899	424,689	$0.121935	2025-12-07
trinity-mini	0	222	0	14,747	154,095	168,842	$0.02377786	2025-12-07
olmo-3.1-32b-think	0	222	0	6,359	191,023	197,382	$0	2025-12-27
devstral-2512	0	222	0	512	43	555	$0	2025-12-27
glm-4.5-air	0	222	0	0	0	0	$0	2025-12-27
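
In every row of the table, success and fail add up to 222, and avgScore matches success divided by that total, rounded to two decimals. A small sketch of that arithmetic, assuming each riddle is scored 0 or 1 (the `ModelRow` type and `avgScore` function are illustrative names, not taken from the eval code):

```typescript
// Illustrative only: recomputes the avgScore column from success/fail counts.
type ModelRow = { model: string; success: number; fail: number };

function avgScore(row: ModelRow): number {
  const total = row.success + row.fail;
  if (total === 0) return 0; // avoid dividing by zero for an empty run
  // Round to two decimals, as the table appears to do.
  return Math.round((row.success / total) * 100) / 100;
}

const sample: ModelRow[] = [
  { model: "gemini-2.5-flash", success: 40, fail: 182 },
  { model: "gemini-3-flash", success: 29, fail: 193 },
  { model: "gpt-oss-20b", success: 1, fail: 221 },
];

for (const row of sample) {
  console.log(row.model, avgScore(row));
}
```

This also explains why gpt-oss-20b shows a score of 0 despite one success: 1/222 is roughly 0.0045, which rounds down at two decimals.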

Charts: Success Rates, Fail Rates, Cost, Total Tokens

Eval Summary

Total Models: 13

Total Cost: $0.761

Total Duration: 51,437,033 ms (about 14.3 hours)

Total Input Tokens: 186,017

Total Output Tokens: 757,942

Total Tokens: 1,167,368
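
The summary figures can be reproduced from the per-model rows. The sketch below copies the token and cost columns from the results table and sums them. It also surfaces one detail: Total Tokens exceeds input plus output by 223,409, and the whole gap comes from the two rows (gemini-2.5-flash and gemini-3-flash) whose total is larger than their input and output combined. One plausible explanation is that those totals include reasoning tokens, but that is an assumption, not something the page states. The `Row` type and `summarize` function are illustrative names.

```typescript
// [model, inputTokens, outputTokens, totalTokens, costUsd], copied from the table.
type Row = [string, number, number, number, number];

const rows: Row[] = [
  ["gemini-2.5-flash", 9617, 739, 177250, 0.4219676],
  ["gemini-3-flash", 4575, 410, 61500, 0.1730625],
  ["gemini-2.0-flash-lite", 9653, 880, 10533, 0.00098797],
  ["gemini-2.5-flash-lite", 9653, 628, 10281, 0.0012165],
  ["deepseek-v3.2", 16664, 1476, 18140, 0.00528584],
  ["grok-4-fast-non-reasoning", 54220, 2232, 56452, 0.0057224],
  ["llama-3.1-8b", 21588, 3449, 25037, 0.00550814],
  ["nova-lite", 14639, 2068, 16707, 0.00137466],
  ["gpt-oss-20b", 23790, 400899, 424689, 0.121935],
  ["trinity-mini", 14747, 154095, 168842, 0.02377786],
  ["olmo-3.1-32b-think", 6359, 191023, 197382, 0],
  ["devstral-2512", 512, 43, 555, 0],
  ["glm-4.5-air", 0, 0, 0, 0],
];

function summarize(data: Row[]) {
  let input = 0;
  let output = 0;
  let total = 0;
  let cost = 0;
  const gapModels: string[] = []; // rows where total != input + output
  for (const row of data) {
    input += row[1];
    output += row[2];
    total += row[3];
    cost += row[4];
    if (row[3] !== row[1] + row[2]) gapModels.push(row[0]);
  }
  return { models: data.length, input, output, total, cost, gap: total - (input + output), gapModels };
}

const summary = summarize(rows);
// Convert the reported duration from milliseconds to hours.
const durationHours = Math.round((51437033 / 3600000) * 10) / 10;

console.log(summary, "cost: $" + summary.cost.toFixed(3), durationHours + " h");
```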