| Model | Success | Fail | Avg score | Input tokens | Output tokens | Total tokens | Cost | Date |
|---|---:|---:|---:|---:|---:|---:|---:|---|
| gemini-2.5-flash | 40 | 182 | 0.18 | 9,617 | 739 | 177,250 | $0.4219676 | 2025-12-07 |
| gemini-3-flash | 29 | 193 | 0.13 | 4,575 | 410 | 61,500 | $0.1730625 | 2025-12-17 |
| gemini-2.0-flash-lite | 16 | 206 | 0.07 | 9,653 | 880 | 10,533 | $0.00098797 | 2025-12-07 |
| gemini-2.5-flash-lite | 14 | 208 | 0.06 | 9,653 | 628 | 10,281 | $0.0012165 | 2025-12-07 |
| deepseek-v3.2 | 6 | 216 | 0.03 | 16,664 | 1,476 | 18,140 | $0.00528584 | 2025-12-07 |
| grok-4-fast-non-reasoning | 3 | 219 | 0.01 | 54,220 | 2,232 | 56,452 | $0.0057224 | 2025-12-07 |
| llama-3.1-8b | 2 | 220 | 0.01 | 21,588 | 3,449 | 25,037 | $0.00550814 | 2025-12-07 |
| nova-lite | 2 | 220 | 0.01 | 14,639 | 2,068 | 16,707 | $0.00137466 | 2025-12-07 |
| gpt-oss-20b | 1 | 221 | 0.00 | 23,790 | 400,899 | 424,689 | $0.121935 | 2025-12-07 |
| trinity-mini | 0 | 222 | 0.00 | 14,747 | 154,095 | 168,842 | $0.02377786 | 2025-12-07 |
| olmo-3.1-32b-think | 0 | 222 | 0.00 | 6,359 | 191,023 | 197,382 | $0 | 2025-12-27 |
| devstral-2512 | 0 | 222 | 0.00 | 512 | 43 | 555 | $0 | 2025-12-27 |
| glm-4.5-air | 0 | 222 | 0.00 | 0 | 0 | 0 | $0 | 2025-12-27 |
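
The table does not define the average-score column, but every value is consistent with success / (success + fail) rounded to two decimals. The sketch below recomputes it for a few rows under that assumption; `ROWS` and `success_rate` are illustrative names, not part of the benchmark tooling.

```python
# Rows copied from the table above: (model, success, fail, reported average score).
ROWS = [
    ("gemini-2.5-flash", 40, 182, 0.18),
    ("gemini-3-flash", 29, 193, 0.13),
    ("gemini-2.0-flash-lite", 16, 206, 0.07),
    ("deepseek-v3.2", 6, 216, 0.03),
    ("gpt-oss-20b", 1, 221, 0.0),
]


def success_rate(success: int, fail: int) -> float:
    """Fraction of tasks that succeeded; 0.0 when there were no tasks."""
    total = success + fail
    return success / total if total else 0.0


# Assumption: the reported score is the success rate rounded to two decimals.
for model, success, fail, reported in ROWS:
    rate = round(success_rate(success, fail), 2)
    print(f"{model}: {rate:.2f} (table: {reported:.2f})")
```

If the score were computed some other way (for example, with partial credit per task), the recomputed and reported values would diverge for at least some rows.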