THE HIDDEN TAX ON NIGERIAN AI

7 tokenizers. 5 languages. 3,500 data points.
Yorùbá costs up to 3.5× more to process than English.
Same meaning. Same API. Different price.

3.5× · WORST COST MULTIPLIER · Yorùbá on Mistral
17:1 · WORST TOKEN RATIO · "telephone" in Yorùbá
₦11,824 · HIGHEST COST / 1M · Yorùbá via Mistral
7 · MODELS TESTED · OpenAI, Meta, Google, Mistral

What Nigerian developers actually pay

GPT-4o · $2.50/1M input tokens · ₦1,350/$1

English ₦3,375
Pidgin ₦3,613
Hausa ₦4,892 1.45×
Igbo ₦5,201 1.54×
Yorùbá ₦8,084 2.40×
Dashed line = English baseline ₦3,375 (1M tokens × $2.50 × ₦1,350). Everything to the right is the language tax.
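
The arithmetic behind these figures can be sketched in a few lines. This is a minimal illustration, assuming each language's cost is the English cost scaled by its token ratio. The function name `naira_cost_per_million` is hypothetical and not part of the NaijaML toolkit; the price and exchange rate are the ones quoted above.

```python
USD_PER_MILLION_TOKENS = 2.50   # GPT-4o input price quoted above
NAIRA_PER_USD = 1350            # exchange rate quoted above


def naira_cost_per_million(token_ratio: float) -> float:
    """Cost in naira of text that would be 1M tokens in English.

    token_ratio is tokens(language) / tokens(English) for matched text
    (an assumed definition for this sketch).
    """
    return token_ratio * USD_PER_MILLION_TOKENS * NAIRA_PER_USD


english = naira_cost_per_million(1.0)          # baseline
yoruba = naira_cost_per_million(8084 / 3375)   # ratio implied by the chart
language_tax = yoruba - english                # extra naira for the same meaning
```

Plugging in the rounded 2.40× multiplier instead gives about ₦8,100, slightly above the charted ₦8,084, because the displayed multiplier is rounded.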

How tokenizers shred Yorùbá

ENGLISH INPUT
telephone = 1 token
17×
YORÙBÁ → MISTRAL v0.3 → 17 FRAGMENTS
<0xE1> <0xBA> <0xB9> ̀ r ì b án is ̀ r ̀
"ẹ̀rọ ìbánisọ̀rọ̀": the tokenizer has no token for ẹ, so that one letter alone falls back to three raw UTF-8 bytes.
TOP 5 WORST CASES
17× · ẹ̀rọ ìbánisọ̀rọ̀ · eng: "telephone" (1 tok) → 17 tokens · Mistral
15× · pápá ọkọ̀ òfuurufú · eng: "airport" (1 tok) → 15 tokens · GPT-4
13× · ilé ẹ̀kọ́ gíga · eng: "university" (1 tok) → 13 tokens · Mistral
11× · iná mọ̀nàmọ́ná · eng: "electricity" (1 tok) → 11 tokens · Mistral
10× · mmịrị ọzụzọ · eng: "rain" (1 tok) → 10 tokens · GPT-4
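
The <0xE1> <0xBA> <0xB9> fragments shown above are the three UTF-8 bytes of ẹ (code point U+1EB9). A tokenizer with byte fallback emits one token per byte when a character is missing from its vocabulary. The sketch below reproduces only that fallback step using Python's standard library. The function `byte_fallback` and the toy vocabulary are hypothetical stand-ins, not Mistral's actual tokenizer or vocabulary.

```python
def byte_fallback(text: str, vocab: set[str]) -> list[str]:
    """Emit each known character as-is and each unknown one as <0xNN> byte tokens.

    A toy model of byte fallback only: real tokenizers first try to match
    multi-character pieces, which this sketch does not attempt.
    """
    tokens = []
    for ch in text:
        if ch in vocab:
            tokens.append(ch)
        else:
            tokens.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return tokens


# Hypothetical vocabulary: plain ASCII letters plus the combining grave
# accent (U+0300), but no entry for U+1EB9 (e with dot below).
toy_vocab = set("abcdefghijklmnopqrstuvwxyz") | {"\u0300"}

# U+1EB9 + U+0300 + "r": the first three code points of the phrase above.
fragments = byte_fallback("\u1eb9\u0300r", toy_vocab)
# fragments == ['<0xE1>', '<0xBA>', '<0xB9>', '\u0300', 'r']
```

Three code points become five tokens here, and the tone mark is split from the vowel it belongs to, which is the pattern visible in the fragment list.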

Token ratio vs English — all models

1.0× = same as English. Higher = more expensive to process.

Language | GPT-4 | GPT-4o | Llama 3 | Gemma 2 | Mistral | BERT-ML | XLM-R
Yorùbá | 3.4× | 2.4× | 3.1× | 2.9× | 3.5× | 2.4× | 2.8×
Igbo | 2.4× | 1.5× | 2.3× | 2.3× | 2.5× | 2.2× | 2.3×
Hausa | 1.8× | 1.5× | 1.8× | 1.6× | 1.9× | 1.5× | 1.3×
Pidgin | 1.1× | 1.1× | 1.1× | 1.1× | 1.1× | 1.1× | 1.0×
Legend: ~1× fair · ~2× penalty · 3×+ severe
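
One way a ratio like those in the table could be computed is to tokenize matched phrases in both languages and divide the totals. The sketch below assumes that aggregate definition (the benchmark may instead average per-entry ratios) and accepts any `encode` callable, so it is not tied to a particular tokenizer library. `toy_encode` and `TOY_VOCAB` are hypothetical stand-ins used only to make the example runnable; they are not one of the seven tokenizers tested.

```python
from typing import Callable, Sequence


def token_ratio(
    encode: Callable[[str], Sequence],
    english: Sequence[str],
    target: Sequence[str],
) -> float:
    """Total target-language tokens divided by total English tokens.

    english[i] and target[i] must carry the same meaning.
    """
    if len(english) != len(target):
        raise ValueError("entries must be matched one-to-one")
    english_tokens = sum(len(encode(text)) for text in english)
    target_tokens = sum(len(encode(text)) for text in target)
    return target_tokens / english_tokens


# Hypothetical vocabulary in which only two English words are whole tokens.
TOY_VOCAB = {"rain", "water"}


def toy_encode(text: str) -> list[str]:
    """Stand-in tokenizer: known words are one token, others fall back to bytes."""
    tokens = []
    for word in text.split():
        if word in TOY_VOCAB:
            tokens.append(word)
        else:
            tokens.extend(f"<0x{b:02X}>" for b in word.encode("utf-8"))
    return tokens


english_entries = ["rain", "water"]
yoruba_entries = ["\u00f2j\u00f2", "omi"]   # "òjò" (rain), "omi" (water)

ratio = token_ratio(toy_encode, english_entries, yoruba_entries)
```

With this toy vocabulary the ratio comes out at 4.0 (8 byte tokens against 2 word tokens). That number is an artifact of the stand-in and not a benchmark result; swapping in a real tokenizer's encode function is what would produce figures comparable to the table.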
Two startups build the same chatbot.
One in London. One in Lagos.
Same API. Same product. Same users.
Lagos pays 2.4× more.
WHAT THIS MEANS
We can't fix GPT-4's tokenizer. But we can measure the problem, publish the data, and build tools that work for Nigerian languages from the ground up.

This benchmark is part of NaijaML, an open-source toolkit building ML infrastructure for Nigerian languages — diacritization, language detection, sentiment analysis, and more.
METHODOLOGY
100 entries/lang × 5 languages × 7 tokenizers = 3,500 data points
Languages: English, Yorùbá, Hausa, Igbo, Nigerian Pidgin
Categories: greetings, nouns, verbs, sentences, names, places
Matched meanings across all languages
Code + data: github.com/naijaml/naijaml