- AI models now perform strongly in obscure languages with minimal training data
- Cross-lingual transfer allows shared patterns to boost rare language performance
- Tokenizer efficiency improvements significantly impact multilingual processing cost and quality
Large language models (LLMs) are closing the global language gap at an unexpected pace, with frontier models now performing well in rare languages that previous generations struggled with.
According to RWS’s TrainAI Multilingual LLM Synthetic Data Generation Study, Google’s Gemini Pro achieved high-quality scores above 4.5 out of 5 in Kinyarwanda, a language spoken by about 12 million people in Rwanda, Uganda, and the DRC.
“This study signals a transformative moment that’s not about replacing human expertise, but about elevating it with the right technology,” said Vasagi Kothandapani, CEO of TrainAI by RWS.
How LLMs learn languages with limited training data
Unlike the Biblical “Tower of Babel,” where a sudden confusion of tongues halted construction, AI now appears to be dismantling linguistic barriers that once seemed insurmountable.
Tomáš Burkert, Head of Innovation at TrainAI, explained that AI tools often share statistical patterns across languages.
Frontier models do not need massive datasets for each language to produce reliable outputs because cross-lingual transfer allows shared knowledge to compensate for limited training data.
The RWS team also documented improvements in tokenizer efficiency, which determines how many tokens a model needs to represent text in a given language, and therefore the cost and speed of processing it.
These improvements compound with other model advancements into meaningful performance gains for rare and obscure languages.
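The effect of tokenizer efficiency can be made concrete with a minimal sketch. Everything below is illustrative, not from the study: the vocabularies, the sample word, and the greedy longest-match scheme are simplified stand-ins for how a real subword tokenizer behaves when its vocabulary does or does not cover a low-resource language.

```python
# Hedged sketch: how tokenizer vocabulary coverage affects token counts,
# and therefore per-request cost, for a low-resource language.
# The vocabularies and sample text below are illustrative, not from the study.

def tokenize(text: str, vocab: set) -> list:
    """Greedy longest-match tokenization against a fixed vocabulary.
    Unknown characters fall back to single-character tokens."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate piece starting at position i.
        for length in range(min(8, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens

# A vocabulary with no Kinyarwanda-specific merges: every character is its own token.
char_level = set()

# A vocabulary that learned common Kinyarwanda subwords (illustrative entries).
subword = {"mura", "ho", "ik", "inya", "rwanda"}

sample = "ikinyarwanda"
baseline = tokenize(sample, char_level)   # one token per character
improved = tokenize(sample, subword)      # fewer, larger tokens

ratio = len(baseline) / len(improved)
print(len(baseline), len(improved), round(ratio, 2))  # prints: 12 3 4.0
```

A 4x difference in token count means roughly 4x the per-token API cost and 4x less text fitting in a context window, which is why tokenizer improvements compound so strongly for languages that older vocabularies covered poorly.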
Burkert’s team identified “benchmark drift,” where LLM capabilities can unexpectedly shift from one version to the next.
For example, the latest version of GPT fell behind smaller models on several content generation tasks, even though its predecessor had been competitive on those same tasks.
Tokenizer efficiency also varied widely between model generations, with one model proving 3.5 times more cost-effective than another in certain languages.
This means enterprises cannot rely on past performance when choosing which model to deploy for multilingual applications.
Until recently, AI labs prioritized performance in English and a handful of major languages. Now that models have improved in those areas, some labs are starting to prioritize global audiences, and experts expect more to follow.
Successful enterprise AI strategies require continuous validation built on high-quality, culturally nuanced data rather than public leaderboards.
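The continuous validation described above can be sketched as a simple regression check: re-score each new model version on the same task set and flag languages where quality drops. The model scores, language names, and threshold below are illustrative placeholders, not figures from the study.

```python
# Hedged sketch of continuous validation against benchmark drift:
# compare per-language quality scores across two model versions
# and flag any regression larger than a chosen threshold.
# All scores and names below are illustrative placeholders.

DRIFT_THRESHOLD = 0.3  # flag drops larger than this (on a 5-point scale)

def find_regressions(previous: dict, current: dict, threshold: float = DRIFT_THRESHOLD) -> list:
    """Return (language, old_score, new_score) for each flagged drop."""
    flagged = []
    for lang, old_score in previous.items():
        new_score = current.get(lang)
        if new_score is not None and old_score - new_score > threshold:
            flagged.append((lang, old_score, new_score))
    return flagged

# Illustrative scores on a 1-5 quality scale for two versions of one model.
v1_scores = {"Kinyarwanda": 4.5, "Swahili": 4.2, "Czech": 4.7}
v2_scores = {"Kinyarwanda": 4.6, "Swahili": 3.6, "Czech": 4.7}

for lang, old, new in find_regressions(v1_scores, v2_scores):
    print(f"Drift in {lang}: {old} -> {new}")  # prints: Drift in Swahili: 4.2 -> 3.6
```

Running a check like this on every model upgrade, against a held-out, culturally nuanced test set rather than a public leaderboard, is what turns "cannot rely on past performance" into an operational practice.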
That said, a score of 4.5 out of 5 on a synthetic benchmark does not guarantee real-world fluency. According to Burkert, AI labs are turning to multilingual data partly because they have likely exhausted high-quality English sources, not because such data is yet a primary focus.
Still, by dismantling language barriers, AI proves itself as a true “King of Babel” — not one who built a tower, but one who tore down the walls that divided human speech.
The crown does not yet fit perfectly, but the direction of travel is clear.