Author: Jaimie Patterson

The large language models that power today’s AI applications can now translate between languages with high accuracy—but only if they have access to a translation model for each specific language pair, such as English-French or Spanish-Russian. This leads to huge, bloated models requiring computing power that the average person simply doesn’t have, especially if they want to translate into or out of lower-resource languages.

Aiming to help democratize machine translation, Johns Hopkins computer scientists have built a new, more efficient LLM capable of translating to and from 50 diverse languages while still maintaining top-tier performance. Joined by collaborators at Microsoft Research, the researchers presented their work as a spotlighted poster—a distinction offered to only the top 5% of submissions—at the 13th International Conference on Learning Representations held last month in Singapore.

“Many researchers are claiming that their models can support multiple languages, but in reality, these models only have satisfactory performance in big languages like English and lack practical performance in smaller languages like Icelandic,” explains lead author Haoran Xu, Engr ’24 (PhD), now a senior researcher at Microsoft Generation AI.

That’s partly due to the “curse of multilinguality”—the tendency for translation quality to drop as more languages are added to a model—which pushes researchers to prioritize high-resource languages like English.

To level the playing field, Xu and his team—including his then-advisors Philipp Koehn, a professor of computer science, and Kenton Murray, a research scientist at the Human Language Technology Center of Excellence—used a “plug-and-play” language-specific module architecture that prevents language conflicts during training, ensuring that strong performance in one group of languages doesn’t come at the expense of another. This architecture also allows the LLM to load only the translation module it needs for a specific translation, vastly reducing the computational power it requires to perform translations.
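One way to picture this design (a minimal sketch, assuming a shared backbone plus small per-language-group modules; all names, shapes, and groupings below are illustrative rather than the team’s actual implementation) is to store each language module separately and load only the one a given request needs:

```python
# Illustrative sketch only: module names, sizes, and language groupings are
# assumptions, not the published architecture.
import torch
import torch.nn as nn


class SharedBackbone(nn.Module):
    """Stand-in for the shared multilingual base model."""

    def __init__(self, hidden_size: int = 512):
        super().__init__()
        self.layer = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.layer(x))


class LanguageModule(nn.Module):
    """Small language-group-specific module trained separately from the others."""

    def __init__(self, hidden_size: int = 512):
        super().__init__()
        self.adapter = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.adapter(x)  # residual adapter on top of the backbone


class PlugAndPlayTranslator:
    """Keeps one backbone and loads only the language module a request needs."""

    def __init__(self, lang_groups):
        self.backbone = SharedBackbone()
        # In practice these weights would sit on disk; here they are stored as
        # state dicts and materialized only when a translation is requested.
        self.registry = {g: LanguageModule().state_dict() for g in lang_groups}

    def forward_hidden(self, x: torch.Tensor, lang_group: str) -> torch.Tensor:
        module = LanguageModule()
        module.load_state_dict(self.registry[lang_group])  # "plug in" one module
        return module(self.backbone(x))


translator = PlugAndPlayTranslator(["romance", "germanic", "nordic"])
hidden_states = torch.randn(1, 16, 512)
output = translator.forward_hidden(hidden_states, "nordic")
```

Because only one small module is active per translation, memory and compute stay roughly constant no matter how many languages the system supports.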

They started with Meta’s Llama model and, using a carefully designed training regimen, gradually added translation data and data in other languages to make the model multilingual. In the final training stage, they used a new type of reinforcement learning technique called “adaptive-rejection preference optimization,” where the model is given multiple translations of the same sentence and told which one is best so it can learn to generate strong translations.
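As a rough illustration of the preference-training step (a hedged sketch: the pairwise objective below is a generic preference loss, not the paper’s exact adaptive-rejection formulation), the model is nudged to assign higher probability to the translation it was told is best:

```python
# Generic pairwise preference loss (illustrative; not the paper's exact objective).
import torch
import torch.nn.functional as F


def preference_loss(logp_chosen: torch.Tensor,
                    logp_rejected: torch.Tensor,
                    beta: float = 0.1) -> torch.Tensor:
    """Push the model to score the preferred translation above a non-preferred one.

    logp_chosen / logp_rejected: model log-probabilities of the preferred and
    non-preferred translations of the same source sentence, shape [batch].
    """
    return -F.logsigmoid(beta * (logp_chosen - logp_rejected)).mean()


# Toy usage: log-probabilities the model assigned to two candidate translations.
logp_best = torch.tensor([-12.3, -8.1])
logp_other = torch.tensor([-14.0, -9.5])
loss = preference_loss(logp_best, logp_other)  # shrinks as the preferred one pulls ahead
```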

“Our new method builds off of our prior work with many of the same efficiency benefits, but with the added ability to handle multiple languages at once,” Xu explains. That prior work, “Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation,” is available on arXiv.

The key is not penalizing translations that aren’t the best for a particular task—just knowing when to choose them.

“When there are multiple good translations, we don’t want to penalize the ones we’re not looking at right now,” Xu says. “For instance, telling a model to do better at an English-to-Spanish translation can make it think it shouldn’t care about English-to-German. We don’t want the model to over-reject and penalize otherwise good translations—which happens all over AI research, really.”

The researchers’ new learning method helps prevent models from hyper-focusing on one task at the expense of others. By limiting unnecessary penalties on stylistic variations in translations, it makes the model less likely to reject perfectly good translations just because they are not the absolute best.
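A hedged sketch of how that “adaptive rejection” idea could look in code (the quality scores and threshold rule here are assumptions for illustration, not the published method): an alternative translation is penalized only when an external quality score marks it as clearly worse than the preferred one.

```python
# Sketch of "adaptive rejection": only penalize alternatives that a quality
# metric marks as clearly worse; other good translations are left alone.
# The threshold rule here is an assumption for illustration only.
import torch
import torch.nn.functional as F


def adaptive_rejection_loss(logp_chosen: torch.Tensor,
                            logp_alternative: torch.Tensor,
                            quality_gap: torch.Tensor,
                            gap_threshold: float = 0.1,
                            beta: float = 0.1) -> torch.Tensor:
    """quality_gap: quality score of the chosen translation minus the alternative's
    (e.g., from an automatic translation-quality metric). Alternatives within
    `gap_threshold` of the chosen one count as acceptable and are not penalized.
    """
    clearly_worse = (quality_gap > gap_threshold).float()  # 1 = reject, 0 = keep
    per_pair = -F.logsigmoid(beta * (logp_chosen - logp_alternative))
    return (clearly_worse * per_pair).sum() / clearly_worse.sum().clamp(min=1.0)
```

Masking near-equivalent alternatives out of the penalty is one simple way to avoid the over-rejection described above.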

The team tested its new language model against open-source, state-of-the-art multilingual translation models and found that it outperformed them in all 50 supported languages.

“This means that we can build tools that meet people in their native languages and make the internet more accessible to more people,” Xu says.

He adds that the team’s new training method can be applied to other kinds of AI tasks.

“It doesn’t just have to be 50 languages,” he says. “Any AI model that can learn to ‘cheat’ to get good at one task by rejecting a ton of good options for other tasks could be improved with our method.”

Additional authors of this work include Microsoft researchers Hieu Hoang, Akiko Eriguchi, and Huda Khayrallah.