Computer training itself

Author: Salena Fitzgerald

Deep learning training could soon become faster and more reliable thanks to two new optimization methods developed by Johns Hopkins researchers. The techniques solve common problems where models learn inconsistently or perform poorly on new data. Stochastic Polyak Step-sizes and Momentum (MomSPS) and Unified Sharpness-Aware Minimization (USAM) make the training process more predictable while producing more robust models.

The researchers presented their findings in April at the 2025 International Conference on Learning Representations.

“So much of deep learning today is about trial and error,” said first author Dimitrios Oikonomou, a graduate student in the Whiting School of Engineering’s Department of Computer Science. “You spend days or weeks tuning learning rates and momentum, just hoping something works. What if the algorithm could figure that out for you?”

The team’s first paper provides a solution through Polyak step sizes, a smarter way to automatically set the size of the steps an algorithm takes during learning. Instead of requiring manual tuning of the learning rate and momentum (which helps the model “remember” past updates), MomSPS uses adaptive rules that adjust these critical settings automatically as training progresses. Getting these values wrong can completely derail the process, but because the updates adjust themselves, training becomes more reliable and predictable.
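To make the idea concrete, here is a minimal sketch of a Polyak-type step size combined with heavy-ball momentum on a toy least-squares problem. This is an illustration of the general principle, not the paper’s exact MomSPS rule: the `(1 - beta)` damping factor, the momentum value, and the toy problem are all assumptions chosen for clarity.

```python
import numpy as np

# Toy interpolation problem: X @ w_true = y holds exactly, so the
# optimal loss f* is 0 and the Polyak step size can be computed.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
w_true = rng.standard_normal(10)
y = X @ w_true

def loss_and_grad(w):
    r = X @ w - y
    return 0.5 * np.mean(r ** 2), X.T @ r / len(y)

w = np.zeros(10)
velocity = np.zeros(10)
beta = 0.9  # momentum coefficient (illustrative value)
for _ in range(1000):
    f, g = loss_and_grad(w)
    # Polyak step size: (f(w) - f*) / ||grad||^2, with f* = 0 here;
    # the (1 - beta) damping keeps the momentum update stable.
    step = (1 - beta) * f / (g @ g + 1e-12)
    velocity = beta * velocity + step * g
    w = w - velocity

print(loss_and_grad(w)[0] < 1e-6)
```

The key point is that no learning rate is hand-picked: the step size is recomputed at every iteration from the current loss and gradient, which is what makes this family of methods tuning-free.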

“These algorithms are adaptive and update step size as they progress—that is, they do not require tuning. With our work, we’ve shown that these types of algorithms can reach stable solutions reliably even when momentum is used in their update rules,” said senior author Nicolas Loizou, an assistant professor in the Department of Applied Mathematics and Statistics.

The team says that this means everything from simple spam filters to massive image-recognition systems can now be trained faster and more reliably, with far less trial-and-error during the tuning phase.

The team’s second paper addresses another big issue in the training of deep neural networks: getting models to work well in the real world, and not just on their training data.

“USAM helps the model become more stable by guiding it toward solutions that aren’t thrown off by small changes in the data,” said Oikonomou. “This makes the model’s predictions more consistent and reliable, especially when it sees new or slightly different examples.”

The team’s latest version of sharpness-aware minimization combines two earlier variants of the method and introduces a new way to balance how much each part of the model learns during training. They show that what once seemed like two separate approaches are actually part of the same continuum. The researchers also strengthened the underlying theory, removing some unrealistic assumptions and providing stronger guarantees that the algorithm works.
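The basic sharpness-aware update that both earlier variants share can be sketched as follows: first take a small ascent step toward a nearby “sharp” point, then descend using the gradient evaluated there. The `normalized` flag below switches between the normalized perturbation (classic SAM) and the unnormalized one; the toy quadratic loss and all constants are illustrative assumptions, and this is not the team’s unified algorithm.

```python
import numpy as np

# Toy quadratic loss f(w) = 0.5 * w^T A w, with gradient A @ w.
A = np.diag([1.0, 10.0])
grad = lambda w: A @ w

def sam_step(w, rho=0.05, lr=0.05, normalized=True):
    g = grad(w)
    # Ascent step to a nearby "sharp" point: the normalized variant
    # scales the perturbation to length rho, the unnormalized one
    # uses the raw gradient scaled by rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12) if normalized else rho * g
    # Descend using the gradient evaluated at the perturbed weights,
    # which biases training toward flatter regions of the loss.
    return w - lr * grad(w + eps)

w = np.array([1.0, 1.0])
for _ in range(300):
    w = sam_step(w)
print(np.linalg.norm(w) < 0.1)
```

Because the descent direction is computed at the perturbed point rather than at the current weights, solutions where the loss rises steeply nearby are penalized, which is the mechanism behind the improved robustness described above.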

“Deep neural networks can be used in many ways, from diagnosing cancer, to recommending news articles, to making hiring decisions. If their outcomes fail, the consequences can be real,” Loizou said. “Improving the efficiency and robustness of the training process isn’t just good for science—it’s good for society. And the fact that our proposed training algorithms are open-source means anyone—from startups to high school students—can build on them.”

The team says that both papers pave the way for smarter training methods: MomSPS eliminates the prohibitively expensive hyperparameter-tuning phase, while USAM focuses on finding models that generalize better.

“Smarter training beats brute-force effort. With tools like MomSPS and USAM, we’re not just getting to the top—we’re getting there with fewer missteps, more understanding, and a clearer view of what comes next,” said Oikonomou.