{"id":53123,"date":"2025-07-08T10:04:18","date_gmt":"2025-07-08T14:04:18","guid":{"rendered":"https:\/\/engineering.jhu.edu\/ams\/?post_type=news&#038;p=53123"},"modified":"2025-09-17T13:31:01","modified_gmt":"2025-09-17T17:31:01","slug":"smarter-training-for-smarter-ai","status":"publish","type":"news","link":"https:\/\/engineering.jhu.edu\/ams\/news\/smarter-training-for-smarter-ai\/","title":{"rendered":"Smarter training for smarter AI"},"content":{"rendered":"<p style=\"font-weight: 400;\">Deep learning training could soon become faster and more reliable thanks to two new optimization methods developed by Johns Hopkins researchers. The techniques solve common problems where models learn inconsistently or perform poorly on new data. <a href=\"https:\/\/arxiv.org\/abs\/2406.04142\"><span>Stochastic Polyak Step-sizes and Momentum<\/span><\/a> (MomSPS) and <a href=\"https:\/\/arxiv.org\/abs\/2503.02225\"><span>Unified Sharpness-Aware Minimization<\/span><\/a> (USAM) make the training process more predictable while producing more robust models.<\/p>\n<p style=\"font-weight: 400;\">The researchers presented their findings in April at the 2025 <a href=\"https:\/\/openreview.net\/forum?id=8rvqpiTTFv\">International Conference on Learning Representations<\/a>.<\/p>\n<p style=\"font-weight: 400;\">\u201cSo much of deep learning today is about trial and error,\u201d said first author Dimitrios Oikonomou, a graduate student in the Whiting School of Engineering\u2019s <a href=\"https:\/\/www.cs.jhu.edu\/\">Department of Computer Science<\/a>. \u201cYou spend days or weeks tuning learning rates and momentum, just hoping something works. What if the algorithm could figure that out for you?\u201d<\/p>\n<p style=\"font-weight: 400;\">The team\u2019s first paper provides the solution through Polyak step sizes\u2014a smarter way to automatically set the size of the steps an algorithm takes during learning. 
Instead of manually tuning the learning rate and momentum (which helps the model \u201cremember\u201d past updates), MomSPS uses adaptive rules that adjust these critical settings automatically as training progresses. Getting these values wrong can derail training entirely; because MomSPS adjusts them on its own, training becomes more reliable and predictable.<\/p>\n<p style=\"font-weight: 400;\">\u201cThese algorithms are adaptive and update step size as they progress\u2014that is, they do not require tuning. With our work, we\u2019ve shown that these types of algorithms can reach stable solutions reliably even when momentum is used in their update rules,\u201d said senior author <a href=\"https:\/\/engineering.jhu.edu\/faculty\/nicolas-loizou\/\">Nicolas Loizou<\/a>, an assistant professor in the <a href=\"https:\/\/engineering.jhu.edu\/ams\/\">Department of Applied Mathematics and Statistics<\/a>.<\/p>\n<p style=\"font-weight: 400;\">The team says that this means everything from simple spam filters to massive image-recognition systems can now be trained faster and more reliably, with far less trial and error during the tuning phase.<\/p>\n<p style=\"font-weight: 400;\">The team\u2019s second paper addresses another big issue in the training of deep neural networks: getting models to work well in the real world, and not just on their training data.<\/p>\n<p style=\"font-weight: 400;\">\u201cUSAM helps the model become more stable by guiding it toward solutions that aren\u2019t thrown off by small changes in the data,\u201d said Oikonomou. \u201cThis makes the model\u2019s predictions more consistent and reliable, especially when it sees new or slightly different examples.\u201d<\/p>\n<p style=\"font-weight: 400;\">The team\u2019s latest version of SAM combines two earlier versions of the method and introduces a new way to balance how much each part of the model learns during training. 
They show that what once seemed like two separate approaches are actually part of the same continuum. The researchers also strengthened the underlying theory, removing some unrealistic assumptions and providing more reliable evidence that the algorithm works.<\/p>\n<p style=\"font-weight: 400;\">\u201cDeep neural networks can be used in many ways, from diagnosing cancer, to recommending news articles, to making hiring decisions. If their outcomes fail, the consequences can be real,\u201d Loizou said. \u201cImproving the efficiency and robustness of the training process isn\u2019t just good for science\u2014it\u2019s good for society. And the fact that our proposed training algorithms are open-source means anyone\u2014from startups to high school students\u2014can build on them.\u201d<\/p>\n<p style=\"font-weight: 400;\">The team says that both papers pave the way for smarter training methods. While MomSPS avoids the prohibitively expensive hyperparameter tuning phase, USAM focuses on finding a model that generalizes better, they say.<\/p>\n<p style=\"font-weight: 400;\">\u201cSmarter training beats brute-force effort. 
With tools like MomSPS and USAM, we\u2019re not just getting to the top\u2014we\u2019re getting there with fewer missteps, more understanding, and a clearer view of what comes next,\u201d said Oikonomou.<\/p>\n","protected":false},"template":"","class_list":["post-53123","news","type-news","status-publish","hentry","news_categories-applied-mathematics","news_categories-data-science","news_categories-research"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Smarter training for smarter AI | Department of Applied Mathematics and Statistics<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/engineering.jhu.edu\/ams\/news\/smarter-training-for-smarter-ai\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Smarter training for smarter AI | Department of Applied Mathematics and Statistics\" \/>\n<meta property=\"og:description\" content=\"Deep learning training could soon become faster and more reliable thanks to two new optimization methods developed by Johns Hopkins researchers. 
The techniques solve common problems where models learn inconsistently&hellip;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/engineering.jhu.edu\/ams\/news\/smarter-training-for-smarter-ai\/\" \/>\n<meta property=\"og:site_name\" content=\"Department of Applied Mathematics and Statistics\" \/>\n<meta property=\"article:modified_time\" content=\"2025-09-17T17:31:01+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/engineering.jhu.edu\/ams\/wp-content\/uploads\/2025\/07\/Firefly_A-futuristic-computer-training-itself-with-glowing-lines-and-shapes-showing-it-learn-543554.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2304\" \/>\n\t<meta property=\"og:image:height\" content=\"1792\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/engineering.jhu.edu\/ams\/wp-content\/uploads\/2025\/07\/Firefly_A-futuristic-computer-training-itself-with-glowing-lines-and-shapes-showing-it-learn-543554.jpg\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"3 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Smarter training for smarter AI | Department of Applied Mathematics and Statistics","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/engineering.jhu.edu\/ams\/news\/smarter-training-for-smarter-ai\/","og_locale":"en_US","og_type":"article","og_title":"Smarter training for smarter AI | Department of Applied Mathematics and Statistics","og_description":"Deep learning training could soon become faster and more reliable thanks to two new optimization methods developed by Johns Hopkins researchers. 
The techniques solve common problems where models learn inconsistently&hellip;","og_url":"https:\/\/engineering.jhu.edu\/ams\/news\/smarter-training-for-smarter-ai\/","og_site_name":"Department of Applied Mathematics and Statistics","article_modified_time":"2025-09-17T17:31:01+00:00","og_image":[{"width":2304,"height":1792,"url":"https:\/\/engineering.jhu.edu\/ams\/wp-content\/uploads\/2025\/07\/Firefly_A-futuristic-computer-training-itself-with-glowing-lines-and-shapes-showing-it-learn-543554.jpg","type":"image\/jpeg"}],"twitter_card":"summary_large_image","twitter_image":"https:\/\/engineering.jhu.edu\/ams\/wp-content\/uploads\/2025\/07\/Firefly_A-futuristic-computer-training-itself-with-glowing-lines-and-shapes-showing-it-learn-543554.jpg","twitter_misc":{"Est. reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/engineering.jhu.edu\/ams\/news\/smarter-training-for-smarter-ai\/","url":"https:\/\/engineering.jhu.edu\/ams\/news\/smarter-training-for-smarter-ai\/","name":"Smarter training for smarter AI | Department of Applied Mathematics and Statistics","isPartOf":{"@id":"https:\/\/engineering.jhu.edu\/ams\/#website"},"datePublished":"2025-07-08T14:04:18+00:00","dateModified":"2025-09-17T17:31:01+00:00","breadcrumb":{"@id":"https:\/\/engineering.jhu.edu\/ams\/news\/smarter-training-for-smarter-ai\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/engineering.jhu.edu\/ams\/news\/smarter-training-for-smarter-ai\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/engineering.jhu.edu\/ams\/news\/smarter-training-for-smarter-ai\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/engineering.jhu.edu\/ams\/"},{"@type":"ListItem","position":2,"name":"News","item":"https:\/\/engineering.jhu.edu\/ams\/news\/"},{"@type":"ListItem","position":3,"name":"Smarter training for smarter 
AI"}]},{"@type":"WebSite","@id":"https:\/\/engineering.jhu.edu\/ams\/#website","url":"https:\/\/engineering.jhu.edu\/ams\/","name":"Hopkins Applied Math & Statistics","description":"Department of Applied Mathematics and Statistics","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/engineering.jhu.edu\/ams\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Department of Applied Mathematics and Statistics","distributor_original_site_url":"https:\/\/engineering.jhu.edu\/ams","push-errors":false,"_links":{"self":[{"href":"https:\/\/engineering.jhu.edu\/ams\/wp-json\/wp\/v2\/news\/53123","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/engineering.jhu.edu\/ams\/wp-json\/wp\/v2\/news"}],"about":[{"href":"https:\/\/engineering.jhu.edu\/ams\/wp-json\/wp\/v2\/types\/news"}],"wp:attachment":[{"href":"https:\/\/engineering.jhu.edu\/ams\/wp-json\/wp\/v2\/media?parent=53123"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}