{"id":575374,"date":"2025-11-19T13:54:54","date_gmt":"2025-11-19T18:54:54","guid":{"rendered":"https:\/\/engineering.jhu.edu\/ece\/?post_type=news&#038;p=575374"},"modified":"2025-11-19T14:03:24","modified_gmt":"2025-11-19T19:03:24","slug":"ai-that-listens-like-a-human-even-in-a-crowd","status":"publish","type":"news","link":"https:\/\/engineering.jhu.edu\/ece\/news\/ai-that-listens-like-a-human-even-in-a-crowd\/","title":{"rendered":"AI That Listens Like a Human, Even in a Crowd"},"content":{"rendered":"<p style=\"font-weight: 400;\">Humans can effortlessly focus on a single voice in a noisy room, a skill that has long challenged artificial intelligence. Now, researchers at Johns Hopkins have developed an AI system that can pick out specific sounds in a busy recording. Called <a href=\"https:\/\/arxiv.org\/abs\/2509.18606\"><span>FlexSED<\/span><\/a>, short for Flexible Sound Event Detection, the model takes a plain-language description of a sound and precisely marks when that sound occurs within an audio recording.<\/p>\n<p style=\"font-weight: 400;\">\u201cFor example, if someone types \u2018dog barking,\u2019 FlexSED can scan a long audio clip and highlight the exact moments when a bark occurs, down to the second,\u201d said co-author Jiarui Hai, a PhD student in the Department of Electrical and Computer Engineering.<\/p>\n<p style=\"font-weight: 400;\">Presented in October at the 2025 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), the system is a significant step toward open-vocabulary sound understanding. What sets it apart is that it understands sounds described in everyday language, rather than being limited to a fixed set of labeled categories.<\/p>\n<p style=\"font-weight: 400;\">Traditional sound detection models can identify only the sounds they were trained on, which limits how well they work in real-world settings. 
FlexSED takes a different approach: it can interpret natural-language descriptions of sounds, so it isn\u2019t tied to a fixed list. \u201cThe system can respond to whatever sound the user describes,\u201d said Hai. This allows FlexSED to recognize sounds it has never been trained on (known as zero-shot learning) and to learn new ones quickly from just a few examples (known as few-shot learning), which could make it useful in settings such as medical monitoring and wildlife tracking.<\/p>\n<p style=\"font-weight: 400;\">This work was recognized by the organizing committee as a notable contribution and was selected for a Spotlight oral presentation, which is reserved for a small number of papers rated highly by reviewers.<\/p>\n<p style=\"font-weight: 400;\">To build this model, the researchers combined two pretrained systems: one that learns sound patterns from large amounts of unlabeled audio, and another that understands text descriptions such as \u201ccar horn,\u201d \u201cperson laughing,\u201d or \u201cglass shattering.\u201d An adaptive fusion strategy allows the model to integrate and fine-tune these components using a relatively small amount of labeled data, enabling open-vocabulary sound event detection without extensive task-specific retraining.<\/p>\n<p style=\"font-weight: 400;\">FlexSED outperformed traditional sound detection models on benchmark tests. \u201cEven when tested on sounds outside its training set, the model showed a strong ability to identify them,\u201d said Hai. \u201cWith just a few examples of new sounds, its accuracy rose even higher, showing that the system can quickly learn and adapt to new environments.\u201d<\/p>\n<p style=\"font-weight: 400;\">Because FlexSED can understand everyday language, it could be used in many real-world settings, from spotting safety alerts in noisy workplaces to recognizing animal sounds in nature recordings. 
It can support audio-aware AI agents by helping them determine what happened in an audio clip and when. Its speed and accuracy also make it a promising foundation for assistive technologies that help people with hearing loss interpret sounds in their surroundings.<\/p>\n<p style=\"font-weight: 400;\">FlexSED is open-source, with code and pretrained models <a href=\"https:\/\/github.com\/JHU-LCAP\/FlexSED\"><span>available on GitHub<\/span><\/a>.<\/p>\n<p style=\"font-weight: 400;\">Study co-authors include Charles Renn Faculty Scholar and Professor Mounya Elhilali, and PhD students Helin Wang and Weizhe Guo in the Department of Electrical and Computer Engineering.<\/p>\n","protected":false},"template":"","class_list":["post-575374","news","type-news","status-publish","hentry","news_categories-research"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>AI That Listens Like a Human, Even in a Crowd - Department of Electrical and Computer Engineering<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/engineering.jhu.edu\/ece\/news\/ai-that-listens-like-a-human-even-in-a-crowd\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"AI That Listens Like a Human, Even in a Crowd - Department of Electrical and Computer Engineering\" \/>\n<meta property=\"og:description\" content=\"Humans can effortlessly focus on a single voice in a noisy room, a skill that has long challenged artificial intelligence. 
Now, researchers at Johns Hopkins have developed an AI system&hellip;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/engineering.jhu.edu\/ece\/news\/ai-that-listens-like-a-human-even-in-a-crowd\/\" \/>\n<meta property=\"og:site_name\" content=\"Department of Electrical and Computer Engineering\" \/>\n<meta property=\"article:modified_time\" content=\"2025-11-19T19:03:24+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"3 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"AI That Listens Like a Human, Even in a Crowd - Department of Electrical and Computer Engineering","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/engineering.jhu.edu\/ece\/news\/ai-that-listens-like-a-human-even-in-a-crowd\/","og_locale":"en_US","og_type":"article","og_title":"AI That Listens Like a Human, Even in a Crowd - Department of Electrical and Computer Engineering","og_description":"Humans can effortlessly focus on a single voice in a noisy room, a skill that has long challenged artificial intelligence. Now, researchers at Johns Hopkins have developed an AI system&hellip;","og_url":"https:\/\/engineering.jhu.edu\/ece\/news\/ai-that-listens-like-a-human-even-in-a-crowd\/","og_site_name":"Department of Electrical and Computer Engineering","article_modified_time":"2025-11-19T19:03:24+00:00","twitter_card":"summary_large_image","twitter_misc":{"Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/engineering.jhu.edu\/ece\/news\/ai-that-listens-like-a-human-even-in-a-crowd\/","url":"https:\/\/engineering.jhu.edu\/ece\/news\/ai-that-listens-like-a-human-even-in-a-crowd\/","name":"AI That Listens Like a Human, Even in a Crowd - Department of Electrical and Computer Engineering","isPartOf":{"@id":"https:\/\/engineering.jhu.edu\/ece\/#website"},"datePublished":"2025-11-19T18:54:54+00:00","dateModified":"2025-11-19T19:03:24+00:00","breadcrumb":{"@id":"https:\/\/engineering.jhu.edu\/ece\/news\/ai-that-listens-like-a-human-even-in-a-crowd\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/engineering.jhu.edu\/ece\/news\/ai-that-listens-like-a-human-even-in-a-crowd\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/engineering.jhu.edu\/ece\/news\/ai-that-listens-like-a-human-even-in-a-crowd\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/engineering.jhu.edu\/ece\/"},{"@type":"ListItem","position":2,"name":"News","item":"https:\/\/engineering.jhu.edu\/ece\/news\/"},{"@type":"ListItem","position":3,"name":"AI That Listens Like a Human, Even in a Crowd"}]},{"@type":"WebSite","@id":"https:\/\/engineering.jhu.edu\/ece\/#website","url":"https:\/\/engineering.jhu.edu\/ece\/","name":"Department of Electrical and Computer Engineering","description":"Department of Electrical and Computer Engineering","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/engineering.jhu.edu\/ece\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Department of Electrical and Computer 
Engineering","distributor_original_site_url":"https:\/\/engineering.jhu.edu\/ece","push-errors":false,"_links":{"self":[{"href":"https:\/\/engineering.jhu.edu\/ece\/wp-json\/wp\/v2\/news\/575374","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/engineering.jhu.edu\/ece\/wp-json\/wp\/v2\/news"}],"about":[{"href":"https:\/\/engineering.jhu.edu\/ece\/wp-json\/wp\/v2\/types\/news"}],"wp:attachment":[{"href":"https:\/\/engineering.jhu.edu\/ece\/wp-json\/wp\/v2\/media?parent=575374"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}