{"id":573557,"date":"2025-07-08T19:42:37","date_gmt":"2025-07-08T23:42:37","guid":{"rendered":"https:\/\/engineering.jhu.edu\/ece\/?post_type=news&#038;p=573557"},"modified":"2025-07-08T19:42:37","modified_gmt":"2025-07-08T23:42:37","slug":"making-ai-video-generators-smarter-about-physics","status":"publish","type":"news","link":"https:\/\/engineering.jhu.edu\/ece\/news\/making-ai-video-generators-smarter-about-physics\/","title":{"rendered":"Making AI Video Generators Smarter About Physics"},"content":{"rendered":"<div>\n<p><span>A bouncing ball that never falls, or a person who seemingly glides across a room without taking a step? These are just some of the oddities that can emerge from today\u2019s most advanced AI video generators. Johns Hopkins researchers have developed a <\/span><a href=\"https:\/\/arxiv.org\/pdf\/2505.21653\"><span>new framework called DiffPhy<\/span><\/a><span> that corrects these physics-defying glitches by bringing real-world physical laws into AI video generation.<\/span><\/p>\n<\/div>\n<div>\n<p><span>\u201cWhile recent advances in video diffusion models have made it possible to create compelling visuals from a text prompt, these models often ignore the fundamental rules of motion and interaction,\u201d said co-author Ke Zhang, a PhD student in the <\/span><a href=\"https:\/\/engineering.jhu.edu\/ece\/\"><span>Whiting School of Engineering\u2019s Department of Electrical and Computer Engineering<\/span><\/a><span>. \u201cObjects may float, shift without apparent cause, or collide together in impossible ways. DiffPhy enhances these models by grounding them in physical principles, helping them generate scenes that do not just look real, but move and behave like they actually are.\u201d<\/span><\/p>\n<\/div>\n<div>\n<p><strong>How DiffPhy Works<\/strong><\/p>\n<\/div>\n<div>\n<p><span>DiffPhy combines the strengths of large language models (LLMs) and video generation systems. Most existing video diffusion models learn physics indirectly by analyzing large amounts of video data. This approach can capture basic motion patterns, but it struggles with complex scenarios involving forces, collisions, or nuanced interactions between objects.<\/span><\/p>\n<\/div>\n<div>\n<p><span>\u201cDiffPhy takes a different approach. It uses LLMs to explicitly reason about the physical context of a given prompt,\u201d said co-author <\/span><a href=\"https:\/\/engineering.jhu.edu\/faculty\/vishal-patel\/\"><span>Vishal Patel,<\/span><\/a><span> an associate professor of electrical and computer engineering and a member of Johns Hopkins\u2019 <\/span><a href=\"https:\/\/ai.jhu.edu\/\"><span>Data Science and AI Institute<\/span><\/a><span>. \u201cFor example, if the prompt is \u2018a box falls off a table,\u2019 the LLM fills in the missing physical details\u2014like the force that caused the fall, how the box should move through the air, and what happens when it hits the ground. This enhanced, physics-aware version of the prompt is then used to guide the video generation process.\u201d<\/span><\/p>\n<\/div>\n<div>\n<p><span>To ensure that the resulting videos reflect both the meaning of the prompt and the laws of physics, DiffPhy introduces a second layer of oversight using a multimodal large language model (MLLM). <\/span><\/p>\n<\/div>\n<div>\n<p><span>\u201cThis model serves as an intelligent supervisor, evaluating whether the generated video aligns with the described physical phenomena and makes sense visually,\u201d said co-author Cihan Xiao, a PhD student in the Department of Electrical and Computer Engineering. \u201cIt checks not only if the video looks good but also if it behaves in ways that are physically plausible.\u201d <\/span><\/p>\n<\/div>\n<div>\n<p><strong>Building a Better Dataset<\/strong><\/p>\n<\/div>\n<div>\n<p><span>Training such a system requires a dataset rich in physical diversity, and most available datasets fall short. To address this, the team curated a new dataset called HQ-Phy, containing over 8,000 real-world video clips covering a broad range of physical actions and interactions. <\/span><\/p>\n<\/div>\n<div>\n<p><span>\u201cThis dataset allows the model to learn from real examples rather than relying on limited or synthetic footage, which often lacks the complexity of natural motion and force,\u201d said Patel.<\/span><\/p>\n<\/div>\n<div>\n<p><strong>Testing Physical Realism <\/strong><\/p>\n<\/div>\n<div>\n<p><span>In testing, DiffPhy outperformed state-of-the-art models on benchmarks designed to evaluate physical realism in video generation. On the VideoPhy2 and PhyGenBench datasets, which include prompts related to everyday physical scenarios, DiffPhy generated videos that more accurately captured how objects and people should move and interact, while also scoring higher on semantic accuracy and physical commonsense according to human evaluators. <\/span><\/p>\n<\/div>\n<div>\n<p><span>\u201cEven without advanced prompting strategies like chain-of-thought reasoning, DiffPhy delivered strong results, improving even further when such techniques were applied, making it especially valuable for applications in simulation, robotics, gaming, and education,\u201d said Zhang.<\/span><\/p>\n<\/div>\n<div>\n<p><span>The paper\u2019s co-authors also include Yiqun Mei and Jiacong Xu, both PhD students in the Department of Computer Science.<\/span><\/p>\n<\/div>\n","protected":false},"template":"","class_list":["post-573557","news","type-news","status-publish","hentry","news_categories-research"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Making AI Video Generators Smarter About Physics - Department of Electrical and Computer Engineering<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/engineering.jhu.edu\/ece\/news\/making-ai-video-generators-smarter-about-physics\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Making AI Video Generators Smarter About Physics - Department of Electrical and Computer Engineering\" \/>\n<meta property=\"og:description\" content=\"A bouncing ball that never falls, or a person who seemingly glides across a room without taking a step? These are just some of the oddities that can emerge from&hellip;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/engineering.jhu.edu\/ece\/news\/making-ai-video-generators-smarter-about-physics\/\" \/>\n<meta property=\"og:site_name\" content=\"Department of Electrical and Computer Engineering\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"3 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Making AI Video Generators Smarter About Physics - Department of Electrical and Computer Engineering","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/engineering.jhu.edu\/ece\/news\/making-ai-video-generators-smarter-about-physics\/","og_locale":"en_US","og_type":"article","og_title":"Making AI Video Generators Smarter About Physics - Department of Electrical and Computer Engineering","og_description":"A bouncing ball that never falls, or a person who seemingly glides across a room without taking a step? These are just some of the oddities that can emerge from&hellip;","og_url":"https:\/\/engineering.jhu.edu\/ece\/news\/making-ai-video-generators-smarter-about-physics\/","og_site_name":"Department of Electrical and Computer Engineering","twitter_card":"summary_large_image","twitter_misc":{"Est. reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/engineering.jhu.edu\/ece\/news\/making-ai-video-generators-smarter-about-physics\/","url":"https:\/\/engineering.jhu.edu\/ece\/news\/making-ai-video-generators-smarter-about-physics\/","name":"Making AI Video Generators Smarter About Physics - Department of Electrical and Computer Engineering","isPartOf":{"@id":"https:\/\/engineering.jhu.edu\/ece\/#website"},"datePublished":"2025-07-08T23:42:37+00:00","breadcrumb":{"@id":"https:\/\/engineering.jhu.edu\/ece\/news\/making-ai-video-generators-smarter-about-physics\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/engineering.jhu.edu\/ece\/news\/making-ai-video-generators-smarter-about-physics\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/engineering.jhu.edu\/ece\/news\/making-ai-video-generators-smarter-about-physics\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/engineering.jhu.edu\/ece\/"},{"@type":"ListItem","position":2,"name":"News","item":"https:\/\/engineering.jhu.edu\/ece\/news\/"},{"@type":"ListItem","position":3,"name":"Making AI Video Generators Smarter About Physics"}]},{"@type":"WebSite","@id":"https:\/\/engineering.jhu.edu\/ece\/#website","url":"https:\/\/engineering.jhu.edu\/ece\/","name":"Department of Electrical and Computer Engineering","description":"Department of Electrical and Computer Engineering","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/engineering.jhu.edu\/ece\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Department of Electrical and Computer Engineering","distributor_original_site_url":"https:\/\/engineering.jhu.edu\/ece","push-errors":false,"_links":{"self":[{"href":"https:\/\/engineering.jhu.edu\/ece\/wp-json\/wp\/v2\/news\/573557","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/engineering.jhu.edu\/ece\/wp-json\/wp\/v2\/news"}],"about":[{"href":"https:\/\/engineering.jhu.edu\/ece\/wp-json\/wp\/v2\/types\/news"}],"wp:attachment":[{"href":"https:\/\/engineering.jhu.edu\/ece\/wp-json\/wp\/v2\/media?parent=573557"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}