Assistant Professor Berrak Sisman has been selected for a 2025–2026 Faculty Research Award from the Johns Hopkins University + Amazon Initiative for Artificial Intelligence (AI2AI). Her project, “Multimodal Speech Synthesis: Leveraging Computer Vision and LLMs for Expressive Voice Generation,” is one of eight research efforts across the Whiting School of Engineering chosen for this year’s award cycle.
The project explores new research directions in speech synthesis using multimodal machine learning. By integrating information from multiple modalities—including text, images, and gestures—the work aims to improve the naturalness and expressiveness of synthesized speech. The project will investigate methods that combine computer vision and large language models (LLMs) to enhance the synthesis process and support more dynamic voice generation.
In addition to developing new multimodal techniques, the research will introduce novel evaluation approaches for assessing synthesized speech. These methods will focus on measuring expressiveness, intelligibility, and overall performance, providing clearer benchmarks for future advancements in speech synthesis systems.
The AI2AI Faculty Research Awards support innovative work across Johns Hopkins at the intersection of artificial intelligence and real-world applications.