Author: Salena Fitzgerald
Soufiane Hayou, Assistant Professor

Soufiane Hayou joined the Department of Applied Mathematics and Statistics, as well as the Data Science and AI Institute (DSAI), on August 1. Prior to joining Hopkins, he was a Peng Tsu Ann Assistant Professor at the National University of Singapore and a researcher at the Simons Institute for the Theory of Computing at UC Berkeley. His research explores the mathematical foundations of deep learning, with an emphasis on uncovering how large neural networks behave and scale.  


Tell us a little about yourself.   

I’m originally from Morocco and grew up in Khenifra, a small town in the Atlas Mountains. After high school and two years of intensive preparatory courses (classes préparatoires) for France’s elite universities, I was admitted to École Polytechnique, where I earned an engineering degree and a master’s in applied mathematics. I also completed a master’s in mathematical finance at Pierre et Marie Curie University in Paris. I then moved to the UK for a PhD in statistics and machine learning at the University of Oxford. Afterward, I joined the National University of Singapore as a Peng Tsu Ann Assistant Professor of Mathematics, followed by two years at the Simons Institute for the Theory of Computing at UC Berkeley. Outside of work, I enjoy playing football (soccer), watching movies, and traveling.

Describe your research.   

My work lies at the intersection of theory and application, where I use mathematical tools to study the behavior of large-scale neural networks and develop principled methods to improve their training and deployment. Lately, my focus has been on enhancing the efficiency of training, fine-tuning, and inference in large language models. I aim to design Pareto-optimal approaches that span the entire lifecycle of these models—from pre-training to deployment—and ultimately apply these ideas to more general AI systems. I’m continually drawn to the interplay between mathematics and artificial intelligence and plan to explore this direction throughout my career. 

What are some real-world applications of your research?  

I develop techniques to make AI systems smarter and more efficient. AI is changing how we work and live—from summarizing documents to powering complex systems that understand images, speech, and text all at once. These models are being adopted at an unprecedented rate, and we’re only beginning to see their economic impact. To be useful, these systems typically go through two key stages: large-scale pre-training on vast datasets, followed by task-specific adaptation through post-training. My research spans both phases. On the pre-training side, I’ve worked on depth parametrization techniques like Stable ResNet and Depth Hyperparameter Transfer, which offer efficient ways to scale neural networks by increasing their depth. In the post-training phase, I developed LoRA+, an extension of the LoRA method for lightweight fine-tuning of large models. These techniques improve the adaptability of language and vision models while significantly boosting efficiency for downstream tasks.

What drew you to this field and focus area?   

I’ve long been fascinated by studying patterns that emerge when things are pushed to extremes. This mathematical approach offers a powerful way to tackle real-world problems, particularly those dealing with uncertainty. This curiosity led me to study probability theory and high-dimensional statistics at École Polytechnique. During that time, I interned as a quantitative researcher at a major investment bank and began exploring deep learning. I quickly noticed that much of model development relied on trial and error, whereas mathematical analysis could offer more principled guidance. Viewing large neural networks as functions of random variables opens the door to rigorous study of their behavior. This realization led me to pursue a PhD in statistical deep learning, with a focus on mathematically grounded methods for training large-scale models. While statistics helps address the data side, tools from applied mathematics—such as dynamical systems, stochastic processes, random matrix theory, and PDEs—are essential for understanding model dynamics at scale.

What excites you about bringing this work to Johns Hopkins?   

I’m thrilled to be joining Johns Hopkins University, a leading interdisciplinary research institution with an outstanding reputation across multiple scientific disciplines. The newly established Data Science and AI Institute (DSAI) is an example of the university’s commitment to advancing AI research, alongside other institutes like the Mathematical Institute for Data Science and the SNF Agora Institute. Working at the intersection of theory and applications, I see DSAI as an ideal environment for fostering collaborations with colleagues throughout the Whiting School of Engineering. I’m equally excited about joining the AMS department, which hosts a dynamic community of exceptional researchers across diverse fields. Another compelling aspect is the opportunity to develop practical AI tools, particularly in health care.