Across society—from labs to legislatures, boardrooms to classrooms—artificial intelligence is transforming how knowledge is produced and put to work. Like the internet, or even electricity before it, the technology is fast becoming a kind of infrastructure, deeply embedded in the tools scholars rely on every day. And no field is immune to its influence.
“AI is causing a revolution in how we do science,” says Mark Dredze, the John C. Malone Professor of Computer Science at the Whiting School of Engineering. “And it’s happening across the board. This isn’t limited to engineering or biology, chemistry or sociology. It’s everywhere.”

Universities are facing a critical question: How can they harness AI responsibly to advance knowledge across every field? Doing so will demand more than technical know-how. It will require leaders who can transcend disciplinary boundaries—researchers who can connect computer scientists with clinicians, engineers with humanists, policymakers with data experts—turning technological advances into collaborative research programs that solve real-world problems.
At Johns Hopkins, that’s a responsibility that’s fallen to Dredze.
Last October, after an extensive international search, Dredze was appointed director of the Data Science and AI Institute, a university-wide initiative housed in the School of Engineering. The institute, which was launched in 2023 and brings together more than 150 faculty members across disciplines, aims to coordinate AI research across Johns Hopkins divisions and to accelerate interdisciplinary partnerships at a moment when the technology is advancing at a breathtaking pace.
“I don’t say this lightly,” says Dredze, who is a specialist in natural language processing and public-interest AI. “These tools have reached a point where they can fundamentally change the way research is done.”
Dredze’s new role is to guide the university’s efforts at the forefront of the AI revolution. But long before AI dominated the headlines, Dredze was helping lay the foundations for that revolution, integrating computational methods with public health and medicine, showing how big data can shape critical decisions.
He’s harnessed social media platforms to track disease and developed tools to analyze online misinformation. He’s pioneered techniques for identifying racial bias in the way doctors interpret what their patients tell them and used AI to track online tobacco discourse, arming public health leaders with real-time intelligence. He’s also advised congressional staff on AI. And although Dredze is a computer scientist by training, his breakthroughs have been possible only through an unusually deep engagement with a multitude of domains.
“What sets Mark apart,” says Nanyun (Violet) Peng, Engr ’17 (PhD), Dredze’s former graduate student and now associate professor of computer science at the University of California, Los Angeles, “is his curiosity about the human processes that create data: how a journalist plans a story or how a patient describes a symptom. Because Mark combines a domain curiosity with deep technical insight into natural language processing and machine learning, he can ask questions that others miss.”

Room To Explore
It’s an instinct that took root early.
Between earning dual bachelor’s degrees in computer science and engineering at Northwestern University and completing his PhD in computer science at the University of Pennsylvania, Dredze took an unlikely sidetrack: a master’s degree in Jewish studies, where he focused on textual analysis and historical interpretation—an early glimpse of the boundary-defying scholar he would later become.
“It was always meant to be a detour,” Dredze says. And while he remained set on computer science, immersing himself in a humanistic discipline sharpened his interest in language and the ways meaning is constructed—questions that would later animate his work in natural language processing.
During his doctoral studies, Dredze was torn between industry and academia. He took roles with IBM, Microsoft, and Google. But then he realized a university career would allow him more freedom to follow his diverse interests.
“One of the things I really love about an academic career is that there’s space to get interested in different things and see where it takes you, without being too worried about the destination,” he says. “You can look at a problem and think, ‘This is interesting,’ and maybe it takes you somewhere completely different than you expected. Maybe it’s a dead end. Maybe it circles back. But the university gives you room to explore.”
Arriving at the Whiting School of Engineering in 2009 as an assistant research professor, Dredze had never worked on anything health-related. His training was in speech and language processing, building systems to analyze text and model how people write and speak. But within two years, he’d pioneered an entirely new subfield of public health: social media health informatics.
So how does a scholar trained to parse language start tracking disease?

Decoding Social Media
While Dredze continued his work teaching AI systems to interpret spoken language when he began at Johns Hopkins, he was also scanning the horizon for larger questions. Rather than rushing into a new specialty, he chose to spend his first years simply listening.
“I said, ‘Okay, well, this is great, but where are the other interesting, smart people at Hopkins?’ And everyone said, ‘Well, you really should go to East Baltimore and talk to people in public health and medicine,’” Dredze recalls. “And so that’s what I did.”
He discovered that epidemiologists and clinicians at the university’s medical campus and the Bloomberg School of Public Health were grappling with a persistent challenge: how to track infectious diseases such as influenza quickly and accurately. Traditional surveillance systems depended on hospital reports and laboratory testing. They were rigorous—but they were also slow, with outbreaks often outpacing the official data.
At the same time, Twitter was taking off, with millions suddenly narrating their daily routines in real time. Alongside people’s posts about their breakfast or their travel plans were updates about their fevers, sore throats, and aches and pains. Dredze, who had already been studying the social media platform as a source of large-scale language data, wondered whether that volume of data could be made useful for public health officials.
It looked promising at first sight. But turning those posts—like “I gots da flu” or “sick with this this flu it’s taking over my body ughhhhh”—into public health insights required more than keyword searches. In a project that came to be known as You Are What You Tweet, Dredze and his students developed models capable of distinguishing genuine reports of infection from expressions of worry or general discussion.
A sudden rise in tweets mentioning “flu” could signal an uptick in cases, Dredze explains, or just heightened anxiety following a news story. “Spotting that distinction came from Mark’s ability to think like an epidemiologist while solving problems like a natural language-processing researcher,” says Peng. “He realized that to track the disease, we first had to decode the way people talk about the disease.”
The work began modestly. But it quickly gathered momentum. NPR and The Washington Post covered the research, with subsequent research teams citing it heavily. The Centers for Disease Control and Prevention began taking notice. Colleagues in medicine and public health started reaching out about using social media platforms to study vaccine hesitancy, mental health, and emerging outbreaks. Within a few years, a new field had firmly taken root, with Dredze having helped establish what is now known as social media health informatics. But he would soon confront a more sinister dimension of the data he’d learned to decode.
Sensing ‘The Pulse’
By 2018, a different kind of outbreak was commanding attention.
Intelligence agencies had concluded that foreign actors had infiltrated American social media, with Twitter disclosing 10 million tweets that had been posted by Russia’s Internet Research Agency.
Dredze, who at that point had been collecting and archiving Twitter data to study public health for years, quickly understood the stakes. He was in an ideal position to ask what role health might be playing in these ominous campaigns.
He and his collaborators sifted through the posts and began to decode the agenda. Their expectation was that Russian-linked accounts would be pushing anti-vaccine propaganda—but they soon realized that the data was telling a more complicated story. Some accounts did criticize vaccination. Unexpectedly, though, some accounts seemed to promote it. Instead of pushing a single agenda, the bots seemed to be amplifying the most inflammatory voices on both sides.

“The trolls seemed to be using vaccination as a wedge issue, promoting discord in American society,” Dredze says.
Thanks to his previous work, Dredze was able to move quickly. “When someone identified this information need, we were in a position to fulfill it,” Dredze says, about a project that ended up informing journalists, lawmakers, and public health officials about the nature of an unfolding threat.
“Mark possesses a unique radar for emerging data sources,” says David Broniatowski, a professor of engineering management and systems engineering at The George Washington University, who collaborated on the project. “By the time the rest of the field caught up to the problem, Mark had already built the infrastructure to answer it.”
The project was timely, but it was also prescient. Since 2018, the misinformation ecosystem has grown more complex, and distrust in science runs deeper. Dredze was one of the first researchers to show how digital language could shape public health in ways few had previously anticipated. “Mark can sense the ‘pulse’ of a field, identifying critical problems before they hit the mainstream radar,” says Broniatowski. And then Dredze began considering other ways written language can shape human health. If digital language could influence our health through our smartphones, he wondered, what might it be doing inside the clinic?
Becoming Bilingual
By the late 2010s, electronic health records had become near universal across major American health systems. Inside all those histories, assessments, and discharge summaries lay patterns—trends that had the potential to shed light on treatment decisions and even uncover unexpected therapeutic benefits of drugs. The challenge was in finding a way to sift through the overwhelming volume of data and sniff out what was useful.
“Medical decisions today, big and small, are all driven by evidence,” Dredze says. “We need data to make these kinds of decisions. We’ve been collecting data in the medical system for a long time that may contain the answers to many questions. But it’s really hard to analyze because the data is so large. It’s at scale. It’s diverse.”
Mary Catherine Beach, BSPH ’99 (MPH), professor of medicine at the Johns Hopkins University School of Medicine, had been thinking about one potentially concerning facet of that vast archive: the way physicians write about their patients. Some details in a medical record are necessarily difficult—a history of substance use, say, or missed appointments. But other phrases, she says, carry judgments that may not be clinically essential and could end up informing the care a patient receives.
“We were looking at what we now call stigmatizing language in patient records,” explains Beach, who has been collaborating with Dredze and other researchers on a long-term project to use clinical records to study bias since around 2020. “Sometimes there might be details that are important to record. But then there’s stuff that is somewhat gratuitous, that clinicians will write, maybe out of frustration. That comes across to the people reading it.”
Much of that language operates indirectly. A physician may not write that they doubt a patient’s pain. Instead, they might say the patient “claims” to have “10 out of 10 pain,” or describe someone as a “poor historian.” It’s a subtle signal—but to clinicians it’s as clear as day.
Beach and Dredze began discussing whether natural language processing could detect these patterns—which meant Dredze had to learn to speak doctor.
Computer scientists and physicians often don’t speak the same language, and bridging the gap required practice. “I’m not a medical doctor,” says Dredze, who compares the process to learning French. It gave him the fluency to identify where his tools can meet clinical needs and when to defer to the domain experts. “Becoming bilingual in disciplines is the same as becoming bilingual in language: you just have to practice,” he says. “If you want to learn how to speak French, you need to take some French lessons, but also, you just need to talk to people in French a lot, all the time.”
The collaboration, which led to a Johns Hopkins Discovery Award, given annually to interdisciplinary teams across the university that are poised to arrive at important breakthroughs, resulted in models capable of identifying several categories of stigmatizing sentiment in clinical notes.
Their findings were sobering. Notes about Black and Hispanic patients, they discovered, contained more language undermining credibility and less language affirming it, compared to notes about white and Asian patients.
When clinicians discount a patient’s account, diagnoses can be delayed and treatments misdirected. “A lot of patient safety events or medical errors are caused by not taking seriously or listening to what somebody is telling you,” says Beach. And because medical records are read by multiple providers, the tone of one note can shape the assumptions of the next. “That is infectious in the note,” she says.
Beach describes the partnership as a model of interdisciplinary investigation. “It was a perfect example of why interdisciplinary collaboration works,” she says. “We can bring our perspective, he can bring his perspective, and together we can create something that neither one of us could have done without the other.”
Over the years, Dredze’s refusal to stay in his lane has led him to examine gun violence prevention, suicide risk assessment, geriatric syndromes in older adults, and drug use monitoring in online forums. He helped build Tobacco Watcher, a platform that has delivered actionable intelligence to tobacco control researchers for more than a decade. He has published more than 350 papers across public health, medicine, computer science, and linguistics.
But Dredze’s ultimate goal, amid all this prolific output, might seem surprising: to make himself unnecessary. He wants to design AI tools to replace himself.
“Putting myself out of business would be lovely,” he says, “because the number of people who want to do these types of studies far exceeds the time I will ever have. If I can eliminate myself from that pipeline, that would remove a huge bottleneck in medical research.”
But Dredze won’t be putting himself out of a job any time soon. As the new director of the Data Science and AI Institute, an appointment he was, as he puts it, “surprised and incredibly honored” to receive, his impact across the university is now greater than ever.
As to what comes next, that’s now much larger than any one project. As AI transforms the way science is conducted in every domain, Dredze’s task is to guide that revolution thoughtfully, bringing as many scholars to the table as possible while ensuring it remains anchored in real-world problems.
“Building those bridges and making those connections,” he says, “is going to be the most impactful thing we can do.”
