Published:
Author: Salena Fitzgerald
PhD Candidates
Left to right: Yao Zhao and Lang Lang.

Two doctoral candidates are reshaping how pharmaceutical companies and clinicians access and analyze clinical study data. Lang Lang and Yao Zhao, both advised by Yanxun Xu, assistant professor in the Department of Applied Mathematics and Statistics, recently received top honors at the American Statistical Association (ASA) conference for their work on reconstructing patient-level datasets from figures in scientific papers. Their online platform enables researchers to analyze how specific groups of patients respond to treatments, insights that are often hidden when studies report only summary results.

Lang’s first-place paper describes a tool he developed that reconstructs individual patient data (IPD) from published clinical trial results. Much of the data generated in clinical trials is shared publicly only as statistical summaries or as figures in journal articles, making it difficult for researchers to conduct deeper analysis. Lang says clinicians, medical researchers, and pharmaceutical scientists often need access to more detailed information, known as patient-level data, which describes how individual participants in a study responded to a treatment.

“Sometimes the data researchers need just isn’t accessible,” Lang explains. “Our tool allows scientists to extract granular patient-level data from published papers, enabling more rigorous and powerful downstream analysis.”  

Zhao’s work, which earned an ASA honorable mention, complements Lang’s by improving how the system reconstructs data from Kaplan-Meier survival plots, figures commonly published in clinical trial papers to summarize how long patients survive or remain disease-free during a study. While these graphs show overall trends, they do not reveal the individual patient data behind them.

Zhao developed methods to reverse-engineer these curves and recover the underlying survival data with remarkable accuracy. “We can reconstruct individual survival data even from summary-level plots, including key censoring information (like when a patient leaves a study early or is no longer tracked before researchers observe an outcome) that previous methods often missed,” Zhao says. “This high fidelity is crucial for meta-analyses and other studies where accurate patient-level data is needed but not publicly available.”

Their co-authored first-place paper, posted on the preprint server arXiv, describes a practical platform that is already used by researchers in academia and industry. Lang focused on the project’s final stages, ensuring the extracted data could be precisely reconstructed and used for downstream analysis, while Zhao concentrated on the initial stages, developing algorithms that extract and interpret the underlying data from complex figures.

The tool is publicly available and lets users upload figures from published studies and receive reconstructed datasets for analysis.  

Xu emphasizes the significance of their work: “Lang and Yao have bridged a gap between publicly available data and the needs of clinicians and drug developers. Their tools are not just theoretical; they are being actively used in research and industry.” 

Both tools have been accessed by more than 1,000 users worldwide, from academic researchers conducting meta-analyses to pharmaceutical companies evaluating drug responses.

The students will present their work at the 2026 ASA Biopharmaceutical Section Regulatory-Industry Statistics Workshop, scheduled for September in Rockville, Maryland, showcasing tools that combine technical innovation with real-world applications. Beyond awards and recognition, their research underscores a critical advance in clinical studies and drug design: making individual patient data accessible, transparent, and actionable for the broader scientific community.