Langmead, Ben

Associate Professor
Computer Science
Lab Website

Malone 329
(410) 516-2033

Ben Langmead Receives Sloan Fellowship

February 20, 2014

Benjamin Langmead, an assistant professor in the Department of Computer Science, has been selected as a 2014 Alfred P. Sloan Research Fellow in Computational & Molecular Biology for his work in the analysis of DNA sequencing data. The Alfred P. Sloan Foundation awards 126 two-year fellowships each year to early career researchers in recognition of […]


  • Ph.D., Johns Hopkins
  • Ph.D. 2012, University of Maryland College Park
  • Master of Science 2009, University of Maryland College Park
  • Bachelor of Arts 2003, Columbia College, Columbia University, Columbia, NY
  • 2013 - 2021:  Assistant Professor, SPH Biostatistics
  • 2009 - 2013:  Historical, SPH Biostatistics
  • 2003 - 2007:  Senior Engineer, Reservoir Labs, Inc.
Research Areas
  • Computational Genomics
  • Data-intensive computing
  • Text Indexing
  • 2018:  Professor Joel Dean Excellence in Teaching Award
  • 2018:  William H. Huggins Excellence in Teaching Award
  • 2016:  Benjamin Franklin Award in the Life Sciences
  • 2014:  NSF CAREER Award
  • 2014:  Alfred P. Sloan Foundation Research Fellowship
  • "Marshaling public data for lean and powerful splicing studies", Bioinformatics Training and Education Program (BTEP) Distinguished Speakers Seminar Series.  Bethesda, Maryland, United States of America.  January 16, 2020
  • "Fighting reference bias with plural and panel references", Personal Genome Diagnostics Seminar Series.  Baltimore, Maryland, United States of America.  December 11, 2019
  • "Genomic sketching with HyperLogLog", Cold Spring Harbor Laboratory Meeting on Genome Informatics.  Cold Spring Harbor, New York, United States of America.  November 7, 2019
  • "Marshaling public data for lean and powerful splicing studies", Vanderbilt Human Genetics Student Association Seminar.  Nashville, Tennessee, United States of America.  March 7, 2019
  • "Making the most of petabases of genomic data", IBM Almaden Research Distinguished Lecture Series.  Almaden, California, United States of America.  October 25, 2018
  • "Alignment using prior information", CMU Workshop on the Future of Algorithms in Biology.  Pittsburgh, Pennsylvania, United States of America.  September 29, 2018
  • "Using huge public sequencing datasets to answer scientific questions", UCLA Computational Genomics Summer Institute.  Los Angeles, California, United States of America.  July 30, 2018
  • "Tales of scale", Rocky Mountain Genomics HackCon.  Boulder, Colorado, United States of America.  June 20, 2018
  • "Using huge public sequencing datasets to answer scientific questions", Department of Biomedical Engineering seminar series.  Portland, Oregon, United States of America.  May 25, 2018
  • "Practical lessons from scaling read aligners to hundreds of threads", IEEE International Workshop on High Performance Computational Biology (HiCOMB).  Vancouver, British Columbia, Canada.  May 21, 2018
  • "Making the most of petabases of genomic data", Open Data Science Conference (ODSC) East.  Boston, Massachusetts, United States of America.  May 4, 2018
  • "Summarizing tens of thousands of RNA-seq datasets in the cloud", Seven Bridges Genomics Cancer Genomics Cloud course.  Boston, Massachusetts, United States of America.  May 1, 2017
  • "Summarizing tens of thousands of RNA-seq samples: themes and lessons", Meeting on Statistical and Computational Challenges in Large Scale Molecular Biology.  Banff, Alberta, Canada.  March 30, 2017
  • "Unlocking sequence data archives with scalable software and resources", Genomics Seminar.  State College, Pennsylvania, United States of America.  March 1, 2017
  • "Reshaping core genomics software tools for the many-core era", Intel HPC Developer Conference.  Salt Lake City, UT.  November 12, 2016
  • "Scalable analysis of many sequencing datasets at once", USTAR Center for Genetic Discovery seminar series.  Salt Lake City, UT.  November 11, 2016
  • "A tandem simulation approach to predicting mapping quality", Meeting on Biological Data Science.  Cold Spring Harbor, NY.  October 28, 2016
  • "Navigating tens of thousands of RNA-seq datasets with recount, SciServer & Jupyter", Annual Symposium for the Institute of Data Intensive Engineering and Science.  JHU.  October 21, 2016
  • "Promoting open data by making it more usable for biologists", Bio-IT World Conference & Expo.  Boston, MA.  April 6, 2016
  • "Scalable analysis of many sequencing datasets at once", UCSD Bioinformatics Seminar.  University of California, San Diego, CA.  October 22, 2015
  • "Exploring tens of thousands of RNA-seq samples with Rail-RNA", Workshop on Parallel Software Libraries for Sequence Analysis.  Georgia Institute of Technology, Atlanta, GA.  September 9, 2015
  • "The DNA Data Deluge", Workshop on Future Perspectives in Computational Pan-Genomics.  Lorentz Center, Leiden, The Netherlands.  June 8, 2015
  • "Toward scalable analysis of many sequencing datasets at once", UCLA Bioinformatics Seminar.  University of California, Los Angeles.  March 30, 2015
  • "Scalable software for analyzing large collections of DNA sequencing data", CSE Colloquium.  Buffalo, NY.  November 24, 2014
  • "Scalable software for uniform analysis of many RNA-seq samples", Cold Spring Harbor Laboratory Meeting on Biological Data Science.  Cold Spring Harbor, NY.  November 17, 2014
  • "Scalable Software for Analyzing Large Collections of RNA Sequencing Data", 1st Annual Symposium for the Institute of Data Intensive Engineering and Science.  Baltimore, MD.  October 17, 2014
  • "Scalable software for analyzing many sequencing datasets at once", 3rd Annual Biomedical Informatics Symposium.  Washington DC.  October 2, 2014
  • "Scalable software for sequencing data", Baltimore Life Science Association Speaker Series.  Baltimore, MD.  September 30, 2014
  • "Scalable software for sequencing data", Quantitative and Computational Biology program seminar series.  Princeton, NJ.  September 22, 2014
  • "Designing software for statistical analysis of huge collections of sequencing data", Joint Statistical Meetings.  Boston, MA.  August 3, 2014
  • "Computational approaches for analyzing big DNA sequencing datasets", Campbell & Company seminar series.  Baltimore, MD.  July 25, 2014


Journal Articles
  • Ling JP, Wilks C, Charles R, Leavey PJ, Ghosh D, Jiang L, Santiago CP, Pang B, Venkataraman A, Clark BS, Nellore A, Langmead B, Blackshaw S (2020).  ASCOT identifies key regulators of neuronal subtype-specific splicing.  Nature Communications.  11(1).
  • Darby CA, Gaddipati R, Schatz MC, Langmead B (2020).  Vargas: heuristic-free alignment for assessing linear and graph read aligners.  Bioinformatics (Oxford, England).  36(12).  3712-3718.
  • Kuhnle A, Mun T, Boucher C, Gagie T, Langmead B, Manzini G (2020).  Efficient Construction of a Complete Index for Pan-Genomics Read Alignment.  Journal of Computational Biology.  27(4).  500-513.
  • Mun T, Kuhnle A, Boucher C, Gagie T, Langmead B, Manzini G (2020).  Matching Reads to Many Genomes with the r-Index.  Journal of Computational Biology.  27(4).  514-518.
  • Baker DN, Langmead B (2019).  Dashing: Fast and accurate genomic distances with HyperLogLog.  Genome Biology.  20(1).
  • Wood DE, Lu J, Langmead B (2019).  Improved metagenomic analysis with Kraken 2.  Genome Biology.  20(1).
  • Wulfridge P, Langmead B, Feinberg AP, Hansen KD (2019).  Analyzing whole genome bisulfite sequencing data from highly divergent genotypes.  Nucleic Acids Research.  47(19).  e117.
  • Darby CA, Fitch JR, Brennan PJ, Kelly BJ, Bir N, Magrini V, Leonard J, Cottrell CE, Gastier-Foster JM, Wilson RK, Mardis ER, White P, Langmead B, Schatz MC (2019).  Samovar: Single-Sample Mosaic Single-Nucleotide Variant Calling with Linked Reads.  iScience.  18.  1-10.
  • Boucher C, Gagie T, Kuhnle A, Langmead B, Manzini G, Mun T (2019).  Prefix-free parsing for building big BWTs.  Algorithms for Molecular Biology.  14(1).
  • Mangul S, Martin LS, Langmead B, Sanchez-Galan JE, Toma I, Hormozdiari F, Pevzner P, Eskin E (2019).  How bioinformatics and open data can boost basic science in countries and universities with limited resources.  Nature Biotechnology.  37(3).  324-326.
  • Langmead B, Wilks C, Antonescu V, Charles R (2019).  Scaling read aligners to hundreds of threads on general-purpose processors.  Bioinformatics.  35(3).  421-432.
  • Kuhnle A, Mun T, Boucher C, Gagie T, Langmead B, Manzini G (2019).  Efficient Construction of a Complete Index for Pan-Genomics Read Alignment.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  11467 LNBI.  158-173.
  • Madugundu AK, Na CH, Nirujogi RS, Renuse S, Kim KP, Burns KH, Wilks C, Langmead B, Ellis SE, Collado-Torres L, Halushka MK, Kim MS, Pandey A (2019).  Integrated Transcriptomic and Proteomic Analysis of Primary Human Umbilical Vein Endothelial Cells.  Proteomics.  19(15).
  • Pritt J, Chen NC, Langmead B (2018).  FORGe: Prioritizing variants for graph genomes.  Genome Biology.  19(1).
  • Langmead B, Nellore A (2018).  Cloud computing for genomic data analysis and collaboration.  Nature Reviews Genetics.  19(4).  208-219.
  • Wilks C, Gaddipati P, Nellore A, Langmead B (2018).  Snaptron: Querying splicing patterns across tens of thousands of RNA-seq samples.  Bioinformatics.  34(1).  114-116.
  • Marschall T, Marz M, Abeel T, Dijkstra L, Dutilh BE, Ghaffaari A, Kersey P, Kloosterman WP, Mäkinen V, Novak AM, Paten B, Porubsky D, Rivals E, Alkan C, Baaijens JA, De Bakker PIW, Boeva V, Bonnal RJP, Chiaromonte F, Chikhi R, Ciccarelli FD, Cijvat R, Datema E, Van Duijn CM, Eichler EE, Ernst C, Eskin E, Garrison E, El-Kebir M, Klau GW, Korbel JO, Lameijer EW, Langmead B, Martin M, Medvedev P, Mu JC, Neerincx P, Ouwens K, Peterlongo P, Pisanti N, Rahmann S, Raphael B, Reinert K, de Ridder D, de Ridder J, Schlesner M, Schulz-Trieglaff O, Sanders AD, Sheikhizadeh S, Shneider C, Smit S, Valenzuela D, Wang J, Wessels L, Zhang Y, Guryev V, Vandin F, Ye K, Schönhuth A (2018).  Computational pan-genomics: Status, promises and challenges.  Briefings in Bioinformatics.  19(1).  118-135.
  • Nellore A, Collado-Torres L, Jaffe AE, Alquicira-Hernández J, Wilks C, Pritt J, Morton J, Leek JT, Langmead B (2017).  Rail-RNA: scalable analysis of RNA-seq splicing and coverage.  Bioinformatics (Oxford, England).  33(24).  4033-4040.
  • Langmead B (2017).  A tandem simulation framework for predicting mapping quality.  Genome Biology.  18(1).
  • Collado-Torres L, Nellore A, Kammers K, Ellis SE, Taub MA, Hansen KD, Jaffe AE, Langmead B, Leek JT (2017).  Reproducible RNA-seq analysis using recount2.  Nature Biotechnology.  Nature Publishing Group.  35.  319.
  • Collado-Torres L, Nellore A, Frazee AC, Wilks C, Love MI, Langmead B, Irizarry RA, Leek JT, Jaffe AE (2017).  Flexible expressed region analysis for RNA-seq with derfinder.  Nucleic Acids Research.  45(2).  e9.
  • Nellore A, Jaffe AE, Fortin JP, Alquicira-Hernández J, Collado-Torres L, Wang S, Phillips RA, Karbhari N, Hansen KD, Langmead B, Leek JT (2016).  Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the Sequence Read Archive.  Genome Biology.  17(1).
  • Darby MM, Leek JT, Langmead B, Yolken RH, Sabunciyan S (2016).  Widespread splicing of repetitive element loci into coding regions of gene transcripts.  Human Molecular Genetics.  25(22).  4962-4982.
  • Pritt J, Langmead B (2016).  Boiler: Lossy compression of RNA-seq alignments using coverage vectors.  Nucleic Acids Research.  44(16).
  • Nellore A, Wilks C, Hansen KD, Leek JT, Langmead B (2016).  Rail-dbGaP: Analyzing dbGaP-protected data in the cloud with Amazon Elastic MapReduce.  Bioinformatics.  32(16).  2551-2553.
  • Frazee AC, Jaffe AE, Langmead B, Leek JT (2015).  Polyester: Simulating RNA-seq datasets with differential transcript expression.  Bioinformatics.  31(17).  2778-2784.
  • Frazee AC, Pertea G, Jaffe AE, Langmead B, Salzberg SL, Leek JT (2015).  Ballgown bridges the gap between transcriptome assembly and expression analysis.  Nature Biotechnology.  Nature Publishing Group.  33.  243.
  • Wilton R, Budavari T, Langmead B, Wheelan SJ, Salzberg SL, Szalay AS (2015).  Arioc: High-throughput read alignment with GPU-accelerated exploration of the seed-and-extend search space.  PeerJ.  2015(3).
  • Kim D, Langmead B, Salzberg SL (2015).  HISAT: A fast spliced aligner with low memory requirements.  Nature Methods.  12(4).  357-360.
  • Reinert K, Langmead B, Weese D, Evers DJ (2015).  Alignment of Next-Generation Sequencing Reads.  Annual Review of Genomics and Human Genetics.  16.  133-151.
  • Hansen KD, Sabunciyan S, Langmead B, Nagy N, Curley R, Klein G, Klein E, Salamon D, Feinberg AP (2014).  Large-scale hypomethylated blocks associated with Epstein-Barr virus-induced B-cell immortalization.  Genome Research.  24(2).  177-184.
  • Song L, Florea L, Langmead B (2014).  Lighter: fast and memory-efficient sequencing error correction without counting.  Genome Biology.  15(11).  509.
  • Schatz MC, Langmead B (2013).  The DNA data deluge.  IEEE Spectrum.  50(7).  28-33.
  • Hansen KD, Langmead B, Irizarry RA (2012).  BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions.  Genome Biology.  13(10).
  • Herb BR, Wolschin F, Hansen KD, Aryee MJ, Langmead B, Irizarry R, Amdam GV, Feinberg AP (2012).  Reversible switching between epigenetic states in honeybee behavioral subcastes.  Nature Neuroscience.  15(10).  1371-1373.
  • Gurtowski J, Schatz MC, Langmead B (2012).  Genotyping in the cloud with crossbow.  Current Protocols in Bioinformatics.  (SUPPL.39).
  • Langmead B, Salzberg SL (2012).  Fast gapped-read alignment with Bowtie 2.  Nature Methods.  9(4).  357-359.
  • Frazee AC, Langmead B, Leek JT (2011).  ReCount: A multi-experiment resource of analysis-ready RNA-seq gene count datasets.  BMC Bioinformatics.  12.
  • Hansen KD, Timp W, Bravo HC, Sabunciyan S, Langmead B, McDonald OG, Wen B, Wu H, Liu Y, Diep D, Briem E, Zhang K, Irizarry RA, Feinberg AP (2011).  Increased methylation variation in epigenetic domains across cancer types.  Nature Genetics.  43(8).  768-775.
  • Leek JT, Scharpf RB, Bravo HC, Simcha D, Langmead B, Johnson WE, Geman D, Baggerly K, Irizarry RA (2010).  Tackling the widespread and critical impact of batch effects in high-throughput data.  Nature Reviews Genetics.  11(10).  733-739.
  • Langmead B (2010).  Aligning short sequencing reads with Bowtie.  Current Protocols in Bioinformatics.  (SUPPL.32).
  • Langmead B, Hansen KD, Leek JT (2010).  Cloud-scale RNA-sequencing differential expression analysis with Myrna.  Genome Biology.  11(8).
  • Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL (2009).  Searching for SNPs with cloud computing.  Genome Biology.  10(11).
  • Langmead B, Trapnell C, Pop M, Salzberg SL (2009).  Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.  Genome Biology.  10(3).
Book Chapters
  • Frazee AC, Torres LC, Jaffe AE, Langmead B, Leek JT (2014).  Measurement, Summary, and Methodological Variation in RNA-sequencing.  Statistical Analysis of Next Generation Sequencing Data.  Springer, Cham.  115-128.