Schatz, Michael

Associate Professor
Computer Science

Malone Hall 323
(410) 516-5765

Education
  • Ph.D. 2010, University of Maryland College Park
  • Master of Science 2008, University of Maryland College Park
  • Bachelor of Science 2000, Carnegie Mellon University
Appointments
  • 2016 - Present:  Joint, SOM Oncology Center
  • 2016 - Present:  Adjunct Associate Professor, Department of Oncology, The Johns Hopkins University
  • 2016 - Present:  Associate Professor, Department of Biology, The Johns Hopkins University
  • 2016 - Present:  Adjunct Associate Professor, Cold Spring Harbor Laboratory
Research Areas
  • Cloud Computing
  • Cancer Biology
  • Computational Biology
  • Genome Assembly and Validation
  • Genomics
  • High Performance and Multicore Computing
  • Sequence Alignment
  • Transcriptional Dynamics
  • Variant Analysis
Awards
  • 2017:  President's Award to Distinguished Doctoral Students
  • 2015:  Research Fellowship in Computational and Evolutionary Molecular Biology
  • 2014:  CAREER Award for Algorithms for Single Molecule Sequence Analysis
  • 2012:  Winship Herr Award for Excellence in Teaching
  • 2011:  Winship Herr Award for Excellence in Teaching
Presentations
  • "In pursuit of perfect genome sequencing".  Baltimore, MD.  December 7, 2017
  • "In pursuit of perfect genome sequencing", Joint Institute for Metrology in Biology.  Palo Alto, California, United States of America.  May 22, 2017
  • "Personalized Phased Diploid Genomes of the EN-Tex Samples", Advances in Genome Biology and Technology.  Hollywood, Florida, United States of America.  February 15, 2017
  • "Heterozygosity, Phased Genomes, and Personalized-omics".  College Park, Maryland, United States of America.  January 12, 2017
  • "Scikit-ribo reveals precise codon-level translational control by dissecting ribosome pausing and codon elongation", Biological Data Science.  Cold Spring Harbor Laboratory.  October 28, 2016
  • "Accurate and fast detection of complex and nested structural variations using long read technologies", Biological Data Science.  Cold Spring Harbor Laboratory.  October 28, 2016
  • "Complex structural variations and oncogene amplifications revealed by single cell and single molecule sequencing", Cancer Research Seminar.  Sidney Kimmel Comprehensive Cancer Center at Johns Hopkins.  September 21, 2016
  • "Fast and accurate detection of structural variations using SMRT-sequencing", PacBio Users Meeting.  Gaithersburg, MD.  September 14, 2016
  • "The resurgence of reference quality genomes".  Atlanta, GA.  August 24, 2016
  • "Recurrent noncoding regulatory mutations in pancreatic ductal adenocarcinoma", Biology of Genomes.  Cold Spring Harbor Laboratory.  May 12, 2016
  • "SplitThreader: A graphical algorithm for analysis of highly rearranged and amplified cancer genomes", Advances in Genome Biology and Technology.  Orlando, FL.  February 10, 2016
  • "The Resurgence of Reference Quality Genomes", Plant and Animal Genomes Conference.  San Diego, CA.  January 12, 2016


Journal Articles
  • Fang H, Huang YF, Radhakrishnan A, Siepel A, Lyon G, Schatz M (2018).  Scikit-ribo: Accurate estimation and robust modeling of translation dynamics at codon resolution.  Cell Systems.
  • Hulse-Kemp AM, Maheshwari S, Stoffel K, Hill TA, Jaffe D, Williams SR, Weisenfeld N, Ramakrishnan S, Kumar V, Shah P, others (2018).  Reference quality assembly of the 3.5-Gb genome of Capsicum annuum from a single linked-read library.  Horticulture Research.  Nature Publishing Group.  5.  4.
  • Pal D, Pertot A, Shirole NH, Yao Z, Anaparthy N, Garvin T, Cox H, Chang K, Rollins F, Kendall J, others (2017).  TGF-β reduces DNA ds-break repair mechanisms to heighten genetic diversity and adaptability of CD44+/CD24- cancer cells.  Elife.  eLife Sciences Publications, Ltd.  6.
  • Heath LS, Bravo HC, Caccamo M, Schatz M (2017).  Bioinformatics of DNA [Scanning the Issue].  Proceedings of the IEEE.  IEEE.  105.  419--421.
  • Schatz MC (2017).  Nanopore sequencing meets epigenetics.  Nature methods.  Nature Publishing Group.  14.  347.
  • Vurture GW, Sedlazeck FJ, Nattestad M, Underwood CJ, Fang H, Gurtowski J, Schatz MC (2017).  GenomeScope: fast reference-free genome profiling from short reads.  Bioinformatics.  Oxford University Press.  33.  2202--2204.
  • Luo R, Sedlazeck FJ, Darby CA, Kelly SM, Schatz MC (2017).  LRSim: A Linked-Reads Simulator generating insights for better genome partitioning.  Computational and Structural Biotechnology Journal.  Elsevier.  15.  478--484.
  • Luo R, Schatz MC, Salzberg SL (2017).  16GT: a fast and sensitive variant caller using a 16-genotype probabilistic model.  GigaScience.
  • Feigin ME, Garvin T, Bailey P, Waddell N, Chang DK, Kelley DR, Shuai S, Gallinger S, McPherson JD, Grimmond SM, others (2017).  Recurrent noncoding regulatory mutations in pancreatic ductal adenocarcinoma.  Nature genetics.  Nature Publishing Group.  49.  825.
  • Miller JR, Zhou P, Mudge J, Gurtowski J, Lee H, Ramaraj T, Walenz BP, Liu J, Stupar RM, Denny R, others (2017).  Hybrid assembly with long and short reads improves discovery of gene family expansions.  BMC genomics.  BioMed Central.  18.  541.
  • Vembar SS, Seetin M, Lambert C, Nattestad M, Schatz M, Baybayan P, Scherf A, Smith ML (2016).  Complete telomere-to-telomere de novo assembly of the Plasmodium falciparum genome through long-read (> 11 kb), single molecule, real-time sequencing.  DNA Research.  23(4).  339-351.
  • Rosenfeld JA, Reeves D, Brugler MR, Narechania A, Simon S, Durrett R, Foox J, Shianna K, Schatz M, Gandara J, others (2016).  Genome assembly and geospatial phylogenomics of the bed bug Cimex lectularius.  Nature communications.  7.
  • Nattestad M, Schatz M (2016).  Assemblytics: a web analytics tool for the detection of variants from an assembly.  Bioinformatics.  32(19).  3021-3023.
  • Lemmon ZH, Park SJ, Jiang K, Van Eck J, Schatz M, Lippman ZB (2016).  The evolution of inflorescence diversity in the nightshades and heterochrony during meristem maturation.  Genome Research.  26(12).  1676-1686.
  • Goodwin S, Gurtowski J, Ethe-Sayers S, Deshpande P, Schatz M, McCombie WR (2015).  Oxford Nanopore sequencing and de novo assembly of a eukaryotic genome.  bioRxiv.  013490.
  • Church DM, Schneider VA, Steinberg KM, Schatz M, Quinlan AR, Chin C, Kitts PA, Aken B, Marth GT, Hoffman MM, others (2015).  Extending reference assembly models.  Genome Biol.  16.  13.
  • Garvin T, Aboukhalil R, Kendall J, Baslan T, Atwal GS, Hicks J, Wigler M, Schatz M (2015).  Interactive analysis and assessment of single-cell copy-number variations.  Nature methods.  12(11).  1058-1060.
  • Schatz M (2015).  Biological data sciences in genome research.  Genome research.  25(10).  1417-1422.
  • Zhou X, Battistoni G, El Demerdash O, Gurtowski J, Wunderer J, Falciatori I, Ladurner P, Schatz M, Hannon GJ, Wasik KA (2015).  Dual functions of Macpiwi1 in transposon silencing and stem cell maintenance in the flatworm Macrostomum lignano.  RNA.  21(11).  1885-1897.
  • Wences AH, Schatz M (2015).  Metassembler: Merging and optimizing de novo genome assemblies.  Genome biology.  16(1).  1-10.
  • MacColl E, Therkelsen MD, Sherpa T, Ellerbrock H, Johnston LA, Jariwala RH, Chang W, Gurtowski J, Schatz M, Hossain MM, others (2015).  Molecular genetic diversity and characterization of conjugation genes in the fish parasite Ichthyophthirius multifiliis.  Molecular phylogenetics and evolution.  86.  1-7.
  • Smolka M, Rescheneder P, Schatz M, von Haeseler A, Sedlazeck FJ (2015).  Teaser: Individualized benchmarking and optimization of read mapping results for NGS data.  Genome biology.  16(1).  1-10.
  • Ming R, VanBuren R, Wai CM, Tang H, Schatz M, Bowers JE, Lyons E, Wang M, Chen J, Biggers E, others (2015).  The pineapple genome and the evolution of CAM photosynthesis.  Nature genetics.  47(12).  1435-1442.
  • Narzisi G, Schatz M (2015).  The challenge of small-scale repeats for indel discovery.  Frontiers in bioengineering and biotechnology.  3.
  • Stephens ZD, Lee SY, Faghri F, Campbell RH, Zhai C, Efron MJ, Iyer R, Schatz M, Sinha S, Robinson GE (2015).  Big data: astronomical or genomical?  PLoS Biol.  13(7).  e1002195.
  • Schatz M (2015).  The next 20 years of genome research.  bioRxiv.  020289.
  • Garvin T, Aboukhalil R, Kendall J, Baslan T, Atwal GS, Hicks J, Wigler M, Schatz M (2014).  Interactive analysis and quality assessment of single-cell copy-number variations.  bioRxiv.  011346.
  • Marcus S, Lee H, Schatz M (2014).  SplitMEM: Graphical pan-genome analysis with suffix skips.  bioRxiv.  003954.
  • Fang H, Wu Y, Narzisi G, O’Rawe JA, Barron L, Rosenbaum J, Ronemus M, Iossifov I, Schatz M, Lyon GJ (2014).  Reducing INDEL calling errors in whole genome and exome sequencing data.  Genome Med.  6(10).  89.
  • Lee H, Gurtowski J, Yoo S, Marcus S, McCombie WR, Schatz M (2014).  Error correction and assembly complexity of single molecule sequencing reads.  bioRxiv.  006395.
  • Narzisi G, O’Rawe JA, Iossifov I, Fang H, Lee Y, Wang Z, Wu Y, Lyon GJ, Wigler M, Schatz M (2014).  Accurate detection of de novo and transmitted INDELs within exome-capture data using micro-assembly.  bioRxiv.  001370.
  • Schatz M, Maron LG, Stein JC, Wences AH, Gurtowski J, Biggers E, Lee H, Kramer M, Antoniou E, Ghiban E, others (2014).  New whole genome de novo assemblies of three divergent strains of rice (O. sativa) documents novel gene space of aus and indica.  bioRxiv.  003764.
  • Schatz M, Maron LG, Stein JC, Wences AH, Gurtowski J, Biggers E, Lee H, Kramer M, Antoniou E, Ghiban E, others (2014).  Whole genome de novo assemblies of three divergent strains of rice, Oryza sativa, document novel gene space of aus and indica.  Genome biology.  15(11).  1-16.
  • Titmus MA, Gurtowski J, Schatz M (2014).  Answering the demands of digital genomics.  Concurrency and Computation: Practice and Experience.  26(4).  917-928.
  • Ming R, VanBuren R, Liu Y, Yang M, Han Y, Li L, Zhang Q, Kim M, Schatz M, Campbell M, others (2013).  Genome of the long-living sacred lotus (Nelumbo nucifera Gaertn.).  Genome Biology.  14(5).  1-11.
  • Doolittle WF, Fraser P, Gerstein MB, Graveley BR, Henikoff S, Huttenhower C, Oshlack A, Ponting CP, Rinn J, Schatz M, others (2013).  Sixty years of genome biology.  Genome Biol.  14(4).  113-119.
  • Roberts RJ, Carneiro MO, Schatz M (2013).  The advantages of SMRT sequencing.  Genome Biol.  14(6).  405.
  • Schatz M, Taylor J, Schelhorn S (2013).  The DNA60IFX contest.  Genome biology.  14(6).  124.
  • Schatz M, Langmead B (2013).  The DNA data deluge.  Spectrum, IEEE.  50(7).  28-33.
  • Schatz M, Phillippy AM, Sommer DD, Delcher AL, Puiu D, Narzisi G, Salzberg SL, Pop M (2013).  Hawkeye and AMOS: visualizing and assessing the quality of genome assemblies.  Briefings in bioinformatics.  14(2).  213-224.
  • Saw JH, Schatz M, Brown MV, Kunkel DD, Foster JS, Shick H, Christensen S, Hou S, Wan X, Donachie SP (2013).  Cultivation and complete genome sequencing of Gloeobacter kilaueensis sp. nov., from a lava cave in Kilauea Caldera, Hawai'i.  PloS one.  8(10).  e76376.
  • Park SJ, Jiang K, Schatz M, Lippman ZB (2012).  Rate of meristem maturation determines inflorescence architecture in tomato.  Proceedings of the National Academy of Sciences.  109(2).  639-644.
  • Schatz M, Witkowski J, McCombie WR, others (2012).  Current challenges in de novo plant genome sequencing and assembly.  Genome Biol.  13(4).  243.
  • Salzberg SL, Phillippy AM, Zimin A, Puiu D, Magoc T, Koren S, Treangen TJ, Schatz M, Delcher AL, Roberts M, Marçais G, Pop M, Yorke JA (2012).  GAGE: A critical evaluation of genome assemblies and assembly algorithms.  Genome research.  22.  557-567.
  • Gurtowski J, Schatz M, Langmead B (2012).  Genotyping in the cloud with crossbow.  Current Protocols in Bioinformatics.  15-3.
  • Lee H, Schatz M (2012).  Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score.  Bioinformatics.  28(16).  2097-2105.
  • Koren S, Schatz M, Walenz BP, Martin J, Howard JT, Ganapathy G, Wang Z, Rasko DA, McCombie WR, Jarvis ED, others (2012).  Hybrid error correction and de novo assembly of single-molecule sequencing reads.  Nature biotechnology.  30(7).  693-700.
  • Salzberg SL, Phillippy AM, Zimin A, Puiu D, Magoc T, Koren S, Treangen TJ, Schatz M, Delcher AL, Roberts M, others (2012).  GAGE: A critical evaluation of genome assemblies and assembly algorithms.  Genome research.  22(3).  557-567.
  • Price JC, Udall JA, Bodily PM, Ward JA, Schatz M, Page JT, Jensen JD, Snell QO, Clement MJ (2012).  De novo identification of "heterotigs" towards accurate and in-phase assembly of complex plant genomes.  Proceedings of the International Conference on Bioinformatics & Computational Biology (BIOCOMP).  207.
  • Schatz M (2012).  Illuminating the genetics of complex human diseases.  BMC Proceedings.  6(Suppl 6).  O4.
  • Schatz M, Phillippy AM (2012).  The rise of a digital immune system.  GigaScience.  1(1).  4.
  • Schatz M (2012).  Computational thinking in the era of big data biology.  Genome biology.  13(11).  177.
  • Donia MS, Fricke WF, Partensky F, Cox J, Elshahawi SI, White JR, Phillippy AM, Schatz M, Piel J, Haygood MG, others (2011).  Complex microbiome underlying secondary and primary metabolism in the tunicate-Prochloron symbiosis.  Proceedings of the National Academy of Sciences.  108(51).  E1423-E1432.
  • Schatz M, Langmead B, Salzberg SL (2010).  Cloud computing and the DNA data race.  Nature biotechnology.  28(7).  691.
  • Schatz M, others (2010).  The missing graphical user interface for genomics.  Genome Biol.  11(8).  128.
  • Schatz M (2010).  High Performance Computing for DNA Sequence Alignment and Assembly.
  • Schatz M, Phillippy AM, Gajer P, DeSantis TZ, Andersen GL, Ravel J (2010).  Integrated microbial survey analysis of prokaryotic communities for the PhyloChip microarray.  Applied and environmental microbiology.  76(16).  5636-5638.
  • Schatz M, Sommer D, Kelley D, Pop M (2010).  De Novo assembly of large genomes using cloud computing.  Proceedings of the Cold Spring Harbor Biology of Genomes Conference.
  • Kingsford C, Schatz M, Pop M (2010).  Assembly complexity of prokaryotic genomes using short reads.  BMC bioinformatics.  11(1).  1.
  • Cornman RS, Schatz M, Johnston J, Chen Y, Pettis J, Hunt G, Bourgeois L, Elsik C, Anderson D, Grozinger CM, others (2010).  Genomic survey of the ectoparasitic mite Varroa destructor, a major pest of the honey bee Apis mellifera.  BMC genomics.  11(1).  1.
  • Kelley DR, Schatz M, Salzberg SL (2010).  Quake: quality-aware detection and correction of sequencing errors.  Genome Biol.  11(11).  R116.
  • Schatz M, Delcher AL, Salzberg SL (2010).  Assembly of large genomes using second-generation sequencing.  Genome research.  20(9).  1165-1173.
  • Schatz M (2009).  Scalable Solutions for DNA Sequence Analysis.
  • Langmead B, Schatz M, Lin J, Pop M, Salzberg SL (2009).  Searching for SNPs with cloud computing.  Genome Biol.  10(11).  R134.
  • Schatz M (2009).  CloudBurst: highly sensitive read mapping with MapReduce.  Bioinformatics.  25(11).  1363-1369.
  • Navlakha S, Schatz M, Kingsford C (2009).  Revealing biological modules via graph summarization.  Journal of Computational Biology.  16(2).  253-264.
  • Cornman RS, Chen YP, Schatz M, Street C, Zhao Y, Desany B, Egholm M, Hutchison S, Pettis JS, Lipkin WI, others (2009).  Genomic analyses of the microsporidian Nosema ceranae, an emergent pathogen of honey bees.  PLoS Pathog.  5(6).  e1000466.
  • Schatz M (2009).  High Throughput Sequence Analysis with MapReduce.
  • Zimin AV, Delcher AL, Florea L, Kelley DR, Schatz M, Puiu D, Hanrahan F, Pertea G, Van Tassell CP, Sonstegard TS, others (2009).  A whole-genome assembly of the domestic cow, Bos taurus.  Genome Biol.  10(4).  R42.
  • Trapnell C, Schatz M (2009).  Optimizing data intensive GPGPU computations for DNA sequence alignment.  Parallel computing.  35(8).  429-440.
  • Langmead B, Schatz M, Lin J, Pop M, Salzberg SL (2009).  Human SNPs from short reads in hours using cloud computing.
  • Suzuki JY, Tripathi S, Fermín GA, Jan F, Hou S, Saw JH, Ackerman CM, Yu Q, Schatz M, Pitz KY, others (2008).  Characterization of insertion sites in Rainbow papaya, the first commercialized transgenic fruit crop.  Tropical Plant Biology.  1(3-4).  293-309.
  • Schatz M, Cooper-Balis E, Bazinet A (2008).  Parallel network motif finding.  Technical report, University of Maryland Institute for Advanced Computer Studies.
  • Desjardins CA, Gundersen-Rindal DE, Hostetler JB, Tallon LJ, Fadrosh DW, Fuester RW, Pedroni MJ, Haas BJ, Schatz M, Jones KM, others (2008).  Comparative genomics of mutualistic viruses of Glyptapanteles parasitic wasps.  Genome Biol.  9(12).  R183.
  • Schatz M (2008).  BlastReduce: high performance short read mapping with MapReduce.  University of Maryland, http://cgis.cs.umd.edu/Grad/scholarlypapers/papers/MichaelSchatz.pdf.
  • Salzberg SL, Sommer DD, Schatz M, Phillippy AM, Rabinowicz PD, Tsuge S, Furutani A, Ochiai H, Delcher AL, Kelley D, others (2008).  Genome sequence and rapid evolution of the rice pathogen Xanthomonas oryzae pv. oryzae PXO99A.  BMC genomics.  9(1).  204.
  • Phillippy AM, Schatz M, Pop M (2008).  Genome assembly forensics: finding the elusive mis-assembly.  Genome Biol.  9(3).  R55.
  • Schatz M, Trapnell C (2007).  Fast exact string matching on the GPU.  Center for Bioinformatics and Computational Biology.
  • Desjardins CA, Gundersen-Rindal DE, Hostetler JB, Tallon LJ, Fuester RW, Schatz M, Pedroni MJ, Fadrosh DW, Haas BJ, Toms BS, others (2007).  Structure and evolution of a proviral locus of Glyptapanteles indiensis bracovirus.  BMC microbiology.  7(1).  1.
  • Schatz M, Phillippy AM, Shneiderman B, Salzberg SL (2007).  Hawkeye: an interactive visual analytics tool for genome assemblies.  Genome biology.  8(3).  R34.
  • Schatz M, Trapnell C, Delcher AL, Varshney A (2007).  High-throughput sequence alignment using Graphics Processing Units.  BMC bioinformatics.  8(1).  474.
  • Carlton JM, Hirt RP, Silva JC, Delcher AL, Schatz M, Zhao Q, Wortman JR, Bidwell SL, Alsmark UC, Besteiro S, others (2007).  Draft genome sequence of the sexually transmitted pathogen Trichomonas vaginalis.  Science.  315(5809).  207-212.
  • Gajer P, Schatz M, Salzberg SL (2004).  Automated correction of genome sequence errors.  Nucleic acids research.  32(2).  562-569.
  • Schatz M, Gajer P, Salzberg S (2004).  Automated correction of genome sequence errors.  Nucleic acids research.  32(2).  562-569.
Book Chapters
  • Narzisi G, Mishra B, Schatz M (2014).  On algorithmic complexity of biomolecular sequence assembly problem.  Algorithms for Computational Biology.  Springer International Publishing.  183-195.
Other Publications
  • Schatz M (2012).  Entering the Era of Mega-genomics (JGI Seventh Annual User Meeting 2012: Genomics of Energy and Environment).  DOE Joint Genome Institute (JGI), LBNL (Lawrence Berkeley National Laboratory (LBNL), Berkeley, CA (United States)).
Conference Proceedings
  • Amin MR, Skiena S, Schatz M (2016).  NanoBLASTer: Fast alignment and characterization of Oxford Nanopore single molecule sequencing reads.  Computational Advances in Bio and Medical Sciences (ICCABS), 2016 IEEE 6th International Conference on.  1-6.
  • Schatz M (2015).  The resurgence of reference quality genome sequence.  Plant and Animal Genome XXIII Conference.
  • Schatz M (2015).  Perfect Long Read Assembly and the Rise of Pan-Genomics.  Plant and Animal Genome XXIII Conference.
  • Schatz M (2014).  de novo Assembly of Complex Genomes Using Single Molecule Sequencing.  Plant and Animal Genome XXII Conference.
  • Schatz M (2014).  Variation & RNAseq Service in KBase.  Plant and Animal Genome XXII Conference.
  • Blood PD, Marcus S, Schatz M (2014).  Large-scale Sequencing and Assembly of Cereal Genomes Using Blacklight.  Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment.  20.
  • Schatz M (2012).  De novo assembly of complex genomes using 3rd generation sequencing.  Plant and Animal Genome XX Conference (January 14-18, 2012).
  • Menon RK, Bhat GP, Schatz M (2011).  Rapid parallel genome indexing with MapReduce.  Proceedings of the second international workshop on MapReduce and its applications.  51-58.
  • Lin J, Schatz M (2010).  Design patterns for efficient graph algorithms in MapReduce.  Proceedings of the Eighth Workshop on Mining and Learning with Graphs.  78-85.