The Data Science Master’s degree at the Johns Hopkins University provides training in applied mathematics, statistics, and computer science as the basis for understanding and appreciating existing data science tools. Our program aims to produce the next generation of leaders in data science by emphasizing mastery of the skills needed to translate real-world data-driven problems into mathematical ones, and then to solve those problems using a diverse collection of scientific tools.
In addition to Introduction to Data Science (EN.553.636), students will take one course in each of the four core areas: Statistics, Machine Learning, Optimization, and Computing. Students will decide on an area of focus and take three courses in one of Computational Medicine, Computational Machine Learning, Computer Vision, Computational Finance, Mathematics of Data Science, Language and Speech, or Statistical Theory. The final capstone consists of Big Data Design (EN.553.806), or another project-oriented course approved by the faculty advisor and the Internal Oversight Committee, together with a written paper on a related topic (approved by the instructor) that includes a deeper study of the pre-approved topic. The goal of the final course and written paper is to allow the student to apply data analysis techniques learned in the program, and possibly to extend those ideas to more general settings or to new application areas. Lastly, the paper will be summarized in a poster presented at a poster session organized at the end of each semester.
- A two-day orientation program will precede the first Fall semester of enrollment.
- EN.553.636 Introduction to Data Science, and
One course in each of the four Core Areas below. Courses chosen in this section must be distinct from the courses used to satisfy requirements 2 and 3.
- Statistics – Introduction to Statistics (EN.553.630), Bayesian Statistics (EN.553.632), Statistical Theory I (EN.553.730), Statistical Theory II (EN.553.731)
- Machine Learning – Statistical Machine Learning: Methods, Theory, and Applications (PH.140.644), Machine Learning (EN.601.675), Statistical Machine Learning (EN.601.775), Machine Learning (EN.553.740)
- Optimization – Nonlinear Optimization I (EN.553.761), Nonlinear Optimization II (EN.553.762), Convex Optimization (EN.553.765)
- Computing – Data Processing (EN.553.688), Parallel Programming (EN.601.620)
Three courses from one of the following focus areas.
- Computational Medicine: Computational medicine is an interdisciplinary field that combines mathematics, computer science, medicine, and engineering to analyze and interpret biological and medical data. The following courses are approved for this focus area: Introduction to Bioinformatics (AS.410.633), Bioinformatics: Tools for Genome Analysis (AS.410.635), Methods in Proteomics (AS.410.661), Gene Expression Data Analysis and Visualization (AS.410.671), Computational Molecular Medicine (EN.553.650), Algorithms for Bioinformatics (EN.605.620) or Foundations of Algorithms (EN.605.621), Computational Genomics (EN.605.653), Analysis of Gene Expression and High-Content Biological Data (EN.605.754).
- Computational Machine Learning: Computational machine learning uses methods from linear algebra, statistics and optimization to build machines that can learn to make predictions from data. The following courses are approved for this focus area: Statistical Machine Learning: Methods, Theory, and Applications (PH.140.644), Information Theory (EN.520.447), Machine Learning for Signal Processing (EN.520.612), Compressed Sensing and Sparse Recovery (EN.520.648), Random Signal Analysis (EN.520.651), Machine Learning (EN.553.740), Graphical Models (EN.553.743), Machine Learning (EN.601.675), Machine Learning: Data to Models (EN.601.676), Machine Learning: Optimization (EN.601.681), Causal Inference (EN.601.677), Machine Learning: Representation Learning (EN.601.679), Statistical Machine Learning (EN.601.775), Unsupervised Learning: Big Data to Low-Dimensional Representations (EN.601.780).
- Computer Vision: Compressed Sensing and Sparse Recovery (EN.520.648), Computer Vision (EN.601.661), Machine Learning: Deep Learning (EN.601.682), Vision as Bayesian Inference (EN.601.783).
- Computational Finance: Stochastic Processes in Finance I, II (EN.553.627, EN.553.628), Equity Markets and Quantitative Trading (EN.553.641), Investment Science (EN.553.642), Introduction to Financial Derivatives (EN.553.644), Interest Rate and Credit Derivatives (EN.553.645), Risk Measurement and Management in Financial Markets (EN.553.646), Quantitative Portfolio Theory & Performance Analysis (EN.553.647), Financial Engineering and Structured Products (EN.553.648), Advanced Equity Derivatives (EN.553.649), Commodities and Commodity Markets (EN.553.753).
- Mathematics of Data Science: Monte Carlo Methods (EN.553.633), Introduction to Convexity (EN.553.665), High-Dimensional Approximation, Probability, and Statistical Learning (EN.553.738), Machine Learning (EN.553.740), Nonlinear Optimization I, II (EN.553.761, EN.553.762), Stochastic Search and Optimization (EN.553.763), Convex Optimization (EN.553.765), Combinatorial Optimization (EN.553.766), Matrix Analysis (EN.553.792), Randomized and Big Data Algorithms (EN.601.634), Approximation Algorithms (EN.601.635).
- Language and Speech: Semantics I, II (AS.050.617, AS.050.622), Syntax (AS.050.620), Phonology (AS.050.625), Information Extraction (EN.520.666), Speech and Auditory Processing by Humans and Machines (EN.520.680), Natural Language Processing (EN.601.665), Machine Learning: Linguistic and Sequence Modeling (EN.601.765).
- Statistical Theory: Statistical Machine Learning: Methods, Theory, and Applications (PH.140.644), Bayesian Statistics (EN.553.632), Statistical Theory I (EN.553.730), Statistical Theory II (EN.553.731), Topics in Statistical Pattern Recognition (EN.553.735), Distribution-free Statistics and Resampling Methods (EN.553.737), High-Dimensional Approximation, Probability, and Statistical Learning (EN.553.738), Statistical Pattern Recognition Theory & Methods (EN.553.739), Causal Inference (EN.601.677), Statistical Machine Learning (EN.601.775).
- The program requires the student to take one elective course. To maximize a student’s flexibility in choosing this course, the student may choose any course offered at JHU that is directly or indirectly related to data science. The elective course must be approved by the student’s advisor as well as the Internal Oversight Committee.
- Big Data Design (EN.553.806), or another project-oriented course approved by the faculty advisor and the Internal Oversight Committee.
- In addition to taking the course, the student must write a paper on a related topic, approved in advance by the course instructor. The paper should include a deeper study of the pre-approved topic that allows the student to apply data analysis techniques learned in the program, and possibly to extend those ideas to more general settings or to new application areas.
- The written paper will be summarized in a poster and presented at a poster session organized at the end of each semester.
Prior to Fall Semester of Year 1
- Orientation Program (2 days)
Year 1: Fall Semester
- Introduction to Data Science
- Area of Focus 1
Year 1: Winter Break
- Online Data Ethics Course
Year 1: Spring Semester
- Area of Focus 2
- Area of Focus 3
Year 2: Fall Semester