Ever since the first draft of the human genome was published nearly 15 years ago, researchers have been working hard to make sense of the information encoded in the 3 billion base pairs that make up our genome, especially the parts that might have implications for health. Thus far, they’ve linked thousands of genes to diseases ranging from cancers to autism, mostly through genetic association studies. These studies look for commonalities among the genetic profiles of people, in groups numbering from a handful to hundreds of thousands, to connect specific genetic variants to diseases.
But what if a genetic variant responsible for a patient’s disease is very rare, perhaps so rare that only one person sequenced in the entire world has it? With a single carrier, there are no commonalities to look for, which makes an association test mathematically impossible.
However, says Alexis Battle, an assistant professor in the Department of Computer Science and member of the Computational Biology and Medicine Group and Machine Learning and Data Intensive Computing Group, it may still be possible to attach meaning even to very rare genetic variants using machine learning.
Even though researchers may have never seen a particular genetic variant before, they’ve often seen other variants that have something in common with it, explains Battle. Maybe it’s located close in the genome to genes that they already know something about. Or perhaps it binds proteins that are known to bind to other disease-related genes. Or the sequence near it might be similar to better-studied regions of the genome.
Battle and her colleagues are programming computers to analyze thousands of data points, combining diverse medical and biological measurements from individuals with rare genetic variants. Anything that might offer a clue to the function of those variants is valuable, Battle says.
“If we can say anything at all about these rare variants,” she adds, “it could have a huge impact on countless diseases.”