Data Driven

Fall 2011

As computing ability jumps by leaps and bounds, researchers wrestle with making the best use—and reuse—of all that data.

On July 16, 1969, Apollo 11 left the Earth bound for the moon with the most sophisticated guidance computer of its time. Built by the finest engineering minds, it held the lives of three Americans and the hopes of the entire planet in its magnetic core memory.

All 32 kilobytes of it.

These days, one doesn’t have to travel into space to find that kind of computational capacity. Heck, just reach into your front pocket; the average 32 GB iPhone has 1 million times the computing memory that navigated for Neil Armstrong and Co.

Quantifying this quantum jump in computing power often comes down to one law: Moore’s law. In 1965, Gordon Moore, who would go on to co-found Intel, noted that the number of transistors that could be placed on a chip was roughly doubling every year. Ever since, Moore’s law has held almost true (the doubling has occurred more on the order of every two years, but still…).
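
For readers who like to check the arithmetic, here is a quick back-of-the-envelope sketch in Python. It is purely illustrative; the constants and the binary-gigabyte convention are assumptions, and the exact figures shift a bit depending on whether you count binary or decimal gigabytes, but either way the "1 million times" comparison and the Moore's-law doublings land in the same neighborhood:

```python
# Back-of-the-envelope arithmetic behind the comparison above (illustrative only).

APOLLO_MEMORY_BYTES = 32 * 1024          # ~32 KB, the figure quoted for the Apollo guidance computer
IPHONE_STORAGE_BYTES = 32 * 1024**3      # a 32 GB iPhone, treating a GB as 2**30 bytes

ratio = IPHONE_STORAGE_BYTES / APOLLO_MEMORY_BYTES
print(f"Storage ratio: {ratio:,.0f}x")   # 1,048,576x -- roughly "1 million times"

# Moore's-law flavor: doubling every two years from 1969 to 2011
doublings = (2011 - 1969) / 2            # about 21 doublings
print(f"{doublings:.0f} doublings -> growth factor of {2**doublings:,.0f}")
```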

Moore’s law speaks to a chip’s transistor count and, by extension, its processing power, but data storage has clearly kept pace. Entire genomes can now be stored on drives that fit in the palm of the hand, while galaxies might call for something a little bigger—like a drive the size of a cigar box.

That ability to collect and store data has infiltrated nearly every aspect of the human condition while captivating human imagination. From exploring the heavens to researching repetitive motion injuries, if an action can be observed—whether it’s stargazing or typing—it stands a very good chance these days of being quantified and placed in a database.

Organizing all that data—and making it more easily accessible—is arguably the next great wave in computing, akin to the way the Dewey Decimal System brought order to what had been the chaos of referencing the printed word.

“I think of computing as having gone through three generations at this point,” says Greg Hager, chair and professor of computer science at the Whiting School. “We had the hardware generation, which was concerned with constructing the computer, the software generation where we were more concerned with what ran inside the computer, and now the data generation—which is about what computing can do to essentially take data and turn it into usable information.”

Information that for many Whiting School faculty is changing the way they do business and, in turn, impacting how each of us lives our lives.

 

MODEL HEART

Even before Apollo soared into space, or Gemini, or even Mercury, the human heart was the target of some of the first computational modeling. It is a story Rai Winslow, professor of biomedical engineering and director of the Whiting School’s Institute for Computational Medicine, is fond of telling. How, in November of 1960, a lone wolf researcher named Denis Noble published the first paper showing how heart cells—aka cardiac myocytes—functioned, notably how they could fire off long-lasting bolts of electricity, the first clue to how the heart as an organ contracted and acted as a pump.

It was glorious work that, Winslow laughs, Noble conducted under less than ideal conditions at the University of Oxford. “[His model] was done on an old computer, very slow, very little memory—something like 16 kilobytes—it was hard to program and took a long time to even simulate one [myocyte electrical] action potential. The computer was hard to get access to; he sneaked into the basement where this computer was, programmed it at night … but it was a tour de force of modeling at the time.”
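
To give a flavor of what “simulating an action potential” involves, here is a deliberately tiny sketch in Python. It steps a pair of coupled differential equations forward in time and watches the voltage variable spike and recover. The equations are the textbook FitzHugh-Nagumo simplification of an excitable cell, used here only as a stand-in; the function name and parameter values are our own, and Noble’s actual model tracked individual ionic currents and was considerably more elaborate.

```python
# A toy flavor of "simulating an action potential": integrate a small set of
# differential equations through time. This is the two-variable FitzHugh-Nagumo
# model, a classic simplification of excitable-cell dynamics -- NOT Noble's
# cardiac equations, which are far more detailed.

def simulate(I_stim=0.5, dt=0.01, steps=10_000, a=0.7, b=0.8, eps=0.08):
    v, w = -1.2, -0.6                    # membrane "voltage" and slow recovery variable
    trace = []
    for _ in range(steps):
        dv = v - v**3 / 3 - w + I_stim   # fast voltage dynamics
        dw = eps * (v + a - b * w)       # slow recovery dynamics
        v += dt * dv                     # simple forward-Euler time step
        w += dt * dw
        trace.append(v)
    return trace

trace = simulate()
print(f"peak 'voltage' reached: {max(trace):.2f}")  # the upstroke of an action potential
```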

Some 30 years later, that paper and its author changed the course of Winslow’s career. The two met by happenstance and found their work, though separated by a generation, had a computational link. Winslow was modeling electrical activity in a different part of the body. He was investigating neural information processing—collecting thousands of readings and millions of data points to reconstruct the electrical action of neurons up and down the visual pathway from the retina to the brain. The goal: to better understand how the eye encoded vision in a way that the brain could process.

“Denis saw how we were using large-scale parallel computing to simulate neural network models, no one had done that before, and his response was ‘Gee…you have [the equivalent of] a heart here. You just need to put in different [electrical] currents.’” By looking at Winslow’s leap from the work of a cell to an entire system, “Noble realized that the cell modeling he had been doing could be translated to the level of tissue and the whole heart using these same computer modeling techniques,” says Winslow.

That’s exactly what Winslow has done, embarking on a fascinating journey that has allowed him to model the heart on both a macro- and micro-level. In the 1990s, collaborating with Noble, Winslow uncovered how the heart’s spark plug—the sinus node—functioned. It seemed so incongruous to scientists: How could a tiny clump of cells located in the northwest corner of the heart essentially run the whole show? “You’d think that the [heart] is so big that node couldn’t generate enough current to do that,” says Winslow.

It turns out the elegant node didn’t have to provide the entire electrical wallop, just a wee oscillation of current—enough to activate the microscopic filaments of atrial [heart] tissue that penetrate deep into the node. Like a roar that starts with a single voice, the node’s spark sends its voltage out the filaments, and the atrial tissue does the rest of the work, convincing other, larger cardiac cells to add their juice to the wave that spreads over the heart, causing it to contract in time.

Winslow had successfully modeled a healthy heart, which led to an obvious question. What happens in a diseased heart marked by notoriously poor electrical timing, such as the common case of congestive heart failure? To model that, Winslow has gone back to where it all started—those cardiac myocytes—but at a level of detail only recently made possible by advanced computer imaging and data processing. Winslow was able to dissect what he says were “terabytes” of information to make a rather startling and somewhat apropos discovery: It turns out that not only is the heart a pump, but each myocyte is also a pump—perhaps many of them—that regulates the flow of calcium, sodium, and potassium throughout the cell. Those internal mechanisms control both the cell’s ability to create energy and the timing by which it fires out of the cell.

Winslow’s models—and the data upon which they are built—have uncovered a connection between failing cells and a lower release of calcium during heart contractions. Armed with that knowledge, Winslow says at least one drug company is already investigating whether it can target the calcium imbalance and perhaps improve failing hearts.

“I think that’s the point of these kinds of models,” says Winslow. “If you find a compound that blocks a particular calcium pump, how does that fit into the big picture of the networks that molecule participates in?

“It’s a systems-level problem, and that’s the power of these sophisticated models … to help point out targets in treating disease.”

And that, my friends, gets right to the heart of the matter.

 

EASY ACCESS: A TOUGH PROBLEM

It’s one thing to create reams of data but quite another to have anyone else come along and make sense of it. That’s the conundrum facing Sayeed Choudhury, associate dean for Library Digital Programs at Hopkins’ Sheridan Libraries. The Sheridan Libraries are the information nexus for all of Hopkins and for researchers around the world. Choudhury’s job is to make sure the digital end—which is quickly becoming the majority of the various collections—remains accessible as data storage technology constantly changes.

That old library model of placing a book on a shelf for a decade until some curious researcher decides to retrieve it is quickly changing. Imagine said researcher opening those pages to find them either blank or horribly jumbled, and you get some idea of the challenge facing Choudhury ’88, MSE ’90, who is both a librarian and a Whiting School alumnus.

“Our view is preservation. When you’re talking about digital content, there are plenty of cases of data generated five years ago where I could not go to a researcher today and say, ‘Could you please give me those data?’” Part of the issue is how and where data has been stored. Think of having a 5 ¼-inch floppy disk in your possession. Sure, there may still be information on it, but it’s like owning a Betamax tape: Where are you going to find a playback unit?

“I’ve said that in five years, a 1-terabyte hard drive [the current standard] is going to be like a 5 ¼-inch floppy disk,” notes Choudhury, adding that there’s more than technological obsolescence facing data collection. There’s often a false sense of security, what Choudhury calls “a naïve approach that ‘I’ve got copies of [the data], so what’s the problem?’

“The problem is that copies get corrupted—you ought to check those copies regularly. And even if you do everything right from the viewpoint of the bits and the media, on a deeper fundamental level, there’s context associated with data. [Researchers] tell you an amazing story of their data and what they’re doing with it, but I can assure you they have often not documented all of that in a way that someone without any knowledge of the work could come along and say, ‘OK, I get it, I can take your data and run my own analysis against it.’”
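
Checking those copies is the kind of chore a few lines of code can automate. The sketch below is illustrative only (the file paths and helper name are hypothetical): it computes a SHA-256 fingerprint of a master file and compares it against each replica, the basic idea behind the fixity checks that digital archives run on a schedule.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical master copy and replicas -- substitute real paths.
master = Path("archive/master/survey_2011.dat")
replicas = [Path("archive/mirror_a/survey_2011.dat"),
            Path("archive/mirror_b/survey_2011.dat")]

reference = sha256_of(master)
for copy in replicas:
    status = "OK" if sha256_of(copy) == reference else "CORRUPTED -- restore from a good copy"
    print(f"{copy}: {status}")
```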

Given his unique background, Choudhury sees himself as the human interface between researchers who create the data, librarians who want to make that data easily searchable for users, and software engineers who can design the appropriate algorithms and metadata context to make it so. On a pragmatic level, that means he has lots of conversations with everyone from storage manufacturers to scientists anticipating future use of their data.

Working with National Science Foundation grants, Choudhury has reached out to large data collection projects such as the Hopkins-led Sloan Digital Sky Survey, which bills itself as “the most ambitious astronomical survey ever undertaken.” Its goal is to map literally a quarter of the heavens, and that’s a lot of data … on the order of 140 terabytes. “Sloan can keep its data on disks, back it up, and have IT experts who can fine-tune the databases; it’s part of their budget,” says Choudhury. “We interfaced with this group about preservation as their project was winding down. They concluded it’s preferable for us to provide long-term curation. We have definitely learned a great deal [from them] that we’re applying to other domains or projects.”

Pretty soon it may be the scientists who are first approaching Choudhury. There’s nothing like money to motivate, and Choudhury notes that the National Science Foundation now requires all principal investigators to include a two-page data management plan in their proposals, a guarantee that their results won’t end up in the data desert.

Choudhury says his team is helping researchers formulate data retention strategies—“We’ve heard from people saying, ‘Yeah, I understand I have to do this, but honestly, it’s useful, because I can’t even get back to the data I produced five years ago,’” he says—and also sparking curiosity. As word spreads, Choudhury has found some researchers thrilled that their colleagues’ data may be coming online. “[They say], ‘That will help with collaboration more than anything, because the best insight I get into somebody’s research is the kind of data they produce and the kind of questions they ask about those data.’”

Now that’s what we call recycling.