Network Will Manage ‘Data Deluge’

Winter 2019

Alexander Szalay
(Image: Joey Pulone)

Alexander Szalay, director of the Institute for Data Intensive Engineering and Science, is the principal investigator on a two-year national effort to build a network that allows scientists to more efficiently store and analyze caches of data and share them with other researchers. The Open Storage Network project is underwritten by a $1.8 million grant from the National Science Foundation.

“The goal is to create a robust, industrial-strength national storage substrate that can impact 80 percent of the NSF research community,” says Szalay, a Bloomberg Distinguished Professor who holds appointments in computer science in the Whiting School of Engineering and in physics and astronomy in the Krieger School of Arts and Sciences.

The network’s eventual build-out may cost between $20 million and $30 million in hardware and software, a relatively modest investment that “could completely change the academic big data landscape,” says Szalay.

Even a conservative projection of the universities that would eventually join puts OSN at roughly 200 petabytes, which would make it one of the largest distributed data storage networks dedicated to science in the world, with economies of scale that would make managing huge data sets cheaper for everyone involved.

Szalay has a deep interest in how all of science is managing the “data deluge”—the avalanche of data that advanced scientific methods make available to researchers studying complex questions as diverse as the origin of the universe, climate change, and the genetic origins of disease.

Other OSN team members come from the National Data Service and each of the four NSF-funded Big Data Regional Innovation Hubs: the West hub in California, the Midwest hub in Illinois, the South hub in North Carolina and Georgia, and the Northeast hub in Massachusetts and Pennsylvania.

Additional software and service layers will be added as the network is developed, notes the NSF.