Combing the Deep Web

Summer 2015

Chris White, PhD ’09, thinks that the current “one-size-fits-all” Internet is not suited for data analysis. “You use the same search engine to buy birthday presents that law enforcement uses for investigations,” says White, the Defense Advanced Research Projects Agency’s former country lead for Afghanistan. “There’s a need for technology that allows for domain-specific discovery of information on the Web and a domain-specific mechanism for evaluating it.”

So White, now a program manager at DARPA, has built Memex, a program that uses algorithms to extract content from text, images, and videos, then brings that content back in aggregate for analysis. The program can also find information on the deep Web, where content is hidden from commercial search engines. In sex trafficking, Memex’s first domain, the program often spots temporary ads that are published and taken down before search engines can index them.

White and his team are currently refining the Memex software based on feedback from their pilot partners, including law enforcement, district attorneys’ offices, and nonprofits that focus on sex trafficking cases. “We pick the partners because of the impact we’ll have if they use our tools,” says White, who also created the open source data program XDATA as part of President Obama’s Big Data Initiative.

