Data-Intensive Systems for the Social Sciences with Michael Cafarella (Michigan)
Data science has proven an excellent matchmaker for collaborations across disciplines. In particular, partnerships between the social sciences and computer science have benefited from these approaches, as researchers in economics, sociology, and other fields seek to extract knowledge from the influx of data sources made possible by the internet and other technologies.
Michael Cafarella, the first speaker of our May 2019 Speaker Series (in fact, our first invited speaker in CDAC history), perfectly captured the promise of these new alliances with his talk, “Data-Intensive Systems for the Social Sciences.” As a co-creator of Hadoop and a co-founder of Lattice Data, Cafarella has been at the leading edge of data-driven discovery, and he has recently turned his interests to applications in the field of economics.
Cafarella opened his talk with an overview of how data can flip the script on traditional economic analysis, which typically runs on slow surveys that often have sample size issues. Today, Cafarella said, economists have access to new data sources that cover a huge range of topics, collected at high frequency with fine granularity. But these advances come with their own issues, such as spurious correlations and massive computational overhead.
“As exciting as the data-intensive social sciences are, and they offer huge opportunities in many ways, contributing to them meaningfully means not only doing the social science analysis, but doing computer science at the same time,” Cafarella said. “It’s very unusual that we can simply grab one of these brand-new, very different datasets and have it work properly using the traditional tools that social scientists have used to date.”
In his talk, Cafarella described his group’s approach to using social media data to quantify and predict economic trends, such as unemployment and recovery from natural disasters or other shocks: in other words, “making nowcasting safe for economics.” To make similar queries easier for scientists to execute, Cafarella’s team created RaccoonDB, a tool “named for the creature that rifles through your trash to find one tiny nugget that’s valuable…if you’ve spent a lot of time on Twitter, you’ll know that feeling.”
Using RaccoonDB, Cafarella has measured and predicted other phenomena, such as movie box office returns, gas prices, and gun sales. He also described more recent work with law enforcement agencies to analyze Craigslist advertisements for evidence of human trafficking, building a model that detects warning signs of victimized sex workers.
“The general field of combining computer science with social science seems absolutely terrific to me,” Cafarella said. “There’s a huge amount of opportunity to have a real impact both on people who really need help or alternatively, on the largest machine around, the overall economy. It really dwarfs shipping a bunch of downloads, it’s really exciting.”