Diverse Perspectives Needed To Provide Better Analysis Of Big Data

Leading experts give emerging economists, scientists, and statisticians insight on the future of social science research
University of Chicago undergraduates heard an array of views on the economic, moral, and political effects of so-called "big data" at a student-organized panel discussion held on April 10, 2015 at the Saieh Hall for Economics.
Led by fourth-year students in the College, the panel featured economists Susan Athey of Stanford University and Hal Varian of the University of California, Berkeley, as well as statistician and machine learning expert Larry Wasserman of Carnegie Mellon University. In their opening remarks, all three panelists noted that one of the biggest challenges of big data is how to draw the right conclusions from it. While the vast pools of stored data on virtually every transaction are useful in showing correlations between phenomena, applying economic tools to interpret those correlations and show causality between them remains difficult.
Athey, who has served as an economics consultant for Microsoft since 2007, noted that while economic models are designed to find causality, figuring out how to fit these models to big data sets is a necessary next step in the field.
“We really don’t know how to apply these techniques to big data sets. Some of the particular things that go wrong is if I have three variables, I can think really hard about how to specify my model, but if I’ve got thousands—nobody has a theory of how a thousand variables affect your outcome,” she said. “Where I’m trying to do research to bring these things together is to combine the best of both worlds.”
Athey went on to say that it’s important to be both data-driven in research and to exploit the wealth of data now available, but that it’s also essential to tailor methods to answer questions of causality.
Echoing aspects of Athey’s concerns, Wasserman said that he thinks large data sets are too often interpreted with too little analysis, and that no matter the size of the data set, poor analysis results in poor decisions.
From his perspective as a statistician, Wasserman said that statistics are too often left by the wayside when dealing with big data. “I can go on and on about things that I’ve seen at my own university, various initiatives that get started from different directions, and no one thinks to call statistics. They get [computer science] involved, they get all these other departments involved, and they always leave out statisticians.”
Wasserman likened analyzing big data without the help of a statistician to getting brain surgery from a cardiologist. Leaving statisticians out can lead to significant errors in analysis, he argued.
Varian, who is currently Google’s chief economist, noted that while these challenges regarding big data need to be addressed, the continuing automation of transactions means much more data about purchases and purchasing behavior can be extracted than ever before. This allows providers to tailor transactions to the specific parties involved, based on the preferences their history reveals.
Varian also mentioned that computers can continuously improve and update the experiences of users. He used the examples of Uber and Airbnb as services that can function because of the extensive purchasing history and background information provided by computer-mediated transactions.
The student moderators, fourth-years Kayla Reinherz and Justin Manley, also asked the panelists about a pressing issue arising from the ubiquitous collection and unlimited storage of transactional data: privacy. If there are no safeguards in place, data can be traced back to specific individuals. Concerns over the security of sensitive personal information are top of mind for anyone whose identity has been stolen, and for those who fear that such information could be abused in myriad other ways.
Wasserman explained why this is such a complicated issue, saying that in order to guarantee privacy, noise must be introduced into the data, diluting it to the point that robust analysis becomes impractical. “You have to dissolve the amount of information in the data set so much that it quickly becomes useless,” he said. Wasserman also mentioned that there is tension between the statistician and internet security communities, saying that the two sides of the argument seem far from coming to a solution to the problem.
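The tradeoff Wasserman described can be illustrated with a small simulation. The sketch below is not drawn from the panel: it assumes one common formalization of noise-based privacy, the Laplace mechanism from differential privacy, and applies it to a single statistic (an average) rather than a full data set. The income figures, bounds, and function names are hypothetical, chosen only for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    # Clip each value to [lower, upper] so that changing one record
    # can shift the mean by at most (upper - lower) / n.
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    sensitivity = (upper - lower) / len(clipped)
    # Smaller epsilon means a stronger privacy guarantee and more noise.
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(values, epsilon, trials, rng):
    # Average distance between the noisy answer and the exact answer.
    exact = sum(values) / len(values)
    errors = [
        abs(private_mean(values, 0, 200_000, epsilon, rng) - exact)
        for _ in range(trials)
    ]
    return sum(errors) / len(errors)


rng = random.Random(0)
# Hypothetical data: 200 made-up incomes, all inside the clipping bounds.
incomes = [rng.uniform(20_000, 120_000) for _ in range(200)]

for epsilon in (1.0, 0.1, 0.01):
    error = mean_abs_error(incomes, epsilon, 1000, rng)
    print(f"epsilon={epsilon}: average error about {error:,.0f}")
```

Under these assumptions, each tenfold tightening of the privacy parameter makes the typical error roughly ten times larger, which is one way to see how stronger guarantees erode the usefulness of the result.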
Varian offered one possible solution, presenting the idea of a data “clean room,” a physical location where legitimate researchers could access data that may pose risks to privacy, but only with safeguards set in place to prevent the data from ever being taken out of the room.
“We had a set of data at the University of Michigan when I was there, which was very lightly anonymized tax returns, which could pose a privacy threat, but you could access it only in a windowless room with a computer, with no USB, no disk drive, just a printer,” recounted Varian. “You did all your analysis in that room, printed out the results and then you could go build out your papers.”
“That’s a low-tech solution, but it’s pretty darn fool-proof for most purposes,” he added.
Despite the challenges facing big data, the panelists were unanimous in praising its promise. All three agreed that this is the most exciting phase of their careers, and that it’s a phenomenal time to be a scientist, economist, or statistician. They also all agreed that it’s important for big data experts and economists to learn from each other and work together in order to achieve new things. As Athey said in her introductory remarks, “This is a moment in time where social science and economics can do so many things we couldn’t do before.”
Wasserman echoed her enthusiasm. “I wake up every morning and I think it’s the most exciting time for so many different sciences, for statistics especially, for computer science, for economics.” 
But while Varian, Athey and Wasserman may currently be the ones on the cutting edge of big data, the future experts on the topic sat in front of the panel.
As Athey said, addressing the undergraduate audience, “You all are the ones who are really going to be a part of how big data changes science, as well as business. Whatever path you follow, you’re the ones who are going to really take these ideas forward.”
The Becker Friedman Institute for Research in Economics facilitated the event with support from the CME Group Foundation.
––William Leddy