Whether we know it or not, complex algorithms make decisions that affect nearly every aspect of our lives, determining whether we can borrow money or get hired, how much we pay for goods online, our TV and music choices, and how closely our neighborhood is policed.
Thanks to the technological advances of big data, businesses tout such algorithms as tools that optimize our experiences, providing better predictive accuracy about customer needs and greater efficiency in the delivery of goods and services. And they do so, the explanation goes, without the distortion of human prejudice because they're calculations based solely on numbers, which makes them inherently trustworthy.
Sounds good, but it's simply not true, says Harvard-trained mathematician Cathy O'Neil, Ph.D. '99. In her new book, "Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy," the data scientist argues that the mathematical models underpinning these algorithms aren't just flawed; they are opinions and biases encoded and disguised as empirical fact, silently introducing and enforcing inequities that inflict harm right under our noses.
The Gazette spoke with O'Neil, who once worked as a quantitative analyst and now runs the popular Mathbabe blog, about what she calls the "lie" of mathematics and her push to get data scientists to provide more transparency for an often too-trusting public.
GAZETTE: How did your work as a hedge fund quant prompt you to start thinking about how math is being used today? Had you given it thought before then?
O'NEIL: It absolutely had not occurred to me before I was a quant. I was a very naive, apolitical person going into finance. I thought of mathematics as this powerful tool for clarity and then I was utterly disillusioned and really ashamed of the mortgage-backed securities [industry], which I saw as one of the driving forces for the [2008] crisis and a mathematical lie. They implied that we had some mathematical, statistical evidence that these mortgage-backed securities were safe investments, when, in fact, we had nothing like that. The statisticians who were building these models were working in a company that was literally selling the ratings that they didn't even believe in themselves. It was the first time I had seen mathematics being weaponized and it opened my eyes to that possibility.
The people in charge of these companies, especially Moody's, put pressure on these mathematicians to make them lie, but those mathematicians, at the end of the day, they did that. It was messed up and gross and I didn't want to have anything to do with it. I spent some time in risk, after I left the hedge fund, trying to still kind of naively imagine that with better mathematics we could do a better job with risk. So I worked on the credit-default-swaps risk model. The credit default swaps were one of the big problems [of the 2008 financial crisis] and then once I got a better model, nobody cared. Nobody wanted the better model because nobody actually wants to know what their risk is. I ended up thinking, this is another example of how people are using mathematics, brandishing it as authoritative and trustworthy, but what's actually going on behind the covers is corrupt.
GAZETTE: Big data is often touted as a tool that delivers good things—more accuracy, efficiency, objectivity. But you say not so, and that big data has a "dark side." Can you explain?
O'NEIL: Big data essentially is a way of separating winners and losers. Big data profiles people. It has all sorts of information about them—consumer behavior, everything available in public records, voting, demography. It profiles people and then it sorts people into winners and losers in various ways. Are you persuadable as a voter or are you not persuadable as a voter? Are you likely to be vulnerable to a payday loan advertisement or are you impervious to that payday loan advertisement? So you have scores in a multitude of ways. The framing of it by the people who own these models is that it's going to benefit the world because more information is better. When, of course, what's really going on and what I wanted people to know about is that it's a rigged system, a system based on surveillance and on asymmetry of information where the people who have the power have much more information about you than you have about them. They use that to score you and then to deny you or offer you opportunities.
GAZETTE: How integrated are algorithms in our lives?
O'NEIL: It depends. One of the things I noticed in my research is that poor people, people of color, and people who have less time on their hands to be careful about how their data are collected are particularly vulnerable to the more pernicious algorithms. But all of us are subject to many, many algorithms, many of which we can't even detect: whenever we go online, whenever we buy insurance, whenever we apply for loans, especially peer-to-peer loans. We're in election season, and political advertising is one of the most aggressive fields of analytics that exists. We often think fondly of political advertising because we remember the Obama campaign using it to raise donations and get out the vote, but it also has a dark side. I think it lowers people's ability to be well-informed, because a lot of campaigns efficiently target people and show them only what the campaigns want them to see, which is efficient for campaigns but inefficient for democracy as a whole.
The real misunderstanding that people have about algorithms is that they assume that they're fair and objective and helpful. There's no reason to think of them as objective because data itself is not objective and people who build these algorithms are not objective. But the most important thing to realize is they are intended to benefit the people who own them. So those people who own them are defining success and they often define success in terms of profit. And profit for that person does not necessarily mean something good for the target of that scoring system.
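To make that point concrete, here is a minimal, hypothetical sketch in Python, not drawn from O'Neil's book: the applicant names and numbers are invented, and the two lambda functions simply stand in for two competing definitions of success, the model owner's profit versus the benefit to the person being scored.

```python
# Illustrative sketch only: the same scoring data produces different "winners"
# depending on whose definition of success the model optimizes.
# All names and numbers below are invented.

applicants = [
    # (name, expected profit to lender, expected benefit to borrower)
    ("Applicant A", 0.90, 0.20),   # high-fee product: lucrative, likely harmful
    ("Applicant B", 0.40, 0.85),   # modest profit, genuinely useful loan
    ("Applicant C", 0.75, 0.30),
    ("Applicant D", 0.35, 0.90),
]

def rank(pool, success):
    """Order applicants by whatever 'success' score we choose to optimize."""
    return sorted(pool, key=success, reverse=True)

# The model owner's definition of success: maximize profit.
by_profit = rank(applicants, success=lambda a: a[1])

# An alternative definition of success: maximize benefit to the person scored.
by_benefit = rank(applicants, success=lambda a: a[2])

print("Targeted first when success = lender profit:  ", [a[0] for a in by_profit])
print("Targeted first when success = borrower benefit:", [a[0] for a in by_benefit])
```

The data never change; only the objective does, and with it the list of people the model targets first.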
GAZETTE: Does the public realize how powerful and pervasive the issue is?
O'NEIL: When I started this research four years ago, people seemed to be extremely naive and very, very happy about algorithms. We didn't know how powerful they were; we didn't seem to worry about them at all. I think things have changed somewhat since then. I think one of the reasons my book is getting a very positive reception is because people are starting to realize how extremely influential these algorithms are. … I still don't think that they really quite understand how pernicious they can be, and often that's because we're not typically subject to the worst of the algorithms: the ones that keep people from having jobs because they don't pass the personality test, the ones that sentence criminal defendants to longer jail terms if they're deemed a high recidivism risk, or even the ones that arbitrarily punish schoolteachers. The people who are building these models, the data scientists, are typically not subject to the worst of these consequences. Somehow we think big data is a great thing, partly because it employs us, but also because we just don't have to deal with the worst consequences.
GAZETTE: What's the fatal flaw? The biases of the human modelers, the lack of transparency and outside scrutiny, the apolitical nature of people in math and technology valuing efficiency and profitability over human costs and fairness?
O'NEIL: There are a lot of issues, but the most obvious one is the trust itself: that we don't push back on algorithmic decisioning, and it's in part because we trust mathematics and in part because we're afraid of mathematics as a public. What we need to do is stop trusting these scoring systems. Definitely, the data scientists should know better, but the people that we're scoring should refuse to go along with it.
GAZETTE: You suggest data scientists take a Hippocratic-type oath. How would that help? Do they understand how flawed and dangerous their work is, or can be?
O'NEIL: They don't. They never think about it, almost ever. I think some of them are incapable of understanding it even if it was explained to them because they don't want to know. But I think a lot of them are trained to think they're technicians rather than ethicists. They don't see that as part of their job.
GAZETTE: What would an oath do—help bring the issue to their consciousness?
O'NEIL: Yes. It's not just the oath, I want them to read this book, I want them to really have conversations with other data scientists who are also concerned about ethics, about what it means for an algorithm to be racist. It's not even a well-defined term yet. We have to define our terms in order to avoid being racist.
GAZETTE: What else needs to be done?
O'NEIL: The good thing is that algorithms could be really great if we made sure they were fair and legal and had enough understanding of them to make sure they weren't doing the wrong thing. So I have hope that we can some day use data and algorithms to help us sentence people to prison in a less racist manner. Right now, we just haven't done that. We've just thrown a model at the system and assumed that it was going to be perfect.
We absolutely need to update anti-discrimination laws and data-protection laws, to modify them to be able to deal with the big data era. Because right now, we're way behind on that. Here's one example: The laws that have to do with lending apply only to companies that make direct credit offers to customers. But peer-to-peer lending bypasses them, because those platforms simply pair lenders and borrowers. They put credit scores on those borrowers, and those credit scores don't have to follow anti-discrimination laws because the platforms aren't lending directly. We need to update the anti-discrimination laws to make them responsible. It should be illegal for them to use race and gender, for example, in those credit scores, and right now they're using social media data.
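As a rough illustration of that last point, and not something from the interview itself, the following Python sketch shows how a credit score that never reads race or gender can still produce unequal outcomes when it leans on a proxy such as social media data; every borrower, group label, and number here is invented.

```python
# Hypothetical sketch: a score that never sees a protected attribute can still
# disadvantage a protected group when it uses a correlated proxy feature.
# All data below are invented for illustration.

borrowers = [
    # Group membership is never given to the scoring function; it is shown
    # here only so we can measure the disparity afterward.
    {"name": "P1", "group": "A", "friends_with_low_scores": 0.7},
    {"name": "P2", "group": "A", "friends_with_low_scores": 0.6},
    {"name": "P3", "group": "B", "friends_with_low_scores": 0.2},
    {"name": "P4", "group": "B", "friends_with_low_scores": 0.1},
]

def social_media_score(b):
    """Toy score: penalize borrowers whose online contacts have low scores."""
    return round(1.0 - b["friends_with_low_scores"], 2)

approved = [b for b in borrowers if social_media_score(b) >= 0.5]

for group in ("A", "B"):
    members = [b for b in borrowers if b["group"] == group]
    rate = sum(b in approved for b in members) / len(members)
    print(f"Group {group} approval rate: {rate:.0%}")

# The score never used 'group', yet approval rates diverge sharply, which is
# why O'Neil argues anti-discrimination law needs to cover such scores.
```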