It was March 25, 2000. A sea of red filled State Street, and Badger fans around the country were celebrating. The University of Wisconsin-Madison men’s basketball team had just defeated Purdue in the NCAA tournament, securing a spot in the Final Four.
This achievement — exciting for any team — was particularly staggering for this group of Badgers. The team had entered the tournament as an 8-seed, but upset No. 1 Arizona, No. 4 LSU and No. 6 Purdue to win the West Regional and advance to the national semifinals.
In the Final Four, the Badgers took on No. 1 Michigan State and put up a good fight. But eventually, the Cinderella story came to an end, and the Badgers fell to the Spartans 53–41.
Despite the loss, when the team returned to Wisconsin, they were heroes. It was a glorious moment — partly because it shouldn’t have happened. It was seemingly a statistical anomaly — an 8-seed shouldn’t beat a 6-seed, or a 4-seed, and certainly not a 1-seed.
But as it turns out, it wasn’t quite as statistically stunning as it seemed. Each year, March Madness sees an average of 12.7 upsets over the course of the 67-game tournament. The 2007 tournament had as few as four, while 2014 saw a whopping 19, but the figure generally falls somewhere comfortably in the middle.
But this doesn’t mean the rankings are wrong or meaningless. Laura Albert, a UW industrial and systems engineering professor who oversees Badger Bracketology, said they’re instead “tools to help us overcome some of our human biases.”
Albert explained these rankings can’t tell you exactly what’s going to happen — instead, they can give you an idea about what’s likely to happen, or what you might expect to see.
Based on past team performance, score differentials, game location and other sets of data, sports statisticians build models which generate rankings and predictions. These can be helpful, and at the very least, a lot of fun, but Albert cautioned against placing too much weight on any one model.
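To make that idea concrete, here is a minimal, hypothetical sketch of the kind of win-probability model described above. It is not Badger Bracketology’s actual method, just a toy logistic regression trained on invented data, with made-up features for rating difference, score differential and home-court advantage.

```python
# A toy win-probability model in the spirit of the ranking systems described
# above. The features and numbers are invented for illustration; real models
# are far more sophisticated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_games = 500

# Hypothetical historical games: rating difference between the two teams,
# average score differential, and whether the first team was at home.
rating_diff = rng.normal(0, 10, n_games)
score_diff = rating_diff * 0.8 + rng.normal(0, 6, n_games)
home = rng.integers(0, 2, n_games)
X = np.column_stack([rating_diff, score_diff, home])

# Simulated outcomes: better-rated home teams win more often, but not always.
logit = 0.08 * rating_diff + 0.05 * score_diff + 0.4 * home
first_team_won = (rng.random(n_games) < 1 / (1 + np.exp(-logit))).astype(int)

model = LogisticRegression().fit(X, first_team_won)

# Predict one hypothetical matchup: a slightly weaker team playing at home.
matchup = np.array([[-3.0, -2.0, 1]])
print("Estimated win probability:", model.predict_proba(matchup)[0, 1])
```

The output is a probability, not a guarantee, which is exactly Albert’s point about how much weight any single model deserves.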
Beyond basketball and team rankings, though, data science is an invaluable tool in educational assessment, criminal justice, politics — everything. And crucially, just as the use of data science extends everywhere, so must the caution and intentionality that use requires. Using these tools well demands a deep understanding of the context in which the algorithms are constructed, as well as of their practical limitations.
“We should use a lot of math in our everyday life,” Albert said. “But it’s just a matter of weighing it at the right level.”
Garbage in, garbage out
Wisconsin has had long-standing issues with racial disparities in its criminal justice system, in no small part because human bias has permeated it.
It’s these disparities which make programs like the Public Safety Assessment, designed to take human bias out of criminal justice practices, so attractive.
Currently, the incarceration rate for black men in Wisconsin is 12%, while the national average stands at about 6.7%. Compared with the national and statewide averages for white people, about 1.3% and 1.2% respectively, these numbers paint a troubling picture for the U.S., and particularly for Wisconsin. Additionally, black men tend to receive sentences 20% longer than those given to white men for similar offenses.
The Pretrial Justice Institute, which advocates for “safe, fair and effective” pretrial justice practices, has argued that using algorithms, rather than relying solely on the likely biased opinion of one judge, can “substantially reduce the disparate impact that people of color experience.”
It seems to make sense — by inputting empirical, indisputable evidence, like prior criminal history and a defendant’s age, these algorithms should return a fair, unbiased result. The problem, Albert explained, is that it’s actually not that simple.
While it may seem natural to simply view data as inputs into a model, “it’s usually the output of another process that produced this data, and that’s one involving human decision-making,” Albert said. “And sometimes you can correct for that pretty easily … but sometimes it’s a little bit trickier.”
In the case of pretrial risk assessment models, the processes which generated these inputs are the same processes which have led Wisconsin to have some of the worst racial disparities in the country.
This is what many statisticians would call “garbage in, garbage out.” The argument against these tools is that an algorithm based on biased data cannot generate an unbiased result. And if the data was generated in a system which has systematically led to disproportionately high arrest rates for black men, an algorithm which uses that data will only serve to perpetuate those inequities.
In other words, the results you get are only as good as the data used to produce them.
“That’s where you really need to be super critical in understanding where the data come from,” Albert said. “Realize that the arrest rate isn’t exogenous to the system, it’s actually the function of maybe, patrolling rates in certain police beats or neighborhoods. And that’s where you can get into a lot of trouble, if you don’t really understand where that comes from.”
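A stripped-down simulation can illustrate Albert’s point. In the hypothetical sketch below, two neighborhoods have exactly the same underlying offense rate, but one is patrolled twice as heavily, so a naive risk score fit to its arrest records would rate its residents as roughly twice as risky. None of this reflects any real assessment tool; the numbers are invented purely to show how the data-generating process shapes the result.

```python
# "Garbage in, garbage out" in miniature. Everything here is hypothetical:
# two neighborhoods with the SAME underlying offense rate, but one is
# patrolled twice as heavily, so its offenses are recorded as arrests
# twice as often.
import numpy as np

rng = np.random.default_rng(1)
people_per_neighborhood = 10_000
true_offense_rate = 0.05            # identical in both neighborhoods
patrol_rate = {"A": 0.2, "B": 0.4}  # chance an offense is actually observed

arrest_rate = {}
for hood, patrol in patrol_rate.items():
    offenses = rng.random(people_per_neighborhood) < true_offense_rate
    observed = offenses & (rng.random(people_per_neighborhood) < patrol)
    arrest_rate[hood] = observed.mean()

# A risk score naively fit to these arrest records would conclude that
# neighborhood B is about twice as risky, even though behavior is identical
# by construction.
print(arrest_rate)
```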
But this can go even a step further. As mathematician Cathy O’Neil explains in her book, “Weapons of Math Destruction,” bias from algorithms might even be worse than human bias because the empirical nature of the data disguises it in an inexorable way. Data codifies our opinions, O’Neil explained, “believing all the while that our tools are not only scientific but fair.”
“The result,” she wrote, “is widespread damage that all too often passes for inevitability.”
Use as instructed
Facing declining enrollment and increasing financial problems, schools around the country, including UW-Stout, are turning to an increasingly common, though largely unknown, practice — installing tracking software on their websites to learn more about prospective students.
Administrators say this practice, among other data-collection methods, is part of an effort to “make better predictions about which students are the most likely to apply, accept an offer and enroll.”
The Washington Post recently reported on these practices, highlighting that they are particularly common at cash-strapped colleges. Data collection from prospective students helps colleges figure out which students will be able to pay full tuition rates, said Lloyd Thacker, a former admissions counselor and founder of the Education Conservancy, a nonprofit research group.
“An admission dean is more and more a businessperson charged with bringing in revenue,” Thacker said. “The more fearful they are about survival, the more willing they are to embrace new strategies.”
In recent years, many public universities have ramped up their out-of-state recruitment efforts, hoping to bring in more students whose families will be able to afford the higher tuition rates.
UW-Madison enrollment has been consistent with this trend. In fall 2014, 22.7% of the freshman class came from out of state. By 2019, that figure had increased to 32.4%.
Recruiting students who can pay tuition isn’t necessarily a bad thing. But it’s a tradeoff — if a university focuses more effort on recruiting wealthier students, that might mean lower-income students are less encouraged to attend.
But these tools weren’t necessarily created to discriminate against low-income students. In many cases, these data-collection tools were designed to help schools effectively reach students who would fit well in their communities.
If a prospective student shows particular interest in mechanical engineering, a school like UW-Madison, with a strong mechanical engineering program, would know to attempt to recruit that student. On the other hand, the New York University Tisch School of the Arts would know not to send materials to that student, given the student’s likely lack of interest in its programs — saving both the student and the school time and energy.
These practices weren’t originally created with malice, but it’s when these algorithms are taken out of the context in which they were created that issues arise. Too often, they are applied, presented and manipulated for purposes they weren’t designed for.
David Kaplan, a UW-Madison educational psychology professor, sees this crop up in his own work. Kaplan’s area of research focuses on large-scale educational assessments like the Programme for International Student Assessment, which is designed to gauge the performance of 15-year-old students in math, science and reading around the world.
PISA was not designed to give countries a course of action or to assess individuals’ academic success. But politicians love these assessments, Kaplan explained, because the rankings they generate are hot talking points. They give politicians a platform to criticize folks on the other end of the spectrum and promote their own agendas.
“The responses are often devoid of what that assessment can actually tell you … it can’t tell you what to do,” Kaplan said.
These assessments can be invaluable tools for understanding longitudinal educational progress, but Kaplan said they’re often not used for the purposes for which they were designed.
That’s why, through his research, Kaplan is trying to bring the assessments back to a place where they are being used for the appropriate purpose.
And in all of this, recognizing the limitations of algorithmic modeling is crucial, he said.
“It’s really important to realize that a model is, by definition … an abstraction of reality, and therefore it’s not true,” Kaplan said. “It’s simply the best humans can do to try to understand a very complex situation.”
Regulating the unknown
U.S. Rep. Alexandria Ocasio-Cortez, D-New York, made headlines recently thanks to a viral video of her grilling Facebook CEO Mark Zuckerberg about the social media platform’s fact-checking policies.
“You announced recently that the official policy of Facebook now allows politicians to pay to spread disinformation in 2020 elections and in the future,” Ocasio-Cortez said in the exchange. “So I just want to know how far I can push this.”
Zuckerberg said there is some threshold of fact-checking for political advertisements, asserting that “lying is bad,” but suggested it’s not Facebook’s role to take down false advertising.
“I think people should be able to see for themselves what politicians they may or may not vote for are saying,” he said.
This is exactly the type of issue that Young Mie Kim, a UW-Madison journalism and mass communications professor who directs Project DATA: Digital Ad Tracking and Analysis, is working to address.
Kim conducted extensive research on political advertising in the 2016 election and found staggering evidence of Russian involvement, which brought her to testify before the Federal Election Commission. Kim discovered that large groups of voters were targeted with voter suppression messages — primarily black voters, those likely to vote for Hillary Clinton and those who consume liberal media.
Again, this discriminatory suppression was made possible by data-driven research.
During the broadcast era, campaigns had a much harder time targeting specific voters and groups of people with specialized messages.
“But now, with data, it’s completely possible,” Kim said. “And if campaigns can use data to identify, target and mobilize these people, it is much easier for them to persuade people — or even demobilize people.”
Jordan Ellenberg, UW-Madison mathematics professor and author of “How Not To Be Wrong: The Power of Mathematical Thinking,” said this might not have been possible even 20 years ago. But as political partisanship grows starker, it has become easier to predict voter behavior.
“That’s something we might think of as mystical, but it’s actually quite easy,” Ellenberg said. “If I know your income level, and your zip code and your race and your religion, I’m going to guess right a pretty high percentage of the time.”
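A toy example of the kind of lookup Ellenberg is describing: group past voters by a few demographic attributes and guess the historical majority for each group. The records, income brackets, zip codes and party labels below are all invented for illustration.

```python
# A minimal sketch of demographic guessing: bucket voters by a few attributes
# and predict the majority choice seen in (entirely made-up) past records.
from collections import Counter, defaultdict

# Hypothetical past voters: (income bracket, zip code) -> recorded vote.
history = [
    (("high", "53703"), "Party X"),
    (("high", "53703"), "Party X"),
    (("low", "53703"), "Party Y"),
    (("low", "54303"), "Party Y"),
    (("low", "54303"), "Party Y"),
    (("high", "54303"), "Party X"),
]

votes_by_bucket = defaultdict(Counter)
for bucket, vote in history:
    votes_by_bucket[bucket][vote] += 1

def guess(bucket):
    """Predict the majority vote recorded for this demographic bucket."""
    counts = votes_by_bucket.get(bucket)
    return counts.most_common(1)[0][0] if counts else "unknown"

print(guess(("high", "53703")))  # guesses "Party X"
```

With real voter files and far more attributes, the same basic lookup is what makes Ellenberg’s “pretty high percentage of the time” possible.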
As a consequence, it has also become easier to target voters based on those predictions. And as Kim explained, this targeting means not all voters are getting the same information.
This is complicated by the fact that researchers like Kim don’t even know the scope of the different kinds of information voters are receiving because specific ads only show up for specific individuals, based on targeting.
“It opens up a lot of opportunity for manipulation,” Kim said. “They can lie, and they can tell different lies to different voters … because there is no monitoring.”
Another consequence of this is the perpetuation of political inequity. Kim said groups of voters with relatively short voting histories, such as new immigrants and young people, often receive no political advertisements whatsoever, discouraging them from participating in electoral processes.
This is where, ideally, tech platforms should come in, but as Zuckerberg made clear, Facebook isn’t exactly interested in strict fact-checking policies. In fact, tech platforms have largely aided this kind of oppressive data-driven political strategy.
Kim explained that even though there is a lot of public pressure on tech platforms to expand monitoring and accuracy standards, they don’t have much incentive to do so because political advertising is so profitable. From a business standpoint, it just wouldn’t make sense.
Researchers like Kim are working to address all these issues, but a lack of available information has stalled progress. For example, Kim said more than half of political advertisement sponsors from the 2016 election are unknown — and you can’t regulate what you don’t know.
Kim is optimistic that more people are aware of this issue now, but acknowledged there still haven’t been any real solutions or substantial changes in the law.
“Under this political environment, I’m not sure when that’s going to be passed,” she said.
You make what you measure
In all kinds of models — in all uses, in all algorithmic applications, in all fields — statistics and data analytics are invaluable tools. Modern life would be virtually unrecognizable without powerful mathematical and statistical innovation.
But use of these tools must be intentional. UW-Madison sociology professor Felix Elwert explained that all serious researchers must be — and do tend to be — highly concerned with external influences on their research, and with ensuring they are actually measuring what they intend to.
Ellenberg said a common cautionary tale among researchers is “you make what you measure.” In other words, oftentimes simply the process of conducting research or collecting data can influence the results.
This appears in recidivism models — an algorithm based on a system that has disproportionately sent black men to jail will serve to continue that trend.
In education, models seek out and recruit higher-income students, increasing the likelihood that they will continue to have higher incomes — on average, college graduates earn 56% more than high school graduates.
And in the case of elections, politically marginalized folks are suppressed and pushed out, while politically engaged people are encouraged to take on increasingly partisan positions.
As O’Neil wrote, these systems have a tendency to “project the past into the future.”
It’s important for people making use of these models to recognize the imperfections of the algorithms they have designed, and act accordingly. Kaplan said all systems will have some element of statistical “noise” — irregularities, errors and residuals.
“The way to minimize that noise,” Kaplan said, “is by measuring as many things as possible that you think might be contributing to the outcome, with the knowledge that you are never going to measure everything, and there’s still going to be a component of noise at the end, even if you thought you had measured everything.”
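A small simulated example of that point, with entirely made-up quantities: even a regression that includes every predictor you measured still leaves residual noise from the influences you never observed.

```python
# Kaplan's point in miniature: a model fit to all the measured predictors
# still carries residual "noise" from everything left unmeasured.
# All quantities here are simulated for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n = 1_000

measured = rng.normal(size=(n, 5))   # five predictors we did measure
unmeasured = rng.normal(size=n)      # influences we never observed

# The true outcome depends on both the measured and unmeasured pieces.
outcome = measured @ np.array([1.0, 0.5, -0.3, 0.8, 0.2]) + unmeasured

model = LinearRegression().fit(measured, outcome)
residuals = outcome - model.predict(measured)

# Measuring more predictors shrinks this, but it never reaches zero while
# unmeasured influences remain.
print("Residual variance:", residuals.var())
```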
All this is to say that integrity in research is vital. Results are only as good as the system used to produce them, which means those systems require constant improvement.
As researchers, Elwert said, “the one currency we have assigned to us is our credibility. And our credibility increases with our ability to be self-critical.”
“All models are wrong, but some are useful”
In 1960, George Box, a famous British statistician, moved to Wisconsin to create UW-Madison’s department of statistics. He is often considered one of the greatest statistical minds of the 20th century.
It was during his tenure at UW-Madison that Box famously wrote that “all models are wrong, but some are useful.”
This mindset has guided statistical innovation for years.
Data use and statistical modeling became popular because of their efficiency. They have made things once thought inconceivable entirely possible — we can predict the weather with a high level of accuracy, retailers can market their products only to those consumers who might be interested, healthcare can be specialized and optimized, and we have a greater understanding of the ways of the world.
But while algorithms certainly can offer a solution to human biases, Albert said, “they can also introduce a bunch of new ones.”
In all the excitement, it seems data has gotten away from us. Systems are continually optimizing and rearranging and enhancing in the interest of efficiency and profitability, but oftentimes at the expense of morality and common sense.
It’s in criminal justice, education and elections, but also in job hiring and employment processes, credit scores and loans, marketing, lawmaking, housing regulations — virtually everything.
But experts and researchers have hope — largely because more people are interested in understanding the informational infrastructure governing their lives.
“What I want is for people to feel empowered to ask questions about the algorithms that affect aspects of their life,” Ellenberg said. “I think we can get there.”
Algorithms can’t do everything. But in the right hands, in the right context and with the right intentions, they can help us do a lot.
“I think a lot of people are talking about how we can use data science for the common good … it’s a very popular topic right now. I’d like to think that we’ve been trying to address these issues for a long time,” Albert said. “But there’s always more to do.”