Insights / Research Brief • Jan 22, 2021

Simple and Credible Value-Added Estimation Using Centralized School Assignment • Measuring Racial Discrimination in Algorithms

Based on BFI Working Paper No. 2020-186, “Simple and Credible Value-Added Estimation Using Centralized School Assignment,” by Joshua Angrist, MIT; Peter Hull, Assistant Professor, UChicago’s Kenneth C. Griffin Dept. of Economics; Parag Pathak, MIT; and Christopher Walters, University of California, Berkeley; and based on BFI Working Paper No. 2020-184, “Measuring Racial Discrimination in Algorithms,” by David Arnold, University of California, San Diego; Will Dobbie, Harvard Kennedy School; and Peter Hull, Assistant Professor, UChicago’s Kenneth C. Griffin Dept. of Economics.
Key Takeaways
  • Algorithms, by employing large amounts of data, can aid in making fair and equitable decisions.
  • However, how data are collected and applied is key to avoiding unintended bias.
  • New research reveals how, on the one hand, algorithms can be used to better measure school quality, while, on the other hand, algorithms can lead to bias in decisions involving bail.
With the aggregation of more and more data, and with improvements in machine learning methods, firms and policymakers have developed algorithms to help them make decisions. For example, banks and credit card companies use algorithms to make decisions relating to a consumer’s creditworthiness. The idea is not only to make accurate assessments but to also remove any prejudice or other qualitative errors that could occur when people make such high-stakes decisions.

This technology is not limited to banking and finance. In two recent working papers, UChicago’s Peter Hull and his colleagues investigate the impact of algorithms on school choice (“Simple and Credible Value-Added Estimation Using Centralized School Assignment”) and on pretrial detention (“Measuring Racial Discrimination in Algorithms”). 

For policymakers and local officials concerned with improving school performance, the authors’ work shows how existing data can reveal pertinent information about how students choose schools, which ultimately affects educational outcomes. And their analysis of algorithmic bail decisions in New York City finds that algorithmic recommendations are not free from bias; rather, if put in place, they could lead to discriminatory outcomes.

Figure 1: Discrimination in Algorithmic Bail Decisions
Notes: This figure plots the range of unadjusted racial disparities in algorithmic release rate recommendations, for different average release rates, along with the range of disparities due to racial discrimination. Algorithmic recommendations are from a baseline gradient-boosted decision tree model. Disparities from discrimination are computed as described in the paper. Shaded areas indicate pointwise 95 percent confidence intervals, computed by the bootstrapping procedure described in the working paper.

How school assignment can improve performance

Matching students with schools is of primary interest to families, school districts, and communities. For a growing number of school districts, centralized, algorithmic assignment schemes are the preferred method. Boston, Denver, and New York City, for instance, use a deferred acceptance mechanism to assign students to seats. Many of these centralized assignment systems incorporate random lottery numbers to break ties between otherwise similar students. 

How can such randomness be useful in understanding school quality? In “Simple and Credible Value-Added Estimation Using Centralized School Assignment,” the authors introduce two new empirical strategies that exploit randomness in algorithmic school assignments to measure individual school quality. (These strategies are described in great detail in the full working paper and briefly highlighted here.) Two recent trends motivate this work: the “grading” and ranking of schools by how well their students perform on standardized tests, and the use of centralized methods to assign students to schools. Interestingly, the authors find that the second phenomenon, via partial randomization in assignment, can be used to more accurately rank school performance.

By isolating the randomness in algorithmic school assignments, policymakers can construct school quality measures that are free from selection bias. In other words, one school will not be ranked higher than another simply because it enrolls higher-performing students with different preferences and priorities.

Take, for example, students applying to New York City middle schools, where applicants list schools in order of preference. The school district, for its part, assigns priorities to applicants, such as their proximity to a school and whether a sibling already attends it. The district’s deferred acceptance algorithm takes those inputs and returns an assignment for each student; if there is a tie, some form of lottery is employed. Much like a randomized controlled trial, such lotteries can improve measures of school quality by randomly dispersing students across schools.
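The mechanism described above can be sketched in a few lines. This is a minimal illustration with hypothetical students and schools, not the actual NYC implementation: students propose to schools in order of preference, schools tentatively hold the highest-priority applicants up to capacity, and random lottery numbers break ties among applicants with equal priority.

```python
import random

def deferred_acceptance(prefs, priorities, capacity, seed=0):
    """Student-proposing deferred acceptance with lottery tie-breaking.

    prefs:      dict student -> ordered list of schools (most preferred first)
    priorities: dict school -> dict student -> priority (lower = higher priority)
    capacity:   dict school -> number of seats
    """
    rng = random.Random(seed)
    # Random lottery numbers break ties among equal-priority students.
    lottery = {s: rng.random() for s in prefs}
    next_choice = {s: 0 for s in prefs}        # next school each student proposes to
    held = {c: [] for c in capacity}           # tentatively held students per school
    unassigned = list(prefs)

    while unassigned:
        student = unassigned.pop()
        if next_choice[student] >= len(prefs[student]):
            continue                           # student exhausted their list
        school = prefs[student][next_choice[student]]
        next_choice[student] += 1
        held[school].append(student)
        # Rank held students by priority, lottery number breaking ties.
        held[school].sort(key=lambda s: (priorities[school].get(s, float("inf")),
                                         lottery[s]))
        while len(held[school]) > capacity[school]:
            unassigned.append(held[school].pop())   # reject the lowest-ranked

    return {s: c for c, students in held.items() for s in students}
```

Because assignments among equal-priority applicants hinge only on the lottery draw, comparing the outcomes of lottery winners and losers mimics a randomized experiment.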

This work reveals how data produced by algorithms can lead to better, or fairer, measures of quality, inasmuch as differences in the underlying ability of enrolled students would not be misattributed to school quality. Schools that, for example, enroll affluent students and would otherwise outrank other schools lose that advantage under these new methods.

Who receives bail?

How could algorithms impact decisions about who makes bail? In “Measuring Racial Discrimination in Algorithms,” the authors investigate whether a risk assessment tool may be viewed as racially discriminatory if it recommends that white defendants be released before trial at a higher rate than Black defendants with equal risk of pretrial criminal misconduct.

How can such discrimination occur through logical, unfeeling algorithms? The answer lies in the data that feed them. Misconduct potential is observed only among the defendants whom a judge chooses to release before trial. Such selection can introduce bias into algorithmic predictions and also complicates the measurement of algorithmic discrimination, since unobserved qualification cannot be conditioned on when comparing the treatment of white and Black defendants.
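A toy simulation makes the selection problem concrete. This is purely illustrative, with made-up numbers and a stylized release rule, not the authors’ model: when judges release lower-risk defendants more often, the misconduct rate observed in the resulting data understates population risk, so any algorithm trained on it learns from an unrepresentative sample.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

risk = rng.uniform(0, 1, n)               # each defendant's true misconduct probability
# Stylized judge behavior: lower-risk defendants are released more often.
released = rng.uniform(0, 1, n) > 0.8 * risk
misconduct = rng.uniform(0, 1, n) < risk  # realized misconduct, observed only if released

pop_rate = misconduct.mean()              # what we would like to know
observed_rate = misconduct[released].mean()   # what the training data actually show
```

Here `observed_rate` falls well below `pop_rate`, because the released sample is skewed toward low-risk defendants; the authors’ contribution is a method for measuring discrimination despite this kind of selection.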

Figure 2: Value-Added Estimates for NYC Middle Schools
Note: This figure shows how the RC VAM (risk-controlled value-added model) reduces bias. The overall height of each bar, the “root mean squared error,” measures the overall accuracy of each estimator, and the blue portion shows the share of that error due to statistical bias. Conventional OLS is akin to a traditional school quality measure; Conventional IV VAM (instrumental-variables value-added model) and RC VAM are the authors’ two new estimators.
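The figure’s bias-versus-accuracy reading rests on the standard decomposition of mean squared error into squared bias plus variance. A toy simulation (hypothetical numbers, not the paper’s estimates) shows how a selection-biased estimator can have most of its error come from bias, while a lottery-based estimator is noisier but unbiased:

```python
import numpy as np

rng = np.random.default_rng(0)
true_quality = 1.0  # a school's true value-added, in hypothetical units

# Repeated-sample draws from two stylized estimators of the same quantity:
# a "conventional" one contaminated by selection bias, and a lottery-based
# one that is unbiased but has more sampling noise.
conventional = true_quality + 0.5 + rng.normal(0, 0.3, 10_000)
lottery_based = true_quality + rng.normal(0, 0.4, 10_000)

def rmse_and_bias_share(estimates, truth):
    """Return RMSE and the share of squared error due to bias.

    Uses the decomposition MSE = bias**2 + variance.
    """
    errors = estimates - truth
    mse = np.mean(errors ** 2)
    bias = np.mean(errors)
    return np.sqrt(mse), bias ** 2 / mse

rmse_c, share_c = rmse_and_bias_share(conventional, true_quality)
rmse_l, share_l = rmse_and_bias_share(lottery_based, true_quality)
```

In this sketch the conventional estimator’s bar would be taller and mostly “blue” (bias), while the lottery-based estimator’s error is almost entirely sampling noise.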

The authors develop new tools to overcome this selection challenge and measure algorithmic discrimination in New York City, home to one of the largest pretrial systems in the country. The method builds on previous techniques developed by the authors to measure racial discrimination in actual bail judge decisions and leverages randomness in the assignment of judges to white and Black defendants. Applying their methods, the authors find that a sophisticated machine learning algorithm (which does not train directly on defendant race or ethnicity) recommends the release of white defendants at a significantly higher rate than Black defendants with identical pretrial misconduct potential. 

Specifically, when calibrated to the average NYC release rate of 73 percent, the algorithm recommends an 8 percentage point (11 percent) higher release rate for white defendants than for equally qualified Black defendants. This unwarranted disparity explains 77 percent of the observed racial disparity in release recommendations, grows as the algorithm becomes more lenient, and is driven by discrimination among individuals who would engage in pretrial misconduct if released.
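The two figures measure the same gap on different scales. Assuming the 11 percent figure expresses the 8 percentage point gap relative to the 73 percent calibrated release rate (an interpretation of the reported numbers, not a statement from the paper), the conversion is simple arithmetic:

```python
# Converting a percentage-point gap into a relative (percent) gap,
# using the figures reported in the brief.
avg_release_rate = 0.73    # calibrated average NYC release rate
racial_gap_pp = 0.08       # white-minus-Black gap in release recommendations

relative_gap = racial_gap_pp / avg_release_rate
print(f"{relative_gap:.0%}")  # prints 11%
```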

CLOSING TAKEAWAY
The authors find that a sophisticated machine learning algorithm (which does not train directly on defendant race or ethnicity) recommends the release of white defendants at a significantly higher rate than Black defendants with identical pretrial misconduct potential.

Conclusion

As these papers reveal, pairing data with algorithmic decision-making holds both promise and peril. While policymakers and practitioners should tread carefully when turning important decisions over to machines, they should nonetheless explore options to make decision-making more efficient and bias-free. The techniques developed in this work have other applications, including in job assignment systems, such as those used by Teach for America to place teachers in schools; in the measurement of physician and hospital quality; and in studying the consequences of receiving rationed medical resources, like new drugs and mechanical ventilation, during the recent pandemic. Future work from these researchers may explore these and other applications.