The decisions of judges, lenders, journal editors, and other gatekeepers often lead to significant disparities across affected groups. An important question is whether, and to what extent, these group-level disparities are driven by relevant differences in underlying individual characteristics, or by biased decision makers. Becker (1957, 1993) proposed an outcome test of bias based on differences in post-decision outcomes across groups, inspiring a large and growing empirical literature. The goal of our paper is to offer a methodological blueprint for empirical work that seeks to use outcome tests to detect bias. We show that models of decision making underpinning outcome tests can be usefully recast as Roy models, since heterogeneous potential outcomes enter directly into the decision maker’s choice equation. Different members of the Roy model family, however, are distinguished by the tightness of the link between potential outcomes and decisions. We show that these distinctions have important implications for defining bias, deriving logically valid outcome tests of such bias, and identifying the marginal outcomes that the test requires.