This paper studies inference for the average treatment effect in randomized controlled trials where treatment status is determined according to a “matched pairs” design. By a “matched pairs” design, we mean that units are sampled i.i.d. from the population of interest, paired according to observed baseline covariates, and finally, within each pair, one unit is selected at random for treatment. This type of design is used routinely throughout the sciences, but results about its implications for inference about the average treatment effect are not available. The main requirement underlying our analysis is that pairs are formed so that units within pairs are suitably “close” in terms of the baseline covariates, and we develop novel results to ensure that pairs are formed in a way that satisfies this condition. Under this assumption, we show that, for the problem of testing the null hypothesis that the average treatment effect equals a pre-specified value in such settings, the commonly used two-sample t-test and “matched pairs” t-test are conservative in the sense that these tests have limiting rejection probability under the null hypothesis no greater than, and typically strictly less than, the nominal level. We show, however, that a simple adjustment to the standard errors of these tests leads to a test that is asymptotically exact in the sense that its limiting rejection probability under the null hypothesis equals the nominal level. We also study the behavior of randomization tests that arise naturally in these types of settings. When implemented appropriately, we show that this approach also leads to a test that is asymptotically exact in the sense described previously, but additionally has finite-sample rejection probability no greater than the nominal level for certain distributions satisfying the null hypothesis. A simulation study confirms the practical relevance of our theoretical results.
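
As an illustration of the design described above, the following is a minimal sketch, not the paper's procedure. It assumes a single scalar covariate and forms pairs by sorting on it, which is only one way of making units within pairs “close”. The function names `matched_pairs_assignment` and `t_statistics` are hypothetical. The sketch computes the usual “matched pairs” and two-sample t-statistics, which the abstract describes as conservative under this design; the adjusted standard error is not reproduced here.

```python
import numpy as np

def matched_pairs_assignment(x, rng):
    """Pair units by sorting on a scalar covariate x (an assumption of this
    sketch), then pick one unit per pair at random for treatment."""
    order = np.argsort(x)                 # units adjacent in sorted order form a pair
    pairs = order.reshape(-1, 2)          # requires an even number of units
    pick = rng.integers(0, 2, size=len(pairs))   # 0 or 1: which pair member is treated
    rows = np.arange(len(pairs))
    treated = pairs[rows, pick]
    control = pairs[rows, 1 - pick]
    return treated, control               # aligned: treated[j], control[j] share pair j

def t_statistics(y, treated, control, theta0=0.0):
    """Difference-in-means estimate with the standard paired and two-sample
    t-statistics for the null that the average treatment effect equals theta0."""
    d = y[treated] - y[control]           # within-pair differences
    n = len(d)                            # number of pairs
    est = d.mean()                        # equals the difference in group means
    se_paired = d.std(ddof=1) / np.sqrt(n)
    se_two_sample = np.sqrt(y[treated].var(ddof=1) / n + y[control].var(ddof=1) / n)
    return est, (est - theta0) / se_paired, (est - theta0) / se_two_sample

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
treated, control = matched_pairs_assignment(x, rng)
y = x + rng.normal(size=200)              # outcomes with zero treatment effect
est, t_paired, t_two_sample = t_statistics(y, treated, control)
```

Because pairing on `x` makes outcomes within a pair positively correlated in this simulation, the two standard errors generally differ, which is one reason the choice of standard error matters under this design.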
