Empirical studies of the economics of education, of the measurement of skill gaps across demographic groups, and of the impacts of interventions on skill formation rely on psychometrically validated test scores that record the proportion of items answered correctly. Test scores are sometimes taken as measures of an invariant scale of human capital that can be compared over time and across people. We show that for a prototypical test, invariance is violated. We use an unusually rich data set from an early childhood intervention program that measures knowledge of narrowly defined skills on essentially equivalent subsets of tasks. We examine whether conventional, broadly defined measures of skill are the same across people who are comparable on detailed knowledge measures. We reject the hypothesis of aggregate scale invariance and call into question the uncritical use of test scores in research on education and on skill formation. We compare different measures of skill and ability and reject the hypothesis of valid aggregate measures of skill.
