Empirical studies of the economics of education, of skill gaps across demographic groups, and of the impacts of interventions on skill formation rely on psychometrically validated test scores that record the proportion of items answered correctly. Test scores are sometimes taken as measures of an invariant scale of human capital that can be compared over time and across people. We show that for a prototypical test, invariance is violated. We use an unusually rich data set from an early childhood intervention program that measures knowledge of narrowly defined skills on essentially equivalent subsets of tasks. We examine whether conventional, broadly defined measures of skill are the same across people who are comparable on detailed knowledge measures. We reject the hypothesis of aggregate scale invariance and call into question the uncritical use of test scores in research on education and on skill formation. We also compare different measures of skill and ability and reject the hypothesis that they are valid aggregate measures of skill.