PISA cannot be rescued by switching IRT model because all IRT modelling is flawed.
Dr Hugh Morrison (The Queen’s University of Belfast [retired]) email@example.com
On page 33 of the Times Educational Supplement of Friday 25th November 2016, Andreas Schleicher, who oversees PISA, appears to accept my analysis of the shortcomings of the Rasch model which plays a central role in PISA’s league table. The Rasch model is a “one parameter” Item Response Theory (IRT) model, and Schleicher argues that PISA’s conceptual difficulties can be resolved by abandoning the Rasch model for a two or three parameter model. However, my criticisms apply to all IRT models, irrespective of the number of parameters. In this essay I will set out the reasoning behind this claim.
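For readers unfamiliar with the models under discussion, the standard logistic forms can be sketched as follows. The notation is conventional textbook notation and is an assumption here, not something taken from the TES article or from PISA's technical documentation: $\theta_j$ denotes the ability of examinee $j$, $b_i$ the difficulty of item $i$, $a_i$ its discrimination, and $c_i$ its lower asymptote (often called the "guessing" parameter).

```latex
% One-parameter (Rasch) model
P(X_{ij} = 1 \mid \theta_j) = \frac{\exp(\theta_j - b_i)}{1 + \exp(\theta_j - b_i)}

% Two-parameter model
P(X_{ij} = 1 \mid \theta_j) = \frac{\exp\bigl(a_i(\theta_j - b_i)\bigr)}{1 + \exp\bigl(a_i(\theta_j - b_i)\bigr)}

% Three-parameter model
P(X_{ij} = 1 \mid \theta_j) = c_i + (1 - c_i)\,\frac{\exp\bigl(a_i(\theta_j - b_i)\bigr)}{1 + \exp\bigl(a_i(\theta_j - b_i)\bigr)}
```

Setting $c_i = 0$ in the three-parameter model recovers the two-parameter model, and setting $a_i = 1$ as well recovers the Rasch model. The additional parameters all describe the item; in each case the examinee is represented by the same single quantity $\theta_j$.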
One can find the source of IRT’s difficulty in Niels Bohr’s 1949 paper entitled Discussion with Einstein on Epistemological Problems in Atomic Physics. Few scientists have made a greater contribution to the study of measurement than the Nobel Laureate and founding father of quantum theory, Niels Bohr. Given Bohr’s preoccupation with what the scientist can say about aspects of reality that are not visible (electrons, photons, and so on), one can understand his constant references to measurement in psychology. “Ability” cannot be seen directly; rather, like the microentities that manifest as tracks in particle accelerators, ability manifests in the examinee’s responses to test items. IRT is concerned with “measuring” something which the measurer cannot experience directly, namely, the ability of the examinee.
IRT relies on a simple inner/outer picture for its models to function. In IRT the inner (a realm of timeless, unobserved latent variables, or abilities) is treated as independent of the outer (here examinees write or speak responses at moments in time). This is often referred to as a “reservoir” model in which timeless abilities are treated as the source of the responses given at specific moments in time.
As early as 1929 Bohr rejected this simplistic thinking in strikingly general terms: “Strictly speaking, the conscious analysis of any concept stands in a relation of exclusion to its immediate application. The necessity of taking recourse to a complementary … mode of description is perhaps most familiar to us from psychological problems.” Now what did Bohr mean by these words? Consider, for example, the concept “quadratic.” It is tempting to adopt a reservoir approach and trace a pupil’s ability to apply that concept in accord with established mathematical practice to his or her having the formula in mind. The guidance offered by the formula in mind (Bohr’s reference to “conscious analysis”) accounts for the successful “application,” for example, to the solution of specific items on an algebra test.
However, this temptingly simplistic model, in which the formula is in the unobserved mental realm and written or spoken applications of the concept “quadratic” take place in the observed realm, contains a fundamental flaw: the two realms cannot be meaningfully connected. The “inner” formula (in one realm) gets its guidance properties from human practices (in the other realm). A formula as a thing-in-itself cannot guide; one has to be trained in the established practice of using the formula before it has guidance properties. In school mathematics examinations around the world, pupils are routinely issued with a page of formulae relevant to the examination. Alas, it is the experience of mathematics teachers everywhere that simply having access to the formula as a thing-in-itself offers little or no guidance to the inadequately trained pupil. The formula located in one realm cannot connect with the applications in the other.
Wittgenstein teaches that no formula, rule, principle, etc. in itself can ever determine a course of action. The timeless mathematical formula in isolation cannot generate all the complexities of a practice (something which evolves in time); rather, as Michael Oakeshott puts it, a formula is a mere “abridgement” of the practice – the practice is primary, with the formula, rule, precept etc. deriving its “life” from the practice.
Returning to Bohr’s writing, it is instructive to explain his use of the word “complementarity” in respect of psychology and to explain the meaning of the words: “stands in a relation of exclusion.” Complementarity was the most important concept Bohr bequeathed to physics. It involves a combination of two mutually exclusive facets. In order to see its relevance to the validity of IRT modelling, let’s return to the two distinct realms.
We think of the answers to a quadratic equation as being right or wrong (a typical school-level quadratic equation has two distinct answers). In the realm of application this is indeed the case. When the examinee is measured, his or her response is pronounced right or wrong dependent upon its relation to established mathematical practice. However, in the unobserved realm, populated by rules, formulae and precepts (as things-in-themselves), any answer to a quadratic equation is simultaneously right and wrong!
A formula as a thing-in-itself cannot separate what accords with it from what conflicts with it, because there will always exist an interpretation of the formula for which a particular answer is correct, and another interpretation for which the same answer can be shown to conflict with the formula. Divorced from human practices, the distinction between right and wrong collapses. (This is a direct consequence of Wittgenstein’s celebrated “private language” argument.) This explains Bohr’s reference to a “relation of exclusion.” In simplistic terms, in the unobserved realm, in which answers are compared with the formula for solving quadratics, responses are right-and-wrong, while in the observed realm, where answers are compared with the established practice, responses are right-or-wrong.
On this reading, ability has two mutually exclusive facets which cannot meaningfully be separated. The distinguished Wittgenstein scholar, Peter Hacker, captures this situation as follows: “grasping an explanation of meaning and knowing how to use the word explained are not two independent abilities but two facets of one and the same ability.” Ability, construed according to Bohr’s complementarity, is indefinite when unobserved and definite when observed. Moreover, this definite measure is not an intrinsic property of the examinee, but a property of the examinee’s interaction with the measuring tool.
Measurement of ability is not a matter of passively checking up on what already exists – a central tenet of IRT. Bohr teaches that the measurer effects a radical change from indefinite to definite. Pace IRT, measurers, in effect, participate in what is measured. No item response model can accommodate the “jump” from indefinite to definite occasioned by the measurement process. All IRT models mistakenly treat unmeasured ability as identical to measured ability. What scientific evidence could possibly be adduced in support of that claim? No IRT model can represent ability’s two facets because all IRT models report ability as a single real number, construed as an intrinsic property of the measured individual.
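The claim that every IRT model reports ability as a single real number can be illustrated with a minimal sketch. The code below is not PISA's scaling procedure; it is a toy maximum-likelihood estimate over a grid, and the function names (`p_correct`, `log_likelihood`, `estimate_theta`), the item parameters and the response patterns are all hypothetical, chosen only for illustration.

```python
import math


def p_correct(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the logistic 3PL model.

    With a=1 and c=0 this is the Rasch (1PL) model; with c=0 it is the 2PL.
    theta is the examinee's ability, b the item difficulty.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern at a given ability theta."""
    total = 0.0
    for x, (b, a, c) in zip(responses, items):
        p = p_correct(theta, b, a, c)
        total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


def estimate_theta(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Crude grid-search maximum-likelihood estimate of ability.

    Whatever the number of item parameters, the value returned
    is one real number per examinee.
    """
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))


# Hypothetical items given as (difficulty b, discrimination a, guessing c).
# These are Rasch-type items, since a=1 and c=0 throughout.
items = [(-1.0, 1.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
stronger = estimate_theta([1, 1, 0], items)  # two of three items correct
weaker = estimate_theta([1, 0, 0], items)    # one of three items correct
```

Changing the `a` and `c` values in `items` moves the sketch from a one-parameter to a two- or three-parameter model, but the return type of `estimate_theta` does not change: the examinee is still summarised by one scalar, treated as belonging to the examinee alone.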