The new GCSE grade 5 and the Fundamental Attribution Error
Dr Hugh Morrison, The Queen’s University of Belfast (retired) firstname.lastname@example.org
Holding the new GCSE grade 5 up to ridicule
A highly unusual feature of the new numbered GCSE grade scale is the claim that the new grade 5 will somehow reflect the standards of educational jurisdictions ranked near the top of international league tables. Given the controversy surrounding such tables it will be possible, for the first time, to raise profound technical concerns about a particular grade on the GCSE grade scale. Moreover, it will be impossible to make the case that grade 5 has any technical merit if (as seems likely) Pisa ranks have any role in its determination. Pisa is the acronym for the OECD’s “Programme for International Student Assessment” and a glance at the Times Educational Supplement of 26.07.2013 will reveal that Pisa league tables are fraught with technical difficulties.
The distinguished statistician Svend Kreiner, of the University of Copenhagen, who has carried out a detailed investigation of the Pisa model, concluded: “the best we can say about Pisa rankings is that they are useless.” The British mathematician Tony Gardner, of Birmingham University, has referred to Pisa claims as “snake oil.” In the Times Educational Supplement piece, I argued that the model used by Pisa is flawed because, in order to explain a child’s ability to do simple arithmetic, for example, one must posit exotic inner states which are infinite, timeless and which somehow anticipate every arithmetical problem the child will subsequently encounter in a lifetime. These impossible inner states arise because Pisa models treat “ability” as a state rather than a capacity. How has Pisa managed to survive all these years given such damaging and unequivocal criticism? Its secret is that it appears to enjoy a relationship with Government and the media which, in effect, insulates it from its critics. Kreiner writes: “One of the problems that everybody has with Pisa is that they don’t want to discuss things with people criticising or asking questions concerning the results. They didn’t want to talk to me at all. I am sure it is because they can’t defend themselves.”
For the first time in the history of British examinations, a simple argument that anyone can understand can be deployed to undermine the technical fidelity of a particular examination grade. Mixing the measurement of achievement with the measurement of ability exposes the new grade 5 to ridicule. If grade 5 is to be predicated on Pisa rankings then profound validity shortcomings in respect of the rankings will have implications for grade 5. Consider the arrangement of balls on a snooker table before a game begins. The configuration of the 22 balls requires 44 numbers (two per ball, with the front and side rails serving as coordinate axes). While the arrangement of balls on a snooker table cannot be summarised in fewer than 44 numbers, Pisa claims to represent the state of mathematics education in the USA – with its almost 100,000 schools – in a single number. It would seem that what cannot be achieved for the location of simple little resin balls is nevertheless possible when the entity being “measured” is the mathematical attainment of millions of complex, intentional beings.
The Nobel laureate Sir Peter Medawar labelled such claims “unnatural science.” Citing the research of John R. Philip, he notes that the properties of a simple particle of soil cannot be captured in a single number: “the physical properties and field behaviour of soil depend on particle size and shape, porosity, hydrogen ion concentration, microbial flora and water content and hygroscopy. No single figure can embody in itself a constellation of values of all these variables.” Once again, what is impossible for a tiny particle of soil taken from the shoe of one of the many millions of pupils who attend school in America is nevertheless possible when the entity being “measured” is the combined mathematical attainment of a nation’s schoolchildren.
The problem with the new GCSE grade 5: a detailed critique
The OECD has now taken the bold step of analysing measures of “happiness,” “well-being” and “anxiety” for individual countries (see, for example, ‘New Pisa happiness table,’ Times Educational Supplement 19.04.2017). In these tables “life satisfaction,” for example, is measured to two-decimal-place accuracy. This raises the question, “Can complex constructs such as happiness or anxiety really be represented by a number such as 7.26?” For two giants of 20th century thought – the philosopher Ludwig Wittgenstein and the father of quantum physics, Niels Bohr – the answer to this question is an unequivocal “no.” The fundamental flaws in Pisa’s approach to measuring happiness will serve to illustrate the folly of linking a particular GCSE grade to Pisa methodology.
Once again, surely common sense itself dictates that constructs such as happiness, anxiety and well-being cannot be captured in a single number? In his book Three Seductive Ideas, the Harvard psychologist Jerome Kagan draws on the writings of Bohr and Wittgenstein to argue that measures of constructs such as happiness cannot be attributed to individuals and cannot be represented as numbers because such measures are context-dependent. He writes: “The first premise is that the unit of analysis … must be a person in a context, rather than an isolated characteristic of that person.” Wittgenstein and Bohr (independently) arrived at the conclusion that what is measured cannot be separated from the measurement context. It follows that when an individual’s happiness is being measured, a complete description of the measuring tool must appear in the measurement statement because the measuring tool helps define what the measurer means by the word happiness.
Kagan rejects the practice of reporting the measurement of complex psychological constructs using numbers: “The contrasting view, held by Whitehead [co-author of the Principia Mathematica] and Wittgenstein, insists that every description should refer to … the circumstances of the observation.” The reason for including a description of the measuring instrument isn’t difficult to see. Kagan points out that “Most investigators who study ‘anxiety’ or ‘fear’ use answers on a standard questionnaire or responses to an interview to decide which of their subjects are anxious or fearful. A smaller number of scientists ask close friends or relatives of each subject to evaluate how anxious the person is. A still smaller group measures the heart rate, blood pressure, galvanic skin response, or salivary level of subjects.” Alas, all these methodologies yield very different “measures” of the anxiety or fear of the subject.
Kagan therefore argues that a change in the measuring tool means a change in the reported measurement; one must include a description of the measuring instrument in order to “communicate unambiguously,” as Bohr expressed it. One can never simply write “happiness = 4.29” (as in Pisa tables) because there is no such thing as a context-independent measure of happiness. We have no idea what happiness is as a thing-in-itself. Kagan notes the implications for psychologists of the measurement principles set out by Niels Bohr: “Modern physicists appreciate that light can behave as a wave or a particle depending on the method of measurement. But some contemporary psychologists write as if that maxim did not apply to consciousness, intelligence, or fear.”
According to Bohr, when one reports psychological measurements, the requirement to describe the measurement situation means that ordinary language must replace numbers. This invalidates the entire Pisa project. Werner Heisenberg summarised his mentor’s teachings: “If we want to say anything at all about nature – and what else does science try to do? – we must pass from mathematical to everyday language.” The consequences of accepting this counsel are clear: one cannot rank-order descriptions.
(To simplify matters somewhat, while numbers function perfectly well when observing the motion of a tennis ball or a star, the psychologist cannot observe directly the pupil’s happiness. Bohr argued that there is “a deep-going analogy” between measurement in quantum physics and measurement in psychology because both are concerned with measuring constructs which transcend the limits of ordinary experience. According to Bohr, because the physicist, like the psychologist (in respect of attempts to measure happiness), cannot observe electrons and photons directly, “physics concerns what we can say about nature,” and numbers, therefore, must give way to ordinary language.)
The arguments advanced above apply, without modification, to Pisa’s core activity of measuring pupil ability. A simple thought experiment (first reported in the Times Educational Supplement of 26.07.2013) makes this clear. Suppose that a pupil is awarded a perfect score in a GCSE mathematics examination. It seems sensible to conjecture that if Einstein were alive, he too would secure a perfect score on this mathematics paper. Given the title on the front page of the examination paper, one has the clear sense that the examination measures ability in mathematics. Is one therefore justified in saying that Einstein and the pupil have the same mathematical ability?
This paradoxical outcome results from the erroneous treatment of mathematical ability as something entirely divorced from the questions which make up the examination paper (the measurement context). It is clear that the pupil’s mathematical achievements are dwarfed by Einstein’s; to ascribe equal ability to Einstein and the pupil is to communicate ambiguously. To avoid the paradox one simply has to detail the measurement circumstances in any report of attainment and say: “Einstein and the pupil have the same mathematical ability relative to this particular GCSE mathematics paper.” By including a description of the measuring instrument one is, in effect, making clear the restrictive meaning which attaches to the word “mathematics” as it is being used here; school mathematics omits whole areas of the discipline familiar to Einstein such as non-Euclidean geometry, tensor analysis, vector field theory, Newtonian mechanics, and so on. As with the measurement of happiness, when one factors in a description of the measuring instrument, the paradox dissolves away.
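The thought experiment can be made concrete with a minimal sketch. Everything in it is a hypothetical illustration rather than a model of any real examination: the topic lists, the `Candidate` class and the `score` function are assumptions introduced only to show that a score is a joint property of the candidate and the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    mastered: frozenset  # hypothetical set of topics the candidate has mastered


def score(candidate, paper):
    """Count the questions on `paper` whose topic the candidate has mastered."""
    return sum(1 for topic in paper if topic in candidate.mastered)


def report(candidate, paper_name, paper):
    """State the score together with the measuring instrument it is relative to."""
    return f"{candidate.name}: {score(candidate, paper)}/{len(paper)} relative to {paper_name}"


# Illustrative papers: one topic per question (assumed, not taken from any syllabus).
GCSE_PAPER = ["arithmetic", "algebra", "geometry", "probability"]
ADVANCED_PAPER = GCSE_PAPER + ["tensor analysis", "non-Euclidean geometry"]

pupil = Candidate("pupil", frozenset(GCSE_PAPER))
einstein = Candidate("Einstein", frozenset(ADVANCED_PAPER))

for paper_name, paper in [("the GCSE paper", GCSE_PAPER), ("the advanced paper", ADVANCED_PAPER)]:
    print(report(pupil, paper_name, paper))
    print(report(einstein, paper_name, paper))
```

On the GCSE paper the two candidates are indistinguishable; change the paper and the scores diverge. The bare number carries no meaning until the paper it is relative to is named alongside it.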
Pace Pisa, ability is not an intrinsic property of the person. Rather, it is a joint property of the person and the measuring tool. Ability is the property of an interaction. Alas for Pisa, the move from numbers to language also dissolves away that organisation’s much-lauded rank orders. Little wonder that Wittgenstein described the reasoning which underpins the statistical model (Item Response Theory) at the heart of the Pisa rankings as “a disease of thought.” For the first time, the many profound conceptual difficulties of the Pisa league table now become difficulties for a grade on the GCSE grade scale. Why would anyone agree to predicate a perfectly respectable grade scale on a ranking system with such profound shortcomings?
An article by Van Bavel, Mende-Siedlecki, Brady and Reinero, published in 2016 in the USA’s Proceedings of the National Academy of Sciences, serves to emphasise the degree to which Pisa thinking is isolated even in psychology: “Indeed, the insight that behaviour is a function of both the person and the environment – elegantly captured by Lewin’s equation: B = f(P, E) – has shaped the direction of social psychological research for more than half a century. During that time, psychologists and other social scientists have paid considerable attention to the influence of context on the individual and have found extensive evidence that contextual factors alter human behaviour.”
If ability is a joint property of the person and the context in which that ability is manifest, then unambiguous communication demands that a description of the context must be integral to any attempt to represent an individual’s ability. Mainstream psychology rejects the notion that one can ignore context and treat behaviour as wholly analysable in terms of traits and inner processes. Indeed, psychology itself has a name for the error which afflicts the Pisa ranking model. Gerd Gigerenzer of the Max Planck Institute writes: “The tendency to explain behaviour internally without analysing the environment is known as the ‘fundamental attribution error.’”
Three thinkers who stand out among those who argue that ability measures cannot be separated from the context in which they are manifest are the Nobel laureate Herbert A. Simon and two of the 20th century’s greatest intellectuals: the father of quantum theory, Niels Bohr, and the philosopher Ludwig Wittgenstein. First, Herbert Simon uses a scissors metaphor to indicate the degree to which an attribute like ability cannot be disentangled from the context in which it is manifest. (Pursuing questions such as “which blade of the scissors cuts the cloth?” will do little to advance an explanation of how scissors cut; there seems to be little value in seeking to understand the whole (the cutting action) in terms of its parts (the unique contribution of each blade).) Simon writes: “Human rational behaviour is shaped by a scissors whose blades are the structure of the environment and the computational capabilities of the actor.”
Secondly, Niels Bohr – in his Discussion with Einstein on Epistemological Problems in Atomic Physics – uses quantum “complementarity” to argue that first-person ascriptions [the contribution of the individual] and third-person ascriptions [the contribution of the environment] of psychological attributes form an “indivisible whole.” Finally, on page 143 of his Blue and Brown Books, Wittgenstein highlights the error at the heart of the Pisa project: “There is a general disease of thinking which always looks for (and finds) what would be called a mental state from which all our acts spring as from a reservoir.”
The arguments set out above have serious implications for the technical fidelity of the new GCSE grade 5. The more the general public find out about the modelling which underpins Pisa, the more their faith in the new GCSE grade scale will be undermined. (For example, Kreiner reveals in the Times Educational Supplement piece that, “Most people don’t know that half of the students taking part in Pisa do not respond to any reading item at all. Despite that, Pisa assigns reading scores to these children.”)
The fact that a switch from numbers to language invalidates entirely the practice of ordering countries according to the efficacy of their education systems has profound implications for the validity of inferences made in respect of the new GCSE grade 5. Given the assertion that grade 5 is designed to reflect the academic standards of high-performing educational jurisdictions, as identified by their Pisa ranks, what possible justification can be offered for assigning a privileged role to the GCSE grade 5 in school performance tables?
To date, the profound conceptual difficulties which attend Pisa ranks have not impacted directly on the life chances of particular children in this country. This would change if individual pupils failing to reach the grade 5 standard were construed as having fallen short of international standards (whatever that means). If one accepts the reasoning of Simon, Wittgenstein and Bohr, grade 5 can represent nothing more than a standard somewhere between grade 4 and grade 6. Any attempt to accord it special status, thereby giving it a central role in the EBacc and/or performance tables, for example, risks exposing the new GCSE grading scale to ridicule.