Pace N.Ireland Education Weblog

~ Northern Ireland education analysis

Tag Archives: Item Response Theory

Time for Ofqual to take back control

Sunday 13 Aug 2017

Posted by paceni in Grammar Schools

Tags

ability, Cambridge Assessment, Derrick Lawley, Dr Hugh Morrison, GCSE Grade 5, Gerd Gigerenzer, Harvey Goldstein, Herbert Simon, Item Response Theory, Ludwig Wittgenstein, Niels Bohr, OECD, Ofqual, Pisa rankings, Svend Kreiner, TES, Tim Oates, William Stewart

The GCSE Grade 5 controversy: why it’s time for Ofqual to “take back control”

Dr Hugh Morrison, The Queen’s University of Belfast (retired)  drhmorrison@gmail.com

 The GCSE Grade 5 Controversy

A highly unusual feature of the new numbered GCSE grade scale is the claim that the new grade 5 will somehow reflect the standards of educational jurisdictions ranked near the top of international league tables.  Given the controversy surrounding such tables, it will be possible, for the first time, to raise profound technical concerns about a particular grade on the GCSE grade scale.  Moreover, it will be difficult to make the case that grade 5 has any technical merit if Pisa ranks have any role in its determination.  Pisa is the acronym for the Paris-based “Programme for International Student Assessment”, and a glance at the Times Educational Supplement of 26.07.2013 will reveal that Pisa league tables are fraught with technical difficulties.

 Concerns about Item Response Theory

The methodology which underpins Pisa ranks is called “Item Response Theory” (IRT).  IRT software claims to estimate the ability of individuals based on their responses to test items.  However, while the claim that ability is some inner state or “trait” of the individual from which his or her test responses flow – the so-called reservoir model – is central to IRT, the claim is rarely supported by evidence.  There is very good reason to reject the inner state approach.  Consider, for example, the child solving simple problems in arithmetic.  To explain this everyday behaviour, it transpires, one must invoke inner states with talismanic properties: such a state must be timeless, infinite and future-anticipating!
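For readers unfamiliar with IRT, the following is a minimal sketch of what “estimating ability” amounts to in the simplest IRT model, the one-parameter logistic (Rasch) model, in which the probability of a correct answer depends only on the difference between a person parameter θ (“ability”) and an item parameter b (“difficulty”).  It is an illustration under stated assumptions, not Pisa’s procedure: the item difficulties are assumed to be already known, the function names are hypothetical, and Pisa’s operational scaling is considerably more elaborate.  The sketch simply makes visible the picture criticised here, in which each test-taker is summarised by a single number.

```python
from __future__ import annotations

import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch (one-parameter logistic) model.

    theta: the person parameter the model calls "ability" (hypothetical illustration).
    b:     the item parameter the model calls "difficulty".
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses: list[int], difficulties: list[float],
                     tol: float = 1e-8, max_iter: int = 100) -> float:
    """Maximum-likelihood estimate of theta by Newton-Raphson.

    Assumes the item difficulties are already known (an illustrative assumption).
    responses: 1 = correct, 0 = incorrect, one entry per item.
    """
    if len(responses) != len(difficulties):
        raise ValueError("one response per item is required")
    score = sum(responses)
    if score == 0 or score == len(responses):
        # Zero and perfect scores have no finite maximum-likelihood estimate.
        raise ValueError("no finite estimate for a zero or perfect score")

    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_probability(theta, b) for b in difficulties]
        # First derivative of the log-likelihood with respect to theta.
        gradient = score - sum(probs)
        # Negative second derivative (the test information at theta).
        information = sum(p * (1.0 - p) for p in probs)
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


if __name__ == "__main__":
    items = [-1.5, -0.5, 0.0, 0.5, 1.5]  # hypothetical item difficulties
    print(estimate_ability([1, 1, 0, 1, 0], items))
    print(estimate_ability([1, 1, 0, 0, 1], items))  # same raw score, same estimate
```

In the Rasch model the raw score is a sufficient statistic for θ, so any two response patterns with the same number correct receive the same estimate; zero and perfect scores have no finite maximum-likelihood estimate, which is why the sketch refuses them rather than returning an arbitrary value.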

Great caution is needed when using the word “ability” – while test evidence can justify us in saying that an individual has ability, that same evidence can never be used to justify the claim that the word “ability” refers to an inner (quantifiable) state of that individual.  Ability is not an intrinsic property of an individual; rather, it is a property of the interaction between individual and test items.  The individual’s responses to the test items are an inseparable part of that ability.  Indeed, divorced from a measurement context, ability is indefinite.  Individuals have definite ability only relative to a measurement context; even here it is incorrect to suggest that individuals possess a quantifiable entity called “ability.”  Once IRT’s appealingly simple picture of ability as an inner (quantifiable) state that individuals carry about with them is abandoned, IRT becomes untenable.  Ability is a two-faceted entity governed by first-person/third-person asymmetry: while we ascribe ability to ourselves without criteria, criteria are an essential prerequisite when ascribing ability to others.

The picture central to all IRT modelling – that ability is something intrinsic to the individual which is definite (and quantifiable) at all times – is rejected by the Nobel laureate Herbert Simon and by two giants of 20th century thought, physicist Niels Bohr and philosopher Ludwig Wittgenstein.  Indeed, Wittgenstein described the rationale which underpins IRT modelling – that test responses can be explained by appealing to inner processes – as a “general disease of thinking.”  Psychologists have a name for this error; Gerd Gigerenzer of the Max Planck Institute writes: “The tendency to explain behaviour internally without analysing the environment is known as the ‘Fundamental Attribution Error’.”

The criticisms levelled by Bohr and Wittgenstein are particularly damaging because IRT modellers construe ability as something inner which can be measured.  Few philosophers can match Wittgenstein’s contribution to our understanding of what can be said about the “inner”; and few scientists can match Bohr’s contribution to our understanding of measurement, particularly when the object of that measurement lies beyond direct experience.  (Bohr is listed among the top ten physicists of all time in recognition of his research on the quantum measurement problem.)  Both Bohr and Wittgenstein are concerned with the same fundamental question: how can one communicate unambiguously about aspects of reality which are beyond the direct experience of the measurer?  Just as Bohr rejected entirely the existence of definite states within the atom, Wittgenstein also rejected any claim to inner mental states; potentiality replaces actuality for both men.

For the duration of his professional life, Bohr maintained that quantum attributes have a “deep going” relation to psychological attributes in that neither can be represented as quantifiable states hidden in some inner realm.  We will always be limited to talking about ability; we will never be able to answer the question “what is ability?” let alone quantify someone’s ability.  Bohr believed that “Our task is not to penetrate into the essence of things, the meaning of which we don’t know anyway, but rather to develop concepts which allow us to talk in a productive way about phenomena in nature. …The task of physics is not to find out how nature is, but to find out what we can say about nature. … For if we want to say anything at all about nature – and what else does science try to do? – we must somehow pass from mathematical to everyday language.”

Given that IRT software is designed to measure ability, it may surprise readers that the claim that ability can be construed as a quantifiable inner state is rarely defended in IRT textbooks and journal articles.  In their article “Five decades of item response modelling,” Goldstein and Wood trace the beginnings of IRT to a paper written in 1943 by Derrick Lawley.  They note: “Lawley, a statistician, was not concerned with unpacking what ‘ability’ might mean.”  Little has changed in the interim.

Why Ofqual must protect GCSE pupils from the OECD’s “sophisticated processes”

These profound conceptual difficulties with the model which underpins Pisa rankings must surely undermine the OECD’s claim that one can rank order countries for the quality of their education systems.  In a detailed analysis of the 2006 Pisa rankings, the eminent statistician Svend Kreiner revealed that “Most people don’t know that half of the students taking part in Pisa [2006] do not respond to any reading item at all.  Despite that, Pisa assigns reading scores to these children.”  Given such revelations, why are governments, the media and the general public not more sceptical about Pisa rankings?  Kreiner offers the following explanation: “One of the problems that everybody has with Pisa is that they don’t want to discuss things with people criticising or asking questions concerning the results.  They didn’t want to talk with me at all.  I am sure it is because they can’t defend themselves.”

Given the depth of the conceptual problems which afflict IRT and, as a consequence, Pisa rankings, it seems to me foolhardy in the extreme to predicate the new GCSE grade 5 on Pisa rankings.  Ofqual have announced that grade 5 will be “broadly in line with what the best available evidence tells us is the average PISA performance in countries such as Finland, Canada, the Netherlands and Switzerland.”  In addition to Ofqual, the Department for Education and Tim Oates, director of research at Cambridge Assessment, appear to endorse a role for Pisa in UK public examinations.  The Department for Education have produced a report – PISA 2009 Study: How big is the gap? – which creates the impression that “gaps” between England and high performing Pisa countries can be represented on a GCSE grade scale designed for reporting achievement rather than ability.

Finally, Cambridge Assessment’s director of research asserts: “I am more optimistic … than most other analysts, I don’t see too many problems in these kinds of international comparisons.”  Indeed, Mr Oates believes that UK assessment has much to learn from involving Pisa staff directly in solving the grade 5 problem: “If we want to do it formally then we ought to have discussions with OECD. … OECD have some pretty sophisticated processes of equating tests which contain different items in different national settings.”  There is an immediate problem with this claim.  One criterion for successful equating is the psychometric property of equity, and since the definition of equity begins with the words “for every group of examinees of identical ability …,” equity itself (and with it any claim to have equated tests) is founded on the erroneous assumption that ability can be quantified.

For the first time in the history of public examinations in the UK the technical fidelity of a GCSE grade will be linked to Pisa methodology.  Given the concerns surrounding IRT, is it not time for Ofqual to distance itself from the claim that the grade 5 standard is somehow invested with properties which allow it to track international standards in the upper reaches of the Pisa league tables?  The recent introduction of the rather vague term “strong pass” smacks of desperation; couldn’t a grade 6 also be deemed a strong pass?  Why not stop digging, sever the link with Pisa, and simply interpret grade 5 as nothing more than the grade representing a standard somewhere between grade 4 and grade 6?

The new GCSE grade 5: what Ofqual refuse to tell the public

Thursday 27 Jul 2017

Posted by paceni in Grammar Schools

Tags

Adi Bloom, GCSE Grade 5, Happiness, Helen Ward, Herbert A. Simon, Item Response Theory, Jerome Kagan, Jo-Anne Baird, John R. Philip, joint property, Ludwig Wittgenstein, Niels Bohr, OECD, PISA, Programme for International Student Assessment, Svend Kreiner, TES, Tony Gardner, Werner Heisenberg, William Stewart

The new GCSE grade 5 and the Fundamental Attribution Error

Dr Hugh Morrison, The Queen’s University of Belfast (retired)  drhmorrison@gmail.com

Holding the new GCSE grade 5 up to ridicule

A highly unusual feature of the new numbered GCSE grade scale is the claim that the new grade 5 will somehow reflect the standards of educational jurisdictions ranked near the top of international league tables.  Given the controversy surrounding such tables, it will be possible, for the first time, to raise profound technical concerns about a particular grade on the GCSE grade scale.  Moreover, it will be impossible to make the case that grade 5 has any technical merit if (as seems likely) Pisa ranks have any role in its determination.  Pisa is the acronym for the OECD’s “Programme for International Student Assessment”, and a glance at the Times Educational Supplement of 26.07.2013 will reveal that Pisa league tables are fraught with technical difficulties.

The distinguished statistician Svend Kreiner, of the University of Copenhagen, who has carried out a detailed investigation of the Pisa model, concluded: “the best we can say about Pisa rankings is that they are useless.”  The British mathematician Tony Gardner, of Birmingham University, has referred to Pisa claims as “snake oil.”  In the Times Educational Supplement piece, I argued that the model used by Pisa is flawed because, in order to explain a child’s ability to do simple arithmetic, for example, one must posit exotic inner states which are infinite and timeless and which somehow anticipate every arithmetical problem the child will subsequently encounter in a lifetime.  These impossible inner states arise because Pisa models treat “ability” as a state rather than a capacity.  How has Pisa managed to survive all these years given such damaging and unequivocal criticism?  Its secret is that it appears to enjoy a relationship with Government and the media which, in effect, insulates it from its critics.  Kreiner writes: “One of the problems that everybody has with Pisa is that they don’t want to discuss things with people criticising or asking questions concerning the results.  They didn’t want to talk to me at all.  I am sure it is because they can’t defend themselves.”

For the first time in the history of British examinations, a simple argument that anyone can understand can be deployed to undermine the technical fidelity of a particular examination grade.  Mixing the measurement of achievement with the measurement of ability exposes the new grade 5 to ridicule.  If grade 5 is to be predicated on Pisa rankings then profound validity shortcomings in respect of the rankings will have implications for grade 5.  Consider the arrangement of balls on a snooker table before a game begins.  The configuration requires 44 numbers: there are 22 balls (15 reds, six colours and the cue ball), and each needs two coordinates, with the front and side rails serving as coordinate axes.  While the arrangement of balls on a snooker table cannot be summarised in fewer than 44 numbers, Pisa claims to represent the state of mathematics education in the USA – with its almost 100,000 schools – in a single number.  It would seem that what cannot be achieved for the location of simple little resin balls is nevertheless possible when the entity being “measured” is the mathematical attainment of millions of complex, intentional beings.

The Nobel laureate Sir Peter Medawar labelled such claims “unnatural science.”  Citing the research of John R. Philip, he notes that the properties of a simple particle of soil cannot be captured in a single number: “the physical properties and field behaviour of soil depend on particle size and shape, porosity, hydrogen ion concentration, microbial flora and water content and hygroscopy.  No single figure can embody itself in a constellation of values of all these variables.”  Once again, what is impossible for a tiny particle of soil taken from the shoe of one of the many millions of pupils who attend school in America is nevertheless possible when the entity being “measured” is the combined mathematical attainment of a continent’s schoolchildren.

The problem with the new GCSE grade 5: a detailed critique

The OECD has now taken the bold step of analysing measures of “happiness,” “well-being” and “anxiety” for individual countries (see, for example, ‘New Pisa happiness table,’ Times Educational Supplement 19.04.2017).  In these tables “life satisfaction,” for example, is measured to two-decimal-place accuracy.  This raises the question, “Can complex constructs such as happiness or anxiety really be represented by a number such as 7.26?”  For two giants of 20th century thought – the philosopher Ludwig Wittgenstein and the father of quantum physics, Niels Bohr – the answer to this question is an unequivocal “no.”  The fundamental flaws in Pisa’s approach to measuring happiness will serve to illustrate the folly of linking a particular GCSE grade to Pisa methodology.

Once again, surely common sense itself dictates that constructs such as happiness, anxiety and well-being cannot be captured in a single number?  In his book Three Seductive Ideas, the Harvard psychologist Jerome Kagan draws on the writings of Bohr and Wittgenstein to argue that measures of constructs such as happiness cannot be attributed to individuals and cannot be represented as numbers because such measures are context-dependent.  He writes: “The first premise is that the unit of analysis … must be a person in a context, rather than an isolated characteristic of that person.”  Wittgenstein and Bohr (independently) arrived at the conclusion that what is measured cannot be separated from the measurement context.  It follows that when an individual’s happiness is being measured, a complete description of the measuring tool must appear in the measurement statement because the measuring tool helps define what the measurer means by the word happiness.

Kagan rejects the practice of reporting the measurement of complex psychological constructs using numbers: “The contrasting view, held by Whitehead [co-author of the Principia Mathematica] and Wittgenstein, insists that every description should refer to … the circumstances of the observation.”  The reason for including a description of the measuring instrument isn’t difficult to see.  Kagan points out that “Most investigators who study ‘anxiety’ or ‘fear’ use answers on a standard questionnaire or responses to an interview to decide which of their subjects are anxious or fearful.  A smaller number of scientists ask close friends or relatives of each subject to evaluate how anxious the person is.  A still smaller group measures the heart rate, blood pressure, galvanic skin response, or salivary level of subjects.”  Alas, all these methodologies yield very different “measures” of the anxiety or fear of the subject.

Kagan therefore argues that a change in the measuring tool means a change in the reported measurement; one must include a description of the measuring instrument in order to “communicate unambiguously,” as Bohr expressed it.  One can never simply write “happiness = 4.29” (as in Pisa tables) because there is no such thing as a context-independent measure of happiness.  We have no idea what happiness is as a thing-in-itself.  Kagan notes the implications for psychologists of the measurement principles set out by Niels Bohr: “Modern physicists appreciate that light can behave as a wave or a particle depending on the method of measurement.  But some contemporary psychologists write as if that maxim did not apply to consciousness, intelligence, or fear.”

According to Bohr, when one reports psychological measurements, the requirement to describe the measurement situation means that ordinary language must replace numbers.  This invalidates the entire Pisa project.  Werner Heisenberg summarised his mentor’s teachings: “If we want to say anything at all about nature – and what else does science try to do? – we must pass from mathematical to everyday language.”  The consequences of accepting this counsel are clear: one cannot rank-order descriptions.

(To simplify matters somewhat, while numbers function perfectly well when observing the motion of a tennis ball or a star, the psychologist cannot directly observe the pupil’s happiness.  Bohr argued that there is “a deep-going analogy” between measurement in quantum physics and measurement in psychology because both are concerned with measuring constructs which transcend the limits of ordinary experience.  According to Bohr, because the physicist, like the psychologist (in respect of attempts to measure happiness), cannot observe electrons and photons directly, “physics concerns what we can say about nature,” and numbers, therefore, must give way to ordinary language.)

The arguments advanced above apply, without modification, to Pisa’s core activity of measuring pupil ability.  A simple thought experiment (first reported in the Times Educational Supplement of 26.07.2013) makes this clear.  Suppose that a pupil is awarded a perfect score in a GCSE mathematics examination.  It seems sensible to conjecture that if Einstein were alive, he too would secure a perfect score on this mathematics paper.  Given the title on the front page of the examination paper, one has the clear sense that the examination measures ability in mathematics.  Is one therefore justified in saying that Einstein and the pupil have the same mathematical ability?

This paradoxical outcome results from the erroneous treatment of mathematical ability as something entirely divorced from the questions which make up the examination paper (the measurement context).  It is clear that the pupil’s mathematical achievements are dwarfed by Einstein’s; to ascribe equal ability to Einstein and the pupil is to communicate ambiguously.  To avoid the paradox one simply has to detail the measurement circumstances in any report of attainment and say: “Einstein and the pupil have the same mathematical ability relative to this particular GCSE mathematics paper.”  By including a description of the measuring instrument one is, in effect, making clear the restrictive meaning which attaches to the word “mathematics” as it is being used here; school mathematics omits whole areas of the discipline familiar to Einstein such as non-Euclidean geometry, tensor analysis, vector field theory, Newtonian mechanics, and so on.  As with the measurement of happiness, when one factors in a description of the measuring instrument, the paradox dissolves away.

Pace Pisa, ability is not an intrinsic property of the person.  Rather, it is a joint property of the person and the measuring tool.  Ability is the property of an interaction.  Alas for Pisa, the move from numbers to language also dissolves away that organisation’s much-lauded rank orders.  Little wonder that Wittgenstein described the style of reasoning which underpins the statistical model (Item Response Theory) at the heart of the Pisa rankings as “a general disease of thinking.”  For the first time, the many profound conceptual difficulties of the Pisa league tables become difficulties for a grade on the GCSE grade scale.  Why would anyone agree to predicate a perfectly respectable grade scale on a ranking system with such profound shortcomings?

An article published in 2016 in the USA’s Proceedings of the National Academy of Sciences by Van Bavel, Mende-Siedlecki, Brady and Reinero serves to emphasise how isolated Pisa thinking is, even within psychology: “Indeed, the insight that behaviour is a function of both the person and the environment – elegantly captured by Lewin’s equation: B = f(P, E) – has shaped the direction of social psychological research for more than half a century.  During that time, psychologists and other social scientists have paid considerable attention to the influence of context on the individual and have found extensive evidence that contextual factors alter human behaviour.”

If ability is a joint property of the person and the context in which that ability is manifest, then unambiguous communication demands that a description of the context must be integral to any attempt to represent an individual’s ability.  Mainstream psychology rejects the notion that one can ignore context and treat behaviour as wholly analysable in terms of traits and inner processes.  Indeed, psychology itself has a name for the error which afflicts the Pisa ranking model.  Gerd Gigerenzer of the Max Planck Institute writes: “The tendency to explain behaviour internally without analysing the environment is known as the ‘fundamental attribution error.’”

Three thinkers who stand out among those who argue that ability measures cannot be separated from the context in which they are manifest are the Nobel laureate Herbert A. Simon and two of the 20th century’s greatest intellectuals: the father of quantum theory, Niels Bohr, and the philosopher Ludwig Wittgenstein.  First, Herbert Simon uses a scissors metaphor to indicate the degree to which an attribute like ability cannot be disentangled from the context in which it is manifest: “Human rational behaviour is shaped by a scissors whose blades are the structure of the environment and the computational capabilities of the actor.”  Pursuing questions such as “which blade of the scissors cuts the cloth?” will do little to advance an explanation of how scissors cut; there seems to be little value in seeking to understand the whole (the cutting action) in terms of its parts (the unique contribution of each blade).

Secondly, Niels Bohr – in his Discussion with Einstein on Epistemological Problems in Atomic Physics – uses quantum “complementarity” to argue that first-person ascriptions [the contribution of the individual] and third-person ascriptions [the contribution of the environment] of psychological attributes form an “indivisible whole.”  Finally, on page 143 of his Blue and Brown Books, Wittgenstein highlights the error at the heart of the Pisa project: “There is a general disease of thinking which always looks for (and finds) what would be called a mental state from which all our acts spring as from a reservoir.”

Conclusions

The arguments set out above have serious implications for the technical fidelity of the new GCSE grade 5.  The more the general public find out about the modelling which underpins Pisa, the more their faith in the new GCSE grade scale will be undermined.  (For example, Kreiner reveals in the Times Educational Supplement piece that, “Most people don’t know that half of the students taking part in Pisa [2006] do not respond to any reading item at all.  Despite that, Pisa assigns reading scores to these children.”) 

The fact that a switch from numbers to language invalidates entirely the practice of ordering countries according to the efficacy of their education systems has profound implications for the validity of inferences made in respect of the new GCSE grade 5.  Given the assertion that grade 5 is designed to reflect the academic standards of high performing educational jurisdictions, as identified by their Pisa ranks, what possible justification can be offered for assigning a privileged role to the GCSE grade 5 in school performance tables?

To date, the profound conceptual difficulties which attend Pisa ranks have not impacted directly on the life chances of particular children in this country.  This would change if individual pupils failing to reach the grade 5 standard were construed as having fallen short of international standards (whatever that means).  If one accepts the reasoning of Simon, Wittgenstein and Bohr, grade 5 can represent nothing more than a standard somewhere between grade 4 and grade 6.  Any attempt to accord it special status, thereby giving it a central role in the EBacc and/or performance tables, for example, risks exposing the new GCSE grading scale to ridicule.

Why there is little cause to be happy with the new GCSE grade 5

Friday 2 Jun 2017

Posted by paceni in Grammar Schools

Tags

Albert Einstein, Alfred North Whitehead, Happiness, Item Response Theory, Jerome Kagan, Ludwig Wittgenstein, New GCSE Grade 5, Niels Bohr, OECD Pisa, Principia Mathematica, TES, Times Educational Supplement, William Stewart

The OECD’s Programme for International Student Assessment (Pisa) has now taken the bold step of analysing measures of “happiness,” “well-being” and “anxiety” for individual countries (see New Pisa happiness table, TES 19.04.2017 https://www.tes.com/news/school-news/breaking-news/new-pisa-happiness-table-see-where-uk-pupils-rank).

The claim is made that “life satisfaction,” for example, can be measured to two-decimal-place accuracy.  This raises the question, “Can complex constructs such as happiness or anxiety really be represented as a number like 7.26?”  For two giants of 20th century thought – the philosopher Ludwig Wittgenstein and the father of quantum physics, Niels Bohr – the answer to this question is an unequivocal “no.”

 

Surely common sense itself dictates that constructs such as happiness, anxiety and well-being cannot be captured in a single number?  In his book Three Seductive Ideas, the Harvard psychologist Jerome Kagan draws on the writings of Bohr and Wittgenstein to argue that measures of constructs such as happiness cannot be represented as numbers.  He writes: “The first premise is that the unit of analysis … must be a person in a context, rather than an isolated characteristic of that person.”  Wittgenstein and Bohr (independently) arrived at the conclusion that what is measured cannot be separated from the measurement context.  It follows that when an individual’s happiness is being measured, a description of the questions on the Pisa questionnaire must appear in the measurement statement because these questions help define what the measurer means by the word happiness.

Kagan rejects the practice of reporting the measurement of complex psychological constructs using numbers: “The contrasting view, held by Whitehead [co-author of the Principia Mathematica] and Wittgenstein, insists that every description should refer to … the circumstances of the observation.”  The reason for including a description of the measuring instrument isn’t difficult to see.  Kagan points out that “Most investigators who study ‘anxiety’ or ‘fear’ use answers on a standard questionnaire or responses to an interview to decide which of their subjects are anxious or fearful.  A smaller number of scientists ask close friends or relatives of each subject to evaluate how anxious the person is.  A still smaller group measures the heart rate, blood pressure, galvanic skin response, or salivary level of subjects.  Unfortunately, these three sources of information rarely agree.”

 

Given that a change in the measuring tool means a change in the reported measurement, one must include a description of the measuring instrument in order to “communicate unambiguously,” as Bohr expressed it.  One can never simply write “happiness = 4.29” (as in Pisa tables) because there is no such thing as an instrument-independent measure of happiness.  We have no idea what happiness is as a thing-in-itself.  Kagan notes the implications for psychologists of the measurement principles set out by Niels Bohr: “Modern physicists appreciate that light can behave as a wave or a particle depending on the method of measurement.  But some contemporary psychologists write as if that maxim did not apply to consciousness, intelligence, or fear.”  According to Bohr, when one reports psychological measurements, the requirement to describe the measurement situation means that ordinary language must replace numbers.  Werner Heisenberg summarised his mentor’s teachings: “If we want to say anything at all about nature – and what else does science try to do? – we must pass from mathematical to everyday language.”

 

(To simplify matters somewhat, while numbers function perfectly well when observing the motion of a tennis ball or a star, the psychologist cannot directly observe the pupil’s happiness.  Bohr argued that there was “a deep-going analogy” between measurement in quantum physics and measurement in psychology because both were concerned with measuring constructs which transcend the limits of ordinary experience.  According to Bohr, because the physicist, like the psychologist (in respect of attempts to measure happiness), cannot directly experience electrons and photons, “physics concerns what we can say about nature,” and numbers must therefore give way to ordinary language.)

 

The arguments advanced above apply, without modification, to Pisa’s core activity of measuring pupil ability.  A simple thought experiment (first reported in the TES of 26.07.2013) makes this clear.  Suppose that a pupil is awarded a perfect score in a GCSE mathematics examination.  It seems sensible to conjecture that if Einstein were alive, he too would secure a perfect score on this mathematics paper.  Given the title on the front page of the examination paper, one has the clear sense that the examination measures ability in mathematics.  Is one therefore justified in saying that Einstein and the pupil have the same mathematical ability?

 

This paradoxical outcome results from the erroneous treatment of mathematical ability as something entirely divorced from the questions which make up the examination paper.  It is clear that the pupil’s mathematical achievements are dwarfed by Einstein’s; to ascribe equal ability to Einstein and the pupil is to communicate ambiguously.  To avoid the paradox one simply has to detail the measurement circumstances in any report of attainment and say: “Einstein and the pupil have the same mathematical ability relative to this particular GCSE mathematics paper.”  By including a description of the measuring instrument one is, in effect, making clear the restrictive meaning which attaches to the word “mathematics” as it is being used here; school mathematics omits whole areas of the discipline familiar to Einstein such as non-Euclidean geometry, tensor analysis, vector field theory, Newtonian mechanics, and so on.

 

As with the measurement of happiness, when one factors in a description of the measuring instrument, the paradox dissolves away.  Alas for Pisa, the move from numbers to language also dissolves away that organisation’s much-lauded rank orders.  Little wonder that Wittgenstein described the reasoning which underpins the statistical model (Item Response Theory) at the heart of the Pisa rankings as “a disease of thought.”

 

This brings us to the very serious implications of the arguments set out above for the new GCSE grade 5.  The fact that a switch from numbers to language invalidates entirely the practice of ordering countries according to the efficacy of their education systems has profound implications for the validity of claims made concerning the new GCSE grade 5.  Given the assertion that grade 5 reflects the academic standards of high-performing international jurisdictions as identified by their Pisa ranks, what possible justification can be offered for assigning a privileged role to the GCSE grade 5 in school performance tables?

 

To date, Pisa rankings have not impacted directly on the life chances of particular children in this country.  This would change if individual pupils failing to reach the grade 5 standard were construed as having fallen short of international standards (whatever that means).  If one accepts the reasoning of Wittgenstein and Bohr, grade 5 can represent nothing more than a standard somewhere between grade 4 and grade 6.  Any attempt to accord it special status, thereby giving it a central role in the EBacc and/or performance tables, risks exposing the new GCSE grading scale to ridicule.

Dr Hugh Morrison, The Queen’s University of Belfast (retired)

 

 

 


Why there is little cause to be happy with the new GCSE grade 5

19 Friday May 2017

Posted by paceni in Grammar Schools

Tags

Albert Einstein, Alfred North Whitehead, Happiness, Item Response Theory, Jerome Kagan, ludwig Wittgenstein, New GCSE Grade 5, Niels Bohr, OECD Pisa, Principia Mathematica, TES, Times Educational Supplement, William Stewart

The OECD’s Programme for International Student Assessment (Pisa) has now taken the bold step of analysing measures of “happiness,” “well-being” and “anxiety” for individual countries (see New Pisa happiness table, TES 19.04.2017 https://www.tes.com/news/school-news/breaking-news/new-pisa-happiness-table-see-where-uk-pupils-rank).

The claim is made that “life satisfaction,” for example, can be measured to two-decimal-place accuracy.  This raises the question, “Can complex constructs such as happiness or anxiety really be represented as a number like 7.26?”  For two giants of 20th century thought, the philosopher Ludwig Wittgenstein and Niels Bohr, one of the founding fathers of quantum physics, the answer to this question is an unequivocal “no.”

 

Surely common sense itself dictates that constructs such as happiness, anxiety and well-being cannot be captured in a single number?  In his book Three Seductive Ideas, the Harvard psychologist Jerome Kagan draws on the writings of Bohr and Wittgenstein to argue that measures of constructs such as happiness cannot be represented as numbers.  He writes: “The first premise is that the unit of analysis … must be a person in a context, rather than an isolated characteristic of that person.”  Wittgenstein and Bohr (independently) arrived at the conclusion that what is measured cannot be separated from the measurement context.  It follows that when an individual’s happiness is being measured, a description of the questions on the Pisa questionnaire must appear in the measurement statement because these questions help define what the measurer means by the word happiness.

Kagan rejects the practice of reporting the measurement of complex psychological constructs using numbers: “The contrasting view, held by Whitehead [co-author of the Principia Mathematica] and Wittgenstein, insists that every description should refer to … the circumstances of the observation.”  The reason for including a description of the measuring instrument isn’t difficult to see.  Kagan points out that “Most investigators who study ‘anxiety’ or ‘fear’ use answers on a standard questionnaire or responses to an interview to decide which of their subjects are anxious or fearful.  A smaller number of scientists ask close friends or relatives of each subject to evaluate how anxious the person is.  A still smaller group measures the heart rate, blood pressure, galvanic skin response, or salivary level of subjects.  Unfortunately, these three sources of information rarely agree.”

 

Given that a change in the measuring tool means a change in the reported measurement, one must include a description of the measuring instrument in order to “communicate unambiguously,” as Bohr expressed it.  One can never simply write “happiness = 4.29” (as in Pisa tables) because there is no such thing as an instrument-independent measure of happiness.  We have no idea what happiness is as a thing-in-itself.  Kagan notes the implications for psychologists of the measurement principles set out by Niels Bohr: “Modern physicists appreciate that light can behave as a wave or a particle depending on the method of measurement.  But some contemporary psychologists write as if that maxim did not apply to consciousness, intelligence, or fear.”  According to Bohr, when one reports psychological measurements, the requirement to describe the measurement situation means that ordinary language must replace numbers.  Werner Heisenberg summarised his mentor’s teachings: “If we want to say anything at all about nature – and what else does science try to do – we must pass from mathematical to everyday language.”

 

(To simplify matters somewhat, while numbers function perfectly well when observing the motion of a tennis ball or a star, the psychologist cannot observe directly the pupil’s happiness.  Bohr argued that there was “a deep-going analogy” between measurement in quantum physics and measurement in psychology because both were concerned with measuring constructs which transcend the limits of ordinary experience.  According to Bohr, because the physicist, like the psychologist (in respect of attempts to measure happiness), cannot directly experience electrons and photons, “physics concerns what we can say about nature,” and numbers must therefore give way to ordinary language.)

 

The arguments advanced above apply, without modification, to Pisa’s core activity of measuring pupil ability.  A simple thought experiment (first reported in the TES of 26.07.2013) makes this clear.  Suppose that a pupil is awarded a perfect score in a GCSE mathematics examination.  It seems sensible to conjecture that if Einstein were alive, he too would secure a perfect score on this mathematics paper.  Given the title on the front page of the examination paper, one has the clear sense that the examination measures ability in mathematics.  Is one therefore justified in saying that Einstein and the pupil have the same mathematical ability?

 

This paradoxical outcome results from the erroneous treatment of mathematical ability as something entirely divorced from the questions which make up the examination paper.  It is clear that the pupil’s mathematical achievements are dwarfed by Einstein’s; to ascribe equal ability to Einstein and the pupil is to communicate ambiguously.  To avoid the paradox one simply has to detail the measurement circumstances in any report of attainment and say: “Einstein and the pupil have the same mathematical ability relative to this particular GCSE mathematics paper.”  By including a description of the measuring instrument one is, in effect, making clear the restrictive meaning which attaches to the word “mathematics” as it is being used here; school mathematics omits whole areas of the discipline familiar to Einstein such as non-Euclidean geometry, tensor analysis, vector field theory, Newtonian mechanics, and so on.

 

As with the measurement of happiness, when one factors in a description of the measuring instrument, the paradox dissolves away.  Alas for Pisa, the move from numbers to language also dissolves away that organisation’s much-lauded rank orders.  Little wonder that Wittgenstein described the reasoning which underpins the statistical model (Item Response Theory) at the heart of the Pisa rankings as “a disease of thought.”

 

This brings us to the very serious implications of the arguments set out above for the new GCSE grade 5.  The fact that a switch from numbers to language invalidates entirely the practice of ordering countries according to the efficacy of their education systems has profound implications for the validity of claims made concerning the new GCSE grade 5.  Given the assertion that grade 5 reflects the academic standards of high-performing international jurisdictions as identified by their Pisa ranks, what possible justification can be offered for assigning a privileged role to the GCSE grade 5 in school performance tables?

 

To date, Pisa rankings have not impacted directly on the life chances of particular children in this country.  This would change if individual pupils failing to reach the grade 5 standard were construed as having fallen short of international standards (whatever that means).  If one accepts the reasoning of Wittgenstein and Bohr, grade 5 can represent nothing more than a standard somewhere between grade 4 and grade 6.  Any attempt to accord it special status, thereby giving it a central role in the EBacc and/or performance tables, risks exposing the new GCSE grading scale to ridicule.

Dr Hugh Morrison, The Queen’s University of Belfast (retired)

 

 

 


Peter Tymms misunderstands the nature of measurement in psychology and education

22 Wednesday Feb 2017

Posted by paceni in Grammar Schools

Tags

academic selection at 11, AQE Transfer Test, CEM Durham University, Democratic Unionist Party, Dr Hugh Morrison, GL Assessment tests, Item Response Theory, Joel Michell, Northern Ireland Education Minister, pathological science, Peter Weir MLA, Professor Peter Tymms, psychometrics, Rasch Model

Why Peter Tymms’ grasp of the fundamentals of measurement in psychology/education disqualifies him from any role in determining the future of transfer testing in Northern Ireland.

Dr Hugh Morrison

The case that Professor Tymms misunderstands the nature of measurement in psychology and education

Professor Peter Tymms is a long-time proponent of the central role that latent variables play in modern psychometrics.  The Item Response model advanced by Georg Rasch has an important place in his research.  I will argue in what follows that those who advance Item Response Theory approaches in general, and Rasch modelling in particular, have failed to understand the true complexity of the central predicate “ability.”  Wittgenstein stressed that ability is something potential, a capacity rather than a state.  Individuals are carriers of potentiality and not states.  Psychology is concerned with interactions and not with the intrinsic properties of the relata involved; relations have definite properties while the relata themselves are indefinite.  Peter Tymms has failed to appreciate the indeterminacy of the mental.

Measurement in psychology/education is never a process of “checking up” on what is already in the mind/brain of the individual.  Rather, unlike measurement in Newtonian physics, the act of measuring transforms a potentiality to a definite state.  Measurement in psychology and education should not be concerned with what ability is, but must settle for what can be said about ability.  Michell is right to claim that psychometrics is “pathological science,” and that measurement in psychology is “at best speculation and, at worst, a pretence at science.”  Trendler’s (2011) claim that measurement theorists should abandon all attempts to repair psychometrics is surely justified.  All proponents of Item Response Theory, including Professor Tymms, subscribe to the “reflective model” in which variation in the latent variables is viewed as prior to variation in the manifest variables.  Alas, the reverse is true.

Borsboom, Mellenbergh & van Heerden (2003, p. 217), writing in one of psychology’s most respected journals, highlight the incoherence of this entire approach to measurement: “It will be felt that there are certain tensions in this article.  We have not tried to cover these up, because we think they are indicative of some fundamental problems in psychological measurement and require clear articulation. … And although the boxes, circles, and arrows in the graphical representation of the model suggest that the model is dynamic and applies to the individual, on closer scrutiny no such dynamics are to be found.  Indeed, this has been pinpointed as one of the major problems of mathematical psychology by Luce (1997): Our theories are formulated in a within-subjects sense, but the models we apply are often based solely on between-subjects comparisons.”

Item Response Theory omits entirely the human practices (reading, arithmetic, and so on) into which the child is enculturated by teachers and parents.  This is the all-important “environment” in which the child participates, an environment which Item Response Theory is powerless to represent.  Item Response Theorists posit “abilities” hidden in the mind/brain which are the source of the child’s test responses.  However, the Nobel laureate Herbert Simon dismissed such reasoning: “Human rational behaviour is shaped by a scissors whose blades are the structure of task environments and the computational capabilities of the actor.”  The scissors metaphor echoes Alfred Marshall’s remark that one might as reasonably dispute whether the upper or the under blade of a pair of scissors cuts a piece of paper as dispute whether value is governed by utility or by cost of production: both blades are needed.  The lesson for psychometrics is that omitting the environment of academic practices in which the child participates will produce nonsense.

One can find the source of Item Response Theory’s difficulty in Niels Bohr’s 1949 paper entitled Discussion with Einstein on Epistemological Problems in Atomic Physics.  Few scientists have made a greater contribution to the study of measurement than the Nobel laureate and founding father of quantum theory, Niels Bohr.  Given Bohr’s preoccupation with what the scientist can say about aspects of reality which are not visible (electrons, photons, and so on), one can understand his constant references to measurement in psychology; “ability” cannot be seen directly.  Rather, like the microentities that manifest as tracks in particle accelerators, ability manifests in the individual’s responses to test items.  Assessment is concerned with “measuring” something which the measurer cannot experience directly, namely, the ability of the examinee.

Quantum theory and psychology have not shown the same willingness to acknowledge the limitations of measurement in their respective disciplines.  While physics has made no attempt to disguise its “measurement problem” (it is acknowledged in every undergraduate textbook), Michell (1997) has accused psychometricians of simply closing down all debate on measurement and suffering from a “methodological thought disorder.”  Michell’s concerns about the reluctance of psychometricians to engage in debate about the fundamentals of measurement, when set alongside the clear acknowledgement of a measurement problem in physics, bring to mind the words of the French moralist Joseph Joubert: “It is better to debate a question without settling it than to settle it without debating it.”

Item Response Theory relies on a simple inner/outer picture for its models to function.  The inner (a realm of timeless, unobserved latent variables, or abilities) is treated as independent of the outer (here examinees write or speak responses at moments in time).  This is often referred to as a “reservoir” model in which timeless (hidden) abilities are treated as the source of the individual’s (public) responses given at specific moments in time.

As early as 1929 Bohr rejected this simplistic thinking in strikingly general terms: “Strictly speaking, the conscious analysis of any concept stands in a relation of exclusion to its immediate application.  The necessity of taking recourse to a complementary … mode of description is perhaps most familiar to us from psychological problems.”  Now what did Bohr mean by these words?  Consider, for example, the concept “quadratic.”  It is tempting to adopt a reservoir approach and trace a pupil’s ability to apply that concept in accord with established mathematical practice to his or her having the formula in mind.  The guidance offered by the formula in mind (Bohr’s reference to “conscious analysis”) accounts for the successful “application,” for example, to the solution of specific items on an algebra test.

However, this temptingly simplistic model in which the formula is in the unobserved mental realm and written or spoken applications of the concept “quadratic” take place in the observed public realm, contains a fundamental flaw; the two realms cannot be meaningfully linked up.  The “inner” formula (in one realm) gets its guidance properties from human practices (in the other realm).  A formula as a thing-in-itself cannot guide; one has to be trained in the established practice of using the formula before it has guidance properties.  In school mathematics examinations around the world, pupils are routinely issued with a page of formulae relevant to the examination.  Alas, it is the experience of mathematics teachers everywhere that simply having access to the formula as a thing-in-itself offers little or no guidance to the inadequately trained pupil.  The formula located in one realm cannot connect with the applications in the other.

Wittgenstein teaches that no formula, rule, principle, etc. in itself can ever determine a course of action.  The timeless mathematical formula in isolation cannot generate all the complexities of a practice (something which evolves in time); rather, as Michael Oakeshott puts it, a formula is a mere “abridgement” of the practice – the practice is primary, with the formula, rule, precept etc. deriving its “life” from the practice.

Returning to Bohr’s writing, it is instructive to explain his use of the word “complementarity” in respect of psychology and to interpret the meaning of the words: “stands in a relation of exclusion.”  Complementarity is the most important concept Bohr bequeathed to physics.  It involves a combination of two mutually exclusive facets.  In order to see its relevance to the validity of IRT modelling, let’s return to the two distinct realms.

We think of the answers to a quadratic equation (of course, a typical school-level quadratic equation has two distinct answers) as being right or wrong.  In the realm of application this is indeed the case; when the examinee is measured, his or her response is pronounced right or wrong dependent upon its relation to established mathematical practice.  However, in the unobserved realm, populated by rules, formulae and precepts (as things-in-themselves), any answer to a quadratic equation is simultaneously right and wrong!

A formula as a thing-in-itself cannot separate what accords with it from what conflicts with it, because there will always exist an interpretation of the formula for which a particular answer is correct, and another interpretation for which the same answer can be shown to conflict with the formula.  Divorced from human practices, the distinction between right and wrong collapses.  (This is a direct consequence of Wittgenstein’s celebrated “private language” argument.)  This explains Bohr’s reference to a “relation of exclusion.”  In simplistic terms, in the unobserved realm, where answers are compared with the formula for solving quadratics, responses are right-and-wrong, while in the observed realm, where answers are compared with the established practice, responses are right-or-wrong.

On this reading, ability has two mutually exclusive facets which cannot meaningfully be separated.  The distinguished Wittgenstein scholar, Peter Hacker (1997, p. 250), captures this situation as follows: “grasping an explanation of meaning and knowing how to use the word explained are not two independent abilities but two facets of one and the same ability.”  Ability, construed according to Bohr’s complementarity, is indefinite when unobserved and definite when observed.  Moreover, this definite measure is not an intrinsic property of the examinee, but a property of the examinee’s interaction with the measuring tool.  According to complementarity, the “inner” and the “outer” are not two separate localities which somehow connect.  As Herbert A. Simon realised, one cannot dispense with either of Hacker’s facets and hope to construe ability correctly; both are vital to the predicate “ability.”  Whitaker’s (1996, p. 184) definition of complementarity captures this situation: “mutual exclusiveness and joint completion.”

Measurement of ability is not a matter of passively checking up on what already exists – a central tenet of Item Response Theory.  Bohr teaches that the measurer effects a radical change from indefinite to definite.  Pace Item Response Theory, measurers, in effect, participate in what is measured.  No item response model can accommodate the “jump” from indefinite to definite occasioned by the measurement process.  All IRT models mistakenly treat unmeasured ability as identical to measured ability.  What scientific evidence could possibly be adduced in support of that claim?  No Item Response model can represent ability’s two facets because all such models report ability as a single real number, construed as an intrinsic property of the measured individual.
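The point that an item response model reports ability as a single real number can be made concrete with a minimal sketch of the one-parameter (Rasch) model mentioned earlier.  The sketch below assumes the item difficulties are already known; the difficulty values and response patterns are invented purely for illustration, and this is not the estimation procedure used by Pisa or by any examination board.  Under this model the probability of a correct answer depends only on the difference between ability θ and item difficulty b, and the maximum-likelihood estimate of θ depends on a pupil’s responses only through the raw score.

```python
import math
from typing import List, Optional


def rasch_prob(theta: float, b: float) -> float:
    """Rasch model: probability that an examinee of ability theta answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses: List[int], difficulties: List[float],
                     max_iter: int = 100, tol: float = 1e-10) -> Optional[float]:
    """Maximum-likelihood estimate of theta by Newton-Raphson, item difficulties treated as known.

    Returns None when every answer is right (or every answer is wrong): the
    likelihood then keeps increasing as theta moves towards +/- infinity,
    so no finite estimate exists.
    """
    score = sum(responses)  # the responses enter the estimate only through this total
    if score == 0 or score == len(responses):
        return None
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = score - sum(probs)                    # first derivative of the log-likelihood
        information = sum(p * (1.0 - p) for p in probs)  # minus its second derivative
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


# Invented difficulties for a hypothetical four-item paper.
DIFFICULTIES = [-1.0, 0.0, 0.5, 1.5]

pupil_a = estimate_ability([1, 1, 0, 0], DIFFICULTIES)  # the two easier items correct
pupil_b = estimate_ability([0, 0, 1, 1], DIFFICULTIES)  # the two harder items correct
perfect = estimate_ability([1, 1, 1, 1], DIFFICULTIES)  # every item correct

print(pupil_a, pupil_b, perfect)
```

Two pupils who answer different items correctly, but obtain the same raw score on the same items, receive the same single number θ, and a perfect score yields no finite estimate at all: whatever the model reports is tied to this particular set of items.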

Finally, in order to highlight the incoherence of the type of measurement model advocated by Peter Tymms, it is instructive to consider a thought experiment in which a primary school child responds to the addition problem: “68 + 57 = ?”  In the appendix below the erroneous thinking of the psychometrician is adopted in that the child’s ability is considered to be a mental state which is the source of his or her response.  It is demonstrated that all of the facts about the child (his or her complete history of responses to addition problems and complete information about the contents of his or her mind) are in keeping with the answer “68 + 57 = 125.”  Unfortunately, all of the facts are also in keeping with the answer “68 + 57 = p” where p is ANY number; someone with complete information has to conclude that the child is right and wrong at the same time.

Appendix

Setting the Scene

Consider the simplest of measurement situations encountered in psychology and education.  How can one establish if a student in the early years of formal schooling understands how to use the “+” sign?  The student whose grasp of the “+” sign is being measured has been taught to add, but has not yet encountered the problem “68 + 57 = ?” This problem has been selected at random but the argument generalises to any rule-governed activity (Kripke, 1982).  The measurement situation is broken into two phases: the phase immediately before the student answers and the phase during which the answer is spoken or written.

This offers two perspectives on the student’s understanding of the “+” sign.  The idea that understanding how to use the “+” sign is an “inner” mental state, activity or process has enormous appeal.  The temptation to reason that the student’s first-person perspective on his or her understanding is superior to the measurer’s third-person perspective is almost irresistible, since the measurer has to settle for mere behaviour.  It is difficult to escape the impression that the student has privileged access to his or her grasp of the meaning of “+,” because to mean is surely to have something in mind? (Putnam, 1988).  On the other hand, the measurer must settle for the mere manifestations of that understanding.  The student seems to have first-person direct access to the thing-in-itself, namely, his or her understanding of “+” while the third-person perspective is associated with indirect access.  The third-person perspective involves observation of the student exercising his or her understanding rather than the understanding itself; it would seem that the student alone can “observe” understanding because it is a mental process.  This enticingly simple Cartesian picture of the “inner world” is clearly compelling.

This idealised measurement situation will be used to argue that, before the student answers the problem “68 + 57 = ?” (immediately prior to measurement) the totality of facts about the student is in keeping with the student intending to give the right answer and with the intention to give one of an indefinite number of wrong answers.  It is established that the state “understands the ‘+’ sign” and the state “doesn’t understand the ‘+’ sign” both can be simultaneously ascribed to the unmeasured student.  In short, it is meaningless to assign a definite grasp of the “+” sign to an unmeasured individual.

In quantum theory, unmeasured quantum entities are characterised by superpositions which are portrayed as being “here” and “there” simultaneously.  For unmeasured quantum entities, the notion of a definite location is unintelligible.  However, when the quantum entity is measured it assumes a definite position and is characterised as either here or there.  Consider the measurement of an individual’s ability to respond to the addition problem: “68 + 57 = ?”  The case will be made that when the psychologist focuses on the unmeasured ability of the individual, all the facts about that individual can be shown to be in keeping with the individual being both “right” and “wrong” (with respect to the addition problem) at the same time.  It will be demonstrated that, immediately prior to the act of measurement, an individual’s understanding of the “+” sign is entirely indeterminate, with the categories “right” and “wrong” being applicable simultaneously.

It will be argued that someone with complete information about the student’s past achievements in addition, together with complete information about his or her mental states, would, in infinitely many cases, find it impossible to use this information to predict the student’s answer to a simple addition problem.  It will be demonstrated below that before a measurement is made (for example, before a student says or writes the answer to the question “68 + 57 = ?”) all of the known facts about the student are in keeping with the correct answer “125” and with an incorrect answer such as “5.”  A rule as a thing-in-itself can never determine an action.  The student’s mathematical ability with respect to the question “68 + 57 = ?” is completely indeterminate prior to the statement of the answer.
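The answer “5” matches the deviant function in Kripke’s (1982) well-known “quus” example, in which x quus y is defined as x + y when both numbers are below 57 and as 5 otherwise.  The Python sketch below assumes, hypothetically, that every sum the student has ever worked involved numbers below 57 and was answered correctly.  Under that assumption ordinary addition and a whole family of quus-like rules (one for each possible answer p) fit the recorded history equally well, yet disagree about 68 + 57.  The names HISTORY, plus, make_quus and fits_history are illustrative only.

```python
from typing import Callable, Dict, Tuple

Rule = Callable[[int, int], int]

# Hypothetical record of every sum the student has answered so far, with the answer given.
# Following Kripke's set-up, all operands are assumed to be below 57.
HISTORY: Dict[Tuple[int, int], int] = {
    (x, y): x + y for x in range(57) for y in range(57)
}


def plus(x: int, y: int) -> int:
    """Ordinary addition."""
    return x + y


def make_quus(p: int) -> Rule:
    """A quus-like rule: addition when both operands are below 57, otherwise the answer p."""
    def quus(x: int, y: int) -> int:
        if x < 57 and y < 57:
            return x + y
        return p
    return quus


def fits_history(rule: Rule) -> bool:
    """True if the rule reproduces every answer in the recorded history."""
    return all(rule(x, y) == answer for (x, y), answer in HISTORY.items())


kripke_quus = make_quus(5)
print(fits_history(plus), fits_history(kripke_quus))  # both rules fit the whole record
print(plus(68, 57), kripke_quus(68, 57))              # but they diverge on 68 + 57
```

Because make_quus accepts any p, the finite record is equally compatible with “68 + 57 = p” for every choice of p.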

At the moment prior to answering the question “68 + 57 = ?” all of the facts about the student accord with a correct response and an infinity of incorrect responses.  Before the act of measurement, if one restricts oneself to the totality of facts about the individual (inner and outer), then the notion of accord or conflict with a mathematical rule breaks down entirely.  Clearly, in conditions where responses can be deemed simultaneously right and wrong, the very notion of correctness has become unintelligible.  While the student is characterised as both right and wrong with respect to the question “68 + 57 = ?” immediately before he or she responds, at the instant of responding the student is deemed correct if the answer is 125, and incorrect if the student answers 5.  Immediately before answering the student is right and wrong.  The moment the answer is articulated the student is right or wrong.  In short, measurement isn’t a matter of checking up on an existing attribute (as in Newtonian physics); measurement effects radical change.

Having set the scene for what is to come, the case will now be made that it is meaningless to ascribe a definite ability to an unmeasured individual; the ascription of a definite ability is only meaningful in a measurement context.  The idea that psychological measurement owes more to quantum measurement principles than to Newtonian mechanics depends on this case being made.  This is achieved by calling on Wittgenstein’s later philosophy and the remainder of this section is given over entirely to this single task.  Wittgenstein’s writings on first-person ascription of ability are essential to developing a measurement model with the individual at its core.

No Rule or Formula can Determine its own Continuation

An idea that Anscombe (1985) traces back to Leibniz (1646-1716) is instructive for preparing the reader for Wittgenstein’s insights into the role rule-following plays in thinking about psychological measurement.  Leibniz noticed that no formula or rule can fix its own continuation: any number can be regarded as the correct continuation of a rule on some interpretation.  He pointed out that an indefinite number of rules are consistent with any finite segment of a series.  Anscombe illustrates Leibniz’s thinking using the extension of a simple series such as ‘2, 4, 6, 8, …’

[A]lthough an intelligence tester may suppose that there is only one possible continuation to the sequence 2, 4, 6, 8, … , mathematical and philosophical sophisticates know that an indefinite number of rules (even rules stated in terms of mathematical functions as conventional as polynomials) are compatible with any such finite initial segment.  So, if the tester urges me to respond, after, 2, 4, 6, 8, … , with the unique appropriate next number, the proper response is that no such unique number exists. … The intelligence tester has arbitrarily fixed on one answer as the correct one. (Anscombe, 1985, pp. 342-343)

Consider the series completion problem Anscombe (1985) proposes.  The student is presented with the first four terms of an infinite series: 2, 4, 6, 8 …  He or she is then required to “go on in the same way” by the teacher.  An infinite number of formulations will generate the four numbers 2, 4, 6 and 8 but differ on the fifth term (and all terms thereafter).  For example, the formula:

$$U_n = 2n - \tfrac{1}{24}(n-1)(n-2)(n-3)(n-4)$$

generates 2, 4, 6, 8, 9, …

while

$$U_n = 2n + 45(n-1)(n-2)(n-3)(n-4)$$

generates 2, 4, 6, 8, 1090, …

and finally

$$U_n = 2n - 3(n-1)(n-2)(n-3)(n-4)$$

generates 2, 4, 6, 8, -62, …
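The arithmetic behind these three formulas is easy to check.  The product (n - 1)(n - 2)(n - 3)(n - 4) vanishes for n = 1, 2, 3, 4 and equals 24 at n = 5, so each rule reproduces 2, 4, 6, 8 and the rules part company only from the fifth term onwards.  A short Python check (the function names are illustrative):

```python
def product_term(n: int) -> int:
    """(n - 1)(n - 2)(n - 3)(n - 4): zero for n = 1..4 and 24 at n = 5."""
    return (n - 1) * (n - 2) * (n - 3) * (n - 4)


def rule_a(n: int) -> int:
    # 2n - (1/24)(n-1)(n-2)(n-3)(n-4); a product of four consecutive integers
    # is always divisible by 24, so integer division is exact here.
    return 2 * n - product_term(n) // 24


def rule_b(n: int) -> int:
    # 2n + 45(n-1)(n-2)(n-3)(n-4)
    return 2 * n + 45 * product_term(n)


def rule_c(n: int) -> int:
    # 2n - 3(n-1)(n-2)(n-3)(n-4)
    return 2 * n - 3 * product_term(n)


for rule in (rule_a, rule_b, rule_c):
    print(rule.__name__, [rule(n) for n in range(1, 6)])
```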

 

In summary, an indefinite number of different continuations can be shown to accord with any finite segment of an arithmetical series.  By careful selection, any number can be offered for the fifth term of the series.  One’s immediate reaction to the final series given above is that, in writing -62, the student has made a mistake.  In writing 2, 4, 6, 8 the student is following the correct rule but it appears that in writing -62 he or she has erroneously switched to a new rule.  But it is also possible that the student acted consistently throughout, always applying the same formula, namely,

$$U_n = 2n - 3(n-1)(n-2)(n-3)(n-4)$$

to generate all five terms.  This (albeit highly unusual) student could rightly claim to be “going on in the same way” when he or she wrote down -62 as the fifth term.
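The claim that, by careful selection, any number can be offered as the fifth term follows from the same construction.  Writing U_n = 2n + c(n - 1)(n - 2)(n - 3)(n - 4), the first four terms are 2, 4, 6, 8 whatever the value of c, while U_5 = 10 + 24c; choosing c = (t - 10)/24 therefore makes the fifth term equal to any chosen t.  The three formulas given earlier correspond to c = -1/24, 45 and -3.  A sketch in Python, using exact fractions so that a non-integer c causes no rounding error (rule_with_fifth_term is a hypothetical helper name):

```python
from fractions import Fraction
from typing import Callable, Tuple


def rule_with_fifth_term(t: int) -> Tuple[Fraction, Callable[[int], Fraction]]:
    """Return c and the rule U_n = 2n + c(n-1)(n-2)(n-3)(n-4) whose fifth term is t."""
    c = Fraction(t - 10, 24)  # U_5 = 10 + 24c, so c = (t - 10) / 24

    def u(n: int) -> Fraction:
        return 2 * n + c * (n - 1) * (n - 2) * (n - 3) * (n - 4)

    return c, u


for target in (9, 1090, -62, 125):
    c, u = rule_with_fifth_term(target)
    print(f"c = {c}:", ", ".join(str(u(n)) for n in range(1, 6)))
```

Every integer t yields a different rule, so there are infinitely many rules beginning 2, 4, 6, 8.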

The student’s claim that he or she was simply continuing the rule exhibited by the first four terms is completely defensible since there are an infinite number of rules which begin ‘2, 4, 6, 8’ but diverge on the next term and all terms thereafter.  It can be claimed that the student did continue in the same way but the student’s way was at odds with the teacher’s intention when the teacher instructed the student to “go on in the same way.”  Unfortunately, “Finite behaviour cannot constrain its interpretation to within uniqueness” (Wright, 2001, p. 98), so what makes the student’s continuation wrong and the teacher’s right?

Wittgenstein’s writings on rule-following do not, for a moment, imagine that real children in real classrooms extend this series of four even numbers as “2, 4, 6, 8, 1090, …” or “2, 4, 6, 8, -62, …”.

It is a conspicuous feature of these case-histories that the misunderstandings are often wildly improbable, and we may wonder why this is so.  Evidently, the reason cannot be that Wittgenstein believed that such extreme misunderstandings are at all likely or that a teacher would need to guard against them in real life.  So what is the explanation of his preoccupation with improbable misunderstandings? Wittgenstein’s point is not that such misunderstandings are probable, but only that they are possible.  They are possible because, if the lesson only proceeds by examples, there will always be many different specifications of the meaning of the word that are satisfied by any finite sequence of examples, and so the student can always pick a specification that was not intended by the teacher.  However, if the lesson has been well designed with carefully chosen examples, there will only be one natural way of interpreting them – or perhaps there will be minor variations, to be excluded by further examples.  If, on the other hand, the teacher tries to close the gap by offering a definition of a problematic word, the words used in the definition will present the same problem again. (Pears, 2006, p. 18)

“The idea here is that instructions for following a rule underdetermine the correct way to follow the rule … if we consider instructions and explanations as involving the provision of a finite number of examples then there are indefinitely many compatible functions or ways of going on from those examples” (Panjvani, 2008, p. 307).  Schroeder (2006, p. 189) concludes: “So, any rule, even the most explicit one, can be misunderstood; and in endless ways too: whichever way the student continues the series, his writing can always be regarded as in accordance with the rule – on a suitable interpretation.”

This problem extends beyond mathematics to all rule-following.  Bloor (1997, p. 10) stresses: “This does not just apply to number sequences.  Teaching someone the word ‘red’ is, in a sense, teaching them the rule for using the word.  This too involves moving from a finite number of examples to an open-ended, indefinitely large range of future applications”.  The problems associated with infinite rules also apply to rules with a finite number of applications.  Kripke (1982, p. 7) comments: “Following Wittgenstein, I will develop the problem initially with respect to a mathematical example, though the relevant sceptical paradox applies to all meaningful uses of language.”  Finally, McGinn (1997, p. 77) notes: “Nor is this problem restricted to the mathematical case.  For any word in my language, we can come up with alternative interpretations of what I mean by it that are compatible with both my past usage and any explicit instruction that I might have given myself.”

Every teacher is aware that students do succeed in moving from a finite set of examples to applying the rule in new cases, and so it is tempting to conclude that successful understanding results in the student having in mind a rule which accords with the examples but somehow transcends them.  Furthermore, it would seem that the student’s ability to apply the rule beyond the finite example set is explained by positing that the student, when confronted with novel applications, is guided by the rule that he or she has “in mind.”  Bloor (1997, p. 11) summarises this seductively simple model of how rules are grasped:

It is tempting to suppose that when a teacher is using examples to convey the meaning of a word, the teacher has something ‘in mind,’ and the finite number of examples are just a fragmentary substitute for what is really meant.  If only the student could look directly into the mind of the teacher then how simple life would be: they too would have access to the state of understanding that is the source of the teacher’s ability to follow the rule.  Of course, they can’t look into the teacher’s mind but, the argument goes, they will only have established an understanding when they can reach beyond all the examples. (Bloor, 1997, p. 12)

 

‘Understanding’ is a Vague Concept (Wittgenstein, 1983, VI, §13)

Consider the case of simple addition of two natural numbers.  Almost everyone who has been to school for a few years is confident that they have understood the meaning of the sign “+.”  They feel sure they understand how to use the rule for the “+” sign.  This confidence exists despite the fact that no individual has carried out the infinitely long list of computations associated with the “+” sign.  Given that individuals live for only a finite number of years and there are infinitely many pairs of natural numbers x and y for which x + y can be computed, it follows that for any individual it will always be possible to identify values of x and y for which they have not yet computed x + y.

A student in the early years of primary school who is just coming to terms with addition may, for example, have computed x + y only for values of x and y less than 57, so that “68 + 57” is not part of the student’s short computational history.  The calculation “68 + 57” would be a novel computation for this student.  The point to be emphasised is that for every individual it will always be possible to identify an addition problem that he or she has not previously encountered.

Kripke (1982) considers the case of a student who has not yet performed the computation “68 + 57.”  The student has mastered additions with smaller arguments such as “23 + 34,” “19 + 27” and “50 + 51” but hasn’t carried out the calculation “68 + 57.”  Kripke (1982) conjures up a “bizarre sceptic” who suggests to the student that, based on the examples the teacher used when instructing him or her to add and all the addition questions the student has completed to date, he or she should now give the answer “5” in response to the question “68 + 57 = ?”  Kripke (1982, p. 8) has his bizarre sceptic pose a simple question:

This sceptic questions my certainty about my answer … Perhaps, he suggests, as I used the term ‘plus’ in the past, the answer I intended for ‘68 + 57’ should have been ‘5’!  Of course the sceptic’s suggestion is obviously insane.  My initial response to such a suggestion might be that the challenger should go back to school and learn to add.

The sceptic’s case is based on the fact that the student (in keeping with the rest of humanity) has only ever carried out a finite number of computations.  Needless to say, to see off the claim that, given his or her computational history, the answer to “68 + 57 = ?” should be “5,” the student will insist that the function he or she associated with the “+” sign in all past computations requires the answer “125.”  But Kripke (1982, pp. 8-9) points out that the finite set of addition problems completed in the past can be variously interpreted:

But who is to say what function this was?  In the past I gave myself only a finite number of examples instantiating this function.  All, we have supposed, involved numbers smaller than 57.  So perhaps in the past I used ‘plus’ and ‘+’ to denote a function I will call ‘quus’ and symbolise by ‘⊕.’  It is defined by:

$$x \oplus y = \begin{cases} x + y & \text{if } x, y < 57 \\ 5 & \text{otherwise} \end{cases}$$

Who is to say that this is not the function I previously meant by ‘+’?

The individual’s difficulty in seeing off the sceptic is that all of his or her past computations involved values of x and y less than 57, and for such values ‘+’ and ‘⊕’ yield identical results.  Addition and quaddition (someone using the quus function is said to be engaged in quaddition) give different results only when x or y is greater than or equal to 57.
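Kripke’s definition translates directly into code.  The sketch below is an illustration only: the names `plus`, `quus` and `history` are hypothetical, and it assumes the student’s “history” is every sum with both arguments in the range 0 to 56.  It confirms that the two functions agree on that entire history and shows them parting company on “68 + 57”.

```python
def plus(x, y):
    return x + y


def quus(x, y):
    # Kripke's (1982) 'quus': matches addition when both arguments are
    # below 57, and returns 5 otherwise.
    if x < 57 and y < 57:
        return x + y
    return 5


# Hypothetical computational history: every sum with both arguments in 0..56.
history = [(x, y) for x in range(57) for y in range(57)]

# On the whole history the two functions are indistinguishable.
agree_on_history = all(plus(x, y) == quus(x, y) for x, y in history)
print("agree on history:", agree_on_history)            # True

# The novel problem separates them.
print("68 + 57 ->", plus(68, 57), "vs", quus(68, 57))   # 125 vs 5
```

No finite record drawn from the history can tell the two functions apart; only a new problem can.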

Kripke (1982) points out that the facts of interest to the sceptic are to be found in two distinct realms: the “outer” realm of past computations, and the “inner” realm of the mind.  If all the facts from the student’s history of past computations are consistent with the function ‘plus’ and with the function ‘quus,’ then maybe an examination of the student’s mental history (facts about the contents of the student’s mind) will decide whether he or she should answer “125” or “5” in order to be consistent with his or her past history of computation.  Kripke (1982) permits the individual responding to the sceptic’s challenge to have unlimited and infallible access to past computations (outer) and past mental states and processes (inner). Kripke (1982) frequently makes reference to what an omnipotent, omniscient, all-seeing God – who has access to every aspect of the individual’s computational history and to his or her thought processes – would see if He were to look into the student’s mind.

The evidence is not to be confined to that available to an external observer, who can observe my overt behaviour but not my internal mental state.  It would be interesting if nothing in my external behaviour could show whether I meant plus or quus, but something about my inner state could.  But the problem here is more radical. … whatever ‘looking into my mind’ may be, the sceptic asserts that even if God were to do it, he still could not determine that I meant addition by ‘plus.’ (Kripke, 1982, p. 14)

Kripke (1982) argues that all of the facts about the student – all the (outer) facts about the student’s computational history and all the (inner) facts about the student’s mental states and processes – are consistent both with that individual meaning the orthodox addition function by the “+” sign and with his or her meaning the contrived “quus” function.

The sceptic does not argue that our own limitations of access to the facts prevent us from knowing something hidden.  He claims that an omniscient being, with access to all available facts, still would not find any fact that differentiates between the plus and the quus hypotheses. (Kripke, 1982, p. 39)

The temptation, of course, is to accept that while any finite series of computations can be made to accord with both ‘plus’ and ‘quus’ functions, an examination of the individual’s mind would turn up a fact or facts that would discriminate between the functions.  It is instructive to illustrate how the sceptic refutes such arguments.

When individuals learn to add, the following notion has considerable appeal: if they grasp the meaning of the “+” sign then this understanding of the addition rule fixes unique responses to addition problems they will encounter in the future.  The notion of understanding seems to function in “contractual” terms (McDowell, 1998, p. 221).  Given the student’s past grasp of the “+” sign and history of successful problem-solving in respect of that symbol, he or she seems contracted to reply in accord with this understanding.  In this model, to understand is to have something in mind which is the source of subsequent behaviour in that it contracts the individual who understands addition to reply “125” to the question “68 + 57 = ?”

But what makes the individual’s understanding of the ‘+’ sign a grasp of adding rather than quadding?  Consider an individual who attaches the aberrant interpretation (quus) to his or her past computations.  If the student’s understanding of the “+” sign contracts him or her to use it according to the quaddition rule, then the individual should reply “5” when asked to compute “68 + 57.”  In this case, to reply “125” is to fail to go on in the same way.  But what fact about the student’s understanding could be produced to convince the sceptic that he or she is contracted to follow the quaddition rule rather than the addition rule?  What fact about a student’s past grasp of the ‘+’ sign makes his or her present response of “5,” for example, right or wrong?  The finite number of additions the individual has completed to date, all involving arguments less than 57, are consistent with understanding the ‘+’ sign in terms of either the addition function or the quaddition function.

For the sceptic can point out that I have only ever given myself a finite number of examples manifesting this function, which all involved numbers less than 57, and that this finite number of examples is compatible with my meaning any one of an infinite number of functions by ‘+.’  … If the sceptic is right, then there is no fact about my past intention, or about my past performance, that establishes, or constitutes, my meaning one function rather than another by ‘+.’ (McGinn, 1997, p. 76)
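McGinn’s point that a finite record fits infinitely many functions can be made concrete by generalising quus.  The family below is a hypothetical illustration, not Kripke’s: the name `deviant_rule` and its parameters k (a threshold) and a (a fixed deviant answer) are assumptions introduced here.  Each member agrees with addition whenever both arguments are below k and returns a otherwise.  Every choice with k ≥ 57 fits the same finite history, and with k = 57 the answer to “68 + 57” can be made to be anything at all.

```python
def deviant_rule(k, a):
    """Hypothetical family generalising quus: x + y when x, y < k, else a.

    deviant_rule(57, 5) is Kripke's quus.  Any k >= 57 and any a give a
    function that matches addition on every sum with both arguments below 57.
    """
    def rule(x, y):
        if x < k and y < k:
            return x + y
        return a
    return rule


history = [(x, y) for x in range(57) for y in range(57)]


def fits_history(rule):
    return all(rule(x, y) == x + y for x, y in history)


# A small sample of the infinitely many candidates: thresholds 57..60, answers 0..9.
candidates = [deviant_rule(k, a) for k in range(57, 61) for a in range(10)]
print(all(fits_history(r) for r in candidates))                    # True

# With k = 57, "68 + 57" returns whatever answer the rule was built with.
print([deviant_rule(57, a)(68, 57) for a in (5, 125, -62, 1090)])  # [5, 125, -62, 1090]
```

Since k and a can each take endlessly many values, the sample above is only a finite window onto an infinite set of rivals to addition.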

Having a Formula Before one’s Mind

The common-sense view that understanding is a process or activity which happens in the mind has enormous appeal.  The notion that to understand is to have something in mind and that this understanding somehow fixes future behaviour in respect of how that understanding is exercised seems beyond challenge.  It also seems obvious that the future behaviour referred to is somehow inferior to understanding as a thing-in-itself; understanding is the real thing, whereas behaviour is merely a particular manifestation of that understanding.  Understanding is construed as “inner” while behaviour is construed as “outer.”  Few dispute the thesis that while the individual somehow has direct access to his or her understanding, the person measuring that understanding, for example, has to settle for indirect access in the form of the individual’s behaviour.

First-person access to understanding seems superior to mere third-person access.  Furthermore, the inner and outer seem to be entirely independent realms; after all, someone who has understood addition has the clear sense that he or she will provide the correct answer to the question “2 + 2 = ?” in advance of writing or saying their answer.  This person doesn’t have to wait until they’ve responded in order to confirm their understanding to themselves.  Understanding (viewed as a property of the inner), in this instance, at least, seems quite independent of the subsequent behaviour in which that understanding is exercised.

And this seems obvious for, to be sure, other people must rely on my behaviour, on what I do and say, in order to discern what I am feeling or thinking.  So, it seems that they know how things are with me indirectly.  What they directly perceive is merely outward behaviour.  But I have direct access to what is inner, to my own mind.  I am conscious of how things are with me.  The faculty whereby I have such direct access to mental states, events and processes is introspection – and it is because I can introspect that I can say how things are with me without observing what I do and say. (Bennett & Hacker, 2003, pp. 84-85)

Kripke (1982) considers the case of having a formula in mind which can be introspected.  Could this provide the elusive fact that distinguishes the individual’s understanding of “+” as a grasp of the addition function rather than a grasp of the quaddition function?  Could this be the “inner fact” that determines “125” as the answer the student should give to “68 + 57 = ?” in order to keep faith with the student’s understanding of “+”?  This is an attractive option because one can conceive of a formula as something finite but which nevertheless has the capability of generating an infinity of responses.  If God were to look into the individual’s mind and spot, say, the quus function defined above, He could predict with certainty that the individual is contracted to reply “5” to the question “68 + 57 = ?” in order to keep faith with his or her understanding.  One can also appreciate the popular appeal of this approach for one does feel that in solving quadratic equations, for example, one “calls to mind” the quadratic formula.  One can almost “see” the formula in one’s mind’s eye when solving quadratic equations.

The case against the notion of a formula as the sought-after fact which distinguishes adding from quadding will now be set out.  Alas, having a formula in mind will not satisfy the sceptic because a mental representation of the formula in itself cannot determine the individual’s response to any given problem.  A student who has not been instructed in the solution of quadratic equations but who has merely memorised the formula for solving quadratic equations will not be able to use it to solve algebraic problems involving quadratic equations.  Merely having the formula in mind is not enough to determine use.  The student must be trained in the use of the formula; simply having access to the formula in itself – whether in mind or on paper – doesn’t fix the response the student makes when presented with a problem requiring the use of the quadratic formula.

It is the practice in school mathematics examinations around the world to issue a booklet of formulae to examinees.  It is the near universal experience of mathematics teachers that students with inadequate training in the solution of quadratic equations may derive little value from having access to the relevant formula booklet.  This is because the formulae in themselves don’t fix behaviour.  If the formula in isolation fixed the student’s response, then every student issued with a formula booklet in a mathematics test would answer the quadratic question correctly.  To underline the limitations of having formulae in mind, Wittgenstein explores the circumstances under which sign-posts offer guidance.  Wittgenstein (1953, §85) states that in the Cartesian picture “A rule stands there like a sign-post.”  In this statement he is asking the reader to reflect on the property of a wooden sign-post which enables it to serve as a guide to behaviour.

“Considered in itself a sign-post is just a board or something similar, perhaps bearing an inscription, on a post.  Something so described does not, as such, sort behaviour into correct and incorrect – behaviour that counts as following the sign-post and behaviour that does not” (McDowell, 1992, p. 41).

One is forced to conclude that, despite having all the facts (inner and outer) about the student, these alone can’t determine in advance whether his or her response will be “125” or “5.”  Causality has broken down.  A defining principle of Newtonian mechanics is that if one has complete information about any system one can always predict what will happen next with certainty.  Newtonian determinism fails in respect of elementary rule-following: it seems that there are matters which influence the student’s response which are beyond the totality of inner and outer facts.

A quantum physicist would feel entirely at home with Kripke’s (1982) way of expressing his interpretation of Wittgenstein’s philosophy: “If even God, who can see all the facts about the past (and into your mind), could not know that you meant addition then that doesn’t illustrate limitations on God’s knowledge.  It shows that there is in this case no fact for him to know” (Ahmed, 2007, p. 102).

Also, since one can look back upon the individual’s present understanding of “+” from a vantage point in the future, it follows that there is no (inner or outer) fact about what he or she currently understands by the “+” sign.  It follows that before this student is measured (before he or she responds to the question “68 + 57 = ?”) the totality of facts (inner and outer) is in keeping with the conclusion that the student’s state is a superposition of right and wrong.  The student is in an indefinite state with respect to his or her grasp of “+” because all the facts from the two relevant provinces (the inner and outer) are in keeping with the answers “125” and “5.”

Before continuing, it is important to realise that Wittgenstein does not deny that when solving mathematical problems one often has the sense that the relevant formula is “before one’s mind.”  Rather, he’s pointing out that this formula before one’s mind can’t be the source of one’s ability to solve the problems – it’s merely a by-product of one’s instruction in addition.  It will become clear in the paragraphs below that an introspected formula cannot fix how one solves problems requiring the use of the formula.

Wittgenstein is not here denying that there are characteristic experiential accompaniments to meaning and understanding – for images and the like do sometimes come before our minds when we utter or understand words – but he is denying that such experiential phenomena could constitute understanding.  Experiences are at most a symptom or sign of understanding; they are not the understanding itself.  The mistake of the traditional empiricist conception of meaning was thus to take as constitutive what is in reality only symptomatic. (McGinn, 1984, p. 4)

It should be stressed, once again, that this argument generalises to all rule-following: “Of course, these problems apply throughout language and are not confined to mathematical examples, though it is with mathematical examples that they can be most smoothly brought out” (Kripke, 1982, p. 19).  Ahmed (2007, p. 103) points out that “there is nothing special about ‘plus’ – if scepticism about ‘plus’ is irrefutable then so is scepticism about any word in any language.”

Having a Formula (and its Interpretation) Before one’s Mind

Kripke (1982) now searches for some mechanism in the student’s mind which will obviate the need to conclude that the unmeasured student is in an indefinite state.  He reasons that the formula in mind is just like a sign-post – it cannot in itself fix the student’s response.   It is neither sufficient nor necessary to the student’s exercise of understanding.  As a consequence, it might be argued that having a formula in mind is of little value unless one is equipped to interpret it.  McDowell (1992, p. 41) makes the case using Wittgenstein’s sign-post metaphor:

What does sort behaviour into what counts as following the sign-post and what does not is not an inscribed board affixed to a post, considered in itself, but such an object under a certain interpretation – such an object interpreted as a sign-post pointing the way to a certain destination. (McDowell, 1992, p. 41)

Kripke (1982) is trying to explain how students follow the rule for the use of the ‘+’ sign given appropriate teaching and a finite number of illustrations of that rule.  The idea that they are guided by a mental image of a rule proves unworkable.  Could it be that one needs further evidence, namely, evidence that the student can interpret the rule correctly?  If so, there must be something in the student’s mind showing that he or she has correctly interpreted the rule.  However, this is of scant assistance, for the rule is capable of multiple interpretations.  It seems, therefore, that one needs to have in mind a rule for selecting the correct interpretation.  In order to explain the student’s ability to follow the rule for the use of the “+” sign, one must invoke a further rule, this time a rule in mind, for selecting the correct interpretation.  One has now fallen into an infinite regress.

An argument centred on the right interpretation will not work, for the student must then have access to a rule for selecting this correct interpretation, and the argument becomes circular, because it is rule-following one is seeking to explain in the first place.  It is instructive to remind the reader of the central idea here.  Facts are being sought about an individual which would determine in advance what response he or she should make to a novel problem which requires the use of a rule.  It has been established that no outer facts fix what the individual does next, because any response can be shown to accord with the rule exemplified by a finite set of illustrative examples offered by way of instruction.  The search then switched to facts about the individual’s mind.

If God looked into the individual’s mind and spotted a representation of the “quus” rule identified earlier then, at first sight, it seems He could predict with certainty that if asked the question “68 + 57 = ?” the individual must answer “5.”  The introduction of a finite entity in the mind (a mental representation of the formula) explaining the individual’s potentially infinite capacity for applying the rule is appealing until one notes that it is possible to have a formula in mind and yet not know how to apply it.  It is clear that the formula in itself is neither sufficient nor necessary.  It is “normatively inert” (McDowell, 1992, p. 42) because it cannot be used in isolation to pronounce the individual’s future response to the novel problem to be “125” or “5,” or any other number.  As McDowell puts it, the formula just “stands there” in need of interpretation.

McDowell (1992, p. 42) points out that attempting to locate the sought-after facts in the individual’s mind is fraught with problems because this is “a region of reality populated by items that, considered in themselves, just ‘stand there.’”  Mental representations have to be interpreted.  McDowell argues that whatever attaching the correct interpretation to the formula might consist in, it is nevertheless an element of a region of reality (the mind) populated by items that just stand there like sign-posts.  It follows that the interpretation itself has to be interpreted, and so on, in an infinite regress.

Wright (2001, pp. 162-163) makes such a powerful and pithy argument demonstrating the futility of interpretations that it is quoted here in full:

Suppose I undergo some process of explanation – for instance, a substantial initial segment of some arithmetical series is written out for me – and as a result I come to have the right rule ‘in mind.’  How, when it comes to the crunch – at the nth place which lies beyond the demonstrated initial segment, and which I have previously never thought about – does having the rule ‘in mind’ help?  Well, with such an example one tends to think of having the rule ‘in mind’ on the model of imagining a formula, or something of that sort.  And so it is natural to respond by conceding that, strictly merely having the rule in mind is no help.  For I can have a formula in mind without knowing what it means.  So – the response continues – it is necessary in addition to interpret the rule. … An interpretation is of help to me, therefore, in my predicament at the nth place only if it is correct. … So how do I tell which interpretation is correct?  Does that, for instance, call for a further rule – a rule for determining the correct interpretation of the original – and if so, why does it not raise the same difficulty again, thereby generating a regress?

Selecting the Simplest Rule

It may strike the reader that a criterion based on simplicity might distinguish understanding the “+” sign in terms of the addition function rather than the quaddition function.  The quus function, with its different treatment of numbers less than 57 and numbers greater than or equal to 57, seems a particularly unwieldy function when compared with the simple plus function.  Its mathematical symbolism would also be alien to any student in the primary phase of education.  Could it be that the student selects a unique interpretation (the “correct” interpretation) from the infinity on offer simply by choosing the interpretation with the simplest associated function?

Chaitin (2007) has extended Gödel’s incompleteness theorem (1931) and Turing’s halting problem (1936) to develop Algorithmic Information Theory.  He demonstrates that the search for the simplest rule (or most “elegant” rule, in Chaitin’s parlance) which generates a sequence of numbers is equivalent to the search for the shortest computer program which can generate the sequence.  Unfortunately, Chaitin (2007, pp. 120-121) confirms that for any finite sequence of numbers, identifying the simplest interpretation also presents problems that cannot, in general, be resolved:

Let’s say I have a particular calculation, a particular output, that I’m interested in, and that I have this nice, small computer program that calculates it, and think that it’s the smallest possible program, the most concise one that produces this output.  Maybe a few friends of mine and I were trying to do it, and this was the best program that we came up with; nobody did any better.  But how can you be sure?  Well, the answer is that you can’t be sure.  It turns out you can never be sure!  You can never be sure that a computer program is what I like to call elegant, namely that it’s the most concise one that produces the output that it does.  Never, ever!
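One way to see why simplicity does not rescue the student is to try to write the selector.  The sketch below is a hypothetical illustration, not Chaitin’s method: it uses the length of a rule’s written description as a crude stand-in for program size, keeps only the candidate rules that reproduce the observed prefix 2, 4, 6, 8, and returns the shortest.  The names `CANDIDATES`, `fits` and `shortest_fitting` are assumptions introduced here.

```python
# Candidate rules for the series 2, 4, 6, 8, each keyed by a written description.
# Description length is a crude, hypothetical proxy for "simplicity"; it is not
# Chaitin's program-size complexity, which cannot in general be computed.
CANDIDATES = {
    "2*n": lambda n: 2 * n,
    "2*n - (n-1)*(n-2)*(n-3)*(n-4)//24": lambda n: 2 * n - (n - 1) * (n - 2) * (n - 3) * (n - 4) // 24,
    "2*n + 45*(n-1)*(n-2)*(n-3)*(n-4)": lambda n: 2 * n + 45 * (n - 1) * (n - 2) * (n - 3) * (n - 4),
    "2*n - 3*(n-1)*(n-2)*(n-3)*(n-4)": lambda n: 2 * n - 3 * (n - 1) * (n - 2) * (n - 3) * (n - 4),
    "n*n": lambda n: n * n,  # does not fit: gives 1, 4, 9, 16
}

OBSERVED = [2, 4, 6, 8]


def fits(rule, observed):
    """True if the rule reproduces every observed term, starting at n = 1."""
    return all(rule(n) == term for n, term in enumerate(observed, start=1))


def shortest_fitting(candidates, observed):
    """Among the supplied candidates that fit, return the shortest description."""
    fitting = [desc for desc, rule in candidates.items() if fits(rule, observed)]
    return min(fitting, key=len)


best = shortest_fitting(CANDIDATES, OBSERVED)
print(best, [CANDIDATES[best](n) for n in range(1, 6)])  # 2*n [2, 4, 6, 8, 10]
```

The selector works only because the candidate list is finite and supplied in advance, and its verdict depends on the chosen description language.  The question it cannot answer, whether some unlisted rule is shorter still, is exactly the one Chaitin says can never be settled with certainty.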

This paper details only some of Kripke’s (1982) attempts to escape the conclusion that if one restricts oneself to the totality of facts about the individual – outer facts about past practice and inner facts about mental contents (the two sets of facts treated as separately analysable) – one cannot predict the response the individual will make to the simple addition question: “68 + 57 = ?”  Since all the facts are in keeping with an infinity of answers, one correct and the rest incorrect, Kripke (1982, p. 17) is compelled to conclude that, in providing a response to this question, the rule-follower must be characterised as making “an unjustified stab in the dark.”  The student has no criterion for preferring 125 over 5; all the facts are in keeping with the correct answer and any incorrect answer.  The student is in an indeterminate state with respect to an understanding of the “+” sign.

So it seems that, from a first-person perspective, individuals who have been taught to add using a finite number of examples offer the first answer that comes into their heads when required to extend the addition rule to unseen computations.  They have no criterion to guide their selection of 125 as the correct answer to the problem “68 + 57 = ?”  There must be an error in the reasoning that produces such a counter-intuitive conclusion.  According to the logic presented above, a student who has been taught to add via a series of examples, is then instructed to “go on in the same way,” and subsequently responds to the problem “68 + 57 = ?” by writing “5” can protest that he or she did go on in the same way; the student just didn’t go on in the same way as the teacher who issued the instruction.  There exists an interpretation which brings the answer “5” into accord with the teacher’s examples.  Indeed, this is true of any answer the student offers.  These two answers (125 and 5), and infinitely many others, are all in keeping with the totality of facts about the student.

Latent Variables Modelling Misrepresents Ability

The student’s interaction with the sceptic shows that the student has no criterion which can be used to differentiate a correct from an incorrect response.  The very concepts of right and wrong don’t seem to apply here.  This invites the obvious question: In respect of the problem “68 + 57 = ?” what makes “125” the correct answer and “5” the wrong answer?  What makes the teacher right in thinking the student should answer “125” and the student wrong in answering “5”?  This issue is resolved by bringing in the human practice of mathematics.  When we enter the picture, psychological measurement is bound to lose some of its Newtonian objectivity, an objectivity that quantum theory teaches is unattainable.

Heisenberg (1958, pp. 55-56) stresses that the cost of the participant’s inclusion is reduced objectivity in scientific measurement: The “reference to ourselves” means that “our description is not completely objective.”  Objectivity in the Newtonian sense is no longer the hallmark of science because it fails to account for the participative element of measurement.  Beyond classical physics, measurement models with no place for human practices are of questionable scientific validity: “When we speak of the picture of nature in the exact science of our age, we do not mean a picture of nature so much as a picture of our relationships with nature.  The old division of the world into objective processes in space and time and the mind in which these processes are mirrored … is no longer a suitable starting point for our understanding of modern science” (Heisenberg, 1962, pp. 28-29).

Wittgenstein’s writings make clear that, divorced from human practices, the descriptors “right” and “wrong” lose their meaning, even in disciplines like mathematics and logic.  Quantum theoretical “weak objectivity” (d’Espagnat, 1983) has replaced the strong objectivity of Newtonian mechanics because Newtonian objectivity misrepresents the psychometrician’s task.

[A]lmost all of us, after sufficient training, respond with roughly the same procedures to concrete addition problems.  We respond unhesitatingly to such problems as ‘68 + 57,’ regarding our procedure as the only comprehensible one (see Wittgenstein, 1953, §§219, 231, 238), and we agree in the unhesitating responses we make.  On Wittgenstein’s conception, such agreement is essential for our game of ascribing rules and concepts to each other (see Wittgenstein, 1953, §240). (Kripke, 1982, p. 96)

Wittgenstein (1975, p. 58) writes: “The only criterion for his multiplying 113 by 44 in a way analogous to the examples is his doing it in the way in which all of us, who have been trained in a certain way, would do it.”  It follows that third-person ascriptions of the ability to add are based on the criteria afforded by the practice of mathematics, a practice into which the teacher has been enculturated.  Wittgenstein notes that “Indefinitely many other ways of acting are possible: but we do not call them ‘following the rule’” (Malcolm, 1986, p. 155).

Criteria hover somewhere between deductive and inductive grounds (Grayling, 1977) and their nature can be traced back to the introduction of the participating psychologist.  For example, there is an intrinsic “vagueness,” to borrow Wittgenstein’s term, in the accepted number of particular additions one ought to compute correctly before having the ability to add ascribed to one.  There is no fixed number of even numbers a student should write down before being regarded as someone who “understands” or “has mastered” or “has grasped the meaning of” the even numbers.  This vagueness is a constitutive property of psychological predicates; it isn’t a shortcoming.

Suppose we are teaching a student how to construct different series of numbers according to particular formation rules.  When will we say that he has mastered a particular series, say, the series of natural numbers?  Clearly, he must be able to produce this series correctly: ‘that is, as we do it’ (Wittgenstein, 1953, §145).  Wittgenstein points here to a certain vagueness in our criteria for judging that he has mastered the system, in respect of how often he must get it right and how far he must develop it.  This vagueness is something that Wittgenstein sees as a distinctive characteristic of our psychological language game, one that distinguishes it from the language-game in which we describe mechanical systems. (McGinn, 1997, p. 89)

Gilbert Ryle (1949, p. 164) writes: “To settle whether a boy can do long division, we do not require him to try out his hand on a million, a thousand, or even a hundred different problems in long division.  We should not be quite satisfied after one success, but we should not remain dissatisfied after twenty, provided that they were judiciously variegated and that he had not done them before.  A good teacher, who also watched his procedure in reaching them, would be satisfied much sooner, and he would be satisfied sooner still if he got the boy to describe and justify the constituent operations that he performed.”

Hence, third-person ascriptions of ability are based on criteria while first-person ascriptions are not; in the first-person case there are no criteria for attaching the correct interpretation to the mental image of a rule.  In short, first-person and third-person ascriptions of ability are mutually exclusive; the former do not require criteria for an ascription of ability (the individual acts for no reason) but the latter do.  First-person ascriptions of ability are associated with being right and wrong, while third-person ascriptions are associated with being right or wrong.

Returning to the counterintuitive conclusion drawn by Kripke (1982), must it be accepted that individuals respond to novel addition problems by offering capricious answers?  This conclusion needn’t be drawn because there’s an error in Kripke’s premise: his analysis treats inner facts (associated with first-person ascriptions of ability) as entirely independent of outer facts (associated with third-person ascriptions of ability).  The idea that the inner stands in a deterministic relation to the outer has been challenged earlier.  Wittgenstein considered first-person and third-person ascriptions as forming an indivisible whole; they cannot be meaningfully separated (Malcolm, 1971, pp. 87-91).

In summary, when the measurement process is divided into the situation immediately before the individual responds to the question “68 + 57 = ?” (the individual’s ascription of ability to himself or herself) and the situation immediately afterwards (the ascription of ability to the individual by the measurer), the relation is not one of Newtonian determinism between two independent situations.  Rather, it’s one of quantum complementarity, where complementarity is the more general concept which replaced Newtonian causality.  The difficulties identified by Kripke (1982) in respect of causality at the level of the individual can be seen in a new light by eschewing causality for complementarity.

Polkinghorne (1996, p. 70) defines complementarity as a “combination of apparent opposites” and Whitaker (1996, p. 184) describes it as “mutual exclusion and joint completion.”  In psychological predicates, first-person ascriptions are made without criteria while third-person ascriptions require criteria.  This is the mutual exclusiveness facet of complementarity in respect of psychological predicates.  But these two very different ascriptions cannot be separated on pain of accepting Kripke’s conclusion that rule-following in mathematics is capricious.  This is the joint completion facet.

While first-person/third-person ascriptions of psychological predicates appear to stand in a complementary relationship, this asymmetry is entirely absent in Newtonian physics.  Suter (1989, pp. 152-153) writes: “This asymmetry in the use of psychological and mental predicates – between the first-person present-tense and second- and third-person present-tense – we may take as one of the special features of the mental.  Physical predicates display no such asymmetry.”

References

Ahmed, A. (2007).  Saul Kripke.  London: Continuum.

Anscombe, G.E.M. (1985).  Wittgenstein on rules and private language.  Ethics, 95, 342-352.

Barrett, P. (2008).  The consequence of sustaining a pathology: Scientific stagnation – a commentary on the target article “Is psychometrics a pathological science” by Joel Michell.  Measurement, 6, 78-83.

Battig, W.F. (1978).  Parsimony or psychology.  Presidential Address, Rocky Mountain Psychological Association, Denver, CO.

Bennett, M.R., & Hacker, P.M.S. (2003).  Philosophical foundations of neuroscience.  Oxford: Blackwell Publishing.

Blinkhorn, S. (1997).  Past imperfect, future conditional: Fifty years of test theory.  British Journal of Mathematical and Statistical Psychology, 50(2), 175-186.

Bloor, D. (1997).  Wittgenstein: Rules and institutions.  London: Routledge.

Bohr, N. (1934/1987).  The philosophical writings of Niels Bohr: Volume 1 – Atomic theory and the description of nature.  Woodbridge: Ox Bow Press.

Bohr, N. (1958/1987).  The philosophical writings of Niels Bohr: Volume 2 – Essays 1933 – 1957 on atomic physics and human knowledge.  Woodbridge: Ox Bow Press.

Borsboom, D., Mellenbergh, G.J., & van Heerden, J. (2003).  The theoretical status of latent variables.  Psychological Review, 110 (2), 203-219.

Borsboom, D. (2005).  Measuring the mind.  Cambridge: Cambridge University Press.

Bridgman, P. W. (1927).  The logic of modern physics.  New York: Macmillan.

Bruner, J.S. (1990).  Acts of meaning.  Cambridge, MA: Harvard University Press.

Byrne, B.M. (1989).  A primer of LISREL.  New York: Springer-Verlag.

Chaitin, G.J. (2007).  Thinking about Gödel and Turing.  Hackensack, NJ: World Scientific.

d’Espagnat, B. (1983).  In search of reality.  New York: Springer-Verlag.

Elliot, C.D., Murray, D., & Pearson, L.S. (1978).  The British ability scales.  Windsor: National Foundation for Educational Research.

Ellis, J.L., & Van den Wollenberg, A. L. (1993).  Local homogeneity in latent trait models: A characterization of the homogeneous monotone IRT model.  Psychometrika, 58, 417-429.

Favrholdt, D. (Ed.). (1999).  Niels Bohr collected works (Volume 10).  Amsterdam: Elsevier Science B.V.

Feynman, R.P. (1985).  QED: The strange theory of light and matter.  Princeton, NJ: Princeton University Press.

Finch, H.L. (1977).  Wittgenstein – the later philosophy.  Atlantic Highlands, NJ: Humanities Press.

Gieser, S. (2005).  The innermost kernel.  Berlin: Springer-Verlag.

Gigerenzer, G. (1987).  Probabilistic thinking and the fight against subjectivity.  In L. Kruger, G. Gigerenzer, & M.S. Morgan (Eds.), The probabilistic revolution – Volume 2: Ideas in the sciences (pp. 11-33).  Cambridge, MA: The Massachusetts Institute of Technology Press.

Gödel, K. (1931).  Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I.  Monatshefte für Mathematik und Physik, 38, 173-198.

Goldstein, H., & Blinkhorn, S. (1997).  Monitoring educational standards – an inappropriate model.  Bulletin of the British Psychological Society, 30, 309-311.

Hacker, P.M.S. (1997).  Insight and illusion: Themes in the philosophy of Wittgenstein.  Bristol: Thoemmes Press.

Heisenberg, W. (1958).  Physics and philosophy.  New York: Prometheus Books.

Heisenberg, W. (1962).  The physicist’s conception of nature.  London: The Scientific Book Guild.

Hertz, H. (1956).  The principles of mechanics presented in a new form.  New York: Dover Publications Inc.

Hood, S.B. (2008).  Latent variable realism in psychometrics.  Unpublished doctoral dissertation, Indiana University.

Honner, J. (2002).  The description of nature: Niels Bohr and the philosophy of quantum physics.  Oxford: Clarendon Press.

Jammer, M. (1974).  The philosophy of quantum mechanics.  New York: John Wiley & Sons.

Jammer, M. (1999).  Einstein and religion.  Princeton, NJ: Princeton University Press.

Jenkins, J.J. (1979).  Four points to remember: a tetrahedral model of memory experiments.  In L.S. Cermak, & F.I.M. Craik (Eds.), Levels of processing in human memory (pp. 429-446).  Hillsdale, NJ: Lawrence Erlbaum.

Jöreskog, K.G., & Sörbom, D. (1993).  LISREL 8 user’s reference guide.  Chicago: Scientific Software International.

Kalckar, J. (Ed.). (1985).  Niels Bohr collected works (Volume 6).  Amsterdam: Elsevier Science B.V.

Kripke, S.A. (1982).  Wittgenstein on rules and private language.  Oxford: Blackwell.

Luce, R.D. (1997).  Several unresolved conceptual problems of mathematical psychology.  Journal of Mathematical Psychology, 41, 79-87.

Malcolm, N. (1971).  Problems of mind.  New York: Harper Torchbooks.

Malcolm, N. (1986).  Wittgenstein: Nothing is hidden.  Oxford: Blackwell.

McDowell, J. (1992).  Meaning and intentionality in Wittgenstein’s later philosophy.  In P.A. French, T.E. Uehling, & H.K. Wettstein (Eds.), Midwest Studies in Philosophy Volume XVII: The Wittgenstein legacy (pp. 40-52).  Notre Dame, Indiana: University of Notre Dame Press.

McDowell, J. (1998).  Mind, value and reality.  Cambridge, MA: Harvard University Press.

McGinn, C. (1984).  Wittgenstein on meaning.  Oxford: Blackwell.

McGinn, M. (1997).  Wittgenstein and the Philosophical Investigations.  London: Routledge.

Mermin, D. (1993).  Lecture given at the British Association Annual Science Festival.  London: British Association for the Advancement of Science.

Michell, J. (1990).  An introduction to the logic of psychological measurement.  Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Michell, J. (1997).  Quantitative science and the definition of measurement in psychology.  British Journal of Psychology, 88, 355-383.

Michell, J. (1999).  Measurement in psychology: A critical history of a methodological concept.  New York: Cambridge University Press.

Michell, J. (2000).  Normal science, pathological science, and psychometrics.  Theory & Psychology, 10(5), 639-667.

Nagel, T. (1986).  The view from nowhere.  New York: Oxford University Press.

Omnès, R. (1999a).  Understanding quantum mechanics.  Princeton, NJ: Princeton University Press.

Omnès, R. (1999b).  Quantum philosophy.  Princeton, NJ: Princeton University Press.

Oppenheimer, R. (1955, September 4).  Analogy in science.  Paper presented at the 63rd Annual Meeting of the American Psychological Association, San Francisco, CA.

Panjvani, C. (2008).  Rule-following, explanation-transcendence, and private language.  Mind, 117, 303-328.

Pears, D. (2006).  Paradox and platitude in Wittgenstein’s philosophy.  Oxford: Clarendon Press.

Polkinghorne, J. (1996).  Beyond science.  Cambridge: Cambridge University Press.

Polkinghorne, J. (2002).  Quantum theory: A very short introduction.  Oxford: Oxford University Press.

Putnam, H. (1988).  Representation and reality.  Cambridge, MA: The MIT Press.

Rasch, G. (1960).  Probabilistic models for some intelligence and attainment tests.  Copenhagen, Denmark: Paedagogiske Institut.

Richardson, K. (1999).  The making of intelligence.  London: Weidenfeld & Nicolson.

Roediger III, H.L. (2008).  Relativity of remembering: Why the laws of memory vanished.  Annual Review of Psychology, 59, 225-254.

Schroeder, S. (2006).  Wittgenstein.  Cambridge: Polity Press.

Shimony, A. (1997).  On mentality, quantum mechanics and the actualization of potentialities.  In R. Penrose, The large, the small and the human mind (pp. 144-160).  Cambridge: Cambridge University Press.

Sobel, M.E. (1994).  Causal inference in latent variable models.  In A. von Eye & C.C. Clogg (Eds.), Latent variable analysis (pp. 3-35).  Thousand Oaks: Sage.

Stapp, H.P. (1972).  The Copenhagen interpretation.  American Journal of Physics, 40, 1098-1116.

Stapp, H.P. (1993).  Mind, matter, and quantum mechanics.  Berlin: Springer-Verlag.

Stent, G.S. (1979).  Does God play dice?  The Sciences, 19, 18-23.

Stevens, S.S. (1946).  On the theory of scales of measurement.  Science, 103, 677-680.

Suen, H.K. (1990).  Principles of test theories.  Hillsdale, NJ: Erlbaum.

Suter, R. (1989).  Interpreting Wittgenstein: A cloud of philosophy, a drop of grammar.  Philadelphia: Temple University Press.

Thorndike, R.L. (1982).  Educational measurement: Theory and practice.  In D. Spearritt (Ed.), The improvement of measurement in education and psychology: Contributions of latent trait theory (pp. 3-13).  Melbourne: Australian Council for Educational Research.

Trendler, G. (2011).  Measurement theory, psychology and the revolution that cannot happen.  Theory & Psychology, 19(5), 579-599.

Tulving, E. (2007).  Are there 256 different kinds of memory?  In J.S. Nairne (Ed.), The foundations of remembering: Essays in honour of Henry L. Roediger III (pp. 39-52).  New York: Psychology Press.

Turing, A.M. (1950).  Computing machinery and intelligence.  Mind, 59, 433-460.

Whitaker, A. (1996).  Einstein, Bohr and the quantum dilemma.  Cambridge: Cambridge University Press.

Wick, D. (1995).  The infamous boundary.  New York: Copernicus.

Williams, M. (1999).  Wittgenstein, mind and meaning.  London: Routledge.

Willmott, A.S., & Fowles, D.E. (1974).  The objective interpretation of test performance: The Rasch model applied.  Windsor: National Foundation for Educational Research.

Wittgenstein, L. (1953).  Philosophical Investigations.  G.E.M. Anscombe, & R. Rhees (Eds.), G.E.M. Anscombe (Tr.).  Oxford: Blackwell.

Wittgenstein, L. (1975).  Wittgenstein’s lectures on the foundations of mathematics: Cambridge 1939.  Chicago: University of Chicago Press.

Wittgenstein, L. (1980).  Cambridge lectures (1930-1932).  From the notes of John King and Desmond Lee, edited by Desmond Lee.  Totowa, NJ: Rowman and Littlefield.

Wittgenstein, L. (1983).  Remarks on the foundations of mathematics.  Cambridge, MA: MIT Press.

Wright, B.D. (1997).  A history of social science measurement.  Educational Measurement: Issues and Practice, 16(4), 33-52.

Wright, C. (2001).  Rails to infinity.  Cambridge, MA: Harvard University Press.

Why OECD Pisa cannot be rescued

04 Sunday Dec 2016

Posted by paceni in Grammar Schools

Tags

Albert Einstein, Andreas Schleicher, British Journal of Mathematical and Statistical Psychology, Christian Bokhove, Complementarity, Diane Ravitch, Dr Hugh Morrison, ETS, Greg Ashman, Item Response Theory, John Jerrim, Matthias von Davier, Measurement of ability, Michael Gove, Michael Oakeshott, Niels Bohr, PIRLS, Pisa 2015, Randy Bennett, Rasch Model, Robbie Meredith, Sean Coughlan, TES, Theresa May, Times Educational Supplement, TIMSS

PISA cannot be rescued by switching IRT models because all IRT modelling is flawed.

Dr Hugh Morrison (The Queen’s University of Belfast [retired]) drhmorrison@gmail.com

On page 33 of the Times Educational Supplement of Friday 25th November 2016, Andreas Schleicher, who oversees PISA, appears to accept my analysis of the shortcomings of the Rasch model, which plays a central role in PISA’s league table.  The Rasch model is a “one parameter” Item Response Theory (IRT) model, and Schleicher argues that PISA’s conceptual difficulties can be resolved by abandoning the Rasch model for a two or three parameter model.  However, my criticisms apply to all IRT models, irrespective of the number of parameters.  In this essay I will set out the reasoning behind this claim.
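
To make the terminology concrete: the “parameters” belong to the item response function.  This function gives the probability that an examinee of ability θ answers a given item correctly.  The sketch below uses the textbook logistic forms, omitting the 1.7 scaling constant that some texts include.  The function names and numbers are illustrative assumptions, not PISA’s operational settings.  Adding a discrimination parameter (a) or a guessing parameter (c) changes the shape of the curve, but in every variant the examinee enters the model as a single real number θ.

```python
import math

def p_1pl(theta: float, b: float) -> float:
    """Rasch (one-parameter) model: only item difficulty b varies across items."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def p_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter model: adds an item discrimination (slope) parameter a."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Three-parameter model: adds a lower asymptote c (often called 'guessing')."""
    return c + (1.0 - c) * p_2pl(theta, a, b)

theta = 0.5  # hypothetical ability: one real number in every variant
print(round(p_1pl(theta, b=0.0), 3))                # 0.622
print(round(p_2pl(theta, a=1.5, b=0.0), 3))         # 0.679
print(round(p_3pl(theta, a=1.5, b=0.0, c=0.2), 3))  # 0.743
```

With a = 1 the two-parameter curve reduces to the Rasch curve, and with c = 0 the three-parameter curve reduces to the two-parameter one.  The models differ in how they describe items, not in how they represent the person.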

One can find the source of IRT’s difficulty in Niels Bohr’s 1949 paper entitled Discussion with Einstein on Epistemological Problems in Atomic Physics.  Few scientists have made a greater contribution to the study of measurement than the Nobel Laureate and founding father of quantum theory, Niels Bohr.  Given Bohr’s preoccupation with what the scientist can say about aspects of reality that are not visible (electrons, photons, and so on), one can understand his constant references to measurement in psychology.  “Ability” cannot be seen directly; rather, like the microentities that manifest as tracks in particle accelerators, ability manifests in the examinee’s responses to test items.  IRT is concerned with “measuring” something which the measurer cannot experience directly, namely, the ability of the examinee.

IRT relies on a simple inner/outer picture for its models to function.  In IRT the inner (a realm of timeless, unobserved latent variables, or abilities) is treated as independent of the outer (here examinees write or speak responses at moments in time).  This is often referred to as a “reservoir” model in which timeless abilities are treated as the source of the responses given at specific moments in time.

As early as 1929 Bohr rejected this simplistic thinking in strikingly general terms: “Strictly speaking, the conscious analysis of any concept stands in a relation of exclusion to its immediate application.  The necessity of taking recourse to a complementary … mode of description is perhaps most familiar to us from psychological problems.”  Now what did Bohr mean by these words?  Consider, for example, the concept “quadratic.”  It is tempting to adopt a reservoir approach and trace a pupil’s ability to apply that concept in accord with established mathematical practice to his or her having the formula in mind.  The guidance offered by the formula in mind (Bohr’s reference to “conscious analysis”) accounts for the successful “application,” for example, to the solution of specific items on an algebra test.

However, this temptingly simplistic model, in which the formula is in the unobserved mental realm and written or spoken applications of the concept “quadratic” take place in the observed realm, contains a fundamental flaw: the two realms cannot be meaningfully connected.  The “inner” formula (in one realm) gets its guidance properties from human practices (in the other realm).  A formula as a thing-in-itself cannot guide; one has to be trained in the established practice of using the formula before it has guidance properties.  In school mathematics examinations around the world, pupils are routinely issued with a page of formulae relevant to the examination.  Alas, it is the experience of mathematics teachers everywhere that simply having access to the formula as a thing-in-itself offers little or no guidance to the inadequately trained pupil.  The formula located in one realm cannot connect with the applications in the other.

Wittgenstein teaches that no formula, rule, principle, etc. in itself can ever determine a course of action.  The timeless mathematical formula in isolation cannot generate all the complexities of a practice (something which evolves in time); rather, as Michael Oakeshott puts it, a formula is a mere “abridgement” of the practice – the practice is primary, with the formula, rule, precept etc. deriving its “life” from the practice.

Returning to Bohr’s writing, it is instructive to explain his use of the word “complementarity” in respect of psychology and to explain the meaning of the words: “stands in a relation of exclusion.”  Complementarity was the most important concept Bohr bequeathed to physics.  It involves a combination of two mutually exclusive facets.  In order to see its relevance to the validity of IRT modelling, let’s return to the two distinct realms.

We think of the answers to a quadratic equation as being right or wrong (a typical school-level quadratic equation has two distinct answers).  In the realm of application this is indeed the case.  When the examinee is measured, his or her response is pronounced right or wrong dependent upon its relation to established mathematical practice.  However, in the unobserved realm, populated by rules, formulae and precepts (as things-in-themselves), any answer to a quadratic equation is simultaneously right and wrong!

A formula as a thing-in-itself cannot separate what accords with it from what conflicts with it, because there will always exist an interpretation of the formula for which a particular answer is correct, and another interpretation for which the same answer can be shown to conflict with the formula.  Divorced from human practices, the distinction between right and wrong collapses.  (This is a direct consequence of Wittgenstein’s celebrated “private language” argument.)  This explains Bohr’s reference to a “relation of exclusion.”  In simplistic terms, in the unobserved realm, where answers are compared with the formula for solving quadratics, responses are right-and-wrong, while in the observed realm, where answers are compared with the established practice, responses are right-or-wrong.

On this reading, ability has two mutually exclusive facets which cannot meaningfully be separated.  The distinguished Wittgenstein scholar, Peter Hacker, captures this situation as follows: “grasping an explanation of meaning and knowing how to use the word explained are not two independent abilities but two facets of one and the same ability.”  Ability, construed according to Bohr’s complementarity, is indefinite when unobserved and definite when observed.  Moreover, this definite measure is not an intrinsic property of the examinee, but a property of the examinee’s interaction with the measuring tool.

Measurement of ability is not a matter of passively checking up on what already exists – a central tenet of IRT.  Bohr teaches that the measurer effects a radical change from indefinite to definite.  Pace IRT, measurers, in effect, participate in what is measured.  No item response model can accommodate the “jump” from indefinite to definite occasioned by the measurement process.  All IRT models mistakenly treat unmeasured ability as identical to measured ability.  What scientific evidence could possibly be adduced in support of that claim?  No IRT model can represent ability’s two facets because all IRT models report ability as a single real number, construed as an intrinsic property of the measured individual.
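
The point that an IRT model reports ability as a single real number can be seen in how an ability estimate is obtained.  What follows is a minimal sketch of maximum-likelihood estimation by Newton-Raphson, assuming the Rasch model with known item difficulties.  The difficulties and the response pattern are invented, and operational testing programmes generally use more elaborate estimation procedures than this.  Whatever the response pattern, the routine returns one floating-point value.

```python
import math

def rasch_p(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_theta(responses, difficulties, theta=0.0, iters=50, tol=1e-10):
    """Maximum-likelihood ability estimate by Newton-Raphson.

    responses: list of 0/1 scores; difficulties: known item difficulties.
    The estimate diverges for all-correct or all-wrong patterns, so those
    are rejected.
    """
    score = sum(responses)
    if score == 0 or score == len(responses):
        raise ValueError("ML estimate is infinite for perfect or zero scores")
    for _ in range(iters):
        probs = [rasch_p(theta, b) for b in difficulties]
        gradient = score - sum(probs)                  # d logL / d theta
        information = sum(p * (1 - p) for p in probs)  # -d2 logL / d theta2
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta

difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical item bank
responses = [1, 1, 1, 0, 0]                 # hypothetical answers
theta_hat = estimate_theta(responses, difficulties)
print(round(theta_hat, 3))  # a single real number
```

The rejection of perfect and zero scores reflects a standard property of the Rasch likelihood: it has no finite maximum in those cases.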

Knewton claims of adaptive learning have no scientific merit

08 Tuesday Apr 2014

Posted by paceni in Grammar Schools

Tags

Adaptive Learning, Cambridge University Press, Cengage Learning, Computer Adaptive Testing, Dr Hugh Morrison, GMAT Prep, Gutenberg, Houghton Mifflin Harcourt, IRT, Isaac Newton, Item Response Theory, Jose Ferreira, Knewton, Kripke S.A., Lelivrescolaire, ludwig Wittgenstein, Macmillan, Math Readiness, Mermin N.D., Microsoft, Niels Bohr, OECD Pisa, Pears D., Pearson, Robert Oppenheimer, Sebit, Stapp H.P., Triumph Learning, Wiley

Why the Knewton Platform can make no claim to scientific measurement principles

Dr Hugh Morrison (email: drhmorrison@gmail.com)

Introduction

This article is a critique of the scientific claims made for the Knewton platform. Knewton’s designers portray the platform as drawing upon a science of adaptive learning to support a high quality personalised learning experience. To paraphrase, Knewton claims that its scientific rigour allows it to adapt to the learner, thereby providing the student with a personalised syllabus that enables him or her to learn every single concept he or she encounters. The scientific model of reality which informs Knewton is the Newtonian model.

“Knewton adaptivity is distinguished by its scientific rigor and tremendous scope”

Particularly damaging for Knewton’s claims to scientific rigour is the fact that the psychological/educational attribute “learn” is governed by a “first-person/third-person asymmetry” which the Newtonian model simply cannot represent (Suter, 1989, pp. 152-153). Newtonian attributes are first-person/third-person symmetric. One of the twentieth century’s greatest physicists, Niels Bohr, regarded Newtonian models as entirely inappropriate for predicates such as “learn.” Knewton’s choice of scientific model was condemned almost 60 years ago by one of the towering figures of American physics, Robert Oppenheimer. Oppenheimer (1955, p. 134) cautions:

“It seems to me that the worst of all possible misunderstandings would be that psychology be influenced to model itself after a physics which is not there anymore, which has been quite outdated.”

Stapp (1993, p. 192) bemoans the fact that “while psychology has been moving towards the mechanical concepts of nineteenth-century physics, physics itself has moved in just the opposite direction.”

Opponents of the Knewton platform seem to be driven by a deep-rooted sense that there is something unsettling about this project. Surely learning is a human activity and “data science” cannot replace the teacher and the school? The central aim of this article is to demonstrate just how much the Knewton claim to scientific rigour is damaged by its exclusion of the all-important human dimension.

Item Response Theory and ambiguous communication

The ideas of the great physicist and Nobel Laureate, Niels Bohr, feature prominently in this paper. To give a brief sense of his standing as a scientist, on news of his death, the following words appeared in the editorial of the New York Times:

“With the passing of Niels Bohr the world has lost not only one of the greatest scientists of this century but also one of the intellectual giants of all time.”

Bohr defined the hallmark of science to be “unambiguous communication.” In quantum theory one cannot meaningfully separate what is measured from the measurement instrument without communicating ambiguously. The following simple illustration will help make Bohr’s point. Let’s return to the twentieth century when Einstein was alive. Suppose that Einstein and a student at an American high school both produced a perfect score on a high school mathematics examination paper. Surely to claim that the student had the same mathematical ability as Einstein would be to communicate ambiguously. The student has nothing to match Einstein’s contributions to special and general relativity, to say nothing of quantum theory. However, unambiguous communication can be restored if we simply take account of the measuring instrument and say, “Einstein and the student have the same mathematical ability relative to this particular examination paper.” Mathematical ability, indeed any ability, is not an intrinsic property of the individual; rather, it’s a joint property of the individual and the measuring instrument.

Unambiguous communication

In short, ability isn’t a property of the person being measured; it’s a property of the interaction of the person with the measuring instrument. In all of Bohr’s writings one finds repeated reference to a profound conceptual equivalence between measurement in quantum theory and measurement in psychology. Bohr referred to this as “subject/object holism.” If this is accepted, then (as illustrated in the case of Einstein and the high school student) one cannot meaningfully divorce what is measured from the measuring instrument in psychology.

But a central tenet of Item Response Theory is that the measuring instrument and the measurement outcome must be viewed as entirely independent. This puts Item Response Theory at odds with Bohr’s teachings. Item Response Theory features prominently in the Knewton Platform, in Computer Adaptive Testing, and underpins the PISA league tables.
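
For the Rasch model in particular, this independence claim is usually given a precise form called “specific objectivity.”  Under it, the comparison of two persons is held not to depend on which item is used.  The minimal sketch below uses hypothetical abilities and difficulties.  It shows that, under the Rasch model, the difference in log-odds of success between two examinees equals θ1 − θ2 for every item difficulty b.  It illustrates the property the essay disputes; it is not offered as an argument for it, and the property does not hold in this exact form for two- or three-parameter models.

```python
import math

def rasch_p(theta: float, b: float) -> float:
    """Probability of success under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_odds(p: float) -> float:
    """Logit of a probability."""
    return math.log(p / (1.0 - p))

theta_1, theta_2 = 1.2, 0.4       # hypothetical examinees
for b in (-2.0, 0.0, 0.7, 3.0):   # hypothetical item difficulties
    diff = log_odds(rasch_p(theta_1, b)) - log_odds(rasch_p(theta_2, b))
    print(b, round(diff, 6))      # the difference is 0.8 for every b
```

The item difficulty b cancels from the comparison.  This is the formal sense in which the Rasch model treats the instrument as irrelevant to the comparison of persons.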

One cannot meaningfully divorce what is measured from the measuring instrument in psychology

Why Knewton cannot lay claim to a science of learning

Teaching by pointing is a commonplace classroom device. For example, young students are taught the general concept “colour” by observing the colour of objects pointed out by their teacher. In order that the students grasp the concept “colour,” the teacher might define a range of individual colours – green, red, blue and so on – before later revealing that these are all instances of the general concept “colour.” In this case learning might progress from particular concepts such as “red,” “green” and so on, to the more general concept of colour.

In teaching the concept “green,” the teacher might point to a sample square patch of green cloth and utter the word “green.” This gives rise to a “learning paradox” that dates back to Plato: how is the student to know that the teacher wants him or her to attend to the colour of the patch? Couldn’t the teacher, by pointing at the square patch of green cloth, be attempting to teach the concept “square” or “patch” or “cloth”? Clearly the teacher wants the student to attend to the colour of the square patch of cloth. But the student hasn’t yet “got” the general concept “colour.” It seems that before the student can be taught the sub-concept “green,” he or she must already have the wider concept “colour.”

The learning paradox now becomes clear. If the student does not have the concept “colour” he or she can’t learn the sub-concept “green” because the student can’t grasp that the teacher intends him or her to attend to the colour of the patch rather than its geometric shape or texture. On the other hand, if the student already has the concept “colour” he or she can’t be said to learn the sub-concept “green” from the teacher’s pointing, because the student has already grasped the concept “green,” since this colour is a sub-concept of the wider concept “colour.” It seems that no explanation can be offered for how students learn the simple concept “green.” But we all know from experience that students do learn individual colours by pointing.

Learning is a myth

I feel sure some readers will protest as follows: “You’ve got this wrong – when I taught my child the colour green, I pointed to a green door, a green toy, a green car, a green fence etc. etc.” I’m afraid the paradox still applies with full force. Kripke (1982), for example, devotes the greater part of his celebrated text on rule-following to the case of a student taught to add using hundreds of examples. When a concept such as addition is treated as a property of the student, the only way to explain the student’s correct answer to even a single novel addition problem is to attribute to the student an infinite “look-up list” of all possible addition problems complete with answers!
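Kripke’s own illustration makes the point concrete. He introduces a deviant function, “quus,” which agrees with addition whenever both numbers are below 57 but returns 5 otherwise. The short Python sketch below (the function names and the stock of examples are hypothetical, chosen only for illustration) shows that any finite set of worked examples below 57 fits both rules equally well, so the examples alone cannot settle which rule a student has grasped when faced with the novel problem 68 + 57.

```python
def plus(x: int, y: int) -> int:
    """Ordinary addition."""
    return x + y


def quus(x: int, y: int) -> int:
    """Kripke's deviant 'quus' function: identical to addition when both
    arguments are below 57, but returns 5 otherwise."""
    if x < 57 and y < 57:
        return x + y
    return 5


def agree_on(examples: list[tuple[int, int]]) -> bool:
    """True if plus and quus give the same answer on every worked example."""
    return all(plus(x, y) == quus(x, y) for x, y in examples)


# Hypothetical stock of worked examples: every sum whose arguments are below 57.
worked_examples = [(x, y) for x in range(57) for y in range(57)]

if __name__ == "__main__":
    print(agree_on(worked_examples))   # True: the examples cannot tell the rules apart
    print(plus(68, 57), quus(68, 57))  # 125 5: the rules diverge on the novel problem
```

Adding more examples does not help: quus can always be redefined with a larger threshold, which is why the finite training history underdetermines the rule.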

It follows that if concepts are treated as properties of individuals (as in the Knewton platform), learning and education must be regarded as phantasms: “the learning paradox may be viewed as shaking the foundations of educational thought by demonstrating that the supposed role of learning (and hence education) is an illusion” (Bereiter, 1985, p. 221). The central message of the Knewton science of learning would seem to be that learning itself is a myth.

Jose Ferreira CEO Knewton

One can escape the learning paradox entirely by adopting the reasoning of Niels Bohr and Ludwig Wittgenstein. According to their approach – which rejects the idea that concepts can be ascribed to individuals – the concept “fraction,” for example, is a property of interactions between the individual and the established mathematical practice of manipulating fractions. Human practices play a pivotal role in this escape from paradox and the arguments generalise to all concepts.

Avoiding the learning paradox

In classes all over the world, colour concepts are taught successfully by pointing, and students master addition through worked examples and classroom practice. Teachers needn’t worry about the learning paradox, nor adjust their teaching techniques to take account of it. Indeed, I suspect that the vast majority of teachers wouldn’t even recognise a version of the learning paradox in Donald Rumsfeld’s words: “There are known knowns, … .” Why is this? It is because teachers in everyday classrooms have access to an ingredient missing from Knewton’s science of adaptive learning, and that ingredient is human nature.

Wittgenstein

Wittgenstein’s point is only that, when such lessons do succeed, as they often do in real life, they draw on a resource that is presupposed by the verbal interpretations and definitions but not mentioned in them. That extra resource, to put it at first quite generally and vaguely, is human nature. (Pears, 2006, p. 19)

How can human nature rescue the situation? The distinguished American physicist David Mermin sets out Bohr’s case:

What does it mean for a property to be real? When you study an object how can you be sure you are learning something about the object itself, and not merely discovering some irrelevant feature of the instrument you used in your study? This is a question that has plagued generations of psychologists. When you measure IQ are you learning something about an inherent quality of a person called “intelligence,” or are you merely acquiring information about how a person responds to something you have fancifully called an IQ Test? Until the advent of the quantum theory in 1925 physicists were above such concerns. But since then, with the discovery that experiments at the atomic level necessarily disturb the object they measure, precisely such reservations have been built into the foundations of physics. (Mermin, 1993, p. 1)

Einstein & Bohr

This passage derives from a famous debate in the history of physics between the two greatest physicists of the twentieth century: Bohr and Einstein. The debate centred on a fundamental question, namely, “what is it for something to have a property?” The outcome favoured Bohr’s cautious position that the result of a quantum measurement is best thought of as the property of an interaction between the measuring tool and the entity measured. Bohr believed that this was also true for measurement in psychology. Wittgenstein’s philosophy arrives at precisely the same conclusion. Bohr taught that it was wrong to think that the task of physics was to find out how nature is, by which he meant that quantum physics cannot be tasked with investigating the intrinsic properties of electrons, photons, and so on. The cautious position to take is that we must settle for the interaction between the electron and the instrument designed by the experimental physicist. Quantum theory teaches that what is measured and the measuring tool form an “indivisible whole.” Heisenberg (1958, p. 58) writes:

This again emphasizes a subjective element in the description of atomic events, since the measuring tool has been constructed by the observer, and we have to remember that what we observe is not nature in itself but nature exposed to our method of questioning. Our scientific work in physics consists in asking questions about nature in the language that we possess and trying to get an answer from experiment by the means at our disposal. In this way quantum theory reminds us, as Bohr has put it, of the old wisdom that when searching for harmony in life one must never forget that in the drama of existence we are ourselves both players and spectators.

Returning to our analysis of the Knewton platform, consider a student who has demonstrated mastery of the concept “multiplication,” for example. One gets the clear impression from the information made available by Knewton that, for the Knewton scientists, the measuring tool is the multiplication question together with its associated scoring rubric. This is only true if one subscribes to a quasi-religious Platonism, and that option isn’t available to the Knewton scientist, since it rules out the possibility of learning new things. The calculation “20 times 20 = 400” is only true against a background mathematical practice of multiplication. Without this human background, the notions of “right” and “wrong” in mathematics lose their meaning. According to Bohr and Wittgenstein, it is incorrect to treat mastery of the multiplication concept as a property of the individual. Rather, one should define concept mastery as a joint property of the student and the mathematical practice.

Wittgenstein

And Wittgenstein’s contention is precisely that, with the demise of Platonism, there can be such a thing as adding correctly – such a thing as a determinate requirement imposed by the rules of addition – only within a framework of extensive institutional activity and agreement in the judgements which participation in those institutions involves us in making. The very existence of our concepts depends on such activity. (Wright, 2001, pp. 155-156).

Recall that the learning paradox arose when concepts like “green” and “colour” were ascribed to the individual. Here the paradox can be avoided because a concept is now the property of an interaction. In short, one escapes the learning paradox by treating what is measured and the measuring tool as an indivisible whole, the all-important measuring tool being founded in human customs and practices.

Conclusion

While there can be little doubt that Knewton has considerable value as another learning resource available to teachers and parents, its claim to be a “science” of adaptive learning has no validity. The Knewton software can lay no claim whatever to function according to a science of learning.

References

Bereiter, C. (1985). Towards a solution of the learning paradox. Review of Educational Research, 55(2), 210-226.
Heisenberg, W. (1958). Physics and philosophy. New York: Harper & Row.
Kripke, S.A. (1982). Wittgenstein on rules and private language. Oxford: Blackwell.
Mermin, N.D. (1993). Lecture given at the British Association Annual Science Festival. Keele: British Association for the Advancement of Science.
Oppenheimer, R. (1955, September 4). Analogy in science. Paper presented at the 63rd Annual Meeting of the American Psychological Association, San Francisco, CA.
Pears, D. (2006). Paradox and platitude in Wittgenstein’s philosophy. Oxford: Oxford University Press.
Stapp, H.P. (1993). Mind, matter and quantum mechanics. Berlin: Springer-Verlag.
Suter, R. (1989). Interpreting Wittgenstein: A cloud of philosophy, a drop of grammar. Philadelphia: Temple University Press.
Wright, C. (2001). Rails to infinity. Cambridge, MA: Harvard University Press.

Mr Gove’s major problem: Why Pisa ranks are wrong.

02 Sunday Feb 2014

Posted by paceni in Grammar Schools


Why PISA ranks are founded on a methodological thought disorder

When psychometricians claimed to be able to measure, they used the term ‘measurement’ not just for political reasons but also for commercial ones. … Those who support scientific research economically, socially and politically have a manifest interest in knowing that the scientists they support work to advance science, not subvert it. And those whose lives are affected by the application of what are claimed to be ‘scientific findings’ also have an interest in knowing that these ‘findings’ have been seriously investigated and are supported by evidence. (Michell, 2000, p. 660)

This essay is a response to the claim by the Department of Education that: “The OECD is at the forefront of the academic debate regarding item response theory [and] the OECD is using what is acknowledged as the best available methodology [for international comparison studies].”

Item Response Theory plays a pivotal role in the methodology of the PISA international league table. This essay refutes the claim that item response theory is a settled, well-reasoned approach to educational measurement. It may well be settled amongst quantitative psychologists, but I doubt whether there is a natural scientist on the planet who would accept that one can measure mental attributes in a manner which is independent of the measuring instrument (a central claim of item response theory). It will be argued below that psychology’s approach to the twin notions of “quantity” and “measurement” has been controversial (and entirely erroneous) since its earliest days. It will be claimed that the item response methodology, in effect, misuses the two fundamental concepts of quantity and measurement by re-defining them for its own purposes. In fact, the case will be made that PISA ranks are founded on a “methodological thought disorder” (Michell, 1997).

Given the concerns of such a distinguished statistician as Professor David Spiegelhalter, the Department of Education’s continued endorsement of PISA is difficult to understand. This essay extends the critique of PISA and item response theory beyond the concerns of Spiegelhalter to the very data from which the statistics are generated. Frederick Lord (1980, pp. 227-228), the father of modern psychological measurement, warned psychologists that when applied to the individual test-taker, item response theory produces “absurd” and “paradoxical” results. Given that Lord is one of the architects of item response theory, it is surprising that this admission provoked little or no debate among quantitative psychologists. Are politicians and the general public aware that item response theory breaks down when applied to the individual?

In order to protect the item response model from damaging criticism, Lord proposed what physicists call a “hidden variables” ensemble model when interpreting the role probability plays in item response theory. As a consequence, item response models are deterministic and draw on Newtonian measurement principles. “Ability” is construed as a measurement-independent “state” of the individual which is the source of the responses made to test items (Borsboom, Mellenbergh, & van Heerden, 2003). Furthermore, item response theory is incapable of taking account of the fact that the psychologist participates in what he or she observes. Richardson (1999) writes: “[W]e find that the IQ-testing movement is not merely describing properties of people: rather, the IQ test has largely created them” (p. 40). The participative nature of psychological enquiry renders the objective Newtonian model inappropriate for psychological measurement. This prompted Robert Oppenheimer, in his address to the American Psychological Association, to caution: “[I]t seems to me that the worst of all possible misunderstandings would be that psychology be influenced to model itself after a physics which is not there anymore, which has been quite outdated.”

Unlike psychology, Newtonian measurement has very precise definitions of “quantity” and “measurement” which item response theorists simply ignore. This can have only one interpretation, namely, that the numerals PISA attaches to the education systems of countries aren’t quantities, and that PISA doesn’t therefore “measure” anything, in the everyday sense of that word. I have argued elsewhere that item response theory can escape these criticisms by adopting a quantum theoretical model (in which the notions of “quantity” and “measurement” lose much of their classical transparency). However, that would involve rejecting one of the central tenets of item response theory, namely, the independence of what is measured from the measuring instrument. Item response theory has no route out of its conceptual difficulties.

This represents a conundrum for the Department of Education. In endorsing PISA, the Department is, in effect, supporting a methodology designed to identify shortcomings in the mathematical attainment of pupils, when that methodology itself has serious mathematical shortcomings.

Modern item response theory is founded on a definition of measurement promulgated by Stanley Stevens and addressed in detail below. By this means, Stevens (1958, p. 384) simply pronounced psychology a quantitative science which supported measurement, ignoring established practice elsewhere in the natural sciences. Psychology refused to confront Kant’s view that psychology couldn’t be a science because mental predicates couldn’t be quantified. Wittgenstein’s (1953, p. 232) scathing critique had no impact on quantitative psychology: “The confusion and barrenness of psychology is not to be explained by calling it a ‘young science’; its state is not comparable with that of physics, for instance, in its beginnings. … For in psychology there are experimental methods and conceptual confusion. … The existence of the experimental method makes us think we have the means of solving the problems which trouble us; though problem and method pass one another by.”

Howard Gardner (2005, p. 86), the prominent Harvard psychologist, looks back in despair to the father of psychology itself, William James:

On his better days William James was a determined optimist, but he harboured his doubts about psychology. He once declared, “There is no such thing as a science of psychology,” and added “the whole present generation (of psychologists) is predestined to become unreadable old medieval lumber, as soon as the first genuine insights are made.” I have indicated my belief that, a century later, James’s less optimistic vision has materialised and that it may be time to bury scientific psychology, at least as a single coherent undertaking.

In a follow-up paper to this essay, I will demonstrate an alternative approach which solves the measurement problem as Stevens presents it, but in a manner which is perfectly in accord with contemporary thinking in the natural sciences. None of the seemingly intractable problems which attend item response theory trouble my account of measurement in psychology.

However, my solution renders item response theory conceptually incoherent.

In passing it should be noted that some have sought to conflate my analysis with that of Svend Kreiner, suggesting that my concerns would be assuaged if only PISA could design items which measured equally from country to country. Nothing could be further from the truth; no adjustment in item properties can repair PISA or item response theory. No modification of the item response model would address its conceptual difficulties.

The essay draws heavily on the research of Joel Michell (1990, 1997, 1999, 2000, 2008) who has catalogued, with great care, the troubled history of the twin notions of quantity and measurement in psychology. The following extracts from his writings, in which he accuses quantitative psychologists of subverting science, counter the assertion that item response theory is an appropriate methodology for international comparisons of school systems.

From the early 1900s psychologists have attempted to establish their discipline as a quantitative science. In proposing quantitative theories they adopted their own special definition of measurement and treated attributes such as cognitive abilities, personality traits and sensory intensities as though they were quantities of the type encountered in the natural sciences. Alas, Michell (1997) presents a carefully reasoned argument that psychological attributes lack additivity and therefore cannot be quantities in the same way as the attributes of Newtonian physics. Consequently he concludes: “These observations confirm that psychology, as a discipline, has its own definition of measurement, a definition quite unlike the traditional concept used in the physical sciences” (p. 360).

Boring (1929) points out that the pioneers of psychology quickly came to realise that if psychology was not a quantitative discipline which facilitated measurement, psychologists could not adopt the epithet “scientist,” for “there would … have been little of the breath of science in the experimental body, for we hardly recognise a subject as scientific if measurement is not one of its tools” (Michell, 1990, p. 7).

The general definition of measurement accepted by most quantitative psychologists is that formulated by Stevens (1946) which states: “Measurement is the assignment of numerals to objects or events according to rules” (Michell, 1997, p. 360). It seems that psychologists assign numbers to attributes according to some pre-determined rule and do not consider the necessity of justifying the measurement procedures used so long as the rule is followed. This rather vague definition distances measurement in psychology from measurement in the natural sciences. Its near universal acceptance within psychology and the reluctance of psychologists to confirm (via empirical study) the quantitative character of their attributes casts a shadow over all quantitative work in psychology. Michell (1997, p. 361) sees far-reaching implications for psychology:

If a quantitative scientist (i) believes that measurement consists entirely in making numerical assignments to things according to some rule and (ii) ignores the fact that the measurability of an attribute presumes the contingent … hypothesis that the relevant attribute possesses an additive structure, then that scientist would be predisposed to believe that the invention of appropriate numerical assignment procedures alone produces scientific measurement.

Historically, Fechner (1860) – who coined the word “psychophysics” – is recognised as the father of quantitative psychology. He considered that the only creditworthy contribution psychology could make to science was through quantitative approaches and he believed that reality was “fundamentally quantitative.” His work focused on the instrumental procedures of measurement and dismissed any requirement to clarify the quantitative nature of the attribute under consideration.

His understanding of the logic of measurement was fundamentally flawed in that he merely presumed (under some Pythagorean imperative) that his psychological attributes were quantities. Michell (1997) contends that although occasional criticisms were levelled against quantitative measurement in psychology, in general the approach was not questioned and became part of the methodology of the discipline. Psychologists simply assumed that when the study of an attribute generated numbers, that attribute was being measured.

The first official detailed investigation of the validity of psychological measurement from beyond its professional ranks was conducted – under the auspices of the British Association for the Advancement of Science – by the Ferguson Committee in 1932. The non-psychologists on the committee concluded that there was no evidence to suggest that psychological methods measured anything, as the additivity of psychological attributes had not been demonstrated. Psychology moved to protect its place in the academy at all costs. Rather than admitting the error identified by the committee and going back to the drawing board, psychologists sought to defend their modus operandi by attempting a redefinition of psychological measurement. Stevens’ (1958, p. 384) definition that measurement involved “attaching numbers to things” legitimised the measurement practices of psychologists who subsequently were freed from the need to test the quantitative structure of psychological predicates.

Michell (1997, p. 356) declares that presently many psychological researchers are “ignorant with respect to the methods they use.” This ignorance permeates the logic of their methodological practices in terms of their understanding of the rationale behind the measurement techniques used. The immutable outcome of this new approach to measurement within psychology is that the natural sciences and psychology have quite different definitions of measurement.

Michell (1997, p. 374) believes that psychology’s failure to face facts constitutes a “methodological thought disorder” which he defines as “the sustained failure to see things as they are under conditions where the relevant facts are evident.” He points to the influence of an ideological support structure within the discipline which serves to maintain this idiosyncratic approach to measurement. He asserts that in the light of commonly available evidence, interested empirical psychologists recognise that “Stevens’ definition of measurement is nonsense and the neglect of quantitative structure a serious omission” (Michell, 1997, p. 376).

Despite the writings of Ross (1964) and Rozeboom (1966), for example, Stevens’ definition has been generally accepted as it facilitates psychological measurement by an easily attainable route. Michell (1997, p. 395) describes psychology’s approach to measurement as “at best speculation and, at worst, a pretence at science.”

[W]e are dealing with a case of thought disorder, rather than one of simple ignorance or error and, in this instance, these states are sustained systemically by the almost universal adherence to Stevens’ definition and the almost total neglect of any other in the relevant methodology textbooks and courses offered to students. The conclusion that follows from this history, especially that of the last five decades, is that systemic structures within psychology prevent the vast majority of quantitative psychologists from seeing the true nature of scientific measurement, in particular the empirical conditions necessary for measurement. As a consequence, number-generating procedures are consistently thought of as measurement procedures in the absence of any evidence that the relevant psychological attributes are quantitative. Hence, within modern psychology a situation exists which is accurately described as systemically sustained methodological thought disorder. (Michell, 1997, p. 376)

To make my case, let me first make two fundamental points which should shock those who believe that the OECD is using what is acknowledged as the best available methodology for international comparisons. Both of these points should concern the general public and those who support the OECD’s work. First, the numerals that PISA publishes are not quantities, and second, PISA tables do not measure anything.

To illustrate the degree of freedom afforded to psychological “measurement” by Stevens it is instructive to focus on the numerals in the PISA table. Could any reasonable person believe in a methodology which claims to summarise the educational system of the United States or China in a single number? Where is the empirical evidence for this claim? Three numbers are required to specify even the position of a single dot produced by a pencil on one line of one page of one of the notebooks in the schoolbag of one of the thousands of American children tested by PISA. The Nobel laureate Sir Peter Medawar refers to such claims as “unnatural science.” Medawar (1982, p. 10) questions such representations using Philip’s (1974) work on the physics of a particle of soil:

The physical properties and field behaviour of soil depend on particle size and shape, porosity, hydrogen ion concentration, microbial flora, and water content and hygroscopy. No single figure can embody itself in a constellation of values of all these variables in any single real instance … psychologists would nevertheless like us to believe that such considerations as these do not apply to them.

Quantitative psychology, since its inception, has modelled itself on the certainty and objectivity of Newtonian mechanics. The numerals of the PISA tables appear to the man or woman in the street to have all the precision of measurements of length or weight in classical physics. But, by Newtonian standards, psychological measurement in general, and item response theory in particular, simply have no quantities, and do not “measure,” as that word is normally understood.

How can this audacious claim to “measure” the quality of a continent’s education provision and report it in a single number be justified? The answer, as has already been pointed out, is to be found in the fact that quantitative psychology has its own unique definition of measurement, which is that “measurement is the business of pinning numbers on things” (Stevens, 1958, p. 384). With such an all-encompassing definition of measurement, PISA can justify just about any rank order of countries. But this isn’t measurement as that word is normally understood.

This laissez-faire attitude wasn’t always the case in psychology. It is clear that, as far back as 1905, psychologists like Titchener recognised that their discipline would have to embrace the established definition of measurement in the natural sciences: “When we measure in any department of natural science, we compare a given measurement with some conventional unit of the same kind, and determine how many times the unit is contained in the magnitude” (Titchener, 1905, p. xix). Michell (1999) makes a compelling case that psychology adopted Stevens’ ultimately meaningless definition of measurement – “according to Stevens’ definition, every psychological attribute is measurable” (Michell, 1999, p. 19) – because psychologists feared that their discipline would be dismissed by the “hard” sciences without the twin notions of quantity and measurement.

The historical record shows that the profession of psychology derived economic and other social advantages from employing the rhetoric of measurement in promoting its services and that the science of psychology, likewise, benefited from supporting the profession in this by endorsing the measurability thesis and Stevens’ definition. These endorsements happened despite the fact that the issue of the measurability of psychological attributes was rarely investigated scientifically and never resolved. (Michell, 1999, p. 192)

The mathematical symbolism of Hölder’s axioms makes clear the contrast between the complete absence of rigorous measurement criteria in psychology and the onerous demands placed on the classical physicist.

Hölder’s Axioms
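For reference, Hölder’s seven conditions on a quantitative attribute Q, whose magnitudes are written a, b, c, …, can be paraphrased as follows. This is a summary in the form usually given in Michell’s writings, not a verbatim reproduction of Hölder’s own statement; the symbols Φ, Ψ and ξ are chosen here for illustration.

```latex
\begin{enumerate}
  \item For any $a, b \in Q$, exactly one of $a = b$, $a > b$, $b > a$ holds.
  \item For every $a \in Q$ there exists $b \in Q$ such that $b < a$.
  \item For all $a, b \in Q$ there exists $c \in Q$ such that $a + b = c$.
  \item For all $a, b \in Q$, $a + b > a$ and $a + b > b$.
  \item For all $a, b \in Q$, if $a < b$ then there exist $x, y \in Q$
        such that $a + x = b$ and $y + a = b$.
  \item For all $a, b, c \in Q$, $a + (b + c) = (a + b) + c$.
  \item If $Q$ is divided into two non-empty classes $\Phi$ and $\Psi$ such that
        every member of $\Phi$ is less than every member of $\Psi$, then there
        exists $\xi \in Q$ such that every $x < \xi$ belongs to $\Phi$ and
        every $y > \xi$ belongs to $\Psi$.
\end{enumerate}
```

Axioms 3 to 6 all involve the addition of magnitudes, which is the additive structure that, on Michell’s argument, has never been empirically demonstrated for psychological attributes such as “ability.”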

An essential step in establishing the validity of the concepts “quantity” and “measurement” in item response theory is an empirical analysis centred on Hölder’s conditions. The reader will search in vain for evidence that quantitative psychologists in general, and item response theorists in particular, subject the predicate “ability” to Hölder’s conditions.

This is because the definition of measurement in psychology is so vague that it frees psychologists of any need to address Hölder’s conditions and permits them, without further ado, to simply accept that the predicates they purport to measure are quantifiable.

Quantitative psychology presumed that the psychological attributes which they aspired to measure were quantitative. … Quantitative attributes are attributes having a quite specific structure. The issue of whether psychological attributes have that sort of structure is an empirical issue … Despite this, mainstream quantitative psychologists … not only neglected to investigate this issue, they presumed that psychological attributes are quantitative, as if no empirical issue were at stake. This way of doing quantitative psychology, begun by its founder, Gustav Theodor Fechner, was followed almost universally throughout the discipline and still dominates it. … [I]t involved a defective definition of a fundamental methodological concept, that of measurement. … Its understanding of the concept of measurement is clearly mistaken because it ignores the fact that only quantitative attributes are measurable. Because this … has persisted within psychology now for more than half a century, this tissue of errors is of special interest. (Michell, 1999, pp. xi – xii)

This essay has sought to challenge the Department of Education’s claim that in founding its methodology on item response theory, PISA is using the best available methodology to rank order countries according to their education provision. As Sir Peter Medawar makes clear, any methodology which claims to capture the quality of a country’s entire education system in a single number is bound to be suspect. If my analysis is correct PISA is engaged in rank-ordering countries according to the mathematical achievements of their young people, using a methodology which itself has little or no mathematical merit.

Item response theorists have identified two broad interpretations of probability in their models: the “stochastic subject” and “repeated sampling” interpretations. Lord has demonstrated that the former leads to absurd and paradoxical results without ever investigating why this should be the case. Had such an investigation been initiated, quantitative psychologists would have been confronted with the profound question of the very role probability plays in psychological measurement. Following a pattern of behaviour all too familiar from Michell’s writings, psychologists simply buried their heads in the sand and, at Lord’s urging, set the stochastic subject interpretation aside and emphasised the repeated sampling approach.
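To make the two readings concrete, the Python sketch below assumes the simplest IRT model, the Rasch (one-parameter logistic) model, in which the probability of a correct response is P = 1/(1 + exp(-(θ - b))) for ability θ and item difficulty b. The function names and parameter values are hypothetical, and PISA’s operational scaling is considerably more elaborate. Under the repeated-sampling reading, P is the proportion of correct responses among many examinees who share the same θ, which is what the simulation checks; under the stochastic-subject reading, the same number would instead be a propensity belonging to a single individual.

```python
import math
import random


def rasch_probability(theta: float, b: float) -> float:
    """Rasch (1PL) item response function: P(correct | ability theta, difficulty b)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def proportion_correct(theta: float, b: float, n_examinees: int, seed: int = 0) -> float:
    """Repeated-sampling reading: simulate n examinees who all have ability theta
    answering one item of difficulty b, and return the observed proportion correct."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    p = rasch_probability(theta, b)
    correct = sum(1 for _ in range(n_examinees) if rng.random() < p)
    return correct / n_examinees


if __name__ == "__main__":
    theta, b = 0.5, 0.0  # hypothetical ability and item difficulty
    print(f"model probability:    {rasch_probability(theta, b):.4f}")
    print(f"simulated proportion: {proportion_correct(theta, b, 100_000):.4f}")
```

Nothing in the simulation distinguishes the two readings; the model yields the same number either way, which is why the choice between them is an interpretive rather than a computational matter.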

In this way the constitutive nature of irreducible uncertainty in psychology was eschewed for the objectivity of Newtonian physics. This is reflected in item response theory’s “local hidden variables” ensemble model in which ability is an intrinsic measurement-independent property of the individual and measurement is construed as a process of merely checking up on what pre-exists measurement. For this to be justified, Hölder’s seven axioms must apply.

In order to justify the labels “quantity” and “measurement”, PISA must produce empirical evidence that the attribute it claims to measure satisfies the Hölder axioms. Absent such evidence, it seems very difficult to justify the Department of Education’s claims that (i) “the OECD is at the forefront of the academic debate regarding item response theory,” and (ii) “the OECD is using what is acknowledged as the best available methodology [for international comparison studies].”
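What a first empirical check might look like can be sketched with the single cancellation (independence) condition of additive conjoint measurement, a framework Michell (1990) discusses as a route to testing quantitative structure in psychology. This swaps in conjoint measurement for Hölder’s axioms themselves, and single cancellation is only a necessary condition: double cancellation and further conditions are also required. The sketch assumes a table of proportions correct, with rows for ability groups and columns for items; the helper names and toy tables are hypothetical, not PISA data.

```python
from itertools import combinations
from typing import List, Sequence


def rows_consistently_ordered(table: Sequence[Sequence[float]]) -> bool:
    """Return True if no pair of rows is above in one column and below in
    another (ties are ignored)."""
    for i, j in combinations(range(len(table)), 2):
        # Sign of (row i - row j) in each column: +1, -1, or 0 for a tie.
        signs = {(a > b) - (a < b) for a, b in zip(table[i], table[j])}
        signs.discard(0)
        if len(signs) > 1:  # both +1 and -1 occur, so the order flips
            return False
    return True


def single_cancellation_holds(table: Sequence[Sequence[float]]) -> bool:
    """Single cancellation: rows are ordered the same way in every column,
    and columns are ordered the same way in every row."""
    columns: List[List[float]] = [list(col) for col in zip(*table)]
    return rows_consistently_ordered(table) and rows_consistently_ordered(columns)


if __name__ == "__main__":
    # Toy tables (hypothetical): rows = ability groups, columns = items.
    ordered = [[0.2, 0.4, 0.6], [0.3, 0.5, 0.7], [0.5, 0.7, 0.9]]
    crossing = [[0.2, 0.6], [0.3, 0.5]]
    print(single_cancellation_holds(ordered))   # True
    print(single_cancellation_holds(crossing))  # False
```

Passing such a check would not by itself establish that an attribute is quantitative; failing it would count against additive structure.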

Dr Hugh Morrison

(drhmorrison@gmail.com)

References

Boring, E.G. (1929). A history of experimental psychology. New York: Century.

Borsboom, D., Mellenbergh, G.J., & van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110(2), 203-219.

Fechner, G.T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf & Härtel. (English translation by H.E. Adler, Elements of psychophysics, vol. 1, D.H. Howes & E.G. Boring (Eds.). New York: Holt, Rinehart & Winston.)

Gardner, H. (2005). Scientific psychology: Should we bury it or praise it? In R.J. Sternberg (Ed.), Unity in psychology (pp. 77-90). Washington, DC: American Psychological Association.

Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.

Medawar, P.B. (1982). Pluto’s republic. Oxford: Oxford University Press.

Michell, J. (1990). An introduction to the logic of psychological measurement. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.

Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of Psychology, 88, 353-385.

Michell, J. (1999). Measurement in psychology: A critical history of a methodological concept. Cambridge: Cambridge University Press.

Michell, J. (2000). Normal science, pathological science and psychometrics. Theory and Psychology, 10, 639-667.

Michell, J. (2008). Is psychometrics pathological science? Measurement: Interdisciplinary Research and Perspectives, 6, 7-24.

Oppenheimer, R. (1956). Analogy in science. American Psychologist, 11, 127-135.

Philip, J.R. (1974). Fifty years progress in soil physics. Geoderma, 12, 265-280.

Richardson, K. (1999). The making of intelligence. London: Weidenfeld & Nicolson.

Ross, S. (1964). Logical foundations of psychological measurement. Copenhagen: Munksgaard.

Rozeboom, W.W. (1966). Scaling theory and the nature of measurement. Synthese, 16, 170-223.

Stevens, S.S. (1946). On the theory of scales of measurement. Science, 103, 667-680.

Stevens, S.S. (1958). Measurement and man. Science, 127, 383-389.

Titchener, E.B. (1905). Experimental psychology: A manual of laboratory practice, vol. 2. London: Macmillan.

Wittgenstein, L. (1953). Philosophical Investigations. Oxford: Blackwell.


Northern Ireland Education Minister develops a bad Rasch

02 Sunday Jun 2013

Posted by paceni in Grammar Schools


Tags

Andreas Schleicher, AQO 4167/11-15, Belfast Newsletter, Classical Test Theory, Item Response Theory, Jim Allister, John O'Dowd, Michael Davidson OECD PISA, Michael Gove, Northern Ireland Assembly questions, OECD, PISA, Professor Ray Adams, Rasch Model, Sinn Fein Education Minister, Times Educational Supplement, William Stewart

O'Dowd PISA Rasch Model

Here is the Minister of Education’s reply. Note that the MLA who asked the question, Jim Allister (AQO 4167/11-15), was afforded no opportunity to respond with a supplementary question. The Minister merely read a response into the record, treating the question in the same manner as he had treated AQW 22049/11-15 of 22 April 2013. The Speaker then called time, raising concerns about the proper use of Assembly procedure.

O'Dowd Oral Question on PISA answer

