Why OECD Pisa cannot be rescued by ETS



PISA cannot be rescued by switching to a different IRT model, because all IRT modelling is flawed.


Dr Hugh Morrison (The Queen’s University of Belfast [retired])



On page 33 of the Times Educational Supplement of Friday 25th November 2016, Andreas Schleicher, who oversees PISA, appears to accept my analysis of the shortcomings of the Rasch model, which plays a central role in PISA’s league table.  The Rasch model is a “one parameter” Item Response Theory (IRT) model, and Schleicher argues that PISA’s conceptual difficulties can be resolved by abandoning the Rasch model for a two or three parameter model.  However, my criticisms apply to all IRT models, irrespective of the number of parameters.  In this essay I will set out the reasoning behind this claim.
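For readers meeting the terminology for the first time, the "parameters" describe each test item, not the examinee. The sketch below assumes the common logistic form of the item response function, without the 1.7 scaling constant some texts include: b is item difficulty, a is item discrimination and c is a lower asymptote (often called a "guessing" parameter). The function name irt_prob is hypothetical, and this is not PISA's operational scaling code. In all three variants the examinee enters only as the single number theta.

```python
import math


def irt_prob(theta: float, b: float, a: float = 1.0, c: float = 0.0) -> float:
    """Probability of a correct response under a logistic IRT model (illustrative).

    theta: the examinee's ability, a single real number
    b:     item difficulty
    a:     item discrimination (a = 1 in the one-parameter/Rasch case)
    c:     lower asymptote or "guessing" level (c = 0 in the 1PL and 2PL cases)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    for theta in (-2.0, 0.0, 2.0):
        one_pl = irt_prob(theta, b=0.5)                   # one parameter: difficulty only
        two_pl = irt_prob(theta, b=0.5, a=1.8)            # adds discrimination
        three_pl = irt_prob(theta, b=0.5, a=1.8, c=0.2)   # adds a guessing floor
        print(f"theta={theta:+.1f}  1PL={one_pl:.3f}  2PL={two_pl:.3f}  3PL={three_pl:.3f}")
```

Adding a or c changes the shape of each item's curve; it does not change how ability itself is represented.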


One can find the source of IRT’s difficulty in Niels Bohr’s 1949 paper entitled Discussion with Einstein on Epistemological Problems in Atomic Physics.  Few scientists have made a greater contribution to the study of measurement than the Nobel Laureate and founding father of quantum theory, Niels Bohr.  Given Bohr’s preoccupation with what the scientist can say about aspects of reality that are not visible (electrons, photons, and so on), one can understand his constant references to measurement in psychology.  “Ability” cannot be seen directly; rather, like the microentities that manifest as tracks in particle accelerators, ability manifests in the examinee’s responses to test items.  IRT is concerned with “measuring” something which the measurer cannot experience directly, namely, the ability of the examinee.


IRT relies on a simple inner/outer picture for its models to function.  In IRT the inner (a realm of timeless, unobserved latent variables, or abilities) is treated as independent of the outer (where examinees write or speak responses at particular moments in time).  This is often referred to as a “reservoir” model in which timeless abilities are treated as the source of the responses given at specific moments in time.


As early as 1929 Bohr rejected this simplistic thinking in strikingly general terms: “Strictly speaking, the conscious analysis of any concept stands in a relation of exclusion to its immediate application.  The necessity of taking recourse to a complementary … mode of description is perhaps most familiar to us from psychological problems.”  Now what did Bohr mean by these words?  Consider, for example, the concept “quadratic.”  It is tempting to adopt a reservoir approach and trace a pupil’s ability to apply that concept in accord with established mathematical practice to his or her having the formula in mind.  The guidance offered by the formula in mind (Bohr’s reference to “conscious analysis”) accounts for the successful “application,” for example, to the solution of specific items on an algebra test.


However, this temptingly simplistic model, in which the formula is in the unobserved mental realm and written or spoken applications of the concept “quadratic” take place in the observed realm, contains a fundamental flaw: the two realms cannot be meaningfully connected.  The “inner” formula (in one realm) gets its guidance properties from human practices (in the other realm).  A formula as a thing-in-itself cannot guide; one has to be trained in the established practice of using the formula before it has guidance properties.  In school mathematics examinations around the world, pupils are routinely issued with a page of formulae relevant to the examination.  Alas, it is the experience of mathematics teachers everywhere that simply having access to the formula as a thing-in-itself offers little or no guidance to the inadequately trained pupil.  The formula located in one realm cannot connect with the applications in the other.


Wittgenstein teaches that no formula, rule, principle, etc. in itself can ever determine a course of action.  The timeless mathematical formula in isolation cannot generate all the complexities of a practice (something which evolves in time); rather, as Michael Oakeshott puts it, a formula is a mere “abridgement” of the practice – the practice is primary, with the formula, rule, precept etc. deriving its “life” from the practice.


Returning to Bohr’s writing, it is instructive to explain his use of the word “complementarity” in respect of psychology and to explain the meaning of the words: “stands in a relation of exclusion.”  Complementarity was the most important concept Bohr bequeathed to physics.  It involves a combination of two mutually exclusive facets.  In order to see its relevance to the validity of IRT modelling, let’s return to the two distinct realms.


We think of the answers to a quadratic equation as being right or wrong (a typical school-level quadratic equation has two distinct answers).  In the realm of application this is indeed the case.  When the examinee is measured, his or her response is pronounced right or wrong dependent upon its relation to established mathematical practice.  However, in the unobserved realm, populated by rules, formulae and precepts (as things-in-themselves), any answer to a quadratic equation is simultaneously right and wrong!


A formula as a thing-in-itself cannot separate what accords with it from what conflicts with it, because there will always exist an interpretation of the formula for which a particular answer is correct, and another interpretation for which the same answer can be shown to conflict with the formula.  Divorced from human practices, the distinction between right and wrong collapses.  (This is a direct consequence of Wittgenstein’s celebrated “private language” argument.)  This explains Bohr’s reference to a “relation of exclusion.”  In simplistic terms, in the unobserved realm, where answers are compared with the formula for solving quadratics, responses are right-and-wrong, while in the observed realm, where answers are compared with the established practice, responses are right-or-wrong.


On this reading, ability has two mutually exclusive facets which cannot meaningfully be separated.  The distinguished Wittgenstein scholar, Peter Hacker, captures this situation as follows: “grasping an explanation of meaning and knowing how to use the word explained are not two independent abilities but two facets of one and the same ability.”  Ability, construed according to Bohr’s complementarity, is indefinite when unobserved and definite when observed.  Moreover, this definite measure is not an intrinsic property of the examinee, but a property of the examinee’s interaction with the measuring tool.


Measurement of ability is not a matter of passively checking up on what already exists – a central tenet of IRT.  Bohr teaches that the measurer effects a radical change from indefinite to definite.  Pace IRT, measurers, in effect, participate in what is measured.  No item response model can accommodate the “jump” from indefinite to definite occasioned by the measurement process.  All IRT models mistakenly treat unmeasured ability as identical to measured ability.  What scientific evidence could possibly be adduced in support of that claim?  No IRT model can represent ability’s two facets because all IRT models report ability as a single real number, construed as an intrinsic property of the measured individual.
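The claim that every IRT model reports ability as a single real number can be made concrete. The following is a minimal sketch, assuming the one-parameter (Rasch) model and hypothetical item difficulties; the names rasch_prob and estimate_ability are illustrative, and this is not PISA's own estimation procedure. It finds the maximum-likelihood ability estimate by Newton-Raphson and returns one float.

```python
import math
from typing import Sequence


def rasch_prob(theta: float, b: float) -> float:
    """Rasch (one-parameter) probability of a correct response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_ability(responses: Sequence[int], difficulties: Sequence[float],
                     tol: float = 1e-10, max_iter: int = 200) -> float:
    """Maximum-likelihood ability estimate under the Rasch model.

    Uses Newton-Raphson with each step clipped to at most 1 logit.
    The return value, a single float, is the model's entire summary of the examinee.
    """
    if len(responses) != len(difficulties):
        raise ValueError("need one difficulty per response")
    score = sum(responses)
    if score == 0 or score == len(responses):
        # Zero and perfect scores have no finite maximum-likelihood estimate.
        raise ValueError("no finite estimate for a zero or perfect score")
    theta = 0.0
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        gradient = score - sum(probs)                     # first derivative of the log-likelihood
        information = sum(p * (1.0 - p) for p in probs)   # minus the second derivative
        step = max(-1.0, min(1.0, gradient / information))
        theta += step
        if abs(step) < tol:
            break
    return theta


if __name__ == "__main__":
    items = [-1.0, -0.5, 0.0, 0.5, 1.0]   # hypothetical item difficulties (logits)
    print(estimate_ability([1, 1, 0, 1, 0], items))
```

Because the raw score is a sufficient statistic under the Rasch model, any two response patterns with the same number correct on the same items receive the same estimate; either way, the whole pattern is summarised in one number.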




The problem of Social Mobility explained




While the mainstream media offer endless analysis and political party talking heads pontificate on the issue of grammar schools and social mobility, the explanation for any reduction in social mobility is made clear by the actions of these Members of Parliament.

It is not the grammar schools which are responsible for restricting social mobility, but those influential people who had the benefit of receiving private, independent schooling or attended a grammar school, and who then deny to others something that improved their own social mobility.

All those illustrated in this post would benefit from reading a copy of Adam Swift’s book, How Not to Be a Hypocrite.







Stroud MP Neil Carmichael, Conservative chairman of the Education Select Committee, told Radio Four’s Westminster Hour:

We have serious issues about social mobility, in particular white working-class young people, and I don’t think that having more grammar schools is going to help them


Neil Carmichael boarded at St Peter’s, an independent school in York that dates back to AD627 and includes among its alumni Guy Fawkes, cricketer Jonny Bairstow and actor Greg Wise. Today to send your son to board would cost £27,375 a year.

Neil Carmichael’s Wikipedia page makes reference to him being a hypocrite.




John Pugh, the Liberal Democrat education spokesperson, condemns grammars. Pugh attended Prescot and Maidstone grammar schools, and taught in the independent sector at Merchant Taylors’ Boys’ School.





Jeremy Corbyn MP, leader of the Labour Party was educated at Castle House Independent Preparatory School and Adams Grammar School.

In 1999, the MP split from his second wife over a disagreement about their son’s education. His wife, Claudia Bracchitta, explained:

We had to make the right decision in the interests of our child. We would have been less than human if we had done anything else.



Sir Michael Wilshaw attended Clapham College Grammar School.

The Ofsted chief recently made a plea to Theresa May, the Prime Minister, to stop grammar schools. He told Nick Ferrari on LBC, Leading Britain’s Conversation:

We need more than the top 10 or 20% of youngsters to do well in our economy and in our society.

Sir Michael Wilshaw has conveniently ignored the Northern Ireland education system which is entirely selective.  Northern Ireland leads the UK in performance in GCSE and A-level examinations and has only one private post-primary school.

Update 13th September via Guido Fawkes


Polly Toynbee today attacks Theresa May’s grammar school plans, arguing that segregation by social class is “irrational” and claiming grammars add to “splits and divisions” in society. She has some front. Polly herself failed her 11-plus and attended the independent Badminton School. Earning £110,000-a-year at the Guardian meant she was able to send two of her children to private school as well. Today Toynbee writes that “inequality is monstrously unfair… it means birth is almost always social destiny”. Some children are evidently more equal than others. Is there a bigger hypocrite in the grammar schools debate?

The DUP have failed to protect UK parity on academic standards





Despite the power to do so the DUP Education Minister, Peter Weir, has failed to address his predecessor’s break with United Kingdom parity in respect of academic standards.

Had he acted immediately, instead of buying time for the Northern Ireland Executive, Mr Weir could have adopted the United Kingdom model both in respect of the grading scale for examinations and in respect of longstanding concerns regarding coursework or so-called “controlled assessment.”

It appears that the DUP’s Yes Minister equivalent of Jim Hacker has been an easy victim of the green Blob’s civil servants in Rathgael House. The green Blob is Northern Ireland’s devolved version of the UK education establishment.


The text of the News Letter lead letter

Are GCSE and GCE exam results between GB and N. Ireland comparable?

The answer regrettably, for the moment, is that it is too early to tell.

The general public must be careful not to assume that Peter Weir, by overturning the effective monopoly John O’Dowd granted CCEA over GCSE and GCE assessment in Northern Ireland, has done anything other than tinker at the edges of the problem bequeathed him by Sinn Fein. He seems to be a ‘Yes Minister’ captured by the green Blob, the entrenched education establishment.

John O’Dowd’s break with UK parity in respect of academic standards goes beyond his expulsion of two of the largest UK awarding bodies, and presents huge technical difficulties in respect of standards.  These could have been solved at a stroke had Peter Weir responded positively by cutting this Gordian knot and adopting the UK model both in respect of its grading scale and its concerns regarding coursework or so-called “controlled assessment.”

This action would have allowed Peter Weir to significantly scale down CCEA’s GCSE/GCE functions. Northern Ireland could simply “borrow” papers from larger awarding bodies and make the substantial savings available to hard-pressed schools.

Given the achievements of our schools in the recent GCSE and AS/A2 results, it is bizarre they now enter another time of uncertainty while CCEA – who act as their own qualifications regulator – fail to reconcile these two sets of standards.  The technical difficulties are considerable; CCEA’s assessments differ in the role given to controlled assessment.

The public have a right to know precisely what CCEA’s Qualification Regulator, Roger McCune, means when he promises: “We will start work immediately on the technical implementation of the new grading and continue to ensure that our qualifications remain comparable to other similar qualifications elsewhere in the United Kingdom.”

The CCEA Regulator is confusing squares and circles.

Peter Weir stands in danger of being compared to Jim Hacker for his failure to master his opponents within the green Blob and refusal to act decisively during the first 100 days of a new administration.

News Letter 30-08-16



Why the UK Department for Education is wrong on promoting OECD Pisa




Why PISA ranks are founded on a methodological thought disorder


Dr Hugh Morrison

(The Queen’s University of Belfast [retired])



When psychometricians claimed to be able to measure, they used the term ‘measurement’ not just for political reasons but also for commercial ones. … Those who support scientific research economically, socially and politically have a manifest interest in knowing that the scientists they support work to advance science, not subvert it.  And those whose lives are affected by the application of what are claimed to be ‘scientific findings’ also have an interest in knowing that these ‘findings’ have been seriously investigated and are supported by evidence. (Michell, 2000, p. 660)



This essay is a response to the claim by the Department of Education that: “The OECD is at the forefront of the academic debate regarding item response theory [and] the OECD is using what is acknowledged as the best available methodology [for international comparison studies].”


Item Response Theory plays a pivotal role in the methodology of the PISA international league table.  This essay refutes the claim that item response theory is a settled, well-reasoned approach to educational measurement.  It may well be settled amongst quantitative psychologists, but I doubt if there is a natural scientist on the planet who would accept that one can measure mental attributes in a manner which is independent of the measuring instrument (a central claim of item response theory).  It will be argued below that psychology’s approach to the twin notions of “quantity” and “measurement” has been controversial (and entirely erroneous) since its earliest days.  It will be claimed that the item response methodology, in effect, misuses the two fundamental concepts of quantity and measurement by re-defining them for its own purposes.  In fact, the case will be made that PISA ranks are founded on a “methodological thought disorder” (Michell, 1997).


Given the concerns of such a distinguished statistician as Professor David Spiegelhalter, the Department of Education’s continued endorsement of PISA is difficult to understand.  This essay extends the critique of PISA and item response theory beyond the concerns of Spiegelhalter to the very data from which the statistics are generated.  Frederic Lord (1980, pp. 227-228), the father of modern psychological measurement, warned psychologists that when applied to the individual test-taker, item response theory produces “absurd” and “paradoxical” results.  Given that Lord is one of the architects of item response theory, it is surprising that this admission provoked little or no debate among quantitative psychologists.  Are politicians and the general public aware that item response theory breaks down when applied to the individual?


In order to protect the item response model from damaging criticism, Lord proposed what physicists call a “hidden variables” ensemble model when interpreting the role probability plays in item response theory.  As a consequence item response models are deterministic and draw on Newtonian measurement principles. “Ability” is construed as a measurement-independent “state” of the individual which is the source of the responses made to test items (Borsboom, Mellenbergh, & van Heerden, 2003).  Furthermore, item response theory is incapable of taking account of the fact that the psychologist participates in what he or she observes.  Richardson (1999) writes: “[W]e find that the IQ-testing movement is not merely describing properties of people: rather, the IQ test has largely created them” (p. 40).  The participative nature of psychological enquiry renders the objective Newtonian model inappropriate for psychological measurement.  This prompted Robert Oppenheimer, in his address to the American Psychological Association, to caution: “[I]t seems to me that the worst of all possible misunderstandings would be that psychology be influenced to model itself after a physics which is not there anymore, which has been quite outdated.”


Unlike psychology, Newtonian measurement has very precise definitions of “quantity” and “measurement” which item response theorists simply ignore.  This can have only one interpretation, namely, that the numerals PISA attaches to the education systems of countries aren’t quantities, and that PISA doesn’t therefore “measure” anything, in the everyday sense of that word. I have argued elsewhere that item response theory can escape these criticisms by adopting a quantum theoretical model (in which the notions of “quantity” and “measurement” lose much of their classical transparency).  However, that would involve rejecting one of the central tenets of item response theory, namely, the independence of what is measured from the measuring instrument.  Item response theory has no route out of its conceptual difficulties.


This represents a conundrum for the Department of Education.  In endorsing PISA, the Department is, in effect, supporting a methodology designed to identify shortcomings in the mathematical attainment of pupils, when that methodology itself has serious mathematical shortcomings.


Modern item response theory is founded on a definition of measurement promulgated by Stanley Stevens and addressed in detail below.  By this means, Stevens (1958, p. 384) simply pronounced psychology a quantitative science which supported measurement, ignoring established practice elsewhere in the natural sciences.  Psychology refused to confront Kant’s view that psychology couldn’t be a science because mental predicates couldn’t be quantified.  Wittgenstein’s (1953, p. 232) scathing critique had no impact on quantitative psychology: “The confusion and barrenness of psychology is not to be explained by calling it a “young science”; its state is not comparable with that of physics, for instance, in its beginnings. … For in psychology there are experimental methods and conceptual confusion. … The existence of the experimental method makes us think we have the means of solving the problems which trouble us; though problem and method pass one another by.”


Howard Gardner (2005, p. 86), the prominent Harvard psychologist, looks back in despair to the father of psychology itself, William James:


On his better days William James was a determined optimist, but he harboured his doubts about psychology.  He once declared, “There is no such thing as a science of psychology,” and added “the whole present generation (of psychologists) is predestined to become unreadable old medieval lumber, as soon as the first genuine insights are made.”  I have indicated my belief that, a century later, James’s less optimistic vision has materialised and that it may be time to bury scientific psychology, at least as a single coherent undertaking.


In a follow-up paper to this essay I will demonstrate an alternative approach which solves the measurement problem as Stevens presents it, but in a manner which is perfectly in accord with contemporary thinking in the natural sciences.  None of the seemingly intractable problems which attend item response theory trouble my account of measurement in psychology.

However, my solution renders item response theory conceptually incoherent.


In passing it should be noted that some have sought to conflate my analysis with that of Svend Kreiner, suggesting that my concerns would be assuaged if only PISA could design items which measured equally from country to country.  Nothing could be further from the truth; no adjustment in item properties can repair PISA or item response theory.  No modification of the item response model would address its conceptual difficulties.


The essay draws heavily on the research of Joel Michell (1990, 1997, 1999, 2000, 2008) who has catalogued, with great care, the troubled history of the twin notions of quantity and measurement in psychology.  The following extracts from his writings, in which he accuses quantitative psychologists of subverting science, counter the assertion that item response theory is an appropriate methodology for international comparisons of school systems.


From the early 1900s psychologists have attempted to establish their discipline as a quantitative science.  In proposing quantitative theories they adopted their own special definition of measurement and treated the measurement of attributes such as cognitive abilities, personality traits and sensory intensities as though they were quantities of the type encountered in the natural sciences.  Alas, Michell (1997) presents a carefully reasoned argument that psychological attributes lack additivity and therefore cannot be quantities in the same way as the attributes of Newtonian physics.  Consequently he concludes: “These observations confirm that psychology, as a discipline, has its own definition of measurement, a definition quite unlike the traditional concept used in the physical sciences” (p. 360).


Boring (1929) points out that the pioneers of psychology quickly came to realise that if psychology was not a quantitative discipline which facilitated measurement, psychologists could not adopt the epithet “scientist” for “there would … have been little of the breath of science in the experimental body, for we hardly recognise a subject as scientific if measurement is not one of its tools” (Michell, 1990, p. 7).


The general definition of measurement accepted by most quantitative psychologists is that formulated by Stevens (1946) which states: “Measurement is the assignment of numerals to objects or events according to rules” (Michell, 1997, p. 360).  It seems that psychologists assign numbers to attributes according to some pre-determined rule and do not consider the necessity of justifying the measurement procedures used so long as the rule is followed.  This rather vague definition distances measurement in psychology from measurement in the natural sciences.  Its near universal acceptance within psychology and the reluctance of psychologists to confirm (via empirical study) the quantitative character of their attributes casts a shadow over all quantitative work in psychology.  Michell (1997, p. 361) sees far-reaching implications for psychology:


If a quantitative scientist (i) believes that measurement consists entirely in making numerical assignments to things according to some rule and (ii) ignores the fact that the measurability of an attribute presumes the contingent … hypothesis that the relevant attribute possesses an additive structure, then that scientist would be predisposed to believe that the invention of appropriate numerical assignment procedures alone produces scientific measurement.


Historically, Fechner (1860) – who coined the word “psychophysics” – is recognised as the father of quantitative psychology.  He considered that the only creditworthy contribution psychology could make to science was through quantitative approaches and he believed that reality was “fundamentally quantitative.”  His work focused on the instrumental procedures of measurement and dismissed any requirement to clarify the quantitative nature of the attribute under consideration.


His understanding of the logic of measurement was fundamentally flawed in that he merely presumed (under some Pythagorean imperative) that his psychological attributes were quantities.  Michell (1997) contends that although occasional criticisms were levied against quantitative measurement in psychology, in general the approach was not questioned and became part of the methodology of the discipline.  Psychologists simply assumed that when the study of an attribute generated numbers, that attribute was being measured.


The first official detailed investigation of the validity of psychological measurement from beyond its professional ranks was conducted – under the auspices of the British Association for the Advancement of Science – by the Ferguson Committee in 1932.  The non-psychologists on the committee concluded that there was no evidence to suggest that psychological methods measured anything, as the additivity of psychological attributes had not been demonstrated.  Psychology moved to protect its place in the academy at all costs.  Rather than admitting the error identified by the committee and going back to the drawing board, psychologists sought to defend their modus operandi by attempting a redefinition of psychological measurement.  Stevens’ (1958, p. 384) definition that measurement involved “attaching numbers to things” legitimised the measurement practices of psychologists who subsequently were freed from the need to test the quantitative structure of psychological predicates.


Michell (1997, p. 356) declares that presently many psychological researchers are “ignorant with respect to the methods they use.”  This ignorance permeates the logic of their methodological practices in terms of their understanding of the rationale behind the measurement techniques used.  The immutable outcome of this new approach to measurement within psychology is that the natural sciences and psychology have quite different definitions of measurement.


Michell (1997, p. 374) believes that psychology’s failure to face facts constitutes a “methodological thought disorder” which he defines as “the sustained failure to see things as they are under conditions where the relevant facts are evident.”  He points to the influence of an ideological support structure within the discipline which serves to maintain this idiosyncratic approach to measurement.  He asserts that in the light of commonly available evidence, interested empirical psychologists recognise that “Stevens’ definition of measurement is nonsense and the neglect of quantitative structure a serious omission” (Michell, 1997, p. 376).


Despite the writings of Ross (1964) and Rozeboom (1966), for example, Stevens’ definition has been generally accepted as it facilitates psychological measurement by an easily attainable route.  Michell (1997, p. 395) describes psychology’s approach to measurement as “at best speculation and, at worst, a pretence at science.”


[W]e are dealing with a case of thought disorder, rather than one of simple ignorance or error and, in this instance, these states are sustained systemically by the almost universal adherence to Stevens’ definition and the almost total neglect of any other in the relevant methodology textbooks and courses offered to students.  The conclusion that follows from this history, especially that of the last five decades, is that systemic structures within psychology prevent the vast majority of quantitative psychologists from seeing the true nature of scientific measurement, in particular the empirical conditions necessary for measurement.  As a consequence, number-generating procedures are consistently thought of as measurement procedures in the absence of any evidence that the relevant psychological attributes are quantitative.  Hence, within modern psychology a situation exists which is accurately described as systemically sustained methodological thought disorder. (Michell, 1997, p. 376)


To make my case, let me first make two fundamental points which should shock those who believe that the OECD is using what is acknowledged as the best available methodology for international comparisons.  Both of these points should concern the general public and those who support the OECD’s work.  First, the numerals that PISA publishes are not quantities, and second, PISA tables do not measure anything.


To illustrate the degree of freedom afforded to psychological “measurement” by Stevens it is instructive to focus on the numerals in the PISA table.  Could any reasonable person believe in a methodology which claims to summarise the educational system of the United States or China in a single number?  Where is the empirical evidence for this claim?  Three numbers are required to specify even the position of a single dot produced by a pencil on one line of one page of one of the notebooks in the schoolbag of one of the thousands of American children tested by PISA.  The Nobel Laureate, Sir Peter Medawar refers to such claims as “unnatural science.”  Medawar (1982, p. 10) questions such representations using Philip’s (1974) work on the physics of a particle of soil:


The physical properties and field behaviour of soil depends on particle size and shape, porosity, hydrogen ion concentration, material flora, and water content and hygroscopy.  No single figure can embody itself in a constellation of values of all these variables in any single real instance … psychologists would nevertheless like us to believe that such considerations as these do not apply to them.


Quantitative psychology, since its inception, has modelled itself on the certainty and objectivity of Newtonian mechanics.  The numerals of the PISA tables appear to the man or woman in the street to have all the precision of measurements of length or weight in classical physics.  But, by Newtonian standards, psychological measurement in general, and item response theory in particular, simply have no quantities, and do not “measure,” as that word is normally understood.


How can this audacious claim to “measure” the quality of a continent’s education provision and report it in a single number be justified?  The answer, as has already been pointed out, is to be found in the fact that quantitative psychology has its own unique definition of measurement, which is that “measurement is the business of pinning numbers on things” (Stevens, 1958, p. 384).  With such an all-encompassing definition of measurement, PISA can justify just about any rank order of countries.  But this isn’t measurement as that word is normally understood.


This laissez faire attitude wasn’t always the case in psychology.  It is clear that, as far back as 1905, psychologists like Titchener recognised that their discipline would have to embrace the established definition of measurement in the natural sciences: “When we measure in any department of natural science, we compare a given measurement with some conventional unit of the same kind, and determine how many times the unit is contained in the magnitude” (Titchener, 1905, p. xix).  Michell (1999) makes a compelling case that psychologists adopted Stevens’ ultimately meaningless definition of measurement – “according to Stevens’ definition, every psychological attribute is measurable” (Michell, 1999, p. 19) – because they feared that their discipline would be dismissed by the “hard” sciences without the twin notions of quantity and measurement.
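Titchener’s definition can be written as a short worked equation.  As a sketch (the symbols are introduced here for illustration and are not Titchener’s own), let u be a chosen unit magnitude of an attribute and a any other magnitude of the same attribute.  Measuring a means discovering the positive real number r that states how many times u is contained in a:

$$
r = \frac{a}{u}, \qquad \text{equivalently} \qquad a = r\,u, \qquad r \in \mathbb{R}^{+}.
$$

If a second magnitude satisfies b = s u, then additivity of the attribute gives

$$
a + b = r\,u + s\,u = (r + s)\,u ,
$$

so the numbers inherit the additive structure of the magnitudes.  That ratio is only well defined if the attribute really is additive, which is exactly what Hölder’s axioms require.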


The historical record shows that the profession of psychology derived economic and other social advantages from employing the rhetoric of measurement in promoting its services and that the science of psychology, likewise, benefited from supporting the profession in this by endorsing the measurability thesis and Stevens’ definition.  These endorsements happened despite the fact that the issue of the measurability of psychological attributes was rarely investigated scientifically and never resolved. (Michell, 1999, p. 192)


The formal statement of Hölder’s axioms below makes clear the contrast between the complete absence of rigorous measurement criteria in psychology and the onerous demands placed on the classical physicist.



To merit the label “quantity” in Newtonian physics, an attribute Q must satisfy all seven of Hölder’s axioms.  Stated for the magnitudes a, b, c, and so on, of Q, they are as follows:


1. For every pair of magnitudes, $a$ and $b$, of $Q$, one and only one of the following is true:

(i) $a = b$ and $b = a$

(ii) $a > b$ and $b < a$

(iii) $b > a$ and $a < b$


2. For every magnitude $a$ of $Q$, there exists a $b$ in $Q$ such that $b < a$.


3. For every pair of magnitudes, $a$ and $b$, in $Q$, there exists a $c$ in $Q$ such that $a + b = c$.


4. For every pair of magnitudes, $a$ and $b$, in $Q$, $a + b > a$ and $a + b > b$.


5. For every pair of magnitudes, $a$ and $b$, in $Q$, if $a < b$, there exist magnitudes, $c$ and $d$, in $Q$ such that $a + c = b$ and $d + a = b$.


6. For every triplet of magnitudes, $a$, $b$ and $c$, in $Q$, $(a + b) + c = a + (b + c)$.


7. For every pair of classes, $\varphi$ and $\psi$, of magnitudes of $Q$ such that

(i) each magnitude of $Q$ belongs to one and only one of $\varphi$ and $\psi$,

(ii) neither $\varphi$ nor $\psi$ are empty, and

(iii) every magnitude in $\varphi$ is less than each magnitude in $\psi$,

there exists a magnitude $x$ in $Q$ such that for every other magnitude, $x'$, in $Q$, if $x' < x$, then $x' \in \varphi$ and if $x' > x$, then $x' \in \psi$ (depending on the particular case, $x$ may belong to either class).
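The following Python sketch is offered only as an illustration of what testing these conditions involves.  It is not drawn from Hölder, from Michell or from any PISA procedure.  It assumes, hypothetically, that magnitudes can be represented as positive fractions.  The function `holder_violations` and the two candidate operations are inventions for the example.

Ordinary addition of lengths passes axioms 1, 3, 4 and 6 on the sample.  By contrast, an operation that merely picks the larger of two values preserves order but fails axiom 4 for every pair.  A finite check of this kind can only expose violations; it cannot establish the axioms.  Axioms 2, 5 and 7, which assert that certain magnitudes exist, are not tested at all.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, List, Tuple

# Hypothetical model: a magnitude is a positive Fraction, and a candidate
# "concatenation" is any function combining two magnitudes into one.
Op = Callable[[Fraction, Fraction], Fraction]


def holder_violations(sample: List[Fraction], add: Op) -> List[Tuple[str, tuple]]:
    """Collect violations of Hölder's axioms 1, 3, 4 and 6 within `sample`.

    A finite check can expose a violation but cannot prove an axiom.
    Axioms 2, 5 and 7 assert that certain magnitudes exist, which a
    finite sample cannot settle, so they are not tested here.
    """
    violations: List[Tuple[str, tuple]] = []
    for a, b in product(sample, repeat=2):
        # Axiom 1: exactly one of a = b, a > b, b > a holds.
        if [a == b, a > b, b > a].count(True) != 1:
            violations.append(("axiom 1", (a, b)))
        s = add(a, b)
        # Axiom 3: a + b is again a magnitude (here, a positive Fraction).
        if not (isinstance(s, Fraction) and s > 0):
            violations.append(("axiom 3", (a, b)))
        # Axiom 4: a + b exceeds each of its parts.
        if not (s > a and s > b):
            violations.append(("axiom 4", (a, b)))
    for a, b, c in product(sample, repeat=3):
        # Axiom 6: associativity, (a + b) + c = a + (b + c).
        if add(add(a, b), c) != add(a, add(b, c)):
            violations.append(("axiom 6", (a, b, c)))
    return violations


sample = [Fraction(1, 2), Fraction(1), Fraction(3, 2)]

# Ordinary addition of lengths: no violations found in the sample.
print(holder_violations(sample, lambda a, b: a + b))  # → []

# Taking the larger of two values respects order but is not additive.
bad = holder_violations(sample, max)
print(sorted({name for name, _ in bad}))  # → ['axiom 4']
```

The second case shows that an ordering of values does not, on its own, supply the additive structure that axioms 3 to 6 demand.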


An essential step in establishing the validity of the concepts “quantity” and “measurement” in item response theory is an empirical analysis centred on Hölder’s conditions.  The reader will search in vain for evidence that quantitative psychologists in general, and item response theorists in particular, subject the predicate “ability” to Hölder’s conditions.

This is because the definition of measurement in psychology is so vague that it frees psychologists of any need to address Hölder’s conditions and permits them, without further ado, to simply accept that the predicates they purport to measure are quantifiable.


Quantitative psychology presumed that the psychological attributes which they aspired to measure were quantitative. … Quantitative attributes are attributes having a quite specific structure.  The issue of whether psychological attributes have that sort of structure is an empirical issue … Despite this, mainstream quantitative psychologists … not only neglected to investigate this issue, they presumed that psychological attributes are quantitative, as if no empirical issue were at stake.  This way of doing quantitative psychology, begun by its founder, Gustav Theodor Fechner, was followed almost universally throughout the discipline and still dominates it. … [I]t involved a defective definition of a fundamental methodological concept, that of measurement. … Its understanding of the concept of measurement is clearly mistaken because it ignores the fact that only quantitative attributes are measurable.  Because this … has persisted within psychology now for more than half a century, this tissue of errors is of special interest. (Michell, 1999, pp. xi – xii)


This essay has sought to challenge the Department of Education’s claim that in founding its methodology on item response theory, PISA is using the best available methodology to rank order countries according to their education provision.  As Sir Peter Medawar makes clear, any methodology which claims to capture the quality of a country’s entire education system in a single number is bound to be suspect.  If my analysis is correct, PISA is engaged in rank-ordering countries according to the mathematical achievements of their young people, using a methodology which itself has little or no mathematical merit.


Item response theorists have identified two broad interpretations of probability in their models: the “stochastic subject” and “repeated sampling” interpretations.  Lord has demonstrated that the former leads to absurd and paradoxical results without ever investigating why this should be the case.  Had such an investigation been initiated, quantitative psychologists would have been confronted with the profound question of the very role probability plays in psychological measurement.  Following a pattern of behaviour all too familiar from Michell’s writings, psychologists simply buried their heads in the sand and, at Lord’s urging, set the stochastic subject interpretation aside and emphasised the repeated sampling approach.


In this way the constitutive nature of irreducible uncertainty in psychology was eschewed for the objectivity of Newtonian physics.  This is reflected in item response theory’s “local hidden variables” ensemble model in which ability is an intrinsic measurement-independent property of the individual and measurement is construed as a process of merely checking up on what pre-exists measurement.  For this to be justified, Hölder’s seven axioms must apply.


In order to justify the labels “quantity” and “measurement”, PISA must produce empirical evidence that the attribute it calls “ability” satisfies the Hölder axioms.  Absent such evidence, it seems very difficult to justify the Department of Education’s claims that (i) “the OECD is at the forefront of the academic debate regarding item response theory,” and (ii) “the OECD is using what is acknowledged as the best available methodology [for international comparison studies].”







Boring, E.G. (1929).  A history of experimental psychology.  New York: Century.

Borsboom, D., Mellenbergh, G.J., & van Heerden, J. (2003).  The theoretical status of latent variables.  Psychological Review, 110(2), 203-219.

Fechner, G.T. (1860).  Elemente der Psychophysik.  Leipzig: Breitkopf & Härtel.  (English translation by H.E. Adler, Elements of Psychophysics, vol. 1, D.H. Howes & E.G. Boring (Eds.).  New York: Holt, Rinehart & Winston.)

Gardner, H. (2005).  Scientific psychology: Should we bury it or praise it?  In R.J. Sternberg (Ed.), Unity in psychology (pp. 77-90).  Washington DC: American Psychological Association.

Lord, F.M. (1980).  Applications of item response theory to practical testing problems.  Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.

Medawar, P.B. (1982).  Pluto’s republic.  Oxford: Oxford University Press.

Michell, J. (1990).  An introduction to the logic of psychological measurement.  Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.

Michell, J. (1997).  Quantitative science and the definition of measurement in psychology.  British Journal of Psychology, 88, 353-385.

Michell, J. (1999).  Measurement in psychology: A critical history of a methodological concept.  Cambridge: Cambridge University Press.

Michell, J. (2000).  Normal science, pathological science and psychometrics.  Theory and Psychology, 10, 639-667.

Michell, J. (2008).  Is psychometrics pathological science? Measurement: Interdisciplinary Research and Perspectives, 6, 7-24.

Oppenheimer, R. (1956).  Analogy in science.  The American Psychologist, 11, 127-135.

Philip, J.R. (1974).  Fifty years progress in soil physics.  Geoderma, 12, 265-280.

Richardson, K. (1999).  The making of intelligence.  London: Weidenfeld & Nicolson.

Ross, S. (1964).  Logical foundations of psychological measurement.  Copenhagen: Munksgaard.

Rozeboom, W.W. (1966).  Scaling theory and the nature of measurement.  Synthese, 16, 170-223.

Stevens, S.S. (1946).  On the theory of scales of measurement.  Science, 103, 667-680.

Stevens, S.S. (1958).  Measurement and man.  Science, 127, 383-389.

Titchener, E.B. (1905).  Experimental psychology: A manual of laboratory practice, vol. 2.  London: Macmillan.

Wittgenstein, L. (1953).  Philosophical Investigations.  Oxford: Blackwell.


The Growth Mindset: Telling Penguins to Flap Harder?

 We should be rather cautious about adopting the “Growth Mindset” approach as some sort of universal principle. 

Disappointed Idealist

I’m not sure whether this particular blog might lose me friends. It’s not intended to, but I’m going to stumble into an area where I know some people have very strong views. It was prompted by a post-parents’ evening trawl through some blogs, and I came across this blog by Dylan Wiliam:

I’m generally a fan of Dylan Wiliam, although I once tried to joke with him on Twitter, and I’m not sure my humour survived the transition to 140 characters. If I made any impression, it was almost certainly a bad one. Oh well. In any case, it’s not actually his blog on feedback which is at issue here – it’s a good piece, and I agree with the central message about marking/feedback. The bit I want to write about is this:

“Students must understand that they are not born with talent (or lack of it) and…


Warning to parents prior to N. Ireland Assembly Election



The DUP’s Educational Incoherence


At the same time as the DUP has committed itself to a “No Child Left Behind” policy, Peter Weir (Chair of the Education Committee) has suggested that the Party might return the Transfer Test to CCEA control.  Has he forgotten that the current AQE test was written to address shortcomings in the old CCEA test, such as an unacceptably high pupil misclassification rate?


More worrying for the coherence of DUP education policy is the remarkably high proportion of children on free school meals (FSM) qualifying for grammar school places under the current AQE tests.  ALMOST HALF of AQE entrants eligible for FSM are meeting minimal standards for grammar school entry.  Handing the test back to CCEA would see a dramatic reduction in this number.  In short, returning to a CCEA test would be entirely at odds with a policy of leaving no child behind.

Advice to Parents on AQE & GL Transfer Tests 2015/16




Now that the majority of pupils and parents have the results of the test(s) in hand it is right that there is time taken to acknowledge the effort, celebrate and relax.  If only the media would allow it. Instead the annual circus turns up right on cue. Never let facts get in the way of a good story.

Transfer Test Papers

The BBC NI Education correspondent, Robbie Meredith, has prepared a package for today’s local news on the transfer test results.  He talks about the Education Minister calling for an end to academic selection, but that is not news: Sinn Fein Education Ministers have been trying to end the existence of grammar schools for sixteen years.  Dr Meredith suggests that non-Catholic grammar schools are mostly controlled schools, a statement that is totally inaccurate.  Finally, he fleetingly mentions the “dualling” schools, ignoring entirely the fact that it is only those schools which require pupils to take multiple tests.  Dr Meredith has been informed of the potential misclassification of pupils under the ‘equating’ schemes cited by the “dualling” schools but will not investigate or report on the problem.


A question from the AQE transfer test in 2015

The schools accepting GL Assessment and/or AQE test results without accepting responsibility for the pressure their unnecessary demands cause are: Lagan College, Belfast (not a grammar school); Glenlola Collegiate, Bangor; Campbell College, Belfast; Antrim Grammar, Antrim; Victoria College, Belfast; St Patrick’s Grammar, Downpatrick; Wellington College, Belfast; Hunterhouse College, Belfast.

Source: Belfast Telegraph Transfer Test Guide published January 25, 2016 Page 19

Most politicians would like to see the end of academic selection but will not admit it to you lest they lose your vote, a tension they are evidently incapable of resolving. Former DUP First Minister Peter Robinson made much of his determination to deliver a single test. He left office defeated by the resolve of parents and a dedicated group of principled individuals who will not allow political expediency to destroy parental choice.

Enjoy the weekend.


First Minister Arlene Foster must make an education choice



The Parental Alliance for Choice in Education welcomes Arlene Foster’s recent statement on education if it is a vow of commitment to her educational vision and not simply a sound bite.


Recent comments from her suggest that she would lead a revolution in education.  It says something about the effects of fifteen years of Sinn Fein misrule that common-sense proposals to return to the traditional values that made our education system admired worldwide seem revolutionary.


The promise of positive change at this stage is perhaps necessarily vague.  Given her appreciation for the education system in which she grew up, perhaps we can look forward to concrete proposals for protecting the educational heritage currently under direct threat from John O’Dowd, and to a commitment that her Party will retain the long-established parity between Northern Ireland’s public examinations and those in England.


Why didn’t she give a cast-iron guarantee to underprivileged children to remove entirely the Revised Curriculum with its “learning-to-learn” philosophy, proven to be damaging to the achievement of children living in poverty?  Why are we continuing to teach these children according to flawed constructivist principles when a longitudinal investigation of the impact of the Enriched Curriculum on disadvantaged children demonstrated that they had fallen significantly behind their peers in traditional classrooms?  Why not just remove a curriculum in which the rich were getting richer and the poor poorer?  Readers may recall that the CCEA-designed curriculum proposed to raise the reading standards of those children deemed to be not “developmentally ready” by delaying the formal teaching of reading by up to two years!  If Arlene Foster were to abandon this ill-conceived curriculum her party could claim – without fear of contradiction – to have removed a significant number of poor children from the “left behind” category.


It currently seems that Arlene Foster doesn’t intend to sweep away John O’Dowd’s legacy. This leaves schools under threat, a curriculum in place which leaves the underprivileged child behind and the standards demanded by CCEA examinations (for the first time ever) perceived to be inferior to those in England, breaking parity.


It follows, therefore, that the most puzzling aspect of Arlene Foster’s no-child-left-behind policy is its widespread popularity. The First Minister of Scotland proposed precisely this policy one year ago, but that merely involved using standardised tests in Scottish schools to detect underachievement.  That could hardly be presented as an educational revolution.  Curiously, both First Ministers use the words “no child left behind” without attribution.  The education world attributes these words to George W. Bush’s policy that no child should be “left behind” in a school which isn’t making “adequate yearly progress.”


Are we to believe that the DUP will advocate the American approach to no child left behind?  There can be little doubt that this would indeed amount to an educational revolution.  But Ms Foster faces one snag among many.  For all its focus on tests, the real emphasis in the American model is teaching.  It is a requirement of the policy that instruction be “research-based”.  That would mean the inevitable abandonment of the Revised Curriculum and a return to traditional teaching in Northern Ireland.

In short, we await further details before deciding if the First Minister’s words are more sound bite than coherent educational vision.

Stephen Elliott

Chair, Parental Alliance for Choice in Education

Deep flaws in the OFMDFM ILiAD Project



In a Comment piece in the News Letter of 10 December, I argued that a project designed to investigate the link between deprivation and academic under-achievement was deeply flawed.  OFMDFM, who financed the ILiAD project, didn’t seem to appreciate that the sought-after link had already been investigated in one of the most sophisticated education experiments ever conducted: the USA’s Project Follow Through.

Follow Through


Project Follow Through monitored the academic attainment of 79,000 pupils from 180 low-income communities for 20 years.  It arrived at an unequivocal conclusion: those pupils who were taught by traditional methods consistently reached academic standards approximating those of their middle-class peers.  This conclusion was replicated by two other highly regarded bodies.  Progressivist curricula – such as those centred on the pupil’s ability to “learn how to learn” – were demonstrated to damage the attainment of children from disadvantaged backgrounds.  This is important because our Revised Curriculum is just such a curriculum.



The lessons from Project Follow Through are clear: abandoning the Revised Curriculum and returning to traditional approaches to teaching and learning would benefit all of our children, but particularly children from poor backgrounds.  In addition, a great deal of money could be saved if we turned our back on notions like Assessment for Learning (where children are required to mark their own work) and “levels of progression” (which no country on the planet uses).  We could invest more money in our teachers if we weren’t funding what Michael Gove dismissively called “the blob.”




I am writing now to report something I discovered after the publication of my Comment piece.  I began to feel even more uneasy about the ILiAD project when I read a paper by one of the project’s authors, Dr Cathal McManus of the School of Education at Queen’s.  In an article addressing “Protestant working-class underachievement and unionist hegemony”, published in Irish Studies Review, he argues that the ideas of Antonio Gramsci offer a better theoretical lens than those of Pierre Bourdieu through which to view the underachievement of Protestant working-class boys.



What is curious is that the ILiAD project uses Bourdieu as its theoretical lens.  Why wasn’t Gramsci chosen?  Gramsci’s reasoning reinforces the findings of Project Follow Through.  Could Gramsci’s rejection of curricula like the Revised Curriculum, and his enthusiasm for traditional approaches to the classroom, explain the ILiAD team’s curious choice?


Stephen Elliott


Chair, Parental Alliance for Choice in Education