
Pace N.Ireland Education Weblog

~ Northern Ireland education analysis


Tag Archives: Andreas Schleicher

Why OECD Pisa cannot be rescued

04 Sunday Dec 2016

Posted by paceni in Grammar Schools


Tags

Albert Einstein, Andreas Schleicher, British Journal of Mathematical and Statistical Psychology., Christian Bokhove, Complementarity, Diane Ravitch, Dr Hugh Morrison, ETS, Greg Ashman, Item Response Theory, John Jerrim, Matthias von Davier, Measurement of ability, Michael Gove, Michael Oakeshott, Niels Bohr, PIRLS, Pisa 2015, Randy Bennett, Rasch Model, Robbie Meredith, Sean Coughlan, TES, Theresa May, Times Educational Supplement, TIMSS

PISA cannot be rescued by switching IRT model because all IRT modelling is flawed.

Dr Hugh Morrison (The Queen’s University of Belfast [retired]) drhmorrison@gmail.com

On page 33 of the Times Educational Supplement of Friday 25th November 2016, Andreas Schleicher, who oversees PISA, appears to accept my analysis of the shortcomings of the Rasch model which plays a central role in PISA’s league table.  The Rasch model is a “one parameter” Item Response Theory (IRT) model, and Schleicher argues that PISA’s conceptual difficulties can be resolved by abandoning the Rasch model for a two or three parameter model.  However, my criticisms apply to all IRT models, irrespective of the number of parameters.  In this essay I will set out the reasoning behind this claim.
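To fix ideas about what a “one parameter” versus a “two or three parameter” model means, the following minimal sketch (in Python; the function names and the illustrative values are mine, and nothing here is drawn from PISA’s actual implementation) sets out the three standard logistic item response functions. Whatever the parameter count, each model maps a single latent ability θ to a response probability, and it is that shared structure, not the number of parameters, which the argument below targets.

```python
import math

def p_1pl(theta, b):
    """Rasch / one-parameter logistic: item difficulty b only."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def p_2pl(theta, b, a):
    """Two-parameter logistic: adds an item discrimination a."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def p_3pl(theta, b, a, c):
    """Three-parameter logistic: adds a lower asymptote c ('guessing')."""
    return c + (1.0 - c) * p_2pl(theta, b, a)

# Illustrative values only: ability 0 meeting an item of difficulty 0.5.
theta, b = 0.0, 0.5
print(round(p_1pl(theta, b), 2))                # 0.38
print(round(p_2pl(theta, b, a=1.7), 2))         # 0.30 (steeper curve)
print(round(p_3pl(theta, b, a=1.7, c=0.2), 2))  # 0.44 (floor raised by guessing)
```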

 

One can find the source of IRT’s difficulty in Niels Bohr’s 1949 paper entitled Discussion with Einstein on Epistemological Problems in Atomic Physics.  Few scientists have made a greater contribution to the study of measurement than the Nobel Laureate and founding father of quantum theory, Niels Bohr.  Given Bohr’s preoccupation with what the scientist can say about aspects of reality that are not visible (electrons, photons, and so on), one can understand his constant references to measurement in psychology.  “Ability” cannot be seen directly; rather, like the microentities that manifest as tracks in particle accelerators, ability manifests in the examinee’s responses to test items.  IRT is concerned with “measuring” something which the measurer cannot experience directly, namely, the ability of the examinee.

 

IRT relies on a simple inner/outer picture for its models to function.  In IRT the inner (a realm of timeless, unobserved latent variables, or abilities) is treated as independent of the outer (here examinees write or speak responses at moments in time).  This is often referred to as a “reservoir” model in which timeless abilities are treated as the source of the responses given at specific moments in time.

 

As early as 1929 Bohr rejected this simplistic thinking in strikingly general terms: “Strictly speaking, the conscious analysis of any concept stands in a relation of exclusion to its immediate application.  The necessity of taking recourse to a complementary … mode of description is perhaps most familiar to us from psychological problems.”  Now what did Bohr mean by these words?  Consider, for example, the concept “quadratic.”  It is tempting to adopt a reservoir approach and trace a pupil’s ability to apply that concept in accord with established mathematical practice to his or her having the formula in mind.  The guidance offered by the formula in mind (Bohr’s reference to “conscious analysis”) accounts for the successful “application,” for example, to the solution of specific items on an algebra test.

 

However, this temptingly simplistic model, in which the formula is in the unobserved mental realm and written or spoken applications of the concept “quadratic” take place in the observed realm, contains a fundamental flaw: the two realms cannot be meaningfully connected.  The “inner” formula (in one realm) gets its guidance properties from human practices (in the other realm).  A formula as a thing-in-itself cannot guide; one has to be trained in the established practice of using the formula before it has guidance properties.  In school mathematics examinations around the world, pupils are routinely issued with a page of formulae relevant to the examination.  Alas, it is the experience of mathematics teachers everywhere that simply having access to the formula as a thing-in-itself offers little or no guidance to the inadequately trained pupil.  The formula located in one realm cannot connect with the applications in the other.

 

Wittgenstein teaches that no formula, rule, principle, etc. in itself can ever determine a course of action.  The timeless mathematical formula in isolation cannot generate all the complexities of a practice (something which evolves in time); rather, as Michael Oakeshott puts it, a formula is a mere “abridgement” of the practice – the practice is primary, with the formula, rule, precept etc. deriving its “life” from the practice.

 

Returning to Bohr’s writing, it is instructive to explain his use of the word “complementarity” in respect of psychology and to explain the meaning of the words: “stands in a relation of exclusion.”  Complementarity was the most important concept Bohr bequeathed to physics.  It involves a combination of two mutually exclusive facets.  In order to see its relevance to the validity of IRT modelling, let’s return to the two distinct realms.

 

We think of the answers to a quadratic equation as being right or wrong (a typical school-level quadratic equation has two distinct answers).  In the realm of application this is indeed the case.  When the examinee is measured, his or her response is pronounced right or wrong dependent upon its relation to established mathematical practice.  However, in the unobserved realm, populated by rules, formulae and precepts (as things-in-themselves), any answer to a quadratic equation is simultaneously right and wrong!

 

A formula as a thing-in-itself cannot separate what accords with it from what conflicts with it, because there will always exist an interpretation of the formula for which a particular answer is correct, and another interpretation for which the same answer can be shown to conflict with the formula.  Divorced from human practices, the distinction between right and wrong collapses.  (This is a direct consequence of Wittgenstein’s celebrated “private language” argument.)  This explains Bohr’s reference to a “relation of exclusion.”  In simplistic terms, in the unobserved realm, where answers are compared with the formula for solving quadratics, responses are right-and-wrong, while in the observed realm, where answers are compared with the established practice, responses are right-or-wrong.

 

On this reading, ability has two mutually exclusive facets which cannot meaningfully be separated.  The distinguished Wittgenstein scholar, Peter Hacker, captures this situation as follows: “grasping an explanation of meaning and knowing how to use the word explained are not two independent abilities but two facets of one and the same ability.”  Ability, construed according to Bohr’s complementarity, is indefinite when unobserved and definite when observed.  Moreover, this definite measure is not an intrinsic property of the examinee, but a property of the examinee’s interaction with the measuring tool.

 

Measurement of ability is not a matter of passively checking up on what already exists – a central tenet of IRT.  Bohr teaches that the measurer effects a radical change from indefinite to definite.  Pace IRT, measurers, in effect, participate in what is measured.  No item response model can accommodate the “jump” from indefinite to definite occasioned by the measurement process.  All IRT models mistakenly treat unmeasured ability as identical to measured ability.  What scientific evidence could possibly be adduced in support of that claim?  No IRT model can represent ability’s two facets because all IRT models report ability as a single real number, construed as an intrinsic property of the measured individual.

 

 

 


Northern Ireland Education Minister wrong on Pisa evidence

21 Monday Dec 2015

Posted by paceni in Grammar Schools


Tags

Andreas Schleicher, AQE, Belfast Newsletter, Danny Kennedy MLA, Danny Kinahan MP, Department of Education Northern Ireland, dodgy dossiers, GL Assessment, John O'Dowd, Martin McGuinness MP, Mervyn Storey MLA, Michelle McIlveen MLA, NewsLetter, Northern Ireland Assembly, OECD Pisa, Parental Alliance for Choice in Education, Peter Wier MLA, PISA, private members business, Professor Svend Kreiner, Sammy Wilson MP

 

 

[Scanned image of the letter, as published in the Belfast Newsletter]

A challenge was made to John O’Dowd, Northern Ireland’s education minister, to answer for an error born of his reliance on so-called international evidence provided by OECD Pisa data. The minister has failed to respond; he is wrong and remains so.

 

The letter above was published in the Belfast Newsletter on Friday, November 6th, 2015.

When Sinn Fein education minister John O’Dowd deliberately used the term “dodgy dossier” in respect of transfer testing during Private Members Business in the Assembly on Tuesday, he reversed the truth.

The minister cited international evidence, based on Pisa scores, that selective education fails children.

Astoundingly, not one of the unionist politicians present challenged the minister on the facts.

In a peer-reviewed analysis of that evidence, Professor Svend Kreiner wrote of OECD Pisa:

“Most people don’t know that half of the students taking part in the research do not respond to any reading items at all. Despite that, Pisa assigns reading scores to these children.”

In short, Pisa admits that it does not measure curricular content or attainment.

It therefore cannot pass judgement on selective education systems.

Do the politicians who failed to tackle Mr O’Dowd or those schools participating in OECD Pisa not understand that half of the children in the minister’s research were assigned scores for tests they didn’t even sit?

Does anyone in Northern Ireland know of any pupil receiving an AQE or GL Assessment score without taking a test?

With children about to sit the first transfer test tomorrow, it is a pity that those charged with opposing the minister’s ideological campaign against selection did not challenge him on Tuesday.

If those politicians and their advisors won’t apologise for wrongly traducing the current transfer system, Mr O’Dowd should, on their collective behalf, make clear that it was he who was quoting from a dodgy dossier.

Stephen Elliott,

Parental Alliance for Choice in Education,  Antrim


Mr Gove’s major problem: Why Pisa ranks are wrong.

02 Sunday Feb 2014

Posted by paceni in Grammar Schools


Tags

academic debate regarding item response theory, American Psychological Association, Andreas Schleicher, Boring, Borsboom, David Spiegelhalter, Department of Education England, Dr Hugh Morrison, E.B., E.G., Fechner, Frederick Lord, G.T., Gardner, H. (2005). Scientific psychology: Should we bury it or praise it?, Hölder’s seven axioms, Issac Newton, Item Response Theory, J. (2000). Normal science, J.R. (1974). Fifty years progress in soil physics., Joel Michell, ludwig Wittgenstein, Mellenbergh, methodological thought disorder, Michael Gove, Michell, OECD, Oppenheimer, pathological science and psychometrics., Philip, PISA, Quantity and measurement, R. (1956). Analogy in science. The American Psychologist, Robert Oppenheimer, Ross, S. (1964). Logical foundations of psychological measurement., S.S. Stevens, Sir Peter Medawar, Svend Kriener, Titchener, Van Heerden

Why PISA ranks are founded on a methodological thought disorder

When psychometricians claimed to be able to measure, they used the term ‘measurement’ not just for political reasons but also for commercial ones. … Those who support scientific research economically, socially and politically have a manifest interest in knowing that the scientists they support work to advance science, not subvert it. And those whose lives are affected by the application of what are claimed to be ‘scientific findings’ also have an interest in knowing that these ‘findings’ have been seriously investigated and are supported by evidence. (Michell, 2000, p. 660)

This essay is a response to the claim by the Department of Education that: “The OECD is at the forefront of the academic debate regarding item response theory [and] the OECD is using what is acknowledged as the best available methodology [for international comparison studies].”

Item Response Theory plays a pivotal role in the methodology of the PISA international league table. This essay refutes the claim that item response theory is a settled, well-reasoned approach to educational measurement. It may well be settled amongst quantitative psychologists, but I doubt if there is a natural scientist on the planet who would accept that one can measure mental attributes in a manner which is independent of the measuring instrument (a central claim of item response theory). It will be argued below that psychology’s approach to the twin notions of “quantity” and “measurement” has been controversial (and entirely erroneous) since its earliest days. It will be claimed that the item response methodology, in effect, misuses the two fundamental concepts of quantity and measurement by re-defining them for its own purposes. In fact, the case will be made that PISA ranks are founded on a “methodological thought disorder” (Michell, 1997).

Given the concerns of such a distinguished statistician as Professor David Spiegelhalter, the Department of Education’s continued endorsement of PISA is difficult to understand. This essay extends the critique of PISA and item response theory beyond the concerns of Spiegelhalter to the very data from which the statistics are generated. Frederick Lord (1980, pp. 227-228), the father of modern psychological measurement, warned psychologists that when applied to the individual test-taker, item response theory produces “absurd” and “paradoxical” results. Given that Lord is one of the architects of item response theory, it is surprising that this admission provoked little or no debate among quantitative psychologists. Are politicians and the general public aware that item response theory breaks down when applied to the individual?

In order to protect the item response model from damaging criticism, Lord proposed what physicists call a “hidden variables” ensemble model when interpreting the role probability plays in item response theory. As a consequence, item response models are deterministic and draw on Newtonian measurement principles. “Ability” is construed as a measurement-independent “state” of the individual which is the source of the responses made to test items (Borsboom, Mellenbergh, & van Heerden, 2003). Furthermore, item response theory is incapable of taking account of the fact that the psychologist participates in what he or she observes. Richardson (1999) writes: “[W]e find that the IQ-testing movement is not merely describing properties of people: rather, the IQ test has largely created them” (p. 40). The participative nature of psychological enquiry renders the objective Newtonian model inappropriate for psychological measurement. This prompted Robert Oppenheimer, in his address to the American Psychological Association, to caution: “[I]t seems to me that the worst of all possible misunderstandings would be that psychology be influenced to model itself after a physics which is not there anymore, which has been quite outdated.”

Unlike psychology, Newtonian measurement has very precise definitions of “quantity” and “measurement” which item response theorists simply ignore. This can have only one interpretation, namely, that the numerals PISA attaches to the education systems of countries aren’t quantities, and that PISA doesn’t therefore “measure” anything, in the everyday sense of that word. I have argued elsewhere that item response theory can escape these criticisms by adopting a quantum theoretical model (in which the notions of “quantity” and “measurement” lose much of their classical transparency). However, that would involve rejecting one of the central tenets of item response theory, namely, the independence of what is measured from the measuring instrument. Item response theory has no route out of its conceptual difficulties.

This represents a conundrum for the Department of Education. In endorsing PISA, the Department is, in effect, supporting a methodology designed to identify shortcomings in the mathematical attainment of pupils, when that methodology itself has serious mathematical shortcomings.

Modern item response theory is founded on a definition of measurement promulgated by Stanley Stevens and addressed in detail below. By this means, Stevens (1958, p. 384) simply pronounced psychology a quantitative science which supported measurement, ignoring established practice elsewhere in the natural sciences. Psychology refused to confront Kant’s view that psychology couldn’t be a science because mental predicates couldn’t be quantified. Wittgenstein’s (1953, p. 232) scathing critique had no impact on quantitative psychology: “The confusion and barrenness of psychology is not to be explained by calling it a “young science”; its state is not comparable with that of physics, for instance, in its beginnings. … For in psychology there are experimental methods and conceptual confusion. … The existence of the experimental method makes us think we have the means of solving the problems which trouble us; though problem and method pass one another by.”

Howard Gardner (2005, p. 86), the prominent Harvard psychologist, looks back in despair to the father of psychology himself, William James:

On his better days William James was a determined optimist, but he harboured his doubts about psychology. He once declared, “There is no such thing as a science of psychology,” and added “the whole present generation (of psychologists) is predestined to become unreadable old medieval lumber, as soon as the first genuine insights are made.” I have indicated my belief that, a century later, James’s less optimistic vision has materialised and that it may be time to bury scientific psychology, at least as a single coherent undertaking.

In a follow-up paper to this essay I will demonstrate an alternative approach which solves the measurement problem as Stevens presents it, but in a manner which is perfectly in accord with contemporary thinking in the natural sciences. None of the seemingly intractable problems which attend item response theory trouble my account of measurement in psychology.

However, my solution renders item response theory conceptually incoherent.

In passing it should be noted that some have sought to conflate my analysis with that of Svend Kreiner, suggesting that my concerns would be assuaged if only PISA could design items which measured equally from country to country. Nothing could be further from the truth; no adjustment in item properties can repair PISA or item response theory. No modification of the item response model would address its conceptual difficulties.

The essay draws heavily on the research of Joel Michell (1990, 1997, 1999, 2000, 2008) who has catalogued, with great care, the troubled history of the twin notions of quantity and measurement in psychology. The following extracts from his writings, in which he accuses quantitative psychologists of subverting science, counter the assertion that item response theory is an appropriate methodology for international comparisons of school systems.

From the early 1900s psychologists have attempted to establish their discipline as a quantitative science. In proposing quantitative theories they adopted their own special definition of measurement and treated the measurement of attributes such as cognitive abilities, personality traits and sensory intensities as though they were quantities of the type encountered in the natural sciences. Alas, Michell (1997) presents a carefully reasoned argument that psychological attributes lack additivity and therefore cannot be quantities in the same way as the attributes of Newtonian physics. Consequently he concludes: “These observations confirm that psychology, as a discipline, has its own definition of measurement, a definition quite unlike the traditional concept used in the physical sciences” (p. 360).

Boring (1929) points out that the pioneers of psychology quickly came to realise that if psychology was not a quantitative discipline which facilitated measurement, psychologists could not adopt the epithet “scientist” for “there would … have been little of the breath of science in the experimental body, for we hardly recognise a subject as scientific if measurement is not one of its tools” (Michell, 1990, p. 7).

The general definition of measurement accepted by most quantitative psychologists is that formulated by Stevens (1946) which states: “Measurement is the assignment of numerals to objects or events according to rules” (Michell, 1997, p. 360). It seems that psychologists assign numbers to attributes according to some pre-determined rule and do not consider the necessity of justifying the measurement procedures used so long as the rule is followed. This rather vague definition distances measurement in psychology from measurement in the natural sciences. Its near universal acceptance within psychology and the reluctance of psychologists to confirm (via empirical study) the quantitative character of their attributes casts a shadow over all quantitative work in psychology. Michell (1997, p. 361) sees far-reaching implications for psychology:

If a quantitative scientist (i) believes that measurement consists entirely in making numerical assignments to things according to some rule and (ii) ignores the fact that the measurability of an attribute presumes the contingent … hypothesis that the relevant attribute possesses an additive structure, then that scientist would be predisposed to believe that the invention of appropriate numerical assignment procedures alone produces scientific measurement.

Historically, Fechner (1860) – who coined the word “psychophysics” – is recognised as the father of quantitative psychology. He considered that the only creditworthy contribution psychology could make to science was through quantitative approaches and he believed that reality was “fundamentally quantitative.” His work focused on the instrumental procedures of measurement and dismissed any requirement to clarify the quantitative nature of the attribute under consideration.

His understanding of the logic of measurement was fundamentally flawed in that he merely presumed (under some Pythagorean imperative) that his psychological attributes were quantities. Michell (1997) contends that although occasional criticisms were levied against quantitative measurement in psychology, in general the approach was not questioned and became part of the methodology of the discipline. Psychologists simply assumed that when the study of an attribute generated numbers, that attribute was being measured.

The first official detailed investigation of the validity of psychological measurement from beyond its professional ranks was conducted – under the auspices of the British Association for the Advancement of Science – by the Ferguson Committee in 1932. The non-psychologists on the committee concluded that there was no evidence to suggest that psychological methods measured anything, as the additivity of psychological attributes had not been demonstrated. Psychology moved to protect its place in the academy at all costs. Rather than admitting the error identified by the committee and going back to the drawing board, psychologists sought to defend their modus operandi by attempting a redefinition of psychological measurement. Stevens’ (1958, p. 384) definition that measurement involved “attaching numbers to things” legitimised the measurement practices of psychologists who subsequently were freed from the need to test the quantitative structure of psychological predicates.

Michell (1997, p. 356) declares that presently many psychological researchers are “ignorant with respect to the methods they use.” This ignorance permeates the logic of their methodological practices in terms of their understanding of the rationale behind the measurement techniques used. The inevitable outcome of this new approach to measurement within psychology is that the natural sciences and psychology have quite different definitions of measurement.

Michell (1997, p. 374) believes that psychology’s failure to face facts constitutes a “methodological thought disorder” which he defines as “the sustained failure to see things as they are under conditions where the relevant facts are evident.” He points to the influence of an ideological support structure within the discipline which serves to maintain this idiosyncratic approach to measurement. He asserts that in the light of commonly available evidence, interested empirical psychologists recognise that “Stevens’ definition of measurement is nonsense and the neglect of quantitative structure a serious omission” (Michell, 1997, p. 376).

Despite the writings of Ross (1964) and Rozeboom (1966), for example, Stevens’ definition has been generally accepted as it facilitates psychological measurement by an easily attainable route. Michell (1997, p. 395) describes psychology’s approach to measurement as “at best speculation and, at worst, a pretence at science.”

[W]e are dealing with a case of thought disorder, rather than one of simple ignorance or error and, in this instance, these states are sustained systemically by the almost universal adherence to Stevens’ definition and the almost total neglect of any other in the relevant methodology textbooks and courses offered to students. The conclusion that follows from this history, especially that of the last five decades, is that systemic structures within psychology prevent the vast majority of quantitative psychologists from seeing the true nature of scientific measurement, in particular the empirical conditions necessary for measurement. As a consequence, number-generating procedures are consistently thought of as measurement procedures in the absence of any evidence that the relevant psychological attributes are quantitative. Hence, within modern psychology a situation exists which is accurately described as systemically sustained methodological thought disorder. (Michell, 1997, p. 376)

To make my case, let me first make two fundamental points which should shock those who believe that the OECD is using what is acknowledged as the best available methodology for international comparisons. Both of these points should concern the general public and those who support the OECD’s work. First, the numerals that PISA publishes are not quantities, and second, PISA tables do not measure anything.

To illustrate the degree of freedom afforded to psychological “measurement” by Stevens it is instructive to focus on the numerals in the PISA table. Could any reasonable person believe in a methodology which claims to summarise the educational system of the United States or China in a single number? Where is the empirical evidence for this claim? Three numbers are required to specify even the position of a single dot produced by a pencil on one line of one page of one of the notebooks in the schoolbag of one of the thousands of American children tested by PISA. The Nobel Laureate Sir Peter Medawar refers to such claims as “unnatural science.” Medawar (1982, p. 10) questions such representations using Philip’s (1974) work on the physics of a particle of soil:

The physical properties and field behaviour of soil depend on particle size and shape, porosity, hydrogen ion concentration, material flora, and water content and hygroscopy. No single figure can embody itself in a constellation of values of all these variables in any single real instance … psychologists would nevertheless like us to believe that such considerations as these do not apply to them.

Quantitative psychology, since its inception, has modelled itself on the certainty and objectivity of Newtonian mechanics. The numerals of the PISA tables appear to the man or woman in the street to have all the precision of measurements of length or weight in classical physics. But, by Newtonian standards, psychological measurement in general, and item response theory in particular, simply have no quantities, and do not “measure,” as that word is normally understood.

How can this audacious claim to “measure” the quality of a continent’s education provision and report it in a single number be justified? The answer, as has already been pointed out, is to be found in the fact that quantitative psychology has its own unique definition of measurement, which is that “measurement is the business of pinning numbers on things” (Stevens, 1958, p. 384). With such an all-encompassing definition of measurement, PISA can justify just about any rank order of countries. But this isn’t measurement as that word is normally understood.

This laissez faire attitude wasn’t always the case in psychology. It is clear that, as far back as 1905, psychologists like Titchener recognised that the discipline would have to embrace the established definition of measurement in the natural sciences: “When we measure in any department of natural science, we compare a given measurement with some conventional unit of the same kind, and determine how many times the unit is contained in the magnitude” (Titchener, 1905, p. xix). Michell (1999) makes a compelling case that psychology adopted Stevens’ ultimately meaningless definition of measurement – “according to Stevens’ definition, every psychological attribute is measurable” (Michell, 1999, p. 19) – because psychologists feared that their discipline would be dismissed by the “hard” sciences without the twin notions of quantity and measurement.

The historical record shows that the profession of psychology derived economic and other social advantages from employing the rhetoric of measurement in promoting its services and that the science of psychology, likewise, benefited from supporting the profession in this by endorsing the measurability thesis and Stevens’ definition. These endorsements happened despite the fact that the issue of the measurability of psychological attributes was rarely investigated scientifically and never resolved. (Michell, 1999, p. 192)

The mathematical symbolism in the next paragraph makes clear the contrast between the complete absence of rigorous measurement criteria in psychology and the onerous demands placed on the classical physicist.

[Figure: Hölder’s seven axioms]

An essential step in establishing the validity of the concepts “quantity” and “measurement” in item response theory is an empirical analysis centred on Hölder’s conditions. The reader will search in vain for evidence that quantitative psychologists in general, and item response theorists in particular, subject the predicate “ability” to Hölder’s conditions.

This is because the definition of measurement in psychology is so vague that it frees psychologists of any need to address Hölder’s conditions and permits them, without further ado, to simply accept that the predicates they purport to measure are quantifiable.
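For readers who have not met them, the following is a rough paraphrase of Hölder’s conditions for the magnitudes of an unbounded continuous quantity, in the spirit of Michell and Ernst’s (1996) translation of Hölder’s 1901 paper; the wording and ordering are mine, a sketch rather than a verbatim statement of the seven axioms:

```latex
% A paraphrase of Hölder's conditions for the magnitudes of a quantity Q,
% ordered by < and combined by +. Approximate wording only, after Michell &
% Ernst's (1996) translation of Hölder (1901); not a verbatim statement.
\begin{enumerate}
  \item For any magnitudes $a$ and $b$ of $Q$, exactly one of $a<b$, $a=b$, $b<a$ holds.
  \item For every magnitude $a$ there exists a lesser magnitude $b$ with $b<a$.
  \item For any magnitudes $a$ and $b$ there exists a magnitude $c$ with $c=a+b$.
  \item A sum strictly exceeds its parts: $a+b>a$ and $a+b>b$.
  \item If $a<b$, there exist magnitudes $c$ and $d$ with $a+c=b$ and $d+a=b$.
  \item Addition is associative: $(a+b)+c=a+(b+c)$.
  \item Continuity: every partition of the magnitudes of $Q$ into two non-empty classes,
        with each member of the first class less than every member of the second,
        determines a magnitude that is either the greatest of the first class or the
        least of the second.
\end{enumerate}
```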

Quantitative psychology presumed that the psychological attributes which they aspired to measure were quantitative. … Quantitative attributes are attributes having a quite specific structure. The issue of whether psychological attributes have that sort of structure is an empirical issue … Despite this, mainstream quantitative psychologists … not only neglected to investigate this issue, they presumed that psychological attributes are quantitative, as if no empirical issue were at stake. This way of doing quantitative psychology, begun by its founder, Gustav Theodor Fechner, was followed almost universally throughout the discipline and still dominates it. … [I]t involved a defective definition of a fundamental methodological concept, that of measurement. … Its understanding of the concept of measurement is clearly mistaken because it ignores the fact that only quantitative attributes are measurable. Because this … has persisted within psychology now for more than half a century, this tissue of errors is of special interest. (Michell, 1999, pp. xi – xii)

This essay has sought to challenge the Department of Education’s claim that in founding its methodology on item response theory, PISA is using the best available methodology to rank order countries according to their education provision. As Sir Peter Medawar makes clear, any methodology which claims to capture the quality of a country’s entire education system in a single number is bound to be suspect. If my analysis is correct, PISA is engaged in rank-ordering countries according to the mathematical achievements of their young people, using a methodology which itself has little or no mathematical merit.

Item response theorists have identified two broad interpretations of probability in their models: the “stochastic subject” and “repeated sampling” interpretations. Lord demonstrated that the former leads to absurd and paradoxical results, but never investigated why this should be the case. Had such an investigation been initiated, quantitative psychologists would have been confronted with the profound question of the very role probability plays in psychological measurement. Following a pattern of behaviour all too familiar from Michell’s writings, psychologists simply buried their heads in the sand and, at Lord’s urging, set the stochastic subject interpretation aside and emphasised the repeated sampling approach.

In this way the constitutive nature of irreducible uncertainty in psychology was eschewed for the objectivity of Newtonian physics. This is reflected in item response theory’s “local hidden variables” ensemble model in which ability is an intrinsic measurement-independent property of the individual and measurement is construed as a process of merely checking up on what pre-exists measurement. For this to be justified, Hölder’s seven axioms must apply.

In order to justify the labels “quantity” and “measurement” PISA must produce empirical evidence that the attribute “ability” satisfies Hölder’s axioms. Absent such evidence, it seems very difficult to justify the Department of Education’s claims that (i) “the OECD is at the forefront of the academic debate regarding item response theory,” and (ii) “the OECD is using what is acknowledged as the best available methodology [for international comparison studies].”

Dr Hugh Morrison

(drhmorrison@gmail.com)

References

Boring, E.G. (1929). A history of experimental psychology. New York: Century.

Borsboom, D., Mellenbergh, G.J., & van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110(2), 203-219.

Fechner, G.T. (1860). Elemente der psychophysik. Leipzig: Breitkopf & Hartel. (English translation by H.E. Adler, Elements of Psychophysics, vol. 1, D.H. Howes & E.G. Boring (Eds.). New York: Holt, Rinehart & Winston.)

Gardner, H. (2005). Scientific psychology: Should we bury it or praise it? In R.J. Sternberg (Ed.), Unity in psychology (pp. 77-90). Washington DC: American Psychological Association.

Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.

Medawar, P.B. (1982). Pluto’s republic. Oxford University Press.

Michell, J. (1990). An introduction to the logic of psychological measurement. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.

Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of Psychology, 88, 353-385.

Michell, J. (1999). Measurement in psychology: A critical history of a methodological concept. Cambridge: Cambridge University Press.

Michell, J. (2000). Normal science, pathological science and psychometrics. Theory and Psychology, 10, 639-667.

Michell, J. (2008). Is psychometrics pathological science? Measurement: Interdisciplinary Research and Perspectives, 6, 7-24.

Oppenheimer, R. (1956). Analogy in science. The American Psychologist, 11, 127-135.

Philip, J.R. (1974). Fifty years progress in soil physics. Geoderma, 12, 265-280.

Richardson, K. (1999). The making of intelligence. London: Weidenfeld & Nicolson.

Ross, S. (1964). Logical foundations of psychological measurement. Copenhagen: Munksgaard.

Rozeboom, W.W. (1966). Scaling theory and the nature of measurement. Synthese, 16, 170-223.

Stevens, S.S. (1946). On the theory of scales of measurement. Science, 103, 677-680.

Stevens, S.S. (1958). Measurement and man. Science, 127, 383-389.

Titchener, E.B. (1905). Experimental psychology: A manual of laboratory practice, vol. 2. London: Macmillan.

Wittgenstein, L. (1953). Philosophical Investigations. Oxford: Blackwell.


Andreas Schleicher welcomes Michael Gove and Michael Wilshaw to the Blob

28 Tuesday Jan 2014

Posted by paceni in Grammar Schools


Tags

Andreas Schleicher, Ernst Von Glaserfeld, Michael Gove, Michael Oakeshott, Michael Polanyi, OECD Pisa-progressivism, Sian Williams, Sir Michael Wilshaw, Sir Peter Medawar, The Blob, The Sunday Times


In The Sunday Times of 26.01.14 Sir Michael Wilshaw reacts to accusations that his inspectorate wears progressivist goggles when judging the efficacy of teaching in the Conservative party’s flagship “Free Schools.” (If school inspection in England is anything like that in Northern Ireland, these accusations hold more than a grain of truth.) Sir Michael rejects the charge that “the Blob” (Mr Gove’s dismissive term for left-wing constructivist/progressivist learning theories) continues to influence the judgement of too many OFSTED inspectors. It is revealing, however, that Sir Michael’s defence uses vocabulary dear to every progressivist: he deplores the fact that children are “lectured … in serried ranks,” engage in “rote learning” inappropriate for “learners in the 21st century” because the ability to do well in examinations somehow diminishes the child’s capacity for “think[ing] for themselves.”

Sir Michael warns that Mr Gove is in danger of creating an equally damaging right-wing blob which stifles creativity. But this is to miss a central idea in pedagogy: a core purpose of school is not to offer children opportunities for being creative (the progressivist case) but, rather, to enculturate children into a framework (the practices and traditions of the established school disciplines) without which creativity is impossible. This process of enculturation – what Michael Oakeshott called the “inter-generational conversation” – has teacher authority at its heart. Wittgenstein is in the Gove camp: “Any explanation has its foundations in training. (Educators ought to remember this.)” The Education Secretary also has the support of Michael Polanyi who counselled: “No intelligence, however critical or original, can operate outside such a fiduciary framework.”


In The Sunday Times article, Sir Michael attacks those close to Mr Gove who suggest that the blob continues to hold sway in OFSTED. However, if the Chief Inspector finds himself drawn over the event horizon of education’s very own black hole, Mr Gove has been all but swallowed up. My evidence for this seemingly outrageous claim? Simple: the Education Secretary’s support for OECD/PISA. I have little doubt that the next few years will see a plethora of articles in the education literature interpreting current Conservative education policy as strengthening the case for the blob’s progressivist pedagogy.

Mr Gove is probably unaware that the model of mind which underpins “item response theory” – the statistical model which generates the PISA ranks – accords exactly with Ernst von Glasersfeld’s radical constructivism, when applied to the individual child. Von Glasersfeld sets out the implications of this model of mind: “[A] philosophy of education that believes in teaching right answers is not worth having.” There has to be something ill-conceived in a PISA methodology which claims to capture the education quality of an entire continent in a single number. After all, three numbers are required to locate a simple point in space. Sir Peter Medawar, the Nobel Laureate, characterises such claims as “unnatural science,” by noting that several numbers are required to encapsulate the properties of a particle of soil:

The physical properties and field behaviour of soil depend on particle size and shape, porosity, hydrogen ion concentration, material flora, and water content and hygroscopy. No single figure can embody itself in a constellation of values of all these variables in any single real instance … psychologists would nevertheless like us to believe that such considerations as these do not apply to them.


Clear blob-like leanings surface elsewhere in the OECD programme. Michael Gove has a healthy disregard for theory in education, seeing teaching as a practical activity where precedent guides effective pedagogy. This echoes Oakeshott’s view that “The theoretical understanding of some activity is always the child of practical know-how, and never its parent.” The OECD, on the other hand, rejects the idea that an important purpose of school is to provide the child with a framework. Rather, at the core of its educational enterprise is the nebulous notion of “learning how to learn.” The inventor of this approach (which dates back to the 1980s) summarises it as follows: “The learner is the centre. This process of learning represents a revolutionary about-face from the politics of traditional education.” While Mr Gove is intent on limiting the endless fad-producing theorising of educationalists, the OECD sees teacher education as vital to promoting a progressivist agenda:

“Throughout the world educationalists and teacher instructors promote constructivist views about instruction. … If policy seeks to support constructivist positions, a promising strategy might be to enhance the systematic construction of knowledge about teaching and instruction in teachers’ initial education and professional development. Interventions may be especially important for experienced teachers and for those who teach mathematics. … It is therefore a good sign … that professional development is positively associated with constructivist beliefs across countries.”

In conclusion, blame attaches to both protagonists in The Sunday Times article. The Chief Inspector’s recent outburst against grammar schools will no doubt be construed as support for all the blob holds dear. In addition, Mr Gove’s praise for Andreas Schleicher, in effect, makes him one of the blob’s most significant cheerleaders.


The Conservative Party War on Grammar Schools

16 Monday Dec 2013

Posted by paceni in Grammar Schools


Tags

Andreas Schleicher, conservative party, ConservativesNI, Grammar Schools, John O'Dowd, Michael Gove, OECD Reviews of Evaluation and Assessment in Education Northern Ireland, Ofsted, PISA, Sir Michael Wilshaw


Yet another of Michael Gove’s poorly chosen heroes, Ofsted chief Sir Michael Wilshaw, has been pushed forward to launch a war on the parental right to have their children educated according to their philosophical convictions.

The Observer (http://www.observer.co.uk) front page on Sunday 15th December, 2013 announced:

Ofsted chief declares war on grammar schools

It should be made clear that Michael Wilshaw did not launch the attack alone. He has the support of the Conservative leadership in the form of Michael Gove, Secretary of State for Education in England.

Last week Michael Gove met with the leader of Kent council, Paul Carter and made it clear that he would not approve the expansion of grammar school provision in Kent. Gove claimed he would be “genuinely open” to another application in coming months – after the European elections no doubt.


Sir Michael Wilshaw went even further, though, and stated unequivocally that he would not support the expansion of grammar schools.

Both men are linked in their support for the OECD Pisa tests used to create international comparisons of education systems. Neither has explained, with facts, his position, given that the Pisa rankings have been shown to be fundamentally flawed and useless.


If Michael Gove and Sir Michael Wilshaw have a rebuttal they should publish their answer and make it available to Dr Hugh Morrison, Professor Svend Kreiner and David Spiegelhalter for comment. Mr Gove should also make contact with Diane Ravitch, education policy analyst, former United States Assistant Secretary of Education and author of Reign of Error, who made her views clear: http://wp.me/p2odLa-6Aw

OECD Pisa has attempted to link academic selection and the existence of grammar schools to the UK’s “poor rankings”.

http://www.oecd.org/education/school/NorthernIreland_review.pdf

This report, OECD Reviews of Evaluation and Assessment, was commissioned by another of Michael Gove’s dear friends, Sinn Fein Education Minister in Northern Ireland, John O’Dowd.

There is no doubt that the Conservative Party would be delighted to have the thorn of pushy parents and their demand for new grammar school places disappear before the next elections, but pushing Sir Michael Wilshaw out to begin their war will result in immediate casualties.

The silence from the Conservative Party in Northern Ireland on the attack on grammar schools will cause problems. Michael Gove must rebut his critics with facts or withdraw his support for Andreas Schleicher and the flawed OECD Pisa rankings.

He must then surrender to the will of parents and voters on the issue of grammar schools; otherwise the Conservative Party will pay at the ballot box.


Who are parents to believe on Pisa 2012: Niels Bohr or Andreas Schleicher?

02 Monday Dec 2013

Posted by paceni in Grammar Schools


Tags

Andreas Schleicher, Michael Gove, Niels Bohr and Pisa, Pisa 2012, unambiguous communication

Why Michael Gove should follow India’s lead and
detach himself from PISA

Just ahead of the publication of the PISA league tables on 3rd December, India has withdrawn from the list of countries which will feature in the tables. The Education Secretary, Michael Gove, on the other hand, seems determined to stick with PISA despite recent concerns – published in the Times Educational Supplement in July of this year – about the global league table.

Mr Gove’s Department reiterated its support for PISA in a recently-aired Radio 4 programme entitled “PISA – Global Education Tables Tested.” That programme illustrated the dangers inherent in critiquing PISA in exclusively statistical terms. Statistical modellers have made life too easy for PISA because they simply accept the PISA interpretation of the construct “ability.” It is only when the focus moves to measurement that the profound difficulties inherent in PISA come to the fore with greatest clarity.

Niels Bohr is ranked with Newton and Einstein as one of the greatest physicists of all time. The father of atomic physics taught that “unambiguous communication” is the hallmark of measurement in quantum physics. Importantly, Bohr traced measurement in quantum mechanics and measurement in psychology to a common source, which he referred to as “subject/object holism.”

The physicist cannot have direct experience of the atom, just as the teacher cannot have direct experience of the child’s mind. The microworld manifests itself in the measuring instruments of the physicist just as mind is expressed in the child’s responses to test items. Both the physicist and the psychologist are forced to describe what is beyond direct experience using the language of everyday experience. Bohr demonstrated that measurement in quantum physics and in psychology share a common inescapable constraint, namely, one cannot communicate unambiguously about measurement in either realm without factoring in the measuring instrument. In Heisenberg’s words: “what we observe is not nature in itself but nature exposed to our form of questioning.”

The lesson we learn from Bohr is that in all psychological measurement, the entity measured cannot be divorced from the measuring instrument. When this central tenet of measurement (in quantum physics or in psychology/education) is broken, nonsense always ensues. The so-called Rasch model, which produces the PISA ranks, offends against this central measurement principle and therefore the ranks it generates are meaningless. According to Bohr, the entity measured and the measuring instrument cannot be meaningfully separated. According to PISA, they are entirely independent. Who are we to believe, Niels Bohr or Andreas Schleicher?

The following simple illustration will help make Bohr’s point. Suppose Einstein and a 16-year-old pupil both produce a perfect score on a GCSE mathematics paper. Surely to claim that the pupil has the same mathematical ability as Einstein is to communicate ambiguously? However, unambiguous communication can be restored if we simply take account of the measuring instrument and say, “Einstein and the pupil have the same mathematical ability relative to this particular GCSE paper.” Mathematical ability, indeed any ability, is not an intrinsic property of the individual; rather, it’s a joint property of the individual and the measuring instrument.
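The illustration can even be put in numbers. In the Rasch-style sketch below (Python; the ability values and item difficulties are invented purely for illustration, and no claim is made about Einstein’s θ), an easy paper is all but certain to be answered perfectly by both parties, so the perfect score alone cannot distinguish them; naming the paper restores unambiguous communication.

```python
import math

def p_correct(theta, b):
    # Rasch-style probability of a correct response to an item of difficulty b.
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def p_perfect(theta, difficulties):
    # Probability of a perfect score, treating item responses as independent.
    p = 1.0
    for b in difficulties:
        p *= p_correct(theta, b)
    return p

easy_paper = [-3.0, -2.5, -2.0, -1.5]  # illustrative item difficulties
pupil, einstein = 2.0, 6.0             # invented ability values
print(round(p_perfect(pupil, easy_paper), 3))     # ~0.936
print(round(p_perfect(einstein, easy_paper), 3))  # ~0.999
```

Both produce the same observed result from stipulated abilities four logits apart; the score communicates unambiguously only once the instrument is specified.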

In short, ability isn’t a property of the person being measured; it’s a property of the interaction of the person with the measuring instrument. One is concerned with the between rather than the within. It’s hard to imagine a more stark contrast between Bohr’s teachings and the PISA approach to measurement. Critiques of PISA by statistical modellers, however, have missed this profound conceptual error entirely.

My bookshelves are groaning with books concerned with the wide-ranging debates around the notion of intelligence. All of these debates dissolve away when one eschews the twin notions that intelligence is either a property of the person or is an ensemble property, for the simple definition that intelligence is a property of the interaction between person and intelligence test. To say “John has an IQ of 104” is to communicate ambiguously. An ocean of ink has been spilt because intelligence researchers have missed the simple truth that intelligence is not something we have.

In closing, it is only when the PISA critique shifts from statistical modelling to measurement that the profound nature of PISA’s error becomes clear. PISA produces nonsense because it misconstrues entirely the nature of ability. I trust this essay will be a comfort to those who had the courage to remove India from PISA, and hope it will prompt a similar decision from Michael Gove.

Dr Hugh Morrison


Pisa 2012 major flaw exposed

01 Sunday Dec 2013

Posted by paceni in Grammar Schools


Tags

Andreas Schleicher, David Spiegelhalter, Department of Education, Diane Ravitch, Dr Hugh Morrison, Dr Hugh Morrison Queen's University Belfast, England, ETS, Harvey Goldstein, India leaves Pisa, ludwig Wittgenstein, Michael Davidson OECD PISA, Michael Gove, Niels Bohr and Pisa, Pisa 2012 results, Pisa criticism, Pisa Global Education Tables Tested, Poor performers pull out of Pisa, Professor Svend Kreiner, Programme for International Student Assessment, The London Times, The Times of India, Times Educational Supplement

Tuesday December 3rd 2013 brings the latest OECD PISA results. Before reading a single headline or watching dazzling charts designed to mislead, first read this exposé. If you have a reasoned response to Dr Morrison’s essay do not hesitate to get in touch.

Why Michael Gove should follow India’s lead and
detach himself from PISA

Just ahead of the publication of the PISA league tables on 3rd December, India has withdrawn from the list of countries which will feature in the tables. The Education Secretary, Michael Gove, on the other hand, seems determined to stick with PISA despite recent concerns about the global league table, published in the Times Educational Supplement in July of this year.

Mr Gove’s Department reiterated its support for PISA in a recently aired Radio 4 programme entitled “PISA – Global Education Tables Tested.” That programme illustrated the dangers inherent in critiquing PISA in statistical terms.

Statistical modellers have made life too easy for PISA because they simply accept the PISA interpretation of the construct “ability.” PISA lays claim to measure the relative qualities of education systems around the world, and it is only when the focus moves to measurement that the profound difficulties inherent in Pisa become clear.

Niels Bohr is ranked among the ten greatest physicists of all time. The father of quantum measurement taught that “unambiguous communication” was the hallmark of measurement in physics. Importantly, Bohr traced measurement in quantum mechanics and measurement in psychology to a common source, which he referred to as “subject/object holism.” The physicist cannot have direct experience of the atom, just as the teacher cannot have direct experience of the child’s mind. Both are forced to describe what is beyond direct experience using the language of everyday experience.

Bohr demonstrated that measurement in quantum physics and in psychology share a common inescapable constraint, namely, one cannot communicate unambiguously about measurement in either realm without factoring in the measuring instrument. Wittgenstein’s writings also support this argument.

The lesson we learn from Bohr is that in all psychological measurement, the entity measured cannot be divorced from the measuring instrument. When this central tenet of measurement is broken, nonsense always ensues. The so-called Rasch model, which produces the PISA ranks, offends against this central measurement principle and therefore the ranks it generates are suspect at best.

(The Rasch model is a member of a family of models which all treat what is measured as independent of the measuring tool. Given that these models underpin both computer adaptive testing and the navigation systems of the newly developed MOOCs of higher education, the implications of Bohr’s thinking are clearly far-reaching.)

The following simple illustration will help make Bohr’s point. Suppose Einstein and a GCSE pupil both produce a perfect score on a GCSE paper. Surely to claim that the pupil has the same mathematical ability as Einstein is to communicate ambiguously? However, unambiguous communication can be restored if we simply take account of the measuring instrument and say, “Einstein and the pupil have the same mathematical ability relative to this particular GCSE paper.” Mathematical ability, indeed any ability, is not an intrinsic property of the individual; rather, it’s a joint property of the individual and the measuring instrument.

In short, ability isn’t a property of the person being measured; it’s a property of the interaction of the person with the measuring instrument. One is concerned with the between rather than the within. It’s hard to imagine a more stark contrast. Statistical modelling critiques of PISA, however, have missed this conceptual error entirely.

My bookshelves are groaning with books concerned with the wide-ranging debates around the notion of intelligence. All of these debates dissolve away when one eschews the twin notions that intelligence is either a property of the person or is an ensemble property, for the simple definition that intelligence is a property of the interaction between person and intelligence test. To say “John has an IQ of 104” is to communicate ambiguously. Furthermore, clarification of the nature of measurement in psychology and education has implications for the UK’s approach to school inspection and serves as a challenge for those who reject, out of hand, “teaching to the test.”

In closing, when the PISA critique shifts from statistical modelling to measurement, the profound nature of PISA’s error becomes clear.

I trust this essay will be a comfort to those responsible for removing India from PISA, and hope it will prompt a similar decision in the UK.

Dr Hugh Morrison


The paper which topples OECD Pisa 2012

28 Thursday Nov 2013

Posted by paceni in Grammar Schools


Tags

Andreas Schleicher, Bohr, D Borsboom, Dr Hugh Morrison, Einstein-Podolsky-Rosen paradox, F M Lord, Georg Rasch, J Michell, J S Bell, Michael Davidson OECD PISA, Michael Gove, OECD Pisa, P W Holland, PISA, Programme for International Student Assessment, Psychological Review, Quantum mechanics

A fundamental conundrum in psychology’s standard model of measurement and its consequences for PISA global rankings.

Dr Hugh Morrison
Formerly The Queen’s University of Belfast
(drhmorrison@gmail.com)

Introduction

This paper is concerned with current approaches to measurement in psychology and their use by organisations like the Organisation for Economic Co-operation and Development (OECD) to hold the education systems of nation states to “global” standards. The OECD’s league table – the Programme for International Student Assessment (PISA) – has the potential to throw a country’s education system into crisis. For example, Ertl (2006) documents the effects of so-called “PISA-shock” in Germany, and Takayama (2008) describes a similar reaction in Japan. Given that a country’s PISA ranking can play a role in decisions concerning foreign direct investment, it is important to confirm that the measurement model which produces the ranks is sound. Moreover, the OECD has already spread its remit beyond the PISA league table to include teacher evaluation through its Teaching and Learning International Survey (TALIS). The OECD is currently developing PISA-like tests to facilitate global comparisons of the education on offer in universities through its Assessment of Higher Education Learning Outcomes (AHELO) programme: “Governments and individuals have never invested more in higher education. No reliable international data exists on the outcomes of learning: the few studies that exist are nationally focused” (Rinne & Ozga, 2013, p. 99). Given the sheer global reach of the OECD project, it is important to investigate the coherence of the measurement model which underpins its data.

At the heart of 21st century approaches to measurement in psychology is the Generalised Linear Item Response Theory (GLIRT) approach (Borsboom, Mellenbergh & van Heerden, 2003, p. 204), and the OECD uses Item Response Theory (IRT) to generate its PISA ranks. A particular attraction of IRT for the OECD is its claim that estimates of examinee ability are item-independent. This is vital to PISA’s notion of “plausible values” because each examinee only takes a subset of items from the whole “item battery.” Without the Rasch model’s claim to item-independent ability measures, PISA’s assertion that student performance can be reported on common scales, even when these students have taken different subsets of items, would be invalid.

This paper will focus on the particular IRT model used by OECD, the so-called Rasch model, but the arguments generalise to all IRT models. Proponents of the model portray Rasch as closing the gap between psychological measurement and measurement in the physical sciences. Elliot, Murray and Pearson (1978, pp. 25-26) claim that “Rasch ability scores have many similar characteristics to physical measurement” and Wright (1997, p. 44) argues that the arrival of the Rasch model means that “there is no methodical reason why social science cannot become as stable, as reproducible, and hence as useful as physics.” This paper highlights the incoherence of the model.

The Rasch model and its paradox

The Rasch model is defined as follows:

P(X_is = 1 | θ_s, β_i) = e^(θ_s − β_i) / (1 + e^(θ_s − β_i))

where:

X_is is the response (X) made by subject s to item i;

θ_s is the trait level (ability) of subject s;

β_i is the difficulty of item i; and

X_is = 1 indicates a correct response to the item.

On the face of it, the model uses a mathematical function to allow the psychometrician to compute the probability that a randomly selected individual of ability θ will provide the correct response to an item of difficulty β. A particular ability and difficulty value will be chosen for illustration, but the analysis which follows has universal application. When the values θ = 1 and β = 2, for example, are substituted in the Rasch model, a scientific calculator will quickly confirm that the probability that an individual of ability θ = 1 will respond correctly to an item of difficulty β = 2 is given as 0.27 approximately. It follows that if a large sample of individuals, all with this same ability, respond to this item, 27% will give the correct response.
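
A minimal Python sketch (an illustration of the arithmetic only, not OECD code) reproduces both the 0.27 figure and its frequency reading:

    import math
    import random

    def rasch_p(theta, beta):
        """Rasch model: probability of a correct response."""
        return math.exp(theta - beta) / (1.0 + math.exp(theta - beta))

    p = rasch_p(1.0, 2.0)
    print(round(p, 2))  # 0.27

    # A large sample of examinees, all of ability theta = 1,
    # attempting the same item of difficulty beta = 2:
    random.seed(0)
    n = 100_000
    correct = sum(random.random() < p for _ in range(n))
    print(round(correct / n, 3))  # ~0.269, i.e. roughly 27% correct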

In the Rasch model “the abilities specified in the model are the only factors influencing examinees’ responses to test items” (Hambleton, Swaminathan & Rogers, 1991, p. 10). This results in a paradox. If a large sample of individuals of exactly the same ability respond to the same item, designed to measure that ability, why would 27% get it right and 73% get it wrong? If the item measures ability and the individuals are all of equal ability, then surely the model must indicate that they all get it right, or they all get it wrong?

Does the Rasch model really represent an advance on classical test theory?

The Rasch model is portrayed as a radical advance on what went before – classical test theory (CTT). In classical test theory, “[p]erhaps the most important shortcoming is that examinee characteristics and test characteristics cannot be separated: each can be interpreted only in the context of the other. The examinee characteristic we are interested in is the ‘ability’ measured by the test” (Hambleton, Swaminathan & Rogers, 1991, p. 2).

An examinee’s ability is defined only in terms of a particular test. When the test is “hard,” the examinee will appear to have low ability; when the test is “easy,” the examinee will appear to have higher ability. What do we mean by “hard” and “easy” tests? The difficulty of a test item is defined as ‘the proportion of examinees in a group of interest who answer the item correctly.’ Whether an item is hard or easy depends on the ability of the examinees being measured, and the ability of the examinees depends on whether the items are hard or easy! (Hambleton, Swaminathan & Rogers, 1991, pp. 2-3)

Measures of ability in the Rasch model, on the other hand, are claimed to be completely independent of the items used to measure such abilities. This is vital to the computation of plausible values because no student answers more than a fraction of the totality of PISA items.

A puzzle emerges immediately: if the Rasch model treats as separable what classical test theory treats as profoundly entangled – with Rasch regarded as a significant advance on classical test theory – why does the empirical data not reflect two radically different measurement frameworks? Based on large-scale comparisons of item and person statistics, Fan (1998) notes: “These very high correlations indicate that CTT- and IRT-based person ability estimates are very comparable with each other. In other words, regardless of which measurement framework we rely on, the same or very similar conclusions will be drawn regarding the ability levels of individual examinees” (p. 8), and concludes: “the results here would suggest that the Rasch model might not offer any empirical advantage over the much simpler CTT framework” (p. 9). Fan (1998) confirms Thorndike’s (1982, p. 12) pessimism concerning the likely impact of IRT: “For the large bulk of testing, both with locally developed and standardized tests, I doubt that there will be a great deal of change. The items that we select for a test will not be much different, and the resulting tests will have much the same properties.”
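
Fan’s finding is straightforward to reproduce in simulation. The sketch below is my own illustration, not Fan’s design (invented sample sizes; per-person maximum likelihood with item difficulties treated as known):

    import numpy as np

    rng = np.random.default_rng(1)
    n_persons, n_items = 2000, 40
    theta = rng.normal(0.0, 1.0, n_persons)      # true abilities
    beta = np.linspace(-2.0, 2.0, n_items)       # item difficulties

    # Simulate Rasch-conforming responses.
    p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))
    x = (rng.random((n_persons, n_items)) < p).astype(int)

    # CTT ability estimate: the proportion of items answered correctly.
    ctt = x.mean(axis=1)

    def ml_theta(resp, beta, iters=25):
        """Rasch ability estimate by Newton-Raphson maximum likelihood."""
        t = 0.0
        for _ in range(iters):
            pr = 1.0 / (1.0 + np.exp(-(t - beta)))
            grad = np.sum(resp - pr)         # derivative of log-likelihood
            hess = -np.sum(pr * (1.0 - pr))  # second derivative
            t -= grad / hess
        return t

    # Perfect and zero scores have no finite ML estimate; drop them.
    raw = x.sum(axis=1)
    mask = (raw > 0) & (raw < n_items)
    irt = np.array([ml_theta(x[i], beta) for i in np.where(mask)[0]])

    print(np.corrcoef(ctt[mask], irt)[0, 1])  # typically about 0.97-0.99

The near-perfect correlation between the two sets of estimates is exactly the pattern Fan reports.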

In what follows, the case is made that in the Rasch model, just as in Classical Test Theory, ability cannot be separated from the item used to measure it. Rasch’s model is shown to be incoherent and this has clear consequences for the entire OECD project. Moreover, the arguments presented here undermine psychology’s “standard measurement model” (Borsboom, Mellenbergh & van Heerden, 2003) with implications for all IRT models and Structural Equation Modelling.

The Rasch model: early indications of incoherence

The first hints of Rasch’s confusion appear in the early pages of his 1960 treatise setting out the model, Probabilistic Models for Some Intelligence and Attainment Tests. Rasch’s lifelong obsession with measurement models capable of application to both the social and the natural sciences – captured in his closely associated notions of “models of measurement” and “specific objectivity” – can be recognized in his portrayal of the Rasch model. In constructing his model Rasch (1960, p. 10) rejects deterministic Newtonian measurement for the indeterminism of quantum mechanics:

For the construction of the models referred to I shall take recourse to some points of view … of a more general character. Into the system of classical physics enter a number of fundamental laws, e.g. the Newtonian laws. … A characteristic property of these laws is that they are deterministic. … None the less it should not be overlooked that the laws do not give an accurate picture of nature. … In modern physics … the deterministic view has been abandoned. No deterministic description for e.g. radioactive emission seems within reach, but for the description of such irregularities the theory of probability has proved an extremely valuable tool.

Rasch (1960, p. 11) likens the unmeasured individual to a radioactive nuclide about to decay. Quantum mechanics teaches that, unlike Newtonian mechanics, if one had complete information about the nuclide, one still couldn’t predict the moment of decay with accuracy. Indeterminism is a constitutive feature of quantum mechanics: one cannot know, even if one had complete knowledge of the universe, what will happen next to a quantum system. Irreducible uncertainty applies. For Rasch (1960, p. 11): “Where it is a question of human beings and their actions, it appears quite hopeless to construct models which will be useful for purposes of prediction in separate cases. On the contrary, what a human being actually does seems quite haphazard, none less than radioactive emission.” Rasch (1960, p. 11) makes clear his rejection of deterministic Newtonian models: “This way of speaking points to the possibility of mapping upon models of a kind different from those used in classical physics, more like the models in modern physics – models that are indeterministic.”

Quantum indeterminism has implications for Rasch’s “models of measurement.” In quantum mechanics, measurement doesn’t simply produce information about some pre-existing state. Rather, measurement transforms the indeterminate to the determinate. Measurement causes what is indeterminate to take on a determinate value. In the classical model which Rasch rejects, measurement is simply a process of checking up on what pre-existed the act of measurement, while quantum measurement causes the previously indeterminate to take on a definite value. However, latent variable theorists in general, and Rasch in particular, treat “ability” as an intrinsic attribute of the person, and they view measurement as an act of checking up on that attribute.

The early pages of Rasch’s (1960) text raise doubts about his understanding of the central mathematical conceit of his model: probability. One gets the clear impression that Rasch associates probability with indeterminism. But completely determinate situations can involve probability. The outcome of the toss of a coin is completely determined from the moment the coin leaves the thrower’s hand. If one had knowledge of the initial speed of projection, the angle of inclination of the initial motion to the horizontal, the initial angular momentum, the local acceleration of gravity, and so on, one could use Newtonian mechanics to predict the outcome. Probability is invoked because of the coin-thrower’s ignorance of these parameters. Such probabilities are referred to as subjective probabilities.
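
A toy, Keller-style simulation makes the point (a sketch under crude assumptions of my own: the coin is launched heads-up and the outcome depends only on angular speed and flight time):

    import math
    import random

    def coin_outcome(omega, t_flight):
        """Deterministic toss: count the half-turns of a coin launched
        heads-up with angular speed omega (rad/s), landing after t_flight s."""
        half_turns = int(omega * t_flight / math.pi)
        return "heads" if half_turns % 2 == 0 else "tails"

    # Exact knowledge of the initial conditions yields a certain prediction...
    print(coin_outcome(120.0, 0.50))  # always the same answer

    # ...but ignorance of those conditions forces a (subjective) probability.
    random.seed(2)
    tosses = [coin_outcome(random.uniform(100, 140), random.uniform(0.4, 0.6))
              for _ in range(10_000)]
    print(tosses.count("heads") / len(tosses))  # close to 0.5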

In modern physics, uncertainty is constitutive and not a consequence of the limitations of human beings or their measuring instruments. Quantum physicists deal in objective probability. Finally, the notion of separability – or “specific objectivity,” as Rasch labelled it – is absolutely central to his thinking: “Rasch’s demand for specific objective measurement means that the measure of a person’s ability must be independent of which items were used” (Rost, 2001, p. 28). However, quantum mechanics is founded on non-separability; one cannot break the conceptual link between what is measured and the measuring instrument. The mathematics of the early pages of Rasch (1960) does not augur well for the mathematical coherence of his model, but it is important to set out the case against the model with greater rigour.

Bohr and Wittgenstein: indeterminism in psychological measurement

A possible source of Rasch’s efforts to find “models of measurement” which would apply equally to both psychometric measurement and measurement in physics was the writings of Rasch’s famous countryman, Niels Bohr. (Indeed, Rasch attended lecture courses in mathematics given by the great physicist’s brother.) Bohr argued for all of his professional life that there existed a structural similarity between psychological predicates and the attributes of interest to quantum physicists. Although he never published the details, he believed he had identified an “epistemological argument common to both fields” (Bohr, 1958, p. 27). For Bohr, no psychologist has direct access to mind just as no physicist has direct access to the atom. Both disciplines use descriptive language which was developed to make sense of the world of direct experience, to describe what cannot be available to direct experience. Bohr summarized this common challenge in the question, “How does one use concepts acquired through direct experience of the world to describe features of reality beyond direct experience?”

Given the central preoccupation of this paper, Bohr’s words are particularly striking: “I want to emphasize that what we have learned in physics arose from a situation where we could not neglect the interaction between the measuring instrument and the object. In psychology, we meet the quite similar situation” (Favrholdt, 1999, p. 203). Also, prominent psychologists echo Bohr’s thinking: “The study of the human mind is so difficult, so caught in the dilemma of being both the object and the agent of its own study, that it cannot limit its inquiries to ways of thinking that grew out of yesterday’s physics” (Bruner, 1990, p. xiii). Given that Bohr never developed his ideas for the epistemological argument common to both fields, what follows also addresses en passant a lacuna in Bohr scholarship.

If all this sounds fanciful (after all, what possible parallels can be drawn between Rasch’s radionuclide on the point of decaying and an individual on the point of answering a question?) it is instructive to return to Rasch’s (1960, p. 11) claim that “what a human being does seems quite haphazard, none less than radioactive emission.” In fact there are striking parallels between the experimenter’s futile attempts to predict the moment of decay and the psychometrician’s attempts to predict the child’s response to a (hitherto unseen) addition problem such as “68 + 57 = ?”

If one restricts oneself to all of the facts about the nuclide, the outcome is completely indeterminate. Similarly, Wittgenstein’s celebrated rule-following argument (central to his philosophies of mind, mathematics and language), set out in his Philosophical Investigations, makes clear that if one restricts oneself to the totality of facts (inner and outer) about the child, these facts are in accord with the right answer (68 + 57 = 125) and an infinity of wrong answers. Mathematics will be used for illustration but the reasoning applies to all rule-following. The reader interested in an accessible exposition of this claim is directed to the second chapter of Kripke’s (1982) Wittgenstein on Rules and Private Language. (The reader should come to appreciate the power of the rule-following reasoning without being troubled by Kripke’s questionable take on the so-called skeptical argument.) The author will now attempt the barest outlines of Wittgenstein’s writing on rule-following.

By their nature, human beings are destined to complete only a finite number of arithmetical problems over a lifetime. The child who is about to answer the question “68 + 57 = ?” for the first time has, of necessity, a finite computational history in respect of addition. Through mathematical reasoning which dates back to Leibniz, this finite number of completed addition problems can be brought under an infinite number of different rules, only one of which is the rule for addition. In short, any answer the child gives to the problem can be demonstrated to be in accord with a rule which generates that answer and all of the answers the child gave to all of the problems he or she has tackled to date. If one had access to the totality of facts about the child’s achievements in arithmetic, one couldn’t use these facts to predict the answer the child will give to the novel problem “68 + 57 = ?” because one can always derive a rule which generates the child’s entire past problem-solving history and any particular answer to “68 + 57 = ?”
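
A small sketch makes this Leibnizian point concrete (the “history” below is hypothetical, and the deviant rule is just one of infinitely many that could be built the same way):

    # Every addition problem the (hypothetical) child has answered so far.
    history = [(2, 3), (10, 4), (7, 7), (31, 12), (50, 9)]

    def addition(a, b):
        return a + b

    def deviant_rule(a, b):
        """Agrees with addition on every problem in the child's history,
        yet yields a different answer to the unseen problem 68 + 57."""
        correction = 1
        for (a_i, _b_i) in history:
            correction *= (a - a_i)  # zero whenever a appeared in the history
        return a + b + correction

    for (a, b) in history:
        assert deviant_rule(a, b) == addition(a, b)  # indistinguishable so far

    print(addition(68, 57))      # 125
    print(deviant_rule(68, 57))  # a perfectly 'rule-governed' wrong answer

Both rules fit the child’s entire computational history; only the unseen problem separates them.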

Now what of facts concerned with the contents of the child’s mind? Surely an all-seeing God could peer into the child’s mind and determine which rule was guiding the child’s problem-solving? By substituting the numbers 68 and 57 into the rule, God could predict with certainty the child’s response. Alas, having access to inner facts (about the mind or brain) won’t help because having a rule in mind is neither sufficient nor necessary for responding correctly to mathematical problems. Is having a rule in mind sufficient? Clearly not since all pupils taking GCSE mathematics, for example, have access to the quadratic formula and yet only a fraction of these pupils will provide the correct answer to the examination question requiring the application of that formula. Is having the rule in mind necessary? Once again, clearly not because one can be entirely ignorant of the quadratic formula and yet produce the correct answers to algebraic problems involving quadratics using alternative procedures like “completing the square,” graphical methods, the Newton-Raphson procedure, and so on.

It is important to be clear what is being said here. If one could identify an addition problem beyond the set of problems Einstein had completed during his lifetime, is the claim that one couldn’t predict with certainty Einstein’s response to that problem? Obviously not. But the correct answer and an infinity of incorrect answers are in keeping with all the facts (inner and outer) about Einstein. When one is restricted to these facts, Einstein’s ability to respond correctly is indeterminate. In summary, before the child answers the question “68 + 57 = ?” his or her ability with respect to this question is indeterminate. The moment he or she answers, the child’s ability is determinate with respect to the question (125 is pronounced correct, and all other answers are deemed incorrect). One might portray this as follows: before responding the child is right and wrong and, at the moment of response, he or she is right or wrong.

The problem with the Rasch model

Ability only becomes determinate in the context of a measurement; it is indeterminate before the act of measurement. The conclusion is inescapable – ability is a relational property rather than something intrinsic to the individual, as psychology’s standard measurement model would have it. A definite ability cannot be ascribed to an individual prior to measurement. Ability is a joint property of the individual and the measurement instrument; take away the instrument and ability becomes indeterminate. It follows that ability (and intelligence, and self-concept, and so on) is a property of the interaction between individual and measuring instrument rather than an intrinsic property of the individual. If psychological constructs were viewed as joint properties of individuals and measuring instruments, then intractable questions such as “what is intelligence?” and “what is memory?” need no longer trouble the discipline.

What can be concluded in respect of Rasch? It is clear that the Rasch model is no more capable of separating ability from the item used to measure it than was its predecessor, classical test theory. Pick up any textbook on IRT and one finds the same assumption stated again and again in model development: individuals carry a determinate ability with them from moment to moment and measurement involves checking up on that ability. The ideas of Bohr and Wittgenstein can be used to reject this; for them, measurement effects a “jump” from the indeterminate to the determinate, transforming a potentiality to an actuality.

In simple terms, ability has two facets: it is indeterminate before measurement and determinate immediately afterwards. The single description of the standard measurement model is replaced by two mutually exclusive descriptions – indeterminate before measurement, determinate only with respect to a measurement context. Neither description can be dispensed with; the indeterminate and the determinate are mutually exclusive facets of one and the same ability.

Returning to the child who has been taught to add but hasn’t yet encountered the question “68 + 57 = ?” what can be said of his or her ability with respect to this question? When one ponders ability as a thing-in-itself, it’s tempting to think of it as something inner, something that resides in the child prior to being expressed when the child answers. If ability is to be found anywhere, surely it’s to the unmeasured mind one should look? Isn’t it tempting to think of it as something the child “carries” in his or her mind? When the focus is on ability as a thing-in-itself, it seems the child’s eventual answer to the question is somehow inferior; it’s the mere application of the child’s ability rather than the ability itself.

The concept of causality in classical physics is replaced by the notion of “complementarity” in quantum mechanics. Complementarity treats pre-measurement indeterminism and the determinate outcome of measurement as non-separable. Whitaker (1996, p. 184) portrays complementarity as “mutual exclusion but joint completion.” One cannot meaningfully separate the pre-measurement facet of ability from its measurement-determined counterpart. The analogue of Bohr’s complementarity is what Wittgensteinians refer to as first-person/third-person asymmetry. The first-person facet of ability (characterised by indeterminism) and the third-person measurement perspective cannot be meaningfully separated. Suter (1989, pp. 152-153) distinguished the first-person/third-person symmetry of Newtonian attributes from the first-person/third-person asymmetry of psychological predicates: “This asymmetry in the use of psychological and mental predicates – between the first-person present-tense and second- and third-person present-tense – we may take as one of the special features of the mental.” Nagel (1986, p. 22) notes: “the conditions of first-person and third-person ascription of an experience are inextricably bound together in a single public concept.”

This non-separability of first-person and third-person perspectives obviates the need to conclude, with Rasch, that the individual’s response need be “haphazard.” The first-person indeterminism detailed earlier seems to indicate that individuals offer responses entirely at random. After all, the totality of facts is in keeping with an infinity of answers, only one of which is correct. But one need only infer “random variation located within the person” (Borsboom, 2005, p. 55) if one mistakenly treats the first-person facet as separable from the third-person. (The author’s earlier practice of stressing the restriction to the totality of facts about the individual was intended to highlight this taken-for-granted separability.) Lord’s (1980) admonition that item response theorists eschew the “stochastic subject” interpretation for the “repeated sampling” interpretation led IRT practitioners astray by purging entirely the first-person facet from an indivisible whole. One only arrives at conclusions that are “absurd in practice” (p. 227) if one follows Lord (1980) and divorces ability from the item which measures it. Like Rasch, Lord failed to grasp that the within-subject and the between-subject aspects of psychological measurement are profoundly entangled.

Holland, Lord and the ensemble interpretation as the route out of paradox

Holland (1990) repeats Lord’s error by eschewing the stochastic subject interpretation for the random sampling interpretation, despite acknowledging “that most users think intuitively about IRT models in terms of stochastic subjects” (p. 584). The stochastic subject rationale traces the probabilities of the Rasch model to randomness in the individual subject:

Even if we know a person to be very capable, we cannot be sure that he will solve a certain difficult problem, not even a much easier one. There is always a possibility that he fails – he may be tired or his attention is led astray, or some other excuse may be given. And a person of slight ability may hit upon the correct solution to a difficult problem. Furthermore, if the problem is neither “too easy” nor “too difficult” for a certain person, the outcome is quite unpredictable. (Rasch, 1960, p. 73)

Rasch is proposing what quantum physicists call a “local hidden variables” measurement model. While Wittgenstein argues that ability is indefinite before the act of measurement (an act which effects a “jump” from indefinite to definite), psychometricians in general, and Rasch in particular, treat ability as definite before measurement. The local hidden variables of the Rasch model are variables such as examinee fatigue, degree of distraction, and any other influence militating against his or her capacity to provide a correct answer. Rasch is suggesting that if one had complete information concerning the examinee’s ability, his or her level of fatigue, propensity for distraction, and so on, one could predict, in principle, the examinee’s response with a high degree of confidence. It is the absence of variables capturing fatigue, attention, and so on, from the Rasch algorithm that makes its probabilistic nature inevitable. In this local hidden variables model, probability is invoked because of the measurer’s ignorance of the effects of fatigue, attention loss, and so on.

But Bell (1964) proved beyond doubt that local hidden variables models are impossible in quantum measurement. One can avoid the difficulties thrown up by Bell’s celebrated inequalities by treating unmeasured predicates as indefinite (Fuchs, 2011). This would have profound implications for how one conceives of latent variables in the Rasch model. If local hidden variables are ruled out, latent variables could not be assigned investigation-independent values. Ability only takes on a definite value in a measurement context. IRT can no more separate these two entities (ability and the item used to measure it) than could classical test theory. The “random sampling” approach that Holland (1990) recommends is a so-called “ensemble” interpretation. The definitive text on ensembles – Home and Whitaker (1992) – finds ensembles illegitimate because they mistakenly replace “superpositions” by “mixtures” (Whitaker, 2012, p. 279).

One gets the distinct impression from the IRT literature that the random sampling method is being urged on the field because of embarrassments that lurk in the stochastic subject model. For example, Lord (1980, p. 228) refers to the latter as “unsuitable”:

The trouble comes from an unsuitable interpretation of the practical meaning of the item response function … If we try to interpret Pi(A) as the probability that a particular examinee A will answer a particular item i correctly, we are likely to reach absurd conclusions. (Lord, 1980, p. 228)

Lord (1980) and Holland (1990) both attempt to avoid embarrassment by taking the simple step of ignoring the stochastic subject for the comfort of an ensemble interpretation. Home and Whitaker (1992) close their text with the words: “[W]e see the ensemble interpretation as the ‘comfortable’ option, creating the illusion that all difficulties may be removed by taking one simple step” (p. 311).

What of the paradox identified earlier?

It is now possible to address the paradox presented earlier. Here is a restatement: If a large sample of individuals of exactly the same ability respond to the same item, designed to measure that ability, why would 27% get it right and 73% get it wrong? Suppose a large number of individuals answer a question (labelled Q1), and, of those who give the correct answer, 100 individuals, say, are posed a second question (labelled Q2). When these 100 individuals respond to Q2, 27% give the correct answer and 73% respond with the wrong answer. What can be said about the ability of each individual immediately after answering Q1 but before answering Q2? Given the natural tendency to think of ability as an attribute of mind, it seems reasonable to focus on the individual’s ability “between questions” as it were.

Poised between questions, each individual’s ability with respect to Q1 is determinate; they have answered Q1 correctly moments before. What of their ability with respect to Q2, the question they have yet to encounter? According to the reasoning presented above, all the facts are in keeping with both a correct and an incorrect answer. The individual’s ability relative to Q2 is indeterminate. Quantum mechanics portrays such states as “superpositions” – the individuals all have the same indefinite ability characterised as: “correct with probability 27% and incorrect with probability 73%.” It is easy to see why 100 individuals each with an ability characterised in this way could be portrayed as subsequently producing 27 correct responses and 73 incorrect responses to Q2.
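
On this reading the 27%/73% split is a fact about the ensemble, not about any single mind, as a last short sketch illustrates (hypothetical cohort; the probability is taken from the worked example above):

    import random

    random.seed(3)
    p_q2 = 0.27  # model probability of answering Q2 correctly

    # 100 individuals who have just answered Q1 correctly: determinate with
    # respect to Q1, indeterminate with respect to the unseen Q2.
    cohort = 100
    correct_q2 = sum(random.random() < p_q2 for _ in range(cohort))
    print(correct_q2, cohort - correct_q2)  # roughly 27 and 73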

In this approach the paradox dissolves. All 100 individuals have definite abilities (as measured by Q1), but only 27% go on to answer Q2 correctly. Note the crucial step in the logic required to dissolve the paradox: each individual’s ability is simultaneously determinate with respect to Q1 and indeterminate with respect to Q2. A change in question (from Q1 to Q2) effects a radical change from indeterminate to determinate. It is therefore only meaningful to talk about a definite ability in relation to a measurement context. Ability is a joint property of the individual and the item; pace Rasch, they cannot be construed as separable. It follows that the examiner (the person who selects the item) participates in the ability manifest in a response to that item. Pace Rasch, measurement in education and psychology is a more dynamic affair than measurement in classical physics: the former is dynamic, while the latter is merely a matter of checking up on what is already there. Because that which is measured is inseparable from the question posed, the measurer participates in what he or she “sees.” Newtonian detachment is as unattainable in psychology and education as it is in quantum theory.

Conclusion

Returning to the real-life consequences of this refutation of latent variable modelling in general, and Rasch modelling in particular: one cannot escape the conclusion that the OECD’s claims in respect of its PISA project have scant validity, given that those claims depend centrally on the clean separability of ability from the items designed to measure it.

References

Bell, J.S. (1964). On the Einstein-Podolsky-Rosen paradox. Physics, 1, 195-200.
Bohr, N. (1929/1987). The philosophical writings of Niels Bohr: Volume 1 – Atomic theory and the description of nature. Woodbridge: Ox Bow Press.
Bohr, N. (1958/1987). The philosophical writings of Niels Bohr: Volume 2 – Essays 1933 – 1957 on atomic physics and human knowledge. Woodbridge: Ox Bow Press.
Borsboom, D. (2005). Measuring the mind: conceptual issues in contemporary psychometrics. Cambridge: Cambridge University Press.
Borsboom, D., Mellenbergh, G.J., & van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110 (2), 203-219.
Bruner, J.S. (1990). Acts of meaning. Cambridge, MA: Harvard University Press.
Davies, E.B. (2003). Science in the looking glass. Oxford: Oxford University Press.
Davies, E.B. (2010). Why beliefs matter. Oxford: Oxford University Press.
Elliot, C.D., Murray, D., & Pearson, L.S. (1978). The British ability scales. Windsor: National Foundation for Educational Research.
Ertl, H. (2006). Educational standards and the changing discourse on education: the reception and consequences of the PISA study in Germany. Oxford Review of Education, 32(5), 619-634.
Fan, X. (1998). Item response theory and classical test theory: an empirical comparison of their item/person statistics. Educational and Psychological Measurement, 58(3), 357-381.
Favrholdt, D. (Ed.). (1999). Niels Bohr collected works (Volume 10). Amsterdam: Elsevier Science B.V.
Fuchs, C.A. (2011). Coming of age with quantum information: Notes on a Paulian idea. Cambridge: Cambridge University Press.
Hacker, P.M.S. (1993). Wittgenstein, mind and meaning – Part 1 Essays. Oxford: Blackwell.
Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage Publications.
Hark ter, M.R.M. (1990). Beyond the inner and the outer. Dordrecht: Kluwer Academic Publishers.
Holland, P.W. (1990). On the sampling theory foundations of item response theory models. Psychometrika, 55(4), 577-601.
Home, D., & Whitaker, M.A.B. (1992). Ensemble interpretations of quantum mechanics. A modern perspective. Physics Reports (Review section of Physics Letters), 210(4), 223-317.
Jöreskog, K.G., & Sörbom, D. (1993). LISREL 8 user’s reference guide. Chicago: Scientific Software International.
Kalckar, J. (Ed.). (1985). Niels Bohr collected works (Volume 6). Amsterdam: Elsevier Science B.V.
Kripke, S.A. (1982). Wittgenstein on rules and private language. Oxford: Blackwell.
Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers.
Michell, J. (1997). Quantitative science and the definition of measurement in psychology. British Journal of Psychology, 88, 355-383.
Nagel, T. (1986). The view from nowhere. New York: Oxford University Press.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danmarks Paedagogiske Institut.
Rinne, R., & Ozga, J. (2013). The OECD and the global re-regulation of teachers’ work: Knowledge-based regulation tools and teachers in Finland. In T. Seddon & J.S. Levin (Eds.), World yearbook of education (pp. 97-116). London: Routledge.
Rost, J. (2001). The growing family of Rasch models. In A. Boomsma, M.A.J. van Duijn, & T.A.B. Snijders (Eds.), Essays on item response theory (pp. 25-42). New York: Springer.
Sobel, M.E. (1994). Causal inference in latent variable models. In A. von Eye & C.C. Clogg (Eds.), Latent variable analysis (pp. 3-35). Thousand Oaks: Sage.
Suter, R. (1989). Interpreting Wittgenstein: A cloud of philosophy, a drop of grammar. Philadelphia: Temple University Press.
Takayama, K. (2008). The politics of international league tables: PISA in Japan’s achievement crisis debate. Comparative Education, 44(4), 387-407.
Thorndike, R.L. (1982). Educational measurement: Theory and practice. In D. Spearritt (Ed.), The improvement of measurement in education and psychology: Contributions of latent trait theory (pp. 3-13). Melbourne: Australian Council for Educational Research.
Whitaker, A. (1996). Einstein, Bohr and the quantum dilemma. Cambridge: Cambridge University Press.
Whitaker, A. (2012). The new quantum age. Oxford: Oxford University Press.
Wittgenstein, L. (1953). Philosophical Investigations. G.E.M. Anscombe, & R. Rhees (Eds.), G.E.M. Anscombe (Tr.). Oxford: Blackwell.
Wittgenstein, L. (1980a). Remarks on the philosophy of psychology Volume 1 (Edited by G.E.M. Anscombe & G.H. von Wright; translated by G.E.M. Anscombe). Oxford: Basil Blackwell.
Wittgenstein, L. (1980b). Remarks on the philosophy of psychology Volume 2 (Edited by G.H. von Wright & H. Nyman; translated by C.G. Luckhardt & M.A.E. Aue). Oxford: Basil Blackwell.
Wright, B.D. (1997). A history of social science measurement. Educational Measurement: Issues and Practice, 16(4), 33-52.
Wright, C. (2001). Rails to infinity. Cambridge, MA: Harvard University Press.



Make sure to listen to Radio 4 at 8pm 25/11/2013

25 Monday Nov 2013

Tags

Andreas Schleicher, BBC, BBC Education, David Spiegelhalter, fundamental flaw in Pisa, Harvey Goldstein, Michael Davidson OECD PISA, Michael Gove, Mike Hally, PISA, Professor Svend Kreiner, Times Educational Supplement



Posted by paceni | Filed under Grammar Schools

≈ Leave a comment

PISA International Tests and “Reign of Error”: Who do you trust?

28 Monday Oct 2013

Posted by paceni in Grammar Schools

≈ Leave a comment

Tags

Andreas Schleicher, Diane Ravitch, International education rankings, International rankings, International test scores, OECD Pisa, Reign of Error, TES


In Reign of Error: The Hoax of the Privatization Movement by Diane Ravitch
On Twitter @DianeRavitch
ISBN 978-0-385-35088-4 (0-385-35088-0)


“Trying to raise America’s test scores in comparison to those of other nations is worse than pointless. It looks to be harmful, for the only way to do it is to divert time, energy, skill and resources away from those other factors which propel the U.S. to the top of the heap on everything that matters: life, liberty, and the pursuit of happiness.”

http://www.tes.co.uk/article.aspx?storycode=6344672

They are the world’s most trusted education league tables. But academics say the Programme for International Student Assessment rankings are based on a ‘profound conceptual error’. So should countries be basing reforms on them?

