Why the Ofqual/CCEA proposal for using teacher judgement to grade 2020 GCSE/A level examinations is indefensible.


Introduction

The claim made in this essay is that the academic literature clearly indicates that the capacity of teachers to predict their pupils’ grades falls far below acceptable levels. Furthermore, the evidence that teachers can rank-order their pupils within-grade is scant to non-existent. Indeed, the Awarding Bodies could not substantiate the claim that any of their examinations rank-order pupils on the construct they purport to measure. It follows that the only defensible solution is to provide two measures per examination: (i) a teacher-predicted grade (without an associated rank-order); and (ii) a test that could be used – if the pupil so decides – to override the teacher prediction, in relevant cases. Where a pupil cannot take the test, he or she must accept the teacher-predicted grade. There is no credible evidence in the literature that one can mobilize some “standardization” algorithm (which has yet to be detailed by Ofqual or CCEA) to somehow correct for any excesses in teacher judgement.

The perils of expert prediction of all types

As far back as 1954 Paul Meehl (Clinical versus Statistical Prediction: A theoretical analysis and a review of the evidence) analysed the ability of a range of teachers to predict measures of academic success, and found scant evidence that this could meet acceptable standards. Meehl’s book ranged far beyond teachers’ predictions of grades to consider, for example, expert predictions of an individual’s probability of violating parole, predictions of success in pilot training, predictions of criminal recidivism, and so on. In his book Thinking, Fast and Slow the Nobel Laureate Daniel Kahneman (2011, p. 225) endorsed Meehl’s findings and stressed that the range of studies demonstrating the limitations of experts’ abilities to predict the future had expanded greatly since Meehl’s book was published:

“Another reason for the inferiority of expert judgement is that humans are incorrigibly inconsistent in making summary judgements of complex information. When asked to evaluate the same information twice, they frequently give different answers. The extent of the inconsistency is often a matter of real concern. Experienced radiologists who evaluate chest X-rays as “normal” or “abnormal” contradict themselves 20% of the time when they see the same picture on separate occasions. A study of 101 independent auditors who were asked to evaluate the reliability of internal corporate audits revealed a similar degree of inconsistency. A review of 41 separate studies of the reliability of judgements made by auditors, psychologists, pathologists, organizational managers, and other professionals suggests that this level of inconsistency is typical, even when a case is re-evaluated within a few minutes. Unreliable judgements cannot be valid predictors of anything.”

The perils of predicting within-grade rank order

Ofqual and CCEA are requiring teachers to rank order their pupils according to their achievement in mathematics, English, biology, and so on. However, no GCSE or A level product designed by the Awarding Bodies can itself perform this feat. The rank-ordering of candidates on the construct “achievement in geography,” for example, is a validity issue and the Awarding Bodies have a very, very poor record in this area.

In 1991 an expert on the work of the examination boards, Robert Wood, summarized his conclusions in the book Assessment and Testing: A survey of research commissioned by the University of Cambridge Local Examination Syndicate (UCLES). On pages 147–151 he wrote:

“If an examining board were to be asked point blank about the validities of its offerings or, more to the point, what steps it takes to validate the grades it awards, what might it say? … The examining boards have been lucky not to have been engaged in validity argument. … Nevertheless, the extent of the boards’ neglect of validity is plain to see once attention is focused. Whenever boards make claims that they are measuring the ability to make clear reasoned judgements, or the ability to form conclusions (both examples from IGCSE and Economics), they have a responsibility to at least attempt a validation of the measures. … The boards know so little about what they are assessing that if, for instance, it were to be said that teachers assess ability … rather than achievement, the boards would be in no position to defend themselves. … As long as examination boards make claims that they are assessing this or that ability or skill, they are vulnerable to challenge from disgruntled individuals.”

The claim that a GCSE or A level examination could rank-order candidates on some appropriate construct would require the Awarding Bodies to use Structural Equation Modelling to compute three indices: root mean-square residual, adjusted goodness-of-fit, and chi-squared divided by degrees of freedom. To claim a rank order, these three statistics would have to be demonstrated to satisfy relevant inequalities. Can it be reasonable to ask teachers to predict something that is beyond the capabilities of the GCSE and A level examinations themselves?
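The three inequalities mentioned above can be made concrete in a short sketch. The cutoff values used here (root mean-square residual below 0.05, adjusted goodness-of-fit above 0.90, χ²/df below 3) are conventional rules of thumb from the SEM literature, not figures published by Ofqual or the Awarding Bodies, and the index values passed in are invented for illustration.

```python
# Illustrative check of conventional SEM fit-index cutoffs.
# The cutoffs (RMSR < 0.05, AGFI > 0.90, chi-square/df < 3) are
# commonly cited conventions, not Ofqual or Awarding Body figures.

def fit_supports_rank_order(rmsr: float, agfi: float,
                            chi_sq: float, df: int) -> bool:
    """Return True only if all three indices satisfy the
    conventional inequalities discussed in the text."""
    return rmsr < 0.05 and agfi > 0.90 and (chi_sq / df) < 3.0

# Hypothetical model results for a single examination:
print(fit_supports_rank_order(rmsr=0.08, agfi=0.85, chi_sq=350.0, df=100))
# -> False: such a model would not support a rank-order claim
```

No Awarding Body has published index values of this kind for a GCSE or A level examination, which is precisely the essay's point.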

The resolution

Needless to say, staff at Ofqual and CCEA are mandated to provide young people with grades that are as error-free as possible. They should take heed of Paul Meehl’s counsel in respect of teachers’ capacities to anticipate the future: “When one is dealing with human lives and life opportunities, it is immoral to adopt a mode of decision-making which has been demonstrated repeatedly to be … inferior.” If teacher judgement (omitting the requirement to rank-order) is to be used to forecast grades, pupils must also be offered speedy access to a public examination which protects them from the well-documented vagaries of teacher prediction.

Stephen Elliott

A Response to the Queen’s University Belfast, Stranmillis College Report of the Expert Panel on Underachievement


A Fair Start, the report of the Expert Panel on Educational Underachievement in Northern Ireland, fails to identify a significant culprit behind the achievement gap: the Northern Ireland Curriculum (NIC), followed by all children in our schools.  While Scotland’s “Curriculum for Excellence” (CFE) has many, many similarities to the NIC, the University of Edinburgh’s Professor Paterson’s analysis of the causes of Scottish underachievement has little in common with Dr Purdy’s.  In a 2018 article – entitled “Scotland’s Curriculum for Excellence: the betrayal of a whole generation?” – Paterson traces underachievement in Scotland to that country’s 2004 decision to eschew traditional approaches to teaching and learning for a “constructivist” curriculum.  Constructivism’s “child-centred” approach has strong links to underachievement in disadvantaged children, and Paterson is not surprised that “the curriculum has recently been the centre of widespread disquiet.”

Like the NIC, Scotland’s CFE was lauded on all sides.  Paterson writes: “An outsider might notice the remarkable consensus that has accompanied its development at the heart of policy for school education.  The report which launched it in 2004 is still endorsed by all five political parties in the Scottish Parliament.  That report proclaimed a child-centred philosophy that ran counter to the educational ideas that have dominated in England since the 1980s.”  According to Paterson, “The universities officially accepted the ideas uncritically, with their teacher-education faculties notably enthusiastic.”  Dr Purdy teaches at Stranmillis College whose website strongly endorses the NIC.

Now we get to the nub of the problem.  The NIC and the CFE have adopted an approach to learning – called constructivism – which has been demonstrated to damage the life chances of the disadvantaged.  Paterson writes: “But the reason why the new curriculum is a plausible culprit for the [attainment] decline lies in what it gets children to learn.  It belongs to that strand of curricular thinking known as constructivism.”  The shortcomings of constructivism were well known when the NIC and the CFE were being launched.  The philosopher Michael Devitt warned: “I have a candidate for the most dangerous contemporary intellectual tendency, it is … constructivism.”

In this departure from common sense, the child is construed as constructing meanings, intentions, understandings and so on, in his or her head.  Mental attributes are hidden inside the child’s mind and have no external facets.  Gadsby writes: “Constructivism [is] a view of teaching and learning predicated upon a simple but profound principle that learning is something which can only happen inside the heads of learners.”  The NIC interprets this as shifting the responsibility for learning from teacher to child.  On page 29 of CCEA’s Assessment for Learning: a practical guide appears the exhortation: “Crucially, they [pupils] need to take responsibility for their own learning and its improvement.  We [teachers] can’t do it for them!”  (It is difficult not to attach the following naive interpretation to CCEA’s words: since learning happens in the child’s head, it is “visible” only to the child.  The teacher, on the other hand, has no direct access to what the child has “constructed” in the privacy of his or her mind.)  Hey presto, the teacher’s responsibility for the child’s learning is much reduced.

Sir Roger Scruton

It isn’t difficult to see why a philosopher like Devitt might be suspicious of any portrayal of understanding or meaning as exclusively inner “activities.”  Wittgenstein established that any account of meaning which omits the external world collapses into incoherence.  Mark Rowlands writes: “According to Wittgenstein, to mean, intend or understand something by a sign … is to possess a capacity to adjust one’s use of the sign to bring it into line with custom or practice.  And this connects meaning, intending and understanding with structures [viz. practices] that are external to the subject of this meaning, intending and understanding.”  The late Sir Roger Scruton rejected “all attempts to understand the human mind in isolation from the social practices through which it finds expression,” and the great American mathematician and philosopher Hilary Putnam concluded that “meanings just ain’t in the head.”

Hilary Putnam

In the traditional classroom, teacher and pupils are jointly responsible for learning.  Moreover, the teacher needn’t have direct access to the child’s mind in order to establish if he or she understands long division, for example.  What the child writes in a test (something external) serves as criteria for the ascription of understanding.  In the common-sense world of the traditional classroom, the biology teacher need not peer into a child’s mind before concluding that he or she grasps the meaning of “gene.”

Finally, it is important to link the Northern Ireland Curriculum – with constructivism at its core – with underachievement.  In her 2000 book “The Academic Achievement Challenge” the distinguished Harvard academic Jeanne Chall conducted a detailed study of a century of research on the effective teaching of disadvantaged children, finding no evidence for the efficacy of methods which depart from traditional principles.  The Expert Panel make no mention of Chall’s wide-ranging review.

On page 171, Chall writes: “Whenever the students were identified as coming from families of low socioeconomic status, they achieved at higher levels when they received a more formal, traditional education. … The teacher-centred approach was also more effective for students with learning disabilities at all social levels.  On the whole, the research found that at-risk students at all social levels achieved better academically when given a traditional education.”  On page 182, Chall draws this conclusion from 100 years of peer-reviewed evidence: “The major conclusion of my study in this book is that a traditional teacher-centred approach to education generally results in higher academic achievement than a progressive student-centred approach.  This is particularly so among students who are less well prepared for academic learning – poor children and those with learning difficulties at all social and economic levels.”

The assignment of the NIC to the bin and a return to traditional approaches to teaching and learning could do much to offset the £180,000,000 figure quoted in the Expert Panel’s report.

Why Ofqual and CCEA are set to repeat the failings of 2020 in respect of 2021 awarded grades for GCSE, AS and A2.


Dr Hugh G Morrison (The Queen’s University of Belfast [retired])

In 2020 the impact of Covid-19 prompted the Conservative government to replace public examinations (GCSE, AS and A2) by a process involving teacher-predicted grades.  In this approach, an algorithm was used to keep teacher-predicted grades in check.  This combination of teacher-predicted grades and algorithm – endorsed by Ofqual and CCEA – was attacked from all quarters in what became known as the 2020 “grades fiasco.”  In Summer 2021, while teacher-assessed grades are to continue to play a pivotal role, the algorithm is to be replaced by another form of quality assurance: teacher moderation. 

Ofqual hopes that the 2021 marriage of predicted grades and moderation will deliver grades which are “credible and meaningful.”  Ofqual also asserts that the moderation training provided by the exam boards will help teachers “objectively and consistently assess their students’ performance” so that “the [2021] grades will be indistinguishable from grades issued by exam boards in other years.”

Anyone who has engaged in GCSE coursework moderation will be aware of its shortcomings, and one member of Ofqual’s Standards Advisory Group has warned that the 2021 approach to quality-assuring GCSE, AS and A2 grades could potentially issue in “Weimar Republic” levels of grade inflation.

It is instructive to give the broadest overview of how Ofqual and CCEA see moderation creating the circumstances whereby the 2021 grades will be “indistinguishable” from the grades issued by exam boards in past years.  Consider a GCSE geography teacher.  She receives training from the examining board designed to help her “internalise” the standards associated with each of the grades in GCSE geography.  For example, she might be given the completed examination scripts of several different students who were graded A* in geography, several scripts graded A, and so on, down through the grades.  The exam board officials might stress the key features of particular scripts that mark them out as meeting the standard represented by a particular grade.  By scrutinising the completed scripts of several students graded C, for example, the teacher might gain insights into the standard: “grade C in GCSE geography.” 

Our geography teacher then turns her attention to her own students’ “portfolios.”  A student’s portfolio might contain his or her responses to mock examinations, so-called “mini tests” provided by the exam board, class tests, project work, examples of outstanding homework, and so on.  The geography teacher then uses the scale of standards she internalised during training to assign a best-fit grade to each portfolio.  Finally, external moderation might involve her exam board selecting a representative sample of her students’ portfolios to establish if her grading decisions accord with the combined professional wisdom of the exam board moderators.

Ofqual are prepared to acknowledge that moderation has some limitations: “It is often the case that two trained markers could give slightly different marks for the same answer and that both marks would be legitimate.” However, carefully conducted research paints a more depressing picture.  In his book The schools we need, and why we don’t have them, E. D. Hirsch describes a study carried out by the world’s leading testing agency, the Educational Testing Service (ETS).  In the study, in which “300 student papers were graded by 53 graders (a total of 15,900 readings), more than one third of the papers received every possible grade. That is, 101 of the 300 papers received all nine grades. … 94% [of the papers] received either seven, eight or nine different grades; and no essay received less than five different grades from 53 readers” (pp. 183-185).
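The mechanism behind the ETS findings can be illustrated with a simple Monte Carlo sketch. This is not the ETS study itself: the “true” essay quality, the number of markers and the size of the marker-to-marker noise are all invented parameters, chosen only to show how modest disagreement between markers spreads a single essay across many grades when enough markers read it.

```python
import random

random.seed(1)

# Hypothetical simulation: one essay with a "true" quality of 5 on a
# 1-9 scale, read by 53 markers whose judgements carry independent
# noise (standard deviation 1.5 grades - an assumption, not ETS data).
def grades_awarded(true_quality=5.0, n_markers=53, noise_sd=1.5):
    awarded = set()
    for _ in range(n_markers):
        g = round(random.gauss(true_quality, noise_sd))
        awarded.add(min(9, max(1, g)))  # clamp to the 1-9 grade scale
    return awarded

print(sorted(grades_awarded()))  # typically spans most of the nine grades
```

Even with markers who agree on average, the set of grades any one essay receives grows with the number of readings, which is the pattern Hirsch reports.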

Since a particular essay cannot be simultaneously worthy of nine different grades, for example, one is forced to the conclusion that a well-defined intrinsic worth cannot be ascribed to a particular essay.  The essay’s worth can only be ascribed relative to a particular marker.  In summary, the grade is not a property of the essay; rather, it is a joint property of the essay and the particular marker.  To communicate unambiguously about the quality of the essay one must specify the measuring tool (in this case the marker).  The grade is best construed as a property of an interaction between essay and marker rather than an intrinsic property of the essay.

It is important to stress that moderation’s difficulties extend beyond disciplines where essay-type questions predominate; all examinations are affected.  In the text Fundamentals of Item Response Theory, Hambleton et al. address all tests: “examinee characteristics and test characteristics cannot be separated: each can be interpreted only in the context of the other … An examinee’s ability is defined only in terms of a particular test.  When the test is “hard,” the examinee will appear to have low ability; when the test is “easy,” the examinee will appear to have higher ability. … Whether an item is hard or easy depends on the ability of the examinees being measured, and the ability of the examinees depends on whether the test items are hard or easy!” (pp. 2-3)  One cannot meaningfully separate the notion of ability from the characteristics of the particular test used to measure that ability.
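The Hambleton et al. point can be seen in a minimal Rasch-model sketch. The model itself is standard item response theory, but the ability value and item difficulties below are invented for illustration: the same candidate looks strong on an easy test and weak on a hard one.

```python
import math

# Minimal Rasch-model sketch (an illustration, not Hambleton's own
# example): probability that a candidate of ability theta answers an
# item of difficulty b correctly.
def p_correct(theta: float, b: float) -> float:
    return 1.0 / (1.0 + math.exp(-(theta - b)))

theta = 0.0  # the same candidate throughout (hypothetical)

easy_test = [-2.0, -1.5, -1.0]   # hypothetical easy items
hard_test = [1.0, 1.5, 2.0]      # hypothetical hard items

expected_easy = sum(p_correct(theta, b) for b in easy_test)
expected_hard = sum(p_correct(theta, b) for b in hard_test)

print(round(expected_easy, 2))  # -> 2.43 of 3 items expected correct
print(round(expected_hard, 2))  # -> 0.57 of 3 items expected correct
```

An identical ability parameter yields very different expected scores, so a raw score (or a grade derived from it) only has meaning relative to the particular test taken.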

Now let’s return to our GCSE geography teacher.  During her exam board training she studied the responses given by students who achieved specified grades in particular GCSE geography examinations.  However, she must now use her judgement to assign grades to her students’ portfolios, a very different form of assessment.  Dr Mike Cresswell – a past Chief Executive of the exam board AQA – underlines “the need to accept that there is no external and objective reality underpinning the comparability of results from different examinations.”  This presents our geography teacher with an intractable problem: she must somehow gain insights into absolute standards which float free of any particular examination.  For example, the absolute standard “B grade in GCSE geography” makes no reference to a particular test.  Alas, in Cresswell’s words, “absolute examination standards are … a chimera.”


Is a teachers’ union tail now wagging the Education Minister dog on examinations?


The National Association of Head Teachers (Northern Ireland) published a PDF outlining its proposals for the GCSE and A-Level examinations for 2021. Unsurprisingly, the proposals sought to consolidate the aims and objectives of the teaching profession in Northern Ireland – a permanent transition to formative assessment in place of conventional, objective summative assessment (commonly known to parents and wider society as examinations).

The Parental Alliance for Choice in Education carefully considered the NAHT (NI) document and contacted its president, Graham Gault, by email. The request met with no response. In the interim, the Minister of Education, Peter Weir, is to make a statement on exam arrangements in the Assembly on Tuesday 15th December, 2020.

A copy of the request to the NAHT (NI) for published evidence supporting their claim “that optionality in examinations should play no part in any mitigations as it firmly discriminates against lower ability students and many students have additional learning needs”.

Speculation from the BBC reporter Robbie Meredith suggests that the number of exam papers a pupil will be asked to take in each subject will be reduced. In addition, changes may include allowing schools some choice over the topics on which their pupils will be examined. Again, all put down to coronavirus.

A good reason for the minister, parents and indeed any unbiased media to be sceptical of claims made by the teaching unions is their propensity to ignore inconvenient facts. On 3rd December 2020 Ofqual published a briefing paper: Research findings relevant to the potential use of optionality in examinations in 2021.

https://www.gov.uk/government/publications/optionality-in-gcse-and-a-level-exams

Ofqual detailed that students sitting examinations in summer 2021 will have studied variable amounts of the curriculum. Some stakeholders have suggested that, to be fair to students, optionality should be introduced or expanded for 2021 examinations.

The minister is clearly on the horns of a dilemma: will he direct CCEA’s Justin Edwards to follow the evidence provided by Ofqual on optionality, or will he allow a teachers’ union, led by the Pritt Stick and toilet roll principal Graham Gault, to dictate unsupported and unsubstantiated orders to him?

What CCEA & the media don’t want the public to know about the predicted grade debacle


Questions to Justin Edwards, CEO of CCEA

Mr Justin Edwards, the Chief Executive Officer of CCEA (Northern Ireland’s Council for Curriculum, Examinations & Assessment), issued a reply on May 26th, 2020. It should be noted that CCEA regulates itself; Ofqual does not regulate CCEA.

The first thing to notice, given the specific nature of the questions, is the ambiguous language of the replies.

Given the very specific request made by Dr Morrison in question one for CCEA to “identify a single peer-reviewed study” which would confirm that “teachers could predict rank orders within grades with any degree of accuracy”, the official reply failed to produce any evidence.

Question two raised the issue of the standards conundrum.  The international literature highlights the fact that the predictions are only acceptable when teachers have access to the examination papers that the pupils would have taken.

Mr Edwards responds for CCEA without giving any reasoning: “CCEA will not be releasing any planned summer 2020 examination papers.”

CCEA reply on 2020 exam questions

MLAs and their advisors could be using guesswork to justify decisions to lock down businesses and close schools


The use of the R number in the Assembly’s release-of-evidence points to a profound misunderstanding of the limitations of that number.  In particular, R is not an additive variable; one cannot meaningfully add the contribution to R of hair salons to the contribution to R of pubs and then compare with 1.  This strategy makes no arithmetic sense. 

Unfortunately, the R number changes with the model used to measure it.  The Scientific Pandemic Influenza group on Modelling (SPI-M) is a standing group that advises government on preparations to manage the risk of pandemics and keeps emerging evidence and research under review.  SPI-M use approximately ten different models to arrive at an R number for the UK.  This R number is calculated by attempting to somehow reconcile these differing values, each calculated with great uncertainty.
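The model-dependence of R can be shown with two standard textbook formulas that convert an observed epidemic growth rate into a reproduction number. Both are well-established results; the growth rate and generation-interval figures below are invented for illustration, not UK estimates. The same observed data yield different R values under different modelling assumptions.

```python
import math

# The same observed daily growth rate r gives different R values under
# different assumed generation-interval distributions (a standard
# epidemiological result). The numbers are hypothetical illustrations.

r = 0.1   # daily exponential growth rate (assumption)
T = 6.0   # mean generation interval in days (assumption)

R_fixed = math.exp(r * T)   # generation interval fixed at exactly T days
R_exp   = 1.0 + r * T       # generation interval exponentially distributed

print(round(R_fixed, 2))  # -> 1.82
print(round(R_exp, 2))    # -> 1.6
```

Two defensible models, identical data, and R differs in the first decimal place – exactly the region in which lockdown decisions are said to turn.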

How can our MLAs possibly justify basing decisions which impact on people’s livelihoods, on the tiny R-related percentages published in the Assembly’s evidence?  The uncertainty of R renders this unjustifiable.

These concerns about R are clear in the literature.  Guerra et al. (2017) could only locate the R number for measles somewhere between 3.7 and 203.3.

Li, Blakeley and Smith (2011) published a paper entitled The Failure of R0, in which the authors conclude, “Rarely has an idea so erroneous enjoyed such popular appeal”.

Coming right up to date, the Royal Society’s report entitled Reproduction number (R) and growth rate (r) of the COVID-19 epidemic in the UK (on page 53) struggles to make the case for R: “Given the suggested wide bounds of uncertainty that surround estimates of R in particular … are they still of value in policy formulation?  The answer is definitely, yes … this is certainly a much better place to be in than just making a guess through verbal argument.”

Northern Ireland’s Health Minister and his two shields, the Chief Medical Officer and the Chief Scientific Advisor attempt to see off detractors by urging them to look at the “evidence in the round.”  I am confident that no amount of additional evidence produced by the Minister and his two advisors will see off the criticisms set out in this letter.

The warning letter on centre-based moderation sent to CCEA on 23rd June


Why Centre-based Moderation cannot work

Ofqual and CCEA intend to apply a “moderation” process to teacher-predicted grades in order to prevent, for example, teachers awarding inflated grades to their students.  This process – yet to be set out in detail – will focus on the Examination Centre in which each pupil would have taken his or her 2020 GCSE, AS or A2 examinations had Covid-19 not intervened.  To simplify matters, consider a Centre in which, for instance, 20% to 24% of pupils have secured B grades in CCEA AS Physics for the past three years.  Now suppose that in 2020 the physics teachers associated with that Centre return a B-grade prediction for 67% of their AS pupils.  Does a statistical technique, or AI algorithm, or mathematical model (possibly drawing on teachers’ predicted rank orders) exist which can defensibly adjust the predicted grades to bring them into line with the 20% to 24% range of the past?  Can one compute a defensible compromise position somewhere between 20% and 67%?  The answer is an emphatic No.

It is difficult to escape the conclusion that Ofqual and CCEA simply interpret grades as countable quantities which can be assigned to individual pupils.  The UK's examining bodies and education researchers have treated grades as quantifiable entities for years, because the awarding bodies have a very poor track record for in-depth thinking about the nature of the "measurements" in which they engage (see Wood[1] (1991)).  Yet two of the most influential figures in UK assessment reject this interpretation.

There are few individuals with Mike Cresswell's understanding of the grading of UK examinations.  Cresswell's definition[2] of a grade as representing "whatever standard of attainment it is judged by the awarders to represent" (p. 224) indicates that counting grades as one might count pencils is indefensible.  Let me be clear: I am not suggesting that the process for awarding grades needs to be abandoned.  I am simply making the point that grading is not governed by strict scientific principles and that adding or subtracting grades is mathematically impermissible.  As Cresswell's definition makes clear, the grading process is a qualitative process rather than a quantitative one.

Now why is Cresswell forced to this vague qualitative definition of a grade?  The reason is that the awarding bodies, education researchers, and the general public think of the grade awarded to a given pupil as a measure of the ability of that pupil.  The awarding bodies think of a grade as a property of the particular pupil to whom it is awarded.  But this is wrong: a grade is not an intrinsic property of the pupil but rather a joint property of the pupil and the examination from which the grade derives.  In the 1996 book Assessment: Problems, Developments and Statistical Issues, Harvey Goldstein[3] (a towering figure who contributed much to debates on statistical rigour in UK assessment) cautions: "[T]he object of measurement is expected to interact with the measurement in a way that may alter the state of the individual in a non-trivial fashion" (p. 54).

According to Goldstein, the examination does not merely “check up on” a pre-existing ability that the candidate had when he or she entered the examination hall.  This static model is rejected for a more dynamic alternative in which the pupil’s ability is expressed through his or her responses to the questions which make up the examination.  For Goldstein, ability changes as the candidate interacts with the examination questions: “Thus, on answering a sequence of test questions about quadratic equations the individual may well become “better” at solving them so that the attribute changes during the course of the test” (p. 54).  Cresswell’s grade is not an intrinsic property of the candidate; rather, it’s the property of an interaction.  Grades do not lend themselves to simple arithmetic manipulation and therefore quantitative procedures – such as simple regression or neural nets – are indefensible.

One can find unequivocal support for the claims of Cresswell and Goldstein in the writings of Niels Bohr and Ludwig Wittgenstein.  Also Joel Michell’s research[4] can be used to establish that grades do not satisfy the seven Hölder axioms[5] and therefore are not quantifiable.  There can be little doubt that the claims of Ofqual and CCEA that Centre-based statistical techniques, or algorithms, or mathematical modelling, can be used to moderate predicted grades are without foundation.

Dr Hugh Morrison (The Queen’s University of Belfast [retired])

[1] Wood, R. (1991).  Assessment and testing: A survey of research.  Cambridge: Cambridge University Press.

[2] Baird, J., Cresswell, M., & Newton, P. (2000).  Would the real gold standard please step forward? Research Papers in Education, 15(2), 213-229.

[3] Goldstein, H. (1996).  Statistical and psychometric models for assessment.  In H. Goldstein & T. Lewis (Eds.), Assessment: problems, developments and statistical issues (pp. 41-55).  Chichester: John Wiley & Sons.

[4] Michell, J. (1999).  Measurement in psychology.  Cambridge: Cambridge University Press.

[5] Michell, J., & Ernst, C. (1996).  The axioms of quantity and the theory of measurement: An English translation of Hölder (1901), Part I.  Journal of Mathematical Psychology, 40, 235-252.

Michell, J., & Ernst, C. (1997).  The axioms of quantity and the theory of measurement: An English translation of Hölder (1901), Part II.  Journal of Mathematical Psychology, 41, 345-356.

Hölder, O. (1901).  Die Axiome der Quantität und die Lehre vom Mass.  Berichte der Sächsischen Gesellschaft der Wissenschaften, Mathematisch-Physische Klasse, 53, 1-64.

Lyra McKee

This is the sort of honest communication between the generations that has the potential to do more good than the wasted hundreds of millions spent on conflict resolution ever could.

seftonblog

I’ve waited some time before putting pen to paper.

The death of a young woman, not much older than my daughter, is hard.

Murder is harder still.

When I got the news, at two in the morning, I did not sleep again that night.

I was introduced to Lyra about five years ago. There could be fewer similarities.

I met this small, owlish, slightly diffident girl, in a Victoria Square coffee shop. She met a grumpy old man, with issues and a background. She had many difficulties with technology, which we laughed about. She was softly spoken, and I'm slightly deaf.

I was hoping that she could introduce me to contacts that might progress my enquiries into the murders of my parents. This she did.

Despite the disparity in our ages and in our experience of the world, she dispensed sage advice about me and my predicament.  She was…


Why AI will never expose the “mind’s inner workings”


The text of a letter submitted to the New Scientist in reply to an article by Timothy Revell on a claim that mind-reading devices can access your thoughts and dreams using AI.

As usual, there has been no acknowledgement, response or publication by the New Scientist.

Timothy Revell’s article Thoughts Laid Bare (29 September, p. 28) illustrates a worrying tendency of AI enthusiasts to over-hype the capabilities of their algorithms. The article suggests that AI offers the possibility of the “ultimate privacy breach” by gaining access to “one of the only things we can keep to ourselves,” namely, “the thoughts in our heads.”

Niels Bohr counselled that the hallmark of science is not experiment or even quantification, but "unambiguous communication." AI has much to learn from this great physicist. When one scans an individual's brain one does not thereby gain any access whatsoever to that individual's thoughts; brains are in the head while thoughts are not. The brain isn't doing the thinking. As far back as 1877, G H Lewes cautioned: "It is the man and not the brain that thinks." To quote Peter Hacker, what neuroscientists show us "is merely a computer-generated image of increased oxygenation in select areas of the brain" of the thinking individual. Needless to say, one cannot think without an appropriately functioning brain, but thinking is not located in the brain; no analysis of neural activity will yield insight into thoughts, because thinking is an activity of neither the mind nor the brain.

In ascribing thoughts to the brain or the mind (rather than to the individual), AI falls prey to a fallacy that can be traced all the way back to Aristotle: the "mereological fallacy."

Dr Hugh Morrison, The Queen’s University, Belfast (retired)

drhmorrison@gmail.com