Why the Ofqual/CCEA proposal for using teacher judgement to grade 2020 GCSE/A level examinations is indefensible.



The claim made in this essay is that the academic literature clearly indicates that the capacity of teachers to predict their pupils’ grades falls far below acceptable levels. Furthermore, the evidence that teachers can rank-order their pupils within-grade is scant to non-existent. Indeed, the Awarding Bodies couldn’t stand up the claim that any of their examinations rank-order pupils on the construct they purport to measure. It follows that the only defensible solution is to provide two measures per examination: (i) a teacher-predicted grade (without an associated rank-order); and (ii) a test that could be used – if the pupil so decides – to overwrite the teacher prediction, in relevant cases. Where a pupil cannot take the test, he or she must accept the teacher-predicted grade. There is no credible evidence in the literature that one can mobilize some “standardization” algorithm (which has yet to be detailed by Ofqual or CCEA) to somehow correct for any excesses in teacher judgement.

The perils of expert prediction of all types

As far back as 1954 Paul Meehl (Clinical versus Statistical Prediction: A theoretical analysis and a review of the evidence) analysed the ability of a range of teachers to predict measures of academic success, and found scant evidence that this could meet acceptable standards. Meehl’s book ranged far beyond teachers’ predictions of grades to consider, for example, expert predictions of an individual’s probability of violating parole, predictions of success in pilot training, predictions of criminal recidivism, and so on. In his book Thinking, Fast and Slow the Nobel Laureate Daniel Kahneman (2011, p. 225) endorsed Meehl’s findings and stressed that the range of studies demonstrating the limitations of experts’ abilities to predict the future had expanded greatly since Meehl’s book was published:

“Another reason for the inferiority of expert judgement is that humans are incorrigibly inconsistent in making summary judgements of complex information. When asked to evaluate the same information twice, they frequently give different answers. The extent of the inconsistency is often a matter of real concern. Experienced radiologists who evaluate chest X-rays as “normal” or “abnormal” contradict themselves 20% of the time when they see the same picture on separate occasions. A study of 101 independent auditors who were asked to evaluate the reliability of internal corporate audits revealed a similar degree of inconsistency. A review of 41 separate studies of the reliability of judgements made by auditors, psychologists, pathologists, organizational managers, and other professionals suggests that this level of inconsistency is typical, even when a case is re-evaluated within a few minutes. Unreliable judgements cannot be valid predictors of anything.”

The perils of predicting within-grade rank order

Ofqual and CCEA are requiring teachers to rank-order their pupils according to their achievement in mathematics, English, biology, and so on. However, no GCSE or A level product designed by the Awarding Bodies can itself perform this feat. The rank-ordering of candidates on the construct “achievement in geography,” for example, is a validity issue and the Awarding Bodies have a very poor record in this area.

In 1991 an expert on the work of the examination boards, Robert Wood, summarized his conclusions in the book Assessment and Testing: A survey of research commissioned by the University of Cambridge Local Examinations Syndicate (UCLES). On pages 147-151 he wrote:

“If an examining board were to be asked point blank about the validities of its offerings or, more to the point, what steps it takes to validate the grades it awards, what might it say? … The examining boards have been lucky not to have been engaged in validity argument. … Nevertheless, the extent of the boards’ neglect of validity is plain to see once attention is focused. Whenever boards make claims that they are measuring the ability to make clear reasoned judgements, or the ability to form conclusions (both examples from IGCSE and Economics), they have a responsibility to at least attempt a validation of the measures. … The boards know so little about what they are assessing that if, for instance, it were to be said that teachers assess ability … rather than achievement, the boards would be in no position to defend themselves. … As long as examination boards make claims that they are assessing this or that ability or skill, they are vulnerable to challenge from disgruntled individuals.”

The claim that a GCSE or A level examination could rank-order candidates on some appropriate construct would require the Awarding Bodies to use Structural Equation Modelling to compute three indices: root mean-square residual, adjusted goodness-of-fit, and chi-squared divided by degrees of freedom. To claim a rank order, these three statistics would have to be demonstrated to satisfy relevant inequalities. Can it be reasonable to ask teachers to predict something that is beyond the capabilities of the GCSE and A level examinations themselves?

The resolution

Needless to say, staff at Ofqual and CCEA are mandated to provide young people with grades that are as error-free as possible. They should take heed of Paul Meehl’s counsel in respect of teachers’ capacities to anticipate the future: “When one is dealing with human lives and life opportunities, it is immoral to adopt a mode of decision-making which has been demonstrated repeatedly to be … inferior.” If teacher judgement (omitting the requirement to rank-order) is to be used to forecast grades, pupils must also be offered speedy access to a public examination which protects them from the well-documented vagaries of teacher prediction.

Stephen Elliott





Why Ofqual and CCEA are set to repeat the failings of 2020 in respect of 2021 awarded grades for GCSE, AS and A2.



Dr Hugh G Morrison (The Queen’s University of Belfast [retired])

In 2020 the impact of Covid-19 prompted the Conservative government to replace public examinations (GCSE, AS and A2) by a process involving teacher-predicted grades.  In this approach, an algorithm was used to keep teacher-predicted grades in check.  This combination of teacher-predicted grades and algorithm – endorsed by Ofqual and CCEA – was attacked from all quarters in what became known as the 2020 “grades fiasco.”  In Summer 2021, while teacher-assessed grades are to continue to play a pivotal role, the algorithm is to be replaced by another form of quality assurance: teacher moderation. 

Ofqual hopes that the 2021 marriage of predicted grades and moderation will deliver grades which are “credible and meaningful.”  Ofqual also asserts that the moderation training provided by the exam boards will help teachers “objectively and consistently assess their students’ performance” so that “the [2021] grades will be indistinguishable from grades issued by exam boards in other years.”

Anyone who has engaged in GCSE coursework moderation will be aware of its shortcomings, and one member of Ofqual’s Standards Advisory Group has warned that the 2021 approach to quality-assuring GCSE, AS and A2 grades could potentially issue in “Weimar Republic” levels of grade inflation.

It is instructive to give the broadest overview of how Ofqual and CCEA see moderation creating the circumstances whereby the 2021 grades will be “indistinguishable” from the grades issued by exam boards in past years.  Consider a GCSE geography teacher.  She receives training from the examining board designed to help her “internalise” the standards associated with each of the grades in GCSE geography.  For example, she might be given the completed examination scripts of several different students who were graded A* in geography, several scripts graded A, and so on, down through the grades.  The exam board officials might stress the key features of particular scripts that mark them out as meeting the standard represented by a particular grade.  By scrutinising the completed scripts of several students graded C, for example, the teacher might gain insights into the standard: “grade C in GCSE geography.” 

Our geography teacher then turns her attention to her own students’ “portfolios.”  A student’s portfolio might contain his or her responses to mock examinations, so-called “mini tests” provided by the exam board, class tests, project work, examples of outstanding homework, and so on.  The geography teacher then uses the scale of standards she internalised during training to assign a best-fit grade to each portfolio.  Finally, external moderation might involve her exam board selecting a representative sample of her students’ portfolios to establish if her grading decisions accord with the combined professional wisdom of the exam board moderators.

Ofqual are prepared to acknowledge that moderation has some limitations: “It is often the case that two trained markers could give slightly different marks for the same answer and that both marks would be legitimate.” However, carefully conducted research paints a more depressing picture.  In his book The schools we need, and why we don’t have them, E. D. Hirsch describes a study carried out by the world’s leading testing agency, the Educational Testing Service (ETS).  In the study, in which “300 student papers were graded by 53 graders (a total of 15,900 readings), more than one third of the papers received every possible grade. That is, 101 of the 300 papers received all nine grades. … 94% [of the papers] received either seven, eight or nine different grades; and no essay received less than five different grades from 53 readers” (pp. 183-185).

Since a particular essay cannot be simultaneously worthy of nine different grades, for example, one is forced to the conclusion that a well-defined intrinsic worth cannot be ascribed to a particular essay.  The essay’s worth can only be ascribed relative to a particular marker.  In summary, the grade is not a property of the essay; rather, it is a joint property of the essay and the particular marker.  To communicate unambiguously about the quality of the essay one must specify the measuring tool (in this case the marker).  The grade is best construed as a property of an interaction between essay and marker rather than an intrinsic property of the essay.

It is important to stress that moderation’s difficulties extend beyond disciplines where essay-type questions predominate; all examinations are affected.  In the text Fundamentals of Item Response Theory, Hambleton et al. address all tests: “examinee characteristics and test characteristics cannot be separated: each can be interpreted only in the context of the other … An examinee’s ability is defined only in terms of a particular test.  When the test is “hard,” the examinee will appear to have low ability; when the test is “easy,” the examinee will appear to have higher ability. … Whether an item is hard or easy depends on the ability of the examinees being measured, and the ability of the examinees depends on whether the test items are hard or easy!” (pp. 2-3)  One cannot meaningfully separate the notion of ability from the characteristics of the particular test used to measure that ability.
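The dependence Hambleton et al. describe can be sketched in a few lines of Python. This is a toy illustration only: the numbers, the names `proportion_correct` and `raw_score`, and the deterministic rule that a question is answered correctly whenever a candidate's number meets the question's threshold are all assumptions made for the sake of the example, not a model drawn from the text. The point is simply that the same item looks "easy" or "hard" depending on who sits it, and the same candidate looks "able" or "weak" depending on the test.

```python
def proportion_correct(candidates, item_threshold):
    """Classical item 'difficulty': the share of a group answering the item correctly."""
    correct = sum(1 for c in candidates if c >= item_threshold)
    return correct / len(candidates)


def raw_score(candidate, item_thresholds):
    """Number of items on a test that one candidate answers correctly."""
    return sum(1 for t in item_thresholds if candidate >= t)


# Hypothetical groups and tests (illustrative numbers only).
stronger_group = [6, 7, 8, 9]
weaker_group = [2, 3, 4, 5]
item = 5

easy_test = [1, 2, 3, 4]
hard_test = [6, 7, 8, 9]
candidate = 5

# The same item appears easy to one group and hard to the other.
print(proportion_correct(stronger_group, item))
print(proportion_correct(weaker_group, item))

# The same candidate appears strong on one test and weak on the other.
print(raw_score(candidate, easy_test))
print(raw_score(candidate, hard_test))
```

Neither the item's "difficulty" nor the candidate's score can be read off without naming the group or the test involved, which is the circularity the quotation describes.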

Now let’s return to our GCSE geography teacher.  During her exam board training she studied the responses given by students who achieved specified grades in particular GCSE geography examinations.  However, she must now use her judgement to assign grades to her students’ portfolios, a very different form of assessment.  Dr Mike Cresswell – a past Chief Executive of the exam board AQA – underlines “the need to accept that there is no external and objective reality underpinning the comparability of results from different examinations.”  This presents our geography teacher with an intractable problem: she must somehow gain insights into absolute standards which float free of any particular examination.  For example, the absolute standard “B grade in GCSE geography” makes no reference to a particular test.  Alas, in Cresswell’s words, “absolute examination standards are … a chimera.”

Is a teachers’ union tail now wagging the Education Minister dog on examinations?



The National Association of Head Teachers (Northern Ireland) published a pdf outlining their proposals for the GCSE and A-Level examinations for 2021. Unsurprisingly the proposals sought to consolidate the aims and objectives of the teaching profession in Northern Ireland – a permanent transition to formative assessment instead of the conventional, objective summative assessment (commonly known to parents and wider society as examinations).

The Parental Alliance for Choice in Education carefully considered the NAHT (NI) document and contacted their president, Graham Gault, by email. A request for a reply met with no response. In the interim the minister of education, Peter Weir, is to make a statement on exam arrangements in the assembly on Tuesday 15th December, 2020.

The request asked the NAHT (NI) for published evidence supporting their claim “that optionality in examinations should play no part in any mitigations as it firmly discriminates against lower ability students and many students have additional learning needs.”

Speculation from the BBC reporter Robbie Meredith suggests that the number of exam papers a pupil will be asked to take in each subject will be reduced. In addition changes may include allowing schools some choice over the topics on which their pupils will be examined. Again all down to coronavirus.

A good reason for the minister, parents and indeed any unbiased media to be sceptical of claims made by the teaching unions is their propensity to ignore inconvenient facts. Ofqual published on December 3rd 2020 a briefing paper: Research findings relevant to the potential use of optionality in examinations in 2021.


Ofqual detailed that students sitting examinations in summer 2021 will have studied variable amounts of the curriculum. Some stakeholders have suggested that, to be fair to students, optionality should be introduced or expanded for 2021 examinations.

The minister is clearly on the horns of a dilemma: will he direct CCEA’s Justin Edwards to follow the evidence provided by Ofqual on optionality, or will he allow a teachers’ union led by the Pritt Stick & toilet roll principal, Graham Gault, to dictate unsupported and unsubstantiated orders to him?

What CCEA & the media don’t want the public to know about the predicted grade debacle






Questions to Justin Edwards, CEO of CCEA

Mr Justin Edwards, the Chief Executive Officer of CCEA, Northern Ireland’s Council for the Curriculum, Examinations & Assessment, issued a reply on May 26th, 2020. It should be noted that CCEA regulates itself. Ofqual do not regulate CCEA.

The first thing to notice, given the specific nature of the questions, is the ambiguous language of the replies.

Despite the very specific request made by Dr Morrison in question one for CCEA to “identify a single peer-reviewed study” which would confirm that “teachers could predict rank orders within grades with any degree of accuracy,” the official reply failed to produce any evidence.

Question two raised the issue of the standards conundrum.  The international literature highlights the fact that the predictions are only acceptable when teachers have access to the examination papers that the pupils would have taken.

Mr Edwards responds for CCEA without giving any reasoning: “CCEA will not be releasing any planned summer 2020 examination papers.”


CCEA reply on 2020 exam questions

MLAs and their advisors could be using guesswork to justify decisions to lock down businesses and close schools



The use of the R number in the Assembly’s release-of-evidence points to a profound misunderstanding of the limitations of that number.  In particular, R is not an additive variable; one cannot meaningfully add the contribution to R of hair salons to the contribution to R of pubs and then compare with 1.  This strategy makes no arithmetic sense. 

Unfortunately, the R number changes with the model used to measure it.   The Scientific Pandemic Influenza group on Modelling is a standing group that advises government on preparations to manage the risk of pandemics and keeps emerging evidence and research under review.  SPI-M use approximately ten different models to arrive at an R number for the UK.  This R number is calculated by attempting to somehow reconcile these differing values, each calculated with great uncertainty. 

How can our MLAs possibly justify basing decisions which impact on people’s livelihoods on the tiny R-related percentages published in the Assembly’s evidence?  The uncertainty of R renders this unjustifiable.

These concerns about R are clear in the literature.  Guerra et al. (2017) could only locate the R number for measles somewhere between 3.7 and 203.3.

Li, Blakeley and Smith (2011) published a paper entitled The Failure of R0, in which the authors conclude, “Rarely has an idea so erroneous enjoyed such popular appeal”.

Coming right up to date, The Royal Society’s report entitled Reproduction number (R) and growth rate (r) of the COVID-19 epidemic in the UK (on page 53) struggles to make the case for R: “Given the suggested wide bounds of uncertainty that surround estimates of R in particular … are they still of value in policy formulation?  The answer is definitely, yes … this is certainly a much better place to be in than just making a guess through verbal argument.”

Northern Ireland’s Health Minister and his two shields, the Chief Medical Officer and the Chief Scientific Advisor, attempt to see off detractors by urging them to look at the “evidence in the round.”  I am confident that no amount of additional evidence produced by the Minister and his two advisors will see off the criticisms set out in this letter.

The warning letter on centre-based moderation sent to CCEA on 23rd June



Why Centre-based Moderation cannot work

Ofqual and CCEA intend to apply a “moderation” process to teacher-predicted grades in order to prevent, for example, teachers awarding inflated grades to their students.  This process – yet to be set out in detail – will focus on the Examination Centre in which each pupil would have taken his or her 2020 GCSE, AS or A2 examinations had Covid-19 not intervened.  To simplify matters, consider a Centre in which, for instance, 20% to 24% of pupils have secured B grades in CCEA AS Physics for the past three years.  Now suppose that in 2020 the physics teachers associated with that Centre return a B-grade prediction for 67% of their AS pupils.  Does a statistical technique, or AI algorithm, or mathematical model (possibly drawing on teachers’ predicted rank orders) exist which can defensibly adjust the predicted grades to bring them into line with the 20% to 24% range of the past?  Can one compute a defensible compromise position somewhere between 20% and 67%?  The answer is an emphatic No.

It is difficult to escape the conclusion that Ofqual and CCEA simply interpret grades as quantities which are countable and can be assigned to individual pupils.  But two of the most influential figures in UK assessment reject these claims.  The UK’s examining bodies and researchers in education have been, for years, treating grades as quantifiable entities.  This is because the awarding bodies have a very poor track record for in-depth thinking about the nature of the “measurements” in which they engage (see Wood[1] (1991)).

There are few individuals with Mike Cresswell’s understanding of the grading of UK examinations.  Cresswell’s definition[2] of a grade as representing “whatever standard of attainment it is judged by the awarders to represent” (p. 224), indicates that counting grades as one might count pencils is indefensible.  Let me be clear: I am not suggesting that the process for awarding grades needs to be abandoned.  I am simply making the point that grading is not governed by strict scientific principles and that adding or subtracting grades is mathematically impermissible.  As Cresswell’s definition makes clear, the grading process is a qualitative process rather than a quantitative one.

Now why is Cresswell forced to this vague qualitative definition of a grade?  The reason is that the awarding bodies, education researchers, and the general public think of the grade awarded to a given pupil as a measure of the ability of that pupil.  The awarding bodies think of a grade as a property of the particular pupil to whom it is awarded.  But this is wrong: a grade is not an intrinsic property of the pupil but rather a joint property of the pupil and the examination from which the grade derives.  In his 1996 book Assessment: Problems, Developments and Statistical Issues Harvey Goldstein[3] (a towering figure who contributed much to debates on statistical rigour in UK assessment) cautions: “[T]he object of measurement is expected to interact with the measurement in a way that may alter the state of the individual in a non-trivial fashion” (p. 54).

According to Goldstein, the examination does not merely “check up on” a pre-existing ability that the candidate had when he or she entered the examination hall.  This static model is rejected for a more dynamic alternative in which the pupil’s ability is expressed through his or her responses to the questions which make up the examination.  For Goldstein, ability changes as the candidate interacts with the examination questions: “Thus, on answering a sequence of test questions about quadratic equations the individual may well become “better” at solving them so that the attribute changes during the course of the test” (p. 54).  Cresswell’s grade is not an intrinsic property of the candidate; rather, it’s the property of an interaction.  Grades do not lend themselves to simple arithmetic manipulation and therefore quantitative procedures – such as simple regression or neural nets – are indefensible.

One can find unequivocal support for the claims of Cresswell and Goldstein in the writings of Niels Bohr and Ludwig Wittgenstein.  Also Joel Michell’s research[4] can be used to establish that grades do not satisfy the seven Hölder axioms[5] and therefore are not quantifiable.  There can be little doubt that the claims of Ofqual and CCEA that Centre-based statistical techniques, or algorithms, or mathematical modelling, can be used to moderate predicted grades are without foundation.

Dr Hugh Morrison (The Queen’s University of Belfast [retired])

[1] Wood, R. (1991).  Assessment and testing: A survey of research.  Cambridge: Cambridge University Press.

[2] Baird, J., Cresswell, M., & Newton, P. (2000).  Would the real gold standard please step forward? Research Papers in Education, 15(2), 213-229.

[3] Goldstein, H. (1996).  Statistical and psychometric models for assessment.  In H. Goldstein & T. Lewis (Eds.), Assessment: problems, developments and statistical issues (pp. 41-55).  Chichester: John Wiley & Sons.

[4] Michell, J. (1999).  Measurement in psychology.  Cambridge: Cambridge University Press.

[5] Michell, J., & Ernst, C. (1996).  The axioms of quantity and the theory of measurement, Part I, An English translation of Hölder (1901), Part I, Journal of Mathematical Psychology, 40, 235-52.

Michell, J., & Ernst, C. (1997).  The axioms of quantity and the theory of measurement, Part II, An English translation of Hölder (1901), Part II, Journal of Mathematical Psychology, 41, 345-56.

Hölder, O. (1901).  Die Axiome der Quantität und die Lehre vom Mass,  Berichte der Sächsischen Gesellschaft der Wissenschaften, Mathematisch-Physische Klasse, 53, 1-64.

Lyra McKee

This is the sort of honest communication between the generations that has the potential to do more good than the wasted hundreds of millions spent on conflict resolution ever could.


I’ve waited some time before putting pen to paper.

The death of a young woman, not much older than my daughter, is hard.

Murder is harder still.

When I got the news, at two in the morning, I did not sleep again that night.

I was introduced to Lyra about five years ago. There could be fewer similarities.

I met this small, owlish, slightly diffident girl, in a Victoria Square coffee shop. She met a grumpy old man, with issues and a background. She had many difficulties with technology, which we laughed about. She was softly spoken, and I’m slightly deaf.

I was hoping that she could introduce me to contacts that might progress my enquiries into the murders of my parents. This she did.

Despite the disparity in our ages and in our experience of the world, she dispensed sage advice about me and my predicament.  She was…


Why AI will never expose the “mind’s inner workings”



The text of a letter submitted to the New Scientist in reply to an article by Timothy Revell on a claim that mind-reading devices can access your thoughts and dreams using AI.

As usual there has been no acknowledgement, response or publication by the New Scientist.

Timothy Revell’s article Thoughts Laid Bare (29 September, p. 28) illustrates a worrying tendency of AI enthusiasts to over-hype the capabilities of their algorithms. The article suggests that AI offers the possibility of the “ultimate privacy breach” by gaining access to “one of the only things we can keep to ourselves,” namely, “the thoughts in our heads.”

Niels Bohr counselled that the hallmark of science is not experiment or even quantification, but “unambiguous communication.” AI has much to learn from this great physicist. When one scans an individual’s brain one does not thereby gain any access whatsoever to that individual’s thoughts; brains are in the head while thoughts are not. The brain isn’t doing the thinking. As far back as 1877, G H Lewes cautioned: “It is the man and not the brain that thinks.” To quote Peter Hacker, what neuroscientists show us “is merely a computer-generated image of increased oxygenation in select areas of the brain” of the thinking individual. Needless to say, one cannot think without an appropriately functioning brain, but thinking is not located in the brain; no analysis of neural activity will give insights into thoughts because thinking is an activity of neither the mind nor the brain.

In ascribing thoughts to the brain or the mind (rather than to the individual) AI falls prey to a fallacy that can be traced all the way back to Aristotle: the “mereological fallacy.”

Dr Hugh Morrison, The Queen’s University, Belfast (retired)


The incoherence of Professor Boaler’s “Visual Mathematics”



Dr Hugh Morrison (The Queen’s University of Belfast [retired])


Professor Jo Boaler’s case for a new approach to teaching and learning in mathematics is an incoherent mix of dubious mathematical reasoning and neuroscience.  Boaler’s (2016, p. 1) claim that her “visual mathematics” approach satisfies “an urgent need for change in the way mathematics is offered to learners”[1] is outlined in her TEDx Stanford presentation entitled How you can be good at math, and other surprising facts about learning.  Her recent visit to Scotland confirms that her visual approach is now being urged upon that country’s teachers.  This short essay is designed to alert teachers everywhere to the dangers of replacing traditional approaches to pedagogy with Professor Boaler’s confused reasoning.

Boaler’s case for her visual mathematics is illustrated using a sequence of patterns each comprising a number of squares (see her TEDx Stanford talk for a very engaging outline of her analysis): the first pattern (n = 1) has four squares, the second (n = 2) has nine squares, the third (n = 3) has sixteen squares, and so on.  (The reader will, no doubt, recognise the three numbers 4, 9 and 16 as “square numbers” because 2² = 4, 3² = 9 and 4² = 16.)  The pupil is asked to continue the patterns in the same way and find the general rule of which these three patterns are instances.  According to Boaler’s TEDx Stanford presentation, the general rule which generates the number of squares (4, 9, 16, and so on) in the sequence of patterns is, needless to say: number of squares = (n + 1)².  This isn’t difficult to verify.  Substituting n = 1 in this rule gives 4, substituting n = 2 gives 9, and substituting n = 3 gives 16, and so on.

Every mathematics teacher in England, Wales and Northern Ireland with experience of GCSE mathematics coursework – now abandoned in the UK after decades of effort to promote and assess “discovery learning” – will recognise Professor Boaler’s illustrative example of visual mathematics as one of the GCSE “growing squares” tasks.  Indeed, one could be forgiven for thinking that Boaler’s visual mathematics is little more than her UK experience of discovery learning, with a pinch of neuroscience.  Once identified, the (n + 1)² rule can then be used to continue the sequence of patterns onwards, yielding:

4, 9, 16, 25, 36, 49, …






There can be little doubt that mathematical activities such as the “growing squares” task serve to enrich the mathematical experience of children by teaching the principles of problem-solving, and facilitating collaborative learning.  However, the case I want to advance in this essay is that it is nonsensical to argue, as Boaler does, that such activities can ever challenge established, “traditional” approaches to the teaching and learning of mathematics.  Traditional learning is always prior to discovery learning; without the framework laid down by the traditional teacher (the so-called “fiduciary” framework), discovery learning is impossible.  Boaler’s error is to have ignored Polanyi’s (1958, p. 266) warning: “No intelligence, however critical or original, can operate outside such a fiduciary framework.”[2]   Boaler’s visual mathematics can never replace the traditional approach to teaching and learning.


Professor Boaler seems unaware of a problem first identified by the great mathematician Leibniz, namely, that a finite number of examples always underdetermines the rule which generates these examples.  Boaler focuses on the rule (n + 1)² as the answer, but there is an infinity of such answers.  Anscombe (1985, pp. 342 – 343) presents the Leibniz argument using the even numbers: “[A]lthough an intelligence tester may suppose that there is only one possible continuation to the sequence 2, 4, 6, 8, …, mathematical and philosophical sophisticates know that an indefinite number of rules (even rules stated in terms of mathematical functions as conventional as polynomials) are compatible with any such finite initial segment.  So if the tester urges me to respond, after 2, 4, 6, 8, with the unique appropriate next number, the proper response is that no such unique number exists. … The intelligence tester has arbitrarily fixed on one answer as the correct one.”[3]


In her TEDx Stanford presentation, Boaler presents her pupils with three instances of a rule (the first pattern has 4 squares, the second has 9 squares, and the third has 16 squares) and implies that the brain (for Professor Boaler the brain is pivotal to appreciating how children learn mathematics) should, after a process of understanding, arrive at the conclusion that the rule is:

number of squares = (n + 1)²

However, there is an infinite number of alternative rules which begin with the numbers 4, 9, 16, but diverge thereafter.  These can be characterised as follows:

number of squares = (n + 1)² + a(n – 1)(n – 2)(n – 3)

where a can take an infinite number of values.

For example, a = 0.5 generates the sequence: 4, 9, 16, 28, …

and a = 5 generates the sequence: 4, 9, 16, 55, …

and a = 12.5 generates the sequence: 4, 9, 16, 100, …

and a = 100 generates the sequence: 4, 9, 16, 625, …

and a = 2000 generates the sequence: 4, 9, 16, 12025, …
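
The arithmetic behind these sequences can be checked with a short sketch.  The Python below is an illustration only: the function name rival_rule and the particular values chosen for a are assumptions made for this example, and form no part of Boaler's presentation.

```python
from fractions import Fraction

def rival_rule(n, a=0):
    """Number of squares in pattern n under the rule (n + 1)^2 + a(n - 1)(n - 2)(n - 3)."""
    # Exact rational arithmetic, so values such as a = 0.5 or a = 12.5 introduce no rounding.
    a = Fraction(str(a))
    return (n + 1) ** 2 + a * (n - 1) * (n - 2) * (n - 3)

# The correction term vanishes at n = 1, 2 and 3, so every value of a agrees
# on the first three patterns; the rules only part company at n = 4.
for a in (0, 0.5, 5, 12.5, 100, 2000):
    # int() is safe here because each of these values of a gives whole-number terms.
    print(a, [int(rival_rule(n, a)) for n in range(1, 5)])
```

Setting a = 0 recovers Boaler's rule, and any other value of a leaves the three given examples untouched while changing every later term.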

In all these cases, the pupil can protest that he or she went on in the same way.  It is tempting to suggest that the pupil’s way diverges from Professor Boaler’s because he or she had a different rule in mind, or should I say, in brain.  Indeed, what of the pupil who simply repeats the pattern, arriving at the sequence: 4, 9, 16, 4, 9, 16, 4, 9, 16, … ?  Hasn’t this pupil gone on in the same way as indicated by the three initial examples?

Clearly, one rarely, if ever, comes across a pupil who would propose one of these alternatives to the (n + 1)² rule in real classrooms, but Boaler’s thesis is that understanding is a rational process in the brain.  By what neural mechanism would the rational brain select one rule – Boaler’s (n + 1)² rule – from an indefinite number of alternatives?  How, in a finite time period, does the brain brand all of these alternative rules (indefinite in number) as somehow incorrect, and settle on the (n + 1)² rule as the correct rule?  What possible criterion does the brain use to distinguish correct from incorrect?


The role of the brain in mathematics is central to Boaler’s research.  It is tempting therefore to think that the “visual” learner, in understanding the problem, attaches an interpretation (something represented in the brain) to the three examples which constitute the statement of the problem, namely, a pattern of four squares, followed by a pattern of nine squares, followed by a pattern of sixteen squares.  If a pupil responds by suggesting that the next pattern is made up of 55 squares, for example, (the a = 5 sequence has 55 for its fourth term), Professor Boaler will treat this as a mistake (after all, the “correct” answer is 25).


But nothing in the statement of the problem rules out the answer “55” because it could be argued that the pupil has merely interpreted the statement of the problem in a way which is at odds with Professor Boaler.  Of course, the pupil’s interpretation accords perfectly with the three examples used in the statement of the problem.  The pupil has responded correctly to the instruction to continue the sequence of shapes in the same way.  What makes Professor Boaler’s interpretation correct and the pupil’s incorrect?  Indeed, any answer whatsoever to the question “how many squares are in the fourth pattern?” will be correct on some interpretation.  Wright (2001, p. 98) captures this intractable situation in the words: “Finite behaviour cannot constrain its interpretation to within uniqueness.”[4]  This is at the core of the case made by Leibniz.  It would seem that if understanding in mathematics is construed as an activity of the mind or brain, then the notions of a “correct” and an “incorrect” answer are rendered meaningless!
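
The claim that any answer whatsoever is correct on some interpretation can be made concrete for the family of rules given earlier.  Setting n = 4 in (n + 1)² + a(n – 1)(n – 2)(n – 3) gives 25 + 6a, so choosing a = (t – 25)/6 produces any desired fourth term t.  The sketch below assumes that family of rules and whole-number targets; the names a_for_fourth_term and first_four_terms are hypothetical.

```python
from fractions import Fraction

def a_for_fourth_term(target):
    """Value of a for which (n + 1)^2 + a(n - 1)(n - 2)(n - 3) equals target at n = 4."""
    # At n = 4 the rule reduces to 25 + 6a, so solve 25 + 6a = target exactly.
    return Fraction(target - 25, 6)

def first_four_terms(target):
    """First four terms of the rule whose fourth term has been forced to equal target."""
    a = a_for_fourth_term(target)
    # For a whole-number target all four terms are whole numbers, so int() loses nothing.
    return [int((n + 1) ** 2 + a * (n - 1) * (n - 2) * (n - 3)) for n in range(1, 5)]

# Each target, however arbitrary, is reached by a rule that still begins 4, 9, 16.
for target in (25, 55, 7, -3):
    print(target, a_for_fourth_term(target), first_four_terms(target))
```

Whatever number the pupil offers for the fourth pattern, a rule of this form can be found which accords with the three examples and delivers that number.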

Did anyone at Professor Boaler’s TEDx Stanford talk, or at her Scotland talk, spot this profound error in her reasoning?  Thousands of papers and many hundreds of books have been written about Ludwig Wittgenstein’s resolution of what has been called the “rule-following paradox.”  Furthermore, Wittgenstein’s resolution is unlikely to be to Professor Boaler’s liking, for the solution emphasises the traditional classroom in which children are trained to adhere to established mathematical practices, and Wittgenstein makes no mention of the brain.  According to Wittgenstein, we are forced to conclude that children go to school to acquire a “framework” or “background,” which they grow to accept without question and within which they can be creative.  This framework constrains the pupil’s creativity in order that he or she can be understood by peers and teachers, but it never determines the pupil’s subsequent response to any particular problem.

If the object of analysis is the pupil treated as a separate individual with a brain/mind, divorced from the framework of mathematical practices that the pupil takes on trust from authority figures at school and beyond, one gets incoherent nonsense.  Understanding is not an activity, state or process of the brain or mind; understanding is a capacity.  This is the error at the heart of Boaler’s analysis: her model omits the framework of mathematical customs and practices which the pupil has come to accept (as common sense, one might say) through his or her training at school.  According to Scruton (1981, p. 291), “All attempts to understand the human mind in isolation from the social practices through which it finds expression”[5] are doomed to fail.
Ludwig Wittgenstein

In Zettel (§419) Wittgenstein cautions: “Any explanation has its foundations in training. (Educators ought to remember this).”[6]  Because she omits the all-important, long-established mathematical customs and practices in which the pupil participates in the traditional classroom, and treats the pupil as separately analysable, Boaler is forced to accept an indefinite number of different answers as the correct answer for the number of squares in the 4th pattern, for example, and she must conclude that it is meaningless for pupils to seek the correct rule.

It is instructive to identify the source of the incoherence in Boaler’s “visual mathematics.”  She has a confused grasp of the notion of understanding in general and in mathematics in particular.  Her 2016 paper with Chen, Williams and Cordero has the title: Seeing as understanding: The importance of visual mathematics for our brain and learning.[7]  For Boaler, understanding consists in an inner state or inner process in the head, that state or process being the source of the pupil’s subsequent behaviour.  Nothing could be further from the truth.  Rowlands (2003, p. 5) writes:

Thus, according to Wittgenstein, to … understand something by a sign is not to be the subject of an inner state or process.  Rather, it is to possess a capacity: the capacity to adjust one’s use of the sign to bring it into line with custom or practice.  And this connects … understanding with structures that are external to the subject.[8]

Note the absence of any mention of the brain in Wittgenstein’s resolution of the rule-following paradox.

This is why the vast majority of children respond to the “growing squares” problem with the answer: number of squares = (n + 1)².  It is custom and practice in mathematics to respond in this way.  The conundrum identified by Leibniz (see above) is also resolved.  If understanding the even numbers is a matter of adjusting one’s behaviour to accord with the established mathematical practice in respect of these numbers, then there is only one answer we ought to give, namely, “10.”  In §6.21 of his Remarks on the Foundations of Mathematics, Wittgenstein writes: “The application of the concept ‘following a rule’ presupposes a custom,”[9] and McGinn (1984, p. 39) defines custom as follows: “A custom, like a habit, is something that gets established, not through the deliverances of reason, but on the basis of what we might call a tradition.”[10]  Boaler et al. (2016, p. 5) must appreciate that if students were not “made to memorise math facts, and plough through worksheets of numbers,”[11]  in the traditional classroom, mathematical rules would not so much as exist.



[1] Boaler, J., Chen, L., Williams, C., & Cordero, M. (2016).  Seeing as understanding: The importance of visual mathematics for our brain and learning.  Journal of Applied & Computing Mathematics, 5(5), 1-6.

[2] Polanyi, M. (1958).  Personal knowledge.  Chicago: University of Chicago Press.

[3] Anscombe, G.E.M. (1985).  Wittgenstein on rules and private language.  Ethics, 95, 342-352.

[4] Wright, C. (2001).  Rails to infinity.  Cambridge, MA: Harvard University Press.

[5] Scruton, R. (1981).  A short history of modern philosophy.  London: Taylor and Francis.

[6] Wittgenstein, L. (1967).  Zettel.  Oxford: Blackwell.

[7] Boaler, J., Chen, L., Williams, C., & Cordero, M. (2016), op. cit.

[8] Rowlands, M. (2003).  Externalism.  Ithaca: McGill-Queen’s University Press.

[9] Wittgenstein, L. (1956).  Remarks on the foundations of mathematics.  Cambridge MA: MIT Press.

[10] McGinn, C. (1984).  Wittgenstein on meaning.  Oxford: Blackwell.

[11] Boaler, J., Chen, L., Williams, C., & Cordero, M. (2016), op. cit.

Response to Professor Luckin’s TES (29.06.2018) article: “AI is coming: use it or lose to it.”


Alan Turing

 Dr Hugh Morrison (Queen’s University Belfast [retired])

Hilary Putnam

Jerry Fodor

Jerome Bruner

Given that Rose Luckin is professor of “learner-centred design” at UCL, one would expect that she has a strong appreciation of the meaning of the word “learning.”  This isn’t clear from her article.  Professor Luckin seems resigned to the fact that teachers must change and embrace a role for Artificial Intelligence in the classroom.  According to Luckin, this acceptance of AI will enable teachers to influence how its various products will be deployed in teaching and learning.  Professor Luckin’s sense of resignation is clear in the title of her piece: “AI is coming: use it or lose to it.”  The headline writer at the TES goes further, seeming to suggest that teachers should yield a substantial part of their current remit to machines: “When knowledge isn’t power.  Why teachers need to focus on the things machines can’t teach.”

Alas, both Professor Luckin and the TES seem totally unaware that a “category error” lurks at the core of the AI project, and that recognition of this error should be used to protect the teaching profession from the impact of neural nets, deep learning and artificial intelligence.

Anyone familiar with the research of one of the giants of machine learning, the computer scientist Judea Pearl, will know that artificial intelligence, as currently conceived, has profound and intractable difficulties.  (Pearl describes AI as little more than curve-fitting.)  By way of illustration, consider a concept which should be close to the hearts of both Luckin and the TES, namely, “learning.”  If any profession can lay claim to expertise concerning the nature of learning, it is teachers.  From Professor Luckin’s TES article, I suspect she is unaware that AI suffers from a category error in respect of the concept “learning,” an error first identified by Aristotle, which goes by the name of the “mereological fallacy.”

Judea Pearl

Those computer scientists who work in the field of so-called “deep learning” claim to model the learning that occurs in the brain using extremely complex neural nets.  Look at any YouTube presentation in which an AI enthusiast lectures on the structures underpinning neural nets and you will likely hear the claim that learning and thinking are (neural) activities in the brain.  However, it transpires that it is nonsense to suggest that learning or thinking are processes located in the brain.

Popular science publications routinely refer to brains “learning”, “thinking”, “processing information,” “creating meaning,” “perceiving patterns” and so on.  Now where is the scientific evidence for these claims?  There are no laboratory demonstrations of brains learning or thinking.  Such activities are carried out by human beings, not their brains.  Needless to say, no one would dispute that without a functioning brain an individual couldn’t learn or think, but it does not follow that the individual’s brain is doing the thinking or learning.

While it is clear that learning would be impossible without a properly functioning brain, the claim that brains can learn or that learning takes place in the brain ought to be supported by scientific evidence.  There isn’t any.  To mistakenly attribute properties to the brain which are, in fact, properties of the human being is to fall prey to the “mereological fallacy,” mereology being the study of part/whole relations.

To ascribe psychological predicates – such as “learn” and “think” – to the brain is simply nonsensical.  If the human brain could learn or think, “This would be astonishing, and we should want to hear more.  We should want to know what the evidence for this remarkable discovery was” (Bennett & Hacker, 2003, p. 71)[1].  “Psychological predicates are predicates that apply essentially to the whole animal, not its parts.  It is not the eye (let alone the brain) that sees, but we see with our eyes (and we do not see with our brains, although without a brain functioning normally in respect of the visual system, we would not see)” (Bennett & Hacker, 2003, pp. 72-73)[2].

“We know what it is for human beings to experience things, to see things, to know or believe things, to make decisions … But do we know what it is for a brain to see …for a brain to have experiences, to know or believe something?  Do we have any conception of what it would be like for a brain to make a decision? … These are all attributes of human beings.  Is it a new discovery that brains also engage in such human activities?” (Bennett & Hacker, 2003, p. 70)[3].

“It is our contention that this application of psychological predicates to the brain makes no sense.  It is not that as a matter of fact brains do not think, … rather, it makes no sense to ascribe such predicates or their negations to the brain. … just as sticks and stones are not awake, but they are not asleep either” (Bennett & Hacker, 2003, p. 72)[4].

If one casts one’s mind back through the many, many ill-conceived fads visited upon a long-suffering teaching profession, one may recall the “brain-based learning” movement.  Proponents of brain-based learning were constantly drawing the attention of mathematics teachers, for example, to the illuminated area of the brain devoted to the learning of mathematics.  A more careful, conservative approach which eschews hype would be to say that this area of the brain is “lit up” when the person learns mathematics.  Bennett & Hacker (2007, p. 143) demonstrate how careful science avoids the hype which characterises popular accounts of the functioning of the brain: “All his brain can show is what goes on there while he is thinking; all fMRI scanners can show is which parts of his brain are metabolizing more oxygen than others when the patient in the scanner is thinking.”[5]

Luckin proposes the following: “To ensure their place in the schools of future, educators need to move on from a knowledge-based curriculum that could soon become automatable through AI.”  Rather than urging yet further radical professional change on already innovation-fatigued teachers, she should be protecting schools from the over-hyped claims of the AI industry.  Luckin’s radical suggestion for the future of the teaching profession reveals a lamentable grasp of the fundamental concepts “learning” and “knowledge”: “It is not that the knowledge-based curriculum is wrong per se, the problem is that it is wrong for the 21st century.  Because now that we can build AI systems that can learn well-defined knowledge so effectively, it’s probably not very wise to continue to develop the human intelligence of our students to achieve this main goal.”

The key words in this quotation are: “we can build AI systems that can learn well-defined knowledge.”  Surely the central aim of AI is to design machines which can “learn” and “know” in the same way as human beings learn and know?  I have already established that for human beings, learning is not an activity of the mind/brain.  What about Luckin’s claim that machines can have access to knowledge?  Wittgenstein teaches that “The grammar of the word ‘knows’ is … closely related to the word ‘understands’” (PI, §150)[6].  To know or understand is not to have access to inner states of the mind or brain; knowing and understanding are best thought of as capacities.  Rowlands (2003, p. 5) writes: “Thus, according to Wittgenstein, to … understand something by a sign is not to be the subject of an inner state or process.  Rather, it is to possess a capacity: the capacity to adjust one’s usage of the sign to bring it into line with custom or practice.  And this connects … understanding with structures that are external to the subject of this … understanding.”[7]

According to Wittgenstein, human knowledge is best construed as a capacity rather than an inner actuality.  An AI machine capable of knowing or understanding the concept “molecule,” say, as a human being does, would have to be capable of adjusting its use of the concept “molecule” so that it accords with the established use of that concept in physics, biology, and so on.  In short, a machine capable of non-collusively agreeing with the human practices which surround it!  Moreover, these human practices lie outside the computer.

I disagree with the headline on the front page of the TES; the invaluable mathematical knowledge I acquired from my teachers and lecturers allows me to confirm Judea Pearl’s claim that deep learning algorithms amount to little more than mathematical curve-fitting, and machines capable of knowing, thinking, learning and understanding are a fantasy.  My mathematical knowledge protects me from hype.  Pace the front page of the TES, knowledge is power.

David Sumpter, Outnumbered (ISBN 978-1-4729-4741-3)

The teaching profession would be well advised to give AI a wide berth.  AI research conducted at Cambridge and Stanford universities has been described as “incredibly ethically questionable” by Professor Alexander Todorov, who warns that “developments in artificial intelligence and machine learning have enabled scientific racism to enter a new era” (see The Guardian 07.07.18).  I will leave the last word to mathematician David Sumpter (2018, p. 226).  He reports on a Future of Life Institute meeting: “Despite the panel’s conviction that AI is on its way, my scepticism increased as I watched them talk.  I had spent the last year of my life dissecting algorithms used within the companies these guys lead and, from what I have seen, I simply couldn’t understand where they think this intelligence is going to come from.  I had found very little in the algorithms they are developing to suggest that human-like intelligence is on its way.  As far as I could see, this panel, consisting of the who’s-who of the tech industry, wasn’t taking the question seriously.  They were enjoying the speculation, but it wasn’t science.  It was pure entertainment.”[8]

[1] Bennett, M.R., & Hacker, P.M.S. (2003).  Philosophical foundations of neuroscience.  Oxford: Blackwell Publishing.

[2] Ibid.

[3] Ibid.

[4] Ibid.

[5] Bennett, M.R. & Hacker, P.M.S. (2007).  The conceptual presuppositions of cognitive neuroscience.  In M.R. Bennett, D. Dennett, P.M.S. Hacker, & J. Searle, Neuroscience and philosophy (pp. 127-162).  New York: Columbia University Press.

[6] Wittgenstein, L. (1953).  Philosophical investigations.  G.E.M. Anscombe, & R. Rhees (Eds.), G.E.M. Anscombe (Tr.).  Oxford: Blackwell.

[7] Rowlands, M. (2003).  Externalism.  Ithaca: McGill-Queen’s University Press.

[8] Sumpter, D. (2018).  Outnumbered.  London: Bloomsbury Sigma.