Dr Hugh G Morrison (The Queen’s University of Belfast [retired])
In 2020 the impact of Covid-19 prompted the Conservative government to replace public examinations (GCSE, AS and A2) with a process based on teacher-predicted grades, with an algorithm used to keep those grades in check. This combination of teacher-predicted grades and algorithm – endorsed by Ofqual and CCEA – was attacked from all quarters in what became known as the 2020 “grades fiasco.” In summer 2021, while teacher-assessed grades are to continue to play a pivotal role, the algorithm is to be replaced by another form of quality assurance: teacher moderation.
Ofqual hopes that the 2021 marriage of predicted grades and moderation will deliver grades which are “credible and meaningful.” Ofqual also asserts that the moderation training provided by the exam boards will help teachers “objectively and consistently assess their students’ performance” so that “the [2021] grades will be indistinguishable from grades issued by exam boards in other years.”
Anyone who has engaged in GCSE coursework moderation will be aware of its shortcomings, and one member of Ofqual’s Standards Advisory Group has warned that the 2021 approach to quality-assuring GCSE, AS and A2 grades could result in “Weimar Republic” levels of grade inflation.
It is instructive to give the broadest overview of how Ofqual and CCEA see moderation creating the circumstances whereby the 2021 grades will be “indistinguishable” from the grades issued by exam boards in past years. Consider a GCSE geography teacher. She receives training from the examining board designed to help her “internalise” the standards associated with each of the grades in GCSE geography. For example, she might be given the completed examination scripts of several different students who were graded A* in geography, several scripts graded A, and so on, down through the grades. The exam board officials might stress the key features of particular scripts that mark them out as meeting the standard represented by a particular grade. By scrutinising the completed scripts of several students graded C, for example, the teacher might gain insights into the standard: “grade C in GCSE geography.”
Our geography teacher then turns her attention to her own students’ “portfolios.” A student’s portfolio might contain his or her responses to mock examinations, so-called “mini tests” provided by the exam board, class tests, project work, examples of outstanding homework, and so on. The geography teacher then uses the scale of standards she internalised during training to assign a best-fit grade to each portfolio. Finally, external moderation might involve her exam board selecting a representative sample of her students’ portfolios to establish if her grading decisions accord with the combined professional wisdom of the exam board moderators.
Ofqual is prepared to acknowledge that moderation has some limitations: “It is often the case that two trained markers could give slightly different marks for the same answer and that both marks would be legitimate.” However, carefully conducted research paints a more depressing picture. In his book The Schools We Need and Why We Don’t Have Them, E. D. Hirsch describes a study carried out by the world’s leading testing agency, the Educational Testing Service (ETS). In the study, in which “300 student papers were graded by 53 graders (a total of 15,900 readings), more than one third of the papers received every possible grade. That is, 101 of the 300 papers received all nine grades. … 94% [of the papers] received either seven, eight or nine different grades; and no essay received less than five different grades from 53 readers” (pp. 183-185).
Since a single essay cannot simultaneously merit nine different grades, one is forced to the conclusion that a well-defined intrinsic worth cannot be ascribed to a particular essay. The essay’s worth can only be ascribed relative to a particular marker. In summary, the grade is not an intrinsic property of the essay; rather, it is a joint property of the essay and the particular marker, a property of the interaction between the two. To communicate unambiguously about the quality of the essay one must specify the measuring tool (in this case the marker).
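The arithmetic behind the quoted ETS figures, and the sense in which a grade attaches to an essay-marker pair rather than to an essay alone, can be sketched in a few lines of Python. The toy `grades` table and the labels `essay_1`, `marker_a` and so on are invented purely for illustration; they are not data from the study.

```python
from fractions import Fraction

# Figures quoted from the ETS study described by Hirsch.
papers, graders = 300, 53
readings = papers * graders        # total number of readings
all_nine = Fraction(101, papers)   # share of papers given every grade

# Hypothetical toy data: each grade is recorded against an
# (essay, marker) pair, never against an essay on its own.
grades = {
    ("essay_1", "marker_a"): 7,
    ("essay_1", "marker_b"): 4,
    ("essay_1", "marker_c"): 9,
    ("essay_2", "marker_a"): 5,
    ("essay_2", "marker_b"): 5,
    ("essay_2", "marker_c"): 6,
}

def distinct_grades(table):
    """Count how many different grades each essay received."""
    seen = {}
    for (essay, _marker), grade in table.items():
        seen.setdefault(essay, set()).add(grade)
    return {essay: len(values) for essay, values in seen.items()}

spread = distinct_grades(grades)
print(readings, all_nine > Fraction(1, 3), spread)
```

Asking “what grade did essay_1 get?” has no single answer in such a table until a marker is named, which is the point being made about the real study.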
It is important to stress that moderation’s difficulties extend beyond disciplines where essay-type questions predominate; all examinations are affected. In the text Fundamentals of Item Response Theory, Hambleton et al. make the point for tests in general: “examinee characteristics and test characteristics cannot be separated: each can be interpreted only in the context of the other … An examinee’s ability is defined only in terms of a particular test. When the test is ‘hard,’ the examinee will appear to have low ability; when the test is ‘easy,’ the examinee will appear to have higher ability. … Whether an item is hard or easy depends on the ability of the examinees being measured, and the ability of the examinees depends on whether the test items are hard or easy!” (pp. 2-3). One cannot meaningfully separate the notion of ability from the characteristics of the particular test used to measure that ability.
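The circularity described in that passage can be made concrete with a small Python sketch. It uses proportion correct as a stand-in for both “ability” and “easiness”; that choice, and every response pattern below, is a hypothetical illustration rather than anything taken from Hambleton et al.

```python
def proportion_correct(responses):
    """Share of correct answers (1s) in a list of 0/1 responses."""
    return sum(responses) / len(responses)

# Hypothetical: the same examinee sits two different tests.
same_examinee = {
    "easy_test": [1, 1, 1, 0, 1],
    "hard_test": [0, 1, 0, 0, 0],
}
apparent_ability = {
    test: proportion_correct(answers)
    for test, answers in same_examinee.items()
}

# Hypothetical: the same item is answered by two different groups.
same_item = {
    "stronger_group": [1, 1, 1, 1, 0],
    "weaker_group": [0, 1, 0, 0, 1],
}
apparent_easiness = {
    group: proportion_correct(answers)
    for group, answers in same_item.items()
}

print(apparent_ability)
print(apparent_easiness)
```

The examinee looks more able on the easy test than on the hard one, and the item looks easier for the stronger group than for the weaker one, although neither the examinee nor the item has changed.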
Now let’s return to our GCSE geography teacher. During her exam board training she studied the responses given by students who achieved specified grades in particular GCSE geography examinations. However, she must now use her judgement to assign grades to her students’ portfolios, a very different form of assessment. Dr Mike Cresswell – a past Chief Executive of the exam board AQA – underlines “the need to accept that there is no external and objective reality underpinning the comparability of results from different examinations.” This presents our geography teacher with an intractable problem: she must somehow gain insights into absolute standards which float free of any particular examination. For example, the absolute standard “B grade in GCSE geography” makes no reference to a particular test. Alas, in Cresswell’s words, “absolute examination standards are … a chimera.”