
Moderating with Moderation

November 11, 2024 by Doug McCurry from BooBook Education

What is under the school’s control?

It was disturbing to read the concerns about the recent VATE Crafting/Creating Texts Survey Report.

Great concerns were expressed about the increased workload of the new AoS, and these seemed to drown out the celebration of the change that one would have expected.

There is an increase in the number of SACs from one to two (with a reflective comment) for the new AoS, but it should be noted that the amount of teacher and student work involved is largely up to the school. The terms and tenor of this SAC can be made bigger or smaller by the school.

The conditions of the writing and the number of drafts (if any) for the two Section B tasks are under the control of the school. Replacing two texts as the examined substance of Section B with four brief mentor texts in the new AoS is certainly a reduction of work. The mentor texts can be dealt with very expeditiously, unlike the attention that the texts of the old Section B required in preparation for the completion of the exam. And schools have the ability to magnify or mute the intensity of workload and competition on the SACs. While in SACs students are competing with each other, they are not competing with the rest of the state.

Assessment moderation procedures are the fundamental issue

When one looks at the concerns expressed by survey respondents in the report, it seems that assessment moderation procedures are the fundamental issue rather than the AoS itself. The implicit subtext of the report is the onerousness of within-school moderation processes. Given these concerns, I thought I would have a look at the VCE within-school moderation requirements for English.

Concerns also emerged in the survey question about school support for moderation. The discussion of the 280 responses to this issue opens with this crucial insight:

'There was a wide variation in both the processes that schools adopted and the provision of supports to teachers. Some approaches to moderation and marking taken in schools were far more arduous than others, and some schools were more generous in the ways they supported the English staff to complete the work, particularly in the provision of shared time for moderation.'

The issue of support given by a particular school is a matter of within-school negotiation. The issue of variation in moderation processes is also within the control of the school.

I recall that in the VISE courses that preceded the VCE, statistical moderation against the exam was applied at class rather than school level. The change to moderation at school level is significant because it puts a large responsibility for equity for individual students on schools before statistical moderation is applied by the VCAA. In the previous system there was (more or less) no need for school moderation processes as such: the authority statistically moderated the ranking of students within a class rather than a ranking for the whole school.

As schools are now required to present a rank order for a whole school rather than a class, they must have processes for integrating the assessments of different classes. It seems that this integration is very onerous in some schools.

VCAA moderation requirements and advice

I wondered what the VCAA moderation requirements are and how schools might fulfil them in a fair and cost-effective fashion. There seem to be no formal VCAA requirements for within-school moderation, although there are guidelines. The approaches offered in the VCAA guidelines vary from the light touch of a group process for determining standards and benchmark scripts to the heavy hand of blind double marking of all pieces of work.

Schools can choose between these more or less onerous between-class moderation processes. How onerous the processes need to be depends (or should depend) on the group of teachers involved. How much moderation is needed to reach a satisfactory degree of consensus amongst a group of teachers and their classes is for each school to determine.

In the VCAA Guidelines for Scored School Assessment three approaches to moderation are presented:

  • Approach 1 is blind marking of a sample of other classes after teacher marking.
  • Approach 2 is second marking of all scripts.
  • Approach 3 is common marking of a benchmark sample followed by teacher marking.

The third approach seems the simplest and least time-consuming. Marking a common sample of scripts lets the team initially explore criteria and standards, and then individual teachers mark their own students.

Is blind second-marking the gold standard?

It is worth reviewing these different processes to see what might be necessary and cost effective.

The aim of within school moderation is to mitigate potential differences between the marking of different teachers. Teachers can have various expectations about the quality of work students can and should attain. That some teachers are harder and some are easier markers is to be expected, and these differences are to be monitored and mitigated by moderation.

In the current system a faculty needs to ensure that there is a satisfactory degree of comparability between the assessments of different teachers. What processes are needed to achieve this depends on the membership and the culture of a faculty. Making comparable assessments in a faculty where all English teachers are in one staffroom, know each other well and have mostly worked together over a number of years is obviously different from doing so in a large faculty whose members have not worked much together and where there is a regular churn of teachers coming and going.

It might be thought that blind second-marking as used in external marking is the ‘gold standard’ of fairness. But this assumption is open to question for school-based assessment.

The teacher who knows the student and has an overview of the assessment process that produced a piece of work has experience and insight that is not available to the blind marker.

I would be reluctant to use blind second marking as a method of ensuring comparability in school-based assessment. As well as being time-consuming, blind second marking can cause a range of complications. For instance, what counts as a discrepancy, and how are discrepancies resolved in such a situation?

Options for discrepancy marking: blind marking vs check marking

While the VCAA uses blind third-marking to reconcile unresolved discrepancies, discrepancy marking can be done with processes other than blind marking. What is called ‘check marking’ in some regimes is not necessarily blind. In check marking, a second marker (or a third marker in discrepant cases) can review the marks of other markers. This can be done after blind marking, or it can be done with the first mark in mind. Such cognisant review of marks or discrepancies is a more cost-effective process than blind double marking.

In such a process a reviewer can skim papers and fairly confirm marks without the pondering involved in blind second marking. Beyond cost-effectiveness, cognisant second marking is a superior process because it opens up possibilities for pattern identification and meaningful feedback to markers. In situations where there is significant concern about the comparability of the standards used by different teachers, I would be inclined to use cognisant rather than blind second marking.

Mark distribution as a check on comparability between classes

It should be noted that in the VCAA marking processes, discrepancies are not the only way the marking of individuals is monitored. The distribution of marks given by individuals is also monitored.

It would seem to me that monitoring the distribution of marks produced by individuals is an important and cost-effective measure of comparability that should also be used routinely by schools.

The distribution of marks given by individuals should be monitored because discrepancy rates alone are not an adequate basis for reviewing marking. Some markers have what is called in the statistical jargon a ‘strong central tendency’: they have few discrepancies because they produce a narrow range of marks.
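As a concrete illustration (a minimal sketch only, with invented mark data and an arbitrary spread threshold), a faculty could compare each teacher’s mean and standard deviation to flag a marker whose spread is unusually narrow:

```python
from statistics import mean, stdev

# Hypothetical SAC marks out of 40 for three teachers' classes.
marks = {
    "Teacher A": [18, 22, 25, 28, 31, 34, 36, 38],
    "Teacher B": [15, 19, 23, 27, 30, 33, 35, 39],
    "Teacher C": [26, 27, 28, 28, 29, 29, 30, 31],  # strong central tendency
}

for teacher, scores in marks.items():
    m, s = mean(scores), stdev(scores)
    # The threshold of 4 is arbitrary; a faculty would set its own.
    flag = "  <- narrow spread, worth reviewing" if s < 4 else ""
    print(f"{teacher}: mean {m:.1f}, sd {s:.1f}{flag}")
```

Teacher C here would rarely be caught by discrepancy checks alone, since marks bunched around the middle seldom differ greatly from a second marker’s, yet the narrow spread is exactly what distribution monitoring reveals.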

Quantitative expectations and quantitative outcomes

This raises the issue of monitoring the spread of marks produced by individuals, which is not mentioned in the VCAA procedures. In external marking, markers are ideally monitored both in terms of discrepancy rates and in terms of the distribution or spread of marks.

The VCAA approaches do discuss groups of teachers establishing common understandings of criteria and levels of performance, but nothing is mentioned about quantitative expectations and the monitoring of quantitative outcomes. Just as the VCAA does such monitoring in their external marking, so it is beneficial and cost effective for faculties to set quantitative expectations of marking and monitor the quantitative outcomes of marking.

It is not satisfactory to monitor one’s own or other teachers’ marking on discrepancy rates alone.

It is also reasonable for discussion of marking standards to include expectations about the spread of marks to be produced. In marking, individual teachers should be thinking in terms of the spread of marks expected for this group of students on this task.

In my view the background to a piece of work to be assessed is part of setting expectations for marking.

  • Who are the candidates?
  • What are they asked to do?
  • What information were they given about the task beforehand?
  • In what conditions did they produce the work?

These contextual features should shape one’s judgement about a piece or a number of pieces of student work.

It is most appropriate to have normative expectations in mass external marking regimes. The assumption of a VCAA marker is that they are marking a representative sample of the population.

This of course is not the case in school marking.

In external marking, markers know the percentage of students given each mark (it is in the Examiner’s Report). There is a problem with setting numerical targets for within-school marking in that class groups may differ significantly in levels of ability, so one can’t necessarily expect a similar range of scores from different class groups.

Just as a faculty should be monitoring the range of marks produced by individuals and the group as a whole, so a faculty should be monitoring the constitution of different class groups and assessing whether different results for different groups are to be expected and are intelligible.

Teachers are always thinking and saying that this group is better or worse than the group they had last year or the year before, or that 12 Purple is much stronger than 12 Orange.

These perceptions may or may not be well grounded, and they are as fallible as everything else in the art of teaching.

It seems that some methods of school timetabling can lead to significant differences in the overall ability of different class groups. Monitoring this issue would be of value to faculties in setting expectations and reviewing outcomes for different class groups.

I understand that exam and GAT feedback of results can give retrospective information about the relative levels of ability of different class groups in a school. It would be reasonable for faculties to want to know if some classes are of higher or lower ability than others in the external exam and the GAT. Monitoring these similarities and differences will help a faculty set expectations about the relative performance of different classes.

It would seem to me that it is fair and cost effective for faculties to discuss criteria and levels of performance and to agree on benchmark examples of different levels. It would also seem reasonable to discuss the range of marks expected from class groups (in the light of past results), and to monitor the range of marks for each group before they are put into a single rank order.

Sample remarking or check marking could be added to these procedures. It does not seem necessary to me to take on the arduous complications of blind double-marking to attain comparability between the marking of different teachers in a school.

Qualitative judgements about this or that script in a batch are a limited kind of review.

Given its intention, it is curious that the VCAA does not suggest the possibility of reviewing the range of marks produced by individual teachers. The overall distribution of a set of scores in comparison with other scores is a meaningful basis for reviewing the marking of particular teachers.

It is reasonable to monitor the range and spread of marks produced by individual teachers.

Substantial differences in the range and levels of scores produced by different teachers should be a matter of review. When teachers are putting their scores together on a common scale, it is reasonable that each teacher knows the range of results they produced. Any differences between the marks produced by different teachers should be intelligible to the team or faculty leader.

It would be reasonable, fair and efficient for the set of scores produced by each teacher to be reviewed in relation to the sets of scores produced by other teachers.

It cannot be assumed, of course, that the range of scores produced by different teachers will be more or less the same, because they are assessing smallish groups. There could well be variation between smallish groups in the same school, and there may well be variation from year to year.

But such variation should be monitored.

Timetable blocks sometimes mean that there are variations in the makeup of English classes. (You might have the science group and I might have the commerce group.) There is qualitative and quantitative information in a school about the structure of different class groups.

The GAT produces statistics about each class group each year, and these statistics can be used from year to year (assuming the structuring of the timetable is the same) to set up expectations about the relative standing of different classes. Are there patterns over time in the performance of the different groups formed as a result of the blocking?

This data can inform expectations about the performance of different classes. It might suggest that a difference in mean (although not standard deviation) would be expected between a science and a commerce group. The point of this review would be to monitor the scores of different teachers.
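To sketch what such a review might look like (again with invented numbers: the expected gap, tolerance and marks below are purely illustrative assumptions), one could compare the observed difference in class means with the gap that past GAT results would lead one to expect:

```python
from statistics import mean

# Assumption: past GAT data suggest the science-block class usually sits
# about 2 marks above the commerce-block class on a task out of 40.
expected_gap = 2.0
tolerance = 3.0  # arbitrary: how far from expectation before reviewing

science = [24, 27, 29, 31, 33, 35]
commerce = [21, 24, 26, 28, 30, 33]

observed_gap = mean(science) - mean(commerce)
if abs(observed_gap - expected_gap) > tolerance:
    print(f"gap of {observed_gap:.1f} is unexpected: review marking standards")
else:
    print(f"gap of {observed_gap:.1f} is consistent with expectation")
```

If the observed gap were far larger or smaller than past patterns suggest, the faculty would have a concrete, intelligible prompt to review the two teachers’ standards rather than a vague sense that something is off.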

This is both a legitimate and cost-effective way to encourage comparability of scoring between different teachers in the same school.

The VCAA implies that blind second marking is best practice in moderation, but that suggestion should not be taken as invalidating other forms of check marking. Some marking regimes use check marking in which a second marker reviews the scores of a first marker, or a third marker reviews the marks of a pair of first and second markers.

There is a substantial difference between the time and stress needed for check marking and for blind marking. Check marking can be a much more cost-effective means of mark review than blind marking.
