16.7 Peer Evaluations
-
This is perhaps our least preferred of the evaluation methods we see people use.
-
Superficially, it may look like it is working. But is it really?
-
There are situations where the method can work, but these are rare.
I love you, you love me… everyone join in… we love us, us loves us…
Peer evaluations are interesting. Sounds like a great idea. Have students evaluate each other.
These types of evaluations can, in theory, provide an instructor with intel on who did what on a team and how marks should be distributed.
They can be used as teaching moments for the students: learning how to evaluate.
In some cases, having students evaluate each other saves the instructor time and effort (this is not a good reason).
There are times when peer evaluations can in fact work, and work well. In our opinion, these are likely to be rare, very rare occurrences.
There are probably more times when they are not working as intended, and the instructor might not be aware that they are not working!
They can easily give the appearance of working when they are not. It is very easy to believe that they are!
There are many assumptions and ‘necessary and sufficient’ conditions to think about. As with many other topics, there is no right way, but many possible mistakes to make.
Some of the issues are...
-
There is a field of research often called Social Comparison Theory.
It addresses how peers view each other: the outperformers, the non-conformists, the baseline for 'tribe' membership, and how individuals are compared within the social structure/ecosystem.
This includes issues such as sex, race, age, culture, economic standing, gender identification, etc. In some cases the bias is explicit and can be considered malicious; in other cases the individual might be clueless and unaware of the bias.
We suggest that before you consider a simple peer evaluation scheme at the student level, you take some time and read up on Social Comparison Theory. If you do, we believe that you will not consider using it. Many biases are possible, and it is not possible to control for them all.
-
The outperformer bias, highlighted in the Social Comparison body of research, penalizes anyone who is capable of notable, superior performance.
People like to compare people who are at the same level of skill and expertise, sometimes resulting in the 'I love you, you love me; we are all doing ok; and no one can do better' mentality.
Individuals rarely like to be compared to and reminded of others who can do what they cannot do.
The tall tree bears the wind. No one is a prophet in their home town. The nail that stands out is hammered down. And so on.
The lobster able to escape the pot is pulled back by the other lobsters.
We have also heard about cases where individuals are told ‘do not work so hard, you are showing us up’. Not everyone has an outperformer bias, but it is a very common problem.
-
There is a nonconformity bias as well. It is about what is done, why it is done, and how it is done.
Other biases can be somewhat akin to nonconformity (e.g., different gender), but the nonconformity bias will hurt those who march to a different drummer, think differently, or approach things differently from the tribe. For example, if one student is striving for high quality and the rest for good-enough, the one outlier might not be looked upon favorably.
-
Students might have no training in how to evaluate something.
Learning how to evaluate something is something that requires training, feedback, and practice.
It is a cognitive skill and is indeed at the top of the cognitive complexity ladder.
Of course, people evaluate everything and have been evaluating things from soon after birth, but that does not mean that the evaluation process is a quality one, or that because someone can evaluate x, they can evaluate y.
-
Have they been taught how to choose comparators? Baselines?
Do they understand what the criteria are and what the definitions are? What are the expectations?
Are they knowledgeable enough and skilled enough to really evaluate another learner at the same level?
Can they pick up on what has been missed or misunderstood or misapplied? Really?
-
Have they been taught how to avoid the biases and noise generators noted by Kahneman et al. (2021)?
-
It might be assumed that evaluations are anonymous to prevent unwanted dynamics and retribution, but it is not wise to assume this or bet on it.
Students will compare and talk, and an unfavorable, blunt review is likely to be figured out.
If the students do not care about such things (e.g., they are not in a cohort system where they will be with the other students for many years, or they might always have the option of choosing their own teams), this might not be a problem.
-
We have heard of students colluding to balance out the reviews to make sure that everyone does well at the end of the day. This is more likely in a cohort situation.
-
The students will need a reasonable level of maturity and some sense of baselines and what ‘average’ might mean. This suggests that peer evaluations might be appropriate for more senior courses and not junior ones.
-
If the students are being asked to critique team members in a cohort, or where the students will see each other in a subsequent setting, this might create a bias.
Peer evaluation might work better in senior courses and where students are not in the same cohort or network.
There are lots of long-term tribal dynamics at play when a member of the tribe starts to evaluate another member, and this dynamic cannot be ignored or wished out of existence. There is, and always will be, an undercurrent.
-
There might be cases where the peer evaluations do work out ok, but this is probably a random draw of the cards, and it should not be assumed that peer evaluations are always fair, unbiased, and appropriately 'critical'.
-
Being able to evaluate at a quality and professional level implies that the evaluator ‘knows more’ about the subject than the person submitting the work for evaluation.
The evaluator needs to know not only what is submitted, but what could have been submitted and have suitable baselines and expectations.
They need to know whether the assumptions are valid and whether the person actually comprehended the material, was able to apply it, and could do the proper analysis and problem solving.
Otherwise, it is simply an activity done to say that an activity took place; it has little or no real value.
-
Still considering 'simple' peer evaluations? Please read some of the literature on Social Comparison Theory. We strongly recommend against using them without putting the necessary quality mechanisms in place.
You will get comparisons. But, are they good comparisons and can they be trusted to be fair and equitable?
-
Have the evaluations done in real time to avoid collusion and debate; they need to be independent. You can help the process by instructing the students on the appropriate baselines and comparators, and on how to use relative rankings (a sketch follows below).
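To make relative rankings concrete, here is a minimal sketch in Python; the names, scores, and data layout are our own illustrative assumptions, not a prescribed format. Each rater's raw scores are converted to within-rater ranks so that one rater's leniency or harshness does not distort the aggregate.

```python
from collections import defaultdict

# rater -> {ratee: raw score}; each student rates their teammates.
# All names and scores are hypothetical.
raw_scores = {
    "ana": {"ben": 9, "cal": 7, "dee": 8},
    "ben": {"ana": 6, "cal": 5, "dee": 6},
    "cal": {"ana": 8, "ben": 9, "dee": 7},
    "dee": {"ana": 7, "ben": 8, "cal": 6},
}

def relative_ranks(scores):
    """Replace each rater's raw scores with ranks (1 = best), so only the
    ordering within a rater's evaluations is used (ties broken arbitrarily)."""
    ranked = {}
    for rater, ratings in scores.items():
        ordering = sorted(ratings, key=ratings.get, reverse=True)
        ranked[rater] = {ratee: pos + 1 for pos, ratee in enumerate(ordering)}
    return ranked

def mean_rank(ranked):
    """Average each student's rank across all raters."""
    totals, counts = defaultdict(float), defaultdict(int)
    for ratings in ranked.values():
        for ratee, rank in ratings.items():
            totals[ratee] += rank
            counts[ratee] += 1
    return {ratee: totals[ratee] / counts[ratee] for ratee in totals}

print(mean_rank(relative_ranks(raw_scores)))
```

Ranking within each rater is one simple way to blunt leniency or harshness; it does not, by itself, address the Social Comparison biases discussed above.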
Some of the things to think about are...
-
If the intent is for the students to learn how to evaluate, there are ways to do it. We have asked students to prepare rubrics, swap rubrics, and use the rubrics on each other's work, but this never affects the owner's grade. The students are marked on the rubric's design and usability, how it is used, and how they grade. The process is staged: the students hand in the rubric one week; several days later they receive a rubric and something to assess; and part of the assessment is to develop a rubric for a good rubric and assess the rubric they were given.
-
If the intent is to figure out who did what and who should get what grade, there are ways to do this with or without peer evaluations. If you are going to use peer evaluations to help you grade, the student evaluations should be based on evidence and be justified and rationalized, not based on opinions and memory.
-
Appropriate individual and team logs can assist with this: who did what, when, for how long, etc. The instructor can correlate the student comments with their own separate process and sniff out the quality of the evaluation.
-
If the students know that the logs are taken seriously, are created weekly and uploaded (approx. ¼ page is enough for an individual log, 1 page for a team log), are occasionally reviewed, are mapped onto the project deliverables during the term, and cannot be faked at the end, the students will take them seriously. If the instructor does not use them or take them seriously, neither will the students.
-
The students will need feedback on the logs and their clarity; that is, it must be obvious who did what and how long they took. Each individual's contribution must be clear. This is no different from what the students will have to do upon graduation in many professions, and it is good practice for them. Many organizations do project-based accounting, and there are many situations where there is a legal requirement with respect to who did what, when, and for how long.
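As one hypothetical way to structure such logs, here is a minimal sketch; the field names and entries are illustrative assumptions, not a required format. It shows how per-person hours can be tallied from weekly entries and checked against claims like 'everyone worked equally'.

```python
from dataclasses import dataclass

@dataclass
class LogEntry:
    week: int      # term week the work was done in
    person: str    # who did the work
    task: str      # what was done, tied to a deliverable
    hours: float   # how long it took

# Hypothetical entries for illustration only.
log = [
    LogEntry(3, "ana", "drafted requirements document", 4.0),
    LogEntry(3, "ben", "set up repository and build", 2.5),
    LogEntry(4, "ana", "reviewed test plan", 1.5),
]

def hours_by_person(entries):
    """Total hours per person, usable to cross-check peer evaluations."""
    totals = {}
    for e in entries:
        totals[e.person] = totals.get(e.person, 0.0) + e.hours
    return totals

print(hours_by_person(log))  # {'ana': 5.5, 'ben': 2.5}
```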
-
It should be clear who is accountable for what and who did what. Teams should not get away with ‘everyone worked on x equally’, ‘everyone worked on y equally’; make it clear that this is not very realistic and not likely. Tell them that there must be clear task assignments and accountability. This will help them learn project management and task planning.
-
With teams, it is also important to deal with team conflicts (see the separate note) and not let the conflicts affect evaluations, which is why evidence, logs, etc. are necessary.
-
Benchmarks and baseline comparators might be necessary to create a consistent base for the peer evaluations.
-
Ask the students to document their evaluation process and explain how they used appropriate methods to ascertain their assessment. The students should not be using popularity, most recent argument, etc. to bias their assessments.
-
Do not assume that the students will not play the system or know how to game it. Without appropriate controls, peer evaluation might look like it is useful and working, but be a mere sham.
-
If the students are in a cohort model, the peer pressure and the students' relationships with each other can bias things in a subtle way.
Some of what we have done in the past...
-
Ask the students to prepare a rubric for the assignment or project and justify the design. This gets handed in and is marked against a rubric (e.g., what makes a rubric good?).
-
Rotate the assignments and rubrics. That is, a team (A) has to use a rubric from team (B) to critique team (C)'s project.
They have to create a rubric for 'what is a good rubric' and evaluate the rubric they were given in the context of the project they were given.
Yes, the students are still evaluating another team's work, but it is a small piece of work (e.g., the rubric) and is not the team's term project. A sketch of the rotation follows.
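A minimal sketch of the rotation, assuming at least three teams (the team names are illustrative): each team critiques the next-but-one team's project using the next team's rubric, so no team handles its own work or its own rubric.

```python
teams = ["A", "B", "C", "D", "E"]  # hypothetical; needs len(teams) >= 3

def rotation(teams):
    """Return (evaluator, rubric_owner, project_owner) triples: team i
    uses team i+1's rubric to critique team i+2's project (wrapping)."""
    n = len(teams)
    return [(teams[i], teams[(i + 1) % n], teams[(i + 2) % n]) for i in range(n)]

for evaluator, rubric_owner, project_owner in rotation(teams):
    print(f"Team {evaluator} uses team {rubric_owner}'s rubric "
          f"on team {project_owner}'s project")
```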
-
They get the benefit of seeing another group’s rubric and justification.
-
The teams are marked on their critique and evaluation of the rubric.
-
The teams are asked what they would change about their rubric in light of the exercise and this is also marked.
It is possible to ask them to create a response to the marking team and discuss the pros and cons of the rubric used and the way the rubric was used.
If sufficient time exists, the rotation and evaluation can be done again.
-
The basic marking focuses on what is learned about creating rubrics and using them to evaluate something like a project or a process.
What do they need to know to do a fair and accurate evaluation, and what do they need to know about how to apply a rubric, i.e., the actual evaluating?
-
Debriefings and dialogs are held with the class to discuss how to evaluate. The issues. The biases. Etc.
-
If you teach the students how to do a peer evaluation, triangulate the data, and control for the Social Comparison issues, it can work. Even so, in a cohort setting it can inadvertently create stress and problems.
Further reading
-
Kahneman, D., Sibony, O. and Sunstein, C.R. (2021) Noise: A Flaw in Human Judgment, Little, Brown Spark.
-
https://dictionary.apa.org/social-comparison-theory