Stanford studies show benefits of forgoing traditional grading during pandemic

New research examines the impact of pass/fail and other assessment policies in 2020–21, and the challenges those policies face in becoming standard practice.

Monique Harrison presents her research over Zoom in 2022.

By Rachel Burstein

After “Western University” temporarily switched to universal pass/fail grading in the early phase of the pandemic, more women across the university enrolled in their first math course. In a smaller sample, there was also evidence of an uptick in enrollment by first-generation college students and by Black, Latinx, and Native American students.

Monique Harrison, who recently received her doctorate from Stanford Graduate School of Education, identified this trend in her dissertation, and presented her findings in the spring at the 2022 Pandemic Pedagogy Research Symposium (PPRS), organized by Duke Learning Innovation and co-sponsored by Stanford Digital Education. Her talk was one of a dozen or so at the event, including one from the Stanford Department of Computer Science, that examined how innovative pedagogies and assessments during the transition to emergency remote education in spring 2020 may have helped to make learning more student-centered, inclusive and equitable.

“The halt to in-person classes when COVID-19 struck led many educators to question the conventional wisdom in how we assess learning,” said Matthew Rascoff, Stanford vice provost for digital education, who helped to found the symposium. “We need to understand the educational impact of alternative assessments developed during the pandemic, as Dr. Harrison and others are doing, so we can invest further in promising practices and continue them into the future.”

Although Harrison’s data comes from only one school, which she identified by the pseudonym “Western University,” her study underscores why similar enrollment changes are likely to have occurred at other schools that enacted pass/fail grading. Her interviews with students showed that the pass/fail mandate at Western was a major reason why they chose to take their first math class. Students in most introductory STEM courses are usually awarded grades in relation to one another, “on a curve.” In the pass/fail system, when these students were not being graded on a curve, they reported being less apprehensive, Harrison said.

“Especially in the first year, students have this question in their minds: ‘Where am I going to stand in comparison to my peers?’” Harrison explained. “That’s where curved grading can really put particular groups of students in a worse position, psychologically.”

Spring: First math enrollments by gender

A bar graph showing how enrollment by female students in their first math course increased in 2019-20.

This graph shows spring first math enrollments over time by gender that Monique Harrison PhD ’22 found in her research at “Western University.” In the spring of the 2016-17 school year, enrollment of women in first math classes was somewhat higher than enrollment of men. In the spring of the 2017-18 school year, enrollment of women and men in first math classes was approximately equal. In the spring of the 2018-19 school year, enrollment of men in first math classes was slightly higher than enrollment of women. In the spring of the 2019-20 school year, overall enrollment in first-year math classes rose from prior years. Male enrollment was well above the 2016-17 and 2017-18 spring figures, and slightly below the 2018-19 spring level. Female enrollment rose from a previous high of around 75 students in 2018-19 to a new high of over 100 students in the spring of 2019-20.

Assessment innovations in the Computer Science Department

While Harrison’s work focused primarily on the pass/fail policy’s impact on enrollment, faculty and students in the computer science department examined how course design was affected by Stanford’s satisfactory/no credit grading policy for undergraduates and the university’s subsequent decision to eliminate final exams during remote learning.

Both of those changes paved the way for some instructors to design assessments that gauged student learning and mastery, independent of the performance of other students in the course. In the mastery learning model, students must demonstrate proficiency in the material before advancing to new material.

Cynthia Lee, a lecturer in the computer science department, explained how she seized the opportunity to put aside the grading curve: “I’ve always wanted to experiment with some form of mastery learning, and [the remote learning period] was my chance.”

At the PPRS event, a study from researchers in the computer science department and Graduate School of Education examined new approaches to assessment that Lee and some of her colleagues implemented across six core computer science classes from spring 2020 through spring 2021, during remote learning.

Across these courses, the Stanford CS Pedagogy Project team observed three types of assessment innovations: frequent, low-stakes quizzes; alternatives to final exams, including multiple smaller exams, oral interviews, and projects; and the use of revise-and-resubmit policies in which students could improve their grades on each iteration.

Pandemic innovations in assessment we observed

A diagram showing the assessment innovations that John Mitchell and Jenny Han tracked across six core computer science courses, spring 2020 through spring 2021.

A diagram from the Stanford CS Pedagogy Project shows the assessment innovations in six core computer science courses during the year of remote learning (spring 2020 through spring 2021). Two courses, CS109 and CS110, included concept checks in the form of weekly low-stakes lecture quizzes. Six courses rethought final exams: CS161, CS106B, CS107, CS103, and CS109 used multiple smaller exams instead of a midterm and final; CS106B used oral interviews; and CS110 and CS106B used a project or homework instead of an exam. Two courses altered revise-and-resubmit policies: CS110 for homework and CS103 for exams.

Since the changes went into effect, the research team has interviewed faculty and students about the different approach to assessments. According to their findings, the faculty who experimented with the new approaches generally found them effective in encouraging students to concentrate on learning the course material and to feel confident in their abilities, rather than focusing on the grade they might receive or comparing themselves with other students.

Lisa Yan, a lecturer in the computer science department, told the researchers that concept checks “provided an incentive for students to follow along [and] to feel confident that they understand the material before starting the assignment.”

Jenny Han, a graduate student member of the research team, said that students in the six core classes also appreciated the new assessment approaches. Students said that the innovations made learning more “iterative” and offered more opportunities to make the material stick.

Some students in the courses that the researchers examined said that low-stakes concept checks and revise-and-resubmit policies made them feel less anxious and less rushed or pressured to perform well. Alternative assessments such as projects or oral interviews also gave students an opportunity to apply their knowledge to real-world situations.

A student in CS109 reflected in a survey that concept checks and revise-and-resubmit policies “took the pressure off having to understand the material perfectly before being able to ask questions in office hours.” Another student in CS103 explained that revise-and-resubmit prevented “settling for ‘good enough,’” and instead promoted deeper engagement and learning.

But the innovations were not without problems. Department chair and Computer Science Professor John Mitchell, who led the research, said that instructors reported that more frequent exams or alternate forms of assessment such as projects or oral exams required more time to grade. While some students liked the changes, others expressed concerns that more frequent exams actually increased stress rather than reducing it.

Faculty also expressed concerns that frequent online quizzes and other remote assessments could enable more cheating than would occur at an in-person final exam. But Mitchell said that some of the assessment innovations that computer science faculty pursued during spring 2020 could offer an antidote to the cheating problem. “Students have less incentive to cheat with a revise-and-resubmit option, because they have multiple opportunities to improve their grade,” Mitchell said. “In addition, oral interviews or project-based approaches are more difficult to cheat on.”

With the end of the universal pass/fail policy and the return to in-person classes, the researchers found that most of the assessment experiments have ended. Although some professors now weight the final exam less heavily in calculating students’ final grades, many instructors dropped more complicated assessment innovations altogether. Faculty found it difficult to allow students unlimited opportunities to revise their work for a better grade when they were grading on a curve. And faculty found that devising and evaluating alternate forms of assessment was too time-intensive to be sustainable.

An uphill climb

Established university assessment systems, including curved grading, time-based credits, and lecture-based instruction, don’t easily lend themselves to mastery-based learning. And the challenges of making large-scale changes to assessment policies are further complicated by higher education’s commitment to faculty autonomy.

It’s a challenge that is not lost on Han, Mitchell and Harrison, who recognize that reaching any consensus about a department-wide grading policy will not be easy to achieve. “There's this tension between really honoring individual autonomy for teachers to make those decisions about what assessment and what learning should look like, and also having department level best practices or guidelines,” Han said.

That challenge came through in a recent survey of 32 computer science faculty members, which Han, Mitchell and Harrison designed and administered with graduate student Philip Hernandez, lecturers Cynthia Lee and Julie Zelenski, and undergraduate student Emanuel Adamiak. Respondents had a range of perspectives on whether their grade distributions should be consistent with those of other departmental faculty members. In addition, some instructors explicitly stated that they would like department-wide guidelines for grading, while others resisted that idea.

CS faculty agreement about grading continuity

Graph showing numbers of computer science faculty who agree that grading consistency among instructors is good

A graph shows the distribution of agreement among computer science instructors with the following statement: “It’s important for my grade distributions to be consistent with other faculty’s grade distributions in the department.” Three chose “agree”; twelve chose “somewhat agree”; eight chose “neutral”; four chose “somewhat disagree”; and five chose “disagree.”

But in a recent interview, Harrison said that getting the balance right between faculty autonomy and departmental guidance around assessment is crucial if universities are to serve students equitably.

“Math is a subject where, from a very early age, students—especially women and underrepresented minorities—can be told that they don't belong and told they are not good at math,” she said. Grading on a curve can reinforce those messages and further inequities that result from these perceptions, she said.

In another study of math enrollment at Western University, Harrison, along with her co-authors Professor Mitchell Stevens and graduate student Philip Hernandez, found that the situation was exacerbated because first-year students from affluent backgrounds were more likely to enroll in math classes that covered concepts they had learned in high school.

“You end up with more privileged students at the top of the grading curve,” Harrison said. “Not because they're better at math, but because they actually took that subject before.”

That’s a trend that the CS Pedagogy Project researchers have their eye on, though they recognize that Harrison’s observations about math class enrollment may not map directly onto computer science.

A minority of surveyed computer science faculty members said that they knew what percentage of their students were entering with prior knowledge. Many regarded that as a problem. Some instructors reasoned that having a departmental policy for placing experienced students in higher-level courses would level the playing field for novices. Pass/fail grading may be another way to do that, Harrison said.

But data from the CS Pedagogy Project’s survey of computer science faculty suggest that faculty don’t have a clear consensus on how to apply this type of grading. The survey asked instructors at or below which letter grade they believed students would benefit from retaking a course. Responses ranged from B+ to NP, Stanford’s equivalent of an F. The median response was a C+.

Yet when it comes to their own grading habits, instructors reported awarding passing grades to students whose performance fell below the level at which they believed a student should retake the course. Han said that this disconnect suggests that some students are passing classes where they haven’t achieved mastery in the eyes of instructors.

Harrison added, “Pass/fail grading doesn't give you really fine-grained detail and feedback in the way that it might be more helpful to learning. But oftentimes regular grading doesn’t do that either.”

What should universities do instead?

During her dissertation defense in July, Harrison posted a slide that listed the policies that she would implement if she ran a university. Among other things, there would be no grading curves in math courses. Instructors would ask students about their prior experience. Course pace and clarity would reflect the needs of novices.

Mitchell, who attended the dissertation defense, suggested that more evidence is needed before colleagues would be convinced about the benefits of a full shift to pass/fail grading. He asked: How much of the change in enrollment trends was caused by universal pass/fail grading in spring 2020, and how much of it might be attributable to classes being offered remotely along with other variables?

That’s an issue that Harrison plans to investigate. In the meantime, she’s just glad that researchers are even asking these questions. It signals a new awareness of the potential effect assessment strategies can have on educational inequities and the benefits that may be achieved by rethinking them.

Rachel Burstein is an independent education researcher and writer. She previously held research roles at EdSurge, New America, and Ithaka. Rachel holds a PhD in history from the CUNY Graduate Center.

Published September 23, 2022
