Christian Moore Anderson

The ups & downs of my experience with retrieval practice

Updated: Nov 20

For years I've toiled with quiz questions in my biology courses (AKA core questions). I've used quiz questions as starter quizzes, to summarise key points of an explanation, as homework, and for cover work. I warned about potential problems with quiz questions in a post back in 2020, linking to Prof. Rob Coe's doubts about quiz questions in the classroom. But I made real efforts to make it work and spent considerable time meditating on the best design for quiz questions (see my book).


Recently, data I had collected indicated such an interesting pattern that I dug out some previous data to help me reflect on the use and effects of short-answer quiz questions. In this post, I'll share the data and some of what I've learnt over this time.


All of the data I'll show comes from my IB Biology courses (16-18 year olds), across three different cohorts. The content load of these courses is heavy. As an incentive for students to keep on top of it, I ran starter quizzes (testing previously covered content) and collected data once a week. The students had access to the questions and answers. Why did I start collecting the data? Because it was easy to collect and I thought it could reveal something interesting.


The questions were randomly chosen (mainly because I didn't have the time to choose them) and were collected from those found here. As you can see, the total number of questions is high, and they were designed around the course specification & exam mark schemes. An example is shown below.


The data below aren't from tightly controlled experiments. There were no control groups, and there was natural variation among students (including study effort). Another issue people may mention is the method of the quizzes. However, while there may be vast differences in the implementation of quizzes between teachers, the desired output of quizzes is the same—that students remember the answers.


The current trend of using quizzes in England (as I perceive it from social media) rests on the assumption that if students remember these facts, they will know more, freeing up working memory to understand more, be more successful, and therefore also be more motivated. So, did knowing the answers on the quizzes help their performance in exams?


I began collecting the data towards the end of a two-year IB biology course, so for this first set of data, the sample size is small—just four weeks, and therefore four quizzes. To begin with, I collected the data to follow my students' progress. Below is a scatter graph of the average percentage correct on the quizzes and their official IB Biology score.


R² (the coefficient of determination) is a measure of correlation: 1.0 is a perfect correlation, whereas 0.0 is no correlation at all. The correlation here, between the students' average quiz score and their exam scores, is very low.
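For teachers who want to run the same kind of check on their own class data, here is a minimal sketch in Python. The numbers are entirely made up for illustration (my own spreadsheets aren't reproduced here); it simply pairs each student's average quiz percentage with their exam score and computes the R² of a simple linear relationship.

```python
# Minimal sketch (hypothetical data, not the actual cohort): computing the R^2
# between students' average quiz scores and their exam scores.
import numpy as np

# One entry per student: average % correct on the weekly quizzes,
# and the same student's exam score.
quiz_avg = np.array([60, 72, 78, 81, 85, 88, 90])
exam_score = np.array([48, 52, 55, 50, 70, 62, 90])

# Pearson correlation coefficient r; squaring it gives the R^2 that a
# spreadsheet reports for a simple linear fit of exam score on quiz average.
r = np.corrcoef(quiz_avg, exam_score)[0, 1]
r_squared = r ** 2

print(f"r = {r:.2f}, R^2 = {r_squared:.2f}")
```

This is the same figure a spreadsheet trendline would label as R², so it can be read straight off the scatter graphs as well.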


The correlation is very low, but so is the sample size. By the end of the course, the amount of content already covered was large, and the number of questions they were asked was low. But we can see some interesting trends. There were students who scored quite differently on the quizzes—60% correct, compared to 85-90%—but got a similar final score on the course (around 50). There were also students who scored highly on the quizzes (above 80% correct), yet had vastly different outcomes in the final score—from 50 up to 90.


The results interested me enough to continue taking data for a younger cohort. The next set of data comes from samples of 24 and 32 quizzes (the 32 include the previous 24), carried out over the first and second years of their IB course, respectively.


In the scatter graphs below there are three comparisons. The overall average % correct on the quizzes is compared with the first-year mock exam (May), the second-year mock exam (Nov), and the final score from exams in May. Between the first mock exam and the second, the students carried out coursework, so there were no quizzes. Extra quizzes were carried out after the second mock exam, hence the different sample sizes.


Now we find a higher correlation between knowing the answers to the quiz questions and their exam results. The strongest relationship is found in the final graph, where one student bucks the trend, but in general there appears to be a positive relationship between the two. And this makes sense, because in IB Biology exams you do need to answer with the details and technical vocabulary to succeed, and knowing extra details should allow students to decipher more complex questions.


However, what separates a top and a bottom official IB score is just 15 to 20 percentage points on the quizzes. The worst-performing students on the quizzes (except the outlier) still answered over 70% of the questions correctly despite the heavy content load. Those students knew a lot of those answers. So there is much more involved than simply knowing the answers to quiz questions. This can be seen more clearly with the next cohort.


With this class, the sample size is smaller but still includes 17 quizzes over 17 weeks. All of the quizzes occurred in the first year of teaching. I stopped collecting the data because I stopped the quizzes altogether. I'll explain why in a minute. Here are the scatter graphs:



The correlations this time are much lower, only hinting at a positive relationship. By the first mock exam, I had most students getting most of the quiz questions correct (around 80%). But whether or not they knew the answers to the questions told me little about their ability on exams. In the first mock exam, some students scored an average of above 80% on the quizzes, yet below 20 on the exam. This is despite the questions being designed around the IB course, its specification and its exam mark schemes.


Another interesting observation is that the weaker students improved in the second year despite the quizzes having ended in the first year. The general pattern persists in the official exam results, showing some validity in the mock exams I designed (the R² between the May mock exam and the official result was 0.68, and 0.69 for the second mock exam).


So what can we make of this data? Firstly, across the three cohorts, knowing the answers to the quiz questions—or knowing the details specified for the course's concepts—did not seem to be the dominant factor in how well the students succeeded in showing their understanding on exams. While for one cohort the correlation was more clearly positive, it still showed that knowing a lot of details is important but simply not enough. So what happened with the final cohort, and why did I choose to stop the quizzes?


There is another factor that I think is often missed in discussions on the importance of retrieval practice and knowledge recall: our students' conception of learning—of understanding—in our subject. This matters because our students are adaptive agents and not just consumers of our explanations or our quizzes.


While I collected the quantitative data, I also sought qualitative data about how students felt about their learning and the course. This was a constant endeavour through conversation and periodic emails. What I found was that the quizzes, despite the successes, were not motivating my students.


While they were generally motivated to get good grades, the quizzes seemed to negatively affect their view of biology as a subject. The students were more inclined to see biology as rote learning, and they didn't like that. And this ultimately affected their ways of learning and awareness—how they organised and invested their attention in lessons.


This shows why we have to critically assess the effectiveness of pedagogical assumptions. We need to always probe the system, and reflect on what emerges from the interactions between teacher, students, curriculum, and context. And it's why I encourage other teachers to keep data to check their assumptions—whether quantitative or qualitative.


While there is supporting evidence for retrieval practice in lab studies, we also need to check how quizzes interact with the real ecologies of the classroom. Retrieval practice is important, but that doesn't mean it has to take the form of quizzes. It can be richer questions embedded throughout a lesson. Each subject, course, & context has its needs, so implementation will differ for many reasons.


I stopped the quizzes in my classroom, but I didn't throw the baby out with the bathwater. The quiz questions remain within the courses, just in a more nuanced niche, due to the lessons I've learnt. I write them principally as a planning tool for me to really think about what I'm going to teach and condense it into questions. And I continue to encourage their use as a revision tool. My students also really appreciate having a printed copy as it establishes the level of knowledge for the course—the technical vocabulary and the depth of mechanistic explanation.


So my students do use the questions for revision (as retrieval practice)—typically close to exams. And this is how I prefer it, because my priority is establishing a culture of learning that helps students understand what understanding is in my subject. What it means to understand and to explain.


Without this, I've found that students study the quiz questions alone, and then bad things can happen. In my classroom, I now call them "basic knowledge questions" to help them see that they are nothing more than basics. And I put considerable groundwork in during the course, so that when self-quizzing commences among my students, it is grounded in a pursuit of understanding—not just recall.


The questions they use are grounded in the context of the mental model they are thinking with, rather than being given to them as out-of-context random questions. There is a difference between self-regulated learners harnessing the testing effect with purpose, and students being fed random questions.


So what have I substituted for quizzes during my lessons?


There were two main places I tried quiz questions: as (random-question) starters, and as (non-random) questions to summarise key points after an explanation. Both have been replaced by activities in which students can take a more active role in their learning and can reveal more of their thinking to me. This allows more robust feedback loops to arise in the lesson, and I can better judge where to spend my time and the direction I should go in.


To begin lessons, I now prefer to start with some type of provocation that has students mentally act and make predictions. For summaries within lessons, I utilise self-explanation, one of Fiorella and Mayer's eight generative learning strategies (2015). This activity involves students explaining concepts to themselves as if they were explaining them to a classmate who had missed a lesson (not just re-reading). During this time, students realise for themselves where links are missing in their understanding, and I devote a period to answering and discussing their questions in feedback loops.


Finally, for collecting data on student progress—and for formative assessment to produce feedback loops beyond oral questioning & dialogue—I switched to open and novel questions by asking "What if?" questions, and assessing them against my own taxonomy of understanding.


All these activities build a need for explanation rather than a ticking and crossing of answers.

Do you want to co-construct meaning without lecturing, slide decks, or leaving students to discover for themselves? Learn how and why in my books. Download the first chapters of each book here.


References

Fiorella, L., and Mayer, R. 2015. Learning as a Generative Activity: Eight Learning Strategies That Promote Understanding. Cambridge: Cambridge University Press.





