Heather C. Hill is the Jerome T. Murphy Professor of Education at the Harvard Graduate School of Education. She focuses on teacher quality in mathematics, developing ways of measuring and improving teachers’ mathematical knowledge and the quality of mathematical instruction. She also studies the effects of policies aimed at improving teaching.
By Heather C. Hill
I have two thoughts about this question. First, there is little evidence that teachers meeting to study student data, such as benchmark test scores, actually affects instruction or student learning. Of six recent rigorous studies (linked below), two showed mixed impacts on student outcomes (meaning reading scores improved but math scores did not, or vice versa), and the other four did not show that this practice had any impact on student scores. There are two reasons this might be the case. First, teachers assigned to the control groups might already have been meeting to discuss student data, meaning that all of the teachers had roughly the same opportunities to learn about student performance. Second, qualitative studies suggest that teachers have a difficult time adjusting what they do in the classroom based on student test scores.
Still, there are some promising classroom-based assessment programs. For instance, a formative assessment program designed at Florida State University provides K-12 teachers with complex student mathematics tasks and rubrics. An experiment in the primary grades suggests that students’ mathematics performance improves after teachers use these tasks to assess mathematical competency. Another program, Cognitively Guided Instruction, educated teachers about early-grade students’ developmental trajectories in mathematics and provided time for teachers to work out strategies to assess student knowledge. This program has also shown positive results in repeated randomized trials.
But it would not be right to simply say that “formative assessment works” based on these two programs. It was not only assessment that changed in these classrooms, but also the nature of mathematical tasks; students were working on more open-ended, cognitively complex problems, and teachers were providing them with opportunities to really think those problems through. It’s likely that the package of these pedagogical techniques (new tasks, new teaching methods, formative assessment strategies) drove the programs’ success in improving student achievement. And beyond these two programs, rigorous evidence on formative assessment is difficult to find.
I’m not optimistic about teachers studying formal student data, and, if I were a principal, I’d put my eggs in another basket. For instance, I’d probably think about coaching teachers to be more aware of students’ in-classroom work product and cues. In classroom observations, I’ve seen many excellent teachers read kids’ faces and listen to their talk, then adjust instruction accordingly. And there are huge opportunity costs associated with the time teachers put into studying student data and the time kids spend taking benchmark assessments and the like. While these practices MAY work, I’d recommend going with programs that we know DO work.
This answer was developed in partnership with Usable Knowledge at the Harvard Graduate School of Education.
This study found that a data-driven reform initiative caused statistically significant districtwide improvements in student mathematics achievement, but not in reading achievement.
This follow-up study found no significant differences between schools using quarterly benchmark exams and those not doing so after two years.
This study found no overall impact on student achievement in math or reading.
This study found statistically significant results in grades three to eight but not in kindergarten to second grade.
This study showed that the MAP program, one of the most widely used systems focused on benchmark assessments and training and differentiated instruction, had no impact on student reading.
This study found that FAST-R had a generally positive but not statistically significant impact on reading scores.