Injecting Some Moneyball into Student Testing

I’ve always been one to find love in both the art and science of a given subject. As a lifelong baseball fan – and a pretty poor baseball player through high school – I quickly embraced Ted Williams’ The Science of Hitting, believing that the charts and graphs explaining strike zones and such would somehow transform me from a doubles hitter into a home run machine. Sadly, it never did.

I’m also an unabashed fan of the New York Mets, and have been since the early 1980s. For more than three decades, I have endured the highs and lows (mostly lows) of rooting for the Metropolitans and believing this might just be the year.

Sadly, the 2018 season wasn’t that year for the Mets. But it was such a year for Mets ace Jacob deGrom. Last week, the All-Star received the Cy Young Award as the best pitcher in the National League. It was a well-deserved honor, recognizing one of the best seasons a starting pitcher has ever had, including an earned run average of only 1.70, a WHIP of 0.912, and 269 strikeouts in 217 innings pitched. DeGrom secured the first-place vote on all but one of the ballots cast this year, offering a rare highlight in another tough Mets season.

Leading up to the award, there were some analysts who wondered if deGrom would win the Cy Young, despite those impressive numbers. The major ding against him was that he was pitching for the Mets, and as a result posted only a 10-9 record, getting almost no run support all season from his team. DeGrom’s top competition in the NL had 18 wins. The Cy Young winner in the American League posted 21 victories. So when a 10-9 record won the Cy Young, some critics pounced, accusing sabermetrics and “moneyball” of taking over the awards. The thinking was that one of the chief attributes of a top starting pitcher is how many wins he has. If you aren’t winning, how can you possibly be the best?

All the discussions about how sabermetrics has ruined baseball – or at least baseball awards – soon had me thinking about education and education testing. For well over a decade, we have insisted that student achievement, and the quality of our schools, are based on a single metric. Student performance on the state test is king. It was the single determinant during the NCLB era, and it remains the same during the PARCC/Smarter Balanced reign.

Sure, some have led quixotic fights against “high-stakes testing” in general, but we all know that testing isn’t going anywhere. PARCC may ultimately be replaced by a new state test (as my state of New Jersey is looking to do), and the consortium may one day give way to the latest and greatest, but the calls for accountability are so great, and the dollars spent on K-12 education so high, that not placing some sort of testing metric on schools, and kids, is a fairy tale. Testing is here to stay. The only question we should be asking is whether we are administering and analyzing the right tests.

I’ve long been a believer in education data and the importance of quantifiable research, particularly when it comes to demonstrating excellence or improvement. But I still remember the moment when I realized that data was fallible. While serving on a local school board in Virginia, overseeing one of the top school districts in the nation, we were told that our nationally ranked high school had failed to make AYP. At first I couldn’t understand how this was possible. Then I realized we were the victims of a small N size. The results of a handful of students in special education and ELL dinged us in the AYP evaluation. The same handful of students counted in both groups. It didn’t make our high school lesser than it was. It didn’t reduce our desire to address the learning needs of those specific students. But the state test declared we weren’t making adequate progress. The data had failed us.
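For anyone who wants to see how a small N can sink a strong school, here is a rough sketch in Python. Every number in it is hypothetical: the 70 percent target, the enrollment counts, and the function name `makes_ayp` are my own illustrations, not my district's actual figures or any state's actual rule. The one thing it borrows from the real system is the requirement that every reported subgroup clear the target.

```python
# Hypothetical illustration of the small-N problem. All figures are made up.
TARGET = 0.70  # assumed proficiency target, not an actual state benchmark


def makes_ayp(groups: dict[str, tuple[int, int]], target: float = TARGET) -> bool:
    """Return True only if every group's pass rate meets the target.

    `groups` maps a group name to (students_passing, students_tested).
    """
    return all(passed / tested >= target for passed, tested in groups.values())


# A school where 90 percent of all students pass, but two 12-student
# subgroups (largely the same students) each have 8 passing.
school = {
    "all students": (1350, 1500),
    "special education": (8, 12),
    "ELL": (8, 12),
}

# In a group of 12, a single student moves the pass rate by about 8.3 points.
swing_per_student = round(100 / 12, 1)

before = makes_ayp(school)  # 8 of 12 is 66.7 percent, below the target

# One more passing student, counted in both subgroups, flips the result.
school_plus_one = {
    "all students": (1351, 1500),
    "special education": (9, 12),
    "ELL": (9, 12),
}
after = makes_ayp(school_plus_one)  # 9 of 12 is 75 percent, above the target

print(f"One student is worth {swing_per_student} points in a group of 12")
print(f"Makes AYP before: {before}, after one more student passes: {after}")
```

The school-wide pass rate barely moves between the two scenarios, yet the verdict flips entirely, and because the same student sits in both subgroups, the result is counted against the school twice.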

The same can be said about the use of value-added measures (VAM scores) in evaluating teachers and schools. VAM may indeed remain the best method for evaluating teachers based on student academic performance. But it is a badly flawed method, at best. A method that doesn’t take into account the limitations on the subjects that are assessed on state tests, small class sizes (particularly in rural communities or in subjects like STEM), and the transience of the teaching profession, even in a given school year. Despite these flaws, we still use VAM scores because we just don’t have any better alternatives.

Which gets me back to Jake deGrom and moneyball. Maybe it is time that we look at school and student success through a sabermetric lens. Sure, in some years success can be measured by performance on the PARCC, just as in many years the best pitcher in baseball has the most victories that season. But maybe, just maybe, there are other outcome metrics we can and should be using to determine achievement and progress.

This means more than just injecting the MAP test or other interim assessments into the process. It means finding other quantifiable metrics that can be used to determine student progress. It means identifying the shortcomings of a school – or even a student – and then measuring teaching and learning based on that. It means understanding that outcomes can be measured in multiple ways, through multiple tools, including but not limited to an online adaptive test. And it means applying all we know about cognitive learning to establish evaluative tools that live and breathe and adapt based on everything we know about teaching, learning, and the human brain.

DeGrom won the Cy Young because teams feared him every time his turn in the rotation came up. We knew he had a history-making season because of traditional metrics like strikeouts and innings pitched, but also because of moneyball metrics like “wins above replacement,” or WAR, and “walks plus hits per inning pitched,” or WHIP. Had he not won that 10th game the last week of the season, thus giving him a winning record, deGrom would have had no less stellar a season. In fact, a losing record would only have underscored his personal success and impact, regardless of what others around him were able to do.
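For readers who want to check the math behind those metrics, here is a minimal sketch in Python. The innings and strikeout totals come from the numbers cited earlier in this post. The hits and walks totals (152 and 46) are my recollection of deGrom's 2018 line and should be treated as an assumption, as should the function names, which are just labels I made up for this example.

```python
def whip(walks: int, hits: int, innings_pitched: float) -> float:
    """Walks plus hits allowed, divided by innings pitched."""
    return (walks + hits) / innings_pitched


def strikeouts_per_nine(strikeouts: int, innings_pitched: float) -> float:
    """Strikeouts scaled to a nine-inning game."""
    return 9 * strikeouts / innings_pitched


INNINGS = 217.0       # from the season totals cited in this post
STRIKEOUTS = 269      # from the season totals cited in this post
HITS_ALLOWED = 152    # assumption: not stated in this post
WALKS_ALLOWED = 46    # assumption: not stated in this post

degrom_whip = round(whip(WALKS_ALLOWED, HITS_ALLOWED, INNINGS), 3)
degrom_k9 = round(strikeouts_per_nine(STRIKEOUTS, INNINGS), 2)

print(f"WHIP: {degrom_whip}")
print(f"Strikeouts per nine innings: {degrom_k9}")
```

Notice that neither calculation has any input for how many runs the Mets scored. That is the whole appeal: the metric isolates what the pitcher controlled, which is exactly what a win-loss record cannot do.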

Maybe it is finally time a little moneyball thinking worked its way into student assessment. Hopefully, this discussion will come before the Mets reach their next World Series.

Eduflack By Patrick R. Riccards

The Intersection of Education Communications, Policy, and Politics