Students, parents and politicians alike obsess about GPAs, standardized test scores and national student rankings. But if we looked more closely at how we measure student success, would we still value these metrics?
Why We Test
To understand student evaluations at a gut level, surf on over to Amazon.com. Pick any category of product you like. It doesn’t matter. If you’ve shopped online at all, you’ll probably feel your eyes slipping immediately to the gold stars that adorn each and every listing. You’ve probably developed an intuitive feel for what stars mean. One or two? It’s most likely garbage. Three stars: iffy at best. Four stars: now we’re getting somewhere. Four-and-change or five solid stars: we have a winner!
People can and do narrow down the millions of offerings on Amazon simply by looking at star ratings. We develop faith that they work, and so we use them more and more. Amazon stars, just like GPAs or SAT scores, are an incredible convenience. You take something as complex as a product or a human being and boil everything about it down into one numerical, linear measure.
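The compression described above is easy to see in a sketch of how a GPA is typically computed: a credit-weighted average that collapses an entire transcript into a single number (the course names, credit hours and grades below are invented for illustration):

```python
# Map letter grades to grade points on the usual 4.0 scale.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

# Hypothetical transcript: (course, credit hours, letter grade).
transcript = [
    ("Algebra", 3, "A"),
    ("History", 3, "B"),
    ("Chemistry", 4, "B"),
    ("Art", 2, "A"),
]

def gpa(courses):
    """Collapse a whole transcript into one linear measure:
    the credit-weighted average of grade points."""
    total_points = sum(hours * GRADE_POINTS[grade] for _, hours, grade in courses)
    total_hours = sum(hours for _, hours, _ in courses)
    return total_points / total_hours

print(round(gpa(transcript), 2))  # 3.42
```

Four courses, four stories about a student, and what survives is a single number, which is exactly what makes the measure both convenient and crude.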
Even when we accept the fact that a measure is crude, we still need it. Take away Amazon’s stars and you could still evaluate products, but only by reading screen after screen of user reviews. College admission boards charge prospective students hundreds of dollars just to apply for admission. I believe the admission boards earn their money. Although they have access to standardized test scores, the Amazon stars of the academic world, they know some of the limitations of these measures and look deeper into a student’s transcript. But there they find more of the same: GPA, course listing, perhaps a short essay at best. Assuming admission boards have good intentions to make the best and fairest choice, time constraints alone force their hands, leaving them to use flawed testing measures to screen, if not outright select, their new students.
Is your A in Algebra the same as my A in Algebra? Likely it is not, and for any number of reasons. Some are obvious and easy to see. If your school is more highly regarded than mine, you may well have had to work harder or learn more than I did. College admissions officers attempt to account for this, but a school’s better reputation doesn’t automatically mean that your Algebra A reflects more learning than mine does.
The curriculum can vary not only from school to school, but from teacher to teacher and from year to year. Although I worked my way through dozens of classes, each underpinned by its own textbook, I don’t ever recall getting to the end of the book and the end of the term at the same time. Did my peers in other classes or different schools, even those using the same text, finish at the same place? Did their instructors skip the same sections mine did? I doubt it very much.
Speaking of instructors, the instructor makes a huge difference in both the quality of a course and the grades that students receive. The all-time best teacher I ever had worked in the worst school I ever attended. Often it falls largely to the whim of the instructor exactly what material is covered, to what depth, and how rigorously this knowledge is evaluated. One trick I learned in college was to avoid professors who grade harshly. It was very possible to lose valuable GPA points simply due to one professor’s reluctance to give top marks even for excellent work.
This point was brought home painfully to me when I took the Computer Science subject test for the Graduate Record Exam (GRE). Studying for the exam, I saw plenty of very specific questions on material my program never covered. Computer science was (and remains) a big enough field that there’s no way a four-year degree can cover it all, even at an elementary level. No doubt parts of my undergraduate education weren’t represented in the test either. Whoever made this version of the test was working from a very different curriculum than I received. Not surprisingly, I did poorly on this test, primarily because I hadn’t been taught the exact material that the test assumed.
What makes someone well educated? There’s little hope of finding an answer that satisfies everyone. Far back in history, merely being literate made you educated. For a time the mark of an educated person was set by educators themselves. If you understood an ancient language or two (Greek or Latin), if you were steeped in certain classical literature, then you were educated. This education for education’s sake lives on in the liberal arts ideal and undergraduate “breadth requirements.” Somewhere along the line, the economic value of education started to matter and education became associated with a more effective workforce.
If we accept the premise that education aims to make students more effective at some function in their life (not necessarily just work), then grading should aim to establish this effectiveness. But experience teaches that this correlation is weak. While some workers go back to school to earn degrees they need to advance, there is no lack of stories about high-ranking employees who lie about their educational attainment but are not found out for years. If their educational level were truly relevant, faking a credential would be of no use.
A perfect test would measure exactly what we care about, only what we care about, and with zero error. There would be no way for students to improve their scores without becoming better students, and no way that students could improve without a corresponding uptick on the mythical perfect test. However, the glut of test-specific books, tutors and academies gives the lie to this proposition. SAT preparation courses have become such big business that the Educational Testing Service is revising the SAT, aiming to make test-specific tutoring less effective. Perhaps they will succeed, but you can depend on the tutors to counter the changes and find new ways to game the next round of SAT tests.
All through my own academic career I believe I earned more points by learning how to test well than by improving my command of the course content. Rather than poring over the text, I was reviewing past exams by the same professor. I was increasingly able to detect the style, emphasis and structure of the exams I would be facing. In many cases I would say that I predicted about a quarter of the questions that showed up on my tests. It’s easy to score highly when you know what questions are coming. Many of my peers were more than a match for me in course content but tested poorly, and I suspect that one of the brightest and most dedicated of my undergraduate peers flunked out in the first semester simply because he lacked the bag of tricks that dramatically reduces the difficulty level of nearly any test.
It’s easy to poke holes in nearly any student grading or scoring system but much harder to propose a replacement. One alternative is called Standards-Based Grading. Rather than having an A in Algebra, students would be rated as “not proficient”, “partially proficient”, “proficient” or “advanced” in a number of micro-skills such as factoring polynomials. Standards-Based Grading improves on letter grades to some degree because it replaces a categorical subject name like Algebra with a list of micro-skills that enumerates what Algebra means in this context.
What’s lost in Standards-Based Grading is the ease analogous to the Amazon gold star ratings system. Instead of a GPA or an overview of how well a student is doing, there’s a forest of individual measures which say more but which don’t help as much when it comes time to decide if a student can be promoted into a higher grade or accepted into a university.
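Why that forest of individual measures resists compression can be sketched with a toy example: two students with sharply different standards-based profiles can collapse to the same single summary number once the ratings are averaged (the skill names and the numeric mapping below are invented for illustration; real Standards-Based Grading deliberately avoids this collapse):

```python
# An invented numeric mapping for the four proficiency levels.
LEVELS = {"not proficient": 0, "partially proficient": 1,
          "proficient": 2, "advanced": 3}

# Two hypothetical students with very different skill profiles.
student_a = {"factoring polynomials": "advanced",
             "solving linear equations": "not proficient"}
student_b = {"factoring polynomials": "partially proficient",
             "solving linear equations": "proficient"}

def collapse(profile):
    """Average the per-skill ratings into one GPA-like number."""
    scores = [LEVELS[level] for level in profile.values()]
    return sum(scores) / len(scores)

# Different profiles, identical single-number summaries.
print(collapse(student_a), collapse(student_b))  # 1.5 1.5
```

The single number is easier for a promotion or admission decision, but it hides exactly the per-skill detail that Standards-Based Grading was designed to preserve.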
One additional sticking point with Standards-Based Grading: describing each of the micro-skills can be difficult. The custom here in Georgia, where my son is a student, is to describe each standard in a sentence or two. Here’s an example of one of the dozens of standards required of fourth graders:
MCC4.OA.3 Solve multistep word problems posed with whole numbers and having whole-number answers using the four operations, including problems in which remainders must be interpreted. Represent these problems using equations with a letter standing for the unknown quantity. Assess the reasonableness of answers using mental computation and estimation strategies including rounding.
This is all well-meaning, but I’m having a lot of trouble coming up with a concrete representation of what this text block stands for. What kinds of word problems are we talking about? How many steps is “multistep”? All this confusion is enough to make letter grades, with all their disadvantages, start to look reasonable.
Another grade alternative, and one which I favor, is project-based or portfolio-based evaluation. If a student sets a goal to create something of use and can deliver it as proof of ability, then evaluators may disagree about how good the product is, but there’s no doubt of its relevance. Project-based evaluation suffers from some of the same problems as Standards-Based Grading: it takes longer to evaluate a portfolio than it does a transcript. Ultimately we need to accept that we can have expedient grades or we can have accurate and relevant grades. We do our students a disservice if we believe we can have both at the same time.
All clinical material on this site is peer reviewed by one or more clinical psychologists or other qualified mental health professionals. This specific article was originally published by Dr Greg Mulhauser, Managing Editor.