B is for Bell Curve

Dr. Ken Beatty

First, let me leave nothing to the imagination: I hate the Bell Curve. Because I teach assessment statistics to graduate students, I know I shouldn’t callously bully an innocent graph of achievement. It isn’t the tool itself I object to, however, but the wicked uses to which it is put.

Bad beginnings

The Bell Curve was first called “the normal curve of error” by Abraham de Moivre in 1733. He used it to explain games of chance, but by the 19th century the Bell Curve was being misapplied to justify differences in society, for example in support of Francis Galton’s theories of eugenics, a pseudo-scientific movement to breed humans to produce a master race. It was only the Nazi party’s horrific love of eugenics that led to its belated rejection (Goertzel & Fashing, 1981).

In time, the Bell Curve swept into classrooms as a popular means for quantifying levels of student performance. The assumption was that, in any group of students, the weakest 10 percent or so inhabit the low end of the curve, most linger comfortably in the middle, and the top 10 percent can be assumed to be highly competent.

The Bell Curve in practice

At one university, my fellow teachers and I were forced to apply the Bell Curve to the grades of each of our classes. Practically speaking, this meant 3 of 30 students would get the lowest D and E failing grades, another 3 or so would get the top A grades, and the bulk in the middle, 24 students, could expect to earn C and B grades. Through considerable protests and prayers, management might permit us to squeeze the middle of the curve and award more A grades as well as save our weakest students from expiring in an assessment train wreck, but we had to beg.

Regardless of final figure manipulations, a bad taste was left in the mouths of both students and teachers. Students felt the Bell Curve inherently made the classroom unnecessarily competitive and that success depended not so much on an individual doing well as on others doing badly. It wouldn’t matter if each and every member of my class were the secret spawn of an Einstein cloning experiment; degrees of difference would be conjured up, and otherwise brilliant students would be spread across the Bell Curve like falling blossoms on wet pavement.
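The arithmetic behind that complaint can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure any particular university used: the function name `curve_grades`, the band labels, and the scores themselves are invented here, and the 10 percent shares simply follow the 3-of-30 example above.

```python
from collections import Counter


def curve_grades(scores, bottom_share=0.10, top_share=0.10):
    """Label each score by rank alone: bottom share fails, top share gets an A."""
    n = len(scores)
    n_bottom = round(n * bottom_share)
    n_top = round(n * top_share)
    # Student positions ordered from lowest to highest score.
    # sorted() is stable, so tied students are separated only by list order.
    order = sorted(range(n), key=lambda i: scores[i])
    bands = ["B/C"] * n
    for i in order[:n_bottom]:
        bands[i] = "D/E"
    for i in order[n - n_top:]:
        bands[i] = "A"
    return bands


# Invented scores for illustration only: 30 students, all between 95 and 99.
scores = [95 + (i % 5) for i in range(30)]
bands = curve_grades(scores)
print(Counter(bands))
```

Run on this invented class, the sketch still places three students in the failing band even though nobody scored below 95, and students with identical scores end up in different bands purely because of where they sit in the list.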

At the same time, teachers felt their work was devalued because even the most highly educated, experienced, and dedicated teachers offering the most innovative lessons could not hope to have their students score higher than those of their less-inspired colleagues. Any gains in student achievement attributable to a superb teacher’s work would be washed away in the Bell Curve. Teachers who rebelled against the obvious unfairness of it all faced the subtle punishment of having to provide detailed justifications of deviations from the so-called norm. From the administrative perspective, the Bell Curve helped even out teachers’ grades and avoid grade inflation, the creeping tendency for teachers to award higher marks, in some cases to garner positive evaluations.

A subjective problem

But the greatest problem is that the individual marks that contribute to a student’s final grade tend to be wildly subjective, with little scientific basis. Applying a scientific measure to unscientific marks is nonsense.

Most teachers try their best, but few carefully pilot and revise their objective multiple-choice questions. Few create detailed rubrics to measure performance on their subjective essay topics and then follow up with repeated or external ratings of at least a sample of the assignments. Fewer still take time to examine their marks and grades in statistical terms. Not to devalue experience, but assessments based on what teachers merely think is best are likely to raise questions about validity (Are we testing what we think we are testing?) and reliability (Are the questions likely to produce consistent answers if given again?).

The Bell Curve versus goals

So, what is the alternative? Those who follow normative-based assessment, which aims only to compare students with one another, embrace the Bell Curve. In contrast, criterion-based assessment measures the ability of students to reach goals. Discussing the latter, Bailey (1998) uses the example of a driver’s license. We expect most people to get a driver’s license by meeting a clear set of goals.

Think about that driver’s license test for a moment. We assume maturity and intelligence among those taking the test and rightly attribute failure to inadequate instruction or lack of study and practice. We give multiple chances to pass. We recognize that weaker test-takers who do pass will learn from their shortcomings and improve over time. We truly want everyone to pass.

Doesn’t this sound like a more humane and productive model of assessment?

Throw out normative assessment and the Bell Curve and clasp onto goals-oriented criterion testing. Our job as teachers is to make each of our students a driver of progress in society, paying particular attention to those who need more help. We should end our twin obsessions with labeling 10 percent as roadkill and mechanically classifying another 10 percent as those ready to pilot a space shuttle.

References
Bailey, K. M. (1998). Learning About Language Assessment. Boston: Heinle & Heinle.

Goertzel, T., & Fashing, J. (1981). The Myth of the Normal Curve: A Theoretical Critique and Examination of Its Role in Teaching and Research. Humanity and Society, 5, 14–31.

Dr. Ken Beatty, TESOL Professor at Anaheim University in California, has taught in Asia, Canada, the Middle East, and the U.S. and lectures widely on language teaching and learning from the elementary through university levels. He has given 200+ teacher-training sessions in 22 countries and is author of 130+ textbooks, including books in the Pearson series Learning English for Academic Purposes (LEAP).
