Invasion of the Student Body Snatchers: A Statistical Lie Exposed

December 22, 2005

Picture a sleepy little public school at the end of your street. The students at that school, like students at all such schools, take standardized tests each year, and last year was no different. The student scores were all averaged, and the school was recognized for its level of academic performance.

But prior to this year's round of standardized testing, something shocking happened. Students were replaced. Not just a few, or even half, but well over 90% of the entire student body was replaced by a different group of students, a group which had a worse history of academic achievement than the original group.

This new, alien group of students (or pod-students) took the test in place of the former group, and no one in authority said anything. It was as if it were just business as usual.

When the scores came in--surprise--the average score was lower than it had been the year before. Instead of recognizing the statistical mismatch, the school was promptly chastised for declining in academic performance!

Believe it or not, this is a true story. What's even more bizarre is that the story is replicated like those creepy pod-people every year in states and cities across America, notably in every public and charter school in Pennsylvania.

What is going on here?

It's easy, really. Schools are measured by how their students do in certain targeted grades. In Pennsylvania, for example, schools are judged mainly by the PSSA scores of their 3rd, 5th, 8th, and 11th graders. The other grades are not assessed at the state level.

Every year a different crop of students is measured, as each passes under the magnifying glass. These students are inherently different, so of course there will be some variability in each class's scores as a whole. Each year the scores will either tick up or tick down, but rarely will stay exactly the same.

For example, take 300 students in 11th grade, test them in reading, writing, and math. The following year, take 300 different 11th grade students and test them again. What are the chances that the measured average will be exactly the same? Somewhere between Slim and None, and Slim just left town.
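To see how this plays out, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the score scale, the population mean of 1300, the spread of 200, and the helper name `cohort_mean` are all invented, and nothing here reflects real PSSA data. Both cohorts are drawn from the very same population, so any change in the average is pure luck of the draw.

```python
import random
import statistics

def cohort_mean(rng, n_students=300, true_mean=1300.0, spread=200.0):
    """Average score of one simulated cohort (hypothetical score scale)."""
    scores = [rng.gauss(true_mean, spread) for _ in range(n_students)]
    return statistics.mean(scores)

# Fixed seed so the run is repeatable; the seed value itself is arbitrary.
rng = random.Random(2005)

# Two different groups of 300 students from the SAME underlying population.
last_year = cohort_mean(rng)
this_year = cohort_mean(rng)
change = this_year - last_year

print(f"last year: {last_year:.1f}  this year: {this_year:.1f}  change: {change:+.1f}")
```

Nothing about the "school" changed between the two draws, yet the average still moves, and a naive comparison would call that improvement or decline.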

Yet based on this simple (and simplistic) comparison, schools are declared to be "improving" or "declining." The measurement is a very important indicator for whether a school is meeting Adequate Yearly Progress (AYP) for compliance with No Child Left Behind.

It's easy to blame the Feds, although the ire would be misplaced. According to the Federal Department of Education, under No Child Left Behind, states are supposed to be measuring every child, every year.

Measuring a group of students, and finding the mean of the data, is not a perfect measurement, as demonstrated by repeating the exact same measurement (what are the chances that every student will get the exact same score?) or by dividing the class into two random groups, and taking the mean of each. Chances are very good the means won't be identical.
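The random-split experiment described above is easy to try. The sketch below uses made-up scores (a hypothetical scale centered on 1300 with a spread of 200, not real test data) and an illustrative helper named `split_gap`. It shuffles one class of 300 into two random halves, compares the two means, and repeats the split many times.

```python
import random
import statistics

# Fixed, arbitrary seed so the run is repeatable.
rng = random.Random(22)

# One hypothetical class of 300 scores on an invented scale.
scores = [rng.gauss(1300.0, 200.0) for _ in range(300)]

def split_gap(scores, rng):
    """Difference between the means of two random halves of one class."""
    shuffled = scores[:]          # copy, so the original order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return statistics.mean(shuffled[:half]) - statistics.mean(shuffled[half:])

# Repeat the random split many times on the very same students.
gaps = [split_gap(scores, rng) for _ in range(1000)]
identical = sum(1 for gap in gaps if gap == 0.0)
typical_gap = statistics.mean([abs(gap) for gap in gaps])

print(f"splits with identical means: {identical} of {len(gaps)}")
print(f"typical gap between halves: {typical_gap:.1f} points")
```

These are the same students on the same day, so whatever gap shows up between the halves is noise in the mean, not a difference in teaching.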

Statisticians have known this for years! What is shocking is that the folks who rank these schools by comparing the means of different populations of students either don't know the proper way to deal with such data or don't bother. After all, it is very easy to just compare means.

Easy, but not terribly valid.
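For the curious, one common alternative to eyeballing two means is to express the gap in units of its own standard error (the statistic behind Welch's t-test). This is offered as a rough sketch of that general technique, not as what any state actually does or should do. The function name and the two tiny score lists are invented for illustration.

```python
import math
import statistics

def difference_in_standard_errors(group_a, group_b):
    """Gap between two group means, divided by the standard error of that gap."""
    mean_gap = statistics.mean(group_b) - statistics.mean(group_a)
    # Sample variance of each group, scaled by group size, then combined.
    standard_error = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return mean_gap / standard_error

# Hypothetical scores for two different cohorts (made-up numbers).
last_year = [1250, 1310, 1400, 1190, 1330, 1280]
this_year = [1240, 1300, 1370, 1210, 1290, 1260]

ratio = difference_in_standard_errors(last_year, this_year)
print(f"change in means, in standard errors: {ratio:+.2f}")
```

A ratio near zero means the change is well within what chance alone produces. It still says nothing about whether the two cohorts started from the same place, which is the deeper problem with comparing different groups of students.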



Posted by ceb into Testing & Grading
Comments

I wonder what would happen if the kids were tested at the beginning of the school year and again at the end to see where they started (meaning what they retained over the summer break) to what they learned?

Just a thought --

Happy Holidays!

Elizabeth

Elizabeth December 23, 2005 05:26 AM

Thanks for your comment, Elizabeth, you're right on the money. In fact, some schools do exactly as you say, calling it either a pre- and post-test, or a formative and summative assessment.

Some schools test every kid twice a year, in September and June, and you do see a performance hit after that summer break! Must be all the ice cream.

Chett December 25, 2005 12:37 AM

The pre/post idea is brilliant. Don't forget, however, that a test at the end of one year could be a pre-test for the next.

Also, just remember that we aren't doing all this testing to measure students, but to measure teachers.

(teachers purposefully obfuscate this fact so that parents feel that junior/ette is being "tested" and may not "measure up.")

Chett,

Maybe it's time to let the "summer break" go the way of the agrarian society that created it.

More days off, longer Christmas & Spring breaks, and 6 weeks of summer with required reading. OOOOOHHHHHH, how Draconian!
___

Why is it so hard to see that a market based system would solve so many of these problems while piling never-ending mandates and reforms (or more money) into the current system will never solve them?

Bruno December 28, 2005 12:38 PM

It gets really interesting when you add in 2 additional factors:

The elementary school boundaries in my very large school district change every year. You can count on a large subdivision (either a well-to-do one or a very poor one) to move in or out of the school boundary at least every other year, which can dramatically affect the test scores.

Additionally, my school has a yearly 53% turnover rate. That's right - without boundary changes - 53% of the student body leaves the school every year and is replaced by entirely new children.

December 29, 2005 12:34 PM

"The following year, take 300 different 11th grade students and test them again. What are the chances that the measured average will be exactly the same?"

If you took the exact same 11th grade students and tested them again, the probability that the average would be "exactly" the same would be close to zero.

You also left out any discussion of the standard deviations. An average without a standard deviation is essentially worthless for any analysis.

So, a better suggestion would be to have the adults in the education system take a course on statistics so they could form coherent arguments.

bob in kentucky December 29, 2005 01:50 PM