
Tuesday, September 30, 2008

Tests Lacking in Reliability and Validity

Any discussion of standardized tests should note that K-12 curricula change. Curricula are swapped in and out, often at a huge cost in time, resources and money, yet there are rarely control groups against which new curricula are compared. Change is frequently complete and district-wide.

Therefore, instead of testing the children's learning (as they are intended to do) or the learning environment (as they should do), perhaps the tests end up testing the effectiveness of the new curricula. Perhaps the math portions test the effectiveness of the entire reform-math philosophy. (Hey, now, there's a thought.)

When 40-60% of the students don't pass the WASL, how do administrators know the fault isn't with the administration, the curricula or the tests themselves?

In this article, I want to discuss Washington's standardized tests (the WASL) in terms of statistical reliability and validity. The following draws on Brickell & Lyon (2003) and on material from James Dean Brown, University of Hawaii (used with permission).
Statistical Reliability

Statistical reliability is the degree to which a test’s results can be consistently replicated. Every change in procedure and every consequence to the testing environment can affect the level of test reliability. Here are three of my concerns:

1. The WASL has changed over time.

Since its inception, the WASL has evolved, with new questions, new emphases and new scoring techniques. This evolution might well be valid and necessary, but administrators continually compare today's scores with those of 8 or 10 years ago as if they represent trends in the same tests. They are not the same tests.
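For context on what "consistently replicated" means in practice: the simplest reliability figure, a test-retest coefficient, is just the Pearson correlation between two sittings of the same, unchanged test by the same students, which is exactly why scores from materially different versions of a test can't be read as one trend. A minimal sketch, with invented scores:

```python
# Test-retest reliability: the Pearson correlation between two
# sittings of the same test by the same students.
# All scores below are invented for illustration.

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Same ten students, two sittings of the same (unchanged) test.
first_sitting  = [62, 75, 48, 90, 55, 70, 81, 66, 59, 73]
second_sitting = [65, 72, 50, 88, 58, 69, 84, 64, 61, 75]

print(round(pearson_r(first_sitting, second_sitting), 3))
```

A coefficient near 1.0 indicates consistent replication; the point is that the calculation is only meaningful when the instrument itself has not changed between sittings.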

2. Lowered WASL passing scores.

In April 2008, I began asking the state about drops in cut scores (the point at which a test score is a passing score), trying to determine if the numbers represented the same expectations. (It’s like asking if a size 10 in women’s clothing is the same size in 2008 as it was in 1967, which of course it is not.)

On July 2, after repeated requests and a request for public information, the state confirmed that several cut scores had been lowered in 2004 and 2005. This likely lowered the level of achievement needed to reach a passing score. Again, even if the changes are valid and necessary, they still interrupt the trends.
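To see why a lowered cut score interrupts a trend, hold the score distribution fixed and move only the cut: the pass rate rises even though no student learned anything new. A hypothetical sketch (the scores and cut values below are invented, not actual WASL figures):

```python
# Hypothetical: the same ten student scores, judged against an
# older (higher) cut score and a lowered one. All values invented.

scores = [350, 372, 388, 391, 399, 402, 410, 415, 421, 433]

def pass_rate(scores, cut):
    """Fraction of students at or above the cut score."""
    return sum(s >= cut for s in scores) / len(scores)

print(pass_rate(scores, 400))  # older, higher cut -> 0.5
print(pass_rate(scores, 390))  # lowered cut       -> 0.7
```

Identical performance, a 20-point jump in the "pass rate." Year-over-year comparisons that ignore the cut change attribute that jump to the students.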

3. Possibly flawed scoring.

My husband and I found questionable grading on our daughter's 3rd- and 4th-grade math tests. On some questions requiring a written answer, her math, spelling and sentence structure were correct, her writing was legible, and she answered everything she was asked to answer. Yet points were docked. Even had we been inclined to appeal, we could not have: only scores for the 10th-grade WASL may be appealed. Therefore, the questionable scores stand.
Statistical Validity

Statistical validity is the degree to which a test measures what it’s designed to measure. Validity can be calculated in several ways, including whether the test matches the testing objectives and whether it correlates with a comparable outside measure. Here are two of my concerns:

1. No comparisons against comparable outside measures.

Perhaps scores that went up did so because the WASL became easier. Perhaps the students aren’t progressing but rather going backward with higher numbers and less knowledge. Without an outside measure to compare against, how would we know? (Other than by noticing, for example, that students are having trouble with basic skills.)

2. Poor alignment with testing objectives.

The math portion includes many exercises in literacy. Students must write short and long answers that use little math and lots of words. When students get answers wrong, how do we know if they didn’t understand the math, didn’t read the question properly, didn’t understand the question, didn’t write legibly, weren’t able to put together a coherent answer, or just ran out of time? No grading marks were made on our daughter’s tests and no explanations were given.

Mixing variables like this can have unfortunate consequences in the scoring (and can also affect the test's reliability). Teachers and parents have told me that correct math answers can earn fewer points than wrong answers that include the expected written words. The math WASL doesn't appear to be a good indicator of the quality of math instruction. A math teacher can be brilliant, the best teacher ever, but if the test is in literacy and not math, the math teacher must bow to the English teacher.

The main thing the WASL appears to do is instigate macro machinations at the state level: some flutter of frenzied activity that eventually filters down to the district level, then down to the school level, where it will have an effect, possibly positive, on a certain subset of the students. Besides being clunky, costly and inefficient, this supposed effort at public accountability is just pretense.

If we are to retain the WASL or some other standardized measure, there must be a stronger, more direct connection between the subject and the test, and between the test and the teacher, parent and student.

Please note: The information in this post is copyrighted. The proper citation is: Rogers, L. (September, 2008). "Tests lacking in reliability and validity." Retrieved (date) from the Betrayed Web site:
