Monday, May 7, 2012

Reinvent standardized tests for openness

Identical tests kept secret until administration should be replaced by large open banks of test questions, from which each test taker faces an appropriate sample. The only secrecy that should remain is in the particular combination of questions an individual test taker faces when sitting the exam.
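
As a rough sketch of what "an appropriate sample" from an open bank could look like in practice (the bank size, exam length, and function name below are made up purely for illustration, not part of any real testing system):

```python
import random

def draw_exam(bank, exam_length, seed=None):
    """Sample one test taker's question set from a public bank.

    Every question in `bank` is openly published; the only thing a test
    taker does not know in advance is which combination they will draw.
    """
    rng = random.Random(seed)
    return rng.sample(bank, exam_length)

# Hypothetical numbers: a 20,000-question public bank, 50 questions per sitting.
bank = [f"Q{i:05d}" for i in range(20_000)]
exam_for_one_student = draw_exam(bank, 50)
```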

It is always possible for test content secrecy to be compromised. There seem to be such problems now in California, where it is claimed that breaches in security "could lead to invalidating test scores for entire schools or prevent the state from using certain tests." A system in which this can happen is not robust, yet any system that relies on test content secrecy is vulnerable to it: perfect secrecy of test content is not possible.

Limiting access to test materials before exams are administered, especially when access is limited to companies motivated to produce tests at the lowest possible expense, does not effectively prevent problems with the exams. And since there is only one version of each exam, any problem affects every test taker. This has recently been the case with English and math tests in New York.

A far better approach is to curate large banks of questions accessible to the public at large. This would allow many eyes to identify problems with questions, and give every student a fair chance to prepare for the kinds of questions that will appear on their exam.

The number of questions should be large enough that the probability of a test taker encountering a previously viewed question is fairly small. A test taker, especially one who prepares voraciously, may still recognize a question during an examination, but this can also happen with tests whose content is kept secret, and in any event it is the result of the test taker's learning.
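
To put rough numbers on this (the bank and exam sizes below are invented purely for illustration), the chance of a repeat under uniform random sampling could be estimated along these lines:

```python
from math import comb

def p_any_repeat(bank_size, studied, exam_length):
    """Chance that at least one exam question was already practiced,
    assuming the exam is a uniform random draw (without replacement)."""
    return 1 - comb(bank_size - studied, exam_length) / comb(bank_size, exam_length)

def expected_repeats(bank_size, studied, exam_length):
    """Expected number of already-practiced questions on the exam."""
    return exam_length * studied / bank_size

# Made-up example: a 50,000-question bank, a 50-question exam, and a student
# who has practiced 500 questions. Repeats are possible but rarely numerous.
print(p_any_repeat(50_000, 500, 50))      # ~0.39
print(expected_repeats(50_000, 500, 50))  # 0.5
```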

If the banks of questions are sufficiently large and rich, this method may ameliorate the problems of "teaching to the test". It should become clearly more efficient to learn general principles and problem-solving skills than to memorize every existing question. Of course some practice with questions from the banks is not necessarily bad, but by itself such an approach is unlikely to yield optimal results.

Since taking the test does not reveal any new information about the test anyone else will face, test takers can take and retake exams at any time with no new costs in exam development. This eliminates problems of exam-day sickness and the like.

Scoring may be more difficult since test takers do not all face the same questions. An intelligent and open system for evaluating question difficulty should be developed, and marking should be aligned with the purpose of the exam. The goal may not be to perfectly rank every test taker; it may be enough to determine whether a test taker has demonstrated competency and is ready to move on.
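
As one very simple, openly auditable starting point (not a full psychometric model, and using an invented data format), question difficulty could be estimated from published response counts and then fed into whatever scoring scheme the exam's purpose calls for:

```python
def estimate_difficulty(responses):
    """Estimate each question's difficulty as the share of test takers who
    answered it incorrectly.

    `responses` maps a question id to (num_correct, num_attempts); the
    format is hypothetical, chosen only to illustrate the idea.
    """
    return {
        qid: 1 - correct / attempts
        for qid, (correct, attempts) in responses.items()
        if attempts > 0
    }

# Made-up counts: Q00042 turns out to be much easier than Q01337.
print(estimate_difficulty({"Q00042": (900, 1000), "Q01337": (400, 1000)}))
```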

The modern world is advancing toward greater transparency, and so should testing.

8 comments:

Jeff N said...

I agree with you overall, though I don't see why the test questions would need to be made public or why the role of such tests would lean more toward measuring competency rather than competitive ranking. Language tests and grad school entrance exams already work this way. Their primary purpose is competitive ranking, and they've already come up with solutions to almost every problem that public standardized tests would face. Not sure about the difficulty/fairness of a new question? After 50,000 people answer it as an unweighted practice question, you will be.

Aaron said...

Thanks Jeff! You may be right that competitive ranking could be retained. Computer adaptive approaches like those used for the GRE are pretty neat, and getting data from real people to evaluate question difficulty or catch bad questions is certainly good, but I do think that making question banks public is a necessary move. We shouldn't have to trust that the few people running the GRE are doing the right kinds of evaluations of their questions, which we only see samplings of when we take the test. I'm opposed to keeping the control of test questions in the hands of just a few people who guard their content as if people don't have the right to know - or perhaps as if people would criticize it if they did know. I think sunlight could make the whole process better.

Aaron said...

Even more problems coming out of Pearson: http://www.nytimes.com/schoolbook/2012/05/02/state-officials-throw-out-another-pearson-test-question/?ref=nyregion

Aaron said...

Brianna comments via email:

"Not a bad idea - its a similar thought to CAT, but CAT is much more restrictive - in CAT the computer tailors the test based on how well a student answers the previous questions, to result in the student answering the most parsimonious amount of questions. The only issue I would see here with increasing the size of a test bank for all students is that its not always the content of the test question - but the format of the test people complain about "teaching to" so its not necessarily there need be a infinite set of items, but really there would need to also be an infinite amount of formats for the items. Then as you say it is much easier for teachers to teach the skills and concepts needed to answer the question - no matter what the format.

"In general you are right - its costly to create items for a test and then to "test" the test to make sure it is psychometrically sound. That would be the one issue to solve with having a large test bank of items, how can the item parameters be estimated, that is how well the item discriminates between a test taker that knows the skill and one who does not, and how difficult the question is to answer if there is no comparative data (although i could imaging some matrix that links test takers together to estimate this NAEP uses a method like this I believe) ? - there's also a 3rd parameter which is the "guessing" parameter - how easy it it for someone to just guess the right answer?"

Aaron said...

Even more problems with translated versions of exams:

http://www.nydailynews.com/new-york/ay-caramba-foreign-language-versions-school-exams-riddled-errors-article-1.1074805

Aaron said...

A very nice similar line of thinking here:

http://www.nytimes.com/schoolbook/2012/05/11/a-request-to-make-the-pearson-tests-public

Aaron said...

Another example of silliness from exam companies:

http://www.nytimes.com/2012/05/17/nyregion/scores-of-sat-taken-at-packer-collegiate-institute-are-invalidated.html?_r=1&ref=education

Aaron said...

Right direction, but a panel is still too small a group to do the review:

"Brooklyn: Manhattan Borough President Scott Stringer wants an independent panel to assess the reliability of the Pearson state exams before they are used for high-stakes purposes. I support having an independent panel, including testing experts, parents, teachers and principals, look at these ridiculous, unreliable exams before they are used to make decisions about schools, kids, principals and teachers. Julie Cavanagh, teacher, PS 15K"

http://www.nydailynews.com/opinion/readers-sound-a-9-11-anniversary-obama-chris-quinn-article-1.1080180#ixzz1vEbYPw9y