IRT scoring and the principle of consistent order
IRT models are increasingly used worldwide for test construction and scoring. This study examines the practical implications of estimating individual scores in a high-stakes paper-and-pencil test using 2PL and 3PL models, specifically whether the principle of consistent order holds when scoring with IRT. The principle states that student A, who correctly answers at least as many items as student B, and items of greater difficulty, should outscore B. Analyses of actual scores from the Chilean national admission test in mathematics indicate that the principle does not hold when scoring with 2PL or 3PL models: students who answer more items correctly, and items of greater difficulty, may be assigned lower scores. The findings can be explained by examining the mathematical models, since estimated ability scores are an increasing function of the accumulated estimated discriminations of the correctly answered items, not of their difficulty. For high-stakes tests, the decision to use complex models should therefore be a matter of serious deliberation for policy makers and test experts, since fairness and transparency may be compromised.
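The mechanism can be illustrated with a minimal sketch for the 2PL case only. It assumes the standard 2PL response function and maximum-likelihood scoring, where the ability estimate solves sum_i a_i (u_i - P_i(theta)) = 0 and therefore depends on the responses only through the discrimination-weighted score sum_i a_i u_i. The item parameters and the two response patterns below are hypothetical, chosen for illustration; they are not taken from the Chilean admission test, and the function names are not from the study.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (a: discrimination, b: difficulty)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def mle_theta(responses, a_params, b_params, lo=-6.0, hi=6.0, tol=1e-8):
    """Maximum-likelihood ability estimate under 2PL, found by bisection.

    The score function sum_i a_i * (u_i - P_i(theta)) is strictly
    decreasing in theta, so its root is unique when it exists in [lo, hi].
    """
    def score(theta):
        return sum(a * (u - p_correct(theta, a, b))
                   for u, a, b in zip(responses, a_params, b_params))

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid   # likelihood still increasing: root lies to the right
        else:
            hi = mid   # likelihood decreasing: root lies to the left
    return (lo + hi) / 2.0

# Hypothetical items: two easy, highly discriminating items followed by
# two hard, weakly discriminating items.
a_params = [2.0, 2.0, 0.5, 0.5]
b_params = [-1.0, -0.5, 1.0, 1.5]

# Student A: three correct answers, including both hard items.
resp_a = [1, 0, 1, 1]
# Student B: two correct answers, both on easy items.
resp_b = [1, 1, 0, 0]

# Discrimination-weighted scores: the only way responses enter the estimate.
weighted_a = sum(a * u for a, u in zip(a_params, resp_a))
weighted_b = sum(a * u for a, u in zip(a_params, resp_b))

theta_a = mle_theta(resp_a, a_params, b_params)
theta_b = mle_theta(resp_b, a_params, b_params)

print(f"A: {sum(resp_a)} correct, weighted score {weighted_a}, theta {theta_a:.3f}")
print(f"B: {sum(resp_b)} correct, weighted score {weighted_b}, theta {theta_b:.3f}")
```

In this constructed example, student A answers more items correctly and answers the harder ones, yet receives the lower ability estimate, because B's correct answers carry more accumulated discrimination. The difficulties affect where the estimates fall on the scale, but not the ordering of two students who took the same items.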