TY - JOUR
T1 - Can High-Dimensional Questionnaires Resolve the Ipsativity Issue of Forced-Choice Response Formats?
AU - Schulte, Niklas
AU - Holling, Heinz
AU - Bürkner, Paul-Christian
PY - 2020/1/1
Y1 - 2020/1/1
N2 - Forced-choice questionnaires can prevent faking and other response biases typically associated with rating scales. However, the derived trait scores are often unreliable and ipsative, making interindividual comparisons in high-stakes situations impossible. Several studies suggest that these problems vanish if the number of measured traits is high. To determine the necessary number of traits under varying sample sizes, factor loadings, and intertrait correlations, simulations were performed for the two most widely used scoring methods, namely the classical (ipsative) approach and Thurstonian item response theory (IRT) models. Results demonstrate that while Thurstonian IRT models in particular perform well under ideal conditions, both methods yield insufficient reliabilities in most conditions resembling applied contexts. Moreover, not only the classical estimates but also the Thurstonian IRT estimates for questionnaires with equally keyed items remain (partially) ipsative, even when the number of traits is very high (i.e., 30). This result not only questions earlier assumptions regarding the use of classical scores in high-dimensional questionnaires but also raises doubts about many validation studies on Thurstonian IRT models, because correlations of (partially) ipsative scores with external criteria cannot be interpreted in the usual way.
AB - Forced-choice questionnaires can prevent faking and other response biases typically associated with rating scales. However, the derived trait scores are often unreliable and ipsative, making interindividual comparisons in high-stakes situations impossible. Several studies suggest that these problems vanish if the number of measured traits is high. To determine the necessary number of traits under varying sample sizes, factor loadings, and intertrait correlations, simulations were performed for the two most widely used scoring methods, namely the classical (ipsative) approach and Thurstonian item response theory (IRT) models. Results demonstrate that while Thurstonian IRT models in particular perform well under ideal conditions, both methods yield insufficient reliabilities in most conditions resembling applied contexts. Moreover, not only the classical estimates but also the Thurstonian IRT estimates for questionnaires with equally keyed items remain (partially) ipsative, even when the number of traits is very high (i.e., 30). This result not only questions earlier assumptions regarding the use of classical scores in high-dimensional questionnaires but also raises doubts about many validation studies on Thurstonian IRT models, because correlations of (partially) ipsative scores with external criteria cannot be interpreted in the usual way.
KW - forced-choice format
KW - ipsative data
KW - multidimensional IRT
KW - Thurstonian IRT model
UR - http://www.scopus.com/inward/record.url?scp=85088467150&partnerID=8YFLogxK
U2 - 10.1177/0013164420934861
DO - 10.1177/0013164420934861
M3 - Article
AN - SCOPUS:85088467150
JO - Educational and Psychological Measurement
JF - Educational and Psychological Measurement
SN - 0013-1644
ER -