Journal of Applied Measurement

A publication of the Department of Educational Psychology and Counseling
National Taiwan Normal University

Volume 25, Issue 3/4 (2024)
Special Issue in Commemoration of Prof. Wen-Chung Wang Part 2

Introduction to the Special Issue in Honour of Prof. Wang Wen-Chung (Part 2)

Mark Wilson
University of California, Berkeley
Karen Draney
University of California, Berkeley
Xiaoting Huang
Peking University
Xuelan (Sherry) Qiu
City University of Macau


Citation:
Wilson, M., Draney, K., Huang, X., & Qiu, X. (2024). Introduction to the special issue in honour of Prof. Wang Wen-Chung (Part 2). Journal of Applied Measurement, 25(3/4), i–iii.


Special Thanks to Professor Wen-Chung Wang―From Three of His Doctoral Students

Cheng-Te Chen
National Tsing-Hua University, Taiwan
Ching-Lin Shih
National Sun Yat-Sen University, Taiwan
Chia-Wen Chen
University of Cambridge

Professor Wen-Chung Wang has mentored numerous graduate students at both National Chung Cheng University in Taiwan and the Education University of Hong Kong, leaving a lasting impact on them through his dedicated guidance. Three of his former doctoral students have volunteered to share their experiences of working with Professor Wang during their graduate studies and as they embarked on their careers in higher education.

Citation:
Chen, C.-T., Shih, C.-L., & Chen, C.-W. (2024). Special thanks to Professor Wen-Chung Wang―From three of his doctoral students. Journal of Applied Measurement, 25(3/4), 150–154.

A Systematic Review of Forced-Choice Measures and Item Response Theory Modelling

Xuelan Qiu
City University of Macau
Jimmy de la Torre
University of Hong Kong

Forced-choice (FC) items have been used for decades to assess psychological traits, such as personality, interests, and values, because of their strengths in preventing response biases. Recently, there has been increased research interest in developing item response theory (IRT) models for FC items. This review paper discusses the various formats of FC measures, with specific examples of tests, and the theories of human preferential choice that underlie the FC format. It also presents the existing IRT models and discusses their specification and properties. Based on the review, the paper suggests theoretical and application-oriented lines of future research on FC measures.

Keywords: Forced-choice; ipsative data; item response theory; systematic review

Citation:
Qiu, X., & de la Torre, J. (2024). A systematic review of forced-choice measures and item response theory modelling. Journal of Applied Measurement, 25(3/4), 155–171.

Retrofitting the Partially Confirmatory Cognitive Diagnosis Modelling to Large-Scale Educational Assessments

Yi Jin
The University of Hong Kong
Jinsong Chen
The University of Hong Kong

Cognitive diagnosis models are increasingly being applied in large-scale educational assessments. The construction of Q-matrices allows for the adaptation of non-diagnostic assessments for diagnostic use. In this study, we propose retrofitting a newly developed partially confirmatory diagnosis model to large-scale educational assessments, in which the Q-matrix can be partially specified by expert knowledge and partially inferred from response data. The efficacy of this framework is demonstrated by comparing it with the fully expert specification method and the PVAF-based Q-matrix validation method through real application scenarios. The results reveal the significant practical potential of our proposed approach.

Keywords: partially confirmatory, cognitive diagnosis model, large-scale assessments

Citation:
Jin, Y., & Chen, J. (2024). Retrofitting the partially confirmatory cognitive diagnosis modelling to large-scale educational assessments. Journal of Applied Measurement, 25(3/4), 172–189.

A Graphical Framework Using Item Response Modeling to Detect Nonuniform Differential Item Functioning

Bunyong Dejanipont
University of California, Berkeley
Mark Wilson
University of California, Berkeley

Detecting differential item functioning (DIF) is fundamental to improving the fairness and validity of virtually any assessment. However, many commonly used DIF indices are suitable for uniform DIF but not for nonuniform DIF. Furthermore, using graphs such as category characteristic curves (also known as category probability functions) to examine DIF in a polytomous item is relatively involved and convoluted. To address these limitations, we propose a graphical DIF detection framework that is intuitive and sensitive to both nonuniform and uniform DIF. Using the partial credit model as an illustration, we compare a graphical approach that leverages the proposed framework against a parametric test approach that uses a DIF index and its accompanying test statistic. For uniform DIF detection, there is substantial agreement between the two approaches. For nonuniform DIF detection, however, the discrepancy is stark: the graphical approach flags many more items as having nonuniform DIF than the parametric test approach does. Altogether, the results suggest that a DIF investigation that relies only on a uniform-DIF-oriented index and its test statistic is likely to yield specious conclusions about item bias, since such an index is generally much less sensitive to nonuniform DIF.

Keywords: nonuniform differential item functioning, item bias, category characteristic curve, item response modeling, partial credit model

Citation:
Dejanipont, B., & Wilson, M. (2024). A graphical framework using item response modeling to detect nonuniform differential item functioning. Journal of Applied Measurement, 25(3/4), 190–202.
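
The distinction between uniform and nonuniform DIF drawn in the abstract above can be illustrated with a minimal sketch. It is not the authors' graphical framework: it only computes category probabilities and expected item scores under the partial credit model and compares a reference group with a focal group. The step difficulties and the helper names (`pcm_category_probs`, `expected_score`, `expected_score_gap`) are hypothetical values and names chosen for illustration.

```python
import math


def pcm_category_probs(theta, deltas):
    """Category probabilities (scores 0..m) for one partial credit item.

    theta  -- person location in logits
    deltas -- step difficulties delta_1..delta_m in logits
    """
    # Score k has log-weight sum_{j<=k}(theta - delta_j); score 0 has weight 0.
    cumulative = [0.0]
    for delta in deltas:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, deltas):
    """Expected item score at theta."""
    probs = pcm_category_probs(theta, deltas)
    return sum(k * p for k, p in enumerate(probs))


def expected_score_gap(ref_deltas, focal_deltas, thetas):
    """Focal-minus-reference expected score at each theta."""
    return [
        expected_score(t, focal_deltas) - expected_score(t, ref_deltas)
        for t in thetas
    ]


# Hypothetical step difficulties (assumptions, not values from the paper).
thetas = [x / 2 for x in range(-6, 7)]   # -3.0, -2.5, ..., 3.0
reference = [-1.0, 1.0]
uniform_focal = [-0.5, 1.5]      # every step shifted by +0.5 logits
nonuniform_focal = [-0.5, 0.5]   # steps moved in opposite directions

uniform_gap = expected_score_gap(reference, uniform_focal, thetas)
nonuniform_gap = expected_score_gap(reference, nonuniform_focal, thetas)

for t, u, n in zip(thetas, uniform_gap, nonuniform_gap):
    print(f"theta={t:+.1f}  uniform gap={u:+.3f}  nonuniform gap={n:+.3f}")
```

In this toy setting the uniform gap keeps the same sign across the whole theta range, whereas the nonuniform gap changes sign. An index that summarises DIF as a single shift can miss the second pattern, which is the kind of case the abstract describes.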


Dirichlet Item Response Models for Multidimensional Compositional Items with Specific Objectivity

Chia-Wen Chen
The Psychometrics Centre, University of Cambridge
Wen-Chung Wang
National Taiwan Normal University
National Sun Yat-sen University
Beijing Normal University
The Education University of Hong Kong
Magdalena Mo Ching Mok
The Education University of Hong Kong
National Taichung University of Education

Compositional items are a forced-choice format in which respondents are asked to allocate a fixed total number of points among the statements in an item. Constant-sum scales using compositional items are applied in noncognitive tests to measure personality, values, and interests. Brown’s Thurstonian model was created for compositional items; however, it lacks the desirable measurement property of specific objectivity. As a result, the model does not allow for comparisons between measures that are not located on a common scale. The aim of this study was to develop a new, practically feasible IRT model for compositional items that possesses specific objectivity. To this end, the Dirichlet Rasch model was developed. Parameter estimation, standard errors, the Fisher information function, and the evaluation of model fit to an empirical data set were explored. Simulation studies showed acceptable parameter recovery and estimation precision across the various conditions of the newly developed model. An empirical study, in which 512 respondents took an online values test based on Schwartz’s values theory, showed that the Dirichlet Rasch model had acceptable model-data fit and that the reliabilities were higher than 0.85.

Keywords: Compositional data, forced-choice items, ipsative data, Rasch model, item response theory

Citation:
Chen, C.-W., Wang, W.-C., & Mok, M. M. C. (2024). Dirichlet item response models for multidimensional compositional items with specific objectivity. Journal of Applied Measurement, 25(3/4), 203–235.
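
The compositional response format described in the abstract above can be made concrete with a small sketch. It is not the Dirichlet Rasch model from the article: it only draws one constant-sum response from a Dirichlet distribution, using the standard result that independent gamma variates normalised by their sum are Dirichlet distributed. The function name `sample_allocation`, the concentration values, and the 100-point total are assumptions made for illustration.

```python
import random


def sample_allocation(alphas, total_points=100, rng=random):
    """Draw one constant-sum response to a compositional item.

    alphas       -- positive Dirichlet concentration per statement (hypothetical)
    total_points -- fixed number of points the respondent must allocate
    rng          -- the random module or a random.Random instance
    """
    # Normalised independent gamma draws follow a Dirichlet distribution.
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    scale = sum(draws)
    raw = [d / scale * total_points for d in draws]

    # Round down, then hand the leftover points to the largest remainders
    # so that the allocation still sums exactly to total_points.
    points = [int(r) for r in raw]
    leftover = total_points - sum(points)
    by_remainder = sorted(
        range(len(raw)), key=lambda i: raw[i] - points[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        points[i] += 1
    return points


rng = random.Random(2024)
response = sample_allocation([4.0, 2.0, 1.0, 1.0], total_points=100, rng=rng)
print(response, "sum =", sum(response))
```

Because every response sums to the same total, the points given to one statement constrain the points available to the others. This is the ipsative property listed in the keywords, and it is why such data call for dedicated models.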

Relating Selected Response to Constructed Response Items: Systematic Effects of Item Format

Mark Wilson
University of California, Berkeley
Weeraphat Suksiri
University of California, Berkeley
Linda Morell
University of California, Berkeley
Jonathan Osborne
Stanford University
Sara Dozier
California State University, Long Beach

In this study, we explore the relationship between constructed-response (CR) item types and selected-response (SR) item types. We constructed stem-equivalent sets of SR and CR items, designed to assess multiple (high) levels of competency in argumentation using the construct-modelling approach, which is based on a previously validated construct map for argumentation. We analyzed data obtained from 741 middle school and high school students who were randomly assigned to the two different assessment conditions (i.e., CR and SR). Our findings indicate that the two assessment conditions generate two different but correlated psychometric dimensions. In particular, the item difficulty parameters from the SR items are highly correlated with those from the paired CR items, indicating that both sets are consistent with the original construct map for argumentation. However, the CR items were much harder for the students than the SR items, by the equivalent of a grade level, and appeared even more difficult to them. We interpret this finding to show that in the CR case, the students are hampered by the requirement to write their responses in sentences that communicate their higher-level reasoning and capabilities. Thus, their facility with expression is a problem only when constructed-response items are used to assess student knowledge. We use these results to review the usage of the two item formats and find value in both.

Keywords: Assessment, constructed responses, higher-order reasoning, selected responses, item format effect

Citation:
Wilson, M., Suksiri, W., Morell, L., Osborne, J., & Dozier, S. (2024). Relating selected response to constructed response items: Systematic effects of item format. Journal of Applied Measurement, 25(3/4), 236–253.