Journal of Applied Measurement

A publication of the Department of Educational Psychology and Counseling
National Taiwan Normal University

Volume 25, Issue 3/4 (2024)
Special Issue in Commemoration of Prof. Wen-Chung Wang Part 2

Introduction to the Special Issue in Honour of Prof. Wang Wen-Chung (Part 2)

Mark Wilson
University of California, Berkeley
Karen Draney
University of California, Berkeley
Xiaoting Huang
Peking University
Xuelan (Sherry) Qiu
City University of Macau

n/a

Citation:
Wilson, M., Draney, K., Huang, X., & Qiu, X. (2024). Introduction to the special issue in honour of Prof. Wang Wen-Chung (Part 2). Journal of Applied Measurement, 25(3/4), x–x.


Special Thanks to Professor Wen-Chung Wang―From Three of His Doctoral Students

Cheng-Te Chen
National Tsing-Hua University, Taiwan
Ching-Lin Shih
National Sun Yat-Sen University, Taiwan
Chia-Wen Chen
University of Cambridge

Professor Wen-Chung Wang mentored numerous graduate students at both National Chung Cheng University in Taiwan and the Education University of Hong Kong, leaving a lasting impact on them through his dedicated guidance. Three of his former doctoral students have volunteered to share their experiences of working with Professor Wang during their graduate studies and as they embarked on their careers in higher education.

Citation:
Chen, C.-T., Shih, C.-L., & Chen, C.-W. (2024). Special thanks to Professor Wen-Chung Wang―From three of his doctoral students. Journal of Applied Measurement, 25(3/4), x–x.


A Systematic Review of Forced-Choice Measures and Item Response Theory Modelling

Xuelan Qiu
City University of Macau
Jimmy de la Torre
University of Hong Kong

Forced-choice (FC) items have been used for decades to assess psychological traits, such as personality, interests, and values, because of their strengths in preventing response biases. Recently, there has been increased research interest in developing item response theory (IRT) models for FC items. This review paper discusses various formats of FC measures, with specific examples of tests, and the theories of human preferential choice that underlie forced-choice responding. It also presents the existing IRT models and discusses their specification and properties. Based on the review, the paper suggests theoretical and application-oriented lines of future research on FC measures.

Keywords: Forced-choice; ipsative data; item response theory; systematic review

Citation:
Qiu, X., & de la Torre, J. (2024). A systematic review of forced-choice measures and item response theory modelling. Journal of Applied Measurement, 25(3/4), x–x.


Retrofitting the Partially Confirmatory Cognitive Diagnosis Modelling to Large-Scale Educational Assessments

Yi Jin
The University of Hong Kong
Jinsong Chen
The University of Hong Kong

Cognitive diagnosis models are increasingly being applied in large-scale educational assessments. The construction of Q-matrices allows for the adaptation of non-diagnostic assessments for diagnostic use. In this study, we propose retrofitting a newly developed partially confirmatory diagnosis model to large-scale educational assessments, in which the Q-matrix can be partially specified by expert knowledge and partially inferred from response data. The efficacy of this framework is demonstrated by comparing it with the fully expert specification method and the PVAF-based Q-matrix validation method through real application scenarios. The results reveal the significant practical potential of our proposed approach.

Keywords: partially confirmatory, cognitive diagnosis model, large-scale assessments

Citation:
Jin, Y., & Chen, J. (2024). Retrofitting the partially confirmatory cognitive diagnosis modelling to large-scale educational assessments. Journal of Applied Measurement, 25(3/4), x–x.


A Graphical Framework Using Item Response Modeling to Detect Nonuniform Differential Item Functioning

Bunyong Dejanipont
University at Buffalo
Mark Wilson
University of California, Berkeley

Detecting differential item functioning (DIF) is fundamental to improving the fairness and validity of virtually any assessment. However, many commonly used DIF indices are suitable for uniform DIF, but not for nonuniform DIF. Furthermore, using graphs such as category characteristic curves (also known as category probability functions) to examine DIF in a polytomous item is relatively involved and convoluted. To address such limitations, we propose a graphical DIF detection framework that is intuitive and sensitive to both nonuniform DIF and uniform DIF. Using the partial credit model as an illustration, we compare a graphical DIF approach that leverages the proposed framework against a parametric DIF test approach that utilizes a DIF index and its accompanying DIF test statistic. For uniform DIF detection, there is substantial agreement between the two approaches. But for nonuniform DIF detection, the discrepancy is stark: the graphical approach flags many more items as having nonuniform DIF than the parametric test approach does. Altogether, the results suggest that a DIF investigation that relies only on a uniform-DIF-oriented index and its DIF test statistic is likely to provide specious conclusions about item bias, since such an index is generally much less sensitive to nonuniform DIF.

Keywords: nonuniform differential item functioning, item bias, category characteristic curve, item response modeling, partial credit model

Citation:
Dejanipont, B., & Wilson, M. (2024). A graphical framework using item response modeling to detect nonuniform differential item functioning. Journal of Applied Measurement, 25(3/4), x–x.

Supplemental Materials


Dirichlet Item Response Models for Multidimensional Compositional Items with Specific Objectivity

Chia-Wen Chen
The Psychometrics Centre, University of Cambridge
Wen-Chung Wang
National Taiwan Normal University
National Sun Yat-sen University
Beijing Normal University
The Education University of Hong Kong

Magdalena Mo Ching Mok
The Education University of Hong Kong
National Taichung University of Education

Compositional items belong to a forced-choice format in which respondents are asked to allocate a fixed total number of points to a set of statements in an item. Constant-sum scales using compositional items are applied in noncognitive tests to measure personality, values, and interests. Brown’s Thurstonian model was created for compositional items; however, the Thurstonian model lacks the good measurement property of specific objectivity. This problem means that the model does not allow for making comparisons between measures not located on a common scale. The aim of this study was to develop a new, practically feasible IRT model for compositional items that would have the good measurement property of specific objectivity. The Dirichlet Rasch model was developed in the current study. The parameter estimation, standard error, Fisher information function, and evaluation of model fit to an empirical data set were explored. The simulation studies showed acceptable parameter recovery and precision of estimation under the various conditions of the newly developed model. An empirical study with a sample of 512 respondents, who took an online value test based on Schwartz’s values theory, showed that the Dirichlet Rasch model had acceptable model-data fit, and the reliabilities were higher than 0.85.

Keywords: Compositional data, forced-choice items, ipsative data, Rasch model, item response theory

Citation:
Chen, C.-W., Wang, W.-C., & Mok, M. M. C. (2024). Dirichlet item response models for multidimensional compositional items with specific objectivity. Journal of Applied Measurement, 25(3/4), x–x.


Relating Selected Response to Constructed Response Items: Systematic Effects of Item Format

Mark Wilson
University of California, Berkeley
Weeraphat Suksiri
University of California, Berkeley
Linda Morell
University of California, Berkeley
Jonathan Osborne
Stanford University
Sara Dozier
California State University, Long Beach

In this study, we explore the relationship between constructed-response (CR) item types and selected-response (SR) item types. We constructed stem-equivalent sets of SR and CR items, designed to assess multiple (high) levels of competency in argumentation using the construct-modelling approach, which is based on a previously validated construct map for argumentation. We analyzed data obtained from 741 middle school and high school students who were randomly assigned to the two different assessment conditions (i.e., CR and SR). Our findings indicate that the two assessment conditions generate two different but correlated psychometric dimensions. In particular, the item difficulty parameters from the SR items are highly correlated with those from the paired CR items, indicating that both sets are consistent with the original construct map for argumentation. However, the CR items were much harder for the students than the SR items, by the equivalent of a grade level, and appeared even more difficult to them. We interpret this finding to show that in the CR case, the students are hampered by the requirement to write their responses in sentences that communicate their higher-level reasoning and capabilities. Thus, their facility with expression is a problem only when constructed-response items are used to assess student knowledge. We use these results to review the usage of the two item formats and find value in both.

Keywords: Assessment, constructed responses, higher-order reasoning, selected responses, item format effect

Citation:
Wilson, M., Suksiri, W., Morell, L., Osborne, J., & Dozier, S. (2024). Relating selected response to constructed response items: Systematic effects of item format. Journal of Applied Measurement, 25(3/4), x–x.


The Efficiency of Integrating Multidimensional Rasch Analysis with Machine Learning Algorithms to Predict Mathematical Proficiency Waypoints in Probability and Statistics

Putcharee Junpeng
Khon Kaen University
Mark Wilson
University of California, Berkeley  

This study presents a novel integration of the multidimensional random coefficients multinomial logit model (MRCMLM) with decision tree algorithms to enhance diagnostic efficiency in mathematical proficiency assessment. Analyzing responses from 495 seventh-grade Thai students on probability and statistics assessments, the research employed a two-dimensional framework evaluating Mathematical Procedures (MAP) and Structure of the Observed Learning Outcome (SLO). Following Wilson’s construct modeling approach, MRCMLM analysis established five proficiency waypoints through the Wright map calibration, with classifications serving as decision tree input features across depths 3, 4, and 5. Results demonstrated systematic accuracy improvements with increasing tree depth: depth-3 models achieved moderate performance (MAP: 0.79, SLO: 0.76), depth-4 models balanced efficiency with accuracy (MAP: 0.82, SLO: 0.83), and depth-5 models maximized diagnostic precision (MAP: 0.86, SLO: 0.86). However, depth-5 implementations face significant item pool constraints with only 10 available items, limiting unique diagnostic pathways and practical scalability. The SLO dimension consistently outperformed MAP across all configurations, exhibiting superior stability and error control. This represents the first systematic psychometric-algorithmic integration for mathematical proficiency assessment, establishing a replicable computational framework that substantially reduces testing burden while maintaining measurement quality. The findings support context-sensitive implementation strategies, contributing to Wang’s legacy of advancing practical educational measurement solutions.

Keywords: Multidimensional Rasch analysis, decision tree algorithms, mathematical proficiency diagnosis, computational psychometrics, educational measurement efficiency

Citation:
Junpeng, P., & Wilson, M. (2024). The efficiency of integrating multidimensional Rasch analysis with machine learning algorithms to predict mathematical proficiency waypoints in probability and statistics. Journal of Applied Measurement, 25(3/4), x–x.


Engaging in Free Open Access Medical Education (FOAMed): Development and Validation of a FOAMed Engagement Instrument in Nephrology Fellows

Dana M. Larsen
University of California, San Francisco
San Francisco VA Medical Center
 
Perman Gochyyev
University of California, Berkeley
Christy K. Boscardin
University of California, San Francisco
Mark Wilson
University of California, Berkeley  

In 2017, a national survey of nephrology fellows found that only 55% felt “fully prepared” for practice after fellowship (Rope et al., 2017). Free Open Access Medical Education (FOAMed), defined as “networks of blogs and microblogs, videos, podcasts, and other freely available medical resources” in which “the exchange of information and ease of accessibility around the world allows for collaboration” (Colbert et al., 2018; Ting et al., 2020), offers potential solutions for over-burdened training programs to provide the necessary education to their fellows (Cadogan et al., 2014; T. M. Chan, Stehman, et al., 2020; Colbert et al., 2018; Nkomo et al., 2021; Ting et al., 2020). This study developed and investigated the validity of an instrument designed to measure learner engagement with FOAMed. It followed Wilson’s four building blocks approach to construct modeling, operationalized via a novel criterion-referenced engagement framework (Bond et al., 2020; Wilson, 2023). Items constructed by experts were revised following initial pilot testing with extensive response process evaluation. Rasch analysis of the revised items was conducted on 404 US nephrology fellows’ responses, and item fit, reliability estimates, external validity correlation, and DIF supported the validity assessment of the final 6-item instrument. The resultant instrument provides a brief assessment of nephrology fellows’ engagement with FOAMed. Application of this assessment will allow future researchers to correlate FOAMed engagement with desired learning outcomes.

Keywords: Free open access medical education, nephrology

Citation:
Larsen, D. M., Gochyyev, P., Boscardin, C. K., & Wilson, M. (2024). Engaging in Free Open Access Medical Education (FOAMed): Development and validation of a FOAMed engagement instrument in nephrology fellows. Journal of Applied Measurement, 25(3/4), x–x.


Chinese Teachers’ Assessment Self-Efficacy and the Effects of Gender and Teaching Experience

Jinxin Zhu
The Education University of Hong Kong 

Understanding teachers’ assessment self-efficacy is critical in teaching. Whereas past studies showed unsatisfactory factor solutions for assessment self-efficacy, this study investigates the factorial structure and the psychometric properties of an instrument for teachers’ assessment self-efficacy. Moreover, past studies found mixed results regarding the effects of teaching experience and gender on assessment self-efficacy. This study addresses the gap with a sample of 158 in-service Chinese teachers (63.9% female) from Mainland China (22.2%) and Hong Kong (77.8%). Results showed one factor underlying Chinese teachers’ assessment self-efficacy and acceptable psychometric properties of the instrument, including acceptable item fits, good category structures, and good reliabilities. However, the study found items with differential item functioning related to location and gender. When comparing the teachers’ assessment self-efficacy across Hong Kong and Mainland China or across genders, researchers should pay special attention to these items. Finally, teaching experience is positively associated with assessment self-efficacy, but gender is not.

Keywords: Assessment self-efficacy; differential item functioning; teaching experience; gender difference

Citation:
Zhu, J. (2024). Chinese teachers’ assessment self-efficacy and the effects of gender and teaching experience. Journal of Applied Measurement, 25(3/4), x–x.


Using an Exploratory Item Response Modeling Approach to Develop a Teacher Continuing Professional Development Progress Variable

Jerred Jolin
Eastern Oregon University
Alexander Blum
Stanford University

Continuing professional development (CPD) involves ongoing learning activities undertaken by teachers to enhance their professional practice. Despite the importance of CPD for improving schools and enhancing student learning, there is limited research on the nature of CPD participation among rural school teachers. To address this gap, we administered the Continuing Professional Development Questionnaire (CPDQ) to 220 K-12 teachers from 15 rural school districts. We analyzed the data with a unidimensional partial credit model (PCM) to produce a CPD participation profile (CPP) for the sample of teachers, in the form of an item-person map. The CPP was developed by arranging the calibrated CPDQ items in an ascending order based on the locations of the first item threshold, which is the transition point within each item where respondents were more likely to report engaging in a given CPD activity at any frequency, versus never engaging in it. We then used the CPP to develop a CPD Engagement Progress Variable (CEPV), which defines four progressively more involved levels of CPD engagement. In support of the validity of the CEPV, we present evidence for the reliability and validity of the CPDQ. These findings contribute to the understanding of CPD engagement in rural schools by providing a picture of what CPD engagement actually looks like for this sample of rural teachers. The findings also have practical implications, such as a proposed sequence for the delivery of CPD activities for rural teachers by local educational agencies and providing logical “next steps” for CPD activities, based on a teacher’s location within the levels of the CEPV. The research that we report here is in the spirit of Professor Wen-Chung Wang’s work because it demonstrates the practical utility of using advanced measurement models in real-world educational settings to improve educational outcomes, which was a theme of much of his work.

Keywords: Item response model, continuing professional development, professional development questionnaire

Citation:
Jolin, J., & Blum, A. (2024). Using an exploratory item response modeling approach to develop a teacher continuing professional development progress variable. Journal of Applied Measurement, 25(3/4), x–x.
