COMPARISON OF ITEM RESPONSE THEORY MODELS FOR AKM NUMERACY ASSESSMENT IN SENIOR HIGH SCHOOL STUDENTS IN SOUTH SULAWESI
DOI: https://doi.org/10.30605/tcxgxk51
Keywords: Item response theory, Rasch model, 2PL, 3PL, Numeracy assessment
Abstract
The Asesmen Kompetensi Minimum (AKM) is the cornerstone of Indonesia's national large-scale assessment framework and is designed to measure foundational numeracy competencies across the student population. Selecting an appropriate psychometric model for calibrating AKM items is critical for valid score interpretation, equitable measurement, and evidence-based instructional policy. This study presents an empirical comparison of three Item Response Theory (IRT) models, namely the one-parameter logistic (Rasch) model, the two-parameter logistic (2PL) model, and the three-parameter logistic (3PL) model, applied to a 30-item AKM numeracy instrument administered to 500 senior high school students in South Sulawesi, Indonesia. Item parameters were estimated by marginal maximum likelihood (MML), and the models were compared on model-data fit and measurement precision. The Rasch model produced the lowest Akaike Information Criterion (AIC = 15,178.11) and Bayesian Information Criterion (BIC = 15,304.54), alongside the highest marginal test information (TIF = 5.427) and reliability (.844), indicating superior parsimony and precision relative to the 2PL and 3PL models. Item difficulty parameters ranged from b = −2.788 (Item 23) to b = 0.541 (Item 22), reflecting adequate breadth of the numeracy construct. The 2PL model yielded the smallest mean chi-square item misfit, whereas the 3PL model added parameter complexity without a meaningful gain in fit. These findings suggest that the Rasch model is the preferred framework for operational AKM calibration; practical guidance is provided for contexts in which the 2PL or 3PL model may be appropriate.
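To make the model comparison concrete, the following minimal Python sketch shows how the three models nest within a single item response function and how AIC and BIC relate to each other. It is an illustration only, not the estimation code used in the study. The function names, the logistic metric without a 1.7 scaling constant, and the choice of θ = 0 as a reference ability are assumptions made here; only the AIC, BIC, sample size, and the two difficulty values are taken from the abstract.

```python
import math


def irf(theta, b, a=1.0, c=0.0):
    """Probability of a correct response under the 3PL model.

    With c = 0 this reduces to the 2PL model; with a = 1 and c = 0
    it reduces to the Rasch model. Logistic metric (no 1.7 constant)
    is an assumption of this sketch.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def aic(neg2_loglik, k):
    """Akaike Information Criterion from -2 log-likelihood and k parameters."""
    return neg2_loglik + 2 * k


def bic(neg2_loglik, k, n):
    """Bayesian Information Criterion for k parameters and n examinees."""
    return neg2_loglik + k * math.log(n)


def implied_param_count(aic_value, bic_value, n):
    """Solve BIC - AIC = k * (ln n - 2) for k."""
    return (bic_value - aic_value) / (math.log(n) - 2.0)


# Values reported in the abstract for the Rasch model.
N_STUDENTS = 500
RASCH_AIC = 15178.11
RASCH_BIC = 15304.54
EASIEST_B = -2.788  # Item 23
HARDEST_B = 0.541   # Item 22

# Number of estimated parameters implied by the reported AIC and BIC.
k_implied = implied_param_count(RASCH_AIC, RASCH_BIC, N_STUDENTS)

# Rasch success probabilities for a hypothetical examinee at theta = 0.
p_easiest = irf(0.0, EASIEST_B)
p_hardest = irf(0.0, HARDEST_B)

print(f"implied parameter count: {k_implied:.2f}")
print(f"P(correct | theta=0), Item 23: {p_easiest:.3f}")
print(f"P(correct | theta=0), Item 22: {p_hardest:.3f}")
```

The implied parameter count comes out very close to 30, which would be consistent with one difficulty parameter per item in the Rasch calibration, although the abstract does not state the parameter count directly. Because both criteria penalize each extra parameter, the 2PL and 3PL models must improve the log-likelihood enough to offset their additional discrimination and guessing parameters before they can be preferred.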
Copyright (c) 2026 Pedagogy : Jurnal Pendidikan Matematika

This work is licensed under a Creative Commons Attribution 4.0 International License.