Item Response Theory (IRT) is a suite of methods in educational psychometrics used to estimate characteristics of test questions and students from assessment scores. IRT has also become a popular tool in machine learning research for analysing the characteristics of learning algorithms and test instances. This set of papers explores the use of IRT in machine learning contexts.
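To make the underlying model concrete, the standard two-parameter logistic (2PL) IRT model gives the probability of a correct response as a function of respondent ability and two item parameters. The sketch below illustrates IRT in general; it is not code from any of the papers in this collection.

```python
import numpy as np

def irt_2pl(theta, a, b):
    """Two-parameter logistic (2PL) IRT model.

    Returns the probability that a respondent with ability `theta`
    answers an item correctly, where `a` is the item's discrimination
    (slope) and `b` its difficulty (location).
    """
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Two items with equal discrimination but different difficulty:
abilities = np.linspace(-3, 3, 7)
easy_item = irt_2pl(abilities, a=2.0, b=-1.0)  # high success even at modest ability
hard_item = irt_2pl(abilities, a=2.0, b=1.5)   # high success only at high ability
```

When ability equals the item's difficulty the success probability is exactly 0.5, and steeper curves (larger `a`) separate weak from strong respondents more sharply.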
Fairness evaluation with Item Response Theory
Authors: Ziqi Xu, Sevvandi Kandanaarachchi, Cheng Soon Ong, Eirini Ntoutsi Venue: WWW 2025 TLDR: This paper proposes an IRT framework to evaluate fairness across a suite of classification and regression models.
Figure: the general scenario in fairness evaluation.
Item Response Theory (IRT) is widely used in educational psychometrics to model student ability and question difficulty, and more recently to evaluate the performance of machine learning models. This work introduces the first use of IRT for fairness evaluation, proposing a new framework called Fair‑IRT that jointly assesses predictive models and individuals. The framework captures a model’s ability to make fair predictions, as well as individual‑level difficulty and discrimination effects that influence outcomes. Experiments and real‑world case studies demonstrate that Fair‑IRT provides meaningful insights into fairness across both classification and regression tasks.
An Item Response Theory-based R module for algorithm portfolio analysis
Authors: Brodie Oldfield, Sevvandi Kandanaarachchi, Ziqi Xu, Mario Andrés Muñoz Venue: SoftwareX 2025 AIRT‑Module is an Item Response Theory–based tool for evaluating algorithm portfolios. It provides fine‑grained insights into algorithm strengths and weaknesses by modelling both algorithm behaviour and test‑instance difficulty. Delivered as a Shiny web app and an R package, AIRT‑Module enables more comprehensive and interpretable experimental evaluation across diverse tasks.
Comprehensive algorithm portfolio evaluation using Item Response Theory
Authors: Sevvandi Kandanaarachchi, Kate Smith-Miles Venue: Journal of Machine Learning Research, 2023 TLDR: This paper proposes an IRT framework in an inverted setting, which yields richer insights into algorithms.
Figure: mapping IRT to algorithm evaluation.
Item Response Theory (IRT), originally developed in educational psychometrics, has recently been adapted to evaluate machine learning algorithms. This work presents a modified IRT‑based framework for analysing algorithm portfolios across multiple datasets, uncovering interpretable characteristics such as consistency and anomalousness. The approach provides clear, explainable insights into algorithm strengths and weaknesses without requiring additional dataset features, and is shown to be broadly applicable across diverse problem domains.
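One way to picture the inverted setting: datasets play the role of examinees (with an "easiness" trait), and each algorithm gets its own response curve. The sketch below is illustrative only; the parameterisation and attribute definitions in the actual AIRT framework differ in detail, and all names here are hypothetical.

```python
import numpy as np

def expected_score(theta, a, d):
    """Continuous-response analogue of an IRT curve: expected (normalised)
    performance of an algorithm, given dataset easiness `theta`, a
    discrimination-like slope `a`, and an offset `d`.

    Illustrative sketch only, not the paper's exact model.
    """
    return 1.0 / (1.0 + np.exp(-a * theta + d))

# A nearly flat curve suggests a consistent algorithm (performance barely
# depends on dataset easiness); a negative slope suggests an "anomalous"
# one that does better on harder datasets.
theta = np.linspace(-3, 3, 7)
consistent = expected_score(theta, a=0.3, d=0.0)
anomalous = expected_score(theta, a=-1.0, d=0.0)
```

Reading algorithm traits off fitted curve shapes in this way is what lets the framework characterise a portfolio without any extra dataset features.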
Unsupervised anomaly detection ensembles using Item Response Theory
Author: Sevvandi Kandanaarachchi Venue: Information Sciences, 2022 TLDR: This paper proposes an unsupervised anomaly detection ensemble using Item Response Theory.
Figure: a simple example with anomalies in the annulus; dark red signifies larger anomaly scores. A comparison with other unsupervised anomaly detection ensemble methods shows that the IRT ensemble achieves better performance.
Ensemble learning combines multiple models to improve predictive performance, but building ensembles for unsupervised anomaly detection is challenging due to the lack of labels. This work introduces an Item Response Theory (IRT)–based ensemble that leverages latent traits to infer hidden structure in the data. The approach downplays noisy detectors, emphasises more informative ones, and consistently outperforms existing ensemble methods, even when individual detectors are weakly correlated.
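The down-weighting idea can be sketched without labels: treat a crude consensus across detectors as a stand-in for the latent trait, and weight each detector by how well it tracks that consensus (playing the role of an IRT discrimination parameter). This is a deliberately simplified stand-in for the paper's method, which fits a full latent-trait model; all names below are hypothetical.

```python
import numpy as np

def irt_style_ensemble(scores):
    """Combine anomaly scores from several detectors, down-weighting
    detectors that disagree with a latent consensus.

    `scores`: array of shape (n_points, n_detectors), higher = more anomalous.
    Simplified sketch, not the paper's actual IRT ensemble.
    """
    # Standardise each detector so their scales are comparable
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    consensus = z.mean(axis=1)  # crude proxy for the latent trait
    # Weight each detector by its agreement with the consensus,
    # analogous to an IRT discrimination parameter
    weights = np.array([np.corrcoef(z[:, j], consensus)[0, 1]
                        for j in range(z.shape[1])])
    weights = np.clip(weights, 0.0, None)   # ignore anti-correlated detectors
    weights = weights / weights.sum()
    return z @ weights

# Two informative detectors plus one pure-noise detector:
rng = np.random.default_rng(0)
base = rng.normal(size=200)
base[0] = 8.0                                # plant one clear anomaly
detector_1 = base + 0.1 * rng.normal(size=200)
detector_2 = base + 0.1 * rng.normal(size=200)
detector_3 = rng.normal(size=200)            # uninformative noise
combined = irt_style_ensemble(np.column_stack([detector_1, detector_2, detector_3]))
```

The noise detector receives a small weight, so the planted anomaly still dominates the combined score, which mirrors the paper's point that noisy detectors should be downplayed rather than averaged in equally.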