NLIP at BEA 2025 Shared Task: Evaluation of Pedagogical Ability of AI Tutors

Trishita Saha, Shrenik Ganguli, Maunendra Sankar Desarkar


BEA Shared Task, SIGEDU, ACL 2025

Abstract

This paper presents our system for the Building Educational Applications (BEA) 2025 Shared Task on Pedagogical Ability Assessment of AI-powered Tutors. The task evaluates multiple dimensions of AI tutor responses in student-teacher educational dialogues, including mistake identification, mistake location, providing guidance, and actionability. Our approach builds on transformer-based models (Vaswani et al., 2017), especially DeBERTa and RoBERTa, and incorporates ordinal regression, threshold tuning, oversampling, and multi-task learning. Our best-performing systems assess tutor response quality across all tracks. These results highlight the effectiveness of tailored transformer architectures and pedagogically motivated training strategies for AI tutor evaluation.


BibTeX

@inproceedings{saha_nlip_2025,
    address    = {Vienna, Austria},
    title      = {{NLIP} at {BEA} 2025 {Shared} {Task}: {Evaluation} of {Pedagogical} {Ability} of {AI} {Tutors}},
    isbn       = {979-8-89176-270-1},
    shorttitle = {{NLIP} at {BEA} 2025 {Shared} {Task}},
    url        = {https://aclanthology.org/2025.bea-1.99/},
    abstract   = {This paper describes the system created for the BEA 2025 Shared Task on Pedagogical Ability Assessment of AI-powered Tutors. The task assesses how well AI tutors identify and locate errors made by students, provide guidance, and ensure actionability, among other features of their responses in educational dialogues. Transformer-based models, especially DeBERTa and RoBERTa, are improved by multi-task learning, threshold tuning, ordinal regression, and oversampling. The high performance of the best systems across all evaluation tracks demonstrates the effectiveness of pedagogically driven training methods and tailored transformer models for evaluating AI tutor quality.},
    urldate    = {2025-07-28},
    booktitle  = {Proceedings of the 20th {Workshop} on {Innovative} {Use} of {NLP} for {Building} {Educational} {Applications} ({BEA} 2025)},
    publisher  = {Association for Computational Linguistics},
    author     = {Saha, Trishita and Ganguli, Shrenik and Desarkar, Maunendra Sankar},
    editor     = {Kochmar, Ekaterina and Alhafni, Bashar and Bexte, Marie and Burstein, Jill and Horbach, Andrea and Laarmann-Quante, Ronja and Tack, Anaïs and Yaneva, Victoria and Yuan, Zheng},
    month      = jul,
    year       = {2025},
    pages      = {1242--1253},
}