Evaluating Language Models for Generating and Judging Programming Feedback

Charles Koutcheme, Nicola Dainese, Sami Sarsa, Arto Hellas, Juho Leinonen, Syed Ashraf, Paul Denny

Research output: Article in a book/conference proceedings · Conference article in proceedings · Scientific · Peer-reviewed


Abstract

The emergence of large language models (LLMs) has transformed research and practice across a wide range of domains. Within the computing education research (CER) domain, LLMs have garnered significant attention, particularly in the context of learning programming. Much of the work on LLMs in CER, however, has focused on applying and evaluating proprietary models. In this article, we evaluate the effectiveness of open-source LLMs in generating high-quality feedback for programming assignments and in judging the quality of programming feedback, and we contrast the results with those of proprietary models. Our evaluations on a dataset of students' submissions to introductory Python programming exercises suggest that state-of-the-art open-source LLMs are nearly on par with proprietary models in both generating and assessing programming feedback. Additionally, we demonstrate the effectiveness of smaller LLMs in these tasks and highlight that a wide range of LLMs is accessible to educators and practitioners, some of them free of charge.

Original language: English
Title of host publication: SIGCSE TS 2025 - Proceedings of the 56th ACM Technical Symposium on Computer Science Education
Publisher: ACM
Pages: 624-630
Number of pages: 7
Volume: 1
ISBN (electronic): 979-8-4007-0531-1
Status: Published - 18 Feb 2025
Ministry of Education and Culture (OKM) publication type: A4 Article in conference proceedings
Event: ACM Technical Symposium on Computer Science Education - Pittsburgh, United States
Duration: 26 Feb 2025 to 1 Mar 2025
Conference number: 56

Conference

Conference: ACM Technical Symposium on Computer Science Education
Abbreviated title: SIGCSE
Country/Territory: United States
City: Pittsburgh
Period: 26/02/2025 to 01/03/2025
