G is for Generalisation: Predicting Student Success from Keystrokes

Zac Pullar-Strecker, Filipe Dwan Pereira, Paul Denny, Andrew Luxton-Reilly, Juho Leinonen

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review

2 Citations (Scopus)
24 Downloads (Pure)

Abstract

Student performance prediction aims to build models to help educators identify struggling students so they can be better supported. However, prior work in the space frequently evaluates features and models on data collected from a single semester, of a single course, taught at a single university. Without evaluating these methods in a broader context there is an open question of whether or not performance prediction methods are capable of generalising to new data. We test three methods for evaluating student performance models on data from introductory programming courses from two universities with a total of 3,323 students. Our results suggest that using cross-validation on one semester is insufficient for gauging model performance in the real world. Instead, we suggest that where possible future work in student performance prediction collects data from multiple semesters and uses one or more as a distinct hold-out set. Failing this, bootstrapped cross-validation should be used to improve confidence in models' performance. By recommending stronger methods for evaluating performance prediction models, we hope to bring them closer to practical use and assist teachers to understand struggling students in novice programming courses.
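The three evaluation strategies named in the abstract can be contrasted in a small sketch. Everything here is an assumption made for illustration: the data are synthetic, the one-feature threshold "model" is a toy, and the bootstrap variant shown (fit on a resample, score on the out-of-bag students) is only one plausible reading of "bootstrapped cross-validation" and may differ from the procedure used in the paper.

```python
import random
from statistics import mean, stdev


def make_semester(n, shift, rng):
    """Synthetic (feature, passed) pairs; `shift` mimics drift between semesters."""
    data = []
    for _ in range(n):
        passed = rng.random() < 0.5
        feature = rng.gauss(1.0 if passed else 0.0, 1.0) + shift
        data.append((feature, int(passed)))
    return data


def fit_threshold(train):
    """Toy model (hypothetical): midpoint between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (mean(pos) + mean(neg)) / 2


def accuracy(threshold, test):
    """Fraction of students whose pass/fail label the threshold predicts."""
    return sum((x > threshold) == (y == 1) for x, y in test) / len(test)


def cross_validate(data, k, rng):
    """k-fold cross-validation within a single semester."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [data[j] for j in idx if j not in held_out]
        test = [data[j] for j in test_idx]
        scores.append(accuracy(fit_threshold(train), test))
    return scores


def holdout_semester(train_semester, test_semester):
    """Fit on one semester, score on a distinct one."""
    return accuracy(fit_threshold(train_semester), test_semester)


def bootstrap_oob(data, rounds, rng):
    """Fit on a bootstrap resample, score on the students left out of it."""
    n = len(data)
    scores = []
    for _ in range(rounds):
        sample = [rng.randrange(n) for _ in range(n)]
        chosen = set(sample)
        oob = [data[j] for j in range(n) if j not in chosen]
        if not oob:  # practically never happens for realistic n
            continue
        scores.append(accuracy(fit_threshold([data[j] for j in sample]), oob))
    return scores


rng = random.Random(0)
semester_a = make_semester(300, shift=0.0, rng=rng)
semester_b = make_semester(300, shift=0.5, rng=rng)  # shifted on purpose

cv_scores = cross_validate(semester_a, k=5, rng=rng)
oob_scores = bootstrap_oob(semester_a, rounds=200, rng=rng)
holdout_score = holdout_semester(semester_a, semester_b)

print(f"within-semester CV: {mean(cv_scores):.3f}")
print(f"bootstrap (OOB):    {mean(oob_scores):.3f} +/- {stdev(oob_scores):.3f}")
print(f"hold-out semester:  {holdout_score:.3f}")
```

The bootstrap run yields a spread of scores instead of a single number, and the hold-out run is the only one that exposes the model to a semester it was not fitted on.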

Original language: English
Title of host publication: SIGCSE 2023 - Proceedings of the 54th ACM Technical Symposium on Computer Science Education
Publisher: ACM
Pages: 1028-1034
Number of pages: 7
ISBN (Electronic): 978-1-4503-9431-4
Publication status: Published - 2 Mar 2023
MoE publication type: A4 Conference publication
Event: ACM Technical Symposium on Computer Science Education - Toronto, Canada
Duration: 15 Mar 2023 to 18 Mar 2023
Conference number: 54

Conference

Conference: ACM Technical Symposium on Computer Science Education
Abbreviated title: SIGCSE
Country/Territory: Canada
City: Toronto
Period: 15/03/2023 to 18/03/2023

Keywords

  • computing education
  • educational data mining
  • learning analytics
  • predicting performance
  • programming process data
