Abstract
Student performance prediction aims to build models that help educators identify struggling students so that they can be better supported. However, prior work in this area frequently evaluates features and models on data collected from a single semester of a single course taught at a single university. Unless these methods are evaluated in a broader context, it remains an open question whether performance prediction methods can generalise to new data. We test three methods for evaluating student performance models on data from introductory programming courses at two universities, with a total of 3,323 students. Our results suggest that cross-validation on one semester is insufficient for gauging model performance in the real world. Instead, we suggest that, where possible, future work in student performance prediction collects data from multiple semesters and uses one or more as a distinct hold-out set. Failing this, bootstrapped cross-validation should be used to improve confidence in models' performance. By recommending stronger methods for evaluating performance prediction models, we hope to bring them closer to practical use and to help teachers understand struggling students in novice programming courses.
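The two evaluation strategies recommended in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the abstract does not specify how "bootstrapped cross-validation" was carried out, so the sketch takes one possible reading (a bootstrap percentile interval over out-of-fold predictions). The threshold model, the function names, and the synthetic data are all hypothetical and stand in for real features and models.

```python
import random
from statistics import mean


def fit_threshold(rows):
    """Toy stand-in model: predict pass (1) at or above the midpoint of class means."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    if not pos or not neg:  # degenerate sample: always predict the only class seen
        only = 1 if pos else 0
        return lambda x: only
    cut = (mean(pos) + mean(neg)) / 2
    return lambda x: 1 if x >= cut else 0


def accuracy(model, rows):
    return mean(1.0 if model(x) == y else 0.0 for x, y in rows)


def holdout_score(train_semester, test_semester):
    """Fit on one semester and score on a distinct, unseen semester."""
    return accuracy(fit_threshold(train_semester), test_semester)


def out_of_fold_hits(rows, k, rng):
    """k-fold CV: one 0/1 correctness flag per row, from a model that never saw that row."""
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    hits = [0] * len(rows)
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        model = fit_threshold([rows[j] for j in idx if j not in held])
        for j in fold:
            x, y = rows[j]
            hits[j] = int(model(x) == y)
    return hits


def bootstrap_interval(hits, rounds, rng, alpha=0.05):
    """Percentile interval for CV accuracy, resampling the flags with replacement."""
    means = sorted(mean(rng.choices(hits, k=len(hits))) for _ in range(rounds))
    return means[int(alpha / 2 * rounds)], means[int((1 - alpha / 2) * rounds) - 1]


def synthetic_semester(n, pass_cut, rng):
    """Synthetic (feature, passed) pairs; purely illustrative, not real student data."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        rows.append((x, 1 if x + rng.gauss(0.0, 0.5) > pass_cut else 0))
    return rows


if __name__ == "__main__":
    rng = random.Random(0)
    semester_a = synthetic_semester(300, 0.0, rng)
    semester_b = synthetic_semester(300, 0.5, rng)  # assumed shift between semesters
    hits = out_of_fold_hits(semester_a, 5, rng)
    print("CV accuracy on semester A:", mean(hits))
    print("bootstrap interval:", bootstrap_interval(hits, 1000, rng))
    print("hold-out accuracy on semester B:", holdout_score(semester_a, semester_b))
```

Comparing the single-semester cross-validation estimate with the hold-out score shows how far the two can diverge when the semesters differ, while the bootstrap interval gives a sense of how uncertain the single-semester estimate is on its own.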
Original language | English |
---|---|
Title of host publication | SIGCSE 2023 - Proceedings of the 54th ACM Technical Symposium on Computer Science Education |
Publisher | ACM |
Pages | 1028-1034 |
Number of pages | 7 |
ISBN (Electronic) | 978-1-4503-9431-4 |
Publication status | Published - 2 Mar 2023 |
MoE publication type | A4 Conference publication |
Event | ACM Technical Symposium on Computer Science Education, Toronto, Canada. Duration: 15 Mar 2023 → 18 Mar 2023. Conference number: 54 |
Conference
Conference | ACM Technical Symposium on Computer Science Education |
---|---|
Abbreviated title | SIGCSE |
Country/Territory | Canada |
City | Toronto |
Period | 15/03/2023 → 18/03/2023 |
Keywords
- computing education
- educational data mining
- learning analytics
- predicting performance
- programming process data