Abstract
Text-form student feedback is an indispensable source of information for all university lecturers. Since manual analysis of such feedback is laborious, it has been suggested that text clustering methods could be used to automate the process. However, the success of text clustering depends heavily on the vector space representation of documents. This paper presents a comprehensive evaluation of eight vector space models (VSMs) in combination with different linguistic preprocessing techniques on English and Finnish student feedback data. The results show that the choice of VSM has a strong effect on clustering performance. The models based on short and long character n-grams work best, while word2vec models perform worst. In general, stop word removal has a positive effect, while stemming and lemmatization may be detrimental with many VSMs. The main themes of the data could be identified well from cluster centroids. An alternative approach of describing clusters by frequent word n-grams also worked well for sufficiently large, distinct classes with clear keywords.
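The abstract does not specify the exact models, weighting schemes, or clustering algorithm, so the following is only a minimal sketch of the general idea: documents are represented by character n-gram counts, clustered by cosine similarity, and each cluster is described by the highest-weighted n-grams of its centroid. The function names, the n-gram range (2 to 4), the fixed seed documents, and the four toy feedback sentences are all illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Count character n-grams of lengths n_min..n_max (range is an assumption)."""
    padded = f" {text.lower()} "
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


def normalize(vec):
    """Scale a sparse vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def centroid(vectors):
    """Unit-length mean of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        for k, v in vec.items():
            total[k] += v
    return normalize({k: v / len(vectors) for k, v in total.items()})


def kmeans(vectors, seeds, iterations=10):
    """Simple spherical k-means; `seeds` are indices of the initial centroids."""
    centroids = [vectors[i] for i in seeds]
    labels = []
    for _ in range(iterations):
        # Assign each document to the most similar centroid.
        labels = [
            max(range(len(centroids)), key=lambda c: cosine(vec, centroids[c]))
            for vec in vectors
        ]
        # Recompute each centroid from its current members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = centroid(members)
    return labels, centroids


def top_ngrams(cent, k=5):
    """The k highest-weighted n-grams of a centroid, ties broken alphabetically."""
    ranked = sorted(cent.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:k]]


# Toy feedback (invented for illustration only).
feedback = [
    "The lectures were clear and well organized",
    "Lectures were very clear, great lecturer",
    "Too many exercises and the deadlines were too tight",
    "The exercise deadlines were too tight",
]
vectors = [normalize(char_ngrams(doc)) for doc in feedback]
labels, centroids = kmeans(vectors, seeds=[0, 2])

for c, cent in enumerate(centroids):
    print(c, top_ngrams(cent))
```

Because character n-grams need no tokenizer, stemmer, or lemmatizer, the same code runs unchanged on English and Finnish text. A real evaluation would add a weighting scheme such as tf-idf and a less naive initialization than fixed seed documents.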
Original language | English |
---|---|
Title of host publication | ICBDE '24: Proceedings of the 2024 7th International Conference on Big Data and Education |
Place of Publication | New York |
Publisher | ACM |
Pages | 57-64 |
Number of pages | 8 |
ISBN (Electronic) | 979-8-4007-1698-0 |
Publication status | Published - 24 Jan 2025 |
MoE publication type | A4 Conference publication |
Event | International Conference on Big Data and Education, Trinity College, University of Oxford, Oxford, United Kingdom; Duration: 24 Sept 2024 → 26 Sept 2024; Conference number: 7; https://icbde.org/ |
Conference
Conference | International Conference on Big Data and Education |
---|---|
Abbreviated title | ICBDE |
Country/Territory | United Kingdom |
City | Oxford |
Period | 24/09/2024 → 26/09/2024 |
Keywords
- clustering
- vector space models
- student feedback