Abstract

Text-form student feedback is an indispensable source of information for all university lecturers. Since manual analysis of such feedback is laborious, it has been suggested that text clustering methods could be used to automate the process. However, the success of text clustering depends heavily on the vector space representation of documents. This paper presents a comprehensive evaluation of eight vector space models (VSMs), combined with different linguistic preprocessing techniques, on English and Finnish student feedback data. The results show that the choice of VSM has a strong effect on clustering performance. The models based on short and long character n-grams work best, while word2vec models perform worst. In general, stop word removal has a positive effect, while stemming and lemmatization may be detrimental with many VSMs. The main themes of the data could be identified well from cluster centroids. An alternative approach, describing clusters by frequent word n-grams, also worked well for sufficiently large, distinct classes with clear keywords.
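As a rough illustration of the kind of pipeline the abstract describes, the sketch below builds character n-gram vectors, clusters them, and reads the main themes off the cluster centroids. It is not the authors' implementation: the abstract does not name the clustering algorithm or the n-gram lengths, so the spherical k-means variant, the 2 to 4 character range, the four invented feedback sentences, and all function names here are assumptions made for illustration only.

```python
from collections import Counter
import math

def char_ngrams(text, n_min=2, n_max=4):
    """Count character n-grams of lengths n_min..n_max in lowercased text."""
    text = " " + " ".join(text.lower().split()) + " "
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

def normalize(vec):
    """Scale a sparse vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else dict(vec)

def cosine(a, b):
    """Dot product of two sparse unit vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0.0) for k, v in a.items())

def centroid(vectors):
    """Unit-length mean direction of a group of sparse vectors."""
    total = Counter()
    for vec in vectors:
        for k, v in vec.items():
            total[k] += v
    return normalize(total)

def spherical_kmeans(vectors, k, iterations=20):
    """Cluster unit vectors by cosine similarity (assumed algorithm)."""
    # Deterministic farthest-first initialisation, so no random seed is needed.
    centroids = [vectors[0]]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(far)
    labels = None
    for _ in range(iterations):
        new_labels = [max(range(k), key=lambda j: cosine(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = centroid(members)
    return labels, centroids

def top_ngrams(center, m=5):
    """The m highest-weighted n-grams of a centroid, used as a cluster description."""
    return [g for g, _ in sorted(center.items(), key=lambda kv: -kv[1])[:m]]

# Invented toy feedback, not data from the paper.
feedback = [
    "The lectures were clear and well organized",
    "Clear lectures and a well organized course",
    "Too many exercises and too much homework",
    "The homework exercises took too much time",
]
vectors = [normalize(char_ngrams(doc)) for doc in feedback]
labels, centroids = spherical_kmeans(vectors, k=2)
for j, center in enumerate(centroids):
    print(j, top_ngrams(center))
```

On this toy input the two lecture comments and the two homework comments end up in separate clusters. Raw counts are used instead of a weighting scheme such as TF-IDF purely to keep the sketch short.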
Original language: English
Title of host publication: ICBDE '24: Proceedings of the 2024 7th International Conference on Big Data and Education
Place of Publication: New York
Publisher: ACM
Pages: 57-64
Number of pages: 8
ISBN (Electronic): 979-8-4007-1698-0
Publication status: Published - 24 Jan 2025
MoE publication type: A4 Conference publication
Event: International Conference on Big Data and Education - Trinity College, University of Oxford, Oxford, United Kingdom
Duration: 24 Sept 2024 to 26 Sept 2024
Conference number: 7
https://icbde.org/

Conference

Conference: International Conference on Big Data and Education
Abbreviated title: ICBDE
Country/Territory: United Kingdom
City: Oxford
Period: 24/09/2024 to 26/09/2024

Keywords

  • clustering
  • vector space models
  • student feedback

Title: Clustering students’ text form feedback data: comparison of eight vector space models