Abstract

Text-form student feedback is an indispensable source of information for all university lecturers. Since manual analysis of such feedback is laborious, it has been suggested that text clustering methods could be used to automate the process. However, the success of text clustering depends heavily on the vector space representation of documents. In this paper, a comprehensive evaluation of eight vector space models (VSMs) is presented in combination with different linguistic preprocessing techniques on English and Finnish student feedback data. The results show that the choice of VSM has a strong effect on the clustering performance. The models based on short and long character n-grams work best, while word2vec models perform worst. In general, stop word removal has a positive effect, while stemming and lemmatization may be detrimental with many VSMs. The main themes of the data could be well identified from cluster centroids. An alternative approach of describing clusters by frequent word n-grams also worked well for sufficiently large, distinct classes with clear keywords.
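To make the idea of a character n-gram vector space model concrete, the following is a minimal sketch, not the authors' implementation. It assumes raw n-gram counts and cosine similarity. The n-gram range (2 to 4), the helper names `char_ngrams` and `cosine`, and the feedback snippets are all hypothetical and chosen only for illustration. The paper's actual weighting schemes, n-gram lengths, and clustering algorithm are not stated in this abstract.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n_min=2, n_max=4):
    """Return a Counter of character n-grams for n in [n_min, n_max].

    The range 2..4 is an assumption for illustration only.
    """
    text = " ".join(text.lower().split())  # normalise case and whitespace
    grams = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            grams[text[i:i + n]] += 1
    return grams


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical feedback snippets, invented for illustration only.
feedback = [
    "The lectures were clear and well organized",
    "Lecture was clearly organised",
    "Too many exercises every week",
]
vectors = [char_ngrams(doc) for doc in feedback]

# Similarity between the two lecture-related comments,
# and between a lecture comment and an unrelated one.
sim_lectures = cosine(vectors[0], vectors[1])
sim_mixed = cosine(vectors[0], vectors[2])
```

In this sketch the two lecture-related comments share many character n-grams despite differing word forms ("lectures"/"lecture", "organized"/"organised"), so their similarity is higher than that of the unrelated pair. A clustering algorithm would then group documents using such similarities.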
Original language: English
Title of host publication: ICBDE '24: Proceedings of the 2024 7th International Conference on Big Data and Education
Place of publication: New York
Publisher: ACM
Pages: 57-64
Number of pages: 8
ISBN (electronic): 979-8-4007-1698-0
DOI - permanent links
Status: Published - 24 Jan 2025
OKM publication type: A4 Article in conference proceedings
Event: International Conference on Big Data and Education - Trinity College, University of Oxford, Oxford, United Kingdom
Duration: 24 Sep 2024 - 26 Sep 2024
Conference number: 7
https://icbde.org/

Conference

Conference: International Conference on Big Data and Education
Abbreviated title: ICBDE
Country/Territory: United Kingdom
City: Oxford
Period: 24/09/2024 - 26/09/2024
Web address

Fingerprint

Dive into the research topics of 'Clustering students’ text form feedback data: comparison of eight vector space models'. Together they form a unique fingerprint.
