Abstract
Text-form student feedback is an indispensable source of information for all university lecturers. Since manual analysis of such feedback is laborious, it has been suggested that text clustering methods could be used to automate the process. However, the success of text clustering depends heavily on the vector space representation of documents. This paper presents a comprehensive evaluation of eight vector space models (VSMs), combined with different linguistic preprocessing techniques, on English and Finnish student feedback data. The results show that the choice of VSM has a strong effect on clustering performance. The models based on short and long character n-grams work best, while word2vec models perform worst. In general, stop word removal has a positive effect, while stemming and lemmatization may be detrimental with many VSMs. The main themes of the data could be identified well from cluster centroids. An alternative approach of describing clusters by frequent word n-grams also worked well for sufficiently large, distinct classes with clear keywords.
Original language | English |
---|---|
Title | ICBDE '24: Proceedings of the 2024 7th International Conference on Big Data and Education |
Place of publication | New York |
Publisher | ACM |
Pages | 57-64 |
Number of pages | 8 |
ISBN (electronic) | 979-8-4007-1698-0 |
DOI - permanent links | |
Status | Published - 24 Jan 2025 |
OKM publication type | A4 Article in conference proceedings |
Event | International Conference on Big Data and Education - Trinity College, University of Oxford, Oxford, United Kingdom; Duration: 24 Sept 2024 → 26 Sept 2024; Conference number: 7; https://icbde.org/ |
Conference
Conference | International Conference on Big Data and Education |
---|---|
Abbreviated title | ICBDE |
Country/Territory | United Kingdom |
City | Oxford |
Period | 24/09/2024 → 26/09/2024 |
Web address |