types2: Exploring word-frequency differences in corpora

Tanja Säily, Jukka Suomela

Research output: Article in book/conference proceedings (chapter; scientific; peer-reviewed)


We demonstrate the use of the types2 tool to explore, visualize, and assess the significance of variation in word frequencies. Based on accumulation curves and the statistical technique of permutation testing, this freely available tool is especially well suited to the study of types and hapax legomena, which are common measures of morphological productivity and lexical diversity. We have developed a new version of the tool that provides improved linking between the visualizations, metadata, and corpus texts, which facilitates the analysis of rich data.

The new version of our tool is demonstrated using two data sets extracted from the Corpora of Early English Correspondence (CEEC) and the British National Corpus (BNC), both of which are rich in sociolinguistic metadata. We show how to use our software to analyse such data sets, and how the new version of our tool can turn the results into interactive web pages with visualizations that are linked to the underlying data and metadata. Our paper illustrates how the linked data facilitates exploring and interpreting the results.
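The comparison described in the abstract can be illustrated in miniature. The Python sketch below is not the types2 implementation. The function names (`type_count`, `hapax_count`, `accumulation_curve`, `permutation_test`) are hypothetical. The test is a simplified label-shuffling variant that draws random subcorpora containing the same number of samples as the group under study, which is only reasonable if the samples are of similar length. It computes type and hapax accumulation curves and one-sided permutation p-values for a subcorpus, for example the letters written by one sociolinguistic group.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[str]  # one text (e.g. a letter) as a list of word tokens


def type_count(samples: Sequence[Sample], indices: Sequence[int]) -> int:
    """Number of distinct words (types) in the selected samples."""
    return len({w for i in indices for w in samples[i]})


def hapax_count(samples: Sequence[Sample], indices: Sequence[int]) -> int:
    """Number of words occurring exactly once (hapax legomena) in the selected samples."""
    counts = Counter(w for i in indices for w in samples[i])
    return sum(1 for c in counts.values() if c == 1)


def accumulation_curve(samples: Sequence[Sample],
                       order: Sequence[int]) -> List[Tuple[int, int, int]]:
    """(tokens, types, hapaxes) after adding each sample in the given order."""
    counts: Counter = Counter()
    tokens = 0
    curve = []
    for i in order:
        counts.update(samples[i])  # counts each token of sample i
        tokens += len(samples[i])
        hapaxes = sum(1 for c in counts.values() if c == 1)
        curve.append((tokens, len(counts), hapaxes))
    return curve


def permutation_test(samples: Sequence[Sample], group: Sequence[int],
                     statistic: Callable[[Sequence[Sample], Sequence[int]], int],
                     iterations: int = 999,
                     seed: int = 0) -> Tuple[int, float, float]:
    """Observed statistic plus one-sided p-values (low, high).

    Hypothetical simplification: random subcorpora have the same number of
    samples as `group`, so samples are assumed to be of similar length.
    """
    rng = random.Random(seed)
    observed = statistic(samples, group)
    as_low = as_high = 0
    for _ in range(iterations):
        shuffled = rng.sample(range(len(samples)), len(group))
        value = statistic(samples, shuffled)
        if value <= observed:
            as_low += 1
        if value >= observed:
            as_high += 1
    # The +1 counts the observed grouping itself as one of the permutations.
    return (observed,
            (as_low + 1) / (iterations + 1),
            (as_high + 1) / (iterations + 1))


if __name__ == "__main__":
    samples = [["the", "cat"], ["the", "dog"],
               ["a", "cat", "sat"], ["a", "dog", "ran", "off"]]
    print(accumulation_curve(samples, [0, 1, 2]))
    # [(2, 2, 2), (4, 3, 2), (7, 5, 3)]
    print(permutation_test(samples, [0, 1], type_count, iterations=200))
```

A small `p_low` would suggest that the group uses fewer types than random subcorpora of the same size. Comparing at equal token counts along accumulation curves, rather than at equal sample counts, would remove the equal-length assumption.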
Host publication: Big and Rich Data in English Corpus Linguistics, Methods and Explorations
Publisher: Research Unit for Variation, Contacts and Change in English (VARIENG)
Status: Published - 2017
Ministry of Education and Culture (OKM) publication type: A3 Part of a book or another research book


Series name: Studies in Variation, Contacts and Change in English
Publisher: Research Unit for Variation, Contacts and Change in English (VARIENG), University of Helsinki
ISSN (electronic): 1797-4453

