types2: Exploring word-frequency differences in corpora

Tanja Säily, Jukka Suomela

Research output: Chapter in book/conference proceedings › Chapter › Scientific › Peer-reviewed

Abstract

We demonstrate the use of the types2 tool to explore, visualize, and assess the significance of variation in word frequencies. Based on accumulation curves and the statistical technique of permutation testing, this freely available tool is especially well suited to the study of types and hapax legomena, which are common measures of morphological productivity and lexical diversity. We have developed a new version of the tool that provides improved linking between the visualizations, metadata, and corpus texts, which facilitates the analysis of rich data.

The new version of our tool is demonstrated using two data sets extracted from the Corpora of Early English Correspondence (CEEC) and the British National Corpus (BNC), both of which are rich in sociolinguistic metadata. We show how to use our software to analyse such data sets, and how the new version of our tool can turn the results into interactive web pages with visualizations that are linked to the underlying data and metadata. Our paper illustrates how the linked data facilitates exploring and interpreting the results.
Original language: English
Title of host publication: Big and Rich Data in English Corpus Linguistics, Methods and Explorations
Publisher: Research Unit for Variation, Contacts and Change in English (VARIENG)
Status: Published - 2017
Ministry of Education and Culture (OKM) publication type: A3 Part of a book or another research book

Publication series

Name: Studies in Variation, Contacts and Change in English
Publisher: Research Unit for Variation, Contacts and Change in English (VARIENG), University of Helsinki
Volume: 19
ISSN (electronic): 1797-4453
