types2: Type and Hapax Accumulation Curves

    Dataset

    Description

    types2 is a tool for analysing textual diversity, richness, and productivity in text corpora and other data sets.

    With this tool, we can analyse data sets from the perspective of the following statistics:

    • number of words: the total number of running words in the text corpus
    • number of tokens: the words of interest in our study
    • number of types: how many distinct tokens we have seen
    • number of hapaxes: how many types have occurred only once
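    To make these four statistics concrete, here is a minimal Python sketch that counts them for a list of running words. It illustrates the definitions only and is not code from types2. It assumes whitespace tokenisation and a hypothetical predicate `is_token` that picks out the words of interest (here, words ending in "ness").

```python
from collections import Counter
from typing import Callable, Dict, Iterable


def corpus_stats(words: Iterable[str], is_token: Callable[[str], bool]) -> Dict[str, int]:
    """Count words, tokens, types and hapaxes in one pass over running words."""
    n_words = 0
    token_counts: Counter = Counter()  # frequency of each token form (type)
    for word in words:
        n_words += 1
        if is_token(word):  # hypothetical selection of the words of interest
            token_counts[word] += 1
    return {
        "words": n_words,
        "tokens": sum(token_counts.values()),
        "types": len(token_counts),
        "hapaxes": sum(1 for c in token_counts.values() if c == 1),
    }


text = "the kindness of strangers and the kindness of darkness".split()
# Hypothetical study: the tokens of interest are words ending in "ness".
stats = corpus_stats(text, lambda w: w.endswith("ness"))
print(stats)  # {'words': 9, 'tokens': 3, 'types': 2, 'hapaxes': 1}
```

    Here "kindness" occurs twice and "darkness" once, so there are 3 tokens, 2 types, and 1 hapax among 9 running words.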

    We are usually interested in comparing the number of types or hapaxes vs. the number of words or tokens. With types2, it is possible to analyse the relationship between types, hapaxes, words, and tokens.
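    One way to picture the relationship between types (or hapaxes) and tokens is an accumulation curve: add samples one at a time and record the running totals. The sketch below assumes each sample is already a list of tokens and processes the samples in the given order; the function name `accumulation_curve` is hypothetical and not part of types2.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def accumulation_curve(samples: Sequence[Sequence[str]]) -> List[Tuple[int, int, int]]:
    """Return (tokens, types, hapaxes) after each sample is added, in order."""
    counts: Counter = Counter()
    curve = []
    for sample in samples:
        counts.update(sample)  # add this sample's tokens to the running totals
        tokens = sum(counts.values())
        types = len(counts)
        hapaxes = sum(1 for c in counts.values() if c == 1)
        curve.append((tokens, types, hapaxes))
    return curve


samples = [["kindness", "darkness"], ["kindness"], ["happiness", "darkness", "sadness"]]
for point in accumulation_curve(samples):
    print(point)
# (2, 2, 2)
# (3, 2, 1)
# (6, 4, 2)
```

    The number of types can only grow as tokens accumulate, whereas the number of hapaxes can also fall: "darkness" stops being a hapax once it reappears in the third sample.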

    The tool can be used for visualisation, statistical hypothesis testing, and exploratory data analysis. In the statistical analysis, we use nonparametric methods (more specifically, Monte Carlo permutation tests). The only modelling assumption is that, under the null hypothesis, individual “samples” are exchangeable.
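    As an illustration of the Monte Carlo permutation idea, the following sketch asks whether a group of samples has unusually few types. Because the samples are exchangeable under the null hypothesis, the group is compared with random subsets containing the same number of samples, and the p-value estimates how often a random subset has at most as many types. This is a simplified stand-in written for this description, not the exact procedure implemented in types2; the names `n_types` and `permutation_test_types` are hypothetical.

```python
import random
from typing import Sequence


def n_types(samples: Sequence[Sequence[str]]) -> int:
    """Number of distinct tokens across the given samples."""
    return len({tok for sample in samples for tok in sample})


def permutation_test_types(
    all_samples: Sequence[Sequence[str]],
    group: Sequence[int],
    iterations: int = 10_000,
    seed: int = 0,
) -> float:
    """One-sided Monte Carlo p-value: does the group have unusually few types?"""
    rng = random.Random(seed)
    observed = n_types([all_samples[i] for i in group])
    k = len(group)
    as_low = 0
    for _ in range(iterations):
        # Under exchangeability, any k samples are as likely as the group itself.
        subset = rng.sample(range(len(all_samples)), k)
        if n_types([all_samples[i] for i in subset]) <= observed:
            as_low += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (as_low + 1) / (iterations + 1)


samples = [["a", "a"], ["a", "a"], ["a", "a"], ["b", "c"], ["d", "e"], ["f", "g"]]
p = permutation_test_types(samples, group=[0, 1, 2])
print(round(p, 3))
```

    In this toy example only 1 of the C(6, 3) = 20 possible subsets is as type-poor as the group, so the estimate should be close to 0.05.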

    The software is written by Jukka Suomela, and the system is designed and developed in collaboration with Tanja Säily.

    The title and description of this software/code correspond with the situation when the software metadata was imported to ACRIS. The most recent version of metadata is available in the original repository.
    Date made available: 2014
    Publisher: Zenodo

    Dataset Licences

    • Other
