Quantifying variation and estimating the effects of sample size on the frequencies of linguistic variables

Heikki Mannila*, Terttu Nevalainen, Helena Raumolin-Brunberg

*Corresponding author for this work

    Research output: Chapter in Book/Report/Conference proceeding › Chapter › Scientific › peer-review

    3 Citations (Scopus)

    Abstract

    The work we report in this chapter began with the aim of finding techniques to minimize the problems that arise from small data samples in fields such as historical sociolinguistics. However, the solutions we propose are not limited to historical sociolinguistics, but are applicable to quantitative sociolinguistic and corpus studies in general. Establishing the frequency of given linguistic forms is a crucial issue in studying differences in linguistic usage between populations or points in time. In its simplest form, the question can be posed as follows: suppose there are two alternative forms, A and B, of a linguistic variable (alternative pronunciations, words or phrases with the same meaning, functionally equivalent grammatical structures): what is the frequency of use of each? One basic question we address is the use of aggregate data and its relation to individual variation when individuals contribute different amounts of data to the aggregate. The other problem we discuss is similarly fundamental: what is the minimum sample size (number of speakers, writers or texts, depending on the research topic) that is required to yield consistent results for a given linguistic variable? For a historical sociolinguist using a public corpus, this may be a question of a scarcity of data due to a high rate of illiteracy in a particular period. For sociolinguists who have to elicit their interview data, it is an issue of research economy. In Tagliamonte’s words (2006: 33): ‘The size of the sample must necessarily be balanced with the available time and resources for data handling.’ Looking back at 40 years of sociolinguistic research, Labov (2006 [1st edn. 1966]: 400-401) notes that the analysis of the stratification by age, gender and social class of a given city has usually required 60-100 speakers. Without introducing any testing of sample size, he considers the 120 speakers used in a Montreal study to be ideal, although he emphasizes the care with which the sampling was designed.
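
    The aggregate-versus-individual question raised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the counts are invented, and the names `counts`, `pooled_share` and `mean_individual_share` are hypothetical. It only shows that when individuals contribute very different amounts of data, the pooled frequency of form A can differ sharply from the average of the individual frequencies.

```python
from statistics import mean

# Hypothetical token counts (form A, form B) per writer, invented for illustration.
counts = {
    "writer_1": (90, 10),  # contributes 100 tokens
    "writer_2": (1, 4),    # contributes 5 tokens
    "writer_3": (2, 3),    # contributes 5 tokens
}

def pooled_share(counts):
    """Share of form A when all tokens are pooled into one aggregate."""
    total_a = sum(a for a, _ in counts.values())
    total = sum(a + b for a, b in counts.values())
    return total_a / total

def mean_individual_share(counts):
    """Unweighted mean of each individual's own share of form A."""
    return mean(a / (a + b) for a, b in counts.values())

if __name__ == "__main__":
    print(f"pooled share of A:          {pooled_share(counts):.2f}")
    print(f"mean individual share of A: {mean_individual_share(counts):.2f}")
```

    With these invented counts the pooled share is about 0.85, dominated by the one prolific writer, while the mean of the individual shares is about 0.50.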

    Original language: English
    Title of host publication: Research Methods in Language Variation and Change
    Publisher: Cambridge University Press
    Pages: 337-360
    Number of pages: 24
    ISBN (Electronic): 9780511792519
    ISBN (Print): 9781107004900
    Publication status: Published - 1 Jan 2013
    MoE publication type: A3 Book section, Chapters in research books
