Quantifying variation and estimating the effects of sample size on the frequencies of linguistic variables

Heikki Mannila*, Terttu Nevalainen, Helena Raumolin-Brunberg

*Corresponding author for this work

    Research output: Chapter in book/conference proceedings › Chapter › Scientific › Peer-reviewed

    3 citations (Scopus)


    The work we report in this chapter began with the aim of finding techniques to minimize the problems that arise from small data samples in fields such as historical sociolinguistics. However, the solutions we propose are not limited to historical sociolinguistics, but are applicable to quantitative sociolinguistic and corpus studies in general. Establishing the frequency of given linguistic forms is a crucial issue in studying differences in linguistic usage between populations or points in time. In its simplest form, the question can be posed as follows: suppose there are two alternative forms, A and B, of a linguistic variable - alternative pronunciations, words or phrases meaning the same, functionally equivalent grammatical structures - what is the frequency of use of each?

    The basic questions we address include the use of aggregate data and its relation to individual variation when individuals contribute different amounts of data to the aggregate. The other problem we discuss is similarly a fundamental one: what is the minimum sample size - number of speakers, writers or texts, depending on the research topic - that is required to yield consistent results for a given linguistic variable?

    For a historical sociolinguist using a public corpus, this may be a question of a scarcity of data due to a high rate of illiteracy in a particular period. For sociolinguists who have to elicit their interview data, it is an issue of research economy. In Tagliamonte’s words (2006: 33): ‘The size of the sample must necessarily be balanced with the available time and resources for data handling.’ Looking back at 40 years of sociolinguistic research, Labov (2006 [1st edn. 1966]: 400-401) notes that the analysis of the stratification by age, gender and social class of a given city has usually required 60-100 speakers. Without introducing any testing of sample size, he considers the 120 speakers used in a Montreal study to be ideal, although he emphasizes the care with which the sampling was designed.

    Title of host publication: Research Methods in Language Variation and Change
    Publisher: Cambridge University Press
    ISBN (electronic): 9780511792519
    ISBN (print): 9781107004900
    Status: Published - 1 Jan 2013
    Ministry of Education and Culture (OKM) publication type: A3 Part of a book or other compilation

