Testing the significance of patterns with complex null hypotheses

Niko Vuokko

    Research output: Thesis, Doctoral Thesis, Collection of Articles

    Abstract

    In data mining, large amounts of data are searched for useful information, pieces of which are called patterns. Significance testing is an important part of this task, as the patterns found need to be assessed for relevance and significance before any further action is taken. Advances in science have created the need to evaluate the significance of complicated patterns within complicated datasets. Historically, significance testing has been conducted with specialized methods that cannot be adapted to new applications, and many of these methods have problems with their theoretical justification. This thesis suggests using the framework of property-based randomization to build reliable and flexible significance testing tools that can be adapted and extended to a wide variety of applications. The concepts of representation-based randomization and iterative pattern mining are also discussed as ways to enlarge the scope of these tools. The final chapter of the thesis reviews the use of these general ideas in various applications, such as databases and time series collections. The publications of the thesis are discussed along with selected introductions to other randomization methods that have been proposed.
    Translated title of the contribution: Monimutkaisten nollahypoteesien käyttö tietohahmojen merkitsevyyden arvioinnissa
    Original language: English
    Qualification: Doctor's degree
    Awarding Institution
    • Aalto University
    Supervisors/Advisors
    • Mannila, Heikki, Supervising Professor
    • Kaski, Petteri, Thesis Advisor
    Publisher
    Print ISBNs: 978-952-60-4494-1
    Electronic ISBNs: 978-952-60-4495-8
    Publication status: Published - 2012
    MoE publication type: G5 Doctoral dissertation (article)

    Keywords

    • data mining
    • significance testing
    • randomization
    • null hypothesis
    • null model
    • Markov chain Monte Carlo
    • frequent pattern
    • clustering
    • classification
    • time series
