A Statistical Framework for Hypothesis Testing in Real Data Comparison Studies

Anne-Laure Boulesteix, Robert Hable, Sabine Lauer, Manuel J. A. Eugster

Research output: Contribution to journal › Article › Scientific › peer-review

25 Citations (Scopus)

Abstract

In computational sciences, including computational statistics, machine learning, and bioinformatics, articles presenting new supervised learning methods often claim that the new method performs better than existing methods on real data, for instance in terms of error rate. However, these claims are often not based on proper statistical tests and, even when such tests are performed, the tested hypothesis is not clearly defined and little attention is paid to the type I and type II errors. In the present article, we aim to fill this gap by providing a proper statistical framework for hypothesis tests that compare the performances of supervised learning methods based on several real datasets with unknown underlying distributions. After giving a statistical interpretation of ad hoc tests commonly performed by computational researchers, we devote special attention to power issues and outline a simple method for determining the number of datasets to include in a comparison study in order to reach adequate power. These methods are illustrated through three comparison studies from the literature and an exemplary benchmarking study using gene expression microarray data. All our results can be reproduced using R code and datasets available from the companion website http://www.ibe.med.uni-muenchen.de/organisation/mitarbeiter/020_professuren/boulesteix/compstud2013.
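The companion website hosts the authors' own R code. As a minimal, hypothetical sketch of the kind of analysis the abstract describes (not the authors' framework), the snippet below compares two methods' error rates across several datasets with paired tests and uses a power calculation to suggest how many datasets would be needed to detect a given mean difference; all error values and effect sizes are invented for illustration.

    ## Minimal sketch (hypothetical data, not the authors' companion code):
    ## paired comparison of two methods' error rates across K real datasets,
    ## followed by a power-based choice of the number of datasets.

    set.seed(1)

    ## Invented per-dataset error rates for methods A and B on K datasets
    K <- 15
    err_A <- runif(K, 0.10, 0.30)
    err_B <- err_A - rnorm(K, mean = 0.02, sd = 0.03)  # B slightly better on average

    ## Paired tests of H0: no difference in mean performance across datasets
    t.test(err_A, err_B, paired = TRUE)        # paired t-test
    wilcox.test(err_A, err_B, paired = TRUE)   # Wilcoxon signed-rank test

    ## How many datasets are needed to detect a mean difference of 0.02
    ## (assumed SD of the paired differences: 0.03) with 80% power?
    power.t.test(delta = 0.02, sd = 0.03, sig.level = 0.05,
                 power = 0.80, type = "paired")

Here the "sample size" returned by power.t.test is the number of datasets (pairs) in the benchmark, which is the quantity the article proposes to plan for in advance.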

Original language: English
Pages (from-to): 201-212
Number of pages: 12
Journal: The American Statistician
Volume: 69
Issue number: 3
DOIs
Publication status: Published - 3 Jul 2015
MoE publication type: A1 Journal article, refereed

Keywords

  • Benchmarking
  • Comparison
  • Performance
  • Supervised learning
  • Testing
