Influence of multiple hypothesis testing on reproducibility in neuroimaging research: A simulation study and Python-based software

Research output: Contribution to journal › Article › Scientific › peer-review

Standard

Influence of multiple hypothesis testing on reproducibility in neuroimaging research: A simulation study and Python-based software. / Puoliväli, Tuomas; Palva, Satu; Palva, J. Matias.

In: Journal of Neuroscience Methods, Vol. 337, 108654, 01.05.2020.

BibTeX

@article{03450f9dad73480b95ba86f80feddaa0,
title = "Influence of multiple hypothesis testing on reproducibility in neuroimaging research: A simulation study and Python-based software",
abstract = "Background: Reproducibility of research findings has been recently questioned in many fields of science, including psychology and neurosciences. One factor influencing reproducibility is the simultaneous testing of multiple hypotheses, which entails false positive findings unless the analyzed p-values are carefully corrected. While this multiple testing problem is well known and studied, it continues to be both a theoretical and practical problem. New method: Here we assess reproducibility in simulated experiments in the context of multiple testing. We consider methods that control either the family-wise error rate (FWER) or false discovery rate (FDR), including techniques based on random field theory (RFT), cluster-mass based permutation testing, and adaptive FDR. Several classical methods are also considered. The performance of these methods is investigated under two different models. Results: We found that permutation testing is the most powerful method among the considered approaches to multiple testing, and that grouping hypotheses based on prior knowledge can improve power. We also found that emphasizing primary and follow-up studies equally produced most reproducible outcomes. Comparison with existing method(s): We have extended the use of two-group and separate-classes models for analyzing reproducibility and provide a new open-source software “MultiPy” for multiple hypothesis testing. Conclusions: Our simulations suggest that performing strict corrections for multiple testing is not sufficient to improve reproducibility of neuroimaging experiments. The methods are freely available as a Python toolkit “MultiPy” and we aim this study to help in improving statistical data analysis practices and to assist in conducting power and reproducibility analyses for new experiments.",
keywords = "False discovery rate, Family-wise error rate, Multiple hypothesis testing, Neurophysiological data, Python, Reproducibility",
author = "Tuomas Puoliv{\"a}li and Satu Palva and Palva, {J. Matias}",
year = "2020",
month = "5",
day = "1",
doi = "10.1016/j.jneumeth.2020.108654",
language = "English",
volume = "337",
journal = "Journal of Neuroscience Methods",
issn = "0165-0270",
publisher = "Elsevier",

}
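
The FDR correction discussed in the abstract can be illustrated with a short sketch of the Benjamini-Hochberg step-up procedure. This is a minimal NumPy illustration for orientation only, not the MultiPy implementation itself; the toolkit named in the record ships its own FDR and FWER routines.

# Minimal sketch of the Benjamini-Hochberg (FDR) step-up procedure.
# Illustrative only; not taken from the MultiPy toolkit referenced above.
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean array marking p-values significant at FDR level q."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)                        # ascending order of p-values
    ranked = pvals[order]
    below = ranked <= (np.arange(1, m + 1) / m) * q  # p_(k) <= (k / m) * q
    significant = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()               # largest rank satisfying the bound
        significant[order[:k + 1]] = True            # reject the k+1 smallest p-values
    return significant

# Example: 95 null p-values plus 5 strong effects mixed in.
rng = np.random.default_rng(0)
pvals = np.concatenate([rng.uniform(size=95), rng.uniform(0.0, 1e-4, size=5)])
print(benjamini_hochberg(pvals, q=0.05).sum(), "rejections")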

RIS

TY - JOUR

T1 - Influence of multiple hypothesis testing on reproducibility in neuroimaging research

T2 - A simulation study and Python-based software

AU - Puoliväli, Tuomas

AU - Palva, Satu

AU - Palva, J. Matias

PY - 2020/5/1

Y1 - 2020/5/1

N2 - Background: Reproducibility of research findings has been recently questioned in many fields of science, including psychology and neurosciences. One factor influencing reproducibility is the simultaneous testing of multiple hypotheses, which entails false positive findings unless the analyzed p-values are carefully corrected. While this multiple testing problem is well known and studied, it continues to be both a theoretical and practical problem. New method: Here we assess reproducibility in simulated experiments in the context of multiple testing. We consider methods that control either the family-wise error rate (FWER) or false discovery rate (FDR), including techniques based on random field theory (RFT), cluster-mass based permutation testing, and adaptive FDR. Several classical methods are also considered. The performance of these methods is investigated under two different models. Results: We found that permutation testing is the most powerful method among the considered approaches to multiple testing, and that grouping hypotheses based on prior knowledge can improve power. We also found that emphasizing primary and follow-up studies equally produced most reproducible outcomes. Comparison with existing method(s): We have extended the use of two-group and separate-classes models for analyzing reproducibility and provide a new open-source software “MultiPy” for multiple hypothesis testing. Conclusions: Our simulations suggest that performing strict corrections for multiple testing is not sufficient to improve reproducibility of neuroimaging experiments. The methods are freely available as a Python toolkit “MultiPy” and we aim this study to help in improving statistical data analysis practices and to assist in conducting power and reproducibility analyses for new experiments.

AB - Background: Reproducibility of research findings has been recently questioned in many fields of science, including psychology and neurosciences. One factor influencing reproducibility is the simultaneous testing of multiple hypotheses, which entails false positive findings unless the analyzed p-values are carefully corrected. While this multiple testing problem is well known and studied, it continues to be both a theoretical and practical problem. New method: Here we assess reproducibility in simulated experiments in the context of multiple testing. We consider methods that control either the family-wise error rate (FWER) or false discovery rate (FDR), including techniques based on random field theory (RFT), cluster-mass based permutation testing, and adaptive FDR. Several classical methods are also considered. The performance of these methods is investigated under two different models. Results: We found that permutation testing is the most powerful method among the considered approaches to multiple testing, and that grouping hypotheses based on prior knowledge can improve power. We also found that emphasizing primary and follow-up studies equally produced most reproducible outcomes. Comparison with existing method(s): We have extended the use of two-group and separate-classes models for analyzing reproducibility and provide a new open-source software “MultiPy” for multiple hypothesis testing. Conclusions: Our simulations suggest that performing strict corrections for multiple testing is not sufficient to improve reproducibility of neuroimaging experiments. The methods are freely available as a Python toolkit “MultiPy” and we aim this study to help in improving statistical data analysis practices and to assist in conducting power and reproducibility analyses for new experiments.

KW - False discovery rate

KW - Family-wise error rate

KW - Multiple hypothesis testing

KW - Neurophysiological data

KW - Python

KW - Reproducibility

UR - http://www.scopus.com/inward/record.url?scp=85080150061&partnerID=8YFLogxK

U2 - 10.1016/j.jneumeth.2020.108654

DO - 10.1016/j.jneumeth.2020.108654

M3 - Article

AN - SCOPUS:85080150061

VL - 337

JO - Journal of Neuroscience Methods

JF - Journal of Neuroscience Methods

SN - 0165-0270

M1 - 108654

ER -

ID: 41505023
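
The two-group model mentioned in the abstract can likewise be sketched: a fraction of tests carry a true effect, the remainder are null, and a multiple-testing correction is applied to the resulting p-values before power and the false discovery proportion are tallied. The test count, effect size, and the choice of Bonferroni versus Benjamini-Hochberg below are illustrative assumptions, not the paper's exact simulation settings.

# Rough sketch of a two-group model simulation with multiple-testing correction.
# Parameter values are assumptions for illustration, not the paper's settings.
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(42)
n_tests, prop_alt, effect = 1000, 0.10, 3.0          # assumed simulation parameters

# Two-group model: z ~ N(0, 1) under the null, z ~ N(effect, 1) for true effects.
is_alt = rng.uniform(size=n_tests) < prop_alt
z = rng.normal(loc=np.where(is_alt, effect, 0.0), scale=1.0)
pvals = stats.norm.sf(z)                             # one-sided p-values

for method in ("bonferroni", "fdr_bh"):
    reject, _, _, _ = multipletests(pvals, alpha=0.05, method=method)
    power = reject[is_alt].mean()                    # fraction of true effects detected
    fdp = reject[~is_alt].sum() / max(reject.sum(), 1)  # observed false discovery proportion
    print(f"{method:>10}: power = {power:.2f}, FDP = {fdp:.2f}")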