TY - GEN
T1 - Optimizing Feature Selection for Unbalanced Metabolomics Data with A Background Factor: A Comparative Study in Parkinson's Disease
AU - Zhang, Yinjia
AU - Hämäläinen, Wilhelmiina
AU - Reinikka, Paavo
AU - Herukka, Sanna Kaisa
AU - Leinonen, Ville
AU - Lehtonen, Marko
AU - Lehtonen, Šárka
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Exploring associations between molecular features and categorical factors, like disease status, is a crucial task in metabolomics data analysis, demanding exceptional rigor. However, data imbalance and crossed factorial design complicate the selection of appropriate analysis settings. This paper presents a comparative study of candidate analysis settings for unbalanced metabolomics data, focusing on Parkinson's disease with gender as a background factor. The study evaluates two statistical test methods, pairwise t-tests and 2-way ANOVA, combined with two multiple hypothesis correction strategies: global correction and two-stage correction. Additionally, an unconditional analysis setting is examined, which can be easily misapplied due to its disregard for data imbalance and factor interactions. We profile the characteristics of each setting through experiments on real-world datasets from human samples and provide practical guidance for selecting appropriate settings to enhance the reliability of metabolomics studies.
KW - biomarker
KW - data analysis
KW - metabolomics
KW - multiple hypothesis correction
KW - statistical test
UR - http://www.scopus.com/inward/record.url?scp=85217277090&partnerID=8YFLogxK
U2 - 10.1109/BIBM62325.2024.10821949
DO - 10.1109/BIBM62325.2024.10821949
M3 - Conference article in proceedings
AN - SCOPUS:85217277090
T3 - Proceedings (IEEE international conference on bioinformatics and biomedicine)
SP - 5884
EP - 5891
BT - Proceedings - 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024
A2 - Cannataro, Mario
A2 - Zheng, Huiru
A2 - Gao, Lin
A2 - Cheng, Jianlin
A2 - de Miranda, Joao Luis
A2 - Zumpano, Ester
A2 - Hu, Xiaohua
A2 - Cho, Young-Rae
A2 - Park, Taesung
PB - IEEE
T2 - IEEE International Conference on Bioinformatics and Biomedicine
Y2 - 3 December 2024 through 6 December 2024
ER -