Optimizing Feature Selection for Unbalanced Metabolomics Data with A Background Factor: A Comparative Study in Parkinson's Disease

Yinjia Zhang*, Wilhelmiina Hämäläinen, Paavo Reinikka, Sanna Kaisa Herukka, Ville Leinonen, Marko Lehtonen, Šárka Lehtonen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference article in proceedings; Scientific; peer-review

Abstract

Exploring associations between molecular features and categorical factors, like disease status, is a crucial task in metabolomics data analysis, demanding exceptional rigor. However, data imbalance and crossed factorial design complicate the selection of appropriate analysis settings. This paper presents a comparative study of candidate analysis settings for unbalanced metabolomics data, focusing on Parkinson's disease with gender as a background factor. The study evaluates two statistical test methods, pairwise t-tests and 2-way ANOVA, combined with two multiple hypothesis correction strategies: global correction and two-stage correction. Additionally, an unconditional analysis setting is examined, which can be easily misapplied due to its disregard for data imbalance and factor interactions. We profile the characteristics of each setting through experiments on real-world datasets from human samples and provide practical guidance for selecting appropriate settings to enhance the reliability of metabolomics studies.
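
As a rough illustration of how the two correction strategies named above can differ, the following Python sketch applies them to a toy table of p-values. It is not the authors' implementation: the abstract does not say which correction procedure is used or how the two stages are defined, so the Benjamini-Hochberg adjustment, the screen-then-confirm reading of "two-stage", the feature names, and the p-values themselves are all assumptions made for this example. Each p-value is imagined to come from one Parkinson's-versus-control test within one gender stratum.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values (assumption): one test per feature and gender stratum.
PVALS = {
    "feature_a": {"female": 0.001, "male": 0.300},
    "feature_b": {"female": 0.020, "male": 0.015},
    "feature_c": {"female": 0.600, "male": 0.800},
}


def global_correction(pvals, alpha=0.05):
    """Pool every (feature, stratum) test and adjust them all at once."""
    keys = [(f, s) for f in pvals for s in pvals[f]]
    adj = benjamini_hochberg([pvals[f][s] for f, s in keys])
    return {k for k, q in zip(keys, adj) if q <= alpha}


def two_stage_correction(pvals, alpha=0.05):
    """Assumed two-stage scheme: screen features first, then confirm strata."""
    features = list(pvals)
    # Stage 1: one screening p-value per feature (Bonferroni-adjusted minimum
    # over its strata), then Benjamini-Hochberg across features.
    screen = [min(1.0, min(pvals[f].values()) * len(pvals[f])) for f in features]
    adj = benjamini_hochberg(screen)
    kept = [f for f, q in zip(features, adj) if q <= alpha]
    # Stage 2: within each retained feature, Bonferroni across its strata.
    selected = set()
    for f in kept:
        k = len(pvals[f])
        for s, p in pvals[f].items():
            if min(1.0, p * k) <= alpha:
                selected.add((f, s))
    return selected


if __name__ == "__main__":
    print(sorted(global_correction(PVALS)))
    print(sorted(two_stage_correction(PVALS)))
```

On this toy input both strategies happen to select the same feature and stratum pairs; with many features and few true signals, the number of hypotheses entering each adjustment differs between the two, which is one reason the choice of setting can change the selected feature set.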

Original language: English
Title of host publication: Proceedings - 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024
Editors: Mario Cannataro, Huiru Zheng, Lin Gao, Jianlin Cheng, Joao Luis de Miranda, Ester Zumpano, Xiaohua Hu, Young-Rae Cho, Taesung Park
Publisher: IEEE
Pages: 5884-5891
Number of pages: 8
ISBN (Electronic): 979-8-3503-8622-6
Publication status: Published - 2024
MoE publication type: A4 Conference publication
Event: IEEE International Conference on Bioinformatics and Biomedicine - Lisbon, Portugal
Duration: 3 Dec 2024 to 6 Dec 2024

Publication series

Name: Proceedings (IEEE international conference on bioinformatics and biomedicine)
ISSN (Electronic): 2156-1133

Conference

Conference: IEEE International Conference on Bioinformatics and Biomedicine
Abbreviated title: BIBM
Country/Territory: Portugal
City: Lisbon
Period: 03/12/2024 to 06/12/2024

Keywords

  • biomarker
  • data analysis
  • metabolomics
  • multiple hypothesis correction
  • statistical test
