Technical note: Towards atmospheric compound identification in chemical ionization mass spectrometry with pesticide standards and machine learning

  • University of Helsinki
  • Tampere University
  • Karsa Ltd.
  • Technical University of Munich
  • Munich Center for Machine Learning

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Chemical ionization mass spectrometry (CIMS) is widely used in atmospheric chemistry studies. However, due to the complex interactions between reagent ions and target compounds, chemical understanding remains limited and compound identification difficult. In this study, we apply machine learning to a reference dataset of pesticides in two standard solutions to build a model that can provide insights from CIMS analyses in atmospheric science. The CIMS measurements were performed with an Orbitrap mass spectrometer coupled to a thermal desorption multi-scheme chemical ionization inlet unit (TD-MION-MS) with both negative and positive ionization modes utilizing Br-, O2-, H3O+ and (CH3)2COH+ (AceH+) as reagent ions. We then trained two machine learning methods on these data: (1) random forest (RF) for classifying if a pesticide can be detected with CIMS and (2) kernel ridge regression (KRR) for predicting the expected CIMS signals. We compared their performance on five different representations of the molecular structure: the topological fingerprint (TopFP), the molecular access system keys (MACCS), a custom descriptor based on standard molecular properties (RDKitPROP), the Coulomb matrix (CM) and the many-body tensor representation (MBTR). The results indicate that MACCS outperforms the other descriptors. Our best classification model reaches a prediction accuracy of 0.85 ± 0.02 and a receiver operating characteristic curve area of 0.91 ± 0.01. Our best regression model reaches an accuracy of 0.44 ± 0.03 logarithmic units of the signal intensity. Subsequent feature importance analysis of the classifiers reveals that the most important sub-structures are NH and OH for the negative ionization schemes and nitrogen-containing groups for the positive ionization schemes.
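
The abstract reports three kinds of figures: a classification accuracy, an area under the receiver operating characteristic (ROC) curve, and a regression error in logarithmic units of signal intensity. As a minimal sketch of how such figures can be computed, the standard-library Python below evaluates them on invented toy data. The function names, the toy values and the 0.5 decision threshold are illustrative assumptions, not taken from the paper. The regression metric is assumed here to be a mean absolute error in log10 units; the abstract does not state which error measure the authors used.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of class labels that were predicted correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    This equals the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative sample,
    with ties counted as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_abs_log10_error(signal_true, signal_pred):
    """Mean absolute difference between log10 signals (assumed metric)."""
    diffs = [
        abs(math.log10(p) - math.log10(t))
        for t, p in zip(signal_true, signal_pred)
    ]
    return sum(diffs) / len(diffs)


# Toy data (hypothetical): 1 = detected with CIMS, 0 = not detected.
detected = [1, 1, 1, 0, 0, 0]
# Hypothetical classifier scores, e.g. predicted detection probabilities.
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
# Assumed decision threshold of 0.5 to turn scores into class labels.
predicted = [1 if s >= 0.5 else 0 for s in scores]

# Hypothetical measured and predicted signal intensities (arbitrary units).
signal_measured = [1e3, 1e5]
signal_predicted = [1e4, 1e5]

print("accuracy:", accuracy(detected, predicted))
print("ROC AUC:", roc_auc(detected, scores))
print("log10 error:", mean_abs_log10_error(signal_measured, signal_predicted))
```

Accuracy depends on the chosen threshold, whereas the ROC area depends only on how the scores rank the samples, which is one reason the two figures are commonly reported together.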

Original language: English
Pages (from-to): 685-704
Number of pages: 20
Journal: Atmospheric Chemistry and Physics
Volume: 25
Issue number: 1
Publication status: Published - 17 Jan 2025
MoE publication type: A1 Journal article-refereed

Funding

We acknowledge GALAB Laboratories for providing the pesticide standards; the CSC-IT Center for Science, Finland; and the Aalto Science-IT project. Federica Bortolussi personally thanks Siddharth Iyer for the useful discussions on chemical ionization. Financial support. This research has been supported by the Research Council of Finland (grant nos. 353836, 346373 and 346377); the European Cooperation in Science and Technology (grant no. CA22154); and the European Research Council, H2020 European Research Council (grant no. 101002728). Open-access funding was provided by the Helsinki University Library.

  • Science-IT

    Hakala, M. (Manager)

    School of Science

    Facility/equipment: Facility
