Abstract
The study of aerosol formation and chemistry using machine learning is limited by the lack of molecular descriptors suited to atmospheric compounds. Interpretable models are particularly affected because they often rely on dictionary-based descriptors tied to specific molecular substructures, which currently fail to capture the full range of organic atmospheric compounds, including large, highly oxidized molecules common in the atmosphere. We introduce ATMOMACCS, an interpretable descriptor combining the 166 binary keys of the MACCS fingerprint with motifs inspired by the SIMPOL method for estimating saturation vapor pressures. We show that ATMOMACCS outperforms the RDKit topological fingerprint in kernel ridge regression models, improving predictions of saturation vapor pressures (7%, 8%, 29%, and 43% error reduction), equilibrium partition coefficients (5% and 9% error reduction), glass transition temperatures (22% error reduction), and enthalpies of vaporization (61% error reduction) on six datasets with atmospheric compounds. Feature analysis shows that saturation vapor pressure and partition coefficients are governed by carbon number and oxygen-related features, whereas other phase-transition properties (e.g., enthalpy of vaporization and glass transition temperature) depend on carbon–hydrogen bond types and the presence of heteroatoms other than oxygen. This highlights the generalizability of ATMOMACCS across different datasets and properties as an interpretable molecular descriptor.
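The workflow described in the abstract (binary substructure keys extended with additional motif features, then fed to kernel ridge regression) can be illustrated with a minimal, dependency-free sketch. Everything concrete here is an assumption made for illustration: the helper names (`combine_descriptor`, `krr_fit`, `krr_predict`), the Gaussian kernel, the hyperparameters `gamma` and `alpha`, and the toy feature vectors and target values. None of them come from the paper, and the real ATMOMACCS descriptor uses the 166 MACCS keys plus SIMPOL-inspired motifs rather than the four keys and two counts used below.

```python
import math

def combine_descriptor(binary_keys, motif_counts):
    """Concatenate binary substructure keys with motif counts (hypothetical layout)."""
    return [float(b) for b in binary_keys] + [float(c) for c in motif_counts]

def gaussian_kernel(x, y, gamma):
    """Gaussian (RBF) kernel; the kernel choice is an assumption, not from the paper."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def krr_fit(X, y, gamma=0.1, alpha=1e-6):
    """Kernel ridge regression: solve (K + alpha * I) c = y for the coefficients c."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], gamma) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve_linear(K, y)

def krr_predict(X_train, coef, x, gamma=0.1):
    """Predict as a kernel-weighted sum over the training points."""
    return sum(c * gaussian_kernel(xt, x, gamma) for c, xt in zip(coef, X_train))

# Toy placeholders only: 4 binary keys + 2 motif counts per "molecule",
# with made-up target values standing in for a property such as log vapor pressure.
X = [
    combine_descriptor([1, 0, 1, 0], [4, 1]),
    combine_descriptor([1, 1, 0, 0], [6, 2]),
    combine_descriptor([0, 1, 1, 1], [10, 4]),
]
y = [-2.0, -4.5, -9.0]

coef = krr_fit(X, y)
preds = [krr_predict(X, coef, x) for x in X]
```

With a very small regularization strength the model nearly interpolates its training points, which is why `preds` closely tracks `y` here. In practice `gamma` and `alpha` would be tuned by cross-validation on held-out data.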
| Original language | English |
|---|---|
| Article number | 084115 |
| Pages (from-to) | 1-19 |
| Number of pages | 19 |
| Journal | Journal of Chemical Physics |
| Volume | 164 |
| Issue number | 8 |
| Publication status | Published - 28 Feb 2026 |
| MoE publication type | A1 Journal article-refereed |
Funding
This study was supported by the Research Council of Finland through Project No. 346377, the EU COST Actions Grant Nos. CA18234 and CA22154, and the European Commission through the Marie Skłodowska-Curie Actions (MSCA) under Grant Agreement No. 101203938. We further acknowledge CSC-IT Center for Science, Finland, and the Aalto Science-IT project. The authors acknowledge Theo Kurtén for insightful discussions.
Fingerprint
Dive into the research topics of 'An interpretable molecular descriptor for machine learning predictions in atmospheric science'. Together they form a unique fingerprint.
Projects
VILMA: Virtual laboratory for molecular level atmospheric transformations
Rinke, P. (Principal investigator), Löfgren, J. (Project Member), Lind, L. (Project Member), Laakso, J. (Project Member), Sorvisto, D. (Project Member), Henkel, P. (Project Member) & Sandström, H. (Project Member)
01/01/2022 → 31/12/2024
Project: RCF Centre of Excellence