Abstract

Molecular representation learning is a fundamental challenge in AI-driven drug discovery, with traditional unimodal approaches relying solely on chemical structures often failing to capture the biological context necessary for accurate toxicity and activity predictions. To address this, we propose a multimodal representation learning framework that integrates molecular data with biological modalities, including morphological features from Cell Painting assays and transcriptomic profiles from the LINCS L1000 dataset. Unlike traditional approaches that require complete triplets (molecule, morphological, genomic), our model only requires paired data—(molecule-morphological) and (molecule-genomic)—making it more practical and scalable. Our approach leverages contrastive learning to align molecular representations with biological data, even in the absence of fully paired datasets. We evaluate our framework on the ChEMBL20 dataset using linear probing across 1,320 tasks, demonstrating improvements in predictive performance. By incorporating diverse biological modalities, our approach enables more robust and biologically informed molecular representations, enhancing the predictive power of AI models in drug discovery.
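The pairwise alignment idea in the abstract can be illustrated with a small sketch. The abstract does not name the contrastive objective, so the symmetric InfoNCE loss below is an assumption, and so are the function names (`info_nce`, `pairwise_multimodal_loss`), the temperature value, and the premise that each encoder already outputs embeddings of a shared dimension. The point it shows is that the molecule-morphology and molecule-transcriptomics batches may contain different molecules, so no complete triplet is needed.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(anchors, positives, temperature=0.1):
    """Symmetric InfoNCE loss; row i of `anchors` is paired with row i of `positives`."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    n = len(a)
    # Temperature-scaled cosine similarities between every anchor and every positive.
    logits = [
        [sum(x * y for x, y in zip(a[i], p[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def diagonal_cross_entropy(rows):
        # Cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            peak = max(row)
            log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


def pairwise_multimodal_loss(mol_morph_batch, mol_gene_batch, temperature=0.1):
    """Sum of two pairwise losses; the two batches need not share any molecules."""
    mol_a, morph = mol_morph_batch  # (molecule, morphology) embedding pairs
    mol_b, gene = mol_gene_batch    # (molecule, transcriptomics) embedding pairs
    return info_nce(mol_a, morph, temperature) + info_nce(mol_b, gene, temperature)
```

In this sketch the molecule encoder would receive gradients from both terms, which is one plausible way a single molecular representation could be aligned with two biological modalities from paired data alone.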
Original language: English
Number of pages: 8
Publication status: Published - 6 Mar 2025
MoE publication type: Not Eligible
Event: Learning Meaningful Representations of Life - Singapore EXPO, Singapore 486150, Singapore, Singapore
Duration: 28 Apr 2025 → 28 Apr 2025
https://www.lmrl.org/

Workshop

Workshop: Learning Meaningful Representations of Life
Abbreviated title: LMRL
Country/Territory: Singapore
City: Singapore
Period: 28/04/2025 → 28/04/2025
Other: Workshop at ICLR 2025

Funding

This study was partially funded by the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Innovative Training Network European Industrial Doctorate grant agreement No. 956832 “Advanced Machine Learning for Innovative Drug Discovery”. Further, this work was supported by the Academy of Finland Flagship program: the Finnish Center for Artificial Intelligence FCAI. Samuel Kaski was supported by the UKRI Turing AI World-Leading Researcher Fellowship [EP/W002973/1].

Keywords

  • multi-modal feature extraction
  • contrastive learning
  • drug design

Fingerprint

Dive into the research topics of 'Multi-Modal Representation learning for molecules'. Together they form a unique fingerprint.
