Machine Learning for Systems Pharmacology
Systems pharmacology aims to transform large-scale heterogeneous clinical and biological data into actionable therapeutic strategies. This thesis develops practical machine learning frameworks that contribute to different aspects of systems pharmacology, including the determination of therapeutic drug targets for complex diseases through genome-wide association studies (GWAS), and the elucidation of molecular and phenotypic drug responses.
GWAS have identified thousands of associations between genetic variants (genotype) and disease traits (phenotype), many of which can be used to prioritise the corresponding gene products as potential drug targets. Further discoveries will likely come from larger sample sizes and from analysing multiple variants and disease traits together (i.e. multivariate analysis), instead of the standard approach of testing each variant and trait separately (i.e. univariate analysis). However, current challenges of GWAS include the modest sample sizes of separate study cohorts and restricted access to the individual-level data across cohorts that multivariate meta-analysis requires.
Machine learning methods provide a cost-effective and complementary approach to experimental drug bioactivity profiling, including the elucidation of both direct interaction partners and overall phenotypic responses of drugs. Kernel-based methods in particular have recently received significant attention in pharmacology because, among other advantages, they can model nonlinear relationships between chemical and genomic features and drug bioactivity profiles.
The main contributions of this thesis are as follows. We developed metaCCA, a framework for the multivariate meta-analysis of GWAS that extends canonical correlation analysis to the setting where individual-level genotype and phenotype data are not available. metaCCA is the first summary statistics-based method that allows testing for associations between multiple genetic variants and multiple traits. It holds great potential to identify novel multivariate signals from already published univariate results of individual study cohorts. Further, we demonstrated that a kernel regression model offers practical benefits for gaining novel insights into the mode of action of new drug candidates. Importantly, we predicted and experimentally validated four novel off-targets of the investigational drug tivozanib. Motivated by these results, we extended the model to take advantage of various chemical and genomic information sources simultaneously. In particular, we developed pairwiseMKL, the first time- and memory-efficient method for learning with multiple pairwise kernels constructed using various data sources. pairwiseMKL is well-suited for predictive modelling of both molecular and phenotypic drug response profiles. Finally, we systematically examined transcriptional signatures of Mycobacterium tuberculosis extracted from patients before and during drug therapy, and we demonstrated their power in modelling early treatment efficacy.
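As a rough illustration of why canonical correlation analysis can in principle be run without individual-level data, the sketch below computes canonical correlations from covariance blocks alone. It is a generic textbook formulation, not the metaCCA implementation: the function names `inv_sqrt` and `cca_from_covariances` and the synthetic data are assumptions made for this example, and the step of estimating the covariance blocks from published summary statistics is not shown.

```python
import numpy as np

def inv_sqrt(S):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

def cca_from_covariances(S_xx, S_xy, S_yy):
    """Canonical correlations and weights from covariance blocks only.

    The canonical correlations are the singular values of
    S_xx^{-1/2} S_xy S_yy^{-1/2}.
    """
    Kx, Ky = inv_sqrt(S_xx), inv_sqrt(S_yy)
    U, s, Vt = np.linalg.svd(Kx @ S_xy @ Ky, full_matrices=False)
    a = Kx @ U      # columns: canonical weights for the X variables
    b = Ky @ Vt.T   # columns: canonical weights for the Y variables
    return s, a, b

# Synthetic stand-in data (hypothetical): 3 "variants" and 2 "traits"
# that share one latent signal.
rng = np.random.default_rng(0)
n = 500
Z = rng.normal(size=(n, 1))
X = Z @ rng.normal(size=(1, 3)) + rng.normal(size=(n, 3))
Y = Z @ rng.normal(size=(1, 2)) + rng.normal(size=(n, 2))

# Only the joint covariance matrix is passed on; X and Y themselves are not.
S = np.cov(np.hstack([X, Y]), rowvar=False)
S_xx, S_xy, S_yy = S[:3, :3], S[:3, 3:], S[3:, 3:]

corrs, a, b = cca_from_covariances(S_xx, S_xy, S_yy)
print(corrs)
```

The covariance matrices here are computed from simulated individual-level data only so that the result can be checked against the projected variables; in a summary-statistics setting they would have to be estimated by other means.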
|Status||Published - 2018|
|OKM (Finnish Ministry of Education and Culture) publication type||G5 Doctoral dissertation (article)|