Abstract
Motivation: Exploring the relationship between human proteins and abnormal phenotypes is of great importance in the prevention, diagnosis and treatment of diseases. The human phenotype ontology (HPO) is a standardized vocabulary that describes the phenotype abnormalities encountered in human diseases. However, the current HPO annotations of proteins are not complete. Thus, it is important to identify missing protein-phenotype associations.
Results: We propose HPOFiller, a graph convolutional network (GCN)-based approach for predicting missing HPO annotations. HPOFiller has two key GCN components for capturing embeddings from complex network structures: (i) S-GCN, applied to both the protein-protein interaction network and the HPO semantic similarity network to exploit their edge weights; and (ii) Bi-GCN, applied to the protein-phenotype bipartite graph to pass messages between proteins and phenotypes. The core idea of HPOFiller is to run these two GCN modules repeatedly, one after the other, over the three networks to refine the embeddings. Empirical results from stringent evaluations designed to avoid potential information leakage, including cross-validation and temporal validation, demonstrate that HPOFiller significantly outperforms the other state-of-the-art methods. In particular, the ablation study shows that batch normalization contributes the most to the performance. Further examination offers literature evidence for highly ranked predictions. Finally, using known disease-HPO term associations, HPOFiller could suggest promising, previously unknown disease-gene associations, pointing to possible genetic causes of human disorders.
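The alternation described above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the normalization choices, the residual term in the bipartite step, and the random, untrained weight matrices are all assumptions made for illustration, and training, the loss function, and batch normalization are left out entirely. It only shows the general pattern of propagating embeddings over two weighted similarity graphs and then over the bipartite annotation graph, repeated for a few rounds.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of a weighted graph."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def normalize_bipartite(R):
    """Scale R[i, j] by 1 / sqrt(deg_i * deg_j); isolated nodes stay at zero."""
    row_deg, col_deg = R.sum(axis=1), R.sum(axis=0)
    row_scale = np.where(row_deg > 0, 1.0 / np.sqrt(np.maximum(row_deg, 1e-12)), 0.0)
    col_scale = np.where(col_deg > 0, 1.0 / np.sqrt(np.maximum(col_deg, 1e-12)), 0.0)
    return R * row_scale[:, None] * col_scale[None, :]

def relu(x):
    return np.maximum(x, 0.0)

def refine_embeddings(ppi, hpo_sim, annotations, dim=8, rounds=2, seed=0):
    """Alternate similarity-graph and bipartite propagation (hypothetical sketch).

    ppi:         (n_prot, n_prot) weighted protein-protein interaction matrix
    hpo_sim:     (n_term, n_term) weighted HPO semantic similarity matrix
    annotations: (n_prot, n_term) binary protein-phenotype matrix
    """
    rng = np.random.default_rng(seed)
    n_prot, n_term = annotations.shape
    P = rng.normal(scale=0.1, size=(n_prot, dim))   # protein embeddings
    T = rng.normal(scale=0.1, size=(n_term, dim))   # HPO term embeddings
    ppi_norm = normalize_adjacency(ppi)
    sim_norm = normalize_adjacency(hpo_sim)
    R_norm = normalize_bipartite(annotations)
    for _ in range(rounds):
        # Untrained random weights stand in for learned layer parameters.
        W_p, W_t, W_pb, W_tb = (
            rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(4)
        )
        # Step 1: propagate within each weighted similarity network.
        P = relu(ppi_norm @ P @ W_p)
        T = relu(sim_norm @ T @ W_t)
        # Step 2: pass messages across the bipartite graph; each node also
        # keeps its own embedding as a residual term (an assumption here).
        P_next = relu((R_norm @ T + P) @ W_pb)
        T_next = relu((R_norm.T @ P + T) @ W_tb)
        P, T = P_next, T_next
    return P, T

def score_associations(P, T):
    """Inner-product score for every protein-term pair."""
    return P @ T.T
```

In a trained model, the scores for protein-term pairs that are not yet annotated would be ranked to propose candidate missing associations; with the random weights used here the scores carry no meaning.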
Original language | English |
---|---|
Pages | 3328–3336 |
Number of pages | 9 |
Journal | Bioinformatics |
Volume | 37 |
Issue number | 19 |
DOI - permanent links | |
State | Published - 15 Sept 2021 |
MEC publication type | A1 Original article in a scientific journal |
Fingerprint
Dive into the research topics of 'HPOFiller: identifying missing protein–phenotype associations by graph convolutional network'. Together they form a unique fingerprint.
Projects
- 1 Finished
- Intelligent crop production: data-integrating machine learning meets crop simulators
Mamitsuka, H. (Principal investigator), Nariman Zadeh, H. (Project member), Strahl, J. (Project member), Guvenc, B. (Project member), Ji, S. (Project member), Rissanen, S. (Project member), Honkamaa, J. (Project member), Pöllänen, A. (Project member), Hiremath, S. (Project member) & Ojala, F. (Project member)
01/01/2018 → 31/12/2022
Project: Academy of Finland: Other research funding