Background: As a standardized vocabulary of phenotypic abnormalities associated with human diseases, the Human Phenotype Ontology (HPO) has been widely used by researchers to annotate phenotypes of genes/proteins. For saving the cost and time spent on experiments, many computational approaches have been proposed. They are able to alleviate the problem to some extent, but their performances are still far from satisfactory. Method: For inferring large-scale protein-phenotype associations, we propose HPOAnnotator that incorporates multiple Protein-Protein Interaction (PPI) information and the hierarchical structure of HPO. Specifically, we use a dual graph to regularize Non-negative Matrix Factorization (NMF) in a way that the information from different sources can be seamlessly integrated. In essence, HPOAnnotator solves the sparsity problem of a protein-phenotype association matrix by using a low-rank approximation. Results: By combining the hierarchical structure of HPO and co-annotations of proteins, our model can well capture the HPO semantic similarities. Moreover, graph Laplacian regularizations are imposed in the latent space so as to utilize multiple PPI networks. The performance of HPOAnnotator has been validated under cross-validation and independent test. Experimental results have shown that HPOAnnotator outperforms the competing methods significantly. Conclusions: Through extensive comparisons with the state-of-the-art methods, we conclude that the proposed HPOAnnotator is able to achieve the superior performance as a result of using a low-rank approximation with a graph regularization. It is promising in that our approach can be considered as a starting point to study more efficient matrix factorization-based algorithms.
- Hierarchical structure
- Human phenotype ontology
- Low-rank approximation
- Protein-protein interaction networks