Predicting Protein Producibility In Filamentous Fungi

Karmen Dykstra, Juho Rousu, Mikko Arvas

Research output: Working paper (Professional)


In this paper we study the problem of predicting the producibility of recombinant proteins in filamentous fungi, especially T. reesei, using machine learning methods. We train supervised and semi-supervised support vector machines on protein sequences, represented by their amino acid composition as well as by protein family and domain information. Our results indicate, somewhat surprisingly, that only a modest number of proteins with experimental data is required to build a state-of-the-art classifier, and that adding unlabeled sequences in semi-supervised models does not improve predictive performance. Our cross-species prediction experiments show that models trained on a protein dataset from the filamentous fungus A. niger can be generalized to predict protein producibility in T. reesei, and vice versa, without sacrificing too much accuracy, despite approximately 500 million years of divergence between the two species. However, predictors trained on E. coli and S. cerevisiae datasets gave variable performance when applied to the filamentous fungi datasets. This indicates that while protein producibility prediction can be generalized across related species, fully generic prediction tools applicable to any protein production host may not be realistic to achieve.
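
The amino acid composition representation mentioned in the abstract can be illustrated with a minimal Python sketch. This is an assumption about the general technique, not the authors' actual feature pipeline: the function name `amino_acid_composition`, the fixed alphabet ordering, and the example sequence are all hypothetical. The sketch maps a protein sequence to a 20-dimensional vector of residue fractions, which is the kind of fixed-length input a support vector machine expects.

```python
from collections import Counter

# The 20 standard amino acids in a fixed order, so that every
# feature vector uses the same position for the same residue.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Hypothetical helper for illustration; non-standard symbols
    (for example 'X' or '*') are ignored rather than counted.
    """
    seq = sequence.upper()
    counts = Counter(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Made-up toy sequence, not taken from any of the datasets in the paper.
features = amino_acid_composition("MKKAAL")
```

Each sequence, whatever its length, yields a vector of the same size whose entries sum to 1. Such vectors could then be passed, together with producibility labels, to an SVM implementation of the reader's choice.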
Original language: English
Publisher: Cold Spring Harbor Laboratory Press
Publication status: Published - 2017
MoE publication type: D4 Published development or research report or study


  • recombinant protein production
  • filamentous fungi
  • machine learning

